The setup guide walks you through every step with actual screenshots of what you will see — from the signup page to understanding the interface and choosing the right settings.
It also covers things that trip up beginners: which plan to choose (Free — it's all you need for Phases 0–4), what Sonnet vs Haiku vs Opus mean, and the difference between Chats and Projects.
Opens in a new tab. Come back here when you're done.
Key terms — plain English definitions
These are the words you will see throughout this course. Read through them once now. You do not need to memorise them — come back here any time a term is unfamiliar.
What you will learn
- What Claude actually is and how it works (it is not a search engine — and this matters more than you think)
- What Claude is genuinely good at vs where it struggles
- What "context" means — the single most important concept in the entire course
- How context limits work and why they matter for client work
What you will build
- Exercise 1 — Experience context firsthand with a structured experiment
- Exercise 2 — Map Claude's strengths and limits across five test tasks
What Claude is and why it matters for your clients
What is Claude?
Claude is an AI assistant made by a company called Anthropic. You use it through a website at claude.ai, or through a technical interface called an API if you are building software. For this course, you will be using the website.
The most important thing to understand from the start: Claude is not a search engine. This distinction matters more than it might seem at first.
What makes it fundamentally different from Googling
When you Google something, you are searching a library. Google finds pages that already exist and already contain an answer. You get a list of links to things other people have already written.
Claude works completely differently. There are no links. Claude reads your question, thinks through it, and writes you a new answer from scratch — every single time. It is more like asking a knowledgeable friend than searching a library. Your friend does not hand you a pile of books — they think about your specific situation and talk to you directly.
This is why Claude can do things Google cannot: draft a custom email in your voice, analyse a document you paste in, follow complex multi-step instructions, or reason through a problem with you over several exchanges. It is not retrieving — it is thinking.
Why does this matter for client work?
It means Claude is exceptionally good at things like:
- Writing and editing — emails, reports, summaries, job descriptions, policy documents
- Analysis — reading a document and identifying key information, categorising items, spotting patterns
- Structured data extraction — reading something messy and turning it into a clean, organised format
- Following complex instructions reliably and at speed
What it is not good at:
- Current events — Claude's knowledge has a cutoff date and it does not browse the internet by default
- Precise arithmetic — it can do maths but can make errors on large calculations; always verify
- Knowing things it was never trained on — it cannot read your client's private data unless you give it that data in the conversation
One honest caveat
Claude can make mistakes. It can sometimes produce information that sounds confident but is wrong. This is called a hallucination and it is a known limitation of all current AI systems. In this course you will learn specific techniques to reduce hallucinations and to build systems that check their own work. For now, simply know it is something to be aware of.
Before you start typing — four things to know
These are small things that catch almost every beginner off guard at least once. Read them now and you will not have to discover them the hard way.
1. Enter sends — Shift+Enter makes a new line
When you press Enter in Claude's message box, your message sends immediately. If you want to start a new line without sending — for example, to write a multi-paragraph prompt — hold Shift and then press Enter. This is the single most common thing new users stumble on.
2. Always start a new conversation for a new task
Claude carries everything from the current conversation in its memory — which is powerful, but it also means an old conversation can muddy a new task. Any time you are starting something unrelated, click New chat in the top left corner of the screen. Think of each conversation as a clean sheet of paper.
3. Claude does not remember previous conversations
Starting a new chat does not just clear the screen — it clears Claude's memory entirely. Anything you told Claude last week, in a different conversation, is completely gone. You will cover this in depth in Lesson 2. For now, just know it: every new conversation starts from zero.
4. How to copy Claude's response
Hover your mouse over any of Claude's responses and a small toolbar will appear at the bottom of it. Click the copy icon (it looks like two overlapping squares) to copy the full response to your clipboard. You can then paste it anywhere — a document, an email, your client's system.
Context — the most important concept in this entire course
What is context?
Context means: everything Claude can see in the current conversation. At the start of a new conversation, Claude can only see what you have typed in that conversation. It does not remember anything from previous conversations. Every new conversation starts completely fresh.
This is not a bug — it is how the system works. Understanding it deeply will save you enormous frustration.
A demonstration you can try right now
Open claude.ai. Type: "My name is Jordan." Press Enter. Now in the same conversation, type: "What is my name?" Claude will tell you Jordan — because that information is in the current context.
Now start a brand new conversation. Type: "What is my name?" Claude has no idea, because you are in a fresh conversation and context starts from zero.
The practical consequence for client work
Every piece of information Claude needs to do its job must be present in the conversation. Your instructions. Background about the client. The data to be processed. All of it. You cannot assume Claude remembers anything from before.
In later phases you will learn how to structure this information efficiently so you are not repeating yourself every time. But first, make sure you truly feel this principle, not just understand it intellectually. The exercise below is designed for exactly that.
Context has limits
There is a maximum amount of text Claude can hold in a single conversation before older parts of it become less reliable. Think of it like a notepad with a fixed number of pages. For most client work you will not hit this limit quickly, but it is something to manage deliberately in longer tasks. Phase 4 covers this in depth.
Context does not have to be text
So far we have talked about context as the words you type into Claude. But Claude can also read images — screenshots, photos, diagrams, documents captured as images. This opens up a whole new category of context you can give it.
For example, you could show Claude a screenshot of an error message and ask what it means. You could photograph a client's hand-written notes and ask Claude to turn them into a structured document. You could grab a screenshot of a competitor's pricing page and ask Claude to compare it to your client's.
To do this you need a way to capture screenshots quickly. On Windows, the built-in tool for this is called Snipping Tool.
What is Snipping Tool and how to get it
Snipping Tool is a free screenshot application built directly into Windows — you do not need to download or install anything. It is already on your computer.
How to open it
- Press the Windows key on your keyboard (the key with the Windows logo, usually bottom left)
- Type Snipping Tool using your keyboard
- Click the Snipping Tool app when it appears in the search results
How to take a screenshot and send it to Claude
- Press Windows + Shift + S
- Click and drag to select the area of your screen you want to capture
- Release the mouse — the screenshot is now copied to your clipboard
- Go to Claude and click in the message box
- Press Ctrl + V to paste the image directly into Claude
- Type your question about the image and press Enter to send
What to use this for in client work
- Error messages — screenshot the error and ask Claude what it means and how to fix it
- Client documents — capture a section of a PDF or spreadsheet you cannot easily copy as text
- Existing systems — show Claude a client's current process or interface and ask how to improve it
- Layouts and designs — show Claude what something looks like and ask it to describe, replicate, or critique it
What you will do
DO NOT SKIP this exercise. Reading about context is not the same as experiencing it. This takes 5 minutes and will change how you think about every conversation you have with Claude from here on.
Steps
- Go to claude.ai and start a new conversation.
- Type: "I am building a tool for a gym called FitPro. Their main problem is that trainers spend 2 hours a day manually scheduling client sessions." Press enter.
- In the same conversation, type: "Summarise the client problem in one sentence." Claude should give you a specific, relevant answer about the gym.
- Now start a BRAND NEW conversation. Type the same thing: "Summarise the client problem in one sentence." Notice what happens — Claude has no idea what you are referring to.
- Go back to the original conversation. Type: "What are three ways an AI tool could help with this problem?" Claude uses the gym context to give specific, useful suggestions.
What to notice
- Step 3 is specific and relevant — Claude is using your context.
- Step 4 is confused — no context was provided in the new conversation.
- Step 5 is specific again — the original context is still there.
Write your answer
After completing the steps, complete this sentence in your own words: "Context in Claude means ________________________."
What you will do
DO NOT SKIP this exercise. Before building tools for clients, you need to know what Claude is genuinely good at versus where it struggles. These five tests take about 15 minutes total and will inform every project you build.
The five tests
- Writing — New chat. Ask Claude to write a professional email declining a meeting in under 100 words. Would you actually send it?
- Summarisation — New chat. Paste any article from the web and ask for a 3-bullet summary. Is it accurate?
- Analysis — New chat. Give Claude 5 made-up customer complaints and ask it to group them by category and count each. Does it categorise correctly?
- Current events — New chat. Ask "What happened in the news today?" Is the answer current and reliable?
- Maths — New chat. Ask "What is 3,847 multiplied by 29?" Verify the answer with a calculator.
Record your findings
For each test, write one of these three words: Strong / Acceptable / Weak. Your five answers will guide you on where to trust Claude confidently in client work and where to add safeguards.
The scenario running through this entire phase
To make these concepts concrete, every example in Phase 1 is built around a real scenario: you have been hired to help a business handle their customer service emails.
Every day, the team receives 50–100 emails from customers. Some are complaints about products. Some are refund requests. Some are general questions. Right now, a human reads every one and decides what to do. It takes hours. Your job is to build an AI tool that categorises and handles these emails automatically.
This scenario will walk alongside you through every lesson in this phase. By the end, you will have the skills to actually build it.
💡 Work side by side — Claude on one side, this course on the other
Every lesson in this phase has prompt examples you can copy directly into Claude. The best way to learn is to try each example as you read it — not after you finish the lesson.
On Windows:
- This course in your browser: press Win + Left arrow to snap it to the left half of the screen.
- Claude.ai in a new window: press Win + Right arrow to snap it to the right half.

On Mac:
- This course in your browser: hold the green button and choose Tile Left.
- Claude.ai in a new window: click it to fill the right half.
What you will learn in Phase 1
- Why vague instructions fail — and how to write specific criteria instead
- How to use few-shot examples to teach Claude consistent behaviour
- How to get structured, predictable output instead of conversational paragraphs
- How to handle failures automatically with retry logic
- Bonus lesson: Advanced prompt structures used by professional AI builders
Why vague instructions fail — the power of specific criteria
The customer service email scenario
You have been tasked with building a tool to handle your client's customer service emails. The first decision: what should the tool actually do with each email? You start with a simple instruction:
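For example, something as loose as this (an illustrative sketch, not a recommended prompt):

```
Read this customer email and tell me if there's anything we need to deal with.
```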
What Claude might do with this: write a rambling paragraph, flag things that are not actually problems, miss things that are, and do it differently every single time. Run this instruction on 50 emails and you will get 50 different formats with no consistency.
Now here is the same instruction rewritten with specific criteria:
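A sketch of what the rewritten version might look like, using the categories explained below (the exact wording is illustrative):

```
Read the customer email below and check it against these criteria:
- DEFECT: the customer says a product arrived broken, damaged, or not working.
- REFUND: the customer explicitly asks for their money back.
- ESCALATION: the customer threatens legal action or describes a safety issue.

Respond in exactly this format:
Flag: DEFECT, REFUND, ESCALATION, or NONE
Evidence: one sentence quoting the part of the email that triggered the flag

If none of the criteria apply, respond with "Flag: NONE" and "Evidence: none".
```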
What changed?
The specific version defines categories — defect, refund, escalation. It tells Claude exactly how to format the output. And it tells Claude what to say when none of the conditions apply. Run this on 50 emails and you get 50 consistent, predictable results.
The false positive problem
There is a practical reason this matters beyond accuracy. If Claude flags things it should not — called false positives — your client loses trust in the tool. They start ignoring the flags. And if they ignore all the flags, your tool is worse than useless. The fix is always to tighten the criteria, not to be more lenient.
How to build specific criteria
A useful technique: think about the worst case. What would be the most embarrassing mistake your tool could make? Then write a rule that prevents exactly that. Keep doing this until you have covered all the cases that matter.
Showing instead of telling — how examples make Claude consistent
Back to the customer service emails
You have written specific criteria for flagging emails. Now you need to categorise them — route each one to the right team. You try a classification instruction, but Claude keeps getting the ambiguous ones wrong. You rewrite the criteria. Still inconsistent. There is a better tool: examples.
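Adding examples to the classification prompt might look something like this (the emails and wording here are an illustrative sketch):

```
Classify each email as Complaint, Refund Request, or General Enquiry.

Example 1
Email: "My order arrived two weeks late and the box was crushed."
Category: Complaint

Example 2
Email: "Please refund order ORD-1032. It never arrived."
Category: Refund Request

Example 3
Email: "The product stopped working after two days. Really disappointing."
Category: Complaint
Note: the customer is unhappy but has not asked for money back, so this is a Complaint rather than a Refund Request.
```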
The key: show the reasoning, not just the answer
Notice the note in the last example — it explains why that email is a Complaint rather than a Refund Request. This is what makes examples powerful. It teaches Claude to generalise to new situations it has never seen, not just to copy the exact examples you gave it.
When to use examples
- When your instructions produce inconsistent results
- When Claude keeps making the same type of mistake
- When you are working with ambiguous cases where reasonable people could disagree
- When extracting information from documents that come in different formats
How many examples?
Start with 2. If you still get inconsistency, add a third. If that does not fix it, your examples are probably targeting the wrong cases — look at what Claude is getting wrong and build examples specifically for those edge cases.
What you will do
DO NOT SKIP. Rewrite three vague prompts into specific ones, test both versions, and observe the difference in consistency.
The three prompts to rewrite
- Vague: "Summarise this customer review and tell me if it's positive or negative." — Write a specific version that defines what counts as positive, negative, and mixed, specifies the output format, and states what to say for each category.
- Vague: "Review this invoice and flag anything unusual." — Write a specific version listing at least 4 concrete categories of what counts as unusual.
- Vague: "Analyse this job application and tell me if the candidate is suitable." — Invent a fictional role and write specific criteria with at least 3 things that must be present to qualify.
Testing your work
For each prompt pair: run the vague version on 3 different sample inputs. Then run your specific version on the same 3 inputs. Count how many times the output format was consistent. The specific version should score higher.
Structured output — making Claude's answers usable
From classification to action
Your email tool can now classify emails consistently. But the next problem: how do you get that information into your client's systems? If Claude just writes a paragraph, a human still has to read it and decide what to do. The answer is structured output — making Claude respond in a consistent, predictable format that can feed directly into other systems.
The simplest form: a template
Ask Claude to fill in a specific template. The word "exactly" is important — it signals that you want precision, not creativity.
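A minimal sketch of a template prompt (the fields are illustrative):

```
Read the email below and fill in this template exactly:

Customer name:
Order number:
Issue category:
Requested action:
```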
The fabrication problem
Here is a critical pitfall with structured output. If you make a field required and the source email does not contain that information, Claude will sometimes invent a plausible-sounding value rather than leave the field blank. This is called fabrication.
A customer who signs their email "A frustrated customer" does not have a customer name. If you require that field, Claude might write "Unknown Customer" or even invent a name. The fix: for any field that might genuinely be absent, add "or Not found" as a valid answer.
JSON output — for when machines need to read it
When your output needs to feed directly into a spreadsheet, database, or another AI step, JSON format is more reliable than a template. JSON is a format computers can read directly.
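The end of the prompt might specify the shape like this (an illustrative sketch; the field names are assumptions):

```
Respond only with JSON in exactly this format:
{
  "customer_name": "<name, or Not found>",
  "order_number": "<ORD-XXXXX, or Not found>",
  "category": "<Complaint | Refund Request | General Enquiry>",
  "summary": "<one sentence>"
}
```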
When it goes wrong — retry logic and self-correction
What is retry logic?
Even with great prompts, Claude sometimes produces output that does not meet your requirements. In professional tools you build for clients, you need a way to handle these failures automatically rather than having a human check every output.
The technique is called retry with error feedback. You run Claude, check the output against your requirements, and if it fails, you send it back with a specific description of what went wrong. Claude uses that description to fix its answer.
A concrete example from the email tool
You ask Claude to extract an order number from an email in the format ORD-XXXXX. Claude returns "Order 44729" instead of "ORD-44729". Your check catches this. You send back:
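Something along these lines (illustrative wording):

```
Your previous output did not meet the requirements. The order number was returned as
"Order 44729", but the required format is ORD-XXXXX. Return the corrected output with
the order number as ORD-44729 and everything else unchanged.
```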
What retry can and cannot fix
- Works for: format errors, structural mistakes, values placed in the wrong fields
- Does NOT work for: information that genuinely is not in the source email. If the customer did not include an order number, retrying will not produce one — Claude will either repeat "Not found" or fabricate a value.
Know the difference between a format mistake and a missing piece of information before designing your retry logic.
Multi-instance review
For high-stakes client work — legal documents, financial data — consider having a second, fresh instance of Claude review the output of the first. The second instance has not seen the reasoning that produced the first answer, so it catches mistakes the first one would overlook. This is the AI equivalent of a second pair of eyes.
The scenario
You are building a customer service email classifier for a software company. Emails need to go to one of three teams: Billing, Technical, or Account Management. Emails are often ambiguous.
Step 1 — Write the base prompt without examples
In a new Claude conversation, write a prompt that describes the three categories and asks Claude to classify each email.
Step 2 — Test without examples on these 5 emails
- "I cancelled my account last week but was still charged this month."
- "The dashboard shows my usage as zero even though I've been using it every day."
- "I need to add two new team members to my workspace — how do I do that?"
- "My trial expired but I want to continue on the free plan, not the paid one."
- "The export function downloads a file but my spreadsheet app says it's corrupted."
DO NOT SKIP step 2. You need the baseline to see what your examples actually change.
Step 3 — Add 3 examples and retest
Choose 3 examples that target the most ambiguous cases above. Write each showing both the input and the correct output, plus a brief note explaining the reasoning. Retest the same 5 emails and compare results.
Advanced prompt structures — how professionals write prompts
The four lessons so far have given you the core skills. This bonus lesson covers the advanced structures used by experienced AI builders. These are the patterns that take your prompts from good to consistently excellent.
The 9-element professional prompt framework
When building a system prompt for a client tool, professional AI builders structure it around these nine elements. You do not need all nine every time — but knowing what each one does lets you diagnose why a prompt is underperforming.
| # | Element | What it does | Example |
|---|---|---|---|
| 1 | Task context | Sets the role and situation | "You are a customer service handler for FitPro gym." |
| 2 | Tone context | Defines how Claude should communicate | "Respond professionally but warmly. No jargon." |
| 3 | Background data | Provides the information Claude needs | "Here is the customer record: [data]" |
| 4 | Detailed rules | Your specific criteria and constraints | "Only escalate if the issue involves a refund over £500." |
| 5 | Examples | Shows correct behaviour on edge cases | "Example: Input: [email] → Output: [response]" |
| 6 | Conversation history | Prior context the model should know | "Previous messages: [history]" |
| 7 | Immediate request | The specific task for this message | "Now classify this email: [email]" |
| 8 | Think step by step | Activates deeper reasoning on complex tasks | "Before answering, think through each criterion carefully." |
| 9 | Output format | Defines exactly how to structure the response | "Respond only in the JSON format shown above." |
The five foundations — what every good prompt needs
Before thinking about advanced techniques, make sure your prompt has these five fundamentals:
- Task context — set the role and task upfront. "You are [role] and your job is [task]."
- Background data — upload relevant files, documents, or data. Better inputs produce sharper outputs.
- Detailed rules — expand your goals into specific constraints. Ask Claude to propose a plan before executing.
- Examples — use concrete examples to show ideal outputs. This dramatically improves quality.
- Output format — define the structure before Claude replies. "Respond in under 200 words in this exact format."
Advanced techniques worth knowing
Structured prompting with XML tags
For complex prompts with multiple sections, use XML tags to clearly separate them. This is essentially speaking Claude's native language — it processes tagged sections more reliably than prose paragraphs.
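A brief sketch of a tagged prompt (the tag names are illustrative, not required names):

```
<instructions>
Classify the email below as Complaint, Refund Request, or General Enquiry.
</instructions>

<rules>
Only classify as Refund Request if the customer explicitly asks for money back.
</rules>

<email>
I was charged twice for order ORD-44729 and I want the duplicate charge returned.
</email>
```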
Chain prompting
Break complex tasks into sequential prompts where each step builds on the previous one. Instead of asking Claude to do everything at once, prompt it through a chain: first analyse, then categorise, then respond. Each step compounds quality.
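The first prompt in the chain might be as simple as this (illustrative wording):

```
Read this customer email and analyse it: what is the customer's situation, what are
they asking for, and which facts will matter when we respond?
```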
After Claude responds with the analysis, your second prompt builds on it: "Based on your analysis, now classify this as Complaint / Refund Request / General Enquiry and explain why." The third prompt then: "Now draft the response."
Reverse prompting
Ask Claude to question you before it starts. This surfaces considerations you would miss — especially in domains you are not deeply familiar with.
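For example (an illustrative sketch):

```
I need to design an email classification tool for a customer service team. Before you
suggest anything, ask me the five questions whose answers would most change your approach.
```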
Feedback looping
After Claude produces output, critique it explicitly and request revisions. "This is good but the tone is too formal and the third point is vague — rewrite it with a friendlier tone and a concrete example for point 3." This iterative approach produces consistently better output than a single prompt attempt.
What a tool is and why it changes everything
The limitation of a pure conversation
So far, Claude has been reasoning purely from the information you provide in the conversation. That is powerful, but limited. What if you want Claude to look up a live order status? Read from a database? Send an email when a condition is met? For that, you need tools.
What a tool is
A tool is a capability you give Claude that lets it interact with the outside world. When Claude has access to a tool, it can decide to use that tool during a conversation — fetch data, write data, trigger an action — and then reason about the result.
Think about what this unlocks for clients:
- A client who tracks sales in a spreadsheet could give Claude a tool that reads from that spreadsheet. Claude can now answer "Who are my top 5 customers this month?" without anyone pulling a report manually.
- A client with a customer database could give Claude a tool that looks up account information. Claude can now handle queries that reference real, live data.
- A client who approves expenses could give Claude a tool that reads pending requests and one that submits approvals. Claude can now draft a summary and process approvals in one flow.
How tools connect to the outside world
Tools connect through something called an API — Application Programming Interface. An API is simply a way for software to talk to other software. When building for clients, you will often use existing APIs — things like Google Sheets, Salesforce, or whatever database your client already uses.
Writing tool descriptions that actually work
The description is everything
Here is something that surprises almost everyone when they first learn about tools: the description you write for a tool is the primary mechanism Claude uses to decide when to call it. Not any special configuration. Not the code behind the tool. The description.
This means a bad description causes bad behaviour — and the fix is almost always to improve the description, not to add more complexity.
The misrouting problem
Here is what bad descriptions look like:
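An illustrative pair (the tool names are hypothetical):

```
get_order_info: "Gets information about an order."
get_customer_info: "Gets information about a customer."
```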
These two descriptions are nearly identical in structure. Claude constantly picks the wrong one because it cannot tell the difference between them.
Here is what good descriptions look like:
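A sketch of an improved version of the first one (wording is illustrative):

```
get_order_info: "Looks up a single order by its order number and returns its current
status. Input: the order number as a string in the format ORD-XXXXX. Returns the order
status (processing, shipped, or delivered), the shipping address, the list of items,
and the payment total. Do NOT use this tool for questions about the customer's account,
their past support tickets, or anything not tied to one specific order."
```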
What makes a good tool description
- Primary purpose — what the tool is fundamentally for
- Input format — exactly what to pass in (format, type, constraints)
- Concrete examples — 3–4 specific kinds of information the tool returns
- Explicit exclusions — when NOT to use this tool
How many tools is too many?
Keep it to 4–5 tools per agent, scoped to that agent's specific job. A customer support agent needs tools to look up orders, check account status, submit refunds, and escalate to a human. That is enough. Giving it 15 tools causes selection to degrade — Claude starts picking tools that are close but not quite right.
What you will do
DO NOT SKIP. Rewrite three weak tool descriptions, then test whether Claude picks the right tool in an ambiguous scenario.
The weak descriptions to improve
Tool 1 — get_account: "Gets account information."
Tool 2 — get_transactions: "Gets transaction information."
Tool 3 — get_support_history: "Gets support history."
Your task
- Rewrite each description so that: the primary purpose is clear, the exact input format is specified, 3–4 concrete examples of what the tool returns are listed, and explicit guidance on when NOT to use it is included.
- Once you have written all three, give Claude this scenario (telling it about the three tools using only your new descriptions): "A customer emails saying they were charged twice last month and also that their last support ticket was never resolved. What would you need to look up?" Claude should correctly identify which tool to use for each part of the query.
Success criteria
Claude identifies the correct tool for each part of the query without prompting. If it picks wrong, look at which descriptions are still ambiguous and tighten them.
When tools fail — designing good error responses
Tools fail. Plan for it.
Networks time out. Data is not found. Permissions expire. If your tools do not communicate failures clearly, Claude cannot respond appropriately — and your client's users get a confusing or broken experience.
The four types of errors
1. Transient errors
The service was temporarily unavailable or the request timed out. Usually fixable by trying again. Tell Claude: error type is transient, retryable is true, description is "The customer database timed out. Please retry."
2. Validation errors
The input was wrong — wrong format, missing field, out-of-range value. Claude should fix its input and try again. Tell Claude: error type is validation, retryable is true with corrected input, description is "Order number format invalid. Expected ORD-XXXXX, received 12345."
3. Business errors
The request violates a policy — refund exceeds the limit, the account is locked. These are NOT retryable. The answer is not "try again" — it is "take a different action." Tell Claude: error type is business, retryable is false, plus a customer-friendly explanation Claude can relay.
4. Permission errors
The current credentials do not allow this action. Needs escalation or different credentials. Tell Claude: error type is permission, retryable is false, plus the suggested next step.
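As a concrete shape, the transient case above might be passed back to Claude as something like this (the field names follow the exercise below; the exact structure is illustrative):

```
{
  "error_type": "transient",
  "is_retryable": true,
  "message": "The customer database timed out. Please retry.",
  "customer_message": "We're having a brief technical issue looking that up. Please give us a moment."
}
```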
The most important distinction in error handling
There is a critical difference between a tool failure and a valid empty result:
- Tool failure: the tool could not reach the data source (timeout, auth failure). Consider retrying.
- Valid empty result: the tool successfully reached the source and found no matches. Do NOT retry. The answer is "no results found."
If you confuse these two, you will build retry logic that hammers your database looking for a customer who simply does not exist. Your tool must clearly signal which situation it is.
The scenario
Your client is an e-commerce company. You have a tool called process_refund that submits refund requests to their payment system.
Write an error response for each situation
- The payment system timed out after 5 seconds.
- The refund amount was submitted as a negative number.
- The refund is for £620, but company policy allows a maximum of £500 per request.
- The API credentials have expired.
For each response, include
- error_type: transient / validation / business / permission
- is_retryable: true or false
- message: what Claude should know
- customer_message: what Claude can tell the customer
Bonus
Write a fifth scenario: a refund lookup returns an empty array because the order was placed as a guest checkout with no account linked. Make absolutely clear this is a valid empty result, not a failure.
Built-in tools — Grep, Glob, and the file editing tools
Two tools that confuse almost everyone
When using Claude Code — the developer version of Claude — you get access to built-in tools for working with code and files. The two most commonly confused are Grep and Glob.
Grep — searches file contents
Use Grep when you want to find every file that contains a specific piece of text — a function name, an error message, an import statement. Think of it as Ctrl+F but across every file in your project simultaneously.
When to use Grep: "Find every file that calls the calculateDiscount function" / "Find every import of the payments module" / "Find every place the word 'deprecated' appears in comments"
Glob — matches file paths
Use Glob when you want to find files by their name or location — every file ending in .test.js, every file in the config folder. It is not looking inside files; it is looking at the names and locations of files.
When to use Glob: "Find all test files" (pattern: **/*.test.js) / "Find all TypeScript files" (pattern: **/*.ts) / "Find all files in the config directory" (pattern: config/**)
The file editing tools
When you need to modify a file, use them in this priority order:
- Edit first — makes targeted modifications using unique text as an anchor point. Fast and precise. Use this whenever possible.
- Read + Write as fallback — if Edit fails because the text you are using as an anchor appears more than once in the file, fall back to reading the entire file, making your changes, then writing the whole file back. Slower, but always works.
How to explore a new codebase efficiently
Never start by reading every file. That is a context budget killer — you will use up Claude's working memory before you have even started the real work.
Instead: use Grep to find entry points (the main function, key imports), then follow the trail from there. You will understand the codebase faster and use far less context.
The efficient approach: Grep for the function name → find which files call it → Read those specific files → understand the flow. Never read everything upfront.
What you will do
DO NOT SKIP. This exercise makes the Grep vs Glob distinction concrete before you need it in Phase 5.
The five scenarios — for each, write Grep or Glob and why
- You want to find every file in a project that imports a function called calculateTax.
- You want to find all TypeScript files in the project.
- You want to find every place in the codebase where the string "API_KEY" appears.
- You want to find all files inside a folder called config/.
- You want to find every file that contains the word "deprecated" in a comment.
Answers to check yourself
1. Grep — searching inside files for a function name. 2. Glob — matching files by extension. 3. Grep — searching inside files for a string. 4. Glob — matching files by path. 5. Grep — searching inside files for a word.
The agent loop — how Claude works through a multi-step task
What is an agent loop?
In a simple conversation, you send a message and Claude responds. That is one round. An agent loop is different: Claude receives a task, works through multiple steps, uses tools along the way, and only stops when the task is genuinely complete.
How the loop works, step by step
- You send Claude a task — for example: "Investigate order #8891 and tell me what's wrong."
- Claude decides it needs to call a tool — say, a tool to look up the order. It calls the tool.
- The tool returns data. Claude reads it and decides what to do next — maybe call another tool to check the customer's account.
- Claude calls that tool, gets more data.
- Eventually Claude has enough information and gives you a complete final answer. The loop ends.
How your system knows when Claude is finished
Every time Claude responds, it includes a field called stop_reason. This is the reliable, machine-readable signal you use to know whether to keep going or stop.
- stop_reason is "tool_use": Claude wants to call a tool. Keep the loop going.
- stop_reason is "end_turn": Claude is finished. Present the final answer.
Do NOT try to detect completion by reading Claude's text — for example, checking if Claude wrote "I have completed my analysis." Natural language is ambiguous. The stop_reason field exists precisely because you need a reliable, unambiguous signal.
The agent's working memory
Each time Claude calls a tool, the tool's result is added to the conversation history. That full history is sent back to Claude in the next step. This accumulated history is how Claude knows what it discovered in previous steps — it is the agent's working memory during a task.
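If you were running this loop yourself through the Anthropic API, the skeleton might look roughly like the sketch below. This is a minimal Python sketch: the model name is a placeholder, and get_order_info / run_tool are hypothetical stand-ins for whatever actually executes your tools.

```python
import anthropic

client = anthropic.Anthropic()

# A single hypothetical tool definition, following the description guidelines from Lesson 8.
tools = [{
    "name": "get_order_info",
    "description": "Looks up a single order by its order number (format ORD-XXXXX) and returns its current status.",
    "input_schema": {
        "type": "object",
        "properties": {"order_number": {"type": "string"}},
        "required": ["order_number"],
    },
}]

def run_tool(name, tool_input):
    # Hypothetical dispatcher: replace with real lookups against the client's systems.
    if name == "get_order_info":
        return "Order found: shipped on 3rd March, flagged with a payment mismatch."
    return "Unknown tool"

messages = [{"role": "user", "content": "Investigate order ORD-08891 and tell me what's wrong."}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

    if response.stop_reason != "tool_use":
        break  # "end_turn": Claude is finished, present the final answer

    # Claude wants to call a tool. Keep the loop going: add Claude's turn to the
    # history, run the requested tools, and add the results back in.
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            })
    messages.append({"role": "user", "content": tool_results})
```

The growing messages list is the working memory described above: every tool result is appended to it and sent back on the next pass of the loop.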
Multi-agent systems — coordinators and subagents
Why use multiple agents?
Simple agents handle one task at a time. Complex client work often requires coordinating multiple specialised tasks simultaneously — which is faster and produces better results when each piece is handled by a focused, purpose-built agent.
The hub-and-spoke architecture
There is one coordinator agent at the centre and multiple specialised subagents around it. The coordinator receives the main task, breaks it into pieces, delegates each piece to the right subagent, collects the results, and assembles the final answer.
Example: a client wants a tool that analyses a new product idea. The coordinator receives the request. It sends one subagent to gather market information from the web. Simultaneously, it sends another subagent to analyse uploaded competitor research. When both finish, the coordinator passes their findings to a synthesis subagent that writes the final report.
The most important rule: all communication goes through the coordinator
Subagents never talk to each other directly. Everything routes through the coordinator. This seems inefficient but it is critical for reliability — if subagents communicated directly, you would lose visibility into what is happening and errors would become very hard to trace.
The rule people get wrong most often
Subagents do not automatically know what the coordinator knows. They do not share memory. They do not inherit context. Every single piece of information a subagent needs must be explicitly included in the message you send to it.
What happens if you forget? Say the coordinator passes findings from the web search agent to the synthesis agent — but forgets to include the source URLs. The synthesis agent writes a report with no way to verify any of the claims. The sources are permanently lost. The fix: always pass structured data that includes both the content and the metadata — source, date, document name.
What you will do
DO NOT SKIP. Design a complete agent loop for a customer support scenario. This is a design exercise — no coding required. Write your answers in plain English.
The scenario
Your client is an online retailer. They want an AI agent to handle customer emails — look up the customer's account, check the status of any referenced orders, determine the appropriate resolution, and either resolve the issue or escalate to a human.
Your four deliverables
- Tool list: Name 4 tools the agent needs. For each, write a full description using the guidelines from Lesson 8.
- Loop diagram: On paper or in a document, draw the flow from "receive email" to "send resolution" or "escalate to human." Show every decision point.
- Escalation rules: List 3 specific conditions that trigger human escalation. Be precise — not "if the issue is complex" but specific criteria.
- Programmatic gate: Identify one action in your flow that must be enforced programmatically. Explain why an instruction alone is not sufficient.
Enforcing business rules — when instructions are not enough
The question every serious builder faces
If you want Claude to always do something in a specific order — like always verify identity before processing a refund — do you rely on instructions in your prompt, or do you enforce it in a different way?
The decision depends on what is at stake
For low-stakes preferences — formatting, tone, style — instructions in your prompt are fine. Claude will follow them most of the time. A small failure rate is acceptable when the consequence is a slightly oddly formatted email.
For anything financial, security-related, or compliance-related — you enforce it programmatically. That means building a gate in your system that physically prevents the next step from happening until the required step is complete. Not an instruction saying "please verify identity first." A technical barrier that makes it impossible to proceed without verification.
Why this matters
"Most of the time" is not good enough when money is on the line. If your agent processes refunds without verifying account ownership even 2% of the time, that is a serious problem for a client processing hundreds of refunds a day. The fix is not a better prompt. The fix is removing Claude's ability to skip the step.
The technical approach
This is called a prerequisite gate or a hook. Before the refund tool can run, a verification check runs first. If the check has not passed, the refund tool is locked — Claude cannot call it. This happens 100% of the time, guaranteed.
You can also intercept Claude's tool calls and redirect them. If Claude tries to process a refund above a certain amount, your system intercepts that call and redirects it to a human escalation workflow instead of processing it automatically.
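In code, a gate of this kind can be as small as a check that runs before the tool is allowed to execute. This is a minimal Python sketch; the names handle_tool_call, identity_verified, escalate_to_human, and run_tool are hypothetical stand-ins for your own system.

```python
REFUND_LIMIT = 500  # per-request limit, from the business rule

def handle_tool_call(name, args, session):
    """Runs before any tool executes. Returns an error instead of the tool result
    if a required step has not happened, so Claude cannot skip the step."""
    if name == "process_refund":
        if not session.get("identity_verified"):
            return {"error": "Identity verification has not been completed. "
                             "Run verify_identity before processing a refund."}
        if args.get("amount", 0) > REFUND_LIMIT:
            return escalate_to_human(args)  # hypothetical human-approval workflow
    return run_tool(name, args)  # hypothetical dispatcher that actually executes the tool
```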
Breaking work into pieces — decomposition strategies
What is decomposition?
Decomposition means breaking a large task into smaller, manageable pieces. It directly affects the quality and reliability of what your agents produce.
Two main patterns
Fixed sequential pipelines
You know in advance exactly what the steps are and you do them in order. Code review is a good example: step 1, analyse each file individually; step 2, run a cross-file integration analysis; step 3, produce a summary report. Same steps every time. Reliable and predictable. Best for well-understood, consistent tasks.
Dynamic adaptive decomposition
You discover the next step based on what you found in the previous one. Good for open-ended investigation — for example, "find all the problems with our sales process." You cannot know in advance what you will find, so you start with a mapping step, discover what exists, identify important areas, then adapt. More flexible but less predictable.
The attention dilution problem
When you process too many things in a single pass, quality suffers. Imagine reviewing 14 documents at once. Claude spreads its attention across all of them and produces inconsistent depth — detailed feedback for some files, missed problems in others, and contradictions between them.
The fix: multi-pass architecture. Process each item individually in the first pass. Then run a separate integration pass that looks at patterns across all the items. More setup required, but the quality improvement is significant.
For any task involving more than 4 or 5 documents or files, seriously consider multi-pass.
What you will do
DO NOT SKIP. Design a multi-agent research system that handles this client request: produce a market analysis report on any given industry topic.
Required design elements
- The coordinator: what is its exact job? What does it receive? What does it produce?
- At least 2 subagents: name them, describe what they do, list the tools each needs.
- Context passing: for each subagent, list exactly what information the coordinator must explicitly pass to it.
- Error handling: what happens if one subagent fails? Does the whole system stop or continue with partial results?
- Output format: design the structured output template the synthesis subagent uses.
The narrow decomposition check
Before finalising: if the topic were "impact of AI on the creative industries," would your system cover music, film, writing, and visual arts — or would it default to just one? The coordinator is responsible for complete decomposition.
Session management — pausing, resuming, and staying on track
Three ways to handle a session
Resume a named session
Pick up exactly where you left off. Works well when the context is still valid — the files have not changed, the data is the same.
Fork a session
Create an independent branch from a shared analysis point. Useful when you want to explore two different solutions starting from the same baseline — for example, comparing two architectural approaches to the same problem.
Fresh start with summary injection
Begin a new session but immediately inject a structured summary of what was found previously. The most reliable approach when significant time has passed or files have changed.
The stale context problem
If you resume a session after files have changed and you do not tell Claude what changed, Claude will give advice based on its memory of the old files. This produces contradictory output. The fix: when resuming after changes, explicitly tell Claude which files were modified and ask it to re-analyse those specific files rather than relying on what it already knows.
When context gets full
If you are doing extended exploration and Claude starts giving vague answers — saying "following typical patterns" instead of referencing specific code it found earlier — that is a sign the context is getting full. Save your key findings to a file, start a fresh session, and inject those findings at the start. This resets the context without losing your progress.
Managing long conversations — preserving what matters
How information gets lost in long conversations
As conversations get longer, something subtle but damaging happens: Claude starts losing grip on the specific facts from earlier in the exchange. It may still remember the general topic, but precise values — an exact refund amount, a specific order number, the exact date a problem started — get compressed into vague summaries.
Imagine a customer support conversation that has been going for a while. Early on, the customer said their refund was for £247.83 for order #8891 placed on 3rd March. Twenty messages later, Claude's working understanding of the case might just be "customer wants a refund for a recent order." The specifics are gone.
The fix: a persistent facts block
Extract all specific, verifiable facts into a dedicated block that gets included in every message. Literally a fixed block of text at the top of each prompt:
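For the case above it might look like this (the field labels are illustrative):

```
KEY FACTS (include verbatim in every message, never summarise)
- Customer request: refund
- Refund amount: £247.83
- Order number: #8891
- Order date: 3rd March
```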
This block is never summarised. It is always included verbatim. The result: Claude gives precise answers even deep into a long conversation.
Lost in the middle
Research has shown that AI models process the beginning and end of long inputs most reliably, with the middle being least reliable. If you have five important findings and put the most critical one in the middle, it is more likely to get missed.
The fix: put the most important information at the beginning of your prompt. Use explicit section headers. Put a summary of key findings at the top, then the supporting detail below.
Trim verbose tool results
An order lookup might return 40 fields when you only need 5. Trim tool results to only the relevant fields before including them in the conversation. This prevents the context window from filling up with irrelevant data that pushes out the important stuff.
The scenario
A customer contacts support: name is Maria Santos, email maria@example.com, order number ORD-44729, issue is they were charged £189 but expected £139 (a £50 discrepancy), noticed on 14th February.
Steps
- Design your persistent facts block — a formatted block of text capturing every specific, verifiable fact. Use clear field names.
- Start a conversation in Claude where you first establish the facts, then have a 10-message back-and-forth. You play both the customer and the agent.
- After 10 exchanges, ask Claude: "What was the exact amount the customer was overcharged and on what date?" Check whether it gives you the specific figures.
- Start a fresh conversation WITHOUT the persistent facts block. Have the same 10-message exchange. Ask the same question. Compare precision.
Escalation — when to hand off to a human
Three reliable escalation triggers
1. The customer explicitly requests a human
If a customer says "I want to talk to a person" or "please connect me with your support team" — escalate immediately. Do not try to resolve the issue first. Honour the request without condition.
2. The request falls outside documented policy
If a customer asks for something your AI does not have a clear rule to handle — escalate. Do not let Claude improvise a policy on the spot.
3. The agent genuinely cannot make progress
After reasonable effort, if the agent is going in circles and not advancing toward a resolution, escalate.
Two triggers that seem reliable but are not
Customer frustration
A frustrated customer does not necessarily have a complex problem — they might just be frustrated. And a complex problem does not necessarily produce frustration. Do not use sentiment as your escalation signal. Use the actual content of the request.
Claude's own confidence score
Claude can be very confident about wrong answers and uncertain about correct ones. Self-reported confidence does not correlate reliably with actual accuracy. Build your escalation logic on objective criteria, not on Claude's self-assessment.
An important nuance about frustration
If a customer is frustrated but their issue is straightforward — say, they cannot find the returns page — acknowledge the frustration and offer the resolution. Only escalate if they then reiterate that they want a human. Frustration + wanting the problem solved does not require escalation. Frustration + explicit human request does.
The scenario
Your client runs an insurance comparison service. Their AI assistant helps customers understand quotes and make changes to their policies. Some decisions require a human licensed broker.
Write escalation rules for each category
- Explicit customer requests: what exact phrases should trigger immediate escalation?
- Policy scope: list 4 specific types of requests the AI must escalate. Be specific — not "complex requests" but actual examples.
- Regulatory requirements: what types of advice require a licensed broker regardless?
- Failure to progress: write a specific, measurable rule.
For each rule, also write the handoff message
What information must be included when passing to a human so they don't need to ask the customer to repeat themselves?
Error propagation — handling failures without breaking everything
Two bad ways to handle failures
Silent suppression
The tool fails but returns a response that looks like success. The system has no idea anything went wrong and continues as if everything is fine. This is the worst possible outcome — you get an answer that looks complete but is missing critical information, with no way to know it is incomplete.
Workflow termination
The entire pipeline stops on any single failure. One tool failing causes everything to be thrown away, including all the work that succeeded. For complex tasks this wastes enormous effort.
The right approach
Subagents should handle transient failures locally — try again, try a slightly different approach. Only propagate errors they genuinely cannot resolve. When they do propagate an error, they should include exactly what was attempted, what failed, and what partial results were obtained. This gives the coordinator the information it needs to make a decision: try a different approach, proceed with partial results, or escalate.
Annotate gaps rather than silently omitting them
Partial results with annotations are almost always better than nothing. If your research system finds information on 4 of 5 topics because one source was unavailable, the report should say "Note: information on this topic is limited due to unavailable source" rather than simply omitting it entirely. Transparency about gaps is more valuable than a complete-looking but secretly incomplete answer.
The scenario
You have built a research agent that gathers information from 5 different sources and produces a summary report. During a run, source 3 becomes unavailable.
Write two versions of how the system handles this
- The wrong way (silent suppression): Write what the agent produces when it silently skips source 3 and presents the report as if it were complete. What does the client see? Why is this dangerous?
- The right way (graceful degradation): Write the annotation the agent should include in the report when source 3 is unavailable. What information should it include? Where in the report should it appear?
The key question
In your right-way version — does the agent still deliver value to the client even with source 3 missing? If yes, graceful degradation is working correctly.
Tracking where information came from — provenance and attribution
Why provenance matters
"The market is growing at 23% annually" is only useful if you know where that number came from. Without a source, your client cannot verify it, cite it, or trust it. Building provenance into your system from the start is far easier than adding it later.
Structured claim-source mappings
Require every finding to include: the claim itself, the source document or URL, the relevant excerpt, and the date of the source. This structure should travel through your entire pipeline — from the tool that retrieved it, through the subagents that processed it, to the final report.
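A minimal template for one finding might look like this (the field names are illustrative):

```
Claim: The market is growing at 23% annually.
Source: https://example.com/industry-outlook-2024
Source date: 12 January 2024
Excerpt: "...the sector recorded 23% year-on-year growth..."
```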
Where attribution dies: summarisation
The critical moment where sources get lost is when an agent summarises information. The claim survives but the source does not. Explicitly instruct your synthesis agents to preserve and pass through all claim-source mappings, not just the content.
Conflicting sources
When two credible sources report different numbers for the same statistic, do not pick one arbitrarily. Annotate the conflict: "Source A reports 23% growth. Source B reports 18% growth." Include both attributions. Dates often explain the discrepancy — one source is simply newer. Let your client decide which to use.
The scenario
You are building a research agent for a client who wants weekly competitive intelligence reports on their industry. The agent gathers information from web searches and produces a written summary.
Your task
- Design the data structure the agent should use to store each finding. It must include: the claim, the source, the date, and the relevant excerpt. Write it as a simple template with field names.
- Write the instruction you would include in your synthesis agent's system prompt to ensure it preserves source attribution when writing the final report — rather than just stating facts without sources.
- Conflicting sources: Your agent finds two sources that disagree on market size — one says $4.2 billion, one says $5.8 billion. Write exactly how the report should present this conflict, including both attributions.
Once you have VS Code and Claude Code installed and working, come back here and click the button below to start Phase 5.
The instruction hierarchy — where your settings live
Three levels, three different audiences
When you work with Claude Code, you can put instructions in three different places. Where they live determines who gets them and whether they are shared.
Level 1 — User level
Stored on your personal machine at: ~/.claude/CLAUDE.md
These apply only to you. When a new team member clones your project, they do not get these instructions. Use this for your personal preferences and working style.
Level 2 — Project level
Stored inside the project repository at .claude/CLAUDE.md or the root CLAUDE.md. Tracked in version control. Shared with the whole team. If a standard should apply to everyone working on this project, it lives here.
Level 3 — Directory level
A CLAUDE.md file inside a specific directory within your project. Applies only when Claude is working in that directory. Useful for directory-specific conventions.
The most common mistake
A developer sets up perfect instructions at the user level, then wonders why a new team member gets inconsistent behaviour from Claude. The instructions were never in the project — they existed only on the original developer's machine. The fix: move shared standards to the project-level CLAUDE.md.
How to debug which instructions are loaded
Use the /memory command. It shows you exactly which memory files Claude is currently using. This is your first troubleshooting tool when Claude's behaviour is inconsistent across team members.
Keeping it organised
For larger projects, use the @import syntax to reference external files from your CLAUDE.md, and keep separate rule files for different topics in a .claude/rules directory — for example, testing.md, api-conventions.md, deployment.md.
The client
A medium-sized law firm wants to use Claude Code to help their junior associates draft documents, review contracts, and manage research. Four associates will all use the same setup.
Design the following — write your answers in plain English
- Project-level CLAUDE.md: What universal standards should apply? Think about document formatting, tone, confidentiality notices, citation format. Write the actual content.
- Two path-specific rule files: one for contract review (files matching **/*.contract.md) and one for research documents (files matching **/*.research.md). Write the specific rules each should contain.
- One team command: design a /review command available to the whole team. Write what it does and what criteria it applies.
- User vs project decision: your personal CLAUDE.md includes rules about your preferred code style from a previous project. Should these go in the project-level config for this client? Why or why not?
Custom commands and skills — reusable workflows for your team
What are custom commands and skills?
Custom commands are saved workflows you or your team can trigger on demand using a /command shortcut. Skills are similar but with additional configuration options for how they run.
Where they live determines who gets them
- .claude/commands/ (inside the project repo): available to the whole team, tracked in version control.
- ~/.claude/commands/ (your personal directory): just for you, not shared.
The context: fork setting
The most important skill configuration is the context setting. When set to fork, the skill runs in an isolated sub-agent. All the verbose output — the exploration, the intermediate reasoning, the dead ends — stays contained in the fork. Your main conversation stays clean. When the skill finishes, you receive a summary.
Use this for anything where the process is noisy but the result is what matters — codebase analysis, brainstorming, document review.
Tool restrictions in skills
You can restrict which tools a skill is allowed to use. A review skill that should never modify files can be restricted to read-only tools. This prevents accidents during skill execution.
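Putting those two settings together, a read-only review skill might be configured roughly like this. The field names are illustrative; check the current Claude Code documentation for the exact syntax your version expects.

```markdown
---
name: codebase-review
description: Read-only review of a codebase that reports findings as a short summary
context: fork                      # illustrative: noisy exploration stays in the sub-agent
allowed-tools: Read, Grep, Glob    # illustrative: read-only tools, so the skill cannot modify files
---
Review the files the user points at and report issues grouped by severity.
Do not edit any file; list suggested changes instead.
```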
The golden rule for what goes where
- CLAUDE.md is for standards that always apply — the rules of the road.
- Skills are for specific tasks you do sometimes — the specific routes.
Do not put task-specific procedures in CLAUDE.md or you end up with a bloated file full of instructions that are irrelevant most of the time. Do not put universal standards in skills or they only apply when someone remembers to invoke the skill.
The self-improvement loop
One of the most powerful uses of a custom command is one most people never think to build: a command that captures and learns from mistakes.
The idea is simple. Create a file called tasks/lessons.md in your project. Any time Claude makes an error and you correct it, run a /lessons command that tells Claude to write down what went wrong and what rule prevents the same mistake next time. Over a project's lifetime, this file becomes a living record of everything your specific setup has learned to avoid.
At the start of each new session on the same project, Claude reads tasks/lessons.md automatically (because it lives in the project). The mistakes from last week inform this week's work without you having to re-explain anything.
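A minimal sketch of such a command, using the file names suggested above:

```markdown
<!-- .claude/commands/lessons.md, invoked as /lessons after you correct a mistake -->
Look back at the correction I just made. Append a short entry to tasks/lessons.md with:
- What went wrong, in one sentence
- The rule that would have prevented it, written as an instruction to follow next time
Never delete or rewrite existing lessons.
```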
The scenario
You are setting up Claude Code for a marketing agency. They write blog posts, review client briefs, and produce monthly performance reports. You have the following list of instructions to organise.
For each instruction, decide: CLAUDE.md or a custom command?
- Always write in a professional but friendly tone.
- Run a full SEO audit of any blog post draft.
- Never include specific client names in documents without prior approval.
- Summarise a client brief into a one-page overview.
- All output should use British English spelling.
- Generate a monthly performance report from the provided data.
- Always cite sources when making factual claims.
What to look for
CLAUDE.md items should be things that must always apply regardless of the task. Command items should be specific workflows that are only needed sometimes. If you find yourself writing an instruction that starts with "when asked to..." it probably belongs in a command, not CLAUDE.md.
Path-specific rules — smart instructions that only load when needed
The problem they solve
You have testing standards that should apply to every test file. But your test files are scattered across dozens of directories. If you put the testing rules in your root CLAUDE.md, they load for every single file, even when you are working on something completely unrelated to tests. Wasted context, irrelevant instructions.
The solution: path-specific rules with glob patterns
In your .claude/rules/ directory, create rule files with a YAML header that specifies which file paths the rules apply to:
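For example, a testing rule file might look roughly like this. The frontmatter key is shown as paths purely for illustration, and the rules themselves are invented; check your setup's documentation for the exact key name.

```markdown
---
# .claude/rules/testing.md
paths: "**/*.test.tsx"   # illustrative key name; the glob decides when these rules load
---
- Name tests after the behaviour they verify, not the function they call.
- Every new component needs at least one test for its error state.
- Never mock the module under test.
```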
The pattern **/*.test.tsx matches every TypeScript test file in the entire codebase, regardless of which directory it lives in. The rules only load when Claude is working on a matching file.
Why this is better than directory-level CLAUDE.md for test files
A directory-level CLAUDE.md file only applies to files in that one specific directory. A path-specific rule with a glob pattern applies to every matching file across the entire codebase. For standards that must apply to files scattered across many directories, path-specific rules are the correct solution.
The scenario
You are building Claude Code configuration for a law firm. They have three types of files: contract drafts (**/*.contract.md), research notes (**/*.research.md), and client correspondence (**/*.letter.md).
Write a path-specific rule file for each type
- Contract rules — what standards should apply only when Claude is working on a contract file? Think about: legal language precision, required clauses to check for, what Claude should never change without flagging.
- Research rules — what standards apply only to research documents? Think about: citation format, how to handle conflicting sources, required structure.
- Letter rules — what applies only to client correspondence? Think about: tone, confidentiality notices, sign-off format.
The test
For each rule file, ask yourself: would this rule make sense if Claude were working on one of the other file types? If the answer is no — you've scoped it correctly. If the answer is yes — the rule probably belongs in the project-level CLAUDE.md instead.
Plan mode versus direct execution
The two modes
Plan mode
Claude explores first, understands the full scope, identifies the different possible approaches, and presents a plan before making any changes. You review the plan, adjust if needed, then say go.
Use plan mode when: there are multiple valid approaches that need evaluation, the task involves many files, the changes would be hard to undo, or you are making architectural decisions.
Direct execution
Claude makes the change immediately. No planning stage.
Use direct execution when: you know exactly what needs to happen, the scope is limited, there is no real ambiguity about approach, and the change is contained to one or a small number of files.
A practical test
If you can describe the exact change in one clear sentence — direct execution. If you need a paragraph to explain what you want and even then you are not sure which approach is best — plan mode.
The hybrid pattern
For multi-phase tasks, combine both. Use plan mode for investigation and design. Then use direct execution to implement the specific actions in the plan. Think before you act, but do not keep stopping to re-plan simple implementation steps.
Categorise each of these 8 tasks
DO NOT SKIP. Write: plan / direct / hybrid and one sentence of reasoning for each.
- Fix a typo in a function name that appears in one file.
- Migrate the entire application from one logging library to a different one (affects 30+ files).
- Add input validation to one specific form field.
- Restructure the database schema to support multi-tenancy.
- Change the colour of a button in a CSS file.
- Investigate why an API endpoint is returning 500 errors intermittently, then fix it.
- Add unit tests for a specific function that already has a clear interface.
- Evaluate two different caching approaches and recommend the better one for this specific use case.
Packaging your work as a Skill
What is a Skill?
Throughout this course you have learned to write great prompts, use tools, build agents, and configure Claude with custom instructions. A Skill is how you package all of that into something reusable and portable.
Every time you start a new conversation with Claude, it has no memory of how you have worked before. You have been solving this with system prompts and custom commands — but those live inside one specific setup. A Skill is a self-contained folder you upload once, and Claude will automatically know when to use it.
What is inside a Skill
A Skill is a folder. Three rules:
1. The only required file is SKILL.md. Everything else is optional.
2. The folder name must use hyphens only, no spaces or capitals: my-proposal-writer, not My Proposal Writer.
3. Do not put a README file inside the skill folder. All instructions go in SKILL.md.
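So a typical skill folder, with invented names, is as small as this:

```
client-onboarding/
├── SKILL.md                      # required: when to trigger the skill and what steps to follow
└── references/                   # optional: templates and longer reference material
    └── welcome-email-template.md
```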
Anatomy of a SKILL.md file
Your SKILL.md has two parts: a header that tells Claude when to use the skill, and a body that tells Claude how to use it.
Part 1 — The header (YAML frontmatter)
The header sits between two sets of triple dashes --- at the very top of the file. Claude reads this before anything else to decide whether the skill is relevant to what you are asking.
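For a hypothetical client-onboarding skill, the header might look like this:

```markdown
---
name: client-onboarding
description: >-
  Runs the client onboarding workflow. Use when the user asks to "onboard a new client",
  "set up a client", or "start onboarding". Do NOT use for general admin questions.
---
```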
Good vs bad descriptions
The description is what Claude reads when deciding whether the skill is relevant, so write it with triggering in mind. A bad description is vague and gives Claude nothing to match against: "Helps with proposals." A good one names the task and includes the actual phrases people use: "Drafts client proposals. Use when the user asks to write a proposal, draft a quote, or put together a pitch. Do NOT use for general writing tasks."
Part 2 — The body (your instructions)
Everything after the header is your actual instructions — written in plain Markdown, exactly like the system prompts and CLAUDE.md files you have written throughout this course. The difference is that these instructions are now self-contained and portable.
Supporting material that is too long for SKILL.md (templates, style guides, reference documents) can go in a references/ folder; mention it in your instructions and Claude will read it when needed without loading everything upfront.
Testing your skill
After uploading, check two things: does it trigger when it should, and does it stay quiet when it should not.
If it is not triggering — your description needs more trigger phrases. Add the actual words people use when they need this workflow.
If it is triggering too broadly — make your description more specific, or add a line like: "Do NOT use for general writing tasks."
How to upload your skill
- Zip your skill folder (right-click → Send to → Compressed folder)
- In Claude, go to Settings → Capabilities → Skills
- Click Upload Skill and select your zip file
- Toggle the skill on
- Open a brand new conversation and test it
What to build skills for
Skills work best for workflows you run repeatedly. The patterns that appear most in client work:
- Document creation — proposals, reports, briefs, email templates. Anything where format and tone need to be consistent across clients.
- Workflow automation — multi-step processes like client onboarding or content audits that follow a consistent order of steps.
- Domain expertise — industry-specific knowledge your clients should not have to re-explain each time, like compliance rules or scoring criteria.
- Iterative refinement — workflows that improve through review loops, like draft → feedback → revise → finalise, where the steps are always the same.
How everything connects
A Skill is what you hand a client so they do not have to think about any of what you have built. Here is how each phase of this course lives inside one:
- Phase 1 — Prompt Engineering: your SKILL.md body is just a structured, reusable prompt
- Phases 2–3 — Tools & Agents: your skill can describe multi-step agent workflows Claude should follow automatically
- Phase 4 — Reliability: your skill embeds error handling and retry logic so nothing breaks mid-workflow
- Phase 5 — Configuration: your skill replaces one-off system prompts with something permanent and portable
DO NOT SKIP THIS EXERCISE
You are going to package a workflow as a reusable Skill. Use the UrbanNest scenario or any repeatable workflow you have built during this course.
Build it
- Create a folder on your computer with a hyphenated name, e.g. urbannest-assistant
- Inside it, create a file named exactly SKILL.md
- Write your frontmatter: give it a name and a description with at least three trigger phrases
- Write your instructions: describe the workflow Claude should follow, step by step
- Zip the folder and upload it via Settings → Capabilities → Skills
Test it — in a fresh conversation for each test
- Test all three trigger phrases — confirm the skill loads
- Test two unrelated requests — confirm the skill stays quiet
- Ask Claude: "When would you use the [skill name] skill?" — read its answer
Reflect
Does Claude's description of when to use the skill match what you intended? If not, revise the description and re-upload. This iteration loop — write, test, refine — is exactly how professional skills are built.
The professional build process — from client brief to delivered tool
Step 1 — Understand the problem, not the solution
When a client says "I want an AI tool," the first thing to do is understand their actual pain. Where is time being wasted? Where are errors being made? Where is information getting lost? The AI solution will emerge from the pain, not the other way around. Use the Client Discovery Template for your first meeting.
Step 2 — Identify the right scope
Not every client problem needs a multi-agent system. Some need a well-crafted prompt in a simple tool. Resist the temptation to over-engineer. A simple solution that works is worth infinitely more than a complex solution that does not.
Step 3 — Design before you build
Write out what tools you need, what the agent loop looks like, what the escalation rules are, and what the output format should be. On paper. Before any code. The exercises throughout this course have been training you to do exactly this.
Step 4 — Build the happy path first
Get the main scenario working end-to-end before you handle errors and edge cases. A tool that works for 80% of cases is deployable. A tool that handles every edge case but fails on the main case is not.
Step 5 — Test with real data
Use your client's actual data — or realistic synthetic data — to test your tool. Problems that do not appear with made-up test data almost always appear with real data.
Step 6 — Document your prompt decisions
What criteria are you using? Why those criteria? What did you try that did not work? This documentation is invaluable when you need to adjust the tool later.
Step 7 — Build in visibility
Your client needs to be able to see what the tool is doing and why. An opaque AI that produces answers with no explanation is hard to trust and hard to fix. Build in logging, annotation, and clear confidence indicators from the start.
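As an invented illustration, one processed item's log entry might read:

```
Item:      invoice-0412.pdf
Decision:  flagged for human review
Why:       total of £12,400 exceeds the £10,000 auto-approval threshold
Source:    line 14 of the invoice ("Total due: £12,400.00")
```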
The ten most common mistakes — read this before you build
These are the mistakes that trip up builders at every level of experience. Read every one before you start the capstone.
1. Vague instructions
The fix is always specificity — categorical criteria, not general guidance. Covered in Lesson 3.
2. Forgetting that subagents do not share context
In a multi-agent system, every subagent starts with zero knowledge of what the coordinator knows. You must pass everything explicitly. Covered in Lesson 12.
3. Confusing empty results with tool failures
A lookup that finds nothing is a valid answer. Do not retry it as if it were an error. Covered in Lesson 9.
4. Using Claude's confidence score as an escalation trigger
Claude can be confidently wrong. Build escalation on objective criteria. Covered in Lesson 17.
5. Single-pass review of many items
Processing too many things at once causes attention dilution. Multi-pass architecture produces consistently better results. Covered in Lesson 14.
6. Using prompt instructions for business-critical rules
Prompts are followed most of the time, not all of the time. For rules that must hold 100%, enforce programmatically. Covered in Lesson 13.
7. Personal instructions in project configuration
If you want your whole team to benefit from a standard, it must be in the project-level config. Covered in Lesson 20.
8. Building for the demo, not the deployment
A demo uses perfect inputs. Real deployment uses messy, inconsistent, incomplete inputs from real users. Test with the most difficult inputs you can imagine.
9. No source attribution
Every claim your system makes should be traceable to its source. Build provenance in from the start. Covered in Lesson 19.
10. Not involving the client in escalation design
Your escalation rules reflect your client's values and risk tolerance. They need to sign off on those rules, not discover them after deployment.
🏆 UrbanNest — Build your first client deliverable
The scenario
Your client is a property management company called UrbanNest. They manage 340 residential units across 12 buildings. Every day they receive 40–60 maintenance requests from tenants by email. A human coordinator currently reads every email, categorises it, assigns it to the right contractor, and sends an acknowledgement to the tenant. This takes 3–4 hours a day.
They want an AI tool to automate this process. Your job is to design and build it.
DO NOT SKIP the design phase. Write your answers to all of these before touching Claude. The exercises throughout this course have been training you for exactly this.
- Problem analysis: What exactly is being automated? What must remain human?
- Tool list: What 4–6 tools does this system need? Write full descriptions for each using the guidelines from Lesson 8.
- Agent loop: Describe the step-by-step process from receiving an email to completing the workflow.
- Categorisation criteria: Define the exact categories and the specific criteria for each.
- Escalation rules: Define 4 specific conditions that should escalate to the human coordinator.
- Programmatic gate: Identify one rule that must be enforced programmatically. Explain why an instruction alone is not sufficient.
- Output format: Design the structured output template for each processed request.
- Handoff message: What information must be included when escalating to a human?
Write the complete system prompt for your main agent. It must include:
- The agent's role and scope — what it does and what it does not do
- The categorisation criteria with at least 3 few-shot examples covering ambiguous cases
- The output template the agent must use for every processed request
- Explicit escalation rules using the criteria you defined in the design phase
- Instructions for writing the tenant acknowledgement email
Test your system against these 6 maintenance requests. For each: document the category assigned, the action taken, whether escalation was triggered (and why or why not), and the tenant acknowledgement email.
- "Hi, the hot water in unit 4B has been cold for three days. This is really not OK, I have a baby."
- "There's a small drip under the kitchen sink. It's not urgent but should probably be looked at. Unit 7A."
- "URGENT: water is coming through my ceiling from the unit above. It's getting worse. Unit 2C."
- "My front door lock is stiff and takes a few tries to open. Unit 9D."
- "This is the fourth time I'm reporting the broken heating. Nobody has come. I'm calling the council tomorrow. Unit 11B."
- "Hi, I wanted to check if I could install a dishwasher in my unit. Who do I need to talk to? Unit 6F."
Reflect
- Which of the 6 requests was hardest for your system to handle? Why?
- Did your system correctly identify which requests needed human review?
- What would you change in your system prompt after seeing the test results?
- What is the one thing your client most needs to understand before deploying it?