Copilot multi-agent coordination
A couple of months ago I wanted to build a new API endpoint with permission checks, tests, and documentation. I also wanted to experiment properly with the multi-agent features in GitHub Copilot.
Probably a lot of you can relate to this. I opened Copilot chat and started explaining. One hour and forty-seven messages later, the context window was full of noise. Copilot was forgetting things I told it twenty minutes ago. The implementation did not match the plan I described earlier. I had to rewrite half of it.
We typically ask an agent to plan, code, test, review, and document all at the same time. No human team works that way. Why should an AI agent?
I want to share the patterns that worked for me, what is still painful, and the exact configuration files you can use today.
The solo agent problem
Think of it like a solo developer handling an entire feature alone. She plans the architecture, codes it, writes tests, reviews her own code, and deploys. She context-switches constantly. Her plan might not match what she codes. Her self-review misses issues that fresh eyes would catch.
Now imagine a small team instead. An architect designs cleanly. A builder focuses on implementation. A reviewer catches mistakes with fresh perspective. Each person does one job well. They pass work like a relay race.
That second team ships faster and catches more bugs. Not because they are smarter, but because focused roles reduce cognitive noise.
This is the core insight behind multi-agent Copilot. Not “more agents.” The right agent at the right stage, with a narrow, clear role.
Start simple: Plan then Agent
The simplest coordination is just a deliberate pause between thinking and building. VS Code offers three built-in agents: Plan, Agent, and Ask. You do not need custom agents to get value here.
The workflow: switch to Plan, describe your task, and review the output. Only after you approve the plan do you hand off to Agent for implementation.
What this gives you: the planning phase is separate from execution noise. You see reasoning before code is written. Context for the implementation phase starts cleaner.
This sounds basic. It is basic. But almost everyone skips it. They jump straight to Agent, let it plan and code in the same breath, then wonder why the implementation drifts. That five-minute pause to review a plan prevents hours of rework.
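To make this concrete, here is the kind of Plan-mode prompt I use. The endpoint and requirements below are placeholders; substitute your own task:

```text
Plan a POST /api/projects/{id}/archive endpoint.
Requirements: permission check (owners only), an audit log entry,
unit tests for the permission paths, and updated API docs.
Do not write code yet. List the files to touch, the risks you see,
and any open questions before we move to implementation.
```

The last line matters. It tells the agent where planning stops, which makes the handoff to Agent a deliberate decision rather than a blur.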
Subagents: send someone to research, keep your thread clean
Imagine you are a project manager and you need to understand three different frameworks before picking one. You could research all three yourself. Or you could ask three colleagues: “Go research Framework A, B, and C. Come back in thirty minutes with a one-page summary each.”
They go off. Read docs. See examples. Note patterns. Then each hands you a clean summary. You never see the hundred browser tabs or the scattered notes. Just the conclusions.
That is what a subagent does. A subagent runs in its own isolated context. It does not add noise to your main thread. When it finishes, only the summary comes back.
Key characteristics from the docs:
- Each subagent gets its own context window, completely isolated from the main conversation
- Multiple subagents can run in parallel for independent analyses
- By default, subagents cannot spawn further subagents
- Nested subagents are possible when explicitly enabled, with a maximum depth of 5
This is powerful for quality. Instead of one agent switching between correctness review, security review, and architecture review in the same thread, you spawn three parallel subagents. Each one focuses on one angle. You get three independent reports without context contamination.
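As a sketch of the project-manager analogy above, here is a prompt I might give the main agent. The frameworks are hypothetical, and the exact phrasing that triggers subagents may vary by Copilot version:

```text
Spin up three subagents in parallel:
1. Summarize how error handling works in this codebase.
2. Summarize the recommended error-handling patterns in Framework A's docs.
3. Do the same for Framework B.
Each subagent should return a one-page summary with code patterns and caveats.
Do not paste raw research into this thread; summaries only.
```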
flowchart LR
subgraph solo["Solo Agent — Overloaded"]
direction TB
P[Plan] --> C[Code]
C --> T[Test]
T --> R[Review]
R --> D[Document]
D -->|context lost| P
C -->|drift| R
T -->|forgotten| P
P -.->|chaos| T
C -.->|tangled| D
end
subgraph multi["Coordinated Subagents"]
direction TB
Coord["🎯 Coordinator"]
Coord -->|task| Planner["📋 Planner"]
Coord -->|task| Implementer["💻 Implementer"]
Coord -->|task| Reviewer["✓ Reviewer"]
Planner -->|summary| Coord
Implementer -->|summary| Coord
Reviewer -->|summary| Coord
end
Custom agents: give each role its own tools
Here is a dangerous pattern: giving one agent all the tools.
If your planner can edit files, it will. Instead of finishing the plan, it will start refactoring mid-thought. I have seen this happen three times in the same conversation. I asked for review, got unsolicited rewrites instead.
Custom agents fix this by restricting what each agent can do. A planner gets read and search only. An implementer gets edit and test. A reviewer gets read and search, nothing that writes. Constraints create focus.
Custom agents live in .github/agents/ as .agent.md files. Here is a real set I use.
The Planner, read-only, cannot touch a file:
---
name: Planner
description: Create implementation plans without editing code
tools: ['read', 'search']
user-invocable: true
handoffs:
- label: Start Implementation
agent: Implementer
prompt: Implement the plan outlined above.
send: false
---
You are a planning specialist. When given a feature request:
1. Search the codebase for relevant patterns
2. Ask clarifying questions about scope
3. Design the approach step by step
4. Identify risks and unknowns
5. Propose success criteria
Do not write code. Do not edit files. Only plan.
The Implementer, which has editing power but cannot plan or review:
---
name: Implementer
description: Execute structured plans with code changes and tests
tools: ['read', 'search', 'edit', 'terminalCommand']
user-invocable: false
model: ['Claude Sonnet 4.6 (copilot)', 'GPT-5 (copilot)']
handoffs:
- label: Request Review
agent: Reviewer
prompt: Review the implementation above against the original plan.
send: false
---
You are an implementation specialist. You receive a plan and execute it.
Follow the plan precisely. Write tests as you go. If the plan has gaps, ask for clarification. Do not redesign on the fly.
The Reviewer, also read-only:
---
name: Reviewer
description: Review code for correctness, security, and quality
tools: ['read', 'search']
user-invocable: false
---
You are a thorough code reviewer. Evaluate changes for:
1. Correctness: does it work? Edge cases?
2. Security: injection risks, data exposure, auth checks?
3. Quality: naming, duplication, test coverage?
Do not edit code. Report findings with severity levels. Flag blockers separately from nice-to-haves.
Notice the frontmatter fields:
- The tools field controls what each agent can do. The Planner cannot edit. The Reviewer cannot edit.
- user-invocable: false means the Implementer and Reviewer are only reachable through handoffs or as subagents. Users cannot call them directly from the dropdown.
- The handoffs field creates buttons that appear after a response. “Start Implementation” takes you from Planner to Implementer with context.
- The model field lets you pick different models per agent. Cheaper models for focused tasks, expensive models for complex reasoning.
The coordinator-worker pattern
For larger features, a coordinator agent orchestrates the whole flow. It never writes code itself. It delegates to specialists.
---
name: Feature Builder
description: Coordinates feature development across specialists
tools: ['agent', 'read', 'search']
agents: ['Planner', 'Implementer', 'Reviewer']
---
You are a feature development coordinator. For each request:
1. Use the Planner agent to break down the feature into tasks
2. Present the plan to the user for approval
3. Use the Implementer agent to write code for each task
4. Use the Reviewer agent to check the implementation
5. If the reviewer finds issues, send them back to the Implementer
Iterate between implementation and review until each phase converges. Never implement code yourself.
The agents field restricts which subagents the coordinator can invoke. Without it, the coordinator might pick any available agent. With it, you control the team roster.
This keeps the coordinator’s context clean. It sees high-level progress. The noisy details of implementation and review happen in isolated subagent contexts and come back as summaries.
TDD workflow: Red, Green, Refactor agents
This is my favorite orchestration pattern because it maps directly to test-driven development.
The coordinator:
---
name: TDD
description: Test-driven development workflow
tools: ['agent']
agents: ['Red', 'Green', 'Refactor']
---
Implement the requested feature using test-driven development:
1. Use the Red agent to write failing tests that define the feature
2. Use the Green agent to write minimal code to pass those tests
3. Use the Refactor agent to improve the code without breaking tests
After each step, verify the result before moving to the next.
The three specialists:
---
name: Red
user-invocable: false
tools: ['read', 'search', 'edit']
---
Write failing tests that define the feature behavior. Do not write implementation code. The tests should fail. They define what "done" means.
---
name: Green
user-invocable: false
tools: ['read', 'edit', 'terminalCommand']
model: ['Claude Haiku 4.5 (copilot)', 'GPT-5-mini (copilot)']
---
Write minimal code to make the failing tests pass. Do not optimize. Do not refactor. Just make the tests green. Speed over elegance.
---
name: Refactor
user-invocable: false
tools: ['read', 'search', 'edit', 'terminalCommand']
---
Improve the code without breaking tests. Remove dead code, consolidate duplication, clarify names, improve performance. Keep all tests passing.
Notice the Green agent uses a cheaper, faster model. It does not need deep reasoning. It needs quick, working code. The Refactor agent gets the full model because code quality decisions need more thought.
Multi-perspective parallel code review
A single-pass review often misses problems that become obvious when you look through a different lens. Subagents let you run independent reviews in parallel.
---
name: Thorough Reviewer
description: Parallel multi-perspective code review
tools: ['agent', 'read', 'search']
---
When asked to review code, run these subagents in parallel:
- Correctness reviewer: logic errors, edge cases, type issues
- Security reviewer: input validation, injection risks, data exposure
- Architecture reviewer: codebase patterns, design consistency
After all complete, synthesize findings into a prioritized summary. Note which issues are critical versus nice-to-have. Acknowledge what the code does well.
This works because each subagent approaches the code fresh, without being anchored by what other reviewers found. Three independent perspectives beat one agent trying to switch modes.
Cloud sessions: background agents while you keep working
Think of cloud sessions like hiring a contractor. She understands the job, goes to her own office, works on it, and sends you a pull request when done. You do not hover. You check progress when you want. You steer if the direction is wrong.
The GitHub cloud agent works this way. You start a task from the Agents tab or by assigning an issue to Copilot. The agent runs on GitHub’s infrastructure. You monitor session logs. You can steer mid-session.
The practical workflow:
- Write a well-scoped issue with clear acceptance criteria (see the example after this list)
- Assign it to Copilot (or start from the Agents tab)
- Optionally select a custom agent for the task
- Monitor logs while the agent works
- Steer if the agent goes off track
- When it opens a PR, review and merge
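Here is a sketch of what a well-scoped issue can look like. The feature, limits, and stack are all hypothetical; the point is that every criterion is checkable:

```text
Title: Add rate limiting to POST /api/login

Acceptance criteria:
- 5 failed attempts per IP per minute, then respond with HTTP 429
- Counter stored in Redis with a 60-second TTL
- Tests cover the boundary (5th vs 6th attempt) and the reset
- Success-path response shape is unchanged
```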
The real power is parallelism. You can start five cloud sessions for five different tickets. While they run, you work locally on something else. When PRs arrive, you review.
You can also continue a cloud session locally. Click “Open in VS Code” from the Agents tab, and the full context transfers to your editor. Or copy the copilot --resume=SESSION-ID command to continue in the CLI.
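For reference, the CLI handoff is a single command; SESSION-ID is the placeholder you copy from the Agents tab:

```sh
# Resume the cloud session locally with its full context
copilot --resume=SESSION-ID
```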
Agent skills: team playbooks that actually run
Every team has tribal knowledge. How to write tests. How to handle errors. How to structure commits. Usually this lives in someone’s head or in docs nobody reads.
Agent skills turn that knowledge into something agents actually follow. A skill is a SKILL.md file in .github/skills/. Agents load it automatically when relevant.
---
name: error-handling
description: Team conventions for error handling patterns
---
When writing error handling:
1. Always catch specific exceptions, never bare except
2. Log with context: what action failed, what error occurred, what happens next
3. For user-facing errors, provide a clear message
4. For network errors, implement retry with exponential backoff
5. Never silently swallow errors
Skills are an open standard. They work across VS Code, Copilot CLI, and cloud agent. You write them once, and every agent in your project follows them.
The difference from custom instructions: instructions define coding standards that always apply. Skills define specialized capabilities that load on demand, only when the agent recognizes they are relevant for the current task.
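For orientation, a minimal layout. I am assuming the folder-per-skill convention here, and commit-style is a hypothetical second skill; check the docs if your setup expects a flat file instead:

```text
.github/
  skills/
    error-handling/
      SKILL.md
    commit-style/
      SKILL.md
```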
The honest scorecard
Let me be fair about what delivers value and what still hurts.
Subagents for context isolation. Works well. Genuinely reduces noise. Parallel subagents for multi-perspective review is solid. Caveat: nested subagents are limited to depth 5 and disabled by default.
Tool-scoped custom agents. Works well. When you restrict a planner to read-only, it stops trying to be a hero. Trust goes up. Caveat: tool availability varies by surface and plan. Test that each agent can actually do its job.
Handoffs and review gates. Works well. They create deliberate pauses where drift gets caught. Caveat: handoffs are suggestions, not enforcement. You can ignore them and keep going.
Coordinator-worker patterns. Works well. Breaking big tasks into focused work is obvious but powerful. Caveat: a bad coordinator creates bad assignments. The instructions matter a lot.
Cloud sessions for parallel work. Works well. Running multiple features at once is genuinely useful. Caveat: steering mid-session has latency, costs premium requests, and only applies after the current tool call finishes.
Agent skills. Works well for scaling team conventions. Caveat: the system is relatively new. Skills evolve, and what works today might need adjustment as tools change.
What is still frustrating
I will not hide the sharp edges.
Plan mode sometimes ignores its own plan. You create a detailed plan. The agent swears it will follow it. Halfway through implementation, it decides the plan was wrong and changes direction. You end up with code that does not match the plan. Then you have to revert and force it back on track. This wastes real time.
Premium request burn is hard to predict. Steering agents mid-session, invoking parallel subagents, switching between surfaces, all of it costs premium requests. A complex workflow can eat through requests faster than you expect. There is no budget control you can declare per agent or per workflow.
Agent name ambiguity. When you have five agents and two of them have similar descriptions, the model might pick the wrong one. The agents field in frontmatter helps, but you need to be deliberate about naming and restricting the roster.
Setup overhead is real. Creating a good custom agent library takes work. You need to understand your tasks, define agents precisely, test their tool lists, and maintain the files as features change. For a solo developer, this might feel like over-engineering.
Experimental features change. The agent system is still young. Subagent controls, nested agents, and some handoff behaviors are experimental. Things that work today might change in the next update.
Context window limits still apply per subagent. Subagents have their own clean context, which is great. But each one still has size limits. For very large codebases, you still need to manage what information each agent sees.
What I would change if I could
Four honest wishes.
First, make handoffs enforceable. Right now they are suggestions. I want agents to automatically pause at handoff points, not just nudge.
Second, add explicit budget controls. Let me declare: “This coordinator can spawn up to three subagents, each capped at twenty messages.” Hard limits prevent runaway automation and surprise premium burns.
Third, smarter context passing at handoffs. Let me control what transfers: “Pass file diffs and the plan. Do not pass the full chat history.”
Fourth, built-in agent versioning. Let me pin agent configurations. “Always use Planner v2.” Right now, changes to agent files affect all sessions immediately.
A starter kit for this week
If you want to try this today without overcomplicating, here is the minimum setup.
Your directory structure:
.github/
  agents/
    planner.agent.md
    implementer.agent.md
    reviewer.agent.md
  copilot-instructions.md
Create the three agents I showed earlier (Planner, Implementer, Reviewer). Add a copilot-instructions.md with your project’s build commands, test runner, and coding conventions. That file guides every agent automatically.
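A minimal sketch of that instructions file; every command and convention below is a placeholder for your project’s own:

```markdown
# Copilot instructions

- Build: npm run build
- Test: npm test (unit tests live in tests/)
- Language: TypeScript in strict mode; avoid any
- Errors: never swallow exceptions silently; log with context
- Commits: conventional commits (feat:, fix:, chore:)
```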
For your next non-trivial ticket, try this flow instead of one long chat:
- Start in Planner. Get a detailed plan.
- Review the plan. Adjust it.
- Hand off to Implementer.
- Review the diffs and checkpoints.
- Hand off to Reviewer.
- Address findings.
Notice what feels better. Notice what feels worse. Iterate from there.
You do not need cloud sessions, nested subagents, or coordinator patterns on day one. Basic role separation is enough to see a real difference.
Experimenting from scratch is still worth doing once. When you write agents manually, you understand what these tools are abstracting. That understanding makes you better at using them. But you do not need to keep doing it by hand once you see how the pieces fit.