Use this as a loose guide for evaluating your own software engineering team’s LLM adoption journey.
How to write a statement or function
How a technology works
How technologies compare and contrast
Find better ways to implement small sections of code
Compare libraries and approaches
Summarize docs and long specs
Parse and analyze CSV, JSON, and other data files for quick answers
Interpret logs, traces, and stack traces
Propose 2–3 hypotheses for what's going wrong
Understand security and privacy pitfalls
Identify licensing concerns
Ask "what could go wrong?" before you build
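The "quick answers from data" item above is easy to make concrete. A minimal sketch of the kind of throwaway analysis an LLM can draft for you, using only the standard library; the CSV content and column names are invented for illustration:

```python
import csv
import io
import json
from collections import Counter

# Hypothetical export: a few rows of the kind of CSV you might paste into a chat.
RAW_CSV = """service,status,latency_ms
checkout,ok,120
checkout,error,950
search,ok,80
search,ok,95
checkout,error,1100
"""

def summarize(raw: str) -> dict:
    """Answer the quick questions you'd otherwise eyeball by hand:
    error counts per service and worst-case latency."""
    rows = list(csv.DictReader(io.StringIO(raw)))
    errors = Counter(r["service"] for r in rows if r["status"] == "error")
    worst = max(rows, key=lambda r: int(r["latency_ms"]))
    return {
        "rows": len(rows),
        "errors_by_service": dict(errors),
        "worst_latency_ms": int(worst["latency_ms"]),
    }

print(json.dumps(summarize(RAW_CSV), indent=2))
```

The value is speed, not sophistication: the LLM writes the boilerplate, you sanity-check the numbers.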
Autocomplete single lines and small blocks as you type
Accept, reject, or modify suggestions without leaving your editor
Autocomplete changes that span files, following your existing patterns and conventions
Generate test scaffolds, fixtures, mocks, and edge cases from your own specs
Rename, extract, reorganize — well-defined, low-risk changes
Tidy up docs and comments
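To make the test-scaffold item concrete, here is a sketch of what generated fixtures and mocks can look like. The service, function names, and receipt format are all hypothetical; the pattern (a mock standing in for an external client, plus a happy path and an edge case) is the point:

```python
from unittest.mock import Mock

# Hypothetical code under test: a thin wrapper around a payment client.
def charge(client, user_id: str, cents: int) -> str:
    if cents <= 0:
        raise ValueError("amount must be positive")
    receipt = client.charge(user_id=user_id, amount_cents=cents)
    return receipt["id"]

def test_charge_happy_path():
    # Mock replaces the real client, so no network or credentials are needed.
    client = Mock()
    client.charge.return_value = {"id": "rcpt_123"}
    assert charge(client, "u1", 500) == "rcpt_123"
    client.charge.assert_called_once_with(user_id="u1", amount_cents=500)

def test_charge_rejects_zero():
    try:
        charge(Mock(), "u1", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

An LLM can produce this skeleton from your spec in seconds; your job is to check that the mocked behavior actually matches the real client.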
"Where is X implemented?" "What calls this function?" "How does Y work in this repo?"
Understand unfamiliar code quickly without spelunking manually
Triage a bug, isolate a repro case, and propose fixes with verification steps
Walk through a failing test together to understand root cause
Fill gaps in existing coverage
Add regression tests for newly discovered bugs
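A regression test for a newly discovered bug is one of the highest-leverage asks here. A sketch under an invented scenario (the `slugify` function and the bug are hypothetical): after fixing the bug, pin the exact reported input so the case can never silently regress:

```python
import re

def slugify(title: str) -> str:
    """Hypothetical function that had a bug collapsing '--' incorrectly."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_regression_double_separator():
    # Exact input from the bug report, kept verbatim.
    assert slugify("Hello -- World!") == "hello-world"
```

Naming the test after the bug (or issue number) keeps the link between failure and fix discoverable.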
Summarize what a diff actually does in plain language
Spot risks, edge cases, and missing tests before human reviewers see it
Draft an implementation together and iterate based on your feedback
You're still the one deciding what goes in — the LLM is your thought partner
Turn fuzzy asks into user stories, non-goals, and open questions
Generate acceptance criteria and phased rollout plans
Produce classic requirements docs or feature-based PRDs
LLM drafts code, tests, and docs together
You review the output and validate against acceptance criteria
You're measuring outcomes, not writing lines
LLM owns building out a meaningful unit test suite
LLM generates effective end-to-end tests (e.g. Playwright)
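What "LLM owns the test suite" output tends to look like in practice is a table of cases covering edge conditions. A sketch with an invented function under test, driven by a plain loop so it runs without any test framework:

```python
def parse_version(s: str) -> tuple:
    """Hypothetical function under test: parse 'MAJOR.MINOR.PATCH'."""
    parts = s.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"bad version: {s!r}")
    return tuple(int(p) for p in parts)

# Case tables are exactly what LLMs are good at enumerating exhaustively.
CASES = [
    ("1.2.3", (1, 2, 3)),        # happy path
    ("0.0.0", (0, 0, 0)),        # all zeros
    ("10.20.30", (10, 20, 30)),  # multi-digit components
]
ERROR_CASES = ["1.2", "1.2.3.4", "a.b.c", ""]

def run_suite():
    for raw, expected in CASES:
        assert parse_version(raw) == expected, raw
    for raw in ERROR_CASES:
        try:
            parse_version(raw)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {raw!r}")

run_suite()
```

Your review job shifts from writing cases to judging whether the case table actually matches the spec.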
State goal, non-goals, constraints, and definition of done clearly enough that the LLM can execute without constant clarification
Provide "what good looks like" with concrete examples (inputs/outputs, acceptance checks)
Name what must not change: APIs, behaviors, performance budgets, accessibility expectations
Break work into small, independently verifiable steps
Ask for a plan first: milestones, risks, unknowns, and a test plan — before a single line is written
Choose a safe execution order: scaffolding → tests → implementation → cleanup
Give it repo-specific conventions: file locations, patterns to follow, naming rules
Teach it your working agreements: style, commit hygiene, review expectations
Define scope explicitly: "only touch these files," "non-interactive commands only"
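The skills above condense into a written brief. A hypothetical example (every name, path, and number is invented) showing goal, non-goals, definition of done, invariants, and scope in one place:

```markdown
## Task brief (illustrative)
Goal: add rate limiting to /api/search (100 req/min per API key).
Non-goals: no changes to auth; no new dependencies.
Definition of done: middleware added, unit tests pass, <50ms p95
overhead verified with a benchmark.
Must not change: public response schema, existing error codes.
Scope: only touch src/middleware/ and tests/middleware/.
Non-interactive commands only.
Plan first: list milestones, risks, unknowns, and a test plan before coding.
```

If the brief can't be written this crisply, the task probably isn't ready to delegate.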
LLM runs linters and automated tests, interprets failures, and proposes focused patches — without you in the loop for each step
LLM stops and asks when uncertain instead of guessing its way forward
Require tests for any behavior changes
Ask for a verification checklist, edge cases, failure modes, and rollback considerations
Require evidence for risky changes: diffs, benchmarks, logs
Build and tune AGENTS.md files that encode your guardrails and expectations
Adjust scope, tool permissions, and stop conditions as you learn what works
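A sketch of what such a file can contain; the conventions, commands, and rules below are illustrative placeholders, not a recommended standard:

```markdown
# AGENTS.md (excerpt; all values illustrative)

## Conventions
- New modules go in src/<feature>/; tests mirror the path under tests/.
- Follow the existing error-handling pattern in src/errors/.

## Guardrails
- Run the project's lint and test commands before proposing any patch;
  include the output as evidence.
- Non-interactive commands only; never run migrations against a live DB.

## Stop conditions
- Stop and ask if a change touches the public API or deletes a test.
```

Treat the file like code: review it, iterate on it, and prune rules that agents consistently ignore or misread.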
Define and assign distinct agent roles: Planner, Implementer, Reviewer, Verifier
Each agent has a clear responsibility and clear handoff point
Agents work simultaneously on independent slices: tests, refactors, docs, migrations
Strict file/module ownership prevents agents from stepping on each other
Shared definition of done, shared guardrails, and shared tool permissions across all agents
Central change log: what changed, why, what evidence supports it, what risks remain
Human approval gates at key milestones: design review, pre-merge, pre-release
You're not debugging a line of code — you're debugging a workflow
Evaluate whether agent coordination is actually working or just creating new complexity
Know when to simplify back to a single agent
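The coordination rules above (distinct roles, strict ownership, a central change log) can be sketched mechanically. A minimal illustration in which every name is invented; real orchestration frameworks differ, but the invariant being enforced is the same:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str   # e.g. "Planner", "Implementer", "Reviewer", "Verifier"
    owns: set   # path prefixes this agent alone may modify

@dataclass
class Orchestrator:
    agents: list
    change_log: list = field(default_factory=list)

    def owner_of(self, path: str) -> Agent:
        # Strict ownership: every path maps to exactly one agent.
        matches = [a for a in self.agents
                   if any(path.startswith(p) for p in a.owns)]
        if len(matches) != 1:
            raise ValueError(f"{path}: expected one owner, got {len(matches)}")
        return matches[0]

    def record(self, path: str, why: str, evidence: str):
        agent = self.owner_of(path)  # rejects overlapping or missing ownership
        self.change_log.append({"agent": agent.name, "path": path,
                                "why": why, "evidence": evidence})

orch = Orchestrator(agents=[
    Agent("Implementer", {"src/"}),
    Agent("Verifier", {"tests/"}),
])
orch.record("tests/test_api.py", "regression for reported bug", "CI run green")
```

If maintaining this kind of bookkeeping costs more than the parallelism saves, that's the signal to simplify back to a single agent.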