The Pragmatic Guide to Agentic Coding in Engineering Teams
The promise of AI coding assistants is compelling: write code faster, reduce bugs, automate repetitive tasks. The reality is more nuanced. After deploying agentic coding systems at engineering teams ranging from 20 to 2,000 developers, we've learned what separates teams that get real productivity gains from those that burn resources on expensive technology demos.
The Productivity Numbers (What Actually Happens)
Before diving into implementation, let's be clear about what we measure. In controlled studies across our deployments, we've observed:
- 15-30% reduction in time to first commit for new feature development, measured from task assignment to code review submission
- 40-60% reduction in time on boilerplate and scaffold code—the types of code that are repetitive but require care (API clients, test fixtures, data models)
- 25-40% reduction in code review cycles when AI generates initial implementations, primarily because AI-generated code follows team conventions more consistently than human-written first drafts
- No significant change in time on complex algorithmic problems, which remains constrained by the developer's understanding
- 10-20% increase in test coverage when AI assists with test generation, primarily because developers are more willing to write tests when it's easier
The productivity gains are real but not uniform. Some tasks benefit dramatically; others don't benefit at all. Understanding this distribution is key to successful adoption.
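To see why the distribution matters, here is a rough sketch that blends the per-task figures above into one number for a team's task mix. It uses the midpoints of the reported ranges and treats each as a fraction of time saved, which is a simplifying assumption. The two team mixes are hypothetical, not measured data.

```python
# Midpoints of the reduction ranges reported above, as fractions of time saved.
# Treating these as directly comparable is a simplifying assumption.
REDUCTION_BY_TASK = {
    "new_feature": 0.225,      # 15-30%
    "boilerplate": 0.50,       # 40-60%
    "complex_algorithm": 0.0,  # no significant change
}


def blended_reduction(hours_by_task: dict[str, float]) -> float:
    """Return the overall fraction of time saved for a given task mix."""
    total = sum(hours_by_task.values())
    if total == 0:
        return 0.0
    saved = sum(
        hours * REDUCTION_BY_TASK.get(task, 0.0)
        for task, hours in hours_by_task.items()
    )
    return saved / total


# Hypothetical weekly mixes (hours per task type) for two teams.
crud_heavy_team = {"new_feature": 10, "boilerplate": 25, "complex_algorithm": 5}
research_team = {"new_feature": 10, "boilerplate": 5, "complex_algorithm": 25}

if __name__ == "__main__":
    for name, mix in [("crud-heavy", crud_heavy_team), ("research", research_team)]:
        print(f"{name}: {blended_reduction(mix):.1%} of hours saved")
```

Under these assumptions the boilerplate-heavy team saves roughly 37% of its hours and the algorithm-heavy team roughly 12%, even though both use the same tool.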
What Actually Works
Code Review Automation
The highest-value application we've found is automating the tedious parts of code review: style violations, obvious security issues, missing error handling, test coverage gaps.
AI Code Review Examples
✓ Flagging SQL injection vulnerabilities before human review
✓ Detecting missing null checks and potential null pointer exceptions
✓ Identifying resource leaks (unclosed connections, file handles)
✓ Suggesting test cases for edge cases the developer missed
✓ Enforcing API contract consistency across endpoints
✓ Checking for dependency conflicts or known vulnerabilities
✓ Verifying error handling completeness
What doesn't work: asking AI to evaluate algorithmic correctness or architectural decisions. It can suggest improvements, but architectural judgment still belongs with experienced engineers.
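To make the first item in the list above concrete, here is a minimal sketch of the kind of finding an automated reviewer would surface before a human looks at the change. It is a deterministic heuristic built on Python's standard `ast` module, not an AI reviewer. The class name `SqlInjectionVisitor`, the helper `find_risky_queries`, and the sample code are all made up for illustration.

```python
import ast


class SqlInjectionVisitor(ast.NodeVisitor):
    """Flag .execute(...) calls whose first argument is built by string formatting."""

    def __init__(self) -> None:
        self.findings: list[int] = []

    def visit_Call(self, node: ast.Call) -> None:
        is_execute = isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
        if is_execute and node.args and self._is_dynamic_string(node.args[0]):
            self.findings.append(node.lineno)
        self.generic_visit(node)

    @staticmethod
    def _is_dynamic_string(arg: ast.expr) -> bool:
        if isinstance(arg, ast.JoinedStr):  # f-string with interpolated values
            return any(isinstance(v, ast.FormattedValue) for v in arg.values)
        if isinstance(arg, ast.BinOp) and isinstance(arg.op, (ast.Add, ast.Mod)):
            return True  # "..." + x  or  "..." % x
        if isinstance(arg, ast.Call) and isinstance(arg.func, ast.Attribute):
            return arg.func.attr == "format"  # "...".format(x)
        return False


def find_risky_queries(source: str) -> list[int]:
    """Return line numbers of execute() calls that look like injectable SQL."""
    visitor = SqlInjectionVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


# Hypothetical code under review.
SAMPLE = '''
def load_user(cur, user_id, name):
    cur.execute(f"SELECT * FROM users WHERE id = {user_id}")
    cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    cur.execute("SELECT * FROM users WHERE name = " + name)
'''

if __name__ == "__main__":
    print(find_risky_queries(SAMPLE))  # [3, 5]
```

The parameterized query on line 4 of the sample passes. The f-string and the concatenation are flagged. A heuristic like this will produce false positives, which is one reason the findings feed into human review rather than replace it.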
Test Generation
AI-assisted test generation works well when:
- The test cases are deterministic
- The developer provides the input/output examples
- Edge cases can be enumerated
- The test structure follows patterns the AI has seen in the codebase
AI struggles with tests that require business logic understanding, tests that depend on external state, or tests that verify behavior the AI doesn't fully understand.
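The favorable case described above (deterministic behavior, developer-supplied input/output pairs, enumerable edge cases) maps naturally onto a table-driven test. The sketch below uses the standard `unittest` module. The `slugify` function and its examples are hypothetical stand-ins for whatever your team is testing.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated words."""
    words = "".join(ch if ch.isalnum() else " " for ch in title).split()
    return "-".join(word.lower() for word in words)


# Developer-provided input/output examples, including enumerated edge cases.
EXAMPLES = [
    ("Hello World", "hello-world"),
    ("  leading and trailing  ", "leading-and-trailing"),
    ("Already-Slugged", "already-slugged"),
    ("", ""),
    ("!!!", ""),
]


class SlugifyTest(unittest.TestCase):
    def test_examples(self) -> None:
        # One sub-test per example so a failure names the offending input.
        for raw, expected in EXAMPLES:
            with self.subTest(raw=raw):
                self.assertEqual(slugify(raw), expected)


if __name__ == "__main__":
    unittest.main()
```

With this split, the developer owns the `EXAMPLES` table, which is where the business understanding lives, and the assistant fills in the repetitive test structure around it.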
Boilerplate and Scaffold
Creating new API endpoints, data models, or service files? AI is excellent at generating the boilerplate that follows your team's patterns. The key is exposing the AI to your existing codebase, through context or fine-tuning, so it generates code that matches your conventions.
Documentation Generation
The most underrated application. AI-generated docs tend to be accurate because they are derived from the code itself. Docstrings, API documentation, and README files all get maintained because the friction of generating them approaches zero.
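One low-friction way to start is to find what is undocumented and hand only those items to the assistant. The sketch below uses Python's standard `ast` module to list functions and classes without docstrings. The helper name `missing_docstrings` and the sample module are illustrative assumptions, and the generation step itself is left out.

```python
import ast


def missing_docstrings(source: str) -> list[str]:
    """Return sorted names of functions and classes in `source` that lack a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Hypothetical module contents.
SAMPLE = '''
class Invoice:
    """A billing record."""

    def total(self):
        return 0


def send(invoice):
    """Send the invoice."""


def archive(invoice):
    pass
'''

if __name__ == "__main__":
    print(missing_docstrings(SAMPLE))  # ['archive', 'total']
```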
What Doesn't Work (And Why)
Asking AI to Solve Problems It Doesn't Understand
AI coding assistants have seen more code than any human. But they don't understand your business logic, your domain constraints, or your customers' needs. When you ask AI to "build the checkout flow," it generates code based on patterns it has seen, which may not match your actual requirements.
What works: AI assists with implementation after humans define requirements. What doesn't work: AI defines requirements or architecture.
Assuming AI Understands Your Architecture
AI generates code that matches what it's seen in its training data. If your architecture has patterns that diverge from common practices, AI will generate code that doesn't fit. You need to:
- Provide context (project structure, existing patterns)
- Review AI output for architectural fit
- Train/fine-tune on your codebase if you want AI to match your patterns
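The first item, providing context, can be as simple as assembling a text block that travels with each request. This sketch assumes a conventions document named `CONVENTIONS.md` at the repository root and a Python codebase. The file name, the function `build_context`, and the output layout are all assumptions, so adapt them to your tooling.

```python
from pathlib import Path


def build_context(
    root: Path,
    conventions_file: str = "CONVENTIONS.md",  # assumed file name
    max_files: int = 50,
) -> str:
    """Assemble a plain-text context block: team conventions plus a file listing."""
    sections = []

    conventions = root / conventions_file
    if conventions.is_file():
        text = conventions.read_text(encoding="utf-8").strip()
        sections.append("## Team conventions\n" + text)

    # List Python sources, skipping anything under a hidden directory (.git, .venv, ...).
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.py")
        if not any(part.startswith(".") for part in p.relative_to(root).parts)
    )
    listing = files[:max_files]
    if len(files) > max_files:
        listing.append(f"... ({len(files) - max_files} more files omitted)")
    sections.append("## Project structure\n" + "\n".join(listing))

    return "\n\n".join(sections)
```

Capping the listing with `max_files` keeps the context small. The trade-off is that large repositories need a smarter selection step than a sorted prefix.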
Unsupervised AI in Critical Systems
We've seen teams try to have AI make autonomous changes to production systems without human review. This is a recipe for incidents. AI makes errors that are hard to catch: mishandled edge cases, subtle logic errors, integration failures. Any AI-assisted change to production code should go through human review.
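The rule in the last sentence can be enforced mechanically as a merge gate. Below is a minimal sketch under assumed names. `ChangeRequest` and its fields are hypothetical and not tied to any particular code-hosting API.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed change; field names are illustrative."""
    author: str
    ai_assisted: bool
    touches_production: bool
    human_approvers: set[str] = field(default_factory=set)


def can_merge(change: ChangeRequest) -> bool:
    """Block AI-assisted production changes that lack an independent human approval."""
    needs_human_review = change.ai_assisted and change.touches_production
    if not needs_human_review:
        return True  # other review policies (not modelled here) may still apply
    # The author approving their own change does not count as review.
    independent_approvers = change.human_approvers - {change.author}
    return bool(independent_approvers)
```

For example, `can_merge(ChangeRequest("dana", True, True, {"dana"}))` returns `False` because the only approval is the author's own.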
Implementation Playbook
Phase 1: Foundation (Weeks 1-4)
1. Choose your toolstack
├── GitHub Copilot: Good general-purpose assistance, integrates with VS Code
├── Cursor: Codebase indexing and project rules, good for teams with specific patterns
├── Custom agent: Maximum control, requires engineering investment
└── We recommend starting with Copilot or Cursor for most teams
2. Configure your environment
├── Set up context provisioning (project documentation, architecture docs)
├── Configure IDE integration
├── Establish clear guidelines on when to use AI vs. manual coding
└── Document team conventions and patterns for AI context
3. Baseline measurement
├── Track code review time before AI adoption
├── Track test coverage
├── Track time-to-deploy for different task types
└── Use these as comparison points for ROI measurement
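The baseline in step 3 could be captured along these lines. This sketch groups historical tasks by type and takes medians, which are less sensitive to a single slow task than means. The record fields and sample values are hypothetical. In practice they would come from your issue tracker and code review tool.

```python
from collections import defaultdict
from statistics import median


def baseline_by_task_type(records: list[dict]) -> dict[str, dict[str, float]]:
    """Summarize pre-adoption review time and time-to-deploy per task type."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[record["task_type"]].append(record)
    return {
        task_type: {
            "median_review_hours": median(r["review_hours"] for r in rows),
            "median_hours_to_deploy": median(r["hours_to_deploy"] for r in rows),
            "samples": len(rows),
        }
        for task_type, rows in grouped.items()
    }


# Hypothetical pre-adoption records.
RECORDS = [
    {"task_type": "api_endpoint", "review_hours": 4.0, "hours_to_deploy": 30.0},
    {"task_type": "api_endpoint", "review_hours": 6.0, "hours_to_deploy": 42.0},
    {"task_type": "api_endpoint", "review_hours": 5.0, "hours_to_deploy": 36.0},
    {"task_type": "bug_fix", "review_hours": 1.0, "hours_to_deploy": 8.0},
    {"task_type": "bug_fix", "review_hours": 3.0, "hours_to_deploy": 12.0},
]

if __name__ == "__main__":
    for task_type, stats in baseline_by_task_type(RECORDS).items():
        print(task_type, stats)
```

Keeping the sample count next to each median makes it obvious when a task type has too few data points to support a comparison.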
Phase 2: Gradual Rollout (Weeks 5-8)
1. Start with one team, one project type
└── Choose tasks with clear requirements and lower risk
2. Establish feedback loops
├── Weekly review of AI-generated code quality
├── Track which types of tasks benefit most
├── Collect developer experience and pain points
└── Iterate on guidelines based on learnings
3. Develop internal best practices
├── Create team guidelines for AI usage
├── Document which patterns work well
└── Build internal examples and templates
Phase 3: Scaled Adoption (Weeks 9-16)
1. Expand to more teams and project types
└── Adjust guidelines based on team-specific needs
2. Invest in custom tooling if justified
├── Fine-tuning on internal codebase
├── Custom prompts for team patterns
└── Integration with internal tooling (CI/CD, code review)
3. Measure and optimize
├── Quantify productivity improvements
└── Identify remaining friction points
The Organizational Dimension
Technology is the easy part. The organizational challenges are where implementations commonly fail:
Trust Issues
Experienced developers often resist AI assistance because they feel it threatens their expertise. The framing matters enormously:
- AI as a tool that handles tedious work, freeing developers for interesting problems
- AI as augmentation, not replacement—developers remain responsible and in control
- Focus on what developers gain (speed, reduced tedium) rather than what they lose
Knowledge Gaps
Junior developers can generate more code with AI. This creates a risk of junior developers working on tasks beyond their skill level. Mitigation: maintain strong code review regardless of AI assistance level, and ensure juniors have access to senior review.
Code Quality Culture
AI makes it easier to generate code—which can lead to more code that needs to be maintained. Teams need to maintain standards for code quality regardless of how the code was generated. This means: same code review standards, same test coverage requirements, same architectural reviews.
The Questions We Get Asked
"Should we be worried about AI introducing bugs?"
Yes, but less than you might think. In our deployments, AI-generated code has about the same defect rate as human-written first drafts: maybe slightly lower for well-defined tasks, slightly higher for complex logic. The key is maintaining human review standards for AI-generated code.
"What about code quality? Does AI-generated code meet our standards?"
It depends on how much the AI knows about your codebase. AI trained on your codebase generates code that matches your conventions. AI not trained on your codebase generates code that follows generic patterns. Start with strong context provision, then invest in fine-tuning if needed.
"Which teams should adopt AI assistance first?"
Teams with:
- High volume of repetitive coding tasks
- Strong code review culture (this ensures quality anyway)
- Clear patterns and conventions (easy to teach AI)
- Leadership buy-in for measured experimentation
"How do we measure ROI?"
Measure:
- Time-to-first-commit for specific task types
- Code review cycle count and time
- Test coverage percentage
- Developer satisfaction surveys
- Defect rate (before/after)
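Once the metrics above are collected before and after adoption, the comparison itself is simple arithmetic. The sketch below reports the percent change per metric. The metric names and numbers are hypothetical, and whether a negative change is good depends on the metric (fewer review cycles is good, lower coverage is not).

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


def compare(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Percent change for every metric present in both snapshots, rounded to 0.1."""
    return {
        metric: round(percent_change(baseline[metric], current[metric]), 1)
        for metric in baseline
        if metric in current
    }


# Hypothetical snapshots for one team and one task type.
BASELINE = {"review_cycles": 2.0, "test_coverage_pct": 60.0, "defects_per_kloc": 1.0}
CURRENT = {"review_cycles": 1.5, "test_coverage_pct": 69.0, "defects_per_kloc": 1.0}

if __name__ == "__main__":
    print(compare(BASELINE, CURRENT))
```

Comparing like with like matters more than the formula. Use the same task types and the same measurement window on both sides, or the change will mostly reflect a shift in workload.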
Bottom Line
AI coding assistants work when they're deployed thoughtfully—on the right task types, with proper human oversight, and in an organizational context that supports their adoption. They amplify what developers can do; they don't replace developer judgment.
The teams that will get the most value aren't the ones rushing to adopt every AI feature. They're the ones systematically identifying which tasks benefit from AI assistance, building the infrastructure to support it, and maintaining quality standards regardless of how code was generated.
If you're thinking about deploying AI coding assistance across your engineering team and want a pragmatic assessment of what's possible, talk to our engineering team. We've helped dozens of teams deploy these systems at scale.