How I learned to stop worrying and supervise the robot

After spending the last year integrating AI coding assistants into my workflow, I've come to a sobering realization: AI is absolutely a game-changer for software development, but it's not "there" yet. Not even close. And the journey to understanding its limitations cost me some late nights and a few production headaches.
The Honeymoon Phase: 50% Feels Like Superpowers
When I first started using Claude and GPT-4 for coding, it felt like I'd unlocked cheat codes for software development. Need a React component? Done in seconds. Want to parse some JSON and transform it? Here's your function. That first 50% of any project was blazing fast.
I remember thinking, "This is it. This changes everything." And I wasn't wrong—it does change everything. Just not in the way I expected.
Reality Check: The Polars Pipeline Incident
Let me tell you about the moment reality hit. I was building a massive data pipeline to process human behavior data for ML training. We're talking tens of thousands of lines of Polars code, processing web interaction data at scale.
The AI helped me build it quickly. It generated clean-looking code, complete with tests. Everything worked beautifully on my sampled development data. Ship it, right?
Wrong.
When we threw real production data at it—just a couple of gigabytes—the pipeline slowed to a crawl. What took minutes in dev was projected to take days in production.
I spent the next 48 hours manually reading through every line of that AI-generated code. The culprits? Everywhere I looked:
```python
# What the AI generated
results = []
for row in df.iter_rows():
    processed = complex_transformation(row)
    results.append(processed)
final_df = pl.DataFrame(results)

# What it should have been
final_df = df.select([
    pl.col("column").map_elements(complex_transformation)
])
```
The AI consistently favored imperative for-loops over native Polars operations. It would construct DataFrames row by row instead of using vectorized operations. Complex joins were implemented as nested loops instead of using Polars' optimized join operations.
On small data? Unnoticeable. On production scale? Orders of magnitude slower.
The Pattern Emerges
This wasn't an isolated incident. As I paid closer attention, I noticed patterns in AI-generated code that would bite me later:
Over-Engineering in Infrastructure
The Terraform configurations it generated were technically correct but absurdly complex. Resources that could be managed with 20 lines were expanded into 200-line monuments to abstraction. Every possible edge case was handled, even ones that would never occur in our environment.
Tests That Test Nothing
```python
def test_user_service():
    service = UserService()
    assert service is not None
    assert isinstance(service, UserService)
    # Thanks, AI. Very helpful.
```
The AI loved generating tests—lots of them. But many were redundant, testing implementation details rather than behavior, or worse, testing that Python's type system works as expected.
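For contrast, this is the kind of test I actually want: one that pins down observable behavior. `UserService` here is a hypothetical stand-in, not our real service—the point is what the assertions exercise:

```python
class UserService:
    """Hypothetical stand-in, just to make the test below concrete."""

    def __init__(self):
        self._users: dict[int, dict] = {}

    def create_user(self, email: str) -> int:
        if any(u["email"] == email for u in self._users.values()):
            raise ValueError("duplicate email")
        user_id = len(self._users) + 1
        self._users[user_id] = {"email": email}
        return user_id

    def get_user(self, user_id: int) -> dict:
        return self._users[user_id]


def test_created_user_is_retrievable_and_duplicates_rejected():
    service = UserService()
    user_id = service.create_user("a@example.com")
    # Behavior: what we stored is what we get back.
    assert service.get_user(user_id)["email"] == "a@example.com"
    # Behavior: duplicate emails are rejected, not silently overwritten.
    try:
        service.create_user("a@example.com")
        assert False, "expected ValueError for duplicate email"
    except ValueError:
        pass
```

If the service's internals get rewritten tomorrow, this test still means something. The `isinstance` test above means nothing on any day.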
Ignoring Established Patterns
The most frustrating part was watching AI consistently ignore established patterns in our codebase:
- Creating new database connections instead of using our connection pool
- Implementing custom locking mechanisms when we had a battle-tested distributed lock service
- Writing synchronous code for our async task workers
All of these "worked" in development. All of them would have been disasters in production.
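The connection-pool case is representative. Here's a sketch of the mismatch, with a toy SQLite pool standing in for the battle-tested one we actually use (all names here are illustrative):

```python
import sqlite3
from contextlib import contextmanager
from queue import Queue

# Toy pool for illustration; the real codebase used a proper pooling library.
class ConnectionPool:
    def __init__(self, db_path: str, size: int = 4):
        self._pool = Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get()   # borrow an existing connection
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return it to the pool; never close it

# The pattern the AI kept producing: a brand-new connection per call.
def ping_naive(db_path: str) -> int:
    conn = sqlite3.connect(db_path)  # fresh connection on every single call
    try:
        return conn.execute("SELECT 1").fetchone()[0]
    finally:
        conn.close()

# The pattern the codebase expected: borrow from the shared pool.
def ping_pooled(pool: ConnectionPool) -> int:
    with pool.connection() as conn:
        return conn.execute("SELECT 1").fetchone()[0]
```

Both functions return the same answer on a developer laptop. Only under production concurrency does the naive version start exhausting file handles and connection limits.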
The Real Bottleneck Was Never Writing Code
Here's the thing that took me too long to realize: writing code was never our bottleneck. Reading, understanding, and maintaining code—that's where we spend most of our time. And AI doesn't help with that. In fact, it often makes it worse.
When you can generate 1,000 lines of code in minutes, you now have 1,000 lines of code to:
- Review for correctness
- Understand deeply enough to modify
- Debug when things go wrong
- Maintain for the next five years
The real challenges in software engineering live upstream (design, architecture) and downstream (integration, operations, maintenance). AI doesn't solve these—and sometimes actively makes them harder.
New Costs, Hidden Complexity
Using AI effectively introduced costs I didn't anticipate:
- Prompt engineering time: Getting the AI to generate the right code often took multiple iterations
- Hallucination management: Verifying that generated code doesn't use non-existent APIs or methods
- Context window juggling: Keeping relevant context in the conversation without hitting token limits
- Additional review overhead: Every piece of AI-generated code needs extra scrutiny
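For the hallucination problem, even a crude static check catches the worst offenders before human review. This is an illustrative sketch, not a tool we shipped: it parses generated code with Python's `ast` module and flags direct calls to attributes that a known module doesn't actually have:

```python
import ast
import math

def undefined_attribute_calls(source: str, module) -> list[str]:
    """Flag `module.attr(...)` calls where `attr` doesn't exist on `module`.

    A crude smoke test, not a type checker: it only catches direct
    attribute calls on one known module, which is where hallucinated
    APIs most often showed up for me.
    """
    mod_name = module.__name__
    suspicious = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == mod_name
            and not hasattr(module, node.func.attr)
        ):
            suspicious.append(node.func.attr)
    return suspicious

generated = "x = math.sqrt(4)\ny = math.fast_inverse_sqrt(4)"  # second call is hallucinated
print(undefined_attribute_calls(generated, math))  # → ['fast_inverse_sqrt']
```

It won't catch wrong signatures or subtly misused APIs—only a real type checker or test suite does that—but it turns one class of hallucination into a cheap automated check instead of a review-time surprise.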
And there's the overproduction risk. When you can generate code 10x faster, you might just generate 10x more code—not 10x more value. More code to verify, more to debug, more to eventually delete when you realize it wasn't needed.
Treating AI Like a Supervised Junior Engineer
After all these hard lessons, I've found a mental model that works: treat AI like a talented but inexperienced junior engineer who needs constant supervision.
Just like with a junior:
- You need to be extremely specific in your instructions
- You have to review everything they produce
- You can't assume they understand the broader context
- They might technically solve the problem while missing the point entirely
I've tried all the techniques: memory banks, structured prompts, having the AI plan in markdown before coding, carefully crafted system prompts. They help, but they don't eliminate the fundamental need for supervision.
You still need to explicitly tell it:
- "Make this more modular"
- "This function is too long, break it up"
- "This should be multiple files"
- "You used a class here, but a simple function would suffice"
- "Use the existing patterns in our codebase"
The Future Is Still Bright
Despite all of this, I'm still incredibly excited about where AI coding assistants are headed. They've already transformed how I work—I just had to adjust my expectations and workflow.
The key insights:
- AI excels at the "first draft" but struggles with the "final polish"
- It's a powerful tool for exploration and prototyping
- Human expertise becomes more valuable, not less—you need to know what good looks like
- The productivity gains are real, but they come from acceleration, not automation
As models improve and tools get better at understanding context and constraints, many of these issues will fade. But for now, we're in the "powerful but flawed" phase of AI coding assistants.
Use them. Embrace them. But keep your eyes open and your code review skills sharp. The robots aren't taking our jobs yet—they're just making our jobs more interesting.