The Person Writing This Uses AI Every Day

We need to get something out of the way: this is not an anti-AI post. The person writing this is VP of Product at an AI company and runs a development studio that ships production code with AI tools daily. We are not here to tell you AI is bad. We are here because the conversation about AI code generation has exactly two camps right now, and both of them are wrong.

Camp one says AI will replace developers within 18 months. Camp one has been saying this for three years. Camp two says AI-generated code is unusable garbage that no serious engineer would touch. Camp two has not updated their opinion since GPT-3. Both camps are extremely loud on LinkedIn, which is how you know neither of them is shipping production code.

The truth is boring by comparison: AI is a genuinely useful tool that is also confidently wrong about 30% of the time, and that 30% will absolutely ship to production if you are not paying attention. We know this because we have caught it. Multiple times. In our own code. On a Tuesday. Before lunch.

So here is what using AI in production actually looks like. Not the LinkedIn version where every developer is suddenly 10x more productive and the future of software is a guy typing prompts into a terminal while a line graph goes up. The real version, where AI is a fast, opinionated junior engineer who has read the entire internet and retained almost none of the nuance.

What AI Is Genuinely Good At

We will start with the good news because the good news is genuinely good. AI code generation saves us real time on a specific category of work, and that category is larger than most skeptics want to admit.

Boilerplate and scaffolding. Need a new React component with TypeScript props, a test file, and a Storybook story? AI writes that in about 4 seconds. It used to take us 10 minutes of copying from the last component and doing a find-and-replace on the name while pretending that counted as engineering. This is where AI shines brightest: the work that is not difficult but is tedious enough that your brain quietly leaves the room halfway through. AI does not have a brain to leave the room. Advantage: AI.
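
To make "boilerplate" concrete, here is a sketch of the kind of scaffold this replaces. The component name, file layout, and conventions are invented for illustration, not a real generator we ship:

```typescript
// A minimal sketch of the scaffolding work AI now stamps out for us:
// given a component name, produce the component, test, and story files.
// Everything here (names, layout, conventions) is illustrative.
function scaffoldComponent(name: string): Record<string, string> {
  return {
    [`${name}.tsx`]: [
      `interface ${name}Props {`,
      `  children?: React.ReactNode;`,
      `}`,
      ``,
      `export function ${name}({ children }: ${name}Props) {`,
      `  return <div>{children}</div>;`,
      `}`,
    ].join("\n"),
    [`${name}.test.tsx`]: `describe("${name}", () => { it.todo("renders"); });`,
    [`${name}.stories.tsx`]: `export default { title: "${name}" };`,
  };
}

const files = scaffoldComponent("UserCard");
// files["UserCard.tsx"] starts with the props interface, per the conventions above
```

Nothing in there is hard. All of it is tedious, which is exactly the point.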

Test scaffolding. Describe the function, get a test suite. The tests will need editing. Some of the assertions will be confidently, hilariously wrong. But the structure, the describe blocks, the setup and teardown, all of that is correct about 80% of the time. Writing tests went from "the thing we do after the feature is done, if we have time, which we never do, so we skip it and then lie about it in standup" to "the thing that takes 5 minutes instead of 30." That is a meaningful workflow change.

Regex. Nobody enjoys writing regex. Some people claim to. Those people also claim to enjoy reading YAML by hand, and we do not trust them either. AI writes regex that works on the first try more often than we do, and we have been writing regex for over a decade. This is not a compliment to AI. This is a public admission about regex.
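The kind of regex we mean, sketched with an invented example (ISO date extraction with named groups, the sort of thing that takes a human three tries):

```typescript
// Extract an ISO-8601 date with named capture groups.
// The pattern and helper are illustrative, not from our codebase.
const ISO_DATE = /\b(?<year>\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\d|3[01])\b/;

function extractDate(
  text: string
): { year: string; month: string; day: string } | null {
  const match = ISO_DATE.exec(text);
  return match?.groups
    ? (match.groups as { year: string; month: string; day: string })
    : null;
}

extractDate("Deployed on 2024-03-15 at noon");
// → { year: "2024", month: "03", day: "15" }
```

Note the month and day alternations that reject "2024-13-45" outright. That is the part a human writing it at 4 PM forgets.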

Data transformations. "Take this JSON shape and map it to this other JSON shape." Perfect AI task. Deterministic input, deterministic output, clear success criteria. The kind of work where being fast matters more than being thoughtful. Give it a source and a target and get out of the way.
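A minimal sketch of the shape-to-shape task, with both shapes invented for illustration:

```typescript
// Source shape: what a hypothetical API returns.
interface ApiUser {
  user_id: number;
  first_name: string;
  last_name: string;
  email_address: string;
}

// Target shape: what the app wants.
interface AppUser {
  id: number;
  fullName: string;
  email: string;
}

// Deterministic input, deterministic output, clear success criteria.
function toAppUser(raw: ApiUser): AppUser {
  return {
    id: raw.user_id,
    fullName: `${raw.first_name} ${raw.last_name}`.trim(),
    email: raw.email_address,
  };
}

toAppUser({
  user_id: 7,
  first_name: "Ada",
  last_name: "Lovelace",
  email_address: "ada@example.com",
});
// → { id: 7, fullName: "Ada Lovelace", email: "ada@example.com" }
```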

Documentation. AI writes excellent first-draft documentation. Not because it understands your code deeply, but because it is very good at describing what code does in plain English. You still need to review it. But "review and edit documentation" is a much better Tuesday than "write documentation from scratch while staring out the window and contemplating career alternatives."

What AI Is Confidently Wrong About

Here is where things get interesting, and by interesting we mean "the kind of interesting that costs you a sprint."

AI code generation has a specific failure mode that is different from human error. When a human developer writes bad code, they usually know it is bad. They leave a TODO comment. They mention it in the PR description. They say "this works but I hate it" in standup. Humans have doubt. Doubt is, it turns out, an incredibly useful feature.

AI has no doubt. It will hand you a component with a subtle state management bug, three accessibility violations, and a dependency on a package that was deprecated in 2022, and it will do this with the quiet confidence of a senior architect who has been at the company for nine years. It has been at the company for zero years. It has been predicting the next token. Those are not the same thing, but the output looks identical right up until the moment it does not.

Complex state management. AI-generated state logic works in simple cases and falls apart the moment your state has dependencies, side effects, or race conditions. We asked an AI tool to build a multi-step form with conditional validation and dependent field visibility. It produced code that looked clean, passed a visual review, and had a bug where going back to step 2 and changing a field did not invalidate the dependent fields in step 4. We found this bug because an actual human used the form in an actual way. The AI did not find it because AI does not use forms. It does not use anything. It has never filled out a form in its life, and honestly it could not even if it wanted to.
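For the curious, here is a sketch of the invalidation logic the generated form was missing: when a field changes, everything downstream of it in the dependency graph has to reset. The field names and dependency map are ours, invented to illustrate the shape of the fix, not the actual incident code:

```typescript
type FormValues = Record<string, string | undefined>;

// Each key depends on the fields listed in its value (illustrative).
const DEPENDS_ON: Record<string, string[]> = {
  state: ["country"],
  city: ["state"],
  shippingRate: ["city"],
};

// Setting a field clears every dependent field, transitively.
// This is the step the AI-generated form skipped.
function updateField(values: FormValues, field: string, value: string): FormValues {
  const next: FormValues = { ...values, [field]: value };
  const changed = [field];
  while (changed.length > 0) {
    const current = changed.pop()!;
    for (const [dependent, deps] of Object.entries(DEPENDS_ON)) {
      if (deps.includes(current) && next[dependent] !== undefined) {
        next[dependent] = undefined;
        changed.push(dependent);
      }
    }
  }
  return next;
}

const filled = { country: "US", state: "OR", city: "Portland", shippingRate: "standard" };
updateField(filled, "country", "CA");
// → state, city, and shippingRate are all reset to undefined
```

The happy-path version, where you fill the form top to bottom and never go back, works without any of this. Which is exactly why the bug survived a visual review.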

Accessibility. This is our area, so we notice it more than most. AI-generated components are almost never accessible out of the box. It will add aria-label to everything because that is the pattern it has seen most often. It will use a <div> with onClick instead of a <button> because approximately 40% of the internet does this and the AI learned from the internet. We wrote an entire post about this pattern because it is that common. AI did not create this problem, but it is very efficient at reproducing it at scale.

Architectural decisions. "Should this be a server component or a client component?" "Should this state live in the URL or in React state?" "Should this API call happen at the page level or the component level?" These are judgment calls that depend on your specific application, your performance requirements, your user patterns. AI will give you an answer immediately, with zero hesitation, as if it has thought about this deeply. It has thought about this for zero seconds. It does not think about things. It selects the most probable response. Sometimes probable and correct are the same. Sometimes you end up refactoring an entire page because the AI confidently chose the wrong rendering strategy and you trusted it because the code was clean. We are not bitter about a specific incident here. Yes we are.

Security-sensitive code. Authentication flows, input sanitization, database queries, anything that touches user data. We never ship AI-generated code in these areas without line-by-line review. Last month an AI tool generated a login flow that looked perfect, worked correctly, and stored the session token in localStorage. If you just felt your eye twitch, you are a security-conscious developer and we appreciate you. If you did not, please go read about XSS attacks and then come back. We will wait.
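For readers whose eye did not twitch: a token in localStorage is readable by any script running on the page, so one XSS hole becomes account takeover. The fix is a server-set HttpOnly cookie that page JavaScript cannot read. A sketch, with illustrative cookie name and lifetime:

```typescript
// What the AI wrote (do not do this):
// localStorage.setItem("session", token); // readable by any injected script

// What review changed it to: the server sets an HttpOnly cookie instead.
// This helper just builds the Set-Cookie header value; the name and
// max-age are illustrative.
function sessionCookie(token: string, maxAgeSeconds = 60 * 60 * 8): string {
  return [
    `session=${encodeURIComponent(token)}`,
    `Max-Age=${maxAgeSeconds}`,
    "Path=/",
    "HttpOnly",   // invisible to document.cookie, so XSS cannot exfiltrate it
    "Secure",     // HTTPS only
    "SameSite=Lax", // basic CSRF mitigation
  ].join("; ");
}

sessionCookie("abc123");
// → "session=abc123; Max-Age=28800; Path=/; HttpOnly; Secure; SameSite=Lax"
```

The generated login flow was not wrong in a way a test would catch. It worked. That is the problem.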

How to Prompt for Actually Useful Output

Most "AI coding tips" articles tell you to "be specific in your prompts." That is technically true in the same way that "score more points" is technically valid basketball advice. Here is what being specific actually looks like when you are trying to get code you will not immediately throw away.

Give it your conventions. "Write a React component" gets you a generic React component that looks like it was generated in 2021. "Write a React component using our pattern: named exports, props interface above the component, useCallback for event handlers, cn() for className merging" gets you something that might actually survive a PR review. The 30 seconds you spend describing your conventions saves 10 minutes of reformatting output that technically works but matches nobody's codebase, including yours.

Show it an example. Paste in a component that follows your patterns and say "write a new component that does X, following the same patterns as this one." AI is excellent at pattern matching. That is literally all it does. Give it a pattern to match. We keep a few "reference components" specifically for this purpose. It feels silly, like bringing a cheat sheet to a test. It works extremely well. We have made our peace with it.

Scope it small. "Build me a dashboard" is a prompt that will produce a dashboard-shaped object that does not actually work. "Write a function that takes an array of transactions and returns them grouped by month with running totals" is a prompt that will produce a function that works. The smaller and more specific the task, the higher the success rate. We treat AI the way we would treat a new hire on their first week: small, well-defined tasks with clear acceptance criteria and a code review before anything gets merged.
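That transactions prompt is specific enough that the output is predictable. Here is roughly the function it produces, with an assumed transaction shape (ISO date string plus amount) since the prompt does not pin one down:

```typescript
interface Transaction {
  date: string;   // ISO date, e.g. "2024-03-15"
  amount: number;
}

interface MonthGroup {
  month: string;            // "YYYY-MM"
  transactions: Transaction[];
  total: number;            // total for this month
  runningTotal: number;     // cumulative total through this month
}

function groupByMonth(transactions: Transaction[]): MonthGroup[] {
  // Sort by date so months come out in order and running totals make sense.
  const byMonth = new Map<string, Transaction[]>();
  for (const tx of [...transactions].sort((a, b) => a.date.localeCompare(b.date))) {
    const month = tx.date.slice(0, 7);
    byMonth.set(month, [...(byMonth.get(month) ?? []), tx]);
  }
  let runningTotal = 0;
  return [...byMonth.entries()].map(([month, txs]) => {
    const total = txs.reduce((sum, tx) => sum + tx.amount, 0);
    runningTotal += total;
    return { month, transactions: txs, total, runningTotal };
  });
}

groupByMonth([
  { date: "2024-01-05", amount: 100 },
  { date: "2024-02-10", amount: 50 },
  { date: "2024-01-20", amount: 25 },
]);
// → Jan: total 125, runningTotal 125; Feb: total 50, runningTotal 175
```

Clear input, clear output, clear acceptance criteria. "Build me a dashboard" has none of those.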

Tell it what not to do. "Do not use any external dependencies." "Do not add error handling, I will add that myself." "Do not use any deprecated React patterns like defaultProps." Negative constraints are often more useful than positive instructions, because AI's default behavior is to include everything it has ever seen. And it has seen a lot of bad code. It has seen all of the bad code. The internet is mostly bad code. That is not a hot take. That is a census.

Reviewing AI Code: A Different Kind of Code Review

Reviewing AI-generated code is not the same as reviewing human-written code, and if you are using the same review process for both, you are missing things. Guaranteed.

Human code reviews focus on logic, architecture, and style. You trust that the developer understood the problem, made intentional choices, and can explain their reasoning if asked. None of that applies to AI output. The code might look intentional. It is not. It is statistically probable. Those are different things, and the difference matters at 2 AM when something breaks and you are trying to figure out why a function works the way it does. The answer is "no reason." It is just the most likely sequence of tokens. Enjoy debugging that.

Read it like you did not write it. Because you did not. The most dangerous mistake we see is engineers treating AI output as "their code that they wrote with help." It is not your code. It is a stranger's code that appeared in your editor uninvited. Review it with the same skepticism you would apply to a Stack Overflow answer from 2017 with three upvotes.

Check the imports. AI will import packages that do not exist. We realize this sounds like something that should not be possible. And yet. We have seen AI confidently import from @react-aria/utils (real package, wrong import path), lodash/deepMerge (not a thing), and our personal favorite, a package called react-accessible-dropdown that exists absolutely nowhere on npm. The AI made it up. It invented a fake dependency, imported it, and used it in three files. It was not embarrassed about this. It is never embarrassed. Embarrassment requires self-awareness, which is currently not on the roadmap.

Test the unhappy paths. AI-generated code almost always works for the happy path. That is the path it was trained on, because tutorials and blog posts show the happy path. What happens when the API returns a 500? What happens when the array is empty? What happens when the user double-clicks the submit button because they are impatient? Test those cases specifically, because the AI definitely did not consider them. It does not consider things. We cannot say this enough, and every time we say it we mean it more.
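The unhappy paths are cheap to handle once you remember they exist. A sketch of the shape that handling takes, with an invented response format, since tutorials rarely show this part:

```typescript
interface OrdersResult {
  ok: boolean;
  orders: string[];
  error?: string;
}

// Handle the paths the AI did not consider: server errors, unexpected
// statuses, malformed bodies. An empty array is a valid, boring success.
function parseOrdersResponse(status: number, body: string): OrdersResult {
  if (status >= 500) {
    return { ok: false, orders: [], error: `server error ${status}` };
  }
  if (status !== 200) {
    return { ok: false, orders: [], error: `unexpected status ${status}` };
  }
  try {
    const parsed = JSON.parse(body);
    if (!Array.isArray(parsed)) {
      return { ok: false, orders: [], error: "expected an array" };
    }
    return { ok: true, orders: parsed };
  } catch {
    return { ok: false, orders: [], error: "malformed JSON" };
  }
}

parseOrdersResponse(500, "");         // → ok: false, error: "server error 500"
parseOrdersResponse(200, "[]");       // → ok: true, orders: []
parseOrdersResponse(200, "not json"); // → ok: false, error: "malformed JSON"
```

None of this is clever. All of it is what production means.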

Watch for hallucinated patterns. AI will sometimes combine patterns from different frameworks, different versions, or different paradigms into something that looks coherent but is quietly insane. A React component using Angular-style dependency injection. A Next.js App Router component using Pages Router conventions. TypeScript types that are syntactically valid, semantically meaningless, and would pass a linter but confuse every developer who reads them for the next two years. These are not bugs. They are dreams. The AI is dreaming about code it has seen, and sometimes the dreams cross the streams.

Where We Draw the Line

Every custom web application we build uses AI tools somewhere in the process. But "somewhere in the process" is doing a tremendous amount of heavy lifting in that sentence. Here is what it actually means.

AI writes, we review: Boilerplate, test scaffolding, utility functions, data transformations, documentation drafts. Low stakes, high time savings. The engineering equivalent of having someone else do the dishes.

We write, AI assists: Business logic, component architecture, API design. We make the decisions and write the core logic. AI might suggest alternatives or fill in details. It is helpful in the way that talking to someone while you work is helpful. The thinking is ours.

We write, AI does not touch: Authentication and authorization. Database migrations. Accessibility patterns. Security-sensitive validation. Anything where "almost right" is functionally identical to "completely wrong." We mentioned the localStorage session token incident. We will probably mention it in every conversation about AI for the rest of our careers. Some mistakes leave a mark.

The Real Productivity Gain

If you have read this far, you might be wondering: is AI actually worth it? Is the productivity gain real, or is it just vibes and venture capital?

It is real. It is just not 10x.

Our honest assessment after using AI code generation tools daily for over two years: we are about 20-30% more productive on the tasks where AI is useful, and those tasks make up roughly 40% of our total work. If you do that math (and we have, multiple times, because we kept hoping the number would be bigger), the overall productivity gain is somewhere around 10-15%. Not nothing. Not the revolution that LinkedIn promised. Not "fire half the team and let the robots handle it." More like "we spend less time on the boring parts and more time on the parts that actually require a human being to think."

That is the real story. AI did not make us 10x engineers. It made us engineers who waste less time writing boilerplate and more time on architecture, accessibility, performance, and the decisions that actually determine whether a product is good or just functional. The hard parts are still hard. The easy parts got faster. That is a genuine improvement, and it does not need to be inflated to 10x to be worth talking about. The people inflating it to 10x are usually selling something. Check their bio. There is always a link.

The engineers who benefit most from AI tools are the ones who were already good engineers. They know what good code looks like, so they can tell when AI output misses the mark. They know what questions to ask, so they prompt effectively. They know where the risks are, so they review the right things carefully. AI makes good engineers faster. It does not make bad engineers good. Nobody wants to say that out loud because it is not great marketing for AI tools, but it is the most true thing in this entire post.

We will keep using AI. It will keep getting better. And "better" will keep meaning "requires slightly less babysitting," which is genuine progress, just not the kind that gets keynote slots at tech conferences. The gap between "fewer issues" and "zero issues" is where the bugs live, and we plan to keep reviewing every line until that gap actually closes. Based on current trajectory, we have time.