Five Things AI Coding Agents Need from Your Repository
You have tried AI coding agents. Maybe Claude Code, maybe Copilot, maybe Cursor or Windsurf. The demos looked incredible. An agent writes a feature, opens a PR, handles the tests. Then you pointed it at your actual codebase and the results were... fine. Sometimes good, sometimes baffling, often requiring so many corrections that you wonder if you should have just written the code yourself.
The problem is rarely the model. It is almost always the repository.
AI coding agents are not magic. They explore your codebase the same way a new hire would: reading files, looking for patterns, trying to understand how things fit together. When your repo makes that easy, agents produce good work. When it does not, they guess. And they guess confidently.
Here are five concrete things you can do to get dramatically better results from AI coding agents. None of them require changing your architecture or adopting new tools. Most take less than an hour.
1. Clear build and test commands
An AI agent needs to verify its own work. When it writes code, it needs to build the project and run the tests to check whether anything broke. If your build process lives in someone's head, or requires three manual steps that are not written down anywhere, the agent will fail at the verification step and either ship untested code or waste time guessing at commands.
Why this matters from the agent's perspective: Agents operate in a loop. They make a change, then try to validate it. If they cannot find a build command, they have to guess (`npm run build`? `make`? `yarn build`? `go build ./...`?). Sometimes they guess right. Often they do not, and then they spend several tool calls trying variations before either succeeding or giving up.
Before:
Your team knows that the build requires installing a specific Python version, running a database migration, and then using a custom script. None of this is documented. The agent opens your repo and sees a `package.json` with no build script and a `Makefile` with targets that assume environment variables are already set.
After:
```makefile
# Makefile
.PHONY: setup build test lint db-migrate

setup: ## First-time setup
	python3 -m venv .venv
	. .venv/bin/activate && pip install -r requirements.txt
	cp .env.example .env
	make db-migrate

build: ## Build the project
	. .venv/bin/activate && python -m build

test: ## Run all tests
	. .venv/bin/activate && pytest tests/ -v

lint: ## Run linters
	. .venv/bin/activate && ruff check . && mypy src/

db-migrate: ## Run database migrations
	. .venv/bin/activate && alembic upgrade head
```
Or if you are using Claude Code specifically, a `CLAUDE.md` at the project root:
```markdown
# Build & Test
- Build: `npm run build`
- Test: `npm test`
- Lint: `npm run lint`
- Single test file: `npx jest path/to/file.test.ts`
- DB required: Run `docker compose up -d postgres` first
```
The key point: the agent should be able to go from zero to "tests passing" without any human intervention. Put the commands where they are easy to find.
2. A map of your architecture
AI agents read code file by file. They can see that `src/api/users.ts` exports a function called `createUser`, but they cannot automatically infer your team's architectural decisions. They do not know that business logic belongs in `src/services/`, that `src/api/` is only for HTTP-layer concerns, or that `src/repositories/` is the only place that should contain database queries.
Without this context, agents put code in plausible but wrong locations. They add database queries directly in route handlers. They put validation logic in the data layer. The code works, but it violates your patterns, and you end up rewriting it anyway.
Why this matters from the agent's perspective: When an agent receives a task like "add an endpoint to deactivate a user account," it needs to decide where to put the route, where to put the business logic, and where to put the database query. With no architectural guidance, it scans existing files for patterns. If your codebase is inconsistent (and most are), it picks up on the wrong pattern or splits the difference in a way nobody would choose.
Before:
The agent looks at your `src/` directory and sees 200 files. It finds a few route handlers that query the database directly and a few that use a service layer. It picks the pattern that seems most common, which happens to be the one your team has been trying to move away from.
After:
Add a short architecture doc. This does not have to be long. Five to ten lines can transform the agent's output.
```markdown
# Architecture

## Directory structure
- `src/api/` - Express route handlers. Thin layer: parse request, call service, send response.
- `src/services/` - Business logic. All domain rules live here.
- `src/repositories/` - Database access. Only place that imports from `drizzle-orm`.
- `src/models/` - TypeScript types and Zod schemas for validation.
- `src/middleware/` - Express middleware (auth, logging, error handling).

## Rules
- Route handlers must not import from `repositories/` directly.
- All database queries go through the repository layer.
- Services should not know about HTTP concepts (no req/res objects).
```
This is the kind of information that a senior developer absorbs over weeks of working in a codebase. Giving it to the agent upfront saves multiple rounds of revision.
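To make the rules concrete, here is one minimal sketch of what that layering implies in code. All names (`UserRepository`, `UserService`, `deactivateHandler`) are invented for illustration, and the request/response shapes are simplified stand-ins for real Express types:

```typescript
// src/repositories/user-repository.ts - the only layer that touches the database
interface UserRepository {
  setDeactivated(id: string, when: Date): Promise<void>;
}

// src/services/user-service.ts - business rules, no HTTP concepts
class UserService {
  constructor(private repo: UserRepository) {}

  async deactivateUser(id: string): Promise<void> {
    if (!id) throw new Error("user id is required");
    await this.repo.setDeactivated(id, new Date());
  }
}

// src/api/users.ts - thin handler: parse request, call service, send response
async function deactivateHandler(
  req: { params: { id: string } },
  res: { json: (body: object) => void },
  service: UserService
): Promise<void> {
  await service.deactivateUser(req.params.id);
  res.json({ data: { id: req.params.id, deactivated: true }, error: null });
}
```

An agent that has read the architecture doc can place each of these pieces in the right directory on the first try, instead of inlining the query in the handler.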
3. A working CI pipeline
This is the single biggest predictor of whether an AI agent's PR will be successful.
When an agent opens a PR and CI runs, the agent gets immediate, objective feedback. Tests fail? The agent can read the error output and fix it. Linting violations? Same thing. Type errors? Caught before anyone reviews the PR.
Without CI, the agent submits code and has no way to verify it works in the full context of the project. You become the CI pipeline. You pull the branch, run the tests manually, find the failures, and either fix them yourself or tell the agent what went wrong.
Why this matters from the agent's perspective: Many AI agent workflows are designed to iterate on CI feedback. The agent pushes a commit, waits for CI, reads the results, and pushes a fix if something failed. This loop can resolve issues that the agent could not catch locally (integration tests, environment-specific behavior, dependency conflicts). Without CI, this feedback loop does not exist.
Before:
Your repo has tests, but they only run when someone remembers to run them locally. The agent opens a PR that passes its local test run but breaks an integration test that requires a database connection, which only CI would have caught.
After:
```yaml
# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: app_test
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/app_test
```
Your CI pipeline does not need to be complex. Run the linter, run the type checker, run the tests. That covers the vast majority of issues an agent might introduce. If you already have CI but it only runs on pushes to main, change it to run on pull requests. That one change makes AI agents significantly more effective.
4. Documented coding conventions
Every team has conventions. Some are enforced by tooling (ESLint, Prettier, Ruff) and some live in people's heads. AI agents handle the first kind automatically. They pick up your `.eslintrc` and produce code that matches your formatting. The second kind is where things go wrong.
Formatting is the easy part. The harder conventions are architectural and stylistic decisions that your linter does not enforce:
- "We use functional components with hooks, never class components"
- "Error handling happens at the controller level, not in services"
- "We use named exports, not default exports"
- "All API responses follow the
{ data, error, meta }envelope format" - "We never use
anyin TypeScript, useunknownand narrow"
These are the rules that make the difference between code that passes review on the first try and code that gets sent back with fifteen comments.
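As a concrete instance of the last rule, here is a hedged sketch of narrowing an `unknown` request body instead of casting it to `any`. The `{ email }` shape and the `parseEmail` name are invented for the example:

```typescript
// Accept an unvalidated body as `unknown` and narrow it step by step,
// rather than reaching for `req.body as any`.
function parseEmail(body: unknown): string {
  if (
    typeof body === "object" &&
    body !== null &&
    "email" in body &&
    typeof (body as { email: unknown }).email === "string"
  ) {
    return (body as { email: string }).email;
  }
  throw new Error("invalid body: expected { email: string }");
}
```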
Why this matters from the agent's perspective: An agent infers conventions from the code it reads. If your codebase has a mix of styles (some files use default exports, some use named exports, some error handling is in services, some is in controllers), the agent has no reliable signal. It picks one approach, and there is roughly a 50% chance it is the approach your team is trying to phase out.
Before:
The agent writes a new API endpoint. It uses a default export, catches errors inside the service function, returns a plain object `{ users: [...] }`, and uses `any` for the request body type. All of this works. None of it matches how your team writes code.
After:
Add a conventions section to your `CLAUDE.md`, `CONTRIBUTING.md`, or a dedicated `CONVENTIONS.md`:
```markdown
# Coding Conventions

## TypeScript
- Never use `any`. Use `unknown` and type-narrow, or define a proper type.
- Use named exports, not default exports.
- Prefer `interface` over `type` for object shapes.

## API responses
- All endpoints return `{ data: T, error: string | null, meta?: object }`.
- Use the `ApiResponse<T>` wrapper from `src/types/api.ts`.

## Error handling
- Services throw typed errors (see `src/errors/`).
- Controllers catch errors and map them to HTTP status codes.
- Never let unhandled errors reach the client.

## React
- Functional components only. No class components.
- Use `useQuery` from TanStack Query for data fetching, never `useEffect` + `fetch`.
- Colocate component-specific types in the component file.
```
Notice that these rules are specific and actionable. "Write clean code" is useless guidance. "All endpoints return `{ data: T, error: string | null }`" tells the agent exactly what to do.
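For illustration, here is one sketch of what the envelope convention could look like in code. `ApiResponse<T>`, `ok`, and `fail` are assumed names, not a real API, and the `data: T | null` loosening for the error case is just one possible design; your actual `src/types/api.ts` may differ:

```typescript
// Every endpoint returns this envelope, so clients can always
// check `error` before touching `data`.
interface ApiResponse<T> {
  data: T | null;
  error: string | null;
  meta?: object;
}

// Success: data present, error null, meta optional.
function ok<T>(data: T, meta?: object): ApiResponse<T> {
  return { data, error: null, ...(meta !== undefined ? { meta } : {}) };
}

// Failure: data null, error message present.
function fail<T = never>(error: string): ApiResponse<T> {
  return { data: null, error };
}
```

With helpers like these in place, the convention is enforced by the type system rather than by review comments.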
5. Explicit high-risk areas
Every codebase has landmines. The billing module where a bug means you charge customers twice. The auth middleware where a mistake means anyone can access admin endpoints. The database migration scripts where a careless change drops a production table.
Your senior engineers know where these landmines are. An AI agent does not.
Without guidance, an agent treats all code with equal confidence. It will refactor your payment processing module with the same casualness it uses to update a README. That is dangerous.
Why this matters from the agent's perspective: Agents optimize for completing the task. If the task says "refactor the billing service to use the new pricing model," the agent will do exactly that. It does not know that `src/billing/charge.ts` handles real money and that a bug there has different consequences than a bug in `src/utils/format-date.ts`. It will not add extra test coverage or flag the change for careful review unless you tell it to.
Before:
The agent receives a task: "Update the discount calculation logic." It finds the relevant function in `src/billing/discounts.ts`, makes the change, runs the existing tests (which pass), and opens a PR. The change has a subtle edge case where certain discount combinations can result in negative totals, charging the customer's card for -$50. The existing tests did not cover this case, and the agent did not know to add new ones.
After:
Add a high-risk section to your agent instructions:
```markdown
# High-Risk Areas

## src/billing/
Payment processing code. Changes here affect real financial transactions.
- Always add tests for edge cases (zero amounts, negative values, currency rounding).
- Never modify `charge.ts` or `refund.ts` without explicit approval.
- Test with the `billing-integration` test suite: `npm test -- --suite=billing`

## src/auth/
Authentication and authorization middleware.
- Changes here can create security vulnerabilities.
- Always verify that permission checks are not accidentally removed.
- Run the security test suite: `npm test -- --suite=auth-security`

## migrations/
Database migration files.
- Never modify an existing migration file. Only add new ones.
- All migrations must be reversible (include a `down` method).
- Test migrations against a copy of production data before merging.
```
You can also use this to mark files that should not be touched at all:
```markdown
## Do Not Modify
- `src/config/production.ts` - Production configuration. Changes require ops team review.
- `src/legacy/` - Legacy code scheduled for removal. Do not extend or refactor.
- `.github/workflows/deploy.yml` - Deployment pipeline. Changes require DevOps approval.
```
This is not about limiting the agent. It is about giving it the same awareness that your experienced developers have. A senior engineer knows to be extra careful around billing code. Now your agent does too.
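The migration rules in particular (reversible, never edit an existing file) can be sketched in code. This assumes a Knex-style `up`/`down` interface; the `Db` type, `db.raw` signature, filename, and column names are all invented for the example:

```typescript
// migrations/20240101_add_deactivated_at.ts (hypothetical filename:
// timestamped, append-only - existing migration files are never edited)
type Db = { raw: (sql: string) => Promise<void> };

// Forward migration: add the new column.
export async function up(db: Db): Promise<void> {
  await db.raw("ALTER TABLE users ADD COLUMN deactivated_at TIMESTAMP NULL");
}

// Reverse migration: every `up` has a matching `down` that undoes it.
export async function down(db: Db): Promise<void> {
  await db.raw("ALTER TABLE users DROP COLUMN deactivated_at");
}
```

An agent that sees this rule in the high-risk doc will add a new timestamped file with both halves, rather than editing an old migration in place.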
Putting it all together
These five things share a common theme: they make implicit knowledge explicit. The information is already in your team's heads. Writing it down benefits AI agents, new hires, and your future selves equally.
You do not need to do all five at once. If you do nothing else, start with number one (clear build and test commands) and number three (a working CI pipeline). Those two changes alone will significantly improve the quality of AI-generated PRs because they give the agent the ability to check its own work.
If your codebase is a monorepo, or has an unusual structure, or uses custom tooling, the architectural map (number two) becomes especially important. The more your repo deviates from common conventions, the more the agent needs explicit guidance.
And if you are using AI agents for anything beyond trivial changes, documenting your high-risk areas (number five) is not optional. It is a safety measure.
How ready is your repository?
We built the Kerex AI Readiness Checker to answer exactly this question. Point it at any GitHub repository and it will score your repo from 0 to 100 across four dimensions:
- Detection Confidence - Can the tool reliably identify your project's language, framework, and structure?
- Agent Instructions - Does your repo contain explicit guidance for AI agents (CLAUDE.md, CONTRIBUTING.md, or similar)?
- PR Verifiability - Can an agent verify its own changes? This covers CI pipelines, test suites, and linting.
- Documentation - Are your architecture, conventions, and project context documented well enough for an agent to understand the codebase?
Each dimension scores 0 to 10, and the total maps to a score from 0 to 100. Repos scoring above 90 are rated "Ready" for AI agents. Most repos we have seen score between 40 and 70, which means there is real, achievable improvement available.
Check your repository at ready.kerex.app and see where you stand. The analysis takes about a minute and gives you a specific breakdown of what to improve.