Coding trends 2026

The tech world changes constantly, and keeping up means choosing the tools and technologies that are most worth investing your time in.

In 2026, the best programming language or technology stack to learn depends on your goals, your interests, and the kinds of applications you want to build.

The use of AI keeps increasing, and AI as a “pair programmer” is becoming the default. Code completion, refactoring, and boilerplate generation are now routine, so devs spend more time reviewing and steering code than typing it. “Explain this error” and “why is this slow?” prompts are everyday tools.

In prompt-driven development, programmers describe intent in natural language, let AI generate first drafts of functions, APIs, or configs, and then iterate by refining prompts rather than rewriting code. Trend: Knowing how to ask is becoming as important as knowing syntax.

Strong growth in auto-generated unit and integration tests and in edge-case discovery. Trend: “Test-first” is easier when AI writes the boring parts.
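As a sketch of what that looks like in practice, here is a small hypothetical function together with the kind of edge-case tests an AI assistant typically drafts for it (whitespace, separators, zero, and invalid input):

```python
def parse_price(text: str) -> float:
    """Parse a price string like '$1,299.99' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    if not cleaned:
        raise ValueError("empty price string")
    value = float(cleaned)
    if value < 0:
        raise ValueError("negative price")
    return value

# AI-drafted edge cases: thousands separators, whitespace, zero, bad input.
assert parse_price("$1,299.99") == 1299.99
assert parse_price("  42  ") == 42.0
assert parse_price("$0") == 0.0
try:
    parse_price("")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for empty input")
```

The human still reviews the cases for relevance; the AI just removes the drudgery of writing them out.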

AI is moving up the stack. Trend: AI as a junior architect or reviewer, not the final decider.

AI is also reaching security and code-quality scanning, with rapid adoption in static analysis, vulnerability detection, secret-leakage checks, and dependency-risk checks. AI can give secure-by-default code suggestions. Trend: AI shifts security earlier in the SDLC (“shift left”).

Instead of one-off prompts, AI agents now plan → code → test → fix → retry. Multi-step autonomous tasks (e.g., “add feature X and update docs”) can succeed in the best cases. Trend: Still supervised, but moving toward semi-autonomous dev loops.

AI is heavily used for explaining large, unfamiliar codebases and for translating between languages and frameworks, which helps new engineers onboard faster.

What’s changing:
- Less manual boilerplate work
- More focus on problem definition, review, and decision-making
- Stronger emphasis on fundamentals, architecture, and domain knowledge
Trend: Devs become editors, designers, and orchestrators.

AI usage policies and audit trails are becoming necessary. Trend: “Use AI, but safely.”

Likely directions:
Deeper IDE + CI/CD integration
AI maintaining legacy systems
Natural-language → production-ready features
AI copilots customized to your codebase

380 Comments

  1. Tomi Engdahl says:

    tnm / zclaw
    Your personal AI assistant at all-in 888KiB (~25KB in app code). Running on an ESP32. GPIO, cron, custom tools, memory, and more.
    https://github.com/tnm/zclaw

    Reply
  2. Tomi Engdahl says:

    Linus Torvalds jokes about his successor: “a more competent person”
    Anna Helakallio, 23.2.2026 09:53 | updated 23.2.2026 09:53
    Torvalds jokes that Linux’s future lead developer can handle large numbers better than he can.
    https://www.tivi.fi/uutiset/a/e4b2aa22-4a31-4073-ae6b-4408503d37c7

    Reply
  3. Tomi Engdahl says:

    doramirdor / NadirClaw
    Open-source LLM router that saves you money. Routes simple prompts to cheap/local models, complex ones to premium — automatically. OpenAI-compatible proxy.
    https://github.com/doramirdor/NadirClaw
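The general idea behind such routers can be sketched with a simple heuristic: score a prompt’s apparent complexity and pick a model tier. This is illustrative only, not NadirClaw’s actual scoring logic, and the model names are placeholders:

```python
CHEAP_MODEL = "local-small"      # placeholder model names
PREMIUM_MODEL = "premium-large"

def route(prompt: str, threshold: int = 3) -> str:
    """Return the model tier a prompt should be sent to."""
    score = 0
    if len(prompt.split()) > 100:          # long prompts tend to be complex
        score += 2
    if "```" in prompt:                    # contains a code block
        score += 2
    for kw in ("prove", "refactor", "architecture", "step by step"):
        if kw in prompt.lower():
            score += 1
    return PREMIUM_MODEL if score >= threshold else CHEAP_MODEL

print(route("What is the capital of France?"))            # -> local-small
print(route("Refactor this module:\n```python\n...```"))  # -> premium-large
```

A production router would use a classifier or a small LLM instead of keyword heuristics, but the cheap-by-default, escalate-on-complexity shape is the same.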

    Reply
  4. Tomi Engdahl says:

    Wayland’s original promise was a cleaner, safer foundation for Linux desktops, one that could finally stop inheriting decades of X11 baggage. The pitch was modern graphics without the historical quirks, smoother rendering with fewer hacks, and better isolation so one app could not casually snoop on another. It also implied a future where screen sharing, scaling, and input handling would be solved in a way that felt native, not bolted on. In other words, Wayland was supposed to be the grown-up table where the desktop could finally eat without spilling.
    https://www.xda-developers.com/ways-wayland-isnt-living-up-to-promises-yet/

    Reply
  5. Tomi Engdahl says:

    AlexsJones / llmfit
    157 models. 30 providers. One command to find what runs on your hardware.
    https://github.com/AlexsJones/llmfit

    Reply
  6. Tomi Engdahl says:

    jQuery Releases v4: First Major Version in Almost 10 Years
    https://www.infoq.com/news/2026/02/jquery-4-release/

    jQuery, the pioneering JavaScript library that revolutionized web development, has released jQuery 4, marking its first major version in almost 10 years. The release coincides with the library’s 20th anniversary, having first been introduced on January 14, 2006.

    jQuery 4 brings extensive modernizations while maintaining the simplicity and developer experience. The team has focused on trimming legacy code, removing deprecated APIs, and dropping support for outdated browsers, resulting in a leaner and more performant library. The jQuery team expects most users will be able to upgrade with minimal changes to their code, supported by a comprehensive upgrade guide and jQuery Migrate plugin.

    A key support change in jQuery 4 is the removal of support for Internet Explorer 10 and older browsers, including Edge Legacy, iOS versions earlier than the last 3, and Android Browser. Internet Explorer 11 remains supported in this release, though the team has indicated that support will be removed in jQuery 5.

    Reply
  7. Tomi Engdahl says:

    I’m an Amazon tech lead who uses AI to write code daily. There’s one situation I hesitate to use it in.
    https://www.businessinsider.com/amazon-tech-lead-vibe-coding-daily-resist-anni-chen-2026-2

    An Amazon tech lead says it’s hard to resist the allure of vibe coding.
    Anni Chen, who vibe codes daily, says it’s definitely a productivity boost.
    But she cautions against using vibe coding at scale or for production, and says technical knowledge matters.

    Reply
  8. Tomi Engdahl says:

    RightNow-AI / picolm
    Run a 1-billion parameter LLM on a $10 board with 256MB RAM
    https://github.com/RightNow-AI/picolm

    Reply
  9. Tomi Engdahl says:

    Enterprise use of open source AI coding is changing the ROI calculation
    News, Feb 18, 2026

    https://www.infoworld.com/article/4134257/enterprise-use-of-open-source-ai-coding-is-changing-the-roi-calculation.html

    Open source has always had issues, but the benefits outweighed the costs/risks. AI is not merely exponentially accelerating tasks, it is disproportionately increasing risks.

    Reply
  10. Tomi Engdahl says:

    Building a Simple MCP Server in Python
    By Bala Priya C, February 19, 2026, in Practical Machine Learning
    In this article, you will learn what Model Context Protocol (MCP) is and how to build a simple, practical task-tracker MCP server in Python using FastMCP.

    Topics we will cover include:

    How MCP works, including hosts, clients, servers, and the three core primitives.
    How to implement MCP tools, resources, and prompts with FastMCP.
    How to run and test your MCP server using the FastMCP client.

    https://machinelearningmastery.com/building-a-simple-mcp-server-in-python/
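As a dependency-free sketch of those three primitives, a task tracker’s tools, resource, and prompt can be modeled as plain functions. The commented-out decorators show where FastMCP’s `@mcp.tool`, `@mcp.resource`, and `@mcp.prompt` would attach in a real server (the function and URI names here are made up for illustration):

```python
tasks: list[dict] = []  # in-memory store

# @mcp.tool -- tools are model-invoked actions with side effects
def add_task(title: str) -> dict:
    task = {"id": len(tasks) + 1, "title": title, "done": False}
    tasks.append(task)
    return task

# @mcp.tool
def complete_task(task_id: int) -> bool:
    for t in tasks:
        if t["id"] == task_id:
            t["done"] = True
            return True
    return False

# @mcp.resource("tasks://open") -- resources expose read-only context
def open_tasks() -> list[dict]:
    return [t for t in tasks if not t["done"]]

# @mcp.prompt -- prompts are reusable message templates
def standup_prompt() -> str:
    titles = ", ".join(t["title"] for t in open_tasks()) or "nothing"
    return f"Summarize progress. Still open: {titles}."

add_task("write spec")
add_task("review PR")
complete_task(1)
print(standup_prompt())  # -> Summarize progress. Still open: review PR.
```

With FastMCP installed, attaching the decorators and calling `mcp.run()` would expose the same functions to any MCP client over stdio.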

    Reply
  11. Tomi Engdahl says:

    The Ideal Micro-Frontends Platform
    https://www.infoq.com/presentations/distributed-micro-frontends/

    Summary
    Luca Mezzalira shares a decision framework for micro-frontends, covering composition, routing, and communication. He explains how to structure “stream-aligned” teams and use a “tiger team” for foundational architecture. He also discusses the sociotechnical benefits of reducing external dependencies and shares how to use guardrails and discovery services to achieve 25+ deployments per day.

    Reply
  12. Tomi Engdahl says:

    BlockRunAI / ClawRouter
    The agent-native LLM router empowering OpenClaw — by BlockRunAI
    https://github.com/BlockRunAI/ClawRouter

    Reply
  13. Tomi Engdahl says:

    OpenAI Publishes Codex App Server Architecture for Unifying AI Agent Surfaces
    https://www.infoq.com/news/2026/02/opanai-codex-app-server/

    Reply
  14. Tomi Engdahl says:

    Vibe coding is transforming development – but at what cost to open source?
    South African software specialists see vibe coding as a double-edged sword for the open-source community.
    https://techcentral.co.za/vibe-coding-is-transforming-development-but-at-what-cost-to-open-source/277853/

    Vibe coding’s transformative impact on developer productivity is having adverse consequences for the open-source ecosystem, potentially exposing enterprises reliant on open-source software to risk.

    A paper from the Central European University (CEU) in January found that vibe coding – driven by AI coding assistants such as Claude Code, Cursor and Lovable – decreases the depth to which developers engage with code, documentation, libraries and other developers.

    “Traditionally, a developer selects packages, reads documentation and interacts with maintainers and other users. Under vibe coding, an AI agent can select, compose and modify packages end to end, and the human developer may not know which upstream components were used,” said the paper, authored by Miklos Koren, an economics professor at CEU, and his colleagues.

    Any developer worth his salt wants to develop code that is maintainable and that they understand…
    Even though open-source software does not have to be paid for, its value, especially to enterprises, is hard to deny. A 2024 Harvard Business School study estimates that firms would have to spend US$8.8-trillion globally to replace open-source software with proprietary builds, and that companies would spend 3.5 times more on software development if open-source software did not exist.

    Calwyn Baldwin, automation team lead at Johannesburg-based enterprise open-source solutions provider Obsidian Systems, said that like any other tool, AI coding assistants tend to reflect the skill and care of the developer using them. Developers with good habits get good results; those with bad habits inevitably produce bad code.

    Different skill set
    “AI coding is certainly a different skill set. It is very easy to tell the AI to go and do something, test if it works and then never look at the code again. If you do that over the long term you are going to have a bunch of developers that don’t know the code base very well.

    “While that is a fair concern, I don’t think that is true for every developer because some are still going to get stuck in and try to understand how it works. Any developer worth his salt wants to develop code that is maintainable, that they understand and can explain to someone else,” he said.

    Baldwin also disagrees with the paper’s argument that vibe coding diminishes social engagement in open-source projects, arguing that AI tools have, to the contrary, raised the need for higher levels of engagement, especially between experienced developers and their more junior counterparts.

    Bennie Kahler-Venter, a senior automation engineer at Obsidian, said it is important that senior staff are the first to engage with AI coding tools so they can identify their strengths, weaknesses and where they will be most useful in a project’s development pipeline.

    Reply
  15. Tomi Engdahl says:

    the effects of AI coding tools on the open-source community will be. But he sees a likely outcome: the velocity at which code can now be developed will flood the market with vibe-coded open-source projects, making it difficult for enterprises to discern which are well written, well maintained and backed by a good team.

    A lot of the market has a vested interest in the open-source world thriving and not imploding
    With many more projects to choose from, the need for reliable, mature open-source projects will rise, creating an incentive for enterprises to invest in their success, he said.

    https://techcentral.co.za/vibe-coding-is-transforming-development-but-at-what-cost-to-open-source/277853/

    Reply
  16. Tomi Engdahl says:

    How to Write a Good Spec for AI Agents
    https://www.oreilly.com/radar/how-to-write-a-good-spec-for-ai-agents/

    TL;DR: Aim for a clear spec covering just enough nuance (structure, style, testing, boundaries, etc.) to guide the AI without overwhelming it. Break large tasks into smaller ones rather than keeping everything in one large prompt. Plan first in read-only mode, then execute and iterate continuously.

    “I’ve heard a lot about writing good specs for AI agents, but haven’t found a solid framework yet. I could write a spec that rivals an RFC, but at some point the context is too large and the model breaks down.”

    Many developers share this frustration. Simply throwing a massive spec at an AI agent doesn’t work—context window limits and the model’s “attention budget” get in the way. The key is to write smart specs: documents that guide the agent clearly, stay within practical context sizes, and evolve with the project. This guide distills best practices from my use of coding agents including Claude Code and Gemini CLI into a framework for spec-writing that keeps your AI agents focused and productive.

    We’ll cover five principles for great AI agent specs, each starting with a bolded takeaway.

    1. Start with a High-Level Vision and Let the AI Draft the Details
    Kick off your project with a concise high-level spec, then have the AI expand it into a detailed plan.

    Instead of overengineering upfront, begin with a clear goal statement and a few core requirements. Treat this as a “product brief” and let the agent generate a more elaborate spec from it. This leverages the AI’s strength in elaboration while you maintain control of the direction. This works well unless you already feel you have very specific technical requirements that must be met from the start.

    Why this works: LLM-based agents excel at fleshing out details when given a solid high-level directive, but they need a clear mission to avoid drifting off course. By providing a short outline or objective description and asking the AI to produce a full specification (e.g., a spec.md), you create a persistent reference for the agent. Planning in advance matters even more with an agent: You can iterate on the plan first, then hand it off to the agent to write the code. The spec becomes the first artifact you and the AI build together.

    Practical approach: Start a new coding session by prompting

    You are an AI software engineer. Draft a detailed specification for
    [project X] covering objectives, features, constraints, and a step-by-step plan.

    Keep your initial prompt high-level: e.g., “Build a web app where users can
    track tasks (to-do list), with user accounts, a database, and a simple UI.”

    The agent might respond with a structured draft spec: an overview, feature list, tech stack suggestions, data model, and so on. This spec then becomes the “source of truth” that both you and the agent can refer back to. GitHub’s AI team promotes spec-driven development where “specs become the shared source of truth…living, executable artifacts that evolve with the project.” Before writing any code, review and refine the AI’s spec. Make sure it aligns with your vision and correct any hallucinations or off-target details.

    Plan Mode to enforce planning-first: Tools like Claude Code offer a Plan Mode that restricts the agent to read-only operations—it can analyze your codebase and create detailed plans but won’t write any code until you’re ready. This is ideal for the planning phase: Start in Plan Mode (Shift+Tab in Claude Code), describe what you want to build, and let the agent draft a spec while exploring your existing code. Ask it to clarify ambiguities by questioning you about the plan. Have it review the plan for architecture, best practices, security risks, and testing strategy. The goal is to refine the plan until there’s no room for misinterpretation. Only then do you exit Plan Mode and let the agent execute. This workflow prevents the common trap of jumping straight into code generation before the spec is solid.

    Use the spec as context: Once approved, save this spec (e.g., as SPEC.md) and feed relevant sections into the agent as needed.

    Keep it goal oriented: A high-level spec for an AI agent should focus on what and why more than the nitty-gritty how (at least initially). Think of it like the user story and acceptance criteria: Who is the user? What do they need? What does success look like? (For example, “User can add, edit, complete tasks; data is saved persistently; the app is responsive and secure.”) This keeps the AI’s detailed spec grounded in user needs and outcome, not just technical to-dos. As the GitHub Spec Kit docs put it, provide a high-level description of what you’re building and why, and let the coding agent generate a detailed specification focusing on user experience and success criteria. Starting with this big-picture vision prevents the agent from losing sight of the forest for the trees when it later gets into coding.

    2. Structure the Spec Like a Professional PRD (or SRS)
    Treat your AI spec as a structured document (PRD) with clear sections, not a loose pile of notes.

    Many developers treat specs for agents much like traditional product requirement documents (PRDs) or system design docs: comprehensive, well-organized, and easy for a “literal-minded” AI to parse. This formal approach gives the agent a blueprint to follow and reduces ambiguity.

    The six core areas
    GitHub’s analysis of over 2,500 agent configuration files revealed a clear pattern: The most effective specs cover six areas. Use this as a checklist for completeness:

    Commands: Put executable commands early—not just tool names but full commands with flags: npm test, pytest -v, npm run build. The agent will reference these constantly.
    Testing: How to run tests, what framework you use, where test files live, and what coverage expectations exist.
    Project structure: Where source code lives, where tests go, where docs belong. Be explicit: “src/ for application code, tests/ for unit tests, docs/ for documentation.”
    Code style: One real code snippet showing your style beats three paragraphs describing it. Include naming conventions, formatting rules, and examples of good output.
    Git workflow: Branch naming, commit message format, PR requirements. The agent can follow these if you spell them out.
    Boundaries: What the agent should never touch—secrets, vendor directories, production configs, specific folders. “Never commit secrets” was the single most common helpful constraint in the GitHub study.

    Be specific about your stack: Say “React 18 with TypeScript, Vite, and Tailwind CSS,” not “React project.” Include versions and key dependencies. Vague specs produce vague code.
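Put together, a minimal spec file covering the six areas might look like the following (the commands, paths, and conventions are illustrative, not prescriptive):

```markdown
# Agent instructions

## Commands
- `npm test`: run unit tests
- `npm run build`: production build

## Testing
- Vitest; test files live in `tests/`, named `*.test.ts`

## Project structure
- `src/` application code, `tests/` unit tests, `docs/` documentation

## Code style
- TypeScript strict mode; camelCase functions, PascalCase components

## Git workflow
- Branches: `feat/<ticket>`; PRs require one approval and green CI

## Boundaries
- Never commit secrets; never edit `vendor/` or production configs
```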

    Use a consistent format: Clarity is king. Many devs use Markdown headings or even XML-like tags in the spec to delineate sections because AI models handle well-structured text better than free-form prose. For example, you might structure the spec as:

    # Project Spec: My team’s tasks app

    ## Objective
    - Build a web app for small teams to manage tasks…

    ## Tech Stack
    - React 18+, TypeScript, Vite, Tailwind CSS
    - Node.js/Express backend, PostgreSQL, Prisma ORM

    This level of organization not only helps you think clearly but also helps the AI find information. Anthropic engineers recommend organizing prompts into distinct sections (like `<instructions>`, `<context>`, `<examples>`, etc.) for exactly this reason: It gives the model strong cues about which info is which. And remember, “minimal does not necessarily mean short”—don’t shy away from detail in the spec if it matters, but keep it focused.

    Integrate specs into your toolchain: Treat specs as “executable artifacts” tied to version control and CI/CD. The GitHub Spec Kit uses a four-phase gated workflow that makes your specification the center of your engineering process.

    1. Specify: You provide a high-level description of what you’re building and why, and the coding agent generates a detailed specification. This isn’t about technical stacks or app design—it’s about user journeys, experiences, and what success looks like. Who will use this? What problem does it solve? How will they interact with it?

    2. Plan: Now you get technical. You provide your desired stack, architecture, and constraints, and the coding agent generates a comprehensive technical plan. If your company standardizes on certain technologies, this is where you say so. If you’re integrating with legacy systems or have compliance requirements, all of that goes here. You can ask for multiple plan variations to compare approaches. If you make internal docs available, the agent can integrate your architectural patterns directly into the plan.

    3. Tasks: The coding agent takes the spec and plan and breaks them into actual work—small, reviewable chunks that each solve a specific piece of the puzzle. Each task should be something you can implement and test in isolation, almost like test-driven development for your AI agent. Instead of “build authentication,” you get concrete tasks like “create a user registration endpoint that validates email format.”

    4. Implement: Your coding agent tackles tasks one by one (or in parallel). Instead of reviewing thousand-line code dumps, you review focused changes that solve specific problems. The agent knows what to build (specification), how to build it (plan), and what to work on (task). Crucially, your role is to verify at each phase: Does the spec capture what you want? Does the plan account for constraints? Are there edge cases the AI missed? The process builds in checkpoints for you to critique, spot gaps, and course-correct before moving forward.

    This gated workflow prevents what Willison calls “house of cards code”: fragile AI outputs that collapse under scrutiny. Anthropic’s Skills system offers a similar pattern, letting you define reusable Markdown-based behaviors that agents invoke. By embedding your spec in these workflows, you ensure the agent can’t proceed until the spec is validated, and changes propagate automatically to task breakdowns and tests.

    Consider agents.md for specialized personas: For tools like GitHub Copilot, you can create agents.md files that define specialized agent personas—a @docs-agent for technical writing, a @test-agent for QA, a @security-agent for code review. Each file acts as a focused spec for that persona’s behavior, commands, and boundaries. This is particularly useful when you want different agents for different tasks rather than one general-purpose assistant.

    Design for agent experience (AX): Just as we design APIs for developer experience (DX), consider designing specs for “agent experience.” This means clean, parseable formats: OpenAPI schemas for any APIs the agent will consume, llms.txt files that summarize documentation for LLM consumption, and explicit type definitions. The Agentic AI Foundation (AAIF) is standardizing protocols like MCP (Model Context Protocol) for tool integration. Specs that follow these patterns are easier for agents to consume and act on reliably.

    PRD versus SRS mindset: It helps to borrow from established documentation practices. For AI agent specs, you’ll often blend these into one document (as illustrated above), but covering both angles serves you well. Writing it like a PRD ensures you include user-centric context (“the why behind each feature”) so the AI doesn’t optimize for the wrong thing. Expanding it like an SRS ensures you nail down the specifics the AI will need to actually generate correct code (like what database or API to use). Developers have found that this extra upfront effort pays off by drastically reducing miscommunications with the agent later.

    Make the spec a “living document”: Don’t write it and forget it. Update the spec as you and the agent make decisions or discover new info. If the AI had to change the data model or you decided to cut a feature, reflect that in the spec so it remains the ground truth. Think of it as version-controlled documentation. In spec-driven workflows, the spec drives implementation, tests, and task breakdowns, and you don’t move to coding until the spec is validated.

    3. Break Tasks into Modular Prompts and Context, Not One Big Prompt
    Divide and conquer: Give the AI one focused task at a time rather than a monolithic prompt with everything at once.

    Experienced AI engineers have learned that trying to stuff the entire project (all requirements, all code, all instructions) into a single prompt or agent message is a recipe for confusion. Not only do you risk hitting token limits; you also risk the model losing focus due to the “curse of instructions”—too many directives causing it to follow none of them well. The solution is to design your spec and workflow in a modular way, tackling one piece at a time and pulling in only the context needed for that piece.

    The curse of too much context/instructions: Research has confirmed what many devs anecdotally saw: as you pile on more instructions or data into the prompt, the model’s performance in adhering to each one drops significantly. One study dubbed this the “curse of instructions”, showing that even GPT-4 and Claude struggle when asked to satisfy many requirements simultaneously. In practical terms, if you present 10 bullet points of detailed rules, the AI might obey the first few and start overlooking others. The better strategy is iterative focus. Guidelines from industry suggest decomposing complex requirements into sequential, simple instructions as a best practice. Focus the AI on one subproblem at a time, get that done, then move on. This keeps the quality high and errors manageable.

    Divide the spec into phases or components: If your spec document is very long or covers a lot of ground, consider splitting it into parts (either physically separate files or clearly separate sections). For example, you might have a section for “backend API spec” and another for “frontend UI spec.” You don’t need to always feed the frontend spec to the AI when it’s working on the backend, and vice versa.

    Extended TOC/summaries for large specs: One clever technique is to have the agent build an extended table of contents with summaries for the spec. This is essentially a “spec summary” that condenses each section into a few key points or keywords, and references where details can be found. For example, if your full spec has a section on security requirements spanning 500 words, you might have the agent summarize it to: “Security: Use HTTPS, protect API keys, implement input validation (see full spec §4.2).” By creating a hierarchical summary in the planning phase, you get a bird’s-eye view that can stay in the prompt, while the fine details remain offloaded unless needed. This extended TOC acts as an index: The agent can consult it and say, “Aha, there’s a security section I should look at,” and you can then provide that section on demand. It’s similar to how a human developer skims an outline and then flips to the relevant page of a spec document when working on a specific part.

    To implement this, you can prompt the agent after writing the spec: “Summarize the spec above into a very concise outline with each section’s key points and a reference tag.” The result might be a list of sections with one or two sentence summaries. That summary can be kept in the system or assistant message to guide the agent’s focus without eating up too many tokens.

    Utilize subagents or “skills” for different spec parts: Another advanced approach is using multiple specialized agents (what Anthropic calls subagents or what you might call “skills”). Each subagent is configured for a specific area of expertise and given the portion of the spec relevant to that area.

    Parallel agents for throughput: Running multiple agents simultaneously is emerging as “the next big thing” for developer productivity. Rather than waiting for one agent to finish before starting another task, you can spin up parallel agents for non-overlapping work. Willison describes this as “embracing parallel coding agents” and notes it’s “surprisingly effective, if mentally exhausting.”

    Focus each prompt on one task/section: Even without fancy multi-agent setups, you can manually enforce modularity. For example, after the spec is written, your next move might be: “Step 1: Implement the database schema.” You feed the agent the database section of the spec only, plus any global constraints from the spec (like tech stack). The agent works on that. Then for Step 2, “Now implement the authentication feature”, you provide the auth section of the spec and maybe the relevant parts of the schema if needed. By refreshing the context for each major task, you ensure the model isn’t carrying a lot of stale or irrelevant information that could distract it. As one guide suggests: “Start fresh: begin new sessions to clear context when switching between major features.” You can always remind the agent of critical global rules (from the spec’s constraints section) each time, but don’t shove the entire spec in if it’s not all needed.

    Use in-line directives and code TODOs: Another modularity trick is to use your code or spec as an active part of the conversation. For instance, scaffold your code with // TODO comments that describe what needs to be done, and have the agent fill them one by one. Each TODO essentially acts as a mini-spec for a small task. This keeps the AI laser focused (“implement this specific function according to this spec snippet”), and you can iterate in a tight loop. It’s similar to giving the AI a checklist item to complete rather than the whole checklist at once.
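A minimal sketch of that scaffolding style, with one TODO already filled in the way an agent would (the function names are made up for illustration):

```python
import re

def slugify(title: str) -> str:
    """Turn 'Hello, World!' into 'hello-world' (TODO filled in by the agent)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

def parse_due_date(text: str):
    # TODO: accept 'YYYY-MM-DD' and 'tomorrow'; raise ValueError otherwise
    raise NotImplementedError

def render_task(task: dict) -> str:
    # TODO: one-line summary like '[x] title (due 2026-03-01)'
    raise NotImplementedError

assert slugify("Hello, World!") == "hello-world"
```

Each remaining TODO becomes its own focused prompt, so the agent always works against one small, testable spec snippet.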

    The bottom line: Small, focused context beats one giant prompt. It improves quality and keeps the AI from getting overwhelmed.

    4. Build in Self-Checks, Constraints, and Human Expertise
    Make your spec not just a to-do list for the agent but also a guide for quality control—and don’t be afraid to inject your own expertise.

    A good spec for an AI agent anticipates where the AI might go wrong and sets up guardrails. It also takes advantage of what you know (domain knowledge, edge cases, “gotchas”) so the AI doesn’t operate in a vacuum. Think of the spec as both coach and referee for the AI: It should encourage the right approach and call out fouls.

    Use three-tier boundaries: GitHub’s analysis of 2,500+ agent files found that the most effective specs use a three-tier boundary system rather than a simple list of don’ts. This gives the agent clearer guidance on when to proceed, when to pause, and when to stop:

    Always do: Actions the agent should take without asking. “Always run tests before commits.” “Always follow the naming conventions in the style guide.” “Always log errors to the monitoring service.”

    Ask first: Actions that require human approval. “Ask before modifying database schemas.” “Ask before adding new dependencies.” “Ask before changing CI/CD configuration.” This tier catches high-impact changes that might be fine but warrant a human check.

    Never do: Hard stops. “Never commit secrets or API keys.” “Never edit node_modules/ or vendor/.” “Never remove a failing test without explicit approval.” “Never commit secrets” was the single most common helpful constraint in the study.

    This three-tier approach is more nuanced than a flat list of rules. It acknowledges that some actions are always safe, some need oversight, and some are categorically off-limits. The agent can proceed confidently on “Always” items, flag “Ask first” items for review, and hard-stop on “Never” items.

    Encourage self-verification: One powerful pattern is to have the agent verify its work against the spec automatically. If your tooling allows, you can integrate checks like unit tests or linting that the AI can run after generating code. But even at the spec/prompt level, you can instruct the AI to double-check (e.g., “After implementing, compare the result with the spec and confirm all requirements are met. List any spec items that are not addressed.”). This pushes the LLM to reflect on its output relative to the spec, catching omissions. It’s a form of self-audit built into the process.

    For instance, you might append to a prompt: “(After writing the function, review the above requirements list and ensure each is satisfied, marking any missing ones).” The model will then (ideally) output the code followed by a short checklist indicating if it met each requirement. This reduces the chance it forgets something before you even run tests. It’s not foolproof, but it helps.
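    The same self-audit can also be done mechanically after the fact. A minimal sketch, assuming a made-up requirements list: scan the generated output for each requirement keyword and flag anything never mentioned. This is deliberately naive (keyword presence is not proof of implementation), but it catches outright omissions cheaply.

```python
# Naive spec audit: flag requirements never mentioned in the output.
# The requirement keywords below are hypothetical examples.
REQUIREMENTS = ["bcrypt", "logging", "rate limit"]

def missing_requirements(output: str, requirements=REQUIREMENTS) -> list[str]:
    """Return the requirements that the output never mentions at all."""
    return [r for r in requirements if r.lower() not in output.lower()]

print(missing_requirements("Hashed with bcrypt; added logging."))  # ['rate limit']
```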

    LLM-as-a-Judge for subjective checks: For criteria that are hard to test automatically—code style, readability, adherence to architectural patterns—consider using “LLM-as-a-Judge.” This means having a second agent (or a separate prompt) review the first agent’s output against your spec’s quality guidelines. Anthropic and others have found this effective for subjective evaluation. You might prompt “Review this code for adherence to our style guide. Flag any violations.”

    Conformance testing: Willison advocates building conformance suites—language-independent tests (often YAML based) that any implementation must pass. These act as a contract: If you’re building an API, the conformance suite specifies expected inputs/outputs, and the agent’s code must satisfy all cases. This is more rigorous than ad hoc unit tests because it’s derived directly from the spec and can be reused across implementations. Include conformance criteria in your spec’s success section (e.g., “Must pass all cases in conformance/api-tests.yaml”).
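    A minimal conformance runner can be very small. In the sketch below, the cases are inlined as Python dicts to stay self-contained; a real suite would live in a file like the YAML suite mentioned above. The `slugify` function and its cases are hypothetical stand-ins for whatever implementation the agent produced.

```python
# Minimal conformance-runner sketch; cases inlined instead of loaded from YAML.

def slugify(title: str) -> str:
    """The implementation under test (a hypothetical example function)."""
    return "-".join(title.lower().split())

CONFORMANCE_CASES = [
    {"name": "basic", "input": "Hello World", "expected": "hello-world"},
    {"name": "extra spaces", "input": "  a   b ", "expected": "a-b"},
    {"name": "already clean", "input": "ok", "expected": "ok"},
]

def run_conformance(impl, cases):
    """Run every case; return the list of failures (empty list == conformant)."""
    failures = []
    for case in cases:
        got = impl(case["input"])
        if got != case["expected"]:
            failures.append((case["name"], got, case["expected"]))
    return failures

failures = run_conformance(slugify, CONFORMANCE_CASES)
print("conformant" if not failures else f"failed: {failures}")
```

    Because the cases are data, the same suite can be rerun unchanged against every implementation the agent proposes, which is exactly the "contract" property described above.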

    Leverage testing in the spec: If possible, incorporate a test plan or even actual tests in your spec and prompt flow. In traditional development, we use TDD or write test cases to clarify requirements—you can do the same with AI. For example, in the spec’s success criteria, you might say, “These sample inputs should produce these outputs…” or “The following unit tests should pass.” The agent can be prompted to run through those cases in its head or actually execute them if it has that capability. Willison noted that having a robust test suite is like giving the agents superpowers: They can validate and iterate quickly when tests fail. In an AI coding context, writing a bit of pseudocode for tests or expected outcomes in the spec can guide the agent’s implementation. Additionally, you can use a dedicated “test agent” in a subagent setup that takes the spec’s criteria and continuously verifies the “code agent’s” output.

    Bring your domain knowledge: Your spec should reflect insights that only an experienced developer or someone with context would know. For example, if you’re building an ecommerce agent and you know that “products” and “categories” have a many-to-many relationship, state that clearly. (Don’t assume the AI will infer it—it might not.) If a certain library is notoriously tricky, mention pitfalls to avoid. Essentially, pour your mentorship into the spec. The spec can contain advice like “If using library X, watch out for memory leak issue in version Y (apply workaround Z).” This level of detail is what turns an average AI output into a truly robust solution, because you’ve steered the AI away from common traps

    Minimalism for simple tasks: While we advocate thorough specs, part of expertise is knowing when to keep it simple. For relatively simple, isolated tasks, an overbearing spec can actually confuse more than help. If you’re asking the agent to do something straightforward (like “center a div on the page”), you might just say, “Make sure to keep the solution concise and do not add extraneous markup or styles.” No need for a full PRD there.

    Maintain the AI’s “persona” if needed: Sometimes, part of your spec is defining how the agent should behave or respond, especially if the agent interacts with users. For example, if building a customer support agent, your spec might include guidelines like “Use a friendly and professional tone” and “If you don’t know the answer, ask for clarification or offer to follow up rather than guessing.” These kinds of rules (often included in system prompts) help keep the AI’s outputs aligned with expectations. They are essentially spec items for AI behavior. Keep them consistent and remind the model of them if needed in long sessions. (LLMs can “drift” in style over time if not kept on a leash.)

    You remain the exec in the loop: The spec empowers the agent, but you remain the ultimate quality filter. If the agent produces something that technically meets the spec but doesn’t feel right, trust your judgement. Either refine the spec or directly adjust the output. The great thing about AI agents is they don’t get offended—if they deliver a design that’s off, you can say, “Actually, that’s not what I intended, let’s clarify the spec and redo it.” The spec is a living artifact in collaboration with the AI, not a one-time contract you can’t change.

    Simon Willison has humorously likened working with AI agents to “a very weird form of management,” observing that “getting good results out of a coding agent feels uncomfortably close to managing a human intern.”

    Here’s the payoff: A good spec doesn’t just tell the AI what to build; it also helps it self-correct and stay within safe boundaries. By baking in verification steps, constraints, and your hard-earned knowledge, you drastically increase the odds that the agent’s output is correct on the first try (or at least much closer to correct). This reduces iterations and those “Why on Earth did it do that?” moments.

    5. Test, Iterate, and Evolve the Spec (and Use the Right Tools)
    Think of spec writing and agent building as an iterative loop: test early, gather feedback, refine the spec, and leverage tools to automate checks.

    The initial spec is not the end—it’s the beginning of a cycle. The best outcomes come when you continually verify the agent’s work against the spec and adjust accordingly. Also, modern AI devs use various tools to support this process (from CI pipelines to context management utilities).

    Continuous testing: Don’t wait until the end to see if the agent met the spec. After each major milestone or even each function, run tests or at least do quick manual checks. If something fails, update the spec or prompt before proceeding. For example, if the spec said, “Passwords must be hashed with bcrypt” and you see the agent’s code storing plain text, stop and correct it (and restate the rule in the spec or prompt). Automated tests shine here: If you provided tests (or write them as you go), let the agent run them. In many coding agent setups, you can have an agent run npm test or similar after finishing a task. The results (failures) can then feed back into the next prompt, effectively telling the agent “Your output didn’t meet spec on X, Y, Z—fix it.” This kind of agentic loop (code > test > fix > repeat) is extremely powerful and is how tools like Claude Code or Copilot Labs are evolving to handle larger tasks. Always define what “done” means (via tests or criteria) and check for it.
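    The code > test > fix > repeat loop can be sketched as a short control flow. Here `generate` and `run_tests` are deterministic stubs standing in for an LLM call and a real test runner, so only the loop structure is real; the spec rules and function names are invented for the example.

```python
# Sketch of the agentic loop: generate, test, feed failures back, retry.

def run_tests(code: str) -> list[str]:
    """Pretend test runner: returns failure messages (empty list = green)."""
    failures = []
    if "hash_password" not in code:
        failures.append("spec: passwords must be hashed")
    if "def login" not in code:
        failures.append("spec: login endpoint missing")
    return failures

def generate(prompt: str, previous: str = "") -> str:
    """Stand-in for the agent: 'fixes' whatever the prompt mentions."""
    code = previous
    if "hashed" in prompt:
        code += "\ndef hash_password(pw): ..."
    if "login" in prompt:
        code += "\ndef login(user, pw): ..."
    return code

def agent_loop(spec: str, max_iters: int = 5) -> tuple[str, list[str]]:
    code, failures = generate(spec), []
    for _ in range(max_iters):
        failures = run_tests(code)
        if not failures:
            break  # spec satisfied: stop iterating
        # Feed the failures back as the next prompt, as described above.
        code = generate("Fix these spec violations: " + "; ".join(failures), code)
    return code, failures

code, failures = agent_loop("Implement login per spec")
print("done" if not failures else failures)
```

    The `max_iters` cap matters in practice: without it, an agent that cannot satisfy a test can burn tokens indefinitely.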

    Iterate on the spec itself: If you discover that the spec was incomplete or unclear (maybe the agent misunderstood something or you realized you missed a requirement), update the spec document.
    Then explicitly resync the agent with the new spec: “I have updated the spec as follows… Given the updated spec, adjust the plan or refactor the code accordingly.” This way the spec remains the single source of truth. It’s similar to how we handle changing requirements in normal dev, but in this case you’re also the product manager for your AI agent. Keep version history if possible (even just via commit messages or notes), so you know what changed and why.

    Utilize context management and memory tools: There’s a growing ecosystem of tools to help manage AI agent context and knowledge. For instance, retrieval-augmented generation (RAG) is a pattern where the agent can pull in relevant chunks of data from a knowledge base (like a vector database) on the fly. If your spec is huge, you could embed sections of it and let the agent retrieve the most relevant parts when needed, instead of always providing the whole thing.
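    The retrieval step can be illustrated without any vector database at all. The toy below scores spec sections by word overlap instead of embeddings; the spec sections and the scoring function are invented for the example, but the workflow shape (chunk the spec, score against the task, put only the top-k sections into the prompt) is the same one a real RAG setup uses.

```python
# Toy retrieval over spec sections using word overlap instead of embeddings.
SPEC_SECTIONS = {
    "auth": "Passwords must be hashed with bcrypt. Sessions expire after 30 minutes.",
    "catalog": "Products and categories have a many-to-many relationship.",
    "style": "Follow the naming conventions in the style guide.",
}

def score(query: str, text: str) -> int:
    """Crude relevance: count shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the names of the k most relevant spec sections for this task."""
    ranked = sorted(SPEC_SECTIONS,
                    key=lambda name: score(query, SPEC_SECTIONS[name]),
                    reverse=True)
    return ranked[:k]

print(retrieve("How should passwords be hashed?"))  # ['auth']
```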

    Parallelize carefully: Some developers run multiple agent instances in parallel on different tasks (as mentioned earlier with subagents). This can speed up development (e.g., one agent generates code while another simultaneously writes tests, or two features are built concurrently). If you go this route, ensure the tasks are truly independent or clearly separated to avoid conflicts. (The spec should note any dependencies.) For example, don’t have two agents writing to the same file at once.

    Version control and spec locks: Use Git or your version control of choice to track what the agent does. Good version control habits matter even more with AI assistance. Commit the spec file itself to the repo. This not only preserves history, but the agent can even use git diff or blame to understand changes. (LLMs are quite capable of reading diffs.) Some advanced agent setups let the agent query the VCS history to see when something was introduced—surprisingly, models can be “fiercely competent at Git.” By keeping your spec in the repo, you allow both you and the AI to track evolution.

    Cost and speed considerations: Working with large models and long contexts can be slow and expensive. A practical tip is to use model selection and batching smartly. Perhaps use a cheaper/faster model for initial drafts or repetitions, and reserve the most capable (and expensive) model for final outputs or complex reasoning. Some developers use GPT-4 or Claude for planning and critical steps, but offload simpler expansions or refactors to a local model or a smaller API model. If using multiple agents, maybe not all need to be top tier; a test-running agent or a linter agent could be a smaller model. Also consider throttling context size: Don’t feed 20K tokens if 5K will do. As we discussed, more tokens can mean diminishing returns.
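    Routing between models can be as simple as a heuristic function. A minimal sketch; the model names, keyword list, and token threshold below are placeholders, not recommendations:

```python
# Toy model router: cheap model for routine work, expensive model for hard work.
HEAVY_KEYWORDS = ("architecture", "plan", "security", "refactor")

def pick_model(task: str, context_tokens: int) -> str:
    """Route based on context size and task keywords (all thresholds illustrative)."""
    heavy = context_tokens > 5000 or any(w in task.lower() for w in HEAVY_KEYWORDS)
    return "big-expensive-model" if heavy else "small-fast-model"

print(pick_model("fix lint warnings", 800))      # small-fast-model
print(pick_model("plan the migration", 200))     # big-expensive-model
```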

    Monitor and log everything: In complex agent workflows, logging the agent’s actions and outputs is essential. Check the logs to see if the agent is deviating or encountering errors. Many frameworks provide trace logs or allow printing the agent’s chain of thought (especially if you prompt it to think step-by-step). Reviewing these logs can highlight where the spec or instructions might have been misinterpreted.

    Learn and improve: Finally, treat each project as a learning opportunity to refine your spec-writing skill. Maybe you’ll discover that a certain phrasing consistently confuses the AI, or that organizing spec sections in a certain way yields better adherence. Incorporate those lessons into the next spec. The field of AI agents is rapidly evolving, so new best practices (and tools) emerge constantly. Stay updated via blogs (like the ones by Simon Willison, Andrej Karpathy, etc.), and don’t hesitate to experiment.

    A spec for an AI agent isn’t “write once, done.” It’s part of a continuous cycle of instructing, verifying, and refining. The payoff for this diligence is substantial: By catching issues early and keeping the agent aligned, you avoid costly rewrites or failures later.

    6. Common Pitfalls to Avoid

    Skipping human review: Willison has a personal rule—“I won’t commit code I couldn’t explain to someone else.” Just because the agent produced something that passes tests doesn’t mean it’s correct, secure, or maintainable. Always review critical code paths. The “house of cards” metaphor applies: AI-generated code can look solid but collapse under edge cases you didn’t test.

    Conflating vibe coding with production engineering: Rapid prototyping with AI (“vibe coding”) is great for exploration and throwaway projects. But shipping that code to production without rigorous specs, tests, and review is asking for trouble. I distinguish “vibe coding” from “AI-assisted engineering”—the latter requires the discipline this guide describes. Know which mode you’re in.

    Ignoring the “lethal trifecta”: Willison’s “lethal trifecta” describes agents that combine three properties: access to private data, exposure to untrusted content, and the ability to communicate externally. Together, these enable prompt-injection attacks that exfiltrate data, so your spec and review process must account for all three. More generally, agents work faster than you can review and produce different outputs from the same input, so don’t let their speed outpace your ability to verify.

    Missing the six core areas: If your spec doesn’t cover commands, testing, project structure, code style, git workflow, and boundaries, you’re likely missing something the agent needs. Use the six-area checklist from section 2 as a sanity check before handing off to the agent.

    Conclusion
    Writing an effective spec for AI coding agents requires solid software engineering principles combined with adaptation to LLM quirks. Start with clarity of purpose and let the AI help expand the plan. Structure the spec like a serious design document, covering the six core areas and integrating it into your toolchain so it becomes an executable artifact, not just prose. Keep the agent’s focus tight by feeding it one piece of the puzzle at a time (and consider clever tactics like summary TOCs, subagents, or parallel orchestration to handle big specs). Anticipate pitfalls by including three-tier boundaries (always/ask first/never), self-checks, and conformance tests—essentially, teach the AI how to not fail. And treat the whole process as iterative: use tests and feedback to refine both the spec and the code continuously.

    Follow these guidelines and your AI agent will be far less likely to “break down” under large contexts or wander off into nonsense.

    Happy spec-writing!

  17. Tomi Engdahl says:

    The last generation of coders
    By Duncan McLeod, 18 February 2026
    There is a website called RentaHuman.ai that exists to connect autonomous AI agents with human beings who can carry out physical tasks in the real world – things the agents cannot yet do themselves. Its tagline? “Clear briefs, no drama.”

    It sounds like science fiction. It is, in fact, very real. More than half a million people have already signed up, hoping to sell their services to machines.

    https://techcentral.co.za/the-last-generation-of-coders/277829/

  18. Tomi Engdahl says:

    Everyone uses open source, but patching still moves too slowly
    Enterprise security teams rely on open source across infrastructure, development pipelines, and production applications, even when they do not track it as a separate category of technology. Open source has become a default building block in many environments, and the operational risks now look like standard enterprise security problems: patch delays, version sprawl, and aging platforms that stay online longer than planned.
    https://www.helpnetsecurity.com/2026/02/18/open-source-adoption-patching-challenges/

  19. Tomi Engdahl says:

    How I structure Claude Code projects (CLAUDE.md, Skills, MCP)
    I’ve been using Claude Code more seriously over the past months, and a few workflow shifts made a big difference for me.

    The first one was starting in plan mode instead of execution.

    When I write the goal clearly and let Claude break it into steps first, I catch gaps early. Reviewing the plan before running anything saves time. It feels slower for a minute, but the end result is cleaner and needs fewer edits.

    Another big improvement came from using a CLAUDE.md file properly.

    https://www.reddit.com/r/ClaudeAI/comments/1r66oo0/how_i_structure_claude_code_projects_claudemd/

  20. Tomi Engdahl says:

    What is fbclid? A Complete Guide to Facebook Click Identifiers and Tracking Ad Performance
    This small URL parameter powers big improvements in Meta ad attribution
    https://www.northbeam.io/blog/what-is-fbclid-guide-to-facebook-click-identifiers

  21. Tomi Engdahl says:

    OpenAI Launches Frontier, a Platform to Build, Deploy, and Manage AI Agents across the Enterprise
    https://www.infoq.com/news/2026/02/openai-frontier-agent-platform/

  22. Tomi Engdahl says:

    Building a Least-Privilege AI Agent Gateway for Infrastructure Automation with MCP, OPA, and Ephemeral Runners
    https://www.infoq.com/articles/building-ai-agent-gateway-mcp/

  23. Tomi Engdahl says:

    Composio Open Sources Agent Orchestrator to Help AI Developers Build Scalable Multi-Agent Workflows Beyond the Traditional ReAct Loops
    https://www.marktechpost.com/2026/02/23/composio-open-sources-agent-orchestrator-to-help-ai-developers-build-scalable-multi-agent-workflows-beyond-the-traditional-react-loops/

    For the past year, AI devs have relied on the ReAct (Reasoning + Acting) pattern—a simple loop where an LLM thinks, picks a tool, and executes. But as any software engineer who has tried to move these agents into production knows, simple loops are brittle. They hallucinate, they lose track of complex goals, and they struggle with ‘tool noise’ when faced with too many APIs.

    The Composio team is moving the goalposts by open-sourcing Agent Orchestrator. This framework is designed to transition the industry from ‘Agentic Loops’ to ‘Agentic Workflows’—structured, stateful, and verifiable systems that treat AI agents more like reliable software modules and less like unpredictable chatbots.

  24. Tomi Engdahl says:

    One engineer made a production SaaS product in an hour: here’s the governance system that made it possible
    https://venturebeat.com/orchestration/one-engineer-made-a-production-saas-product-in-an-hour-heres-the-governance

    Every engineering leader watching the agentic coding wave is eventually going to face the same question: if AI can generate production-quality code faster than any team, what does governance look like when the human isn’t writing the code anymore?

    Most teams don’t have a good answer yet. Treasure Data, a SoftBank-backed customer data platform serving more than 450 global brands, now has one, though they learned parts of it the hard way.

  25. Tomi Engdahl says:

    The three-tier pipeline for AI code generation
    The first tier is an AI-based code reviewer also using Claude Code.

    The code reviewer sits at the pull request stage and runs a structured review checklist against every proposed merge, checking for architectural alignment, security compliance, proper error handling, test coverage and documentation quality. When all criteria are satisfied it can merge automatically. When they aren’t, it flags for human intervention.

    The fact that Treasure Data built the code reviewer in Claude Code is not incidental. It means the tool validating AI-generated code was itself AI-generated, a proof point that the workflow is self-reinforcing rather than dependent on a separate human-written quality layer.

    The second tier is a standard CI/CD pipeline running automated unit, integration and end-to-end tests, static analysis, linting and security checks against every change. The third is human review, required wherever automated systems flag risk or enterprise policy demands sign-off.

    The internal principle Treasure Data operates under: AI writes code, but AI does not ship code.
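    The three-tier gate described above can be sketched as a single decision function. The checklist items and policy flags below are illustrative placeholders, not Treasure Data’s actual rules:

```python
# Sketch of the gate: AI review checklist -> CI results -> human sign-off.
CHECKLIST = ["architecture", "security", "error_handling", "tests", "docs"]

def gate(review: dict[str, bool], ci_green: bool, policy_requires_human: bool) -> str:
    """Decide the fate of a pull request under the three-tier pipeline."""
    if not ci_green:
        return "blocked: fix CI"  # tier two failed: nothing else matters
    if policy_requires_human or not all(review.get(item, False) for item in CHECKLIST):
        return "needs human review"  # tier one flagged, or policy demands sign-off
    return "auto-merge"

all_pass = {item: True for item in CHECKLIST}
print(gate(all_pass, ci_green=True, policy_requires_human=False))  # auto-merge
```

    The key design choice is that "auto-merge" is the narrowest branch: every unmet criterion, failed check, or policy flag routes toward more human involvement, never less.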

    https://venturebeat.com/orchestration/one-engineer-made-a-production-saas-product-in-an-hour-heres-the-governance

  26. Tomi Engdahl says:

    The MCP Revolution and the Search for Stable AI Use Cases
    A conversation with AI researcher Sebastian Wallkötter reveals insights on standardization, security challenges, and the fundamental question facing enterprise artificial intelligence adoption.
    https://www.kdnuggets.com/the-mcp-revolution-and-the-search-for-stable-ai-use-cases

  27. Tomi Engdahl says:

    Why Model Context Protocol is suddenly on every executive agenda
    https://www.cio.com/article/4136548/why-model-context-protocol-is-suddenly-on-every-executive-agenda.html

    As AI agents begin operating across enterprise systems, MCP is emerging as the connective layer IT leaders can’t afford to ignore.

    Technology leaders are used to watching new standards emerge quietly and then disappear into the plumbing of enterprise IT. But Model Context Protocol (MCP) is following a different trajectory. Over the past year, it has moved from an obscure technical concept into the center of conversations about agentic AI, governance, and security risk, and it’s a shift that reflects not hype, but the practical reality of how AI systems are beginning to interact with enterprise environments.

    During a Cyber Sessions interview I conducted last year with veteran security executive Andy Ellis, he predicted the inflection point before most executives had encountered the term.

    “I think MCP is going to be massive at RSA,” he said at the time. “Instead of having an API tightly defined between a client and a server, you put an LLM on either end and let them negotiate what to exchange. It will revolutionize software development – and it’s going to make it really scary.”

  28. Tomi Engdahl says:

    AAO: Why assistive agent optimization is the next evolution of SEO
    https://searchengineland.com/aao-assistive-agent-optimization-469919

  29. Tomi Engdahl says:

    Cross-Platform Frameworks in 2026: How JavaScript and Python Power Modern App Development
    A Practical Guide to Building Scalable, Maintainable Cross-Platform Applications with Today’s Most Versatile Stacks
    https://medium.com/codetodeploy/cross-platform-frameworks-in-2026-how-javascript-and-python-power-modern-app-development-5b0d8f80c3af

  30. Tomi Engdahl says:

    The Secret Life of JavaScript: The Clone
    #javascript #coding #programming #softwaredevelopment
    How to use Web Workers to protect the Main Thread and prevent frozen UIs.
    https://dev.to/aaron_rose_0787cc8b4775a0/the-secret-life-of-javascript-the-clone-no9

