An academic study found that professional developers using AI agents don’t vibe. They control. After more than 300 Claude Code sessions building a full-stack app in a language I’d never used, I agree. But the control that matters isn’t what most people think.

I wanted to find the limits. Not of Claude Code on a toy script, but on a complex project that might exceed both my skills and the AI’s. So I picked a stack I’d never used, Rust with Leptos and Axum, and started building North, a GTD task manager. Multiple views, a filtering DSL, drag-and-drop, recurring tasks, keyboard navigation. The kind of project where you can’t fake understanding.

The first couple of evenings were fast. Claude Code generated features at a pace that felt unreasonable. Projects, filters, task trees, a decent UI. Then I looked at the code. Not at Rust specifics, I didn’t know those yet, but at the shape of it. Server functions defined inside page files with raw SQL and inline struct definitions. Identical TaskRow structs and mapping code copy-pasted across three pages. A 599-line TaskCard component containing display logic, editing, a dropdown menu, and a 150-line date/time picker. Everything coupled to everything.

I didn’t need Rust expertise to see the problem. Bad separation of concerns looks the same in every language.

The Vibe-Coding Cliff

Andrej Karpathy coined the term for throwaway weekend projects: “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” Collins Dictionary made it Word of the Year for 2025. Then Karpathy himself hand-coded his next project, saying AI agents were “net unhelpful.”

The pattern is predictable. AI coding tools get you to a working prototype fast, then progress stalls. CodeBots calls this the vibe-coding cliff: the point where system complexity exceeds what an LLM can maintain through intent alone.

AI generates architecture whether you ask for it or not. Every line of code embeds structural decisions. vFunction put it well: “AI agents don’t just generate code; they generate architecture by default.” Without guidance, those decisions default to the path of least resistance. Business logic in route handlers. Flat file structures. Happy-path implementations.

Addy Osmani identified the core math in the 70% problem: the first 70% comes fast, but the last 30% (edge cases, maintainability, error handling) is where architecture lives. “Software quality was (perhaps) never primarily limited by coding speed.”

The Mess, Concretely

Here’s what Claude Code produced by default. The inbox page defined its own server function with raw SQL, its own row struct, and its own mapping logic:

// Original inbox.rs — server function, SQL, struct, mapping, and UI in one file
#[server(GetInboxTasks, "/api")]
pub async fn get_inbox_tasks() -> Result<Vec<TaskWithMeta>, ServerFnError> {
    let pool = expect_context::<sqlx::PgPool>();
    let user_id = crate::server_fns::auth::get_auth_user_id().await?;

    #[derive(sqlx::FromRow)]
    struct TaskRow {
        id: i64, project_id: Option<i64>, title: String,
        // ... 15 more fields
    }

    let rows = sqlx::query_as::<_, TaskRow>(
        "SELECT t.id, t.project_id, t.parent_id, ... \
         (SELECT count(*) FROM tasks s WHERE s.parent_id = t.id) as subtask_count, \
         (SELECT json_agg(tg.name) FROM task_tags tt \
          JOIN tags tg ON tg.id = tt.tag_id WHERE tt.task_id = t.id) as tags \
         FROM tasks t WHERE t.project_id IS NULL ..."
    ).bind(user_id).fetch_all(&pool).await?;
    // ... 30 more lines of manual field mapping
}

The “Today” page had its own copy. The “Project” page had another. Thirty-plus raw SQL queries with json_agg, ROW_NUMBER() OVER, and correlated subqueries for what should be simple CRUD.

This is the old PHP joke, but unironic. SQL in the view layer. We spent fifteen years building ORMs, service layers, and repository patterns specifically to stop doing this. Claude Code, left to its defaults, went straight back to 2008.

My reaction: “Why the fuck do I need raw SQL in a view instead of a usual CRUD API? Maybe there are some issues in the code design?”

What I Did: Defining Layers

I didn’t know Rust or Leptos. But I’d spent years building React apps with MobX and Python services with Django. The patterns transferred.

I defined seven layers with a strict dependency graph:

  app / ui
     ↓
  stores
     ↓
  repositories
     ↓
  server-fns
     ↓
  core
     ↓
  db
     ↓
  dto

Each layer talks only to its immediate neighbor. Skip a layer and you break the contract.
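One way to make the neighbor-only rule mechanical is the Cargo workspace itself: if each crate declares only its immediate downstream layer as a dependency, skipping a layer becomes a compile error rather than a convention. A hypothetical manifest sketch (crate names and paths are assumed, not taken from the repo):

```toml
# stores/Cargo.toml — stores can reach repositories, nothing deeper
[dependencies]
north-repositories = { path = "../repositories" }

# repositories/Cargo.toml — repositories can reach server-fns only
[dependencies]
north-server-fns = { path = "../server-fns" }
```

With this shape, a store that tries to import a server function directly simply fails to resolve the crate.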

Server Side: Core, Server Functions, Diesel

The first fix was migrating from raw SQL to Diesel ORM. That alone forced structure: typed schema, model structs, compile-time query checking. Business logic moved out of SQL and into Rust.

Then came the service layer. Core services own all business logic. Creating a task means parsing #tags and @project from the title, computing a fractional sort key, resolving URLs in the background:

// core/src/task_service.rs — business logic lives here
pub async fn create(pool: &DbPool, user_id: i64, input: &CreateTask) -> ServiceResult<Task> {
    let mut conn = pool.get().await?;
    // Pull #tags and @project tokens out of the raw title
    let parsed = crate::filter::text_parser::parse_tokens(&input.title);

    let sort_key = if let Some(ref sk) = input.sort_key {
        sk.clone()
    } else {
        let last_key: Option<String> = tasks::table
            .filter(tasks::parent_id.eq(input.parent_id))
            .filter(tasks::user_id.eq(user_id))
            .order(tasks::sort_key.desc())
            .select(tasks::sort_key)
            .first(&mut conn).await.optional()?;
        north_dto::sort_key_after(last_key.as_deref())
    };

    let task: Task = diesel::insert_into(tasks::table)
        .values(&NewTask { title: parsed.title, sort_key, /* ... */ })
        .returning(TaskRow::as_returning())
        .get_result(&mut conn).await?
        .into();

    // Tag extraction and background URL resolution happen here too
    Ok(task)
}
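The fractional sort key deserves a word: the idea is that keys are strings compared lexicographically, and inserting after any existing key just means producing a string that sorts after it, with no renumbering of siblings. A hedged sketch of what a helper like `sort_key_after` might do (the real `north_dto` implementation is not shown in the post):

```rust
// Toy fractional sort keys: lowercase strings ordered lexicographically.
// Assumed behavior, not the actual north_dto::sort_key_after.
fn sort_key_after(last: Option<&str>) -> String {
    match last {
        // First item in a list: start in the middle of the keyspace.
        None => "m".to_string(),
        Some(k) => {
            let mut s = k.to_string();
            match s.pop() {
                // Room left after the last char: bump it by one.
                Some(c) if ('a'..'y').contains(&c) => {
                    s.push((c as u8 + 1) as char);
                    s
                }
                // No room: keep the char and extend with a new digit.
                Some(c) => {
                    s.push(c);
                    s.push('m');
                    s
                }
                None => "m".to_string(),
            }
        }
    }
}

fn main() {
    let a = sort_key_after(None);     // first key
    let b = sort_key_after(Some(&a)); // sorts after a
    let c = sort_key_after(Some("z")); // extends when the char space is exhausted
    assert!(a < b);
    assert!("z" < c.as_str());
    println!("{a} {b} {c}");
}
```

The payoff is that reordering a task touches one row: you compute a key between (or after) its new neighbors instead of rewriting every sibling's position.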

Server functions became thin CRUD wrappers. Extract auth, delegate to core, return the result. Three lines of real logic:

// server-fns/src/tasks.rs — thin RPC boundary
#[server(ApiCreateTaskFn, "/api")]
pub async fn create_task(input: CreateTask) -> Result<Task, ServerFnError> {
    let pool = expect_context::<north_core::DbPool>();
    let user_id = crate::auth::get_auth_user_id().await?;
    north_core::TaskService::create(&pool, user_id, &input)
        .await
        .map_err(|e| ServerFnError::new(e.to_string()))
}

Getting Claude to keep server functions this thin was its own fight. It kept generating domain-specific methods like move_task, get_recently_reviewed, complete_and_cascade. I kept pushing back: “Repository should provide CRUD, not domain methods.” The domain logic belongs in core. Server functions are transport.

Client Side: Stores, Repositories, UI

Repositories are the client-side facade. They wrap server function calls and convert DTOs to domain models:

// repositories/src/task_repo.rs — thin async facade
pub struct TaskRepository;

impl TaskRepository {
    pub async fn create(input: CreateTask) -> Result<TaskModel, ServerFnError> {
        notify_on_error(
            north_server_fns::tasks::create_task(input)
                .await
                .map(TaskModel::from),
        )
    }

    pub async fn update(id: i64, input: UpdateTask) -> Result<TaskModel, ServerFnError> {
        notify_on_error(
            north_server_fns::tasks::update_task(id, input)
                .await
                .map(TaskModel::from),
        )
    }
}
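The `.map(TaskModel::from)` calls above lean on a `From` impl at the DTO boundary. A minimal sketch of that conversion, with illustrative field names (the real `north_dto` types are richer):

```rust
// Wire format returned by server functions (assumed shape).
struct TaskDto {
    id: i64,
    title: String,
    completed_at: Option<String>, // ISO timestamp when completed
}

// Domain model the stores and UI consume.
#[derive(Debug, PartialEq)]
struct TaskModel {
    id: i64,
    title: String,
    completed: bool,
}

// The repository layer is the only place this conversion happens,
// so the UI never sees wire-format details like raw timestamps.
impl From<TaskDto> for TaskModel {
    fn from(dto: TaskDto) -> Self {
        TaskModel {
            id: dto.id,
            title: dto.title,
            completed: dto.completed_at.is_some(),
        }
    }
}

fn main() {
    let dto = TaskDto { id: 1, title: "Inbox zero".into(), completed_at: None };
    let model = TaskModel::from(dto);
    assert!(!model.completed);
    assert_eq!(model.title, "Inbox zero");
}
```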

Stores hold reactive state and implement optimistic updates. The UI updates immediately while the server syncs in the background:

// stores/src/task_store.rs — optimistic update pattern
pub fn toggle_complete(&self, id: i64, was_completed: bool) {
    let store = *self;
    let now = Utc::now();

    // Optimistic: update UI immediately
    store.update_in_place(id, |t| {
        t.completed_at = if was_completed { None } else { Some(now) };
    });

    // Async: sync with server
    spawn_local(async move {
        let result = if was_completed {
            TaskRepository::uncomplete(id).await
        } else {
            TaskRepository::complete(id).await
        };
        if result.is_ok() {
            store.refetch_async().await;
        }
    });
}

I got this pattern from MobX. Tell Claude “I want a reactive store like MobX, with optimistic updates and debounced server sync” and it understood the concept, even in a framework it’s less fluent in. I even pointed it to my MobX patterns notes file from a previous React project.

For the UI, I borrowed the container/controller/view split from React:

  • View: pure rendering. Takes data via props and callbacks. No store access.
  • Controller: orchestrates stores, derives reactive state, handles actions.
  • Container: wires controller to view. Thin glue.
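The split is framework-agnostic, which is why it survived the jump from React to Leptos. A framework-free Rust sketch of the three roles (the names `TaskStore`, `InboxController`, and `inbox_view` are illustrative, not the project's actual API):

```rust
struct TaskStore {
    tasks: Vec<(String, bool)>, // (title, completed)
}

// Controller: reads the store and derives presentation state. No rendering.
struct InboxController<'a> {
    store: &'a TaskStore,
}

impl<'a> InboxController<'a> {
    fn open_titles(&self) -> Vec<&str> {
        self.store.tasks.iter()
            .filter(|t| !t.1)
            .map(|t| t.0.as_str())
            .collect()
    }
}

// View: pure rendering from props. No store access, so it is trivially testable.
fn inbox_view(titles: &[&str]) -> String {
    titles.iter().map(|t| format!("[ ] {t}\n")).collect()
}

// Container: thin glue wiring controller output into the view.
fn inbox_container(store: &TaskStore) -> String {
    let controller = InboxController { store };
    inbox_view(&controller.open_titles())
}

fn main() {
    let store = TaskStore {
        tasks: vec![("Buy milk".into(), false), ("Ship v1".into(), true)],
    };
    assert_eq!(inbox_container(&store), "[ ] Buy milk\n");
}
```

Because the view takes plain data, swapping the store or the rendering framework touches only the container and controller.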

The Hard Part: Getting AI to Follow It

Defining architecture is half the battle. Getting Claude Code to respect it is the other half.

First attempts failed. I’d describe the layer rules, Claude would acknowledge them, then generate a component that called a server function directly. Or a view that imported a store. The training data pull was stronger than my instructions.

The turning point: I stopped explaining and started showing. I opened a fresh Claude Code session and micromanaged the inbox page rewrite step by step. Every file, every layer, correcting Claude on each deviation until the result matched what I wanted. Not just the page component. The full tree, every container, controller, view, and store. At the same time, I redesigned the backend: forced CRUD server functions instead of multiple RPC calls, added the repository layer, extracted business logic into core services.

pages/inbox/
├── container.rs          # Wires controller → view
├── controller.rs         # Derives filtered task lists, manages visibility toggles
└── view.rs               # Renders header, action buttons, task list

containers/
├── traversable_task_list/ # Keyboard nav + drag-drop + inline create
│   ├── container.rs
│   ├── controller.rs
│   └── view.rs
├── task_list_item/        # Single task row with actions
│   ├── container.rs
│   ├── controller.rs
│   └── view.rs
├── task_checkbox/         # Optimistic complete/uncomplete
├── task_meta/             # Due dates, tags, recurrence display
├── project_picker/        # Inline project assignment
└── tag_picker/            # Inline tag management

One page. Dozens of files. Each doing one thing.

Once enough correctly structured code existed, Claude started pattern-matching from the codebase instead of its training data. That’s the key insight: LLMs follow the path of least resistance. If your codebase is the architecture, AI will reproduce it. If your codebase is a mess, AI will faithfully reproduce that too.

After the patterns stabilized, I asked Claude to extract the architectural rules from the working code and add them to CLAUDE.md. Then I moved the frontend conventions into a dedicated Claude Code skill, about 400 lines of codified patterns that activate automatically when editing Leptos files. The CLAUDE.md went from 501 lines to 138, with detailed reference material extracted into a separate architecture doc loaded on demand.

The CLAUDE.md Maintenance Problem

After a few development cycles, something started going wrong again. Claude was following rules that no longer matched the codebase.

The architecture had evolved. Some patterns changed, some layers got refined. But CLAUDE.md still described the old version. Claude dutifully followed the stale instructions, producing code that conflicted with the actual project structure.

The documentation was poisoned.

This is easy to miss. When Claude writes something slightly off, your first instinct is that the model is being stubborn. Sometimes it is. But sometimes it’s doing exactly what you told it to do, and what you told it is outdated.

Architectural docs for AI need the same maintenance discipline as code. Review them after major refactors. Delete rules that no longer apply. CLAUDE.md treated as write-once becomes a source of drift.

Architecture Knowledge as the Real Skill

Simon Willison drew the sharpest line: “If an LLM wrote the code for you, and you then reviewed it, tested it thoroughly and made sure you could explain how it works to someone else, that’s not vibe coding, it’s software development.”

What made North work wasn’t knowing Rust. It was knowing what questions to ask. Where should this logic live? What layer owns this state? Should this component know about the data source or just receive props? These are system design questions. They transfer across every stack.

I described the container/controller/view pattern to Claude in plain English, pointing to my old React/MobX projects as references. I didn’t need Leptos expertise to know that a view component shouldn’t fetch its own data. That’s a principle, not a language feature.

Can someone without engineering experience build complex systems this way? Prototypes, yes. The 70% comes fast and it looks impressive. But systems that survive contact with real usage and change over time? That requires real experience building software systems, or a lot of curiosity and time to learn.

What Worked

After 223 commits and 321 Claude Code sessions across 20 days, North has nine Rust crates, container/controller/view pages, reactive stores with optimistic updates, and comprehensive E2E tests. Built in a language I’m still learning.

The skill that mattered wasn’t writing Rust. It was knowing how systems should be structured, and being willing to fight for that structure when AI takes the path of least resistance.

If you’re building anything past a prototype:

  1. Define your layers before you start generating code. Write the dependency graph. Decide who talks to whom.
  2. Write one reference implementation by hand (or by heavily guided AI). One page, fully structured, all layers, all conventions.
  3. Codify the patterns. Put them in CLAUDE.md or skills. Make the architecture machine-readable.
  4. Maintain the docs. Review after every major refactor. Stale rules produce stale code.

Professional software developers don’t vibe. They control. The AI will follow your architecture. You just have to build it first.