Andrej Karpathy’s Weekend ‘Vibe Code’ Hack Quietly Sketches the Missing Layer of Enterprise AI Orchestration

This weekend, Andrej Karpathy, former director of AI at Tesla and a founding member of OpenAI, decided he wanted to read a book. But he didn’t want to read it alone. He wanted to read it in the company of a council of artificial intelligences, each offering its own perspective, critiquing the others, and ultimately synthesizing a final answer under the guidance of a “Chairman.”

To make this happen, Karpathy wrote what he called “vibe code” – software dashed off quickly, mostly by AI assistants, intended for fun rather than function. He posted the result to GitHub as a repository called “LLM Council,” with a sharp disclaimer: “I will not support it in any way… The code is now ephemeral and the libraries are gone.”

But for technical decision-makers across the enterprise, looking beyond that casual caveat reveals something far more important than a weekend toy. In a few hundred lines of Python and JavaScript, Karpathy sketched a reference architecture for the most critical, least-defined layer of the modern software stack: the orchestration middleware that sits between enterprise applications and the volatile marketplace of AI models.


As companies finalize platform investments for 2026, LLM Council offers a simplified look at the “build vs. buy” reality of AI infrastructure. It shows that while the logic for routing and aggregating AI models is surprisingly simple, the operational packaging required to ensure enterprise readiness is where the real complexity lies.

How the LLM Council works: Four AI models debate, critique and synthesize responses

To the casual observer, the LLM Council web application looks almost identical to ChatGPT. The user types a query into a chat window. But behind the scenes, the app runs a sophisticated three-stage workflow that mirrors how human committees make decisions.

First, the system sends the user’s query to a panel of frontier models. In Karpathy’s default configuration, the panel includes OpenAI’s GPT-5.1, Google’s Gemini 3.0 Pro, Anthropic’s Claude Sonnet 4.5, and xAI’s Grok 4. These models generate their initial responses in parallel.

In the second stage, the software performs a peer review. Each model receives its counterparts’ responses, anonymized, and is asked to rank them for accuracy and insight. This step turns each AI from a generator into a critic, enforcing a layer of quality control that is rare in standard chatbot interactions.

Finally, the designated “Chairman” – currently configured as Google’s Gemini 3 – receives the original query, the individual responses, and the models’ rankings. It synthesizes this mass of context into a single, authoritative response for the user.
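The three stages above can be sketched in a few lines of Python. This is an illustrative reconstruction, not Karpathy’s actual code: the function and variable names are invented, and `ask(model, prompt)` stands in for a real chat-completion call to a broker such as OpenRouter.

```python
# Hypothetical sketch of the council's three-stage flow.
# `ask(model, prompt) -> str` abstracts away the actual API call.
from typing import Callable

COUNCIL = [
    "openai/gpt-5.1",
    "google/gemini-3-pro",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]
CHAIRMAN = "google/gemini-3-pro"

def run_council(query: str, ask: Callable[[str, str], str]) -> str:
    # Stage 1: every council model answers the query independently.
    answers = {m: ask(m, query) for m in COUNCIL}

    # Stage 2: each model ranks the anonymized answers of its peers.
    anonymized = "\n\n".join(
        f"Response {i + 1}:\n{text}"
        for i, text in enumerate(answers.values())
    )
    reviews = {
        m: ask(m, f"Rank these responses to {query!r} "
                  f"by accuracy and insight:\n{anonymized}")
        for m in COUNCIL
    }

    # Stage 3: the chairman synthesizes answers and rankings into one reply.
    context = f"Query: {query}\nAnswers: {answers}\nRankings: {reviews}"
    return ask(CHAIRMAN, f"Synthesize a single final answer.\n{context}")
```

Because the model-calling function is injected, the same flow can be exercised with a stub during development and wired to a live API in production.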

Karpathy noted that the results were often surprising. “Quite often, models are surprisingly willing to choose another LLM’s answer as better than their own,” he wrote on X (formerly Twitter). He described using the tool to read book chapters and noted that the models consistently praised GPT-5.1 as the most insightful and rated Claude the lowest. However, Karpathy’s own qualitative assessment diverged from the council’s verdict: he found GPT-5.1 “too verbose” and preferred Gemini’s “condensed and processed” output.

FastAPI, OpenRouter and the case for treating frontier models as interchangeable components

The value of LLM Council for CTOs and platform architects lies not in its literary criticism but in its construction. The repository serves as a snapshot of what a modern, minimal AI stack looks like in late 2025.

The application is built on a deliberately “thin” architecture. The backend uses FastAPI, a modern Python framework, while the frontend is a standard React application built with Vite. Data storage relies not on a complex database but on simple JSON files saved to local disk.

The heart of the operation is OpenRouter, an API aggregator that normalizes the differences between model providers. By routing requests through this single broker, Karpathy avoided writing separate integration code for OpenAI, Google, and Anthropic. The app doesn’t know or care which company is providing the intelligence; it simply sends a prompt and awaits a response.

This design choice highlights a growing trend in enterprise architecture: the commoditization of the model layer. Because frontier models are treated as interchangeable components that can be swapped by editing a single line in a configuration file – specifically the COUNCIL_MODELS list in the backend code – the architecture insulates the application from vendor lock-in. If a new model from Meta or Mistral tops the leaderboards next week, it can be added to the council in seconds.
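The commoditization point can be made concrete with a small sketch. The code below builds an OpenRouter-style request without sending it; the endpoint URL matches OpenRouter’s public chat-completions API, but the model list and helper name are illustrative, not taken from the repository.

```python
# Sketch: one endpoint, one payload shape, any vendor's model.
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Swapping vendors is a one-line edit to this list.
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

def build_request(model: str, prompt: str) -> dict:
    # The request is identical regardless of which company serves the
    # model; only the "model" string changes. That uniformity is what
    # makes the model layer feel like a commodity.
    return {
        "url": OPENROUTER_URL,
        "headers": {"Authorization": "Bearer $OPENROUTER_API_KEY"},
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

A real client would POST this body with an HTTP library; the point here is that nothing in the application code is provider-specific.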

What’s missing from prototype to production: authentication, redaction and compliance

Although the core logic of LLM Council is elegant, it also serves as a stark illustration of the gap between a weekend hack and a production system. For an enterprise platform team, cloning the Karpathy repository is just the first step in a marathon.

A technical audit of the code reveals that the “boring” infrastructure commercial vendors sell at a premium is missing. The system has no authentication; anyone with access to the web interface can query the models. There is no concept of user roles, meaning a junior developer has the same access rights as the CIO.

Moreover, there is no governance layer. In an enterprise environment, sending data simultaneously to four different third-party AI providers creates immediate compliance concerns. There is no mechanism for redacting personally identifiable information (PII) before it leaves the local network, and no audit trail to track who asked what.

Reliability is another open question. The system assumes the OpenRouter API is always up and that the models will respond in a timely manner. It lacks the circuit breakers, fallback strategies, and retry logic that keep mission-critical applications running through a vendor outage.

These shortcomings are not flaws in Karpathy’s code – he has clearly stated that he has no intention of supporting or improving the project – but they define the value proposition for the commercial AI infrastructure market.

Companies like LangChain, AWS Bedrock, and various AI gateway startups are essentially selling hardened versions of the core logic Karpathy demonstrated. They provide the security, observability, and compliance that transform a raw orchestration script into a viable enterprise platform.

Why Karpathy believes that code is now “ephemeral” and traditional software libraries are obsolete

Perhaps the most provocative aspect of the project is the philosophy under which it was built. Karpathy described the development process as “99% vibe-coded,” meaning he relied heavily on AI assistants to generate the code rather than writing it line by line himself.

“The code is now ephemeral and the libraries are gone, ask your LLM to change it in any way you want,” he wrote in the repository documentation.

This statement represents a radical shift in software engineering practice. Traditionally, companies build internal libraries and abstractions to manage complexity and maintain them for years. Karpathy suggests a future in which code is treated as disposable scaffolding – cheap to generate, easy for AI to rewrite, and expected to perish.

This poses a difficult strategic question for corporate decision-makers. If internal tools can be vibe-coded over a weekend, does it make sense to purchase expensive, rigid software suites for internal workflows? Or should platform teams empower their engineers to generate custom, disposable tools that meet their exact needs at a fraction of the cost?

When AI models evaluate AI: The dangerous gap between machine preferences and human needs

Beyond architecture, the LLM Council project inadvertently sheds light on a specific risk in automated AI deployment: the gap between human and machine judgment.

Karpathy’s observation that his models preferred GPT-5.1 while he preferred Gemini suggests that AI models may share common biases. They may favor verbosity, specific formatting, or rhetorical certainty that does not necessarily serve a business’s need for conciseness and accuracy.

As businesses increasingly rely on “LLM-as-judge” setups to assess the quality of customer-facing bots, this discrepancy matters. If an automated rater consistently rewards “long-winded and comprehensive” responses while human customers want concise solutions, the metrics will show success even as customer satisfaction plummets. Karpathy’s experiment suggests that relying solely on AI to judge AI is a strategy fraught with hidden alignment problems.
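A toy example makes the divergence concrete. The data and the judge below are entirely invented for illustration: the “judge” is a crude proxy that rewards length, the way a biased LLM rater might.

```python
# Toy illustration (invented data): a rater that rewards length can
# invert the preference of a human who wants the concise answer.
def verbosity_judge(answer: str) -> int:
    # Crude stand-in for a length-biased automated judge.
    return len(answer.split())

answers = {
    "model_a": ("Restart the router, then re-run the setup wizard, "
                "confirming each configuration step, and finally verify "
                "connectivity on every device in the household."),
    "model_b": "Restart the router.",
}

machine_pick = max(answers, key=lambda m: verbosity_judge(answers[m]))
human_pick = "model_b"  # a customer who just wants the fix
```

Here the automated metric and the human disagree, which is exactly the failure mode a dashboard built only on LLM-as-judge scores would hide.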

What can enterprise platform teams learn from a weekend hack before building their 2026 stack?

Ultimately, LLM Council acts as a Rorschach test for the AI industry. To a hobbyist, it is a fun new way to read books. To a vendor, it is a threat: proof that the basic functionality of their products can be recreated in a few hundred lines of code.

But to an enterprise technology leader, it is a reference architecture. It demystifies the orchestration layer, showing that the technical challenge lies not in routing the prompts but in governing the data.

As platform teams head into 2026, many will likely study Karpathy’s code not to deploy it, but to understand it. It proves that a multi-model strategy is not out of technical reach. The question that remains is whether companies will build the management layer themselves or pay someone else to wrap the “vibe code” in enterprise-grade armor.
