Build the System, Don’t Just Give Orders
Lessons from building an AI-powered Mandarin Chinese tutor
Disclaimer: The views expressed in this piece are solely my own and do not represent those of my employer. I wrote this piece about technology and industry dynamics for educational purposes only, not to promote any specific products or services.
I’m not convinced that the capabilities of models will be the binding constraint for individual output in future. I believe it will be how good a person is at encoding their intent in an agent-run system.
I, like most other AI news junkies, at one point bought the hype that software would be a complete commodity as AI coding agents improve. If anyone can bark out instructions, their increasingly intelligent army of agents will figure out how to follow them.
Building my own personalized Mandarin Tutor however taught me what should have been obvious from the framing above - an army is only as useful as the system of organizing and directing it toward a clearly specified goal, no matter how smart their individual members are.
Armies are not made when a general plucks 100 random people off the street and yells orders. Similarly, the outstanding effort of highly intelligent and conscientious actors can be wasted when given unclear directions (just see the Charge of the Light Brigade). Effectively utilizing a swarm of independent actors requires a union of clear, reality-grounded direction with the guardrails placed by millennia of tacit institutional knowledge
My goal with this piece is to crystalize what I have learned so far about how to get a growing swarm of AI agents to carry out your intent, and why I believe doing so requires consciously practicing 3 things:
Specifying your intent clearly
Making your intent unavoidable
Verifying, removing yourself from the process, then giving autonomy
Context - What did I build and why?
I have been learning Mandarin Chinese (中文) as a hobby since mid-2020 because I want to understand Chinese literature and culture on its own terms (my goal is to be able to read Chinese history books unassisted and have a conversation about them with a native speaker).
Over time I gradually developed a homegrown set of AI tools to accelerate my learning. I built my own Rust-based MCP server to connect Claude Desktop to my Anki decks, I used deep research queries to parse the linguistics literature to design activities that resolved my current learning bottlenecks, and built Skills to generate novel daily exercises.
These tools were fragmented however and I was constantly needing to re-explain myself across threads, so I decided to build a single source of truth for all my learning context and activity generation.
I built this single platform completely AI native - from coming up with the design brief to final implementation, and the two months of nights and weekends it took to get to my current steady state was the primary foundation of my learnings here.
Principle 1 - Specify your intent (firstly and most importantly, to yourself):
The first and most important principle for giving clear orders is knowing precisely what you want.
A model cannot divine your intent from a history of unclear asks and a tangled codebase, nor can it prepare your system to anticipate a future you never articulated.
I learned the hard way that a lack of personal clarity and willingness to engage with details can lead to days or weeks of untangling, even on a relatively simple project. Prior to my Mandarin tutor project I gave my AI a ~one page prompt and asked it to build a GPU benchmarking tool based only on this input (which took ~one hour) and deliberately tried to prompt engineer my way to an answer without engaging with the code.
It took three weeks to get it working.
While some of this came down to model capabilities, reflecting on it I caused most of the problems with my lack of intention. For example, in my initial one page prompt I had not considered enabling a configuration (response streaming), and at one point I blindly accepted a Claude Code change where the agent left a comment to leave response streaming disabled because it was not needed. When I later asked the agent to add a metric that required response streaming, it read the comments left by the previous agent and decided to implement an approximation instead of measuring the metric I wanted directly.
I only discovered this fact accidentally over a week later, after I saw that the metric data was unusually uniform. This was only one of the more interesting bugs I unearthed during those three weeks of debugging.
Taking this lesson to heart for my Mandarin Tutor I doubled down on precise specification during this project to great effect. I built not just a high level brief but detailed design specs, mockups, explanations of my long term vision behind each of the components. Doing this I spent far less time throwing good prompts after bad.
One of the best practices I found for making my intent consistently clear enough to avoid these prompt-loop bogs was forcing myself to write rather than speak. While voice mode has become a popular way of instructing agents, I found it led me to worse outcomes than I achieved taking time to write.
The process of writing is the process of realising that you don’t know what you are talking about, and often the best part of that is realising mid-scribble that what you needed is completely different from what you thought in your head (or maybe was not needed at all!)
While voice mode lets you more quickly spill your unprocessed thoughts directly into the context window, the lack of refinement makes it too forgiving of sloppy commands that don’t set your agents up to give you what you want.
Principle 2 - Make your intent unavoidable:
Anyone who has led an organization of people, whether an army or a business, will tell you that you need to repeat yourself an uncomfortable amount to get your point across. Whether speeches, posters, or casual hallway conversations - you need to be deliberate in embedding your intent so deeply in the organization that it is completely unavoidable.
While we don’t need to literally repeat ourselves to AI agents (as I did in prior projects constantly copy-pasting my desires from chat session to chat session), the principle of unavoidability still holds.
For this project, I found it most effective to tackle this problem from three main angles.
First, make all project context easily accessible. I deliberately intervened early in the code repo layout to ensure upfront documentation (project brief, design documents, etc.) as well as ongoing project documentation (API, database, etc.) were co-located and easily findable. I also gave the agent access to key project data not kept in its workspace (production application logs, GitHub, etc.).
Second, encode your intent in automated tests. People often overlook that you can easily verify the wrong things in a verifiable domain. Having tests that accurately reflect your intent as clearly as your documentation does makes it substantially easier for
your agents to implement your desires.
Finally, embed your processes in a repeatable mechanism. Don’t try to remember off the top of your head how to instruct a model to do something properly - ask the model to encode it in a Skill (for me) to ensure it always knows how to do what you mean.
Principle 3 - Verify, remove yourself from the process, then give autonomy:
As previously mentioned, my track record with asking a model to do something without engaging with the details is far from glorious. In the early stages of a project especially, you must verify before you trust.
In my Mandarin project, I did this by being deeply engaged in the details of things I knew would be hard to change later.
Early mistakes in database design, codebase layout and API design tend to metastasize as a project grows in complexity, so I put most of my effort here upfront.
Once the model had a solid foundation to build on, my reviews became a bottleneck without adding much value. So I moved my focus onto other areas like manually testing features before merging.
Once I was confident the app was looking how I wanted and my reviews became the bottleneck again, I started leveraging Claude Desktop’s preview tool to let the model self-verify its changes before merging.
After two months of iterating on the process I have now delegated end to end deployment to the agent. If I spot a bug or something I want changed, I write an ask to Claude,
I trust it to work independently for minutes to hours without my intervention and deploy to production.
Conclusion:
I don’t want people to read some article about a non-engineer building things in an afternoon that would have taken a team of experienced software devs months to build and conclude that they can stop developing themselves and just ride the exponential.
This whole process taught me what you want does not come out of thoughtlessness, it requires deliberate effort to truly understand and articulate what you want and how it should be built.
Getting AI systems to make beautiful, useful, and meaningful things we really want still requires us to invest ourselves in that process, and I don’t believe that is going away.


