My Bitter Lessons
January 7, 2026
In the summer of 2014, I had just finished CS 225 at UIUC. Convinced I would not find an internship, I chose to spend the summer as an exchange student at HKUST. The only technical course still open was an unpopular one: Machine Learning.
I was anxious.
Was I really going to "learn machines"?
I barely understood hardware. This felt like a mistake.
The online course description did not help. I understood almost nothing, but I went anyway. In the very first lecture, the instructor asked us to install a piece of software called Neuroph Studio, a crude Java application that looked more like a circuit diagram editor than anything I recognized as computer science. We were then sent to a plain-text webpage to download the MNIST dataset and train a neural network.
At that point, I nearly checked out. The interface already felt like electronics. Now the vocabulary was borrowed from biology too?
But then the training finished. The model actually recognized handwritten digits.
That moment stayed with me.
Up to that point, CS 225 had trained me to expect determinism. Data structures and algorithms are transparent. Every computation is explainable, every result traceable. You know why something works. Here, I pressed a button, and something opaque suddenly worked. I did not understand why. That was unsettling and fascinating.
Professor Kwok's course focused heavily on neural networks, to the point where I briefly believed that neural networks were machine learning. When I returned to UIUC, I started looking for ML-related work. One opportunity was at NCSA, working with researchers modeling agricultural yield using massive amounts of R code, formulas layered on formulas. I bought a copy of R for Machine Learning and tried random forests instead. I actually do not remember how well the model performed. At the time, machine learning was still a relatively niche field.
At CMU, I chose the NLP track, a two-year program with a research requirement. I had an internship offer to join Amazon's deep learning group, but program constraints made that impossible. Between 2016 and 2018, NLP was in a period of rapid transition, and I was deeply uncertain. I probably spent too much time thinking about "computational semantics", entities, large knowledge graphs, and the Chinese Room argument.
Meanwhile, many of my peers were unwavering believers in neural networks. Several of them later went on to found what are now tier-1 LLM companies. I admired them not just as builders, but as believers. They believed this approach would work, and they endured the long valley before the GPT moment. Before that, there had already been a chatbot wave. It mostly failed.
In the summer of 2018, I ran into a classmate on campus. He was working on language models for coding and was still figuring out how to scrape Stack Overflow for data. I remember thinking he was completely insane. We had not even solved natural language properly. How could code possibly work? Recently, he helped train Gemini.
Eventually, I found myself doing work I genuinely enjoyed, building machine learning and statistical models that emphasized interpretability. These models yielded insights. Their behavior could be measured. You could experiment, observe failure modes, iterate, or patch with rules when necessary. It felt grounded.
At the same time, I loved side projects. I gathered people to build tools, automate workflows, and write Slack bots. Coding was fun and social. In 2022, a colleague gave me early access to DALL-E. I tried it. The images were unimpressive. I created a red brick robot and used it as a Slack bot avatar. That was it. That bot still uses the same image today.
Then ChatGPT arrived.
I was stunned. I posted online that this would matter more than the internet. Yet through 2023 and most of 2024, AI was still framed as a chatbot, an assistant, or, if one wanted to sound ambitious, a copilot. Most applications treated AI as a feature: writing, summarization, translation. People enjoyed sharing screenshots of its failures.
In 2025, models began to reason. Agent-based development gained attention, although agents themselves had been studied much earlier. AI started writing code more seriously. Something shifted, but not as much as one might expect. People became more thoughtful about how to work with AI and how to design new experiences. Still, the mental model largely stayed the same.
The past year became the year of vibe coding, writing code alongside AI by feel.
Some people used AI very well.
Some resisted.
Some were still learning.
But most people did the same thing. We treated AI as a tool.
We piled things onto it: more context, more elaborate prompts, more rules, MCPs, constraints, workflows, templates. We tried to inject all our human cleverness into the system.
Looking back, my own approach to "AI writing code" was almost a catalog of anti-patterns.
The first mistake was asking AI to do things without clearly defining the goal, the success criteria, or how results would be evaluated.
The second mistake was defining the goal, then dumping a large pile of documents on it and telling it to "follow them."
The third mistake was interrupting constantly, changing inputs, switching directions, discarding partial solutions.
All three mistakes shared a single root cause. I refused to let go.
That changed over the New Year holiday.
I asked AI to do something I genuinely did not know how to do: GPU optimization. To set up agents, I read guides, skimmed forum posts, tried to absorb best practices, and burned tokens just to learn. Then I changed my approach.
I used files for planning.
I forced explicit reflection and logging.
I used multiple agents.
I defined the goal and the evaluation criteria.
I set up profiling.
Then I shut up.
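In spirit, the setup looked roughly like the sketch below. This is a hypothetical skeleton, not the harness I actually ran: the agent call and the profiler are stubbed out, and the file names are invented for illustration.

# Hypothetical sketch of the setup described above; the agent call and the profiler are stubs.
import json
from pathlib import Path

PLAN = Path("PLAN.md")           # shared planning file the agents read and update
LOG = Path("REFLECTIONS.jsonl")  # forced reflection log: what failed, why, what changed
GOAL = "Beat the baseline kernel latency reported by the profiler"

def run_agent(name: str, plan: str) -> dict:
    # Stub for one agent turn; a real version would call a coding agent here.
    return {"agent": name, "change": "...", "reflection": "..."}

def profile() -> float:
    # Stub for the objective metric, defined before any code is written.
    return 0.0

PLAN.write_text("# Goal\n" + GOAL + "\n\n# Plan\n(agents fill this in)\n")
for round_num in range(10):                      # let them iterate; do not interrupt
    for name in ("agent-a", "agent-b", "agent-c"):
        result = run_agent(name, PLAN.read_text())
        with LOG.open("a") as log:               # explicit reflection and logging
            log.write(json.dumps(result) + "\n")
    print(f"round {round_num}: metric = {profile()}")

The point is not the code. The point is that the goal, the metric, and the reflection log exist before the agents start, and that I stay out of the loop afterwards.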
What followed was an aha moment.
The agents failed.
They hit walls.
They reflected.
They documented mistakes.
They redesigned evaluations.
They searched for references.
They wrote tools for themselves.
They behaved like an actual engineering team.
In under an hour, the task was complete and, by the objective metric, correct.
And yet, I did not dare to use the result. I did not know what traps might exist. I did not know what I had failed to understand.
I deleted all the code.
I kept only two things: the lessons they extracted and the insights they produced.
Then I asked the AI, "Now, given what we learned, start from scratch, and teach me step by step."
Over the next hour, I watched agents build something even better than what might have taken me a year to build, and then teach it to me.
People often ask about AI's impact in terms of time saved. I cannot answer that. For me, the number is effectively infinite. I am not even doing the same things anymore.
We have not yet learned the more important lesson: the bitter lesson.
The term comes from Richard Sutton, a computer scientist and one of the founders of reinforcement learning. The idea is simple and uncomfortable. Over long time horizons, the methods that win are not those infused with human cleverness, handcrafted rules, or expert priors, but those that scale computation, data, and learning itself.
Again and again, history shows the same pattern. Human-designed heuristics work well in the short term, but general methods that learn and scale eventually dominate.
It is called bitter because it challenges our belief that human insight is indispensable. Sutton's conclusion, after decades of AI research, is blunt. Instead of encoding what we think we know, we should build systems that can learn, and let scale do the work.
During the New Year, vibe coding exploded. People discovered how powerful modern models could be when combined with agents and tricks. Yet many articles still framed humans as orchestrators, with agents as subordinates.
We still want to stand at the center of the stage.
We still want to believe AI is something we orchestrate, not something that acts.
A common piece of feedback I hear is, "This still doesn't work, because the AI doesn't know A, B, or C."
The statement is often correct. In my own work, I feel this frustration regularly. AI makes dumb mistakes. Even when I can see that AI could do something, making it work often requires massive structural changes that are hard to justify on ROI grounds.
But this is an old-world problem.
The new world does not evolve by tweaking old assumptions. When the power loom appeared, people did not debate whether workers could weave faster. They redesigned the factory.
I have some predictions that may sound unreasonable, but I am increasingly confident in one thing. Future work will not be about production itself, but about designing, building, and supporting agentic systems that produce.
Agents and AI should be treated as foundational productivity units, not tools. At minimum, agents should be first-class participants, on par with humans. Organizations should plan assuming model capabilities six months ahead. This is not hyperbole. It is a survival issue.
After so many bitter lessons, I am willing to predict what comes next.
Chat interfaces will be marginalized.
Proactive agents will rise.
The reason is simple. Reactive agents scale with human attention. Proactive agents scale with compute. Attention is not what they need (pun intended).
A wave of AI-native companies will emerge. They will not do sprint planning. Their leveling systems and organizational structures will look unfamiliar.
Concretely, a product manager wakes up to ten product proposals that agents debated overnight, each with data, rationale, and production-ready code. The PM selects one and explains why. The feature ships.
A data analyst gets a phone call from an agent. Yesterday's repayment data looks slightly off. The agent has hypotheses but low confidence. What should it check next? The analyst suggests verifying the decision system. After lunch, the agent calls again. It found a bug, fixed it, and learned from it.
A machine learning engineer watches a dashboard showing hundreds of experiments, offline and online, run by agents. Some read top-tier conference papers. Some implement. Some launch. Some do error analysis. Some engineer features.
As I write this, I still feel doubt. The workflows are complex. Systems are tangled. Technical debt is real.
Then I remind myself that we are not deploying agents into legacy work. We are building work around agents. This is closer than it seems. MIT is already exploring AI-optimized programming patterns. Software companies are designing agent identities. Programmers are rethinking how they write code. Code itself is compounding, and code is the foundation of productivity.
I hope this does not sound pessimistic. I find it exhilarating. If we embrace the shift, the upside is enormous.
I will end with a pseudo-philosophical note.
In my final year at CMU, a particular paper gained attention (Lilian has a great write-up on this), and my professor asked me to reproduce the experiments. The core insight, from the information bottleneck view of neural networks, is that training is fundamentally about forgetting irrelevant information while preserving what matters for prediction.
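For readers who want the formal version: in the standard information bottleneck formulation (the general objective, not anything specific to that paper's experiments), a representation T of the input X is learned for a target Y by minimizing

\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)

The first term pushes the representation to forget about X; the second rewards keeping whatever predicts Y; beta sets the trade-off between the two.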
In that sense, I understand AI as accumulated intelligence, a compression of human knowledge expressed through text, images, video, and sound over centuries. It is not AGI yet. It is an extraordinarily efficient compressor, and during decompression, it produces useful intelligence.
Over the past year, countless builders have worked on making that decompression useful (reasoning, context engineering, etc.). For the first time, intelligence is available and shared this effectively. I remain optimistic, as long as we stay hungry, stay foolish.