“Attention Is All You Need” changed the trajectory of the “AI” industry.[1] It’s a paper title, of course, but it also unintentionally became a kind of mantra, one that told an entire generation of engineers that a single mechanism could unlock general intelligence at scale. I think about it often, not because of the transformer architecture itself, but because of the meta-lesson hiding in plain sight: the name tells you where to look, and that act of looking is the thing that actually matters.
So here’s my revision: perspective is all you need.
That’s not a claim about what AI is or where it’s going. It’s a claim about how to survive the noise of building with it. I’ve been an engineer for six years, starting from an internship at a fintech, and the single most useful thing I’ve developed in that time isn’t a model or a pipeline; it’s the ability to decide what to ignore.
There’s an easy way to chart my own evolution, and it maps neatly to three books I read across five years:
- 2021 — Machine Learning Design Patterns. Got it as a gift. Read it cover to cover.
- 2024 — Architecting Data & ML Platforms.[2] Picked up at a GenAI at Scale event in London, finished that same year.
- 2025–26 — AI Engineering (Chip Huyen).[3] Bought in 2025, finished last week.
These three books are, roughly, snapshots of where the industry was at each point, but more honestly, snapshots of where I was. In 2021 I was deep in patterns for serving, training, and maintaining traditional ML systems, feature stores, data pipelines, and the kind of engineering rigour where the model was one component in a much larger system. The industry was still firmly in “build the model, serve the model, monitor the model” territory, and so was I.
By 2024 the conversation had shifted. I picked up Architecting Data and Machine Learning Platforms at a GenAI conference, which tells you something about the moment, even the data platform crowd were pivoting their framing toward generative use cases. That book grounded me in something I’ve cared about since my comp sci dissertation in 2019: what does it actually look like to train a model on streaming data? The section on streaming architectures for ML crystallised an idea I’d been circling for years, that the most interesting problems in ML infrastructure are temporal, not just volumetric. Not how much data you have, but how fresh it is, and whether your system can learn from it in motion.
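To make the “learning in motion” idea concrete, here is a minimal sketch of online learning: a linear model updated one event at a time with stochastic gradient descent, so the system never holds the full dataset at rest. The feature values, learning rate, and update rule are illustrative, not a production streaming architecture.

```python
# Toy online learner: each stream event nudges the weights a little,
# so the model stays fresh without batch retraining.

def sgd_update(weights, bias, x, y, lr=0.01):
    """One online update: predict, measure the error, adjust the weights."""
    pred = sum(w * xi for w, xi in zip(weights, x)) + bias
    err = pred - y
    weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    bias = bias - lr * err
    return weights, bias

# Simulate a stream of (features, label) events arriving over time.
stream = [([1.0, 2.0], 5.0), ([2.0, 0.5], 3.0), ([0.5, 1.5], 3.5)] * 200

weights, bias = [0.0, 0.0], 0.0
for x, y in stream:
    weights, bias = sgd_update(weights, bias, x, y)
```

The point of the sketch is the shape of the loop, not the model: freshness comes from the fact that every event updates state immediately, which is exactly the temporal property batch pipelines lack.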
And then Huyen’s AI Engineering. The shift this book captures is real: we are no longer in the business of training models from scratch for every problem. The model-as-a-service approach means engineering now starts with product, not data collection. Prompt engineering, RAG, fine-tuning, evaluation, agents, the stack has changed, and the skills that matter have changed with it. Reading it in 2026 felt like having the last two years of my professional work explained back to me in cleaner language than I’d have managed myself.
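The retrieve-then-prompt loop at the heart of that new stack can be sketched in a few lines. This is a toy: real systems use embedding search and an actual LLM call, and the `DOCS` corpus, keyword-overlap scoring, and prompt template below are all stand-ins I’ve invented for illustration.

```python
# Toy RAG loop: retrieve relevant text, then stuff it into the prompt
# an LLM would receive. Keyword overlap stands in for vector search.

DOCS = [
    "Feature stores keep training and serving features consistent.",
    "RAG grounds model answers in retrieved documents.",
    "Gradient boosting excels on tabular classification problems.",
]

def retrieve(query, docs, k=1):
    """Rank docs by naive word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, docs):
    """Assemble the context-stuffed prompt that would go to the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("what does RAG do", DOCS)
```

Notice what the sketch starts from: a product question and an off-the-shelf model, not a training set. That inversion is the shift Huyen’s book documents.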
What AI means now
Let me be direct about something that I think gets muddled constantly in casual conversation: AI, as most people now use the term, means large language models. That’s the public understanding. But if your job title has “ML” or “AI” in it, you know the field is considerably wider and considerably older than that. Traditional ML systems: recommendation engines, fraud detection, demand forecasting, classification pipelines. These aren’t legacy; they’re production. They run at scale, they have well-understood failure modes, and they solve problems that LLMs are either poorly suited for or absurdly expensive to throw at.
The casual understanding of AI and the practitioner’s understanding of AI are now two different conversations, and the gap between them is where a lot of bad decisions get made. I’ve seen business use cases reach for an LLM when what they needed was a gradient-boosted tree and a decent feature pipeline. The tooling hype makes that mistake easy to make.
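A back-of-envelope calculation shows why reaching for an LLM on a high-volume classification job can be the wrong call. Every price and token count below is an assumption I’ve made up for illustration, not a quote from any provider.

```python
# Rough cost comparison: LLM-per-call pricing vs an amortised
# gradient-boosted model on CPU. All figures are illustrative.

LLM_PRICE_PER_1K_TOKENS = 0.002   # assumed blended input+output price, USD
TOKENS_PER_CALL = 500             # assumed prompt + completion size
TREE_COST_PER_1M_PREDS = 0.50     # assumed amortised CPU cost for a GBM

def llm_cost(n_predictions):
    return n_predictions * TOKENS_PER_CALL / 1000 * LLM_PRICE_PER_1K_TOKENS

def tree_cost(n_predictions):
    return n_predictions / 1_000_000 * TREE_COST_PER_1M_PREDS

daily_volume = 10_000_000  # e.g. a hypothetical fraud-scoring pipeline

llm_daily = llm_cost(daily_volume)    # 10,000.0 USD per day
tree_daily = tree_cost(daily_volume)  # 5.0 USD per day
```

Three to four orders of magnitude is not a rounding error, and that’s before latency budgets enter the picture. The numbers are made up; the shape of the gap is not.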
There’s also the question of scale. Not “scale” in the VC pitch sense, but the operational reality of running AI systems in production: the monitoring, the latency budgets, the cost per inference, the drift. My own trajectory: three years of ML and NLP at scale, two years of LLMs at scale, six years of engineering total. The through-line is that the engineering discipline doesn’t really change. The model changes. The interface changes. The evaluation gets harder. But as a friend of mine, Samson Nwizugbe, says: it’s all input, output, and the in-between.
On MCPs, and the discourse
While writing this, I saw a tweet from @levelsio: “Thank god MCP is dead.” His argument: MCPs are unnecessary abstraction, and AI can just use APIs directly.[4] In the same thread, @garrytan, CEO of Y Combinator, chimed in: context-window bloat, flaky auth, and a vibe-coded CLI wrapper, built in thirty minutes, that worked better.[5]
And you know, perspective. I haven’t built an MCP server yet in my free time. But have I got agents set up? Yes. Have they been useful? Also yes. “MCP is dead” versus “MCP is the future” is exactly the kind of binary that collapses when you ask: dead for whom? In what context? At what scale?
Perplexity’s CTO reportedly said they’re moving away from MCPs internally in favour of APIs and CLIs. That’s a reasonable architectural choice for a team at that scale with that stack. Meanwhile, Uber is apparently running an internal MCP gateway because at their scale, standardised connectors with auth, telemetry, and guardrails make more sense than a pile of bespoke glue. Both of these can be true at the same time.
Prioritise your use case, not your tooling.
I’m grateful to be in a role that asks “this needs this, so let’s use this” rather than “let’s use this to do this.” That distinction matters more than any protocol debate on Twitter. Skip what doesn’t matter. Filter the noise your own way. What’s your perspective?
On LLMs being a dead end
Richard Sutton, father of reinforcement learning, 2024 Turing Award winner, author of The Bitter Lesson, went on the Dwarkesh Podcast and made a case that LLMs represent a dead end for general intelligence.[6] His argument is that LLMs are fundamentally imitation systems: they model what people say about the world, not the world itself. They can’t learn on the job. They have no goal that changes anything external. And he expects that systems capable of continuous experiential learning will eventually make the current paradigm obsolete.
It’s a strong position from someone who has earned the right to hold it. And it sits alongside a paper from 2021, “Big Data, Scarce Attention and Decision-Making Quality”, that frames a different but related tension: more information doesn’t automatically improve decisions when attention is the bottleneck.[7] The irony of an era defined by an attention mechanism built on top of more data than any human could process is not lost on me.
Whether LLMs are a dead end or a foundation depends on whether you’re asking from a research perspective or an engineering one. From where I sit, building products, shipping features, LLMs are extremely alive and extremely useful. But I don’t confuse usefulness with completeness. The map is not the territory, and the current paradigm is not the destination.
The through-line
In a bit of vanity, I’ll admit what’s driven most of my career decisions: I wanted to work on things that are hard and logical and technical. I started as a backend engineer. Moved into ML and data science. And eventually the desire to be closer to the “action” pushed me into a more forward-facing engineering role where I build AI products and put them in the hands of business teams and engineers. That sweet spot, where the technical depth meets the product surface, is where I’ve ended up, and it’s where I want to be.
My perspective now isn’t fundamentally different from what it was in 2019 when I was writing my dissertation about training models on streaming data. It’s just been sharpened by seven more years of building things that had to work. The books changed. The models changed. The conversations on Twitter changed weekly. But the core question has always been the same: given what I know and what I’ve built, what actually matters here?
Don’t take my word for it. Just my perspective.
References
- Vaswani, A., et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30, 2017.
- Tranquillin, M., Lakshmanan, V., & Tekiner, F. Architecting Data and Machine Learning Platforms. O’Reilly Media, 2023. See the streaming-for-ML section.
- Huyen, C. AI Engineering: Building Applications with Foundation Models. O’Reilly Media, 2025.
- @levelsio, X post on MCP, March 11, 2026.
- @garrytan, X post on MCP, March 11, 2026.
- Sutton, R. Interview on Dwarkesh Podcast: “Father of RL thinks LLMs are a dead end.” Spotify, September 2025.
- Yu, T. & Chen, S.H. “Big Data, Scarce Attention and Decision-Making Quality.” Computational Economics 57, 827–856, 2021.