I like alternate-history questions, because they force you to see the path you did not take. I also like them because they make people stop arguing for a minute and actually think.
Here is my favourite one right now. What would have happened if AI had first been built from a world-model viewpoint, and only later moved toward language-model technology?
I think we would be living in a noticeably different place, and I think some things would be much better than they are today.
I am not saying LLMs are bad. I am saying the sequence matters, and I think we got the sequence backwards.
What I mean by "world-model first"
A world model is the idea that the system is learning reality, not just learning words. It is learning how the world behaves over time, and it is learning cause and effect, and it is learning what happens next.
It learns those things by observing, predicting, being wrong, and correcting itself.
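If you like seeing a loop instead of reading about one, here is a tiny, made-up sketch of that observe-predict-correct cycle. Nothing in it is a real world model: the "world" is one line of toy physics, the "model" is a single learned number, and every name is invented for illustration.

```python
import random

random.seed(0)

# Toy "world": next altitude = altitude + 0.1 * throttle, plus a little noise.
def world_step(altitude, throttle):
    return altitude + 0.1 * throttle + random.gauss(0, 0.01)

# Toy "world model": a single learned coefficient for how throttle moves altitude.
learned_gain = 0.0
altitude = 0.0

for step in range(2000):
    throttle = random.uniform(-1, 1)

    predicted = altitude + learned_gain * throttle   # predict what happens next
    observed = world_step(altitude, throttle)        # watch what actually happens
    error = observed - predicted                     # notice you were wrong

    learned_gain += 0.1 * error * throttle           # correct yourself a little
    altitude = observed

print(round(learned_gain, 3))  # drifts toward roughly 0.1, the world's real gain
```

That is the whole idea in miniature: the model never gets told the rule, it just keeps predicting, keeps being wrong, and keeps nudging itself until its predictions match reality.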
A world-model-first path would have pushed AI in the direction of simulation earlier, and simulation is where the real magic shows up.
What would have been different in the real world
If we had built world models earlier, I think our simulators would already feel like something from science fiction. I do not mean prettier graphics. I mean better behaviour and better realism.
I keep thinking about flight simulators and driving simulators, because those are places where a world model shines.
A flight simulator would not just render a world. It would behave like a world, because the model would be constantly predicting the next state and correcting itself.
- The cockpit would feel "alive" in the sense that tiny changes in control inputs would carry through the system in a realistic way, instead of feeling like a scripted response.
- Training would be dramatically better, because the model would be learning the dynamics and the edge cases, not just replaying an animation (a toy sketch of that difference follows this list).
- The feedback you feel in controls would be more meaningful, because the system would know what you are trying to do and how the aircraft is responding, instead of only translating your joystick movement into a pre-defined curve.
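To make the contrast concrete, here is a rough sketch. The scripted simulator maps stick input straight onto a fixed curve, so every frame gives the same canned answer; the world-model version steps a small dynamics model, so the response builds up and carries through over frames. The aircraft model, its numbers, and its names are all invented, not taken from any real simulator.

```python
import numpy as np

# -- scripted style: input -> pre-defined curve, nothing carries over ---------
def scripted_pitch(stick_pull):
    return 10.0 * stick_pull           # fixed lookup, same answer every frame

# -- world-model style: state -> predicted next state --------------------------
class ToyAircraftModel:
    """Stand-in for a learned dynamics model (here just fixed linear dynamics)."""
    A = np.array([[1.0, 0.1],          # pitch keeps its value and integrates rate
                  [0.0, 0.9]])         # pitch rate decays a little each frame
    B = np.array([0.0, 2.0])           # stick input drives pitch rate, not pitch

    def predict_next(self, state, stick_pull):
        return self.A @ state + self.B * stick_pull

model = ToyAircraftModel()
state = np.array([0.0, 0.0])           # [pitch, pitch_rate]

for frame in range(5):
    stick_pull = 0.3
    state = model.predict_next(state, stick_pull)
    print(f"frame {frame}: scripted={scripted_pitch(stick_pull):.1f}  "
          f"modelled pitch={state[0]:.2f}")
```

The scripted column never changes. The modelled column keeps evolving, because the input is feeding a state that the model carries forward, and that is the "alive" feeling I am talking about.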
I also think consumer VR would have evolved differently, because the headset would not just be a screen on your face. It would be part of a loop between you and a model that is trying to keep the world consistent.
Why we got chat-first instead
We did not get world models first for a simple reason. Language was easier to scale, and the data was already lying around in piles.
We had the internet, we had books, we had forums, and we had decades of text. We also had cheap compute relative to what it costs to instrument the physical world.
So we built systems that predict the next token, because that was the fastest path to something that looked impressive and could ship.
That decision was rational, and it still produced useful things.
It also created a kind of confusion, because it made people think that talking is the same thing as understanding.
LeCun, the "LLMs are dead" quote, and what I think he actually means
Yann LeCun has been saying for a while that LLMs alone do not get you to the kind of intelligence humans have. He keeps pointing back toward world models, prediction, planning, and learning how the world behaves.
That position gets turned into a slogan online, because the internet loves slogans.
So people keep repeating some version of "LeCun says LLMs are dead."
I am skeptical that he literally said it in that clean, clickbait form, and I also do not think he means it as a funeral announcement even if he did say something sharp.
I think he is using a battle cry to make a big point. I think he is trying to steer the field away from "only scale the language model and call it intelligence."
I also like LeCun, and I think he has been unfairly treated by people who want a simple villain or a simple winner.
Why I do not buy the "world models will kill LLMs" talk
I do not think world models kill LLMs. I think world models evolve LLMs.
I think the future is a hybrid whether anyone likes it or not.
- The language model becomes the interface layer, because humans live in language and we coordinate through language.
- The world model becomes the reality layer, because planning and prediction depend on modelling consequences in a stable way.
- The product becomes the combination, because humans want both understanding and communication in one place. I sketch what that combination could look like right after this list.
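Here is a rough, entirely faked sketch of that hybrid. The "LLM" is a stub that turns words into a goal and turns a plan back into words; the "world model" is a stub that rolls candidate plans forward in imagination and scores the predicted outcome. No real model, library, or API appears anywhere, and every function name is hypothetical.

```python
def language_to_goal(request: str) -> dict:
    # Interface layer: humans speak in language; turn it into a structured goal.
    return {"target_temperature": 21.0} if "warm" in request else {"target_temperature": 18.0}

def world_model_rollout(plan: list[float], hours: int = 3) -> float:
    # Reality layer: predict the consequence of a plan (heater settings per hour)
    # with toy thermal dynamics, and return the final room temperature.
    temperature = 16.0
    for setting in plan[:hours]:
        temperature += 0.8 * setting - 0.5   # heating minus heat loss
    return temperature

def plan_with_world_model(goal: dict) -> list[float]:
    # Try a few candidate plans in imagination, keep the one whose predicted
    # outcome lands closest to the goal.
    candidates = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 1, 0]]
    return min(candidates,
               key=lambda p: abs(world_model_rollout(p) - goal["target_temperature"]))

def plan_to_language(plan: list[float], goal: dict) -> str:
    # Back to the interface layer: report the plan in words.
    return (f"I'll run the heater at settings {plan} to reach about "
            f"{goal['target_temperature']}°C.")

goal = language_to_goal("make the room warm by tonight")
plan = plan_with_world_model(goal)
print(plan_to_language(plan, goal))
```

The shape is the point, not the thermostat: language in, consequences imagined in a model of the world, language back out.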
I also think we should stop pretending that humans think like tensors. Humans do not think like tensors. Humans think in stories, and plans, and promises, and fears, and contracts, and jokes, and regret.
Language is not optional.
My prediction in one sentence
My prediction is that world models and LLMs will get together and have a baby, and that baby will be the thing we end up calling "real AI" in five or ten years.
That baby will not replace language. It will make language less brittle, because it will be grounded in a model of reality.
Where I want to end this piece
I want to end this first article with a hook, because I think we are about to walk into the next missing piece.
I think there is a third leg to this stool, and it is not world models and it is not LLMs.
Next time, I want to take you into the third leg of the milking stool we call AI, and that third leg is linguistics.