AGI is an Expanding Concentric Circle
There's a reason Leonardo da Vinci excelled at both art and engineering
It’s hard to look at all the AI capabilities developments of the past few years and not see a clear pattern of where we are and where we’re heading.
The two major patterns are:
We are constantly creating models for more modalities
Models become more intelligent and more generally capable as more modalities are trained natively into them
The conventional wisdom of the past was to build a custom model for each task.
For instance, if you want a model that recognizes handwriting, you build a model that does that. If you want something that can identify a bird from a picture, you’d build a model for that.
But in recent years, that has flipped. The idea has spread that for many AI use cases, the answer is to use GPT-4 or a similarly large model and rely on its general capabilities.
An example: Bloomberg spent what was likely hundreds of thousands of dollars training BloombergGPT, an LLM fine-tuned for finance on Bloomberg's extensive data. Yet recent testing showed that the original, general-purpose GPT-4 beats BloombergGPT on nearly all metrics.
Along this path, a further discovery has emerged: additional modalities cross-pollinate a model's capabilities. The announcement of GPT-4 Omni made this clear.
Omni has native image understanding, audio input, and speech output trained into the model itself. In earlier models such as GPT-4 Vision, separate image-recognition, audio-transcription, and text-to-speech models handled the conversions to and from text, feeding inputs to the core model and speaking its outputs aloud.
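The architectural difference can be sketched in a few lines of Python. Everything below is a hypothetical stand-in, not a real API; the point is only the shape of the two designs:

```python
# A minimal sketch of the two architectures. All function names here are
# illustrative stand-ins, not real model APIs.

def transcribe(audio: bytes) -> str:
    """Stand-in for a separate speech-to-text model."""
    return audio.decode()

def reason(text: str) -> str:
    """Stand-in for a text-only LLM such as GPT-4."""
    return f"reply to: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for a separate text-to-speech model."""
    return text.encode()

def pipeline_assistant(audio: bytes) -> bytes:
    """GPT-4 Vision era: three single-purpose models chained together.

    Every hop converts to or from text, so non-textual signal
    (tone, emotion, timing) is discarded at each boundary.
    """
    return synthesize(reason(transcribe(audio)))

def omni_assistant(audio: bytes) -> bytes:
    """Omni era: one model maps audio in directly to speech out.

    Modeled here as a single function call; the point is that there
    are no lossy text conversions between separate models.
    """
    return f"reply to: {audio.decode()}".encode()

print(pipeline_assistant(b"hello"))
```

The two functions return the same thing in this toy version; the difference that matters is where the boundaries sit, since each text boundary in the pipeline version is a place where information is thrown away.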
The multimodality seemed to improve each individual modality, and the model's general reasoning, so much that OpenAI was able to shrink the model to roughly half the size of GPT-4 Turbo while keeping quality roughly on par with it, even while serving the new modalities.
At the same time, at a recent Microsoft event, it was stated: “we are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute”.
In other words, the models themselves are showing no signs of slowing down in intelligence capabilities as they are being scaled up in size.
With this, it seems we actually have a clear and simple path forward toward AGI:
Keep adding additional modalities (3D design, video, music, physics intuition, etc.)
Keep expanding the model size.
Improve data quality and refine data sources.
With each added modality, we can expect cross-pollinated improvements across the other modalities. At the same time, each modality is a multiplier on capabilities across the board, with huge impacts on all forms of tasks across industries.
In a way, it makes sense that AGI might resemble a jack-of-all-trades generalist. Think of some of the smartest people you know, or the famous "Renaissance Men" of history: they were, and are, often talented across a wide range of domains. Intelligence in one domain has a way of transferring to others.
In essence, human intelligence is the product of nearly 4 billion years of evolutionary refinement, and it is by far the most general intelligence we know of. It would make sense for the final implementation of AGI to trend toward the same generalist pattern of intelligence we see within our own species.