There isn’t really an analogy for the type of tool we are building with AI; not yet.
Some liken it to the invention of fire, or the creation of electricity, or the steam engine, or even the internet itself.
But it’s really so much more than that. Earlier innovations had a tendency to speed up the innovations after them. These tools were all accelerants for human civilization by removing some core drudgery that occupied our bodies and hands.
Fire begot steam, steam begot engines, engines begot electricity, electricity begot electronics, electronics begot modern computers.
The earlier innovations eased physical drudgery. Cooking food meant less food poisoning. Steam engines meant being able to move immense amounts of goods over vast distances.
And the creation of computers meant a new kind of burden was eased: The cognitive. But for the better part of the past century, only a very specific type of cognitive burden was increasingly eased.
It was things like the rote memorization and storage of knowledge, the calculation and counting of vast quantities of numbers, or the actual recording of memories and information through video or audio.
People didn’t really need to manually do a lot of these things anymore. And in a sense, they’d never be able to compete with a computer, if the software was coded correctly.
We had taught the microchips made of sand to count, and the implications of it were enormous, but nearly unimaginable at the time. Every app and website we know of today was built from the constraints of these chips, handcrafted from the logic that a programmer at each level of the stack had built.
Video editing apps, streaming services, data analytics, coding platforms, social media apps, communication tools, real-time document sharing, dating apps, real-estate apps, AirBnB, Uber, games — the list of tools built from this simple primitive of sand that can count seems endless, and it has totally transformed the nature of how we work, what we need to remember, and the way we think about things in general.
However, one thing has remained consistent through the past 50 years of revolution in personal computing: A notion that computers could never really be creative.
It was a notion that what “computers do” is robotic, and what people do is special, and can’t be automated.
If a task was viewed as too simple or robotic, a coder often could build some app or tool to automate it, and the people using that app outcompeted everyone who didn’t use it. Soon, no one was doing that robotic task manually anymore.
This has been a good thing, because the end result has been that most people’s workflows have been streamlined to the point that they get to mainly focus on the creative process, rather than the hiccups of technicalities.
But what is going to happen now when we teach sand to think, and not just count?
The past 70 or so years of the counting transistor will likely look like a blip in comparison to the era of the thinking transistor.
The important question to ask isn’t really “what happens when we automate people?” It’s more: “What happens when we can automate individual cognitive processes?”
It’s often emphasized that modern neural networks don’t specifically copy the brain in the way they work. But I think they have shown that they are able to emulate a vast swath of various cognitive capabilities within a biological mind.
Neural networks today are able to:
Memorize and learn
Recognize objects, activities, people, in pictures and videos
Understand, transcribe and translate audio
Synthesize a voice clone by listening to voice recordings
Generate a 3D mapping of a place by taking an array of pictures
“Imagine” any scene, image, 3D design or video from a text description
Understand complex requests in natural language and respond with intricate reasoning or complicated code
Piece together new ideas from research and data
Read lip movements and translate them into text
Create deepfakes by translating one person’s facial movements onto another’s
Read someone’s brain scans and generate images of what they are mentally visualizing
Any one of these things was considered almost impossible 15 years ago. The fact that all of them are possible illustrates a clear pattern of small cognitive processes being increasingly automated. These are processes that many people are only semi-cognizant of when actually performing them, and those who deny their current job could ever be automated probably just lack self-awareness of what they’re actually doing mentally throughout their workday.
But this still doesn’t answer where this will go.
We’re at a point where people are attempting to play catch-up by thinking within the old paradigm, but I think this will only go so far.
For instance, many apps and services today are building plugins for ChatGPT to enhance its capabilities. This will seem promising and amazing for those apps for a little while, because people will build incredible workflows using them and ChatGPT to automate major parts of their own work.
But I don’t think anyone can build a sustaining business around this long-term. These apps are useful temporarily for enhancing LLMs like ChatGPT. There is a race to have the most useful plugin, and a misguided hope that building these apps for ChatGPT will increase usage and revenue. But LLMs probably will not need to rely on them for long.
From where I stand, it seems likely that the AI giants will quickly cannibalize any app or plugin that proves very useful. In fact, ChatGPT was itself one of these exact kinds of apps.
On top of that, every time these models improve, an entire wave of startups that were working on those new capabilities from the improved model get wiped out, and this cycle repeats.
For instance, a few months ago a crop of startups popped up that would let you upload your PDFs and “chat” with them. These were usually powered by ChatGPT, or by OpenAI’s embeddings API with a ChatGPT layer on top of them.
Now with ChatGPT plugins, there are third-party PDF chatting plugins built right in. Many of the existing startups whose interface was their own website can expect to lose customers.
By the next model improvement, I imagine these PDF upload and chat capabilities will become a native feature of the model, and even the third-party plugin providers will themselves be wiped out.
In effect, building a business around trying to augment the capabilities of the model will be a losing game for most people. The most useful capabilities will be absorbed, and perhaps a handful of independent apps, not interesting enough to absorb but useful to some, can have success on their own. The rest will perish.
The thinking behind these model-app developments is based on the counting microchip as the primitive.
We need to reframe our approach to the future based on what a primitive of the thinking chip looks like. What does a world look like where cognitive capability is virtually limitless? And what are some of the hard limitations?
Much of cognitive work today consists of fine details. It is planning projects, refining the requirements, communicating these things clearly to workers capable of handling the implementation details, and then actually shipping the projects.
This applies across an incredibly wide spectrum of work: software, hardware, filmmaking, commercial production, ad campaigns, and general product creation.
In the near future, cognitive work will transform to skip the difficulties of hiring workers and handling the actual implementation. People will come to act as mini-CEOs, or directors of agents, with AI acting as a planning assistant and then taking the lead on implementation.
For instance, it’s relatively easy to imagine telling an AI “I want to build an app that is an Uber for Dogsleds”.
Before going ahead and implementing it, the AI would first ask you a series of questions on the implementation details.
“Who are the users?”
“Do we require a dogsled license verification process?”
“What are our standards for a clean dogsled?”
“What should the app do if the snow on the route melts?”
The list of questions would probably go on until you as the agent director, or the AI itself, decide that enough information has been provided to proceed confidently. Then the implementation could begin, and if it meets your standards, the result could also easily be deployed by the AI.
I think filmmakers will be in a similar boat very soon as well. They will write a script, and then have a cast of consistent AI generated characters that they can use to “star” in their movies. There’ll be wide usage of fake voices and fake characters, but the content of the film itself will be the deciding factor on whether or not a film succeeds.
In effect, the intermediate result of cognitive automation proliferation is that everyone gets to move higher up in the realm of thought they operate in. They sort of have to, but they also get to, and that’s a nice positive.
A scriptwriter may not have the resources to create a movie, but now they’ll get to act as the director, and video editor, and the special effects person.
Similarly, a businessperson may have the idea for an app, but will have little knowledge of coding or managing a software project. Suddenly, they’re able to manage a team of AI agents to fulfill their ideas down to the letter.
Basically, the key to the future is within figuring out how to get as much done with AI as possible. Build high-level flows, and learn to think at high levels — not only to learn how to direct the AI, but also so that you start thinking about ideas worth directing the AI to actually accomplish.
What we’ll start to see is that these tools eliminate the friction between an idea and an actual result. And I think kids today will be extremely adapted to this environment, because they’re full of ideas and imagination, but aren’t burdened down with thinking of logistics or the difficulties of the technicalities. They’ll think of a result they want to manifest, reach for an AI tool that can fulfill it, and make it happen without hesitation on their part.
What this probably means for us is that we should begin returning to a child-like state: Full of ideas, wonder, curiosity, big thinking, and a willingness to experiment.
The Hard Limitations
Let’s assume we solve the bottlenecks of chip manufacturing, and that the current array of AI capabilities becomes 100x faster with advancements in software and hardware, along with improvements in the capabilities of all models.
Someone could technically create infinite E-Book content. Infinite Podcast content. Infinite video content.
But people can’t consume infinite content or products. There are hard limits to what we can take in. We might get faster, sure. Maybe we’ll read thrice as fast, watch every video at 3x speed, and do the same with music and podcasts.
And even then, it wouldn’t fully be possible for everyone to consume everything. Similarly, it isn’t really possible for someone to buy every single thing they see. Even if everyone had effectively infinite money due to deflationary pressures and a universal basic income, each person still only has so much space in their home to keep their things.
However, these generative AI tools will give people a lot of room to experiment and see what sticks. There won’t be room for every single person to be a content creator for everyone else, but a few people will be quite adept and passionate with it and have viral staying power. Many people may become content creators entirely for their own consumption.
I also suspect that the entire way we interact with media and content in general will change.
We can expect many people to turn to AI for interactive personal tutoring on non-fiction topics and knowledge, since a tutor often beats a book in the learning process, especially for complex technical knowledge. As a result, non-fiction books and media will be consumed less, and the landscape of what people consume will probably shift toward casual, leisurely genres and topics. Effectively, most passive consumption will simply be for enjoyment.
Entering the Arena
I mentioned to a friend recently that it’s necessary to enter the arena at this time, and he asked what the arena even is at the moment. I had to think hard about this, and it’s actually what prompted this post.
From what I can tell, the arena is in planning big ideas freely, without the very adult constraints of getting hung up on the logistical requirements and difficulties for actually implementing each of them.
Tim Ferriss has a prompt for himself: “What would this look like if it were easy?”
I think this is a useful question to ask yourself. If you could fulfill any idea easily, which ones would you focus on? Increasingly, those ideas are probably the ones worth doing.
Why should you start to focus on those ideas now, even though the AI tools available aren’t 100% of the way there for building them to completion? It’s because they will be there, and when they are, it will be good to have as many visions as possible ready and waiting for them.
The other helpful way to be in the arena in this moment is to actually be out there, doing something in a field you care about. If you’re in an industry, you’re likely very aware of the problems on the frontiers of the field. If you’re working on any job or project right now, you’re actually operating in a goldmine of ideas, ripe for having many of its processes automated.
“But my job can’t be automated!”
Most intellectual jobs today involve solving a series of burdensome cognitive tasks. Many of these tasks require a large backdrop of job-specific knowledge that a modern AI could always have access to. Often, there is also a series of smaller cognitive tasks, repeated like a process, for handling and solving each one.
Realistically, as much as people may deny it, most of these smaller cognitive tasks could be automated. I think we’re already starting to see the early inklings of this with ChatGPT’s plugins feature.
In fact, OpenAI has cleverly tricked their paid users into training their AI to learn which plugins to use for a given prompt, because at the moment they require the user to select up to 3 plugins to use for a given chat.
It’s likely that users will only pick plugins relevant to the prompts they’re about to ask, so they are effectively providing a lot of free training data that will eventually allow ChatGPT to correctly associate any given prompt with the most useful plugin among a massive list of them and their capabilities.
I don’t think it has to stop there though. Businesses are highly incentivized to speed up their workers and automate the tasks that are slowing everything else down.
It seems possible that we will hit a point where workers for intense cognitive tasks will be able to wear a brain-scan helmet for a day while working. During that time, it will monitor all of the cognitive processes and facilities they’re using. At the same time, their activities on their computers could easily be recorded, and synced to the brain data down to the millisecond.
A model could then associate their cognitive processes to the most similar AI models, and build a worker model that could handle many job tasks in the same way.
So ultimately, it will be valuable to be thinking now about what you will be able to build and create, when many of the hard and slow parts requiring immense amounts of knowledge, research, and expertise are automated for you.
What if, as a chemist, you could generate a list of new compounds with specific behaviors and properties, and the formulaic process for synthesizing them?
Or as a hardware designer, generate a totally new sensor, with the chip design, onboard code, and housing for the components?
Or, as a filmmaker, create any scene, with any actor, with any voice, at virtually no cost?
The possibilities are (about to be) almost limitless. The exponential trajectory is here, and we can expect that all work is about to be pushed to the frontiers of all fields as old problems and bottlenecks are solved and removed.
It’s best to get prepared.
—Gabe