The Curious Case of the Human Context Window
AI is in a weird spot right now.
It’s really good, and able to do a LOT. But frankly, it’s not quite perfect. Even the best foundation models run up against clear, hard limitations.
In some cases, it’s obvious that those models can be fine-tuned further for much greater performance. For other tasks, the data simply doesn’t exist yet, and may not for some time.
So where does that leave us, as humans? We are stuck in the middle, between the tasks we need to do and intelligent machines that can only take in bite-sized chunks of context at a time.
For instance, with GPT-4, the standard context windows are roughly 4k, 8k, and 32k tokens, which works out to roughly 3,000, 6,000, and 24,000 words. Most people do not have access to the 32k context window.
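If you want to check that words-per-token ratio yourself, here’s a minimal sketch using OpenAI’s tiktoken library (the sample sentence and the ~0.75 words-per-token rule of thumb are just illustrative):

```python
# Rough sanity check of the token-to-word ratio for English prose.
# The exact ratio varies a lot with the text being encoded.
import tiktoken

text = "Migrating a codebase to a new framework is mostly a matter of context."

enc = tiktoken.encoding_for_model("gpt-4")  # cl100k_base under the hood
tokens = enc.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens "
      f"({len(words) / len(tokens):.2f} words per token)")
```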
These numbers might seem like enough for most tasks, and frankly they are. But the reality is there’s a ton of larger-scale work these models could handle end to end, if only they had a larger context window.
As a result, we’re stuck breaking our larger task into bite-sized pieces, feeding those pieces into the model one at a time, and synthesizing its responses back into the bigger job.
However, in these cases, we ourselves aren’t doing especially human work beyond evaluating the quality of the model’s responses. The reality is that we are acting as a biological higher-order context window, plugging pieces together in a larger puzzle.
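To make that concrete, here’s a rough sketch of the pattern in Python using the OpenAI client. The chunk size, model name, input file, and summarization prompt are all placeholder choices, not a recommended setup:

```python
# The "human as outer loop" pattern: split a large document into chunks
# that fit the model's window, process each chunk, and stitch the answers
# back together yourself.
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("gpt-4")

MAX_CHUNK_TOKENS = 3000  # leave headroom for the prompt and the reply

def chunk_by_tokens(text: str, limit: int = MAX_CHUNK_TOKENS):
    tokens = enc.encode(text)
    for i in range(0, len(tokens), limit):
        yield enc.decode(tokens[i:i + limit])

def process_chunk(chunk: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize the following section."},
            {"role": "user", "content": chunk},
        ],
    )
    return resp.choices[0].message.content

# The synthesis step is still on us: here it's just concatenation.
document = open("big_report.txt").read()
partial_answers = [process_chunk(c) for c in chunk_by_tokens(document)]
print("\n\n".join(partial_answers))
```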
I don’t think this situation will hold for long; it’s one of the top complaints among AI power users, and I suspect it will change soon.
For instance, let’s take the example of migrating a codebase to an entirely new language or framework. This is something that could take many months for a typical application or company.
AI generally makes this faster. For instance, you can go through your codebase file by file and migrate each file in one shot, so long as it isn’t more than a few hundred lines and the model is strongly familiar with both languages.
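Here’s a sketch of what that file-by-file loop might look like, assuming a hypothetical JavaScript-to-TypeScript migration. The directory names, prompt, and language pair are stand-ins, not a prescription:

```python
# File-by-file migration loop: each file must fit in a single context
# window for this to work at all.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

SRC_DIR = Path("legacy_app")    # hypothetical JavaScript codebase
DST_DIR = Path("migrated_app")  # its TypeScript counterpart

PROMPT = (
    "Translate the following JavaScript module to idiomatic TypeScript. "
    "Preserve behavior and exported names; return only code."
)

for src_file in SRC_DIR.rglob("*.js"):
    code = src_file.read_text()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": code},
        ],
    )
    out_file = DST_DIR / src_file.relative_to(SRC_DIR).with_suffix(".ts")
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(resp.choices[0].message.content)
```

Note that nothing in this loop knows about any other file, which is exactly where the context-window problem shows up.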
You can also migrate or restructure an entire database, but at the moment only for very small schemas, and it’s vital that you explain what the data means. Ideally, you’ll have already set up foreign key constraints so that when you dump your schema, GPT can infer the relations on its own.
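A minimal sketch of that schema hand-off, using SQLite purely for illustration (the database file and the prompt are assumptions):

```python
# Dump the CREATE TABLE statements (foreign keys included) and hand the
# whole schema to the model in one shot. Only works while the schema is
# small enough to fit in a single context window.
import sqlite3
from openai import OpenAI

client = OpenAI()

conn = sqlite3.connect("app.db")
rows = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
).fetchall()
schema = ";\n\n".join(sql for (sql,) in rows)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Given this schema, propose a restructured version and "
                    "the CREATE/ALTER statements needed to migrate to it."},
        {"role": "user", "content": schema},
    ],
)
print(resp.choices[0].message.content)
```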
However, we again hit the limit where the context window is the extreme constraint on how much can be done at once, while the human acts as an auxiliary RAM module for the model.
There have been some attempts to work around this. For instance, the gpt-migrate project uses prompts with nested subprompts so the user can spell out, in progressively finer detail, what should happen during a migration. It also suggests using GPT-4 with the 32k context window (which, again, very few people have access to).
The tool itself makes multiple context-window-sized calls to GPT-4 to carry out the full migration. The issue is that even with 32k tokens, important details get lost across calls. For instance, details like how a library gets used, or what its methods are actually named, get forgotten. Essentially, only a very large context window can take everything into “consideration” and act appropriately.
There are likely ways to hack around the existing windows: for instance, turning the models into agents and having them organize code file by file through additional API calls, each with an entire context window of its own (much like the way we use these tools ourselves). But even this scheme hits a limit at the highest-level context window.
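Here’s a rough sketch of how that two-level approach might look: one full call per file to extract a compact summary, then a top-level call that plans across the summaries. This is my own illustration, not gpt-migrate’s actual design, and the prompts and paths are assumptions:

```python
# "Window of windows": per-file summaries keep library usage and method
# names compact enough to fit into one top-level planning call.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def summarize_file(path: Path) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Summarize this file's public API: exported names, "
                        "signatures, and which external libraries it calls."},
            {"role": "user", "content": path.read_text()},
        ],
    )
    return f"## {path}\n{resp.choices[0].message.content}"

summaries = [summarize_file(p) for p in Path("legacy_app").rglob("*.js")]

# The top-level call sees only the summaries, so it still has a hard budget:
# once the summaries themselves overflow the window, the scheme breaks down.
plan = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Using these per-file API summaries, produce a migration "
                    "plan that keeps method names consistent across files."},
        {"role": "user", "content": "\n\n".join(summaries)},
    ],
)
print(plan.choices[0].message.content)
```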
Large language models like GPT aren’t the only place we see this happening. Until recently, the workflow for creating AI videos involved generating an image in MidJourney and uploading it to RunwayML, which would animate it into a 4-second clip. To extend the clip, the human would take its last frame and re-upload it to RunwayML to squeeze out 4 more seconds.
This only recently became a built-in feature, but it’s the type of thing that is becoming ubiquitous: AI becoming a core aspect of an entire workflow, while humans have to engineer their way around its limitations.
This moment is a little awkward for people. We’re in a phase just after we’ve figured out that AI can handle so many difficult cognitive tasks, but just before the phase where we have wide access to models that can remember and consider everything at or beyond a human level. Essentially, with GPT-4 we have an incredible intelligence that can only work in very short sprints.
It’s also worth remembering that humans adapt incredibly quickly. We call it the “hedonic treadmill”. We get used to incredibly luxurious things very quickly, to the point that they no longer amaze or shock us, and we start to expect more or get bored. One could almost say we were built for this exact moment of exponential technological progress.
But very soon, I’m sure many of us will have access to models with 128k+ token context windows and capabilities even greater than GPT-4’s.
At that point, we’ll really be off to the races with task automation and the sheer pace of scientific progress.