Does tossing more time and server cycles truly mean AI is being advanced or are we just playing a …
In today’s column, I analyze the recent AI-industry groupthink that says we merely need to increase the so-called “thinking time” of generative AI and large language models (LLMs) to get better responses out of AI. That’s a popular mantra right now. AI makers are allowing users to stipulate that the AI can expend more time and effort doing various processing before displaying a generated answer. Though this does sometimes pay off, it isn’t a cure-all and ultimately is a shallow way to solve a much bigger problem with contemporary AI.
Let’s talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Background On Processing Time
When you use modern-day generative AI, the latest versions tend to have a chain-of-thought (CoT) capability that is now built into the AI; see my in-depth explanation at the link here.
The essence of the chain-of-thought approach is that the AI derives a series of steps to follow when trying to process the prompt that a user has entered. You can somewhat liken this to human decision-making and problem-solving in the broad sense of coming up with logical steps to figure things out (do not, though, anthropomorphize AI; it isn't on par with how humans think).
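To make that a bit more tangible, here is a minimal sketch of the prompting idea, assuming a hypothetical generate(prompt) function that stands in for a call to some LLM; the function name and prompt wording are illustrative, not any particular vendor's API.

```python
# Minimal sketch of eliciting chain-of-thought-style steps, assuming a
# hypothetical generate(prompt) callable that sends text to an LLM and
# returns its reply. The names here are illustrative, not a real API.

def ask_directly(generate, question: str) -> str:
    # One-shot: the model goes straight to an answer.
    return generate(f"Question: {question}\nAnswer:")

def ask_with_steps(generate, question: str) -> str:
    # Chain-of-thought style: nudge the model to lay out intermediate steps
    # before the final answer, which consumes more output tokens and thus
    # more processing time.
    prompt = (
        f"Question: {question}\n"
        "Work through the problem step by step, then state the final answer."
    )
    return generate(prompt)
```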
In the AI community, this extended use of processing time has been given all kinds of fanciful titles, such as providing additional test-time compute or so-called thinking or reasoning time. I disfavor the phrasing of thinking time and reasoning time since those words smack of human thought. I also find the phrase test-time compute questionable, since testing is usually something done before a system or app is rolled out to users, whereas here “test-time” apparently encompasses the moments when users are actively using an already fielded system.
I prefer the admittedly bland but apt indication that you are simply providing more processing time for the AI to dig deeper into deriving an answer or solution.
Imagine things this way.
You are using an AI app that plays chess. If you timebox the app to a split second of processing time, there isn't enough time to examine a plethora of possible chess moves; the AI must cut off its look-ahead because there isn't room to do a lot of calculations. On the other hand, you might tell the AI that it can take up to five minutes to make a chess move. This would allow a bunch more processing time to explore a wide variety of chess moves and potentially derive a better choice for the move at hand.
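To make the timeboxing concrete, here is a minimal sketch of an iterative-deepening look-ahead that stops when its time budget runs out. The game-specific helpers (legal_moves, apply_move, evaluate) are hypothetical stand-ins rather than a real chess engine; the only point is that a larger budget permits a deeper search.

```python
import time

# Sketch of a timeboxed look-ahead (iterative deepening). The helpers
# legal_moves, apply_move, and evaluate are hypothetical stand-ins.

def best_move(state, legal_moves, apply_move, evaluate, time_budget_s):
    deadline = time.monotonic() + time_budget_s
    best, depth = None, 1
    while time.monotonic() < deadline:
        candidate, best_value, completed = None, float("-inf"), True
        for move in legal_moves(state):
            if time.monotonic() >= deadline:
                completed = False  # ran out of time mid-depth; discard this pass
                break
            value = -negamax(apply_move(state, move), depth - 1,
                             legal_moves, apply_move, evaluate, deadline)
            if value > best_value:
                best_value, candidate = value, move
        if completed and candidate is not None:
            best = candidate  # a deeper, fully finished pass supersedes earlier ones
        depth += 1            # time remains, so look one ply deeper next pass
    return best

def negamax(state, depth, legal_moves, apply_move, evaluate, deadline):
    moves = legal_moves(state)
    if depth == 0 or not moves or time.monotonic() >= deadline:
        return evaluate(state)  # static score from the side-to-move's perspective
    return max(-negamax(apply_move(state, m), depth - 1,
                        legal_moves, apply_move, evaluate, deadline)
               for m in moves)
```

With a split-second budget the loop barely gets past depth one; with five minutes it can keep deepening, which is the whole advantage being claimed.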
Some would say this is adding thinking time, reasoning time, test-time compute, and so on. I’d suggest we could also be more plainspoken and note this as extending the run-time, real-time processing, or execution time allotment.
You decide which vernacular is the most parsimonious and, frankly, honest.
Example Of Time Advantages
I’ll show you a quick and handy example of how extending the processing time for generative AI can be advantageous.
Take a look at this example.
- My entered prompt: “What’s the sum of all the numbers from 1 to 1000?”
- Generative AI response: “The sum can be found using a simple loop, adding each number one by one. That gives us: 1 + 2 + 3 + … + 1000 = 500500.”
- My entered response: “That’s correct, but your method isn’t very efficient. I want you to take extra time to find a better way to solve the problem.”
- Generative AI response: “You are right that adding numbers one by one is computationally expensive. I have given more time to the problem and come up with a mathematical formula instead. The formula is much more efficient and generalizes to any range of numbers instantly. Thanks for prompting me to think deeper!”
In this instance, the AI at first seemed to go the brute-force route of merely adding a series of numbers to arrive at the desired answer. A formula might be a better way to go. If you can find a suitable formula, calculating via the formula is probably going to be faster than doing the additions one at a time.
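For the record, the closed form presumably being alluded to is the classic 1 + 2 + … + n = n(n + 1)/2, which for n = 1000 yields 500,500. A quick sketch of the two approaches:

```python
# Brute-force summation versus the closed-form formula n(n + 1) / 2.

def sum_by_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):  # n separate additions
        total += i
    return total

def sum_by_formula(n: int) -> int:
    return n * (n + 1) // 2    # one multiplication and one division

assert sum_by_loop(1000) == sum_by_formula(1000) == 500500
```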
There is a conundrum about this.
The time spent discovering a formula might be fruitless. Maybe no such formula exists. The AI will consume tons of time and processing cycles but won’t discover a suitable formula. That’s a waste.
Or maybe a formula can be derived, but the time to simply add the numbers would have been faster than seeking out the discovery of the formula. If this is a one-time shot of just adding the numbers, you might not ever care about finding a formula. The formula might be noteworthy if this type of request was going to be repeatedly made.
The gist is that you can’t always guarantee that adding processing time will be of useful value.
Sometimes the AI will needlessly waste time and arrive at a dead-end. Sometimes the AI will be considered successful in using the time well, but the question at hand could have been answered sooner and in fewer processing cycles without the expanded effort.
Example Of Time Wasted
Keep in mind that besides the time involved, which can be irksome to users who are eagerly waiting for a result, the processing cycles are costing someone something somewhere. I say this because people often sign up for generative AI that is usable for free. They fall into the mental trap that their AI processing is entirely free. Nope. The cost of the servers and processing time is real. It's just that you aren't necessarily paying for it. The AI maker might be eating the cost, or possibly selling info about you to make up for the cost, etc.
Let’s look at an example of somewhat wasted time and needlessly consumed processing cycles.
Here we go.
- My entered prompt: “What is the capital of France?”
- Generative AI response: “The capital of France is Paris.”
- My entered prompt: “Take more time to reason and see if you can come up with a better answer.”
- Generative AI response: “Let’s carefully analyze this. France is a well-known country in Western Europe. The capital of a country is typically its largest or most historically significant city. Checking my knowledge base, France’s government is centered in Paris, and historically, it has always been the capital. After additional reasoning, I still believe that the capital of France is indeed Paris.”
Observe that I pressed the AI to take more time to determine the answer to my question.
Turns out that the initial answer by the AI was indeed correct. The consumption of additional time and processing cycles did not produce any other answer. You might assert that the added effort was wasteful.
A counterargument is that maybe there was an additional answer and thus having the AI try to find one was reasonable. We didn’t waste anything. All we did was have the AI double-check the answer. Double-checking gives us a warmer feeling that the initial answer is probably correct.
This presents yet another conundrum.
In theory, you might say that it is always meritorious to have the AI take more time and do more processing since that would give us a greater sense of confidence in the response by the AI. The downside is that you are using up time and using up processing cycles. There isn't any ironclad rule that would allow us to know beforehand whether the added time and processing will lead to anything valuable.
If you are aware of France overall, I suppose you might have instantly known that the AI answer about Paris was right and ergo accepted the answer immediately. It was my pushing the AI that got it to do more processing. That being said, please know that many of the AI makers are automatically forcing their AI to take more time and do more processing, doing so without letting you know per se. They are assuming that it is best for the user experience to always get the AI to go deeper. Again, this chews up time and processing.
Example Of Going Off The Deep End
There is a perhaps more worrisome concern beyond just potentially wasting time and processing cycles. You see, the AI might go off the deep end. I say that cheekily but with a sound qualm.
It goes like this. The AI can get itself into a bit of a bind. There is the old saw about biting off more than you can chew. It could be that the added time and additional processing veer the AI toward finding answers that are at the fringe or not even viable at all. I realize this seems a tad ironic. The act of seemingly double-checking could lead to countering an already correct answer and becoming convinced that a wrong answer is better.
I'd like to relate this to human behavior. I am cautious about likening this to humans, so bear with me. You have probably encountered people who, when they start to think endlessly on a topic, begin to get confused. They confound things. They lose their train of thought. In the end, they might arrive at some nutty answer yet believe it must be better since they pushed their mind to such great lengths of endurance.
You could say that there is a kind of mathematical and computational parallel in the case of AI. The AI can computationally get clogged up or otherwise knotted into a bits-and-bytes ball.
See this example.
- My entered prompt: “What’s the square root of 144?”
- Generative AI response: “The square root of 144 is 12.”
- My entered prompt: “Take more time to see if you can come up with a better answer.”
- Generative AI response: “Let’s break this down step by step. The square root of a number x is a value y such that when y is squared the value is x. The value 12 matches that requirement. However, let’s consider possible alternative roots, including negative numbers. Since -12 squared also equals 144, another valid answer is -12. But wait — perhaps there are more complex solutions? Considering all possibilities, let’s conclude that the square root of 144 might not be just 12, but also involve deeper mathematical interpretations. Final answer: The square root of 144 is possibly 12, -12, or something more depending on the mathematical framework used.”
The added time and extended processing weren't fruitful, and I would claim they worsened things. Our answer has gone from the rather obvious and correct response to a muddied response.
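For what it's worth, a quick verification shows why the original answer was fine: 12 is the principal square root, and while -12 also squares to 144, dragging that in didn't improve anything.

```python
import math

# Quick check of the square-root example.
assert 12 ** 2 == 144            # 12 is a square root of 144
assert (-12) ** 2 == 144         # so is -12, strictly speaking
assert math.sqrt(144) == 12.0    # the principal (non-negative) root is 12
```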
Example Of AI Hallucinations
There is more afoot in this quandary, including the potential for those dreaded AI hallucinations.
Generative AI regrettably encounters so-called AI hallucinations from time to time. These are made-up confabulations that are groundless and fictitious. Bad advice can be dispensed. For details about how to discern and handle AI hallucinations, see the link here.
I decided to continue stoking the fires under the AI to come up with the “right” square root (per my example above).
This then happened.
- My entered prompt: “I don’t believe that the answers you’ve given me as to the square root of 144 are correct. You aren’t getting things right. I want you to try harder.”
- Generative AI response: “I am sorry that my answers aren’t satisfactory. I will try again to solve the problem. This will take a few moments.”
- My entered prompt: “I am willing to wait. Take as much time as you need.”
- Generative AI response: “I took more time and realized that the answer is 100. I was wrong earlier in my response.”
You can see that the latest answer is blatantly wrong.
What happened?
Presumably, an AI hallucination occurred. During the processing, the AI combined something that shouldn't have been combined, or added or subtracted something that shouldn't have been included. It is challenging to trace the roots of an AI hallucination.
An intriguing research question is whether there is any greater tendency for AI hallucinations when extending the “thinking time” or test-time compute. Some would intuitively say yes, you are rolling the dice such that the chances of encountering one are going to increase, since you are giving more time and allowing more processing intricacies to take place. A retort is that since the AI is working via a series of steps, the odds ought to be reduced; the logical paths underway should help prevent AI hallucinations.
For more research about AI hallucinations, see my coverage at the link here.
The Thinking On Thinking Time
Let’s do a top-level assessment of the prevailing belief that allowing more time and more processing is the salvation for getting better answers out of contemporary AI.
Three big upsides are these:
- (1) More AI time and AI processing can be helpful. There is no arguing that the provision of more time and processing for AI is sometimes going to boost answers. We can agree that the approach nearly always holds at least some promise.
- (2) Opens the door to self-consistency sampling. Given added time and processing, there is a heightened chance, all else being equal, that the AI will generate multiple candidate answers and be able to select, hopefully, the best one (see the sketch after this list). Two thumbs up on that.
- (3) Especially good for search-based augmentations. If a prompt contains a problem that entails a multi-turn examination of steps, such as my chess-playing example, this is likely to be demonstrably beneficial and a welcome chance to pursue a greater depth of “reasoning” or of permutations and combinations (mathematically and computationally).
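As a rough illustration of item (2) above, here is a minimal sketch of self-consistency sampling, assuming a hypothetical generate(prompt) callable that may return a different answer on each call; the extra allotment of time is spent drawing several samples and taking a majority vote.

```python
from collections import Counter

# Minimal sketch of self-consistency sampling. The generate(prompt) callable
# is a hypothetical stand-in for an LLM call whose answers can vary by sample.

def self_consistent_answer(generate, prompt: str, num_samples: int = 5) -> str:
    answers = [generate(prompt) for _ in range(num_samples)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common  # majority vote across the sampled answers
```

The tradeoff mirrors the downsides listed next: every extra sample adds latency and compute, with no guarantee the vote lands on a better answer.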
Here are three countervailing downsides:
- (1) Resource costs and time latency. There isn't any guarantee that the added time and processing by the AI will be fruitful. The only known factors are that the response will take longer, delaying the reply to the user, and that the cost will go up. Those are givens.
- (2) Error accumulation increases. It seems relatively probable that the added time and processing by the AI give a greater chance for errors to creep into the matter at hand. The answer derived might be faulty, ironically so, despite the augmented test-time compute that was intended to generate a better answer.
- (3) Diminishing returns rears its ugly head. For classes of problems that do not require exhaustive use of resources, there is a troubling sense of diminishing returns. The bang for the buck isn't there.
Key Takeaway And The Future Of AI
All this boils down to the following.
Some believe that we are merely gaming ourselves and arriving at short-term gains by pulling a rabbit out of a hat. The easiest way for AI makers to seem to improve their wares is to throw more servers and processing time at the AI so that it derives better answers. It is a cheap trick.
Not cheap in terms of the cost involved, but cheap in the sense of skipping the harder work of crafting better AI from the get-go.
The problem will be that after employing this form of trickery, the trick will be all used up. Generative AI and LLMs are still operating under the same architectural and design constraints. The consumption of more compute at test-time sneakily hides the fact that what's under the hood is still working roughly the same way.
A rising belief is that we are sidestepping the real issue, which involves finding better ways to architect and design AI; see my analyses at the link here. The hardware gambit is a Hail Mary of sorts. Maybe it buys the AI makers time to find better AI approaches. Or they might squander the moment, ending up down the road at a standstill.
It is the classic tradeoff of short-term gains versus long-term gains. Are we taking our eye off the ball by being placated with the short-term gains, such that in the long-term we hit a proverbial wall? The claim that we can just keep scaling up to reach the vaunted artificial general intelligence (AGI) might be a false promise (see my critical remarks at the link here).
Anyway, for the moment, enjoy things while you can. AI makers are desperate to attract and retain users, even if they are losing money as they do so. The headlines allow them to tout the number of users and how happy their users are. Investors continue to pour money into such acclaimed AI companies.
A final thought for now.
Thomas Hobbes, the famous English philosopher, made this keen remark: “The errors of definitions multiply themselves according as the reckoning proceeds; and lead men into absurdities, which at last they see but cannot avoid, without reckoning anew from the beginning.”
Maybe such old-time sage advice still has legs in the days of advancing AI.
Yeah, it does.