Generative AI Gets Shaken Up By Newly Announced Text-Producing Diffusion LLMs


In today’s column, I explore the exciting news that an alternative method to generative AI and large language models (LLMs) appears to be gaining interest and potentially provides some distinct advantages over conventional approaches. Here’s the deal in a nutshell. The usual path to devising generative AI consists of what is known as autoregressive LLMs, while the promising new avenue is referred to as diffusion LLMs (dLLMs).

Yes, indeed, dLLMs just might be a winner-winner chicken dinner. I will share with you how prevailing generative AI works and then introduce the diffusion approach. We don’t yet know whether diffusion will overtake autoregression, but there is a darned good chance that it will shake things up.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

How Things Usually Work

I’ll start us off with a brief indication of how generative AI customarily functions.

The conventional approach to devising generative AI and LLMs involves having the AI produce a word-by-word response to whatever prompt you enter. The parlance for this technique is that it is considered autoregressive. An autoregressive algorithm essentially tries to predict what word ought to occur next in the sentences being composed. This activity takes place when the AI is generating a response to your prompt.

When I mention the prediction of words, please know that inside the AI, there is a numeric representation of words. Those numeric indications are referred to as tokens.

The detailed process works this way. You enter a prompt. Those words are converted into tokens (numeric values). The AI uses the tokens to figure out what other tokens should be generated, doing so on a token-at-a-time basis. The step-by-step generated tokens are ultimately converted back into words when the answer or response is displayed.
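
To make the loop concrete, here is a tiny, self-contained Python sketch of the token-at-a-time idea. Everything in it is a stand-in that I made up for illustration: a real LLM uses a tokenizer and a massive neural network to predict the next token, whereas this toy just consults a hard-coded lookup table.

    # A toy sketch of autoregressive generation. The hard-coded lookup table
    # stands in for a real trained LLM, so the one-token-at-a-time loop is
    # easy to see end to end.
    NEXT_TOKEN = {
        "best": "policy,", "policy,": "as", "as": "the", "the": "saying",
        "saying": "goes.", "goes.": "<end>",
    }

    def generate(prompt, max_new_tokens=20):
        tokens = prompt.split()                    # stand-in for real tokenization
        for _ in range(max_new_tokens):
            next_token = NEXT_TOKEN.get(tokens[-1], "<end>")  # "predict" the next token
            if next_token == "<end>":              # stop when the model decides it is done
                break
            tokens.append(next_token)              # strictly one token after another
        return " ".join(tokens)

    print(generate("Honesty is the best"))
    # Prints: Honesty is the best policy, as the saying goes.

The point is simply that each new word depends only on the words produced so far, and the words arrive strictly in order.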

You can often see this happening in the sense that some of the generative AI apps will display the generated response on a word-at-a-time basis. It is almost as though someone or something is typing out the response, one word at a time. I’m not saying that this is always the case. There are other factors in play, such as the speed of your network access, the speed of the AI, etc.

For more about the nitty-gritty of AI processing techniques, tokenization, and other facets that occur within conventional generative AI, see my discussion at the link here.

Diffusion Being Used For Images And Video

Now that we’ve got autoregression on the table, keep that in the back of your mind since I want to bring up a different topic that will shortly tie into the autoregressive aspects. Hang in there.

Have you ever used AI that generates an image or a video?

I’m sure you have, or at least you’ve seen or heard about this capability. The customary approach to having AI generate an image or a video is using a technique known as diffusion.

I liken diffusion to how a sculptor works. It goes like this. A sculptor starts with a block of marble and carves away the bits and pieces that will help shape the marble toward the end goal in mind. They are removing whatever doesn’t belong. If the sculptor is making the shape of a person, they remove marble so that what is left will have the figure of a human.

You can starkly contrast the work of a sculptor to that of a painter. A painter starts with a blank canvas. They add paint to the canvas. Step by step, they are creating on canvas the image that they want to portray. Mindfully note how the two types of artisans differ. A sculptor is taking away bits and pieces, while a painter is adding bits and pieces.

Conventional generative AI acts like a painter. Words or tokens are assembled one at a time until the targeted response is fully crafted. You might say that words are being added to a blank canvas.

Diffusion works differently; it works more like a sculptor plying their craft.

How Diffusion Deals With Images And Video

I’m betting that you are curious about the mechanics of diffusion. I certainly hope so since that’s what we will get into next.

Suppose I want AI to generate an image of a cat. I first need to data-train the AI on what a cat looks like. Once the AI has been data-trained about what cats look like, you can tell the AI to produce an image or a video showcasing a cat.

Get yourself mentally ready for the inside secrets of how this is done. Find a quiet place to read this and grab a glass of fine wine.

Here’s what we can do to data-train AI on what cats look like. First, we find an existing picture or rendering of a cat. Next, we fog up that picture or rendering by putting a bunch of noise into the image. The cat is less recognizable now that we have clouded it with the static or noise.

The AI is fed the original clean version of the picture along with the second version that has the noise. At this juncture, the AI is supposed to figure out how to best remove the noise so that the clean version can be arrived at. The AI takes away from the clouded image the aspects that don’t belong there. It is denoising the static-filled version.

Voila, if this is done well, the AI arrives back at the pristine version of the cat.
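
If you like to see such things spelled out, here is a minimal Python sketch of that fogging-up step, using NumPy and a grid of random numbers as a stand-in for an actual cat photo. The blending formula and the noise levels are simplifications I chose for illustration; real diffusion models use a carefully designed schedule of noise levels.

    # A tiny sketch of the "fogging up" step: start from a clean picture and
    # progressively mix in random static. The 8x8 grid of numbers stands in
    # for the pixel values of a real cat photo.
    import numpy as np

    rng = np.random.default_rng(0)
    clean_cat = rng.random((8, 8))          # stand-in for a real cat picture
    noise_levels = [0.1, 0.4, 0.7, 1.0]     # a little static, then more and more

    noisy_versions = []
    for level in noise_levels:
        static = rng.standard_normal((8, 8))
        # Mostly cat at low noise levels, mostly static at high noise levels.
        noisy_versions.append((1 - level) * clean_cat + level * static)

    # The pairs (noisy version, clean_cat) are the training data: given the
    # clouded picture, the AI learns to recover the clean one.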

Here’s the interesting twist, and it’s a bit of a mind-bender. When you enter a prompt and ask diffusion-based AI to produce an image that looks like a cat, the AI first starts with a static-filled frame. It’s all static. The AI then removes as much static as necessary until the frame ends up showcasing a cat.

Say what?

Most people assume that AI starts with a blank canvas and tries to draw a cat. That would be how a painter would work. But the alternative method is to act like a sculptor. Start with a block of marble, or in this case a frame that’s utterly filled with static. Have the AI remove bits and pieces of static until what remains is the image of a cat.

That is how diffusion works in diffusion-based AI, and it is often described as a coarse-to-fine approach.

Boom, drop the mic.
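
For readers who want to see the moving parts, here is a drastically simplified Python sketch of the sculptor-style generation loop, assuming PyTorch is installed. It trains a tiny network to spot the static that was added to a stand-in “picture” and then starts from pure static and chips it away pass by pass. Real systems use huge networks, many distinct noise levels, and a proper noise schedule; this is a cartoon of the idea, not a faithful implementation.

    # Train a tiny denoiser on one stand-in "picture", then generate by
    # starting from pure static and repeatedly removing predicted static.
    import torch
    import torch.nn as nn

    clean_image = torch.rand(1, 64)          # stand-in for a real cat picture
    denoiser = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

    # Data-training: fog up the clean picture and learn to predict the static.
    for _ in range(500):
        static = torch.randn_like(clean_image)
        noisy = clean_image + static
        loss = nn.functional.mse_loss(denoiser(noisy), static)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Generation: begin with a frame of pure static and carve the static away
    # a little at a time (coarse-to-fine) until a picture remains.
    with torch.no_grad():
        frame = torch.randn(1, 64)           # all static, no cat yet
        for _ in range(50):
            frame = frame - 0.1 * denoiser(frame)   # remove a bit of the predicted static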

Diffusion On Text Is Revealing

So far, I’ve mentioned that AI diffusion involves first data-training AI on how to remove static or noise until a desired image is attained. Once we’ve done that, we can use the AI to conjure up new images by feeding a static-filled frame, and the AI will carve out or remove the static until the desired image is reached.

Believe it or not, the same approach can be applied to generating text.

How so?

Imagine that I enter a prompt that asks AI to tell me about Abraham Lincoln.

The conventional generative AI would generate a response by assembling words one at a time. The words being chosen are based on having previously scanned essays, stories, and the like about the life of Abraham Lincoln during the initial data-training of the AI. Patterns of those stories are stored within the AI. The AI taps into those patterns to produce a handy-dandy response about Honest Abe.

That’s how the conventional autoregressive approach would take place.

Here’s how a diffusion LLM works.

Just like the above, we will data-train the AI on essays, stories, and the like about the life of Abraham Lincoln. There is a twist. We not only scan that content; we also take the content and add static or noise to it. The noisy text looks quite garbled to the naked eye. Numerous letters of the alphabet have been shoved in here and there, and the words look jumbled.

The diffusion process takes the noisy version and tries to remove the static to get back to the original version. I trust that this seems familiar – it’s pretty much the same as what we did with the cat image.

Subsequently, when someone asks the diffusion LLM to share something about the life of Abraham Lincoln, we feed the AI a bunch of seemingly garbled text. It looks like pure nonsense to the human eye. The diffusion LLM removes the noise and transforms the block of garbled text into a sensible rendition about Abraham Lincoln.
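
Here is a toy Python sketch of that corrupt-and-recover setup for text. Real diffusion LLMs corrupt numeric tokens (often by masking or randomly replacing them) and train a large network to undo the damage; this little word-level example, with junk words I invented, just shows the shape of the training data.

    # Build (garbled text, clean text) training pairs at several noise levels.
    import random

    random.seed(0)
    clean = "abraham lincoln was the sixteenth president of the united states".split()

    def corrupt(words, noise_level):
        """Replace a fraction of the words with random junk (the 'static')."""
        junk = ["xqzl", "brfn", "ootp", "klst"]
        return [random.choice(junk) if random.random() < noise_level else w
                for w in words]

    training_pairs = [(corrupt(clean, level), clean) for level in (0.3, 0.6, 0.9)]
    for garbled, original in training_pairs:
        print(" ".join(garbled))
    # The diffusion LLM is trained so that, given any of the garbled versions,
    # it can reconstruct the original sentence.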

Nice.

Illustrative Example To Chew On

Allow me to provide a quick example that might solidify the two approaches, namely, comparing the conventional autoregressive approach versus the diffusion approach to LLMs and generative AI.

I will pose a question for AI that is one of my all-time favorite questions because it’s a question that my children used to ask me when they were very young. The question is: “Why is the sky blue?” Yep, that’s a classic question that I’m guessing most parents inevitably get from their curious-minded youngsters. It’s a beauty.

With any kind of generative AI, regardless of being autoregressive versus diffusion, the prompt and response might look like this:

  • My entered prompt: “Why is the sky blue?”
  • Generative AI response: “The sky is blue because sunlight scatters off air molecules and blue light scatters the most.”

I’d like to unveil the internal mechanics of the AI to show you how that answer was generated. I am going to simplify the mechanics for the sake of brevity. Any of you trolls out there who are chagrined at the simplification, consider reading the details that I’ve covered in prior columns such as at the link here and the link here, thanks. Technically, a diffusion approach entails a latent variable model that uses a fixed Markov chain over a considered latent space (see my discussion at the link here).
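
For the mathematically inclined, the standard way that the fixed Markov chain is written for the forward “noising” process looks like this, where x_0 is the clean data, x_1 through x_T are progressively noisier versions, and the beta_t values form the noise schedule:

    q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right),
    \qquad
    q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})

The data training then amounts to learning to reverse that chain, one small denoising step at a time.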

If the generative AI is based on an autoregression approach, it would convert the prompt into a series of tokens that represent the words “why”, “is”, “the”, “sky”, “blue”. Those tokens would be fed into the autoregression mechanism. Based on a pattern matching of prior scanned content, the AI would assemble a response, doing so a token at a time, and then convert those tokens into displayed words.

The tokens or words being generated would be like this: “The”, “sky”, “is”, “blue”, “because”, “sunlight”, “scatters”, “off”, “air”, “molecules”, “and”, “blue”, “light”, “scatters”, “the”, “most.” You can think of this as the AI painting a response by applying brush strokes, one at a time, to a blank canvas.

Diffusion Handles The Example

Next, we consider how a diffusion LLM would handle this query.

Assume that we have already done the data training for the diffusion LLM. At various junctures, the data training included content covering various aspects of why the sky is blue. That content was noised up by adding static, and the diffusion algorithm sought to turn it back into pristine text. Patterns were identified in how to do so.

Now shift into using this diffusion LLM.

We are given a prompt that asks why the sky is blue. This prompt is used as a seed to produce a bunch of garbled text. The text looks unintelligible to the human eye. The diffusion LLM will take that seemingly nonsensical text and remove the static and noise until a final result is generated.

It might go like this:

  • Initial seeded noisy text: “skbl isz blu soshie rdackis flousy bof nofair soleish pur sang otto movei angok dorf sulu blsk”
  • First pass: “Sky is blue soshie rdackis light flousy air molecules pur and blue light movei angok the most.”
  • Second pass: “Sky is blue because rdackis light scatters off air molecules pur and blue light scatters angok the most.”
  • Final pass: “The sky is blue because sunlight scatters off air molecules, and blue light scatters the most.”

You can see that the seeded noisy text didn’t look like much of an answer. The first pass turned some of the garbled text into something more usable. The second pass went further. The final pass got us to the final result.

The diffusion LLM carved away or removed static and noise until arriving at a final generated answer.
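
To make the multi-pass cleanup tangible, here is a toy Python sketch. Be forewarned that the “model” in this sketch literally knows the target sentence and simply repairs a fraction of the garbled words per pass; it is a stand-in for a real diffusion LLM that would predict those repairs from its data training.

    # A toy sketch of multi-pass denoising of text. The lookup of the target
    # sentence stands in for a trained diffusion LLM's predictions.
    import random

    random.seed(1)
    target = "the sky is blue because sunlight scatters off air molecules".split()
    noisy = "skbl isz blu soshie rdackis flousy bof nofair soleish pur".split()

    def denoise_pass(current, fraction=0.4):
        """Fix a random subset of the still-garbled word positions."""
        fixed = list(current)
        garbled_positions = [i for i, w in enumerate(fixed) if w != target[i]]
        chosen = random.sample(garbled_positions,
                               max(1, int(len(garbled_positions) * fraction)))
        for i in chosen:
            fixed[i] = target[i]
        return fixed

    current = noisy
    while current != target:
        current = denoise_pass(current)
        print(" ".join(current))        # each pass is a little less garbled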

Diffusion Can Be Fast

My example indicated that the diffusion LLM made several passes at taking out the noise. You might design and build diffusion in that manner. Another way to do so would be to have the diffusion happen essentially all at once. No need to do a series of passes.

Just provide the seeded garbled text and have the diffusion do the whole kit and caboodle all at once, like this:

  • Initial seeded noisy text: “skbl isz blu soshie rdackis flousy bof nofair soleish pur sang otto movei angok dorf sulu blsk”
  • Generated result: “The sky is blue because sunlight scatters off air molecules, and blue light scatters the most.”

It is pretty easy for diffusion to do things this way (referred to as one-and-done). The processing can happen in parallel and doesn’t have to proceed on a serial basis.

That’s one of the benefits of diffusion versus autoregression. It is a lot harder to parallelize autoregression. You generally are going to have autoregression generating each word, one word at a time. I’m not saying that it can’t be sped up; I’m only saying that doing so somewhat goes against the grain.
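
A quick back-of-the-envelope calculation shows why that matters. Suppose, purely hypothetically, that each model call takes roughly the same wall-clock time whether it predicts one next token (autoregression) or updates every position at once (diffusion), and that the diffusion LLM needs only a handful of passes:

    # Hypothetical numbers purely for illustration of the parallelism argument.
    response_length = 200      # tokens in the final answer
    diffusion_passes = 8       # refinement passes a diffusion LLM might use

    autoregressive_calls = response_length   # one sequential call per generated token
    diffusion_calls = diffusion_passes       # each pass updates all positions in parallel

    print(f"autoregressive: {autoregressive_calls} sequential model calls")
    print(f"diffusion:      {diffusion_calls} sequential model calls")

Under those assumptions, the diffusion LLM performs far fewer sequential steps, which is the crux of the claimed speed advantage; the real-world numbers depend heavily on the model and the hardware.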

Benefits Of Diffusion LLMs

There are various handy benefits from a diffusion LLM approach.

I already noted that the generated response can readily happen in parallel and thus be quite speedy. The response time to you, the user, will likely be faster. It is almost as though your response magically appears all at once, in a flash, rather than on a word-by-word basis.

Proponents of diffusion LLMs contend that another benefit is that coherence across large portions of text is more likely than with the autoregressive approach.

Here’s the deal on that claim. You might know that autoregressive models have tended to struggle with handling long-range dependencies in a large body of text. Fortunately, recent advances in generative AI based on autoregression have enabled larger and larger bodies of text to be handled, and ergo, this has gradually become less of a problem (see my analysis at the link here). Anyway, diffusion LLMs seem to handle this with ease (well, the research is still tentative, so don’t bank on this just yet).

Some also assert that diffusion LLMs will end up being more “creative” than autoregression-based generative AIs. Please know this is speculative. The logic for the claim goes like this. With autoregression, once a generated word is chosen, by and large, the AI stays loyal to that chosen word and won’t readily back up and opt to replace it with something else (all else being equal).

Think of this as a one-way street. You go forward; you can’t go back.

In theory, a diffusion LLM could rework a response while it is still being generated. You see, I had noted that the diffusion might proceed via a series of passes. Perhaps the AI lands on the word “atmosphere” in one pass and then opts in the next pass to change it to the word “troposphere.” Proponents would argue that you can adjust the diffusion toward a semblance of being more creative by allowing that kind of multi-pass alteration.
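
One concrete mechanism along those lines, used in some masked-diffusion text models, is to throw the lowest-confidence words back into the undecided pile between passes so they can be re-predicted. Here is a toy Python sketch of that idea; the words and confidence scores are ones I made up purely for illustration:

    # Between passes, re-mask the words the model is least sure about so a
    # later pass can replace them. All values below are invented for the example.
    draft = ["blue", "light", "scatters", "in", "the", "atmosphere"]
    confidence = [0.97, 0.95, 0.90, 0.85, 0.92, 0.55]   # model's per-word confidence

    keep_threshold = 0.6
    next_pass_input = [word if score >= keep_threshold else "[MASK]"
                       for word, score in zip(draft, confidence)]
    print(next_pass_input)
    # On the next pass, the model is free to fill "[MASK]" with a different
    # word, say "troposphere", which an autoregressive model would not revisit.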

We will need to wait and see.

A heated debate is underway about whether diffusion LLMs will be less costly, which proponents of diffusion suggest will be the case. This is a mixed bag. The initial data training is probably going to be higher in cost than a comparable autoregression approach. The cost saving is potentially at run-time, the so-called inference time when the AI is generating a response. If the underlying hardware allows for parallelism, it seems plausible that the generation process could be faster and less costly.

The cost aspects are hard to pin down since there are so many conflicting and confounding variables that come into the picture when determining costs for any kind of generative AI. A recent big-news story was about a conventional autoregression generative AI called R1 by the vendor DeepSeek that claimed to have dramatically reduced costs when producing their particular generative AI, though not everyone believes the cost claims per se (see my coverage at the link here).

Drawbacks Of Diffusion LLMs

Let’s consider the other side of the coin. The sky is not always blue, and we ought to recognize the chances of storms or overcast days. In other words, diffusion LLMs are not a silver bullet.

Do not let the emerging elation overtake your thoughtful judgment. On the one hand, it is absolutely refreshing to have an alternative to conventional groupthink on generative AI. I welcome it. We need to think outside the box if we want to make substantial added progress on AI (my remarks on this point are described at the link here).

That doesn’t mean that the best thing since sliced bread has miraculously been found.

Concerns about diffusion LLMs include that such models seem to be less interpretable than autoregressive ones. If you want to generate an explanation or reasoning associated with the response, right now that is harder to do than with conventional generative AI. Research is seeking to enhance that aspect.

Another qualm is that a diffusion LLM, which is non-deterministic just as autoregression is, seems to act in an even less deterministic way. That is presumably a plus for creativity. Meanwhile, it is a negative when it comes to controlling the AI and ascertaining its predictability.

There is more being bandied around:

  • Will this approach suffer from fewer AI hallucinations, the same amount, or more?
  • Do existing architectures that are geared toward autoregressive text-based LLMs need to be overhauled or devised anew to best accommodate diffusion LLMs?
  • We already know that with the use of diffusion for image and video generation, there are issues dealing with potential mode collapse. The AI will sometimes generate the same output repeatedly. Will that happen in a text-based generation mode for dLLMs?
  • And so on.

Diffusion LLMs Are Worth Watching

Part of the recent spark for caring about diffusion LLMs was the announcement by a company called Inception Labs regarding their product Mercury Coder, which uses a diffusion LLM approach. This got some outsized headlines within the AI field due to the novelty of the underlying approach.

Just about everyone in AI is always on the hunt for something new and interesting.

Some heavyweights in the AI field quickly remarked that the diffusion LLM approach overall is a welcome entrant into the competitive space concerning how to best devise generative AI and LLMs. I agree, as stated above.

The more, the merrier when it comes to innovating AI.

I definitely have my eye keenly focused on the advent of diffusion LLMs. That being said, they need some room to breathe, including being intensely sliced and diced. Let’s give them a hearty test run. AI researchers that I know are already venturing in this direction. I am anticipating some interesting results soon and will keep you posted.

As Albert Einstein famously said about the pursuit of innovation: “To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science.”

There’s no garbled message in that sage advice; it is clear as a bell.
