Where are GPT and friends going?

A post today, because this is the easiest way to share three links with people, links I think are very useful for understanding the future of Generative AI / Large Language Models such as ChatGPT, GPT, Bard, DALL-E, etc. The noise level surrounding Generative AI (GAI) is unbelievable, and many, many false assumptions and conclusions are circulating.

Before I give you the three links, let me address what is probably the biggest misunderstanding about these models. Because these models produce answers that look like they come from understanding, we assume that they do. We cannot do otherwise; it is in our nature to draw that conclusion. So, people generally assume that the information they get in a Generative AI answer comes from some sort of storage in the back that contains that information. But this is not the case.

These models are generators. They generate a stream of words — or a stream of pixels in the case of image generators, but for the example here we are going to focus on text. Generative AIs generate these on the basis of a humongous collection of numbers that estimate how likely the next word(s) are to be part of a good reply. Example: that such a model can give you travel instructions to go from A to B is not based on it having the timetables of the various modes of transportation in a database underneath. In short:

A Generative AI has no access to the actual information it has been trained with when it produces the reply.

This is the insight many are missing (including most hype-peddling commentators).

One other reason this misunderstanding is so widespread is that the word 'learning' is used (as in 'machine learning'). This evokes our own learning, which is not necessarily the same as calculating 'word order likelihoods'. There are neurons with weights in our heads, sure, but these probably do not store likelihoods of word orders. In other words: our brains store understanding and we produce a stream of words from understanding (including understanding what we do not understand — something a Generative AI cannot do unless it is explicitly part of a guardrail, such as when an OpenAI answer starts with "As an AI language model, I cannot/am unable"), but a Generative AI produces a stream of words from 'the likelihood of them being in a good order'. The 'understanding assumption' is something we humans put on that result, because that is the way we work.
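To make 'word order likelihoods' concrete, here is a deliberately toy sketch. It is emphatically not how GPT works internally (real models use neural networks over subword tokens, not counted word tables), but it illustrates the principle: the model stores only which words tend to follow which, and generates a fluent-looking stream from that, without any retrievable store of the facts the training text expressed.

```python
import random
from collections import defaultdict, Counter

# Toy illustration only: a bigram "language model" that keeps nothing
# but next-word frequencies. The sentences below stand in for a corpus.
corpus = (
    "the model generates the next word based on likelihood . "
    "the model has no access to the original information . "
    "the answer looks like understanding but is word statistics ."
).split()

# Count which words follow which: this is ALL the "knowledge" retained.
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def generate(start, n=8):
    """Emit a stream of words, each sampled by next-word likelihood."""
    word, out = start, [start]
    for _ in range(n):
        candidates = follows.get(word)
        if not candidates:
            break
        # Pick the next word weighted by how often it followed this one.
        word = random.choices(list(candidates), weights=candidates.values())[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))
```

The output reads roughly like the corpus, yet the model cannot "look up" anything: ask it what it knows about "the original information" and there is nothing there but a frequency table.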

This is also where Generative AI's often-mentioned 'hallucinations' come from. As there is no actual understanding in the model, it can produce 'likely word combinations' that are nonsense. Here is something I just asked ChatGPT:

Q: Can you tell me which three books Gerben Wierda has written on Enterprise Architecture?
ChatGPT: Yes, Gerben Wierda has written the following three books on Enterprise Architecture:
1. “Mastering ArchiMate – Edition III: A serious introduction to the ArchiMate® enterprise architecture modeling language” (2019)
2. “Chess and the Art of Enterprise Architecture” (2006)
3. “Enterprise Architecture at Work: Modelling, Communication and Analysis” (2012)
These books cover a range of topics related to enterprise architecture, including modeling languages, communication strategies, and analysis techniques. They are widely regarded as valuable resources for architects, designers, and other professionals working in the field of enterprise architecture.

I've written Mastering ArchiMate, but Edition III is from 2017 and Edition 3.1 is from 2021. I've written Chess and the Art of Enterprise Architecture, but it was published in 2015. These are minor errors, but the third book is not by me; it is by Marc Lankhorst. For good measure, I also asked ChatGPT about the three books by Marc Lankhorst, and I've added the result at the end of this post.

The example shows how ChatGPT tries to satisfy the demands of the prompt as well as it can. I ask for three, I get three, regardless of the fact that I have only written two. That is because the answer is based on the likelihood of words being in a good order, not on any information ChatGPT's model contains. It has no information about what it produces. Nada. Which is difficult for people to grasp. Even the term 'hallucinating' for when things go wrong is misleading. All Generative AI output is the same hallucinating at work.

Generative AIs are hallucinating, even when they are telling the truth.

Me. Now.

It just happens that many of the hallucinations are true, or 'good enough' to be usable. Note: ChatGPT itself does not 'agree' with this assessment, see below.

Which brings me to the links I would like to share with you.

The first is a blog post by Dave Karpf of George Washington University: On Generative AI and Satisficing. 'Satisficing' is a concept introduced by Herbert Simon; it combines 'satisfy' and 'suffice'. Dave has written from the perspective of 'what is the use case here?' or 'what is it good for?' before, e.g. this fine piece a while back. But here he lays out that for most of the things we use or do, the rule is not that it has to be perfect or 'the best'; it simply has to be 'good enough'. And that has consequences for Generative AI.

The second is an article by The Guardian's tech journalist Alex Hern: AI is feared to be apocalyptic or touted as world-changing – maybe it's neither. He makes a rather good point — also based on a leaked Google document, which I'll link to below — that there might be a lot of use cases in the category 'good enough', for which we need not spend billions of dollars to make the models marginally better than 'already quite good'. By the way, OpenAI is reported to have spent about half a billion — yes, with a b — dollars in 2022 to create and offer their GPT model.

Hern also links to the third article I wanted to point out: a document reported to have been leaked from within Google by an anonymous insider/researcher, arguing that Google, Microsoft/OpenAI, and Facebook/Meta have already lost out to the 'just good enough' Generative AI models out there: We Have No Moat, And Neither Does OpenAI.

I advise everyone interested in the future of Generative AI to read the articles behind those links.

My estimates for Generative AI as of today:

  • I think the 'very good' Generative AIs — such as OpenAI's GPT — will find use as productivity amplifiers for jobs like translation of business/technical/etc. text. The basic translation will already be so good that translators will not have to do the hard work; they will have to correct the results of a GAI. People will be scared of losing their jobs, but historically, what will (also?) happen is that we simply will translate ever more. The EU will be very happy. The dream of Eurotra finally coming true. [Update 11 May: I happened to need a back-translation from Japanese of something I had originally written in English and that had been translated into Japanese. I was really surprised that ChatGPT was really bad at it, having assumed it would be good. Google Translate turned out to be much, much better.]
  • We will see a lot of ‘small’ GAIs that ‘satisfice’ a need without having to be particularly reliable.
  • We won't see Artificial General Intelligence (category: no chance in hell), but we may see a large leap forward when these Generative AIs are combined with old-fashioned symbolic AIs (e.g. expert systems) — which will not be trivial, by the way.
  • Intellectual Property Rights battles may be won by the Generative AI firms. Because GAIs do not copy the information they use for 'training' into their models (the major misunderstanding out there), they do not contain it at all. As long as the company had a right to read the input, they may be safe from IPR issues, because what the GAI does is not true copying at all. Of course, judges could also allow the argument that if the effect of word statistics is close enough to copying, it legally is copying. Who knows what the legal minds are going to come up with; the fight will be extremely interesting. I'm getting popcorn.

And for good measure, here is the result of me prompting ChatGPT about the three books Marc has written:

Q: Forget the previous prompts. Can you tell me which three books Marc Lankhorst has written on Enterprise Architecture?
ChatGPT: Yes, Marc Lankhorst has written the following three books on Enterprise Architecture:
1. “Enterprise Architecture at Work: Modelling, Communication and Analysis” (2012)
2. “Agile Service Development: Combining Adaptive Methods and Flexible Solutions” (2012)
3. “Agile Service Development: Creating Value with Agile Service Strategy and Design” (2015)
These books cover a range of topics related to enterprise architecture and service development, including modeling, communication, analysis, and agile methodologies. They are widely regarded as valuable resources for architects, designers, and other professionals working in the field of enterprise architecture and service development.

Marc’s Enterprise Architecture at Work also has various editions, and a previous one is actually from 2012. He also has written Agile Service Development, and the subtitle of number 2 above is the real one. The third book is cobbled together from the title of the second and something that combines well but does not exist.

Asking for three when you know there are only two is of course a sure way to force a Generative AI into a 'faulty hallucination'. So I also asked ChatGPT "How many books on EA has Gerben Wierda written?". The answer was: 1. Mastering ArchiMate III (2019), thus missing EA Chess. I asked the same about Marc Lankhorst. Here ChatGPT came up with a list of three, missing number 2 above and hallucinating two more that, as far as I have been able to find out, do not exist. It also happily hallucinates nonexistent ISBN numbers, even for existing books.

Coming back to the insight that all replies by ChatGPT are hallucinations, as hallucinations and correct replies are generated in exactly the same way:

Q: Is there a fundamental difference in the generation of correct or hallucination replies by ChatGPT?
ChatGPT: Yes, there is a fundamental difference in the generation of correct and hallucination replies by ChatGPT.
ChatGPT is trained on large amounts of text data and is capable of generating responses based on patterns it has learned from that data. When asked a question or given a prompt, ChatGPT uses its internal algorithms to generate a response that is most likely to be coherent and relevant based on the patterns it has learned from its training data.
When generating correct responses, ChatGPT relies on its understanding of the context and meaning of the input it receives, as well as its ability to generate text that is grammatically correct and semantically coherent. In contrast, when generating hallucination responses, ChatGPT may generate text that is nonsensical or unrelated to the input it receives.
This distinction between correct and hallucination responses is important because correct responses are more likely to be helpful and informative, while hallucination responses are more likely to be confusing or misleading. Therefore, it is important to evaluate the quality of ChatGPT’s responses based on their coherence and relevance to the input it receives.

Well, at least we can say that ChatGPT understands itself about as poorly as humans do.

Image by Marcel Antonisse / Anefo, CC0, via Wikimedia Commons — the Compact Disc, a very typical example of an extremely successful 'good enough' compromise between conflicting requirements.
