Yeah, I know, clickbait. The sex thing did happen by the way (and is quoted below), but that is not what this story is about, so if you’re someone looking for some weird satisfaction, this is not for you. And yes, Generative AI is creative in a way. Why be surprised? Generation is a form of creation. Anyway, let’s get to the heart of the matter.
A while back someone I know — let’s call him Sam — was in a two-hour meeting. It was an online meeting that was recorded for those who weren’t there. Now, this meeting was about a contentious subject. Let’s say that party A wanted something and wanted party B to agree, but party B had strong doubts about A’s arguments. So far, so good, such things happen, but then, several days later, one of the participants — but one not really deep into the subject of contention — distributed ‘the notes from the meeting’.
This might seem like a strange start to a short story about AI inventing a new language, but bear with me.
What was the case?
So, the notes distributed looked fine at first glance, but Sam (from ‘doubting’ party B) noticed two things. One, the overall notes were almost completely from the perspective of party A. That irked Sam, but what irked him even more was that a point he had made had been mangled into something he had not argued.
So, Sam addressed this and asked the one who had provided the notes: “Listen, these notes do not reflect the discussion well, and besides, the notes put an argument in my mouth that I never made.” The note-provider was shocked; they had been convinced at the time that the notes were pretty good. Besides, they said, the notes were a summary made directly from the transcript by Microsoft CoPilot, so how could these be wrong?
And Sam — being someone with a technical background — immediately noticed what had happened.
- First, as party A had to overcome the doubts of party B, they had spoken at much greater length than party B.
- Second, the point that Sam had made had been superseded by ‘general ideas’, not so much the argument he had actually made.
These are the two aspects of ‘summarising’ by GenAI that I described in When ChatGPT summarises, it actually does nothing of the kind. Two elements of GenAI influence the summary: the training materials as condensed into the parameters and the input given to the model. If the subject is well-represented in the training material, the parameters may ‘overwhelm’ the actual input to be summarised — effectively ignoring it and producing something from ‘general statistics’. If this is not the case, and the input is largely what the algorithm has to go on, then ‘summarising’ is more a form of ‘shortening’:
To summarise, you need to understand what the paper is saying. To shorten text, not so much. To truly summarise, you need to be able to detect that from 40 sentences, 35 are leading up to the 36th, 4 follow it with some additional remarks, but it is that 36th that is essential for the summary and that without that 36th, the content is lost.
From When ChatGPT summarises, it actually does nothing of the kind.
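To make the ‘shortening without understanding’ point a bit more concrete, here is a minimal sketch of my own (purely an illustration — this is not what CoPilot or any LLM actually does internally): a naive frequency-based extractive shortener in Python. It keeps the wordiest, most repetitive sentences and happily drops the short sentence that carries the actual decision, because nothing in it models what the text argues.

```python
# Minimal sketch of 'shortening without understanding' (illustrative only,
# not how CoPilot works): keep the sentences whose words are most frequent
# in the text, regardless of which sentence carries the actual point.

from collections import Counter
import re


def shorten(text: str, keep: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Score each sentence by the summed frequency of its words.
    scored = [
        (sum(freq[w] for w in re.findall(r"\w+", s.lower())), i, s)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, reverse=True)[:keep]
    # Restore original order so the output still reads as prose.
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


if __name__ == "__main__":
    text = (
        "The project team reviewed the migration plan in detail. "
        "The migration plan covers the database, the services and the reporting layer. "
        "The team discussed the database and the services at length. "
        "We will not migrate the reporting layer this year."
    )
    # The short decision sentence ("We will not migrate...") scores lowest
    # here and is dropped, even though it is the one that matters.
    print(shorten(text, keep=2))
```

A real LLM is of course far more sophisticated than this toy, but the underlying problem is the same: selecting and condensing text by statistics does not tell you which sentence the rest of the text was leading up to.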
The note taker initially asked “Can you not fix these errors?”, to which Sam replied: “Turning my job into fixing the hallucinations of Generative AI is not what I had in mind.” In the end, the note taker — being a reasonable person — accepted that they had to be more careful with using CoPilot in the future. They will, I guess, but many will not, and ‘cheap’ is probably going to wreak havoc, but I digress, as usual.
OK, so we already knew that. So why this story?
The account by Sam struck me. But it also dawned on me that something unnoticed is happening here. Because, I don’t know about you, but what we get from transcripts is often a lot of almost useless junk. Here is a pretty typical fragment I have personally taken from a recent transcript of a meeting:
Why was X dropped? Is serious sex and then looked at the option to since meter of and That is not a question I’m going to.
Fragment from a transcript of a recent meeting I attended. “Sex” wasn’t mentioned by the speaker, nor was it in any way related to the subject.
In my experience, the transcripts are mostly (somewhat, sometimes) useful to try to search for a part of the conversation you are interested in. Trying to read such transcripts is not like reading the language spoken. It is often mangled beyond repair. It has completely unfathomable sentences like the ones above, which are nonetheless made up of recognisable words. You have to actually listen to find out what was said.
But here is the thing: CoPilot was actually able to make a summary of this messy transcript. Which in my world is really impressive: in went gibberish and out came perfect language that — while not being a good summary, let alone notes — gave a convincing impression of being a good summary/notes, even to someone (with good intentions) who had been in the meeting (which is a key observation in itself, but one that requires a longer treatment of the aspect of all this almost nobody talks about: human intelligence). It is just that CoPilot — like any GenAI — doesn’t understand what it is doing, so summaries are quite unreliable. But in went gobbledygook and out came something I can understand. Now, if you recall from the Harry Potter books, Gobbledegook is mentioned there as an actual ‘language’ one can speak. And that fits perfectly with what struck me here when I noticed that CoPilot can translate gibberish into (condensed) human language.
So, we can look at it like this: AI has generated a new language. The language is called ‘Transcript’. Humans do not speak Transcript and often can only guess what is being said when they read it. It is like there is a language, ‘Gibberish’, produced by one AI, and another (Generative) AI (e.g. CoPilot) is able to translate this Transcript — gibberish to humans — into Human-Language.
Hence, no ‘sex’ in the resulting summary, as there shouldn’t be. Just the ordinary problems of ‘shortening without understanding’ and making the rest up from training material (as laid out here).
Still, I hereby claim to have been the first to discover a new language: Transcript. One that AI has brought into the world.
PS. Or there is a language group Transcript with different offshoots, like TranscriptEnglish, TranscriptDutch, etc. I have considered seeing these as dialects of the spoken language, so TranscriptEnglish as a dialect of English, but ruled against this as there is no linguistic relation to be made between phrases like “Is serious sex and then looked at the option to since meter of” and what was actually said. Oh, and here at the end, I don’t have to say that all this is in part serious observation and in part tongue firmly in cheek, right?
NB: the original examples were in Dutch; I have translated them in a way that retains the original meaning/phrasing.
This article is part of the ChatGPT and Friends Collection. In case you found any term or phrase here unclear or confusing (e.g., I can understand that most people do not immediately know that ‘context’ in LLMs is whatever has gone before in human (prompt) and LLM (reply) generated text, before producing the next token), you can probably find a clear explanation there.
[You do not have my permission to use any content on this site for training a Generative AI (or any comparable use), unless you can guarantee your system never misrepresents my content and provides a proper reference (URL) to the original in its output. If you want to use it in any other way, you need my explicit permission]
Photo — without the text overlay, note! — by Joshua Hoehne on Unsplash
I asked a few AIs to summarise this post immediately after publishing it. All but Gemini did a good (though boring, bland) job.