
Large Language Models, like ChatGPT, have amazing capabilities. But are their responses, which aim to be convincing human-like text, better described as BS? That is, responses that are indifferent to the truth?
If they are, what are the practical implications?
Today’s paper is: Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2), 1-10.
Spotify: https://open.spotify.com/episode/12ryj7odiBHrlzTwfX4sIF?si=qhEgLvZIQ2K66yg3fJFbQg
Make sure to subscribe to Safe As on Spotify/Apple, and if you find it useful then please help share the news, and leave a rating and review on your podcast app.
I also have a Safe As LinkedIn group if you want to stay up to date on releases: https://www.linkedin.com/groups/14717868/?lipi=urn%3Ali%3Apage%3Ad_flagship3_detail_base%3Bhdg8uJYYT%2BmsMqZvpHBmdQ%3D%3D

Shout me a coffee (one-off or monthly recurring)
Transcript:
0.000000 7.200000 You’ve likely heard of how large language models, like ChatGPT, Google Gemini, Copilot etc,
7.200000 13.200000 have factual errors called hallucinations, a term that’s become popular in both technical
13.200000 18.720000 and general contexts. But what if that term is misleading? What if their hallucinations are
18.720000 25.280000 better described as bullshit? G’day everyone, I’m Ben Hutchinson and this is Safe As, a podcast
25.280000 32.160000 dedicated to the thrifty analysis of safety, risk and performance research. Visit safetyinsights.org
32.160000 40.720000 for more research. Today’s article is from Hicks et al., 2024, titled ChatGPT is Bullshit,
40.720000 47.040000 published in Ethics and Information Technology. Their core argument is that when ChatGPT and
47.040000 52.640000 similar large language models produce false claims, they’re not lying or hallucinating,
53.280000 58.880000 but more precisely, the activity they’re engaged in is more like bullshitting in the sense explored
58.880000 65.600000 by philosopher Harry Frankfurt. For these authors, this is a crucial distinction, as the way we describe
65.600000 72.320000 new technology, even metaphorically, profoundly shapes how policymakers and the public understand
72.320000 79.760000 and apply it. So why do they argue against the prevalent term, hallucinations? Why not hallucinations?
80.400000 84.720000 The term hallucination suggests that these systems are misrepresenting the world,
84.720000 89.840000 or describing what they see, which is an inept metaphor that misinforms.
89.840000 97.280000 It anthropomorphizes large language models, attributing to them sort of a life or human-like
97.280000 102.880000 perceptual experience that they simply don’t have. Large language models don’t perceive,
102.880000 109.040000 so they cannot misperceive. Crucially, this term implies that a factual error is an
109.040000 115.120000 unusual or deviant departure from the way large language models normally process text. In reality,
115.120000 121.760000 the very same process occurs when their outputs happen to be true. Furthermore, attributing
121.760000 127.520000 problems to hallucinations can allow creators to blame the AI model for faulty outputs instead of
127.520000 133.120000 taking some sort of responsibility for the behavior of the AI. It also suggests ineffective
133.120000 138.560000 solutions, like trying to connect the AI to “real” rather than hallucinated sources,
138.560000 145.200000 which according to the authors has largely failed. If we wrongly assume it perceives things,
145.200000 150.000000 we might try to correct its beliefs or fix its inputs, strategies that have yielded
150.000000 156.800000 limited, if any, success. Ultimately, calling inaccuracies hallucinations can feed these
156.800000 163.200000 overblown hype-trains and lead to unnecessary consternation or misguided efforts at AI alignment
163.200000 169.600000 among specialists. So why can’t we call them lies? Critically, lying requires an intention
169.600000 174.720000 to deceive someone into believing a false statement. Large language models are simply not
174.720000 180.720000 designed to accurately represent the world the way it is. They cannot be concerned with truth,
180.720000 185.600000 according to the authors. Even if ChatGPT could be described as having intentions,
185.600000 192.080000 which is a complex philosophical question, it’s not designed to produce true utterances,
192.080000 197.680000 but rather text that’s indistinguishable from the text produced by humans. Therefore,
197.680000 204.160000 its aim is to be convincing, not accurate. So this brings us to the topic of bullshit.
204.160000 208.720000 What exactly is bullshit in this Frankfurtian sense and how does it apply to large language
208.720000 214.560000 models? So according to Frankfurt, the defining feature of bullshit is a lack of concern
214.560000 220.880000 with truth or an indifference to how things really are. It’s not about actively trying to deceive,
220.880000 226.480000 but rather having no regard for the truth. The authors highlight that large language models
226.480000 231.280000 cannot themselves be concerned with truth. And because they’re designed to produce text
231.280000 236.080000 without any actual concern for truth, it seems appropriate to call their outputs bullshit.
236.720000 242.480000 Their primary goal, if they have one, is to produce human-like text by estimating the likelihood
242.480000 247.440000 that a particular word will appear next. They’re built on massive statistical models of text,
247.440000 254.480000 and when these models produce text, they look at the previous words, construct a context,
254.480000 260.000000 and then produce a set of likelihoods for the next word. So this process explains why large
260.000000 264.880000 language models have a problem with the truth, according to the paper. Their goal is to provide
264.880000 270.160000 a normal-seeming response to a prompt, but not to convey information that’s helpful to the user.
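A quick aside to make this concrete: the process just described (previous words in, next-word likelihoods out) can be sketched with a toy bigram model. This is only an illustrative simplification, not how ChatGPT is actually built; real systems use large neural networks over tokens rather than simple word counts. But the key point carries over: nothing in the loop checks whether the output is true.

```python
# Toy bigram "language model" (illustrative only, not ChatGPT's architecture):
# it looks at the previous word, assigns likelihoods to candidate next words,
# then samples one. Nothing anywhere in this loop checks truth.
from collections import Counter, defaultdict
import random

corpus = ("the model predicts the next word . "
          "the model predicts text . "
          "the text sounds convincing .").split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_likelihoods(prev):
    """Return a probability distribution over possible next words."""
    counts = follows[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def generate(start, length=8):
    """Produce fluent-looking text by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        probs = next_word_likelihoods(words[-1])
        if not probs:
            break
        words.append(random.choices(list(probs), weights=list(probs.values()))[0])
    return " ".join(words)

print(next_word_likelihoods("the"))  # e.g. {'model': 0.5, 'next': 0.25, 'text': 0.25}
print(generate("the"))               # plausible-sounding text, indifferent to truth
```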
270.160000 276.160000 Examples include lawyers citing bogus judicial decisions and academic researchers receiving
276.160000 281.920000 vague references. These errors can then even snowball. The problem isn’t that the large
281.920000 286.400000 language models misrepresent the world, it’s that they are not designed to represent the world at
286.400000 292.320000 all. Instead, they’re designed to convey convincing lines of text. Now this paper also
292.320000 298.560000 distinguishes two types of bullshit. It talks about soft bullshit, which is characterized by an
298.560000 303.680000 indifference towards the truth of the utterance, but without the intention to mislead the hearer
303.680000 309.360000 regarding the utterer’s agenda. The authors are quite certain that ChatGPT does not intend to convey
309.360000 315.600000 truths, and so, in their words, it’s more of a soft bullshitter. If it has no intentions, it trivially
315.600000 320.400000 doesn’t intend to convey truths, and it’s indifferent to them. If it does have intentions,
321.040000 325.680000 it’s still designed to be convincing rather than accurate, which aligns with soft bullshit.
325.680000 332.720000 So they conclude that, at a minimum, ChatGPT is a soft bullshitter: its output is speech or text
332.720000 338.720000 produced without concern for its truth, and produced without any intent to mislead the audience
338.720000 343.840000 about the utterer’s attitude towards truth. There’s also hard bullshit, which they think
343.840000 348.640000 is more of a controversial claim. Hard bullshit is produced with the intention to mislead the
348.640000 354.000000 audience about the utterer’s agenda. This involves a higher-order deception about what the bullshit
354.000000 359.760000 is up to. Whether ChatGPT is a hard bullshitter depends on whether it can be ascribed intentions.
359.760000 364.720000 The authors suggest that there is a robust, although perhaps not literal, sense
364.720000 370.880000 in which ChatGPT does intend to deceive us about its agenda. Its goal isn’t to convince us of
370.880000 376.480000 the content of what it says, but instead to portray itself as a normal agent, like us, with
377.120000 383.200000 human-like qualities. They argue that ChatGPT’s primary function is to imitate human speech.
383.200000 388.080000 If this function is considered intentional, then ChatGPT is attempting to deceive the audience
388.080000 393.760000 about its agenda, by largely trying to seem like something that has an agenda when in many cases
393.760000 399.280000 it doesn’t. Ultimately, the authors conclude that regardless of whether or not a program has
399.280000 404.000000 intentions, there clearly is an attempt to deceive the hearer or reader about the nature
404.000000 409.440000 of the enterprise somewhere along the line, about what the agent is. And in their view,
409.440000 416.320000 this justifies calling the output hard bullshit. So, according to the paper, minimally, it churns out soft
416.320000 421.840000 bullshit and, given certain controversial assumptions about the nature of intentional
421.840000 427.040000 description, it produces hard bullshit. The specific texture of the bullshit is not,
427.040000 433.440000 for these purposes, important. Either way, ChatGPT, again their words, is a bullshitter.
433.440000 437.760000 Next, they talk about why this matters. Well, why does it matter? They say
437.760000 442.720000 this indifference to the truth is extremely dangerous. For instance, civilized life
442.720000 449.440000 and vital institutions depend very fundamentally on respect for the distinction between the true
449.440000 454.560000 and the false. When the authority of this distinction is undermined by the prevalence of
454.560000 459.440000 bullshit, by treating large language models as if they are concerned with truth or
459.440000 464.160000 metaphorically suffering hallucinations, we risk accepting this bullshit and allowing this
464.160000 469.440000 squandering of meaning. Calling these inaccuracies bullshit rather than hallucinations isn’t just
469.440000 475.200000 more accurate. In their view, it’s good science and technology communication in an area that
475.200000 480.880000 really urgently needs this sort of guidance. It helps us understand that even when a large language
480.880000 486.000000 model gets things right, it’s still bullshitting, as its accuracy is incidental to its design.
486.800000 492.720000 So in conclusion, understanding ChatGPT’s output as bullshit means recognizing that these models
492.720000 499.600000 are designed to generate convincing text, not to convey truth or falsehood. This framing is vital
499.600000 504.640000 for investors, policymakers, and the public to make informed decisions about how to engage
504.640000 506.720000 with these powerful systems.