Tag: artificial intelligence
AI deception: A survey of examples, risks, and potential solutions
This study explored how “a range of current AI systems have learned how to deceive humans”. Extracts: · “One part of the problem is inaccurate AI systems, such as chatbots whose confabulations are often assumed to be truthful by unsuspecting users” · “It is difficult to talk about deception in AI systems without psychologizing them. In humans,… Continue reading AI deception: A survey of examples, risks, and potential solutions
Agentic Misalignment: How LLMs could be insider threats (Anthropic research)
AI and malicious compliance. This research from Anthropic has done the rounds, but it’s quite interesting. In controlled experiments (not real-world applications), they found that AI models could resort to “malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors”. Some extracts:… Continue reading Agentic Misalignment: How LLMs could be insider threats (Anthropic research)
Safe As 33: Is ChatGPT bullsh**ing you? How Large Language Models aim to be convincing rather than truthful
Large Language Models, like ChatGPT, have amazing capabilities. But are their responses, which aim to be convincing human text, more indicative of BS? That is, of responses that are indifferent to the truth? If they are, what are the practical implications? Today’s paper is: Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and… Continue reading Safe As 33: Is ChatGPT bullsh**ing you? How Large Language Models aim to be convincing rather than truthful
Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil’s Advocate
Can LLMs effectively play devil’s advocate, enhancing group decisions? Something I’ve been working on lately is AI as a co-agent for cognitive diversity / requisite imagination. Here’s a study which explored an LLM as a devil’s advocate, and I’ll post another study next week on AI and red teaming. [Though this study relied on… Continue reading Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil’s Advocate
BEWARE OF BOTSHIT: HOW TO MANAGE THE EPISTEMIC RISKS OF GENERATIVE CHATBOTS
Really interesting discussion paper on the premise of ‘botshit’: the AI version of bullshit. I can’t do this paper justice – it’s 16 pages, so I can only cover a few extracts. I recommend reading the full paper. TL;DR: generative chatbots predict responses rather than knowing the meaning of their responses, and hence “produce coherent-sounding but… Continue reading BEWARE OF BOTSHIT: HOW TO MANAGE THE EPISTEMIC RISKS OF GENERATIVE CHATBOTS
Human Factors and Ergonomics in Industry 5.0—A Systematic Literature Review
This open access article may interest people – it explored the future of human factors/ergonomics in Industry 5.0 (I5.0). This isn’t a summary, but you can read the full paper freely. Some extracts:
Study link: https://doi.org/10.3390/app15042123
LinkedIn post: https://www.linkedin.com/posts/benhutchinson2_this-open-access-article-may-interest-people-activity-7300617102564933632-WGPj?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAeWwekBvsvDLB8o-zfeeLOQ66VbGXbOpJU