
This study compared the performance of AI with that of orthopaedic surgeons in clinical practice and training.
It’s a systematic review that included 16 studies. As far as I can tell, most (all?) of the studies used ChatGPT 3.5 and 4 (so note the older models).
And, yes, pitting surgeons against AI isn’t the ideal use case (e.g. compared with using AI as a co-agent or in human-AI teaming, HAIT).
Extracts:
· “ChatGPT showed high sensitivity in identifying patients achieving clinically meaningful improvements (97% vs. 90% for surgeons) but lower specificity (33% vs. 63%) and accuracy (65% vs. 76%)”
· “AI demonstrates comparable or superior performance to surgeons in emergency scenarios and answering patient FAQs, scoring higher across empathy, accuracy, completeness and overall quality”
· “Residents outperformed AI in examinations (74.2% vs. 47.2%)”
· “AI showed limited accuracy in knee osteoarthritis radiographic staging (35% vs. >80%)”
· “AI demonstrates the potential to support clinical efficiency and patient communication in orthopaedics. However, concerns about bias, quality risks, overconfidence and reliance on outdated information prevent it from replacing human expertise”
· “although AI demonstrates strengths in emergency trauma decision‐making and answering patient FAQs, AI was outperformed by surgeons and trainees in more complex tasks such as SSI risk prediction, MCID patient identification, arthritis radiographic interpretation and OITE performance”
· “AI does not always align with clinician‐generated management plans, lacking the ability to differentiate acceptable management options, filter unreliable sources, perform independent assessments or learn from experience”
· “AI remains limited to release‐date information, limiting responsiveness to new guidelines … Restricted access to paywalled content or non‐English content can miss critical publications and may introduce translational errors”
· “Hallucinations and fabricated references further undermine trust … However, Deep Research models, trained to navigate the open web, aim to overcome reliance on training data and enable access to up‐to‐date research”
Ref: Russell, J., Rosen, J., & Vella‐Baldacchino, M. (2025). Journal of Experimental Orthopaedics, 12(4), e70548.

Study link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12616488/