Relatively new arXiv preprint that got featured in Nature News; I slightly adjusted the title to be less technical. The study used aggregated online Q&A… one of the funnier sources being 2,000 popular questions from r/AmITheAsshole whose top-upvoted response rated them YTA. The study seems robust, and they even ran trials with several hundred human participants.
A separate preprint measured sycophancy across various LLMs in a math-competition context (https://arxiv.org/pdf/2510.04721), where apparently GPT-5 was the least sycophantic (+29.0) and DeepSeek-V3.1 was the most (+70.2).
The Nature News report (which I find a bit too biased towards researchers): https://www.nature.com/articles/d41586-025-03390-0


It doesn’t need to understand anything. It just needs to spit out the answer I’m looking for.
A calculator doesn’t need to understand the fundamentals of mathematical modeling to tell me the square root of 144. If I type in 143 by mistake and get a weird answer, I correct my inputs and try again.
Calculators also don’t misinterpret things 45% of the time.