Apple reveals: the “thinking” in AI may be just an illusion


Apple recently published a study that sheds light on a fundamental question: do advanced reasoning language models actually think — or do they just appear to? The study, titled The Illusion of Thinking, analyzes the performance of state-of-the-art reasoning models, known as LRMs (Large Reasoning Models), and reveals surprising limits in their reasoning abilities.

What are LRMs?

Unlike traditional language models (LLMs), which generate text based on statistical patterns, LRMs are designed to handle more complex tasks through structured chains of reasoning. They produce long sequences of thought before arriving at an answer, which many regard as a step toward artificial general intelligence (AGI). Examples include Claude 3.7 Sonnet (with extended thinking) and DeepSeek-R1.

But is this “thinking ability” real?

Apple’s approach

To investigate this question, Apple created a testing environment different from traditional benchmarks, which are often contaminated by training data. Instead, the team used puzzles such as the Tower of Hanoi, River Crossing, and Blocks World — all with adjustable complexity and clear rules. This setup allowed the team to measure not just the final answers, but also the reasoning steps produced by the models.
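
To get a sense of what "adjustable complexity" means here: in the Tower of Hanoi, difficulty is controlled simply by the number of disks, and the shortest solution requires 2^n - 1 moves for n disks, so each extra disk roughly doubles the work. The sketch below is a simple illustration of that scaling, not Apple's actual test harness:

```python
# Minimal sketch (not Apple's harness): Tower of Hanoi difficulty is
# dialed up by adding disks, since the optimal solution needs 2**n - 1 moves.

def min_moves(n_disks: int) -> int:
    """Length of the shortest Tower of Hanoi solution for n_disks."""
    return 2 ** n_disks - 1

for n in (3, 5, 10, 15):
    print(f"{n} disks -> {min_moves(n)} moves")
# 3 disks -> 7 moves
# 5 disks -> 31 moves
# 10 disks -> 1023 moves
# 15 disks -> 32767 moves
```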

What did the research find?

The results show that LRMs have serious limitations. Apple identified three distinct performance phases as problem complexity increases:

  • Low complexity: Standard models (without “thinking”) are more efficient and accurate.
  • Medium complexity: LRMs show advantages by exploring solutions more deeply.
  • High complexity: All models — with or without reasoning — fail completely. Accuracy drops to zero, and interestingly, LRMs begin to “think less” as the challenge grows.

Another intriguing finding was the overthinking phenomenon: in simpler problems, LRMs often found the correct answer early, but continued to explore incorrect paths, wasting compute resources. In complex problems, they couldn't find the correct solution at all.

Even when Apple supplied the exact solution algorithm in the prompt (for example, for the Tower of Hanoi), the models still failed to execute it correctly at higher complexities, revealing limits not just in devising a plan but in carrying out explicit logical instructions step by step.
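
For reference, the algorithm in question is the classic recursive Tower of Hanoi procedure. The sketch below shows that well-known textbook algorithm in Python; it is an illustration, not the exact wording Apple placed in the models' prompts:

```python
# The standard recursive Tower of Hanoi procedure (textbook version),
# shown as an illustration of the kind of algorithm the models were given.

def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    """Append the optimal move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # move n-1 disks out of the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # stack the n-1 disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(moves)  # 7 moves: [('A', 'C'), ('A', 'B'), ('C', 'B'), ...]
```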

Conclusion

Despite the impressive progress of LRMs, Apple’s research shows we are still far from achieving reliable reasoning in AI. The “thinking” of these machines remains fragile, inconsistent, and heavily dependent on task complexity. The study challenges overly optimistic assumptions about current AI capabilities and emphasizes the need for more rigorous evaluations.

If you thought AI was ready to think like a human — it might be time to think again.


Source: Apple revela: o “pensamento” da inteligência artificial pode ser apenas uma ilusão
