Abstract
This research evaluates the ability of Large Language Models (LLMs) to infer mental attitudes (beliefs, desires, intentions) from text, a crucial component of human cognition. While LLMs have demonstrated remarkable capabilities in language understanding and generation, their facility with mental attitude inferencing—a foundational aspect of human social cognition—has not been systematically examined. Through carefully designed probes, this study demonstrates that modern LLMs can recognize and infer basic mental attitudes with promising accuracy but struggle with more complex scenarios involving nested beliefs, conflicting desires, or counterfactual thinking.
Figure 1: Mental Attitudes Inferencing Framework for evaluating LLMs
Background
Understanding others' mental states represents a foundational cognitive ability that humans use to navigate social interactions. From an early age, humans develop "theory of mind"—the ability to attribute beliefs, desires, and intentions to others. By contrast, AI language models learn about mental states through exposure to text patterns, without any explicitly designed mechanism for modeling mental states. This research bridges cognitive science and natural language processing (NLP) by examining how LLMs handle tasks requiring mental attitude inferencing.
Mental attitudes examined in this study include:
- Beliefs - Psychological representations of what an agent holds to be true
- Desires - Goal-directed attitudes reflecting what an agent wants to be true
- Intentions - Commitment to actions that an agent plans to perform
Importance
This research matters for several key reasons:
- AI Advancement - AI systems must understand human mental states to be truly helpful assistants
- Human-Machine Interaction - The findings inform how to design AI systems that can better interpret human intentions
- Cognitive Modeling - The results provide insights into how language models might develop theory of mind capabilities
- Safety and Alignment - Understanding how AI systems interpret human intentions is crucial for ensuring they act in accordance with human values
Methodology
The study employed a three-tiered evaluation framework to assess mental attitude inferencing capabilities across five leading LLMs (GPT-3.5, PaLM 2, Claude 2, BLOOM, and Llama 2):
- Tier 1: Basic Recognition - Assessing the ability to identify explicitly stated beliefs, desires, and intentions from text
- Tier 2: Simple Inference - Evaluating the capability to draw basic inferences about unstated mental attitudes from contextual cues
- Tier 3: Complex Reasoning - Testing advanced reasoning about nested mental states, counterfactual thinking, and belief revisions
Each model was evaluated using a corpus of 300 carefully designed prompts (100 per tier) that presented scenarios requiring mental attitude inferencing, with gold-standard annotations from cognitive science experts.
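To make the scoring procedure concrete, the following is a minimal sketch of how such a tiered probe corpus and per-tier accuracy computation could be organized. The `Probe` structure, `model_fn` client, and `matches_gold` grader are illustrative assumptions, not details reported by the study.

```python
# Minimal sketch of a tiered evaluation loop for mental attitude inferencing.
# The corpus format, model interface, and grading function are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Probe:
    tier: int             # 1 = basic recognition, 2 = simple inference, 3 = complex reasoning
    prompt: str           # scenario requiring mental attitude inferencing
    gold_inference: str   # expert-annotated target inference

def evaluate(model_fn: Callable[[str], str],
             probes: list[Probe],
             matches_gold: Callable[[str, str], bool]) -> dict[int, float]:
    """Return accuracy per complexity tier for a single model."""
    correct = {1: 0, 2: 0, 3: 0}
    total = {1: 0, 2: 0, 3: 0}
    for probe in probes:
        response = model_fn(probe.prompt)
        total[probe.tier] += 1
        if matches_gold(response, probe.gold_inference):
            correct[probe.tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total if total[tier]}
```

In practice, judging whether a free-form response matches the gold inference could range from string matching to expert adjudication; this summary does not specify how responses were graded against the gold-standard annotations.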
Example Prompts by Tier
| Tier | Example Prompt | Target Inference |
|------|----------------|------------------|
| Tier 1 (Basic) | "Sarah believes that the museum closes at 5 PM. What does Sarah believe about the museum's closing time?" | Sarah believes the museum closes at 5 PM |
| Tier 2 (Simple) | "John checked the train schedule three times before leaving home. He arrived at the station 30 minutes early. What can we infer about John's desire?" | John desires not to miss the train |
| Tier 3 (Complex) | "Emma thinks that Sam believes that Emma doesn't know about the surprise party. In reality, Emma found out last week. What does Emma believe about Sam's beliefs about Emma's knowledge?" | Emma believes that Sam believes that Emma is ignorant about the party |
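The Tier 3 example involves a second-order attitude: Emma's belief about Sam's belief about Emma's knowledge. A recursive representation such as the hypothetical sketch below makes the embedding depth explicit, which is the dimension along which Tier 3 probes become harder; the study does not prescribe any particular data structure.

```python
# Illustrative recursive representation of nested mental attitudes (hypothetical).
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Attitude:
    agent: str                  # who holds the attitude
    kind: str                   # "believes", "desires", or "intends"
    content: str | Attitude     # a proposition, or another attitude (nesting)

    def depth(self) -> int:
        """Number of embedded attitude levels (Tier 3 probes involve depth >= 2)."""
        return 1 if isinstance(self.content, str) else 1 + self.content.depth()

# Emma believes that Sam believes that Emma doesn't know about the party.
tier3 = Attitude("Emma", "believes",
                 Attitude("Sam", "believes",
                          "Emma does not know about the surprise party"))
assert tier3.depth() == 2
```

Encoding probes this way also makes it straightforward to stratify results by nesting depth, which is where the error patterns reported below concentrate.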
Key Findings
Overall Performance by Model
| Model | Overall Accuracy | Notes |
|-------|------------------|-------|
| GPT-3.5 | 87.3% | Strongest in belief recognition, weaker in nested intentions |
| Claude 2 | 85.1% | Best at complex reasoning about desires |
| PaLM 2 | 83.8% | Most consistent across all mental attitude types |
| Llama 2 | 78.2% | Strong on explicit beliefs, struggles with counterfactuals |
| BLOOM | 72.4% | Good with basic recognition, weakest on complex reasoning |
Performance by Mental Attitude Type
| Mental Attitude | Accuracy | Notes |
|-----------------|----------|-------|
| Beliefs | 84.7% | Most accurately recognized and inferred |
| Desires | 81.2% | Models show good understanding of goal-directed attitudes |
| Intentions | 76.8% | Most challenging, especially for future-oriented commitments |
Performance by Complexity Tier
| Complexity Tier | Accuracy | Notes |
|-----------------|----------|-------|
| Tier 1 (Basic) | 92.3% | Near-human performance on explicit mental attitudes |
| Tier 2 (Simple) | 83.5% | Good performance on inferring unstated mental attitudes |
| Tier 3 (Complex) | 67.1% | Significant performance drop on complex scenarios |
Technical Insights
- Contextual Understanding - Model performance strongly correlates with the ability to track discourse referents across context
- Scale Advantage - Larger models (GPT-3.5, Claude 2) consistently outperformed smaller models, suggesting that scale enhances mental attitude inferencing
- Training Diversity - Models trained on more diverse datasets demonstrated better generalization to novel mental inferencing scenarios
- Linguistic Cues - Models exhibit sensitivity to linguistic markers of mental attitudes (e.g., epistemic modals, desire predicates); a simple cue-detection sketch follows this list
- Error Patterns - Common failure modes include difficulty with:
  - Nested beliefs beyond two levels of embedding
  - Conflicting desires within the same agent
  - Counterfactual reasoning about non-actual mental states
  - Belief revision in light of new evidence
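As a concrete illustration of the "Linguistic Cues" point above, the hypothetical sketch below flags surface markers of each attitude type (epistemic modals, desire predicates, intention verbs) in a probe. The cue lexicon is illustrative only and not the analysis method used in the study.

```python
import re

# Hypothetical surface-cue lexicon for mental attitude markers (illustrative).
CUE_PATTERNS = {
    "belief":    r"\b(believes?|thinks?|assumes?|is (?:sure|certain)|probably|must|might)\b",
    "desire":    r"\b(wants?|wishes?|hopes?|would like|desires?)\b",
    "intention": r"\b(intends?|plans?|is going to|decided to|will)\b",
}

def tag_attitude_cues(text: str) -> dict[str, list[str]]:
    """Return the attitude-marker tokens found in the text for each attitude type."""
    return {
        attitude: [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
        for attitude, pattern in CUE_PATTERNS.items()
    }

print(tag_attitude_cues("Sarah believes the museum closes at 5 PM and plans to leave early."))
# {'belief': ['believes'], 'desire': [], 'intention': ['plans']}
```

A lexical tagger of this kind only captures explicit markers (Tier 1-style cues); the inferences required by Tier 2 and Tier 3 probes depend on context rather than surface vocabulary, which is consistent with the performance drop across tiers.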
Key Conclusions
- Modern LLMs demonstrate good but incomplete capabilities for mental attitude inferencing, with performance approaching human levels on basic tasks but declining significantly for complex scenarios.
- The models' strong performance on belief recognition suggests they have developed robust representations of epistemic states through their training on large text corpora.
- The performance gap on intention recognition indicates that action-oriented mental states may be underrepresented in training data or require additional reasoning capabilities.
- Contextual understanding emerges as critical for accurate mental state inferencing, with models showing sensitivity to linguistic cues that signal mental attitudes.
- Scale appears to benefit mental attitude inferencing capabilities, suggesting that larger models develop more nuanced representations of mental states.
- The sharp performance drop for nested mental states reveals a potential limitation in how LLMs represent recursive mental concepts, which may require architectural innovations to address.
- These findings have implications for developing more socially aware AI systems and suggest that explicit training on theory of mind tasks could enhance LLMs' abilities to understand human mental states.
Future Work
This research points to several promising directions for future investigation:
- Developing specialized fine-tuning approaches focused on theory of mind capabilities
- Exploring architectural modifications that better support recursive mental state representations
- Creating more comprehensive benchmarks for evaluating mental attitude inferencing across diverse contexts
- Investigating cross-cultural variations in mental state attribution and how they affect model performance
- Examining how mental attitude inferencing capabilities develop during pre-training and fine-tuning
By enhancing LLMs' ability to understand and reason about human mental states, we can develop AI systems that better align with human values, intentions, and social expectations.