
Inferencing Mental Attitudes with Large Language Models

Mental Attitude Inferencing Framework

Abstract

This research evaluates the ability of Large Language Models (LLMs) to infer mental attitudes (beliefs, desires, intentions) from text, a foundational component of human social cognition. While LLMs have demonstrated remarkable capabilities in language understanding and generation, their facility with mental attitude inferencing has not been systematically examined. Through carefully designed probes, this study demonstrates that modern LLMs can recognize and infer basic mental attitudes with promising accuracy, but they struggle with more complex scenarios involving nested beliefs, conflicting desires, or counterfactual thinking.

Background

Understanding others' mental states represents a foundational cognitive ability that humans use to navigate social interactions. From an early age, humans develop "theory of mind"—the ability to attribute beliefs, desires, and intentions to others. By contrast, AI language models learn about mental states through exposure to text patterns without expressly defined mechanisms for mental state modeling. This research bridges cognitive science and natural language processing (NLP) by examining how LLMs handle tasks requiring mental attitude inferencing.

Mental attitudes include:

  1. Beliefs: what an agent takes to be true about the world (epistemic states)
  2. Desires: goal-directed attitudes about how an agent wants the world to be
  3. Intentions: commitments to act, typically oriented toward future behavior

Importance

This research matters for several key reasons:

  1. Theory of mind underpins everyday social interaction, so AI systems that converse with people need some capacity to infer beliefs, desires, and intentions.
  2. Systematic evaluation shows where current LLMs fall short of human mental-state reasoning, which informs the development of more socially aware AI systems.
  3. Studying how models trained only on text patterns handle mental attitude inferencing connects cognitive science with natural language processing.

Methodology

Tools and models: Python, PyTorch, Transformers; GPT-3.5, PaLM 2, Claude 2, BLOOM, Llama 2.

The study employed a three-tiered evaluation framework to assess mental attitude inferencing capabilities across five leading LLMs (GPT-3.5, PaLM 2, Claude 2, BLOOM, and Llama 2):

  1. Tier 1: Basic Recognition - Assessing the ability to identify explicitly stated beliefs, desires, and intentions from text
  2. Tier 2: Simple Inference - Evaluating the capability to draw basic inferences about unstated mental attitudes from contextual cues
  3. Tier 3: Complex Reasoning - Testing advanced reasoning about nested mental states, counterfactual thinking, and belief revisions

Each model was evaluated using a corpus of 300 carefully designed prompts (100 per tier) that presented scenarios requiring mental attitude inferencing, with gold-standard annotations from cognitive science experts.
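
As a rough illustration of how such a probe corpus might be represented in the Python toolchain listed above, the sketch below defines a hypothetical record type for a single item; the class and field names (ProbeItem, tier, attitude, target_inference) are assumptions made for illustration, not the study's actual schema.

    from dataclasses import dataclass
    from enum import Enum

    class Attitude(Enum):
        BELIEF = "belief"
        DESIRE = "desire"
        INTENTION = "intention"

    @dataclass
    class ProbeItem:
        item_id: str
        tier: int                # 1 = basic recognition, 2 = simple inference, 3 = complex reasoning
        attitude: Attitude       # mental attitude targeted by the probe
        prompt: str              # scenario plus question shown to the model
        target_inference: str    # gold-standard answer from the expert annotators

    # Example record corresponding to the Tier 1 prompt shown below.
    example = ProbeItem(
        item_id="t1-001",
        tier=1,
        attitude=Attitude.BELIEF,
        prompt=("Sarah believes that the museum closes at 5 PM. "
                "What does Sarah believe about the museum's closing time?"),
        target_inference="Sarah believes the museum closes at 5 PM",
    )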

Example Prompts by Tier

Tier 1 (Basic)
  Prompt: "Sarah believes that the museum closes at 5 PM. What does Sarah believe about the museum's closing time?"
  Target inference: Sarah believes the museum closes at 5 PM.

Tier 2 (Simple)
  Prompt: "John checked the train schedule three times before leaving home. He arrived at the station 30 minutes early. What can we infer about John's desire?"
  Target inference: John desires not to miss the train.

Tier 3 (Complex)
  Prompt: "Emma thinks that Sam believes that Emma doesn't know about the surprise party. In reality, Emma found out last week. What does Emma believe about Sam's beliefs about Emma's knowledge?"
  Target inference: Emma believes that Sam believes that Emma is ignorant about the party.
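
For the open-weight models in the lineup (BLOOM and Llama 2), responses to prompts like those above can be collected with the Transformers library named in the toolchain. The snippet below is a minimal sketch that uses the small public bigscience/bloom-560m checkpoint purely to keep the example lightweight; it is not the model sizes or decoding settings evaluated in the study (GPT-3.5, PaLM 2, and Claude 2 are reachable only through hosted APIs).

    from transformers import pipeline

    # Small public BLOOM checkpoint used here only for illustration.
    generator = pipeline("text-generation", model="bigscience/bloom-560m")

    prompt = ("John checked the train schedule three times before leaving home. "
              "He arrived at the station 30 minutes early. "
              "What can we infer about John's desire?")

    # Greedy decoding; the continuation would then be compared against the gold annotation.
    output = generator(prompt, max_new_tokens=64, do_sample=False)
    print(output[0]["generated_text"])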

Key Findings

Overall Performance

GPT-3.5: 87.3%. Strongest in belief recognition, weaker in nested intentions.

Claude 2: 85.1%. Best at complex reasoning about desires.

PaLM 2: 83.8%. Most consistent across all mental attitude types.

Llama 2: 78.2%. Strong on explicit beliefs, struggles with counterfactuals.

BLOOM: 72.4%. Good with basic recognition, weakest on complex reasoning.

Performance by Mental Attitude Type (Average Across Models)

Beliefs: 84.7%. Most accurately recognized and inferred.

Desires: 81.2%. Models show good understanding of goal-directed attitudes.

Intentions: 76.8%. Most challenging, especially for future-oriented commitments.

Performance by Complexity Tier (Average Across Models)

Tier 1 (Basic): 92.3%. Near-human performance on explicit mental attitudes.

Tier 2 (Simple): 83.5%. Good performance on inferring unstated mental attitudes.

Tier 3 (Complex): 67.1%. Significant performance drop on complex scenarios.
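
The breakdowns above amount to simple group-by averages over per-item correctness judgments. The sketch below assumes a hypothetical results table with model, tier, attitude, and correct columns; it is not the study's actual analysis code.

    import pandas as pd

    # Hypothetical per-item results: one row per (model, probe item) with a binary
    # correctness judgment against the gold-standard annotation.
    results = pd.DataFrame(
        [
            {"model": "GPT-3.5", "tier": 1, "attitude": "belief", "correct": 1},
            {"model": "BLOOM", "tier": 3, "attitude": "intention", "correct": 0},
            # ... 300 items for each of the five models in the full study
        ]
    )

    # Overall accuracy per model, then averages broken down by tier and by attitude type.
    per_model = results.groupby("model")["correct"].mean().mul(100).round(1)
    per_tier = results.groupby("tier")["correct"].mean().mul(100).round(1)
    per_attitude = results.groupby("attitude")["correct"].mean().mul(100).round(1)

    print(per_model, per_tier, per_attitude, sep="\n\n")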

Key Conclusions

  1. Modern LLMs demonstrate good but incomplete capabilities for mental attitude inferencing, with performance approaching human levels on basic tasks but declining significantly for complex scenarios.
  2. The models' strong performance on belief recognition suggests they have developed robust representations of epistemic states through their training on large text corpora.
  3. The performance gap on intention recognition indicates that action-oriented mental states may be underrepresented in training data or require additional reasoning capabilities.
  4. Contextual understanding emerges as critical for accurate mental state inferencing, with models showing sensitivity to linguistic cues that signal mental attitudes.
  5. Scale appears to benefit mental attitude inferencing capabilities, suggesting that larger models develop more nuanced representations of mental states.
  6. The sharp performance drop for nested mental states reveals a potential limitation in how LLMs represent recursive mental concepts, which may require architectural innovations to address (see the sketch after this list).
  7. These findings have implications for developing more socially aware AI systems and suggest that explicit training on theory of mind tasks could enhance LLMs' abilities to understand human mental states.
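
One way to probe the recursive-representation limitation described in point 6 is to generate belief statements of systematically increasing nesting depth and track where accuracy collapses. The sketch below is only an illustrative generator; the agent names and template wording are assumptions, not the study's actual probes.

    # Build nested-belief prompts of increasing depth, e.g.
    # "Sam thinks that Emma believes that the party is on Friday."
    AGENTS = ["Emma", "Sam", "Lena", "Omar"]

    def nested_belief_prompt(depth: int, fact: str = "the party is on Friday") -> str:
        clause = fact
        for level in range(depth):
            agent = AGENTS[level % len(AGENTS)]
            verb = "thinks" if level % 2 else "believes"
            clause = f"{agent} {verb} that {clause}"
        question = "Whose attitude is described at the outermost level, and what is it about?"
        return f"{clause}. {question}"

    for depth in range(1, 5):
        print(f"depth {depth}: {nested_belief_prompt(depth)}")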

Future Work

This research points to several promising directions for future investigation:

  1. Explicit training or fine-tuning on theory of mind tasks to strengthen models' understanding of human mental states.
  2. Architectural approaches to representing nested and recursive mental states, where current models show the sharpest performance drop.
  3. Broader probe corpora covering conflicting desires, belief revision, and deeper counterfactual reasoning.

By enhancing LLMs' ability to understand and reason about human mental states, we can develop AI systems that better align with human values, intentions, and social expectations.