Language Models use Lookbacks to Track Beliefs

This repository contains the code and experiments for the paper "Language Models use Lookbacks to Track Beliefs" by Prakash et al., 2025. The work investigates how language models (specifically Llama-3-70B-Instruct and Llama-3.1-405B-Instruct) represent and track characters' beliefs.

Please check belief.baulab.info for more information.

Setup

Clone the repository:

git clone https://github.com/Nix07/belief_tracking.git
cd belief_tracking

Set up the environment:

uv sync
source .venv/bin/activate

Configure env.yml with the following environment variables:

Set NDIF_KEY for API access
Set HF_WRITE for Hugging Face access
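
As a quick sanity check before running experiments, the two keys above can be validated with a short script. This is a minimal sketch, not part of the repository: it assumes env.yml is a flat file of `KEY: value` lines (the actual layout may differ), and the helper names `parse_flat_yaml` and `load_env` are hypothetical. It uses only the standard library, so it does not handle nested YAML.

```python
import os
from pathlib import Path

# Keys named in the setup instructions above.
REQUIRED_KEYS = ("NDIF_KEY", "HF_WRITE")


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse flat `KEY: value` lines, skipping blank lines and # comments.

    Assumption: env.yml has no nesting, so a full YAML parser is not needed.
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop surrounding whitespace and optional quotes around the value.
        config[key.strip()] = value.strip().strip("'\"")
    return config


def load_env(path: str = "env.yml") -> dict[str, str]:
    """Read the config file, check the required keys, and export them."""
    config = parse_flat_yaml(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        os.environ[key] = config[key]
    return config
```

Calling `load_env()` from the repository root raises a `KeyError` naming any key that is absent or empty, which makes a missing credential visible before a long experiment starts.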

To perform subspace-level analysis, you need the singular vectors, which you can request by emailing Nikhil.

Repository Structure

.
├── 📊 data/                              # Dataset files
├── 📓 notebooks/                         # Jupyter notebooks for experiments
│   ├── attn_knockout/                   # Attention knockout experiments
│   ├── bigToM/                          # BigToM causal model experiments 
│   ├── causal_subspace_analysis/        # Causal subspace analysis
│   ├── causalToM_novis/                 # Causal model in no-visibility
│   └── causalToM_vis/                   # Causal model in explicit visibility   
├── 📜 scripts/                          # Utility scripts
│   ├── patching_scripts/                # Patching experiment scripts
│   └── tracing_scripts/                 # Causal mediation analysis scripts
├── 🔧 src/                              # Source code
├── 📈 results/                          # Experiment results
│   ├── attn_knockout/                   # Attention knockout results
│   ├── bigToM/                          # BigToM experiment results
│   ├── causal_mediation_analysis/       # Tracing experiment results
│   ├── causalToM_novis/                 # No-visibility experiment results
│   └── causalToM_vis/                   # Visibility experiment results
├── 📐 svd/                              # Singular vector decompositions
├── 🗂️ additionals/                      # Additional data and caches
└── ⚙️ env.yml                           # Environment configuration

Usage

The repository contains several components:

Dataset: The data/ directory contains the CausalToM templates and the synthetic entities used to generate samples, as well as BigToM samples. src/dataset.py contains the code for generating and processing the CausalToM dataset.
Notebooks: The notebooks/ directory contains Jupyter notebooks for various experiments investigating the underlying mechanisms. Use the notebooks in notebooks/causalToM_novis and notebooks/causalToM_vis for mechanism exploration. The notebooks do not include subspace intervention experiments; for those, use the scripts in scripts/patching_scripts/.
Scripts: The scripts/ directory contains utility scripts organized by experiment type:
- scripts/patching_scripts/: Contains patching experiment scripts including run_single_layer_patching_exps.py and run_upto_layer_patching_exps.py to conduct large-scale interchange intervention experiments, including subspace patching.
- scripts/tracing_scripts/: Contains causal mediation analysis scripts including trace.py for tracing experiments.
Results: The results/ directory contains experiment outputs organized by experiment type, including attention knockout results, BigToM results, causal mediation analysis results, and CausalToM experiment results.
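
For readers new to the term, an interchange intervention runs the model on a base input while replacing selected internal activations with the values they take on a different (source) input, then checks how the output changes. The sketch below illustrates the idea on a hand-written two-layer toy function. It is not the repository's implementation: the toy layers and the names `run` and `interchange_intervention` are hypothetical, and the actual scripts operate on Llama models.

```python
def layer1(x):
    # Toy hidden layer: two units computed from a two-element input.
    return [x[0] + x[1], x[0] - x[1]]


def layer2(h):
    # Toy output layer: a fixed linear readout of the hidden units.
    return 2 * h[0] + 3 * h[1]


def run(x, patch=None):
    """Run the toy model; `patch` maps hidden-unit index -> replacement value."""
    h = layer1(x)
    if patch:
        for idx, value in patch.items():
            h[idx] = value
    return layer2(h), h


def interchange_intervention(base, source, units):
    """Run on `base` with the chosen hidden units taken from the `source` run."""
    _, source_hidden = run(source)
    patch = {i: source_hidden[i] for i in units}
    output, _ = run(base, patch)
    return output


base, source = [1, 2], [4, 0]
print(run(base)[0])                                   # unpatched base output
print(interchange_intervention(base, source, [0]))    # patch hidden unit 0 only
print(interchange_intervention(base, source, [0, 1])) # patch every hidden unit
```

Patching every hidden unit reproduces the source output exactly, while patching a subset shows how much of the output difference that subset carries. Single-layer and up-to-layer patching apply the same logic to the chosen layers of a real model.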

Citation

If you use this code in your research, please cite our paper:

@misc{prakash2025languagemodelsuselookbacks,
      title={Language Models use Lookbacks to Track Beliefs}, 
      author={Nikhil Prakash and Natalie Shapira and Arnab Sen Sharma and Christoph Riedl and Yonatan Belinkov and Tamar Rott Shaham and David Bau and Atticus Geiger},
      year={2025},
      eprint={2505.14685},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14685},
}

Contact

For questions and issues, please open an issue in this repository or contact Nikhil.
