This repository contains the code and experiments for the paper "Language Models use Lookbacks to Track Beliefs" by Prakash et al., 2025. The work investigates how language models (specifically Llama-3-70B-Instruct and Llama-3.1-405B-Instruct) represent and track characters' beliefs.
See belief.baulab.info for more information.
- Clone the repository:
  ```bash
  git clone https://github.com/Nix07/belief_tracking.git
  cd belief_tracking
  ```
- Set up the environment:
  ```bash
  uv sync
  source .venv/bin/activate
  ```
- Configure `env.yml` with the following environment variables:
  - Set `NDIF_KEY` for NDIF API access
  - Set `HF_WRITE` for Hugging Face access
- To perform subspace-level analysis, you will need the singular vectors, which you can request by emailing Nikhil.
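If you want to sanity-check `env.yml` before launching experiments, the sketch below reads the two keys listed above and reports any that are missing. It assumes, hypothetically, that `env.yml` is a flat file of `KEY: value` lines (for example `NDIF_KEY: <your-key>`); the helper names `parse_env`, `missing_keys`, and `load_env` are illustrative and are not part of this repository, whose own loading code may differ.

```python
from pathlib import Path

# Hypothetical helper: assumes env.yml holds flat `KEY: value` lines only.
REQUIRED_KEYS = ("NDIF_KEY", "HF_WRITE")


def parse_env(text):
    """Parse flat `KEY: value` pairs, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Malformed line: {raw!r}")
        # Drop surrounding whitespace and optional single/double quotes.
        config[key.strip()] = value.strip().strip("'\"")
    return config


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not config.get(k)]


def load_env(path="env.yml"):
    """Read env.yml from disk and fail loudly if a required key is missing."""
    config = parse_env(Path(path).read_text())
    missing = missing_keys(config)
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return config
```

For anything beyond flat key/value pairs, a real YAML parser such as PyYAML's `yaml.safe_load` is the safer choice; this sketch sticks to the standard library so it runs without extra dependencies.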
The repository is organized as follows:
```
.
├── 📊 data/ # Dataset files
├── 📓 notebooks/ # Jupyter notebooks for experiments
│ ├── attn_knockout/ # Attention knockout experiments
│ ├── bigToM/ # BigToM causal model experiments
│ ├── causal_subspace_analysis/ # Causal subspace analysis
│ ├── causalToM_novis/ # Causal model, no-visibility setting
│ └── causalToM_vis/ # Causal model, explicit-visibility setting
├── 📜 scripts/ # Utility scripts
│ ├── patching_scripts/ # Patching experiment scripts
│ └── tracing_scripts/ # Causal mediation analysis scripts
├── 🔧 src/ # Source code
├── 📈 results/ # Experiment results
│ ├── attn_knockout/ # Attention knockout results
│ ├── bigToM/ # BigToM experiment results
│ ├── causal_mediation_analysis/ # Tracing experiment results
│ ├── causalToM_novis/ # No-visibility experiment results
│ └── causalToM_vis/ # Visibility experiment results
├── 📐 svd/ # Singular vector decompositions
├── 🗂️ additionals/ # Additional data and caches
└── ⚙️ env.yml # Environment configuration
```
The repository contains several components:
- Dataset: The `data/` directory contains the CausalToM templates and synthetic entities used to generate samples, along with the BigToM samples. `src/dataset.py` contains code for generating and processing the CausalToM dataset.
- Notebooks: The `notebooks/` directory contains Jupyter notebooks for experiments that investigate the underlying mechanisms. Use the notebooks in `notebooks/causalToM_novis` and `notebooks/causalToM_vis` for mechanism exploration. The notebooks do not include subspace intervention experiments; those are covered by the patching scripts.
- Scripts: The `scripts/` directory contains utility scripts organized by experiment type:
  - `scripts/patching_scripts/`: patching experiment scripts, including `run_single_layer_patching_exps.py` and `run_upto_layer_patching_exps.py`, which run large-scale interchange intervention experiments, including subspace patching.
  - `scripts/tracing_scripts/`: causal mediation analysis scripts, including `trace.py` for tracing experiments.
- Results: The `results/` directory contains experiment outputs organized by experiment type: attention knockout, BigToM, causal mediation analysis, and CausalToM results.
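The patching scripts run interchange interventions: a hidden state is cached from a run on a source (counterfactual) input and written into the model's run on a base input, and subspace patching swaps in only the component that lies in a low-rank subspace. The toy below is a minimal pure-Python sketch of both ideas, not the repository's implementation. Assumptions: the "model" is a stack of linear layers acting on a single hidden vector (the real experiments act on Llama-3 residual streams at specific token positions), all function names are hypothetical, and the subspace update `h_base + P_V (h_source - h_base)`, with `V` an orthonormal basis such as a set of singular vectors, is one common formulation that may differ in detail from what the scripts do.

```python
# Toy sketch of interchange intervention (activation patching) and
# subspace patching. Hypothetical: the "model" is a stack of linear layers
# acting on one hidden vector; none of these names come from this repo.


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def run(layers, x, patch_layer=None, patch_fn=None):
    """Run x through the layers and return every hidden state.

    If patch_layer is given, the hidden state produced by that layer is
    replaced by patch_fn(hidden) before the next layer sees it.
    """
    hiddens = []
    h = x
    for i, W in enumerate(layers):
        h = matvec(W, h)
        if i == patch_layer:
            h = patch_fn(h)
        hiddens.append(h)
    return hiddens


def interchange(layers, base, source, layer):
    """Overwrite the base run's hidden state at `layer` with the source's."""
    src_h = run(layers, source)[layer]
    return run(layers, base, layer, lambda h: list(src_h))[-1]


def project(V, x):
    """Project x onto span(V), assuming the vectors in V are orthonormal."""
    out = [0.0] * len(x)
    for v in V:
        coeff = sum(vi * xi for vi, xi in zip(v, x))
        out = [o + coeff * vi for o, vi in zip(out, v)]
    return out


def subspace_interchange(layers, base, source, layer, V):
    """Swap in only the part of the source state that lies in span(V):
    h_patched = h_base + P_V (h_source - h_base)."""
    src_h = run(layers, source)[layer]

    def patch(h):
        delta = project(V, [s - b for s, b in zip(src_h, h)])
        return [b + d for b, d in zip(h, delta)]

    return run(layers, base, layer, patch)[-1]


if __name__ == "__main__":
    layers = [[[1, 0], [0, 1]], [[2, 0], [0, 3]]]
    base, source = [1, 1], [5, 7]
    print(run(layers, base)[-1])                                    # [2, 3]
    print(interchange(layers, base, source, layer=0))               # [10, 21]
    print(subspace_interchange(layers, base, source, 0, [[1, 0]]))  # [10.0, 3.0]
```

Patching the full basis reproduces the full interchange, and an empty basis leaves the base run unchanged, which is a quick sanity check on any subspace-patching code. In the real experiments the patched run's output would be compared against the expected counterfactual answer; here the final hidden state stands in for that output.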
If you use this code in your research, please cite our paper:
```bibtex
@misc{prakash2025languagemodelsuselookbacks,
      title={Language Models use Lookbacks to Track Beliefs},
      author={Nikhil Prakash and Natalie Shapira and Arnab Sen Sharma and Christoph Riedl and Yonatan Belinkov and Tamar Rott Shaham and David Bau and Atticus Geiger},
      year={2025},
      eprint={2505.14685},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14685},
}
```
For questions and issues, please open an issue in this repository or contact Nikhil.