This is the implementation for the experiments in the paper Causal Abstractions of Neural Natural Language Inference Models.
Dependencies are listed in requirements.txt.
intervention/: Basic infrastructure for defining computation graphs and performing interventions.
compgraphs/: Computation graphs for Natural Language Inference causal models and neural models.
causal_abstraction/: Interchange experiments and analysis.
datasets/: Class definitions for datasets.
modeling/: Neural models for NLI and training code.
probing/: Probing experiments.
experiment/: Utilities for launching experiments and automatically recording experiment results in databases.
feature_importance/: Utilities for integrated gradients experiments.
Training models
train_bert.py and train_lstm.py: Train one instance of a model.
train_manager.py: Utilities for interfacing with the experiment module and managing grid search training.
Interchange experiments
interchange.py: Run one set of interchange experiments on a given causal model intermediate node and all neural model locations for that node, then analyze the success rates of the interventions.
graph_analysis.py: Compose the graph linking the examples after interchange experiments and find cliques.
interchange_manager.py: Utilities for interfacing with the experiment module and running large batches of interchange experiments on a computing cluster.
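As a rough illustration of what an interchange experiment checks, the sketch below uses two toy models rather than this repository's code. Every name in it (causal_model, run_low_level, the locations "h1" and "h2", and so on) is hypothetical and chosen only for the example. A high-level causal model has an intermediate node S = a + b; a stand-in for a neural model exposes two candidate locations. For each (base, source) pair, the value at a location is taken from the source run and patched into the base run, and the intervention counts as a success when the patched low-level output matches the high-level output under the corresponding intervention on S. Examples are then linked when the interchange succeeds in both directions, and the largest clique is found by brute force, which is only feasible because the toy input set is tiny.

```python
from itertools import combinations, product

# All names here are hypothetical; this is not the repository's API.
INPUTS = list(product(range(2), repeat=3))  # every binary (a, b, c)

def causal_node(x):
    """High-level intermediate node S = a + b."""
    a, b, _ = x
    return a + b

def causal_model(x, s_override=None):
    """High-level output (S > c); S may be replaced by an intervention."""
    _, _, c = x
    s = causal_node(x) if s_override is None else s_override
    return s > c

def run_low_level(x, location=None, value=None):
    """Toy stand-in for a neural model with two locations, h1 and h2.

    h1 holds only a, h2 holds a + b. Patching a location overrides its value
    and the change propagates to everything computed after it.
    """
    a, b, c = x
    h1 = value if location == "h1" else a
    h2 = value if location == "h2" else h1 + b
    return h2 > c, {"h1": h1, "h2": h2}

def interchange_succeeds(base, source, location):
    """Patch `location` in the base run with its value from the source run."""
    high = causal_model(base, s_override=causal_node(source))
    _, source_hidden = run_low_level(source)
    low, _ = run_low_level(base, location, source_hidden[location])
    return high == low

def success_rate(location):
    """Fraction of ordered (base, source) pairs whose interchange succeeds."""
    pairs = list(product(INPUTS, repeat=2))
    hits = sum(interchange_succeeds(b, s, location) for b, s in pairs)
    return hits / len(pairs)

def linked(u, v, location):
    """Two examples are linked if the interchange succeeds in both directions."""
    return (interchange_succeeds(u, v, location)
            and interchange_succeeds(v, u, location))

def largest_clique(nodes, is_linked):
    """Brute-force search for a largest set of pairwise linked examples."""
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(is_linked(u, v) for u, v in combinations(group, 2)):
                return group
    return ()

if __name__ == "__main__":
    for loc in ("h1", "h2"):
        clique = largest_clique(INPUTS, lambda u, v: linked(u, v, loc))
        print(loc, "success rate:", success_rate(loc),
              "largest clique size:", len(clique))
```

In this toy setting, a location that actually encodes S yields successful interchanges for every pair, while a location that encodes only part of S fails on some pairs and produces smaller cliques. For realistic numbers of examples, a dedicated clique-finding algorithm would be needed in place of the brute-force search.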
Probing experiments
probe.py
probe_manager.py