This repository accompanies the paper (arXiv). Accepted to ACL 2020.

tl;dr: read the intro and one of the usage sections.

intro

Tools for understanding neural representations, and their application to contextualizers. A “contextualizer” is a model that produces context-dependent word embeddings.

Concretely, this repository implements similarity measures, e.g.

- CKA
- SVCCA

and applies them to SOTA contextualizers, e.g.

- BERT
- ELMo

All similarity measures can be found in corr_methods.py.
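
To give a feel for what such a similarity measure computes, here is a minimal NumPy sketch of linear CKA. It is an illustration written for this readme, not the code in corr_methods.py: the function name linear_cka and the input layout (one row per token, one column per neuron) are assumptions.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representations of the same n tokens.

    x has shape (n, d1) and y has shape (n, d2). This layout is an
    assumption made for the sketch, not the repository's format.
    """
    # Center every feature (column) over the tokens.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)
```

The score lies between 0 and 1, and it does not change under an orthogonal transformation or an isotropic rescaling of either representation, which is what makes it usable for comparing models whose neurons are not aligned.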

We also experimented with novel attention-based similarity measures in attention_corr_methods.py.

This repository should be on your Python path.

export PYTHONPATH="${PWD}:${PYTHONPATH}"

usage (script)

The main script is main.py.

main.py [--methods [METHODS ...]] REPRESENTATION_FILES OUTPUT_FILE

For examples, see the slurm directory (e.g. mk_resultsN.sh and mk_resultsN-helper.sh). To see all options, run python main.py --help. Note that REPRESENTATION_FILES is a text file listing one input file per line, and OUTPUT_FILE is a pickle dump.
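
The two file arguments can be prepared and inspected from Python. The sketch below is a set of hypothetical helpers (write_representation_list, read_representation_list and load_results are not part of this repository) and it assumes nothing about what the pickle contains, since that depends on the methods you ran.

```python
import pickle
from pathlib import Path


def write_representation_list(paths, list_file):
    """Write one representation file path per line (the REPRESENTATION_FILES format)."""
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))


def read_representation_list(list_file):
    """Read the paths back, skipping blank lines."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_results(output_file):
    """Load the pickle dump written to OUTPUT_FILE."""
    with open(output_file, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you produced yourself or otherwise trust, since loading a pickle can execute arbitrary code.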

main_attn.py is analogous.

usage (python)

You can also call the correlation methods directly from Python. See ex.ipynb.

dir

- var.py. Settings you might want to change if you use this code. E.g., it defines the function fname2mname (filename to model name), which transforms /data/sls/temp/belinkov/contextual-corr-analysis/contextualizers/bert_large_cased/ptb_pos_dev.hdf5 to bert_large_cased-ptb_pos_dev.hdf5.
- analysis. Data analysis; the results that will be presented. analysis-n analyzes the results of experiment n.
- hnb. “Helper notebook.” Files in this directory are there to help me code, and to help the reader understand, the resulting .py files. Each file contains a copy of a function with its loops etc. destructured (run once with an arbitrary value, to help debugging). It may help you understand that function.
- slurm. SLURM scripts. Run directly as SCRIPTNAME.
- other. Everything else. Lots of junk.
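
If you need to adapt the filename-to-model-name mapping to your own directory layout, the behaviour described for fname2mname can be sketched as follows. This is a guess inferred from the single example above, not the code in var.py; the name fname2mname_sketch is hypothetical.

```python
from pathlib import PurePosixPath


def fname2mname_sketch(fname):
    """Join the parent directory name and the file name with a hyphen.

    Inferred from the one example in this readme; the real fname2mname
    in var.py may handle other cases differently.
    """
    p = PurePosixPath(fname)
    return f"{p.parent.name}-{p.name}"
```

For instance, fname2mname_sketch("contextualizers/bert_large_cased/ptb_pos_dev.hdf5") gives "bert_large_cased-ptb_pos_dev.hdf5".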

workflow

Our pipeline is:

- Generate representations (hdf5 files).
- Run main.py on them. This will:
  1. Load the representations (load_representations in corr_methods.py).
  2. Compute the correlations using the given methods.
  3. Write them to OUTPUT_FILE.
- Analyze the results (the OUTPUT_FILE above) in the analysis directory.
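
The load, compute, write steps above can be pictured with a toy, pure-Python sketch. Everything in it is an assumption made for illustration: run_pipeline and mean_cosine are hypothetical names, representations are passed in as an in-memory dict rather than loaded from hdf5 by load_representations, and the layout of the pickled results is invented rather than the one main.py produces.

```python
import math
import pickle
from itertools import combinations


def mean_cosine(rep_a, rep_b):
    """Toy stand-in measure (not one of the repository's methods):
    mean cosine similarity of aligned, nonzero token vectors."""
    total = 0.0
    for u, v in zip(rep_a, rep_b):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        total += dot / (norm_u * norm_v)
    return total / len(rep_a)


def run_pipeline(representations, methods, output_file):
    """Apply every method to every pair of models, then pickle the results.

    representations: {model name: list of token vectors} (assumed layout)
    methods: {method name: callable taking two representations}
    """
    results = {}
    for method_name, method in methods.items():
        results[method_name] = {
            (a, b): method(representations[a], representations[b])
            for a, b in combinations(sorted(representations), 2)
        }
    with open(output_file, "wb") as f:
        pickle.dump(results, f)
    return results
```

Keeping the methods in a dict mirrors the --methods option: each selected method is run over the same set of representations, and all scores end up in one output file for the analysis step.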

modifying

New correlation methods should extend corr_methods.Method.

gallery

SVCCA similarities (Method CCA in corr_methods.py):

CKA attention similarity (Method AttnLinCKA in attention_corr_methods.py):
