We provide the Python scripts used to analyze the AIRR sequencing data from the study "High frequency of shared clonotypes in human T cell receptor repertoires". Each directory contains a README file that describes the file format of its data. We also provide example data that can be used with each script; the example data is for illustration only. All scripts except the MongoDB subsampling script were run on a machine with 48 cores and 64 GB of RAM.
We used the program IGoR to create synthetic repertoires for TCRβ chains. The synthetic repertoires can be downloaded here. Note that the simHIP1 data corresponds to all files with the prefix tcr_beta_synrep_set1, the simHIP2 data to all files with the prefix tcr_beta_synrep_set2, and the simHIP3 data to all files with the prefix tcr_beta_synrep_set3.
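As an illustration of the prefix convention above, the following minimal sketch sorts downloaded file names into the three simHIP data sets. It is not part of the scripts in this repository: the helper names dataset_for and group_by_dataset are hypothetical, and only the three prefixes and data set names come from the description above.

```python
from pathlib import Path

# Prefix-to-data-set mapping, as described in the text above.
PREFIX_TO_DATASET = {
    "tcr_beta_synrep_set1": "simHIP1",
    "tcr_beta_synrep_set2": "simHIP2",
    "tcr_beta_synrep_set3": "simHIP3",
}


def dataset_for(filename):
    """Return the simHIP data set name for a file, or None if no prefix matches.

    Hypothetical helper: only the base name of the path is checked.
    """
    name = Path(filename).name
    for prefix, dataset in PREFIX_TO_DATASET.items():
        if name.startswith(prefix):
            return dataset
    return None


def group_by_dataset(filenames):
    """Group file names by data set; files with an unknown prefix are skipped."""
    groups = {dataset: [] for dataset in PREFIX_TO_DATASET.values()}
    for filename in filenames:
        dataset = dataset_for(filename)
        if dataset is not None:
            groups[dataset].append(filename)
    return groups
```

For example, group_by_dataset(str(p) for p in Path("downloads").iterdir()) would group the contents of a local directory, where "downloads" is an assumed location for the downloaded files.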