Authors: Zev Kronenberg, Khi Pin Chua, Egor Dolzhenko, Mark Chaisson, Byunggil Yoo, Midhat Farooqi, Mike Eberle
cd owl
cargo build --release
For best results, ensure that your HiFi reads are phased with HiPhase or WhatsHap. Owl only emits a warning if phasing is missing. Unphased data or loci will result in falsely elevated levels of MSI.
Owl is provided as a standalone tool and is also integrated into the HiFi Somatic Workflow. For a more comprehensive cancer analysis, use the HiFi Somatic Workflow.
# profile the repeats.
owl profile --bam NA12878.haplotagged.bam --regions data/GRCh38_owl_markers.bed.gz --sample NA12878 > NA12878.results.txt
# run the scoring step.
owl score --file NA12878.results.txt --prefix NA12878
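When several samples need the same two steps, the commands above can be scripted. The following is a minimal Python sketch, not part of Owl: the helper names `profile_cmd`, `score_cmd`, and `run_sample` are hypothetical, only the flags shown in the commands above are used, and `owl` is assumed to be on `PATH`.

```python
import subprocess
from pathlib import Path


def profile_cmd(bam, regions, sample):
    """Argument list for `owl profile`, using only the flags shown above."""
    return ["owl", "profile", "--bam", str(bam),
            "--regions", str(regions), "--sample", sample]


def score_cmd(results, prefix):
    """Argument list for `owl score`, using only the flags shown above."""
    return ["owl", "score", "--file", str(results), "--prefix", prefix]


def run_sample(bam, regions, sample):
    """Profile one sample, then score it (hypothetical wrapper)."""
    results = Path(f"{sample}.results.txt")
    # `owl profile` writes to stdout, so redirect it into the results file.
    with open(results, "w") as out:
        subprocess.run(profile_cmd(bam, regions, sample), stdout=out, check=True)
    subprocess.run(score_cmd(results, sample), check=True)
    return results
```

Calling `run_sample("NA12878.haplotagged.bam", "data/GRCh38_owl_markers.bed.gz", "NA12878")` would run the same two commands as above; `check=True` raises if either step exits with a non-zero status.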
Running `owl score` produces four output files. The two main ones are `{prefix}.owl-motif-counts.txt` and `{prefix}.owl-scores.txt`. The score file provides a summary MSI score for each sample, whereas the motif file breaks the score down by motif.
| sample | #high | #low | %high | #phased | %phased | #sites | #passing | %passing | qc |
|---|---|---|---|---|---|---|---|---|---|
| NA12878 | 15 | 758 | 1.94 | 378 | 75.60 | 500 | 399 | 79.80 | pass |
| … | … | … | … | … | … | … | … | … | … |
`#high` and `#low` are counts of haplotypes (there can be several per locus) with high vs. low coefficient of variation (CV). `%high` is the percentage with high CV and is our primary MSI metric; in the example row it equals 15 / (15 + 758) = 1.94%. QC reflects data completeness: `%passing` reports the percentage of sites with reliable measurements, and the `qc` column labels each sample pass or fail based on that percentage.
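The percentage columns in the example row can be reproduced from the count columns. The sketch below is an inference from that single row, not a description of Owl's internals: it assumes `%high` is `#high / (#high + #low)` and that `%phased` and `%passing` are both taken relative to `#sites`. The function names are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100 * numerator / denominator, 2) if denominator else 0.0


def summarize(row):
    """Recompute the percentage columns from the count columns (assumed formulas)."""
    return {
        "%high": pct(row["#high"], row["#high"] + row["#low"]),
        "%phased": pct(row["#phased"], row["#sites"]),
        "%passing": pct(row["#passing"], row["#sites"]),
    }


# Counts copied from the NA12878 example row.
example = {"#high": 15, "#low": 758, "#phased": 378, "#sites": 500, "#passing": 399}
print(summarize(example))
# {'%high': 1.94, '%phased': 75.6, '%passing': 79.8}
```

The recomputed values match the `%high`, `%phased`, and `%passing` columns of the example row.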
| motif | #high | #low | %high |
|---|---|---|---|
| TAGGAC | 0 | 0 | 0.00 |
| ... | ... | ... | ... |
The motif file reports the same counts, broken down by motif rather than by sample. If multiple samples are scored together, the motif stats are merged.
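One plausible reading of "merged" is that per-motif counts are summed across samples and `%high` is recomputed from the totals. The sketch below illustrates that reading only; it is an assumption, not Owl's confirmed behavior, and the function name and the counts in the usage example are made up for illustration.

```python
from collections import defaultdict


def merge_motif_counts(per_sample):
    """Sum (#high, #low) per motif across samples, then recompute %high.

    `per_sample` is a list of dicts mapping motif -> (high, low).
    Summation is an assumed merge rule, not taken from the Owl source.
    """
    totals = defaultdict(lambda: [0, 0])
    for counts in per_sample:
        for motif, (high, low) in counts.items():
            totals[motif][0] += high
            totals[motif][1] += low

    merged = {}
    for motif, (high, low) in totals.items():
        total = high + low
        pct_high = round(100 * high / total, 2) if total else 0.0
        merged[motif] = (high, low, pct_high)
    return merged
```

For example, `merge_motif_counts([{"A": (1, 9)}, {"A": (1, 29)}])` returns `{"A": (2, 38, 5.0)}` under this rule.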
- v0.4.0 -- Dec 5 2025
- Add info field and haplotype PS tag to the score output.
- Bug fixes, and changes to account for file format change.
- Add phase output files
- v0.3.0 -- Sept 22 2025
- Support bgzip bed files
- Warn if `owl profile` does not encounter a phased region.
- Switch polarity of unphased read filter: keep unphased reads unless region contains phased reads.
- Update score report to include phasing information.
- Fix panic on failed BAM open.
- v0.2.1 -- Sept 3 2025
- Fix duplicate header field
- v0.2.0 -- August 28 2025
- Add QC metric to reporting
- v0.1.3 -- August 26 2025
- Fix memory reporting bug
- v0.1.2 -- August 21 2025
- Initial release to GitHub
Please feel free to file a ticket at: PacBio GitHub Issues