PacificBiosciences/owl

Owl

Microsatellite instability (MSI) analysis for HiFi data.

Authors: Zev Kronenberg, Khi Pin Chua, Egor Dolzhenko, Mark Chaisson, Byunggil Yoo, Midhat Farooqi, Mike Eberle

Pre-built binaries

Static Linux Releases

Building

 cd owl
 cargo build --release

Requirements

For best results, please make sure your HiFi reads are phased with HiPhase or WhatsHap. Owl only issues a warning when phasing is missing; it does not stop. Unphased data or loci will produce falsely elevated MSI levels.
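A toy calculation shows why unphased data inflates MSI. This sketch is illustrative only. It assumes instability is judged from the coefficient of variation (CV = standard deviation / mean) of repeat lengths across reads, and the read lengths are hypothetical. Each haplotype is perfectly stable on its own (CV = 0), but the two alleles differ in length. Pooling their reads, as happens when phasing is missing, gives a nonzero CV that looks like instability.

```rust
/// Coefficient of variation (population standard deviation / mean) of repeat lengths.
/// Returns None for empty input or a zero mean.
fn coefficient_of_variation(lengths: &[f64]) -> Option<f64> {
    if lengths.is_empty() {
        return None;
    }
    let n = lengths.len() as f64;
    let mean = lengths.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    let variance = lengths.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(variance.sqrt() / mean)
}

fn main() {
    // Hypothetical repeat lengths (in repeat units) from reads at one heterozygous locus:
    // haplotype 1 carries 10 units, haplotype 2 carries 14 units.
    let hap1 = [10.0, 10.0, 10.0, 10.0];
    let hap2 = [14.0, 14.0, 14.0, 14.0];
    // Without phasing, reads from both haplotypes end up in one group.
    let pooled: Vec<f64> = hap1.iter().chain(hap2.iter()).copied().collect();

    let cv1 = coefficient_of_variation(&hap1).unwrap();
    let cv2 = coefficient_of_variation(&hap2).unwrap();
    let cv_pooled = coefficient_of_variation(&pooled).unwrap();
    println!("hap1 CV = {:.3}, hap2 CV = {:.3}, pooled CV = {:.3}", cv1, cv2, cv_pooled);
}
```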

Running

Owl is provided as a standalone tool and is also integrated into the HiFi Somatic Workflow. For a more comprehensive cancer analysis, use the HiFi Somatic Workflow.

Standalone command-line steps:

Step one: profile the repeats

 # profile the repeats.
 owl profile --bam NA12878.haplotagged.bam --regions data/GRCh38_owl_markers.bed.gz --sample NA12878 > NA12878.results.txt

Step two: summarize and score the sample(s)

 # run the scoring step.
 owl score --file NA12878.results.txt --prefix NA12878
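For scripted runs, the two steps can be chained from Rust with `std::process::Command`. This is a minimal sketch that assumes an `owl` binary is on PATH. It uses only the flags shown in the commands above. `profile_args`, `score_args`, and `run_owl` are hypothetical helper names and are not part of Owl.

```rust
use std::fs::File;
use std::io;
use std::process::{Command, Stdio};

/// Arguments for step one (`owl profile`), mirroring the documented command.
fn profile_args(bam: &str, regions: &str, sample: &str) -> Vec<String> {
    ["profile", "--bam", bam, "--regions", regions, "--sample", sample]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Arguments for step two (`owl score`), mirroring the documented command.
fn score_args(results: &str, prefix: &str) -> Vec<String> {
    ["score", "--file", results, "--prefix", prefix]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Runs both steps for one sample (hypothetical helper; assumes `owl` is on PATH).
fn run_owl(bam: &str, regions: &str, sample: &str) -> io::Result<()> {
    let results = format!("{}.results.txt", sample);
    // `owl profile` writes to stdout, so redirect stdout into the results file,
    // like the `>` in the shell example.
    let out = File::create(&results)?;
    let status = Command::new("owl")
        .args(profile_args(bam, regions, sample))
        .stdout(Stdio::from(out))
        .status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, format!("owl profile failed: {}", status)));
    }
    // Score the profile, using the sample name as the output prefix.
    let status = Command::new("owl").args(score_args(&results, sample)).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, format!("owl score failed: {}", status)));
    }
    Ok(())
}

fn main() {
    let (bam, regions, sample) = (
        "NA12878.haplotagged.bam",
        "data/GRCh38_owl_markers.bed.gz",
        "NA12878",
    );
    if let Err(e) = run_owl(bam, regions, sample) {
        eprintln!("owl pipeline failed: {}", e);
    }
}
```

The sketch passes arguments as a vector instead of going through a shell. This avoids quoting problems with unusual file paths.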

Output

owl score writes four output files. The two main ones are {prefix}.owl-motif-counts.txt and {prefix}.owl-scores.txt. The score file gives a summary MSI score for each sample, and the motif file breaks that score down by motif.

sample #high #low %high #phased %phased #sites #passing %passing qc
NA12878 15 758 1.94 378 75.60 500 399 79.80 pass

#high and #low count haplotypes (there can be several per locus) with high vs. low coefficient of variation (CV). %high is the share of haplotypes with high CV and is our primary MSI metric. In the example row it equals #high / (#high + #low) = 15 / 773 ≈ 1.94%. QC reflects data completeness. %passing is the percentage of sites with reliable measurements (#passing / #sites), and the qc column labels each sample pass or fail based on that percentage.
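The percentage columns in the example row match simple ratios of the count columns: %high = #high / (#high + #low), %phased = #phased / #sites, and %passing = #passing / #sites. This sketch parses one row in the column order shown above and recomputes those values. The formulas are inferred from the example numbers, not taken from Owl's source. v0.4.0 also added an info field and a PS tag to the score output, so the real file may contain extra columns. `ScoreRow`, `parse_row`, and `recomputed` are hypothetical names.

```rust
/// One row of the sample score table, assuming this column order:
/// sample #high #low %high #phased %phased #sites #passing %passing qc
#[derive(Debug)]
struct ScoreRow {
    sample: String,
    high: u64,
    low: u64,
    pct_high: f64,
    phased: u64,
    pct_phased: f64,
    sites: u64,
    passing: u64,
    pct_passing: f64,
    qc: String,
}

/// Parses a whitespace-separated row (handles spaces or tabs).
fn parse_row(line: &str) -> Result<ScoreRow, String> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 10 {
        return Err(format!("expected 10 columns, found {}", f.len()));
    }
    let int = |s: &str| s.parse::<u64>().map_err(|e| format!("{}: {}", s, e));
    let float = |s: &str| s.parse::<f64>().map_err(|e| format!("{}: {}", s, e));
    Ok(ScoreRow {
        sample: f[0].to_string(),
        high: int(f[1])?,
        low: int(f[2])?,
        pct_high: float(f[3])?,
        phased: int(f[4])?,
        pct_phased: float(f[5])?,
        sites: int(f[6])?,
        passing: int(f[7])?,
        pct_passing: float(f[8])?,
        qc: f[9].to_string(),
    })
}

/// 100 * num / den, or 0.0 when the denominator is zero.
fn percent(num: u64, den: u64) -> f64 {
    if den == 0 { 0.0 } else { 100.0 * num as f64 / den as f64 }
}

/// Recomputes (%high, %phased, %passing) from the count columns.
fn recomputed(row: &ScoreRow) -> (f64, f64, f64) {
    (
        percent(row.high, row.high + row.low),
        percent(row.phased, row.sites),
        percent(row.passing, row.sites),
    )
}

fn main() {
    let row = parse_row("NA12878 15 758 1.94 378 75.60 500 399 79.80 pass").unwrap();
    let (high, phased, passing) = recomputed(&row);
    println!(
        "{}: %high {:.2} (reported {:.2}), %phased {:.2} (reported {:.2}), %passing {:.2} (reported {:.2}), qc {}",
        row.sample, high, row.pct_high, phased, row.pct_phased, passing, row.pct_passing, row.qc
    );
}
```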

Summary of motif output

motif #high #low %high
TAGGAC 0 0 0.00
... ... ... ...

The motif file reports the same counts, broken down by repeat motif instead of by sample. If multiple samples are scored together, the motif statistics are merged across samples.
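The exact merge rule for motif statistics across samples is not specified here. This sketch assumes a natural reading: #high and #low are summed per motif, and %high is recomputed from the totals. The per-sample counts are hypothetical, and `merge_motifs` and `MotifCounts` are illustrative names, not Owl's API.

```rust
use std::collections::BTreeMap;

/// Haplotype counts for one motif.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct MotifCounts {
    high: u64,
    low: u64,
}

impl MotifCounts {
    /// %high recomputed from the summed counts (0.0 when there is no data).
    fn pct_high(&self) -> f64 {
        let total = self.high + self.low;
        if total == 0 { 0.0 } else { 100.0 * self.high as f64 / total as f64 }
    }
}

/// Merges per-sample motif tables by summing counts per motif (assumed merge rule).
fn merge_motifs(samples: &[Vec<(&str, u64, u64)>]) -> BTreeMap<String, MotifCounts> {
    let mut merged: BTreeMap<String, MotifCounts> = BTreeMap::new();
    for table in samples {
        for &(motif, high, low) in table {
            let entry = merged.entry(motif.to_string()).or_default();
            entry.high += high;
            entry.low += low;
        }
    }
    merged
}

fn main() {
    // Hypothetical per-sample rows: (motif, #high, #low).
    let sample_a: Vec<(&str, u64, u64)> = vec![("TAGGAC", 0, 0), ("AC", 2, 38)];
    let sample_b: Vec<(&str, u64, u64)> = vec![("AC", 1, 39), ("AAAT", 0, 12)];
    let merged = merge_motifs(&[sample_a, sample_b]);
    println!("motif #high #low %high");
    for (motif, c) in &merged {
        println!("{} {} {} {:.2}", motif, c.high, c.low, c.pct_high());
    }
}
```

A `BTreeMap` keeps the motifs sorted, so the merged table prints in a stable order.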

Changelog

  • v0.4.0 -- Dec 5 2025
    • Add info field and haplotype PS tag to the score output.
    • Bug fixes and changes to account for a file format change.
    • Add phase output files.
  • v0.3.0 -- Sept 22 2025
    • Support bgzip BED files.
    • Warn if owl profile does not encounter a phased region.
    • Switch polarity of the unphased read filter: keep unphased reads unless the region contains phased reads.
    • Update score report to include phasing information.
    • Fix panic on failed BAM open.
  • v0.2.1 -- Sept 3 2025
    • Fix duplicate header field
  • v0.2.0 -- August 28 2025
    • Add QC metric to reporting
  • v0.1.3 -- August 26 2025
    • Fix memory reporting bug
  • v0.1.2 -- August 21 2025
    • Initial release to github

Questions, Comments, Feedback:

Please feel free to file a ticket at PacBio GitHub Issues.
