LTLA/PCSelection2018

Discussion of PC selection methods for scRNA-seq data

This repository contains scripts for assessing different methods of choosing the number of principal components (PCs) to retain. The text directory contains the LaTeX files for the report, a compiled PDF of which can be found here. The simulations directory contains R scripts for performing the basic simulations:

  • functions.R, a central R script containing definitions of useful functions for the simulations.
  • sim_gaussclust.R, a template for simulations of clusters with Gaussian noise.
  • sim_trajectory.R, a template for simulations of trajectories between multiple nodes.
  • submitter.sh, a Bash script for SLURM job submission of the simulations.
  • plot_results.R, an R script to generate the plots.
  • simulate_noise.R, an R script examining the effect of removing biological noise.
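
As a rough illustration of the kind of question these simulations address, the sketch below generates clusters with Gaussian noise and inspects the variance explained by each PC. It uses only base R and is not taken from sim_gaussclust.R or functions.R; the dimensions, seed, and variable names (ngenes, ncells, nclusters, centres, var.explained) are assumptions chosen for illustration.

```r
# Hypothetical sketch; not the repository's simulation code.
set.seed(100)

ngenes <- 200     # number of features (assumed value)
ncells <- 300     # number of cells (assumed value)
nclusters <- 3    # number of clusters (assumed value)

# Cluster centres: one column of mean expression values per cluster.
centres <- matrix(rnorm(ngenes * nclusters, sd = 2), nrow = ngenes)

# Assign each cell to a cluster, then add independent Gaussian noise.
cluster <- sample(nclusters, ncells, replace = TRUE)
y <- centres[, cluster] + matrix(rnorm(ngenes * ncells), nrow = ngenes)

# PCA with cells as observations; prcomp() centres each gene by default.
pcs <- prcomp(t(y))
var.explained <- pcs$sdev^2 / sum(pcs$sdev^2)

# After centring, k cluster centres span at most k - 1 dimensions,
# so the first k - 1 PCs should carry most of the structured variance.
print(head(round(var.explained, 3), 6))
```

In this setup the proportion of variance should drop sharply after PC 2, since three clusters occupy a two-dimensional subspace once the data are centred. A PC selection method can then be judged by how close its chosen number of PCs comes to this known value.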

The real directory contains R scripts for performing the real data-based simulations:

  • proc_kolod.R, an R script for pre-processing the mESC data set.
  • proc_pbmc4k.R, an R script for pre-processing the PBMC data set.
  • run_kolod.R, a template for performing simulations based on the mESC data set.
  • run_pbmc4k.R, a template for performing simulations based on the PBMC data set.
  • submitter.sh, a Bash script for SLURM job submission of the simulations.
  • plot_results.R, an R script to generate the plots.

In addition, batching/batching.Rmd contains an example of how batch removal in the presence of zeroes can distort the PCA results.
