Linux Commands for Data Management
2023
eacikgoz@dell-node-1:~$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
reference
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mv download.20230705.14
[Link] reference/
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd reference
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ unzip
Archive: [Link]
inflating: Download_504096_File_Manifest.csv
inflating: Phytozome/Esyriacum_483_v1.[Link]
inflating: Phytozome/Esyriacum_483_v1.1.gene_exons.[Link]
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ ls Phytozome
Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.[Link]
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ cd ..
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova$ ls
eacikgoz@dell-node-1:~$ ls /
archive boot etc [Link] lost+found mpipz proc run srv usr ……
eacikgoz@dell-node-1:~$ rm -r Elif
eacikgoz@dell-node-1:~$ ls /home/eacikgoz
eacikgoz@dell-node-1:~$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz/
- cd: change directory
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cp
/netscratch/dep_mercier/grp_novikova/[Link]/anna_g/[Link]
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
[Link] reference
#! /bin/bash
#BSUB -J map[1-124]
……..
#BSUB -n 4
eacikgoz@dell-node-1:/$ ls /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1
A1_NT1_1_1_FDSW202525719-1r_HKJHWDSXY_L4_1.[Link]
A1_NT1_1_1_FDSW202525719-1r_HKJHWDSXY_L4_2.[Link]
A2_NT4_2_1_FDSW202525727-1r_HKJHWDSXY_L4_1.[Link]
………
SYNOPSIS
……..
FastQC analysis: this command happened to work, but the one that follows is the correct one.
eacikgoz@dell-node-1:/$ ls /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/G1_NT2_4_1_
FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]
/biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L
4_1.[Link]
eacikgoz@dell-node-1:/$ ls
archive boot etc [Link] lost+found mpipz proc run srv usr …..
FASTQC ANALYSIS
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fastqc -t 10 -o
/netscratch/dep_mercier/grp_novikova/eacikgoz/
/biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/D4_NT11_2_1_FDSW202525746-
1r_HKJHWDSXY_L4_2.[Link]
- In your directory: fastqc -t 10 -o <directory where the results will be saved> <location of the file to be
analysed>
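The same pattern can be repeated for every read file in a directory. The following is only a dry-run sketch: it prints the fastqc commands instead of running them, and the function name fastqc_cmds, the THREADS variable, the .fq.gz extension and the demo directories are assumptions made for illustration.

```shell
#!/bin/sh
# Dry run: build one fastqc command per read file and print it (nothing is executed).
# fastqc_cmds, THREADS and the .fq.gz extension are hypothetical choices for this sketch.

fastqc_cmds() {
    # $1 = directory with the raw reads, $2 = directory where the reports are saved
    in_dir=$1
    out_dir=$2
    for f in "$in_dir"/*.fq.gz; do
        [ -e "$f" ] || continue            # skip when the pattern matched nothing
        echo "fastqc -t ${THREADS:-10} -o $out_dir $f"
    done
}

# Demo on empty placeholder files so the sketch runs anywhere
demo_in=$(mktemp -d)
demo_out=$(mktemp -d)
touch "$demo_in/sample_1.fq.gz" "$demo_in/sample_2.fq.gz"
cmds=$(fastqc_cmds "$demo_in" "$demo_out")
echo "$cmds"
```

Once the printed commands look right, they could be run directly or placed in a job script.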
TRANSFERRING FILES FROM LINUX TO WINDOWS
- then this file will open and you can work in it.
07.07.2023
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome
Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.[Link]
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ gunzip
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome/Esyriacum_483_v1.[Link]
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome
Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.fa
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ head
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome/Esyriacum_483_v1.fa
>Eucl_scaffold1
AACCGGTTGGGGTAGCAAAAACCTTTAAAGACAGACATTCCAAAATACAGCTTCATAAATTGGCTGTTACTCCGTTGAAGAAGTAG
AAGGCCTTGCTAAA
…….
- The -p (parents) option of mkdir creates any missing parent directories and gives no error if the directory
already exists, while the touch command creates a file. A file newly created by touch is empty.
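A small sketch of both commands, run inside a throwaway directory; the directory and file names are made up for the demo.

```shell
#!/bin/sh
# mkdir -p creates missing parent directories and is safe to repeat;
# touch creates an empty file. All names below are placeholders.
work=$(mktemp -d)                       # throwaway directory for the demo

mkdir -p "$work/reference/Phytozome"    # creates reference/ and Phytozome/ in one step
mkdir -p "$work/reference/Phytozome"    # second call: directory exists, still no error
rc=$?

touch "$work/reference/notes.txt"       # new, empty file
size=$(wc -c < "$work/reference/notes.txt" | tr -d ' ')
echo "mkdir exit status: $rc, file size: $size bytes"
```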
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
……….
FastQC Continue
-IRT group
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fastqc -t 10 -o
/netscratch/dep_mercier/grp_novikova/eacikgoz/
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/samples/IRK-ID27652
eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023$ ls
eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-
Z01-F001_03$ ls
eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-
Z01-F001_03/[Link]$ ls
P5717_E P5717_F P5717_G P5717_H P5717_I P5717_J P5717_K P5717_L P5717_M P5717_N P5717_O
P5717_P P5717_Q P5717_R P5717_S P5717_T
eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-
Z01-F001_03/[Link]$ cd P5717_M
eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-
Z01-F001_03/[Link]/P5717_M$ ls
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-
F001_03/[Link]/P5717_P
[Link] P5717_P_EKDL230000332-1A_HMTF5DSX5_L3_1.[Link] P5717_P_EKDL230000332-
1A_HMTF5DSX5_L3_2.[Link]
12.07.2023
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls ~
profile test
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd
eacikgoz@hpcws001:~$ ls
profile test
eacikgoz@hpcws001:~$ ls ~/.profile
/home/eacikgoz/.profile
eacikgoz@hpcws001:~$ bjobs
eacikgoz@hpcws001:~$ cd /netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana/
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana$ ls *.sh
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana$ cd
/netscratch/dep_mercier/grp_novikova/eacikgoz/
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd reference/
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ ls
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ cd Phytozome/
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome$ ls
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome$ cd ../..
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ gedit [Link]
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ sh ./rename_merge.sh
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ sh ./rename_merge.sh
13.07.2023
Ask about MultiQC
Gff file
Annotation files
14.07.2023
eacikgoz@hpcws001:~$ bjobs
eacikgoz@hpcws001:~$ bjobs
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd samtools_cov
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/samtools_cov$ ls
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/samtools_cov$ cd ../
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
1. Mapping:
Figure 1. Mapping
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_11.2-1 BAM_1.1-1
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_12.1-1 BAM_2.1-1 ……
4401752 eacikgo RUN multicore2 [Link] [Link] map[4] Jul 17 14:52 ………………..
Script: the page of code that opens when we open the .sh file with nano.
- bjobs is used to follow the status of the mapping jobs. If it does not give details:
- check with ls *JOBID*.
- watch bjobs can also be used to monitor the jobs; Ctrl + C exits this view.
To check whether the running jobs have a problem, we open the [Link] file and look at it
- cat [Link]
18.07.2023
2. Coverage:
- nano samtools_cov.sh
- and fix the required fields there: sample number, samples variable, accessions….
- bsub < ./samtools_cov.sh
- bjobs
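Inside an array job such as the one declared with #BSUB -J map[1-124], each element usually picks its own sample from a list. The sketch below shows one common way to do that, assuming LSF sets the LSB_JOBINDEX variable for each array element; the helper pick_sample and the temporary list file are hypothetical, and the index falls back to 2 so the sketch also runs outside the cluster.

```shell
#!/bin/sh
# One array element selects its sample by line number.
# pick_sample and the list file are hypothetical names for this sketch.

pick_sample() {
    # $1 = array index (1-based), $2 = file with one sample name per line
    sed -n "${1}p" "$2"
}

list=$(mktemp)
printf '%s\n' BAM_1.1-1 BAM_2.1-1 BAM_11.2-1 > "$list"

idx=${LSB_JOBINDEX:-2}                  # set by LSF inside an array job; 2 is a demo fallback
sample=$(pick_sample "$idx" "$list")
echo "array element $idx processes sample: $sample"
```

With this layout, only the sample list and the array range in the #BSUB -J line need to change between runs.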
Figure 2. Coverage
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
19.07.2023
3. NQuire
Figure 3. nQuire analysis
sample_list3_biodata:
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_11.2-1 BAM_1.1-1
- less nquire_9_4404824_error.txt
- …sample_nquire$ ls -lah
- cat lrmodel_output_denoised.txt
- The results open here as a list. Save this list to a file on your computer; you can copy it into a note.
- The list contains diploid, triploid and tetraploid results: whichever has the largest value indicates the ploidy
level of the sample. For example, if the number in the tetraploid column is the largest, the plant is tetraploid. This comparison is done in R.
- Uliana sent the R code below. We make the necessary changes in it. In today's run the file I copied the samples into
was named [Link]; after adjusting the code accordingly, R reached the list in my file and ran the analysis.
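setwd('/netscratch/dep_mercier/grp_novikova/[Link]/map_feb23_to_NT1/nquire')
# Assumes the table is read with read.table and written with write.table
data <- read.table('lrmodel_output_denoised_clean1.txt', header = TRUE)
PL <- c()
for (i in 1:nrow(data)) {
  # sample name = 8th element of the file path
  name <- unlist(strsplit(as.character(data$file[i]), split = '/'))[8]
  # columns 3:5 hold the diploid, triploid and tetraploid values;
  # the position of the largest one (+1) gives the ploidy: 2, 3 or 4
  pl <- which(data[i, 3:5] == max(data[i, 3:5])) + 1
  PL <- rbind(PL, c(name, pl))
}
# Write sample name and inferred ploidy as a tab-separated table
write.table(file = 'inferred_ploidy.txt', as.data.frame(PL),
            quote = FALSE, sep = "\t",
            eol = "\n", na = "NA", dec = ".", row.names = FALSE,
            col.names = TRUE)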
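The same "largest value wins" rule can also be checked quickly on the command line before moving to R. This is only a sketch under assumptions: the column headers (file, free, dip, tri, tet), the sample names and the numbers are invented for the demo; the only thing taken from the notes is that columns 3 to 5 hold the diploid, triploid and tetraploid values.

```shell
#!/bin/sh
# For each sample, report the ploidy whose value (columns 3-5) is the largest.
# Headers, sample names and numbers are made up for this sketch.

ploidy_tbl=$(mktemp)
{
  printf 'file\tfree\tdip\ttri\ttet\n'
  printf 'sampleA.bin\t-100\t-180\t-120\t-150\n'
  printf 'sampleB.bin\t-200\t-260\t-300\t-210\n'
} > "$ploidy_tbl"

inferred=$(awk -F'\t' 'NR > 1 {
    best = 3                                   # start with the diploid column
    for (i = 4; i <= 5; i++)
        if ($i + 0 > $best + 0) best = i       # keep the column with the largest value
    print $1 "\t" (best - 1)                   # column 3 -> 2, 4 -> 3, 5 -> 4
}' "$ploidy_tbl")
echo "$inferred"
```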
20.07.2023
Creating a VCF (Variant Call Format) file involves representing genetic variations in a standard format used for storing
and exchanging genomic data.
21.07.2023
VCF File
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls *[Link]*
- BAM_9.[Link]
24.07.2023
Picard
25.07.2023
I worked on analysing the FastQC reports. The details are in the Notes file.
26.07.2023
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ vi [Link]
If you do not write this at the top of the file, the script will not run. It must always be there, and you write your code after it.
27.07.2023
31.07.2023
After the genotyping process we use the commands below to check whether there is any problem.
##fileformat=VCFv4.2
- bcftools - utilities for variant calling and manipulating VCFs and BCFs.
08.08.2023
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs
eacikgoz@hpcws001:~/vcftools$ ls
eacikgoz@hpcws001's password:
PS C:\Users\eacikgoz\Desktop\MPIPZ>
03.08.2023
As well as the per-variant statistics we generated earlier, we also calculated some individual metrics. We can
look at the distribution of these to get an idea of whether some of our individuals have not sequenced or mapped as
well as others. This is good practice with a new dataset. A lot of these statistics can be compared to other
measures generated from the data (e.g. principal components as a measure of population structure) to see if they
drive any apparent patterns in the data.
Variant missingness
Then we plot the data with ggplot2. One thing to keep in mind here is that different datasets will likely have different
missingness profiles. RAD-sequencing data, for example, is likely to have a slightly higher mean missingness than
whole genome resequencing data because it is a random sample of RAD sites from each individual genome, meaning
it is very unlikely all individuals will share exactly the same loci (although you would hope the majority share a
subset).
First we will look at the distribution of mean depth among individuals. We read the data in with read_delim:
This is very similar to the missing data per site. Here we will focus on the fmiss column - i.e. the proportion of missing
data.
Figure 10. Statistical analysis results of proportion of missing data per individual.
a + theme_light()
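Before plotting, individuals with a high proportion of missing data can be listed directly in the shell. The sketch below assumes a tab-separated table with a header row, the sample name in the first column and the missing fraction in the last column (the layout of a vcftools .imiss file); the sample names, the values and the 0.2 threshold are invented for the demo.

```shell
#!/bin/sh
# List individuals whose proportion of missing data exceeds a threshold.
# Assumed layout: header row, sample name in column 1, missing fraction in the last column.
# Sample names, values and the threshold are placeholders.

imiss=$(mktemp)
{
  printf 'INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n'
  printf 'S1\t1000\t0\t20\t0.02\n'
  printf 'S2\t1000\t0\t450\t0.45\n'
  printf 'S3\t1000\t0\t80\t0.08\n'
} > "$imiss"

threshold=0.2
flagged=$(awk -F'\t' -v t="$threshold" 'NR > 1 && $NF + 0 > t + 0 { print $1 }' "$imiss")
echo "individuals above $threshold missingness: $flagged"
```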
04.08.2023
In genetics, it is used to visualize the affinity and similarity between two populations.
--set-missing-var-ids @:# \
braya_combined.[Link]
07.08.2023
Heterozygosity Analysis
Figure 12. Heterozygosity analysis
- This code ran; I will follow it. In the '27539' part we entered a smaller size; if it does not work, we will enter the
actual size and try again.
Most of the samples came out very close to each other, which shows that these samples are genetically very
similar; this is a good result.
- However, there are 3 species that fall far away from the cluster. Their genetic makeup is different. This may be
due to mislabelling or wrong sowing. This should not have happened.
Figure 14. PCA analysis results without wrong samples.
When the genetically distant samples were removed from the plot, a more detailed picture like this was obtained. The NT
samples are very close to each other, as they should be, but the BAM samples are farther apart than expected. This
distance is very small, but is it still worth noting?
Downloading
[Link]
fb/[Link] (95kB)
Stored in directory:
/home/eacikgoz/.cache/pip/wheels/81/75/d6/e1317bf09bf1af5a30befc2a007869fa6e1f516b8f7c591cb9
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ ls
[Link]
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano [Link]
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
5717_A IRKU048360 5
5717_B IRKU049793 5
09.08.2023
(kwip) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/software$ ls
- We adjusted the file names accordingly: each name in the list was set to end with the [Link] extension.
BAM_1.[Link]
BAM_6.[Link]
BAM_2.[Link] .........
eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ ./[Link]
- It gives error.
PARAMETERS:
- n tables = 1 (-N)
- max tablesize = 1e+09 (-x)
Estimated memory usage is 1.0 Gb (1e+09 bytes = 1 bytes x 1e+09 entries / 1 entries per byte)
--------
making countgraph
[Link]()
self._target(*self._args, **self._kwargs)
saving NT11_1_1.[Link]
DONE.
09.08.2023
NJ Trees
We changed the VCF file entry in the script to this new file and ran it again.
- Because the file is very large, at first I could not produce a figure with the whole dataset; I reduced the data
with head(braya_data), tried again, and it worked.
- To work with the whole dataset I then tried a second approach:
nj() did not work because the file was too large, so I used njs() instead and it worked.
- The tree in Figure 18 was obtained using the whole dataset.
15.08.2023
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source
/netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/
[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
modified /home/eacikgoz/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source
/netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
^C
^C
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/
[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
no change /home/eacikgoz/.bashrc
No action taken.
(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$
eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd pixy
eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd /home/eacikgoz/.conda/envs/pixy
eacikgoz@build-stretch:~/.conda/envs/pixy$ ls
conda-meta
eacikgoz@build-stretch:~/.conda/envs/pixy$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz/
eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source
/netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]
You can list all discoverable environments with `conda info --envs`.
# conda environments:
pixy /home/eacikgoz/.conda/envs/pixy
base * /netscratch/dep_mercier/grp_novikova/software/anaconda3
admix /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/admix
anna_syri /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/anna_syri
braker /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/braker
ema_env /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/ema_env
kwip /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/kwip
pixy /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy
(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$
/netscratch/dep_mercier/grp_novikova/software/anaconda3
remainder_args: ['/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy']
remainder_args: ['/home/eacikgoz/.conda/envs/pixy']
16.08.2023
pixy /home/eacikgoz/.conda/envs/pixy
/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/kwip
base /opt/share/software/packages/miniconda3-4.12.0
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
IMPORTANT: You may need to close and restart your shell after running 'conda init'. -problem
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
- bash...................
usage: conda init [-h] [--all] [--user] [--no-user] [--system] [--reverse] [--json] [-v] [-q] [-d] [SHELLS ...]
conda init: error: argument SHELLS: invalid choice: 'pixy' (choose from 'bash', 'fish', 'tcsh', 'xonsh', 'zsh',
'powershell')
no change /opt/share/software/packages/miniconda3-4.12.0/condabin/conda.................
No action taken.
- At first I had been running conda activate bash; Mehmet ran conda activate pixy and that fixed it.
- When you first open Vim you cannot type text (normal mode); press i to start editing (Shift + i starts editing at
the beginning of the line).
- In Vim you can edit freely; if mouse support is enabled, clicking where you want to go is enough.
- To exit:
- if you made no changes, quit directly with :q
- if you made changes and want to save them, :wq
- if you made changes and do not want to save them, :q!
- To go to the top of the file, press gg (pressing the single quote twice, '', jumps back to the previous position)
- To go to the end of the file, Shift + g
- To delete from the cursor to the end of the line, Shift + d (i.e. D)
- Cut a line: dd
- Copy a line: yy
- Paste: p after the current line, P before it
- To delete several lines, give the number of lines first, e.g. 5dd; they can be pasted back with p
PIXY ANALYSIS
- pixy is a command-line tool for painlessly and correctly estimating average nucleotide diversity within (π) and
between (dxy) populations from a VCF. In particular, pixy facilitates the use of VCFs containing invariant (AKA
monomorphic) sites, which are essential for the correct computation of π and dxy in the face of missing data
(i.e. always).
Pi within will be calculated in 500 kb windows
Two histograms showing coverage and pi will be made
- Pi within and pi between??
- Fst test shows the differentiation
- Pi for population
- Pi for nt11 and bam+IRK without outliers
Figure 19. Pixy analysis for triploid NT samples and tetraploid BAM and IRKU samples to see nucleotide diversity.
The sample list for pixy is prepared like this: the sample names and their group names.
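A minimal sketch of such a list, assuming the usual pixy populations file layout (no header, sample name and population name separated by a tab). The sample and population names are placeholders, and the check at the end only confirms that every line has exactly two non-empty fields.

```shell
#!/bin/sh
# Write a populations file (sample name <TAB> population, no header)
# and check that every line has exactly two tab-separated fields.
# Sample and population names are placeholders.

pops=$(mktemp)
{
  printf '%s\t%s\n' NT11_1_1 NT
  printf '%s\t%s\n' NT11_2_1 NT
  printf '%s\t%s\n' BAM_1.1-1 BAM
} > "$pops"

bad_lines=$(awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { n++ } END { print n + 0 }' "$pops")
n_pops=$(cut -f2 "$pops" | sort -u | wc -l | tr -d ' ')
echo "malformed lines: $bad_lines, populations: $n_pops"
```

Sample names in the first column have to match the sample names in the VCF header exactly.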
- When searching for a file whose name consists of two words separated by a space, the name must be written in
quotes; otherwise you get a "too many arguments" error.
- When creating a new directory with mkdir, a two-word name must also be written in quotes, because an unquoted
name with a space creates two separate directories.
- rm -r is what we use to delete a directory even if it is not empty
- To search for a word in a file: cat pixy_4424775_1_output.txt | grep 'error'
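The effect of quoting can be seen in a throwaway directory. The names "my" and "results" are made up for this sketch.

```shell
#!/bin/sh
# Effect of quoting a name that contains a space (placeholder names).
work=$(mktemp -d)
cd "$work" || exit 1

mkdir my results          # unquoted: two directories, "my" and "results"
mkdir "my results"        # quoted: one directory named "my results"

n_dirs=$(ls -1 | wc -l | tr -d ' ')
echo "directories created: $n_dirs"
ls -1

rm -r "my results"        # the quotes are needed again to remove it
```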
17.08.2023
Pixy failed again. I will deal with it last, because the first error was related to the installation and reinstalling
solved it. The real problem now is with the VCF files: it gave an error about not being able to read the VCF files,
because the file it is supposed to read does not exist.
- For this I restarted the combining process; the combined files had not been created because the run had failed.
- Combining will be repeated for Tetra and Hexa.
- The naming of the files matters. I had used [Link], but it should have been [Link]. After the combining step
finishes, the files will be converted.
24.08.2023
- The pixy analysis needs a [Link] file, so we first converted the files.
Figure 20. Combining triploid and tetraploid files. [Link]
- For the pixy analysis we have triploid and tetraploid samples, so we will merge the two files and run a single analysis.
- Some types of analysis are better done with samples called as diploids. For others we need to know the ploidy
and call variants based on nQuire. We decided not to call samples as hexaploids, because then we would need
to call our tetra BAMs as octaploids, which is too much. So we will merge the samples called as tetraploid and
triploid and use these for pixy.
28.08.2023
- Didn’t work.
31.08.2023
Pixy is not working. In case the problem comes from the VCF file, we created the VCF files again. Some of the
samples are triploid and some are tetraploid, so when creating the VCF file we used the tri and tetra [Link] files
of the samples.
- While doing this, the script gave "jar file" and "module not found" errors. So we went to the mapping script and
checked whether the modules listed there were the same.
- They were different. As in the figure below, we re-ran only the module part of the mapping script, then copied
the module from there into the corresponding place in the convert script, ran it again, and the problem was solved.
Figure 21. We re-ran the module to fix the module and jar file errors.
06.09.2023
Interpretation of Results
Pixy analysis is valuable in the field of population genetics and evolutionary biology for several reasons. Here are
some of the key reasons why researchers might use Pixy analysis:
Estimation of Nucleotide Diversity and Divergence: pixy is specifically designed to estimate nucleotide diversity (π)
within populations and absolute nucleotide divergence (dxy) between populations. These measures provide essential
insights into genetic variation and differentiation, which are fundamental for understanding evolutionary
processes.
Handling Missing Data: One of the significant advantages of pixy is its ability to handle missing data effectively.
Genetic data often contain gaps or missing information due to incomplete sequencing or data quality issues. pixy
works on VCFs that include invariant sites and, at every site, counts only the comparisons between genotypes that
are actually present, instead of treating missing genotypes as if they matched the reference. This reduces the
potential bias in diversity and divergence estimates.
Unbiased Estimations: For each window, pixy sums the pairwise differences and the non-missing comparisons over all
sites and divides the first sum by the second. Because missing genotypes are left out of both the numerator and the
denominator, the estimates are not pulled downwards by missing data, which is critical for accurate and reliable
population genetic analysis.
Understanding pixy output
pixy outputs a slightly different file type for each summary statistic it calculates. The contents of the columns of
these output files are detailed below.
avg_pi - Average per site nucleotide diversity for the window. More specifically, pixy computes the weighted
average nucleotide diversity per site for all sites in the window, where the weights are determined by the number of
genotyped samples at each site.
no_sites - The total number of sites in the window that have at least one valid genotype. This statistic is included for
the user, and not directly used in any calculations.
count_diffs - The raw number of pairwise differences between all genotypes in the window. This is the numerator of
avg_pi.
count_comparisons - The raw number of non-missing pairwise comparisons between all genotypes in the window
(i.e. cases where two genotypes were compared and both were valid). This is the denominator of avg_pi.
count_missing - The raw number of missing pairwise comparisons between all genotypes in the window (i.e. cases
where two genotypes were compared and at least one was missing).
count_diffs - The raw number of pairwise, cross-population differences between all genotypes. This is the numerator
of avg_dxy.
count_comparisons - The raw number of non-missing pairwise cross-population comparisons between all genotypes
in the window (i.e. cases where two genotypes were compared and both were valid). This is the denominator of
avg_dxy.
count_missing - The raw number of missing pairwise cross-population comparisons between all genotypes in the
window (i.e. cases where two genotypes were compared and at least one was missing). This statistic is included for
the user, and not directly used in any calculations.
Figure 23. Plot of the pixy analysis results. Only pi is shown, and this graph still contains the outliers.
- Now we will fix the list and run pixy again with both dxy and fst.
07.09.2023
Scaffolding: A genome is sequenced as many smaller segments, and the challenge lies in assembling them in the correct
order to reconstruct the entire genome. This is where scaffolds come into play. A scaffold is a provisional
framework that represents the relative positions and orientations of the sequenced segments within the genome.
- The pixy analysis produced pi, dxy and fst files. Graphs were drawn in R using these files.
library(ggplot2)
#Pi
unique(pi$chromosome)
pi1<-pi[pi$chromosome=='Eucl_scaffold1',]
#calculate average pi for both of them separately, for all scaffolds separately
#dxy between
unique(dxy$chromosome)
popd1=dxy$pop1
popd2=dxy$pop2
theme_minimal()
#Fst
fst<-[Link]("braya_tet_1_win500000_pops_fst.txt",sep="\t",header=T)
popf1=fst$pop1
popf2=fst$pop2
unique(fst$chromosome)
fst1<-fst[fst$chromosome=='Eucl_scaffold1',]
labs(color = "Population") +
theme_minimal()
- The average will be calculated for each scaffold; I can do this by writing a loop.
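One way to get a per-scaffold value without an explicit loop is to sum count_diffs and count_comparisons over all windows of a scaffold and divide the two sums, rather than averaging the avg_pi column. The sketch below assumes a tab-separated pixy pi table with a header row; the window_pos_1 and window_pos_2 column names, the population label and all numbers are assumptions for the demo, and the columns are looked up by header name so their order does not matter.

```shell
#!/bin/sh
# Per-scaffold pi as sum(count_diffs) / sum(count_comparisons) over the windows of each scaffold.
# Columns are located by header name; the table below contains made-up numbers.

pi_file=$(mktemp)
{
  printf 'pop\tchromosome\twindow_pos_1\twindow_pos_2\tavg_pi\tno_sites\tcount_diffs\tcount_comparisons\tcount_missing\n'
  printf 'NT\tEucl_scaffold1\t1\t500000\t0.010\t1000\t10\t1000\t0\n'
  printf 'NT\tEucl_scaffold1\t500001\t1000000\t0.030\t3000\t90\t3000\t0\n'
  printf 'NT\tEucl_scaffold2\t1\t500000\t0.020\t500\t10\t500\t0\n'
} > "$pi_file"

per_scaffold=$(awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
    {
        key = $(col["pop"]) "\t" $(col["chromosome"])
        diffs[key] += $(col["count_diffs"])
        comps[key] += $(col["count_comparisons"])
    }
    END {
        for (key in diffs)
            if (comps[key] > 0) printf "%s\t%.4f\n", key, diffs[key] / comps[key]
    }' "$pi_file" | sort)
echo "$per_scaffold"
```

In the demo table the two windows of Eucl_scaffold1 have different numbers of comparisons, so the ratio of sums (100 / 4000) differs from the plain mean of the two avg_pi values; the ratio of sums weights each window by how much data it actually contains.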