Linux Commands for Data Management


05.07.2023

eacikgoz@dell-node-1:~$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mkdir reference

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls

reference

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ curl --cookie jgi_session=/api/sessions/a93346ae7a07badb41eab8d4e65a4a9c --output [Link] -d
"{\"ids\":{\"Phytozome-483\":[\"59a5da307ded5e41edd8e7a1\",\"59a5da337ded5e41edd8e7ac\"]}}"
-H "Content-Type: application/json" [Link]

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 .4M 0 68.4M 100 81 10.4M 12 [Link] [Link] --:--:-- 14.4M

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mv download.20230705.14[Link] reference/

- mv <file to move> <destination>

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd reference

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ unzip

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ unzip downloa[Link]

Archive: [Link]

inflating: Download_504096_File_Manifest.csv

inflating: Phytozome/Esyriacum_483_v1.[Link]

inflating: Phytozome/Esyriacum_483_v1.1.gene_exons.[Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ ls Phytozome

Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.[Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ cd ..

- cd .. : go up from the current directory to its parent directory

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova$ ls

Adansonia [Link] mvasilarou RepeatMasker_libraries

alison Cyprinidae Neobatrachus scripts ………

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova$ pwd

/netscratch/dep_mercier/grp_novikova


- pwd — Print working directory

eacikgoz@dell-node-1:~$ ls /

archive boot etc [Link] lost+found mpipz proc run srv usr ……

- Lists all files and directories in the root directory (/)

eacikgoz@dell-node-1:~$ rm -r Elif

eacikgoz@dell-node-1:~$ ls /home/eacikgoz

eacikgoz@dell-node-1:~$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz/

- cd — Change directory

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cp /netscratch/dep_mercier/grp_novikova/[Link]/anna_g/[Link]

- Copy files and directories


- Go to the directory you want to copy into, then type cp followed by the path of the file you want to copy (cp also needs a destination; "." means the current directory).

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls

[Link] reference

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ head [Link]

#! /bin/bash

#BSUB -J map[1-124]

…….

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cat [Link]

#! /bin/bash

#BSUB -J map[1-124]

……..

echo "FINISHED JOB"

# Check output during run with bpeek JOBID

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less [Link]

#BSUB -M 40000 -R "rusage[mem=40000] "

#BSUB -n 4

#BSUB -o pi_%I_%J_output.txt …….

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ tail [Link]

# Check status with bjobs

# Check output during run with bpeek JOBID

eacikgoz@dell-node-1:/$ ls /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1

A1_NT1_1_1_FDSW202525719-1r_HKJHWDSXY_L4_1.[Link]
A1_NT1_1_1_FDSW202525719-1r_HKJHWDSXY_L4_2.[Link]

A2_NT4_2_1_FDSW202525727-1r_HKJHWDSXY_L4_1.[Link]

………

eacikgoz@dell-node-1:/$ fastqc --help

FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

fastqc seqfile1 seqfile2 .. seqfileN

fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]

[-c contaminant file] seqfile1 .. seqfileN

……..

FastQC analysis: the command below ran, but the form given further below (with -o to set the output directory) is the correct one.

eacikgoz@dell-node-1:/$ fastqc /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

Started analysis of G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

Approx 95% complete for G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

Analysis complete for G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

eacikgoz@dell-node-1:/$ ls /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

/biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/G1_NT2_4_1_FDSW202525725-1r_HKJHWDSXY_L4_1.[Link]

eacikgoz@dell-node-1:/$ ls

archive boot etc [Link] lost+found mpipz proc run srv usr …..

Skipping 'quality_control' which didn't exist, or couldn't be read

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mkdir quality_control

FASTQC ANALYSIS

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fastqc -t 10 -o /netscratch/dep_mercier/grp_novikova/eacikgoz/ /biodata/dep_mercier/grp_novikova/[Link]/NTbatch1/D4_NT11_2_1_FDSW202525746-1r_HKJHWDSXY_L4_2.[Link]

- In your directory: fastqc -t 10 -o <where the results will be saved> <path of the file to be checked>
  (-t sets the number of threads; -o sets the output directory)

TRANSFERRING FILES FROM LINUX TO WINDOWS

Open a Command Prompt window:

C:\Users\HP>cd C:\Program Files\PuTTY

C:\Program Files\PuTTY>scp eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz/*.html C:\Users\HP\Desktop\MPIPZ

General form: scp username@linux-machine-ip:/path/to/[Link] C:\path\to\destination

OPEN A FILE AND TRIMMING

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

- This opens the file in the nano text editor, where you can edit it.


07.07.2023

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome

Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.[Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ gunzip
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome/Esyriacum_483_v1.[Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome

Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.fa

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ head
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome/Esyriacum_483_v1.fa

>Eucl_scaffold1

AACCGGTTGGGGTAGCAAAAACCTTTAAAGACAGACATTCCAAAATACAGCTTCATAAATTGGCTGTTACTCCGTTGAAGAAGTAG
AAGGCCTTGCTAAA

…….

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ touch sample_list

- mkdir -p (the "parents" option) creates any missing parent directories and gives no error if the directory already exists, while touch creates a file. Normally touch does not write anything into the file: it creates an empty file, or only updates the timestamp of an existing one.
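A quick way to see both behaviours is a throwaway demo. This is a minimal sketch that works inside a temporary directory; the results/qc and sample_list names are only illustrative:

```shell
# Work in a throwaway directory so nothing in the real project is touched.
demo_dir=$(mktemp -d)

# -p creates every missing parent (results/ and results/qc/) in one go...
mkdir -p "$demo_dir/results/qc"
# ...and does not complain when the directory already exists.
# (Without -p, this second call would fail with "File exists".)
mkdir -p "$demo_dir/results/qc"

# touch creates an empty file if it does not exist yet
# (for an existing file it only updates the modification time).
touch "$demo_dir/results/qc/sample_list"

ls -l "$demo_dir/results/qc"
```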

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]


eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < ./[Link]

Job <4381560> is submitted to queue <multicore20>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

4381560 eacikgo PEND multicore2 [Link] map[1] Jul 7 16:08

4381560 eacikgo PEND multicore2 [Link] map[2] Jul 7 16:08

……….
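The map[1-124] job name submits an LSF job array: each element (map[1], map[2], ...) runs the same script. A common pattern, sketched here under assumptions, is to use the index that LSF stores in LSB_JOBINDEX to pick one line of sample_list. The #BSUB values and output/error file names below are illustrative, not the lab's actual mapping script, and one sample per line is assumed:

```shell
#!/bin/bash
#BSUB -J map[1-124]
#BSUB -n 4
#BSUB -o map_%I_%J_output.txt
#BSUB -e map_%I_%J_error.txt

# Print line number $1 of file $2 (assumes one sample per line).
pick_sample() {
  sed -n "${1}p" "$2"
}

# LSF sets LSB_JOBINDEX to this element's index (1..124 for map[1-124]).
# Outside LSF the variable is unset, so fall back to 1 for testing.
index=${LSB_JOBINDEX:-1}

if [ -f sample_list ]; then
  sample=$(pick_sample "$index" sample_list)
  echo "Array element $index processes: $sample"
  # ... the real mapping command for "$sample" would go here ...
fi
```

In the -o/-e names, %I is replaced by the array index and %J by the job ID, which is why error files end up named like map_5_4401752_error.txt.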

FastQC (continued)

- IRT group

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$
eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fastqc -t 10 -o
/netscratch/dep_mercier/grp_novikova/eacikgoz/
/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/samples/IRK-ID27652

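-bash: eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$: No such file or directory

- The prompt text was pasted together with the command, so bash tried to run the prompt itself as a file path.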

eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023$ ls

sample_accessions.tsv samples trimmed X204SC22124444-Z01-F001_02 X204SC22124444-Z01-F001_03


X204SC22124444-Z01-F001_04 X204SC22124444-Z01-F001_05

eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03$ ls

[Link] [Link] [Link] [Link]

eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03/[Link]$ ls

P5717_E P5717_F P5717_G P5717_H P5717_I P5717_J P5717_K P5717_L P5717_M P5717_N P5717_O
P5717_P P5717_Q P5717_R P5717_S P5717_T

eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03/[Link]$ cd P5717_M

eacikgoz@dell-node-1:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03/[Link]/P5717_M$ ls

[Link] P5717_M_EKDL230000332-1A_HMTF5DSX5_L3_1.[Link] P5717_M_EKDL230000332-1A_HMTF5DSX5_L3_2.[Link]

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fastqc -t 10 -o /netscratch/dep_mercier/grp_novikova/eacikgoz/ /biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03/[Link]/P5717_M/P5717_M_EKDL230000332-1A_HMTF5DSX5_L3_1.[Link]

Started analysis of P5717_M_EKDL230000332-1A_HMTF5DSX5_L3_1.[Link] ……..

eacikgoz@dell-node-1:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls /biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/X204SC22124444-Z01-F001_03/[Link]/P5717_P

[Link] P5717_P_EKDL230000332-1A_HMTF5DSX5_L3_1.[Link] P5717_P_EKDL230000332-1A_HMTF5DSX5_L3_2.[Link]

12.07.2023

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano .profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub ~/.profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less ~/.profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano .profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less .profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source .profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

No unfinished job found

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls ~

- This also shows the files in the home directory (/home/eacikgoz)

profile test

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mv .profile ~/.profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd

eacikgoz@hpcws001:~$ ls

profile test

eacikgoz@hpcws001:~$ ls ~/.profile

/home/eacikgoz/.profile

eacikgoz@hpcws001:~$ source ~/.profile

eacikgoz@hpcws001:~$ bjobs

No unfinished job found

eacikgoz@hpcws001:~$ cd /netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana/

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana$ ls *.sh

CombineGVCFs_IRK.sh RunBlast_lyrpet.sh genotypSCR_kamchatica.sh make_bed.sh


order_readcont.sh

CombineGVCFs_all_sep22.sh [Link] genotyp_SCR_TE4.3_2.sh make_bed_NT1_final.sh


[Link] …………………….

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/[Link]/S_locus/uliana$ cd
/netscratch/dep_mercier/grp_novikova/eacikgoz/

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd reference/
eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ ls

Download_504096_File_Manifest.csv Phytozome [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference$ cd Phytozome/

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome$ ls

Esyriacum_483_v1.1.gene_exons.[Link] Esyriacum_483_v1.fa Esyriacum_483_v1.[Link] Esyriacum_483_v1.[Link]


Esyriacum_483_v1.[Link] Esyriacum_483_v1.[Link] Esyriacum_483_v1.[Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome$ cd ../..

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ gedıt [Link]

-bash: $'ged\304\261t': command not found

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$gedit [Link]

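-bash: $'ged\304i': command not found

- In these attempts the command name contained non-ASCII bytes (e.g. the Turkish dotless "ı", \304\261 in UTF-8) instead of a plain "i", so bash could not find the command; nano was used instead.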

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano
/netscratch/dep_mercier/grp_novikova/eacikgoz/reference/Phytozome

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less sample_list

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cp sample_list list_rename

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ vim list_rename

- vim [Link]: opens the file in the vim editor to check it

-bash: $'v\304vi': command not found

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less list_rename

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano rename_merge.sh

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano list_rename

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano rename_merge.sh

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ sh ./rename_merge.sh

./rename_merge.sh: 9: ./rename_merge.sh: Syntax error: end of file unexpected (expecting "done")

- This means a for or while loop in the script was not closed with done.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano rename_merge.sh

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ sh ./rename_merge.sh

sed: -e expression #1, char 1: missing command

13.07.2023

Operations applied to samples


1. FastQC analysis: provides a simple way to run quality-control checks on raw sequence data coming from
high-throughput sequencing pipelines.
2. Read trimming: improves read mapping by removing adapter sequences and low-quality bases.
3. Mapping: the process of aligning each read to the reference genome.
4. Samtools: estimate coverage
5. Calculation:

MultiQC: ask about it

Gff file

Annotation files

14.07.2023

eacikgoz@hpcws001:~$ bjobs

-bash: bjobs: command not found

eacikgoz@hpcws001:~$ source ~/.profile

eacikgoz@hpcws001:~$ bjobs

No unfinished job found

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd samtools_cov

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/samtools_cov$ ls

NT11_1_1_cov.txt NT11_2_1_cov.txt NT11_3_1_cov.txt NT11_5_1_cov.txt _cov.txt

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/samtools_cov$ cd ../

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ less samtools_cov.sh

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ fast < ./samtools_cov.sh

Job <4388761> is submitted to queue <multicore20>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

4388761 eacikgo RUN multicore2 [Link] [Link] cov[2] Jul 14 15:42

4388761 eacikgo RUN multicore2 [Link] [Link] cov[4] Jul 14 15:42

4388761 eacikgo RUN multicore2 [Link] [Link] cov[1] Jul 14 15:42

4388761 eacikgo RUN multicore2 [Link] [Link] cov[3] Jul 14 15:42


17.07.2023

bkill JOBID: stops (kills) a running job, e.g. a mapping job

1. Mapping:

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

Figure 1. Mapping

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cat sample_list3_biodata

/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_11.2-1 BAM_1.1-1

/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_12.1-1 BAM_2.1-1 ……

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ watch bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME


4401752 eacikgo RUN multicore2 [Link] [Link] map[2] Jul 17 14:52

4401752 eacikgo RUN multicore2 [Link] [Link] map[4] Jul 17 14:52 ………………..

samtools: estimate coverage

.sam: text file containing the read alignments (Sequence Alignment/Map format)

.bam: compressed binary version of .sam; same information, takes less space

[Link]: creating sam files

[Link]: using .sam files to estimate coverage

Script: the code in a .sh file, i.e. what you see when you open it with nano.

- To follow the status of the mapping, use bjobs. If it does not give enough detail:
- follow it with ls *JOBID* (lists the job's output and error files).
- It can also be monitored with watch bjobs. Ctrl + C exits this screen.

To check whether the running jobs have problems, we open the [Link] file:

- cat [Link]

-- If there is a problem, go back to the script and try to find it.

-- After fixing it, submit the job again with bsub.

18.07.2023

- Check the mapping status with bjobs.


- less map_5_4401752_error.txt: use this to check the files for errors.
- ls -lah lists all files (with their sizes). Check whether any of them contains an error.
- If everything is fine, move on to the Coverage step.
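Checking many error files one by one with less gets slow for large arrays. A minimal sketch, assuming the <name>_<index>_<jobid>_error.txt naming seen above (map_5_4401752_error.txt) and that simple keyword matching is good enough; the keywords are a guess, and tools that log progress to stderr can still produce harmless matches:

```shell
# List the error files of one LSF job whose contents mention an error.
# $1 = job ID; run it in the directory where the job wrote its files.
flag_error_files() {
  # grep -l prints only the names of matching files; -i ignores case.
  # 2>/dev/null and "|| true" keep it quiet when nothing matches.
  grep -l -i -e 'error' -e 'exception' -- *_"$1"_error.txt 2>/dev/null || true
}
```

For example, flag_error_files 4401752 prints only the error files of that job that mention "error" or "exception", which are the ones worth opening.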

2. Coverage:

After mapping, we move on to the coverage step.

- nano samtools_cov.sh
- and edit the relevant parts there: sample number, samples variable, accessions….
- bsub < ./samtools_cov.sh
- bjobs
Figure 2. Coverage

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < ./samtools_cov.sh

Job <4404644> is submitted to queue <multicore20>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

4404644 eacikgo PEND multicore2 [Link] cov[1] Jul 18 11:07

4404644 eacikgo PEND multicore2 [Link] cov[2] Jul 18 11:07 …………
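The exact command inside samtools_cov.sh is not shown here, so this is only a sketch of what a mean-coverage value means. It assumes per-position depth lines (contig, position, depth) such as those written by samtools depth -a, which also reports zero-depth positions; without -a, uncovered positions are skipped and the mean is inflated:

```shell
# Mean depth from lines of: contig <TAB> position <TAB> depth
mean_depth() {
  awk -F'\t' '
    { sum += $3; n++ }
    END { if (n > 0) printf "%.2f\n", sum / n; else print "NA" }
  ' "$@"
}
```

Usage would look like samtools depth -a sample.bam | mean_depth, where sample.bam is a placeholder. Recent samtools versions also provide samtools coverage, which reports a mean depth per contig directly.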

19.07.2023

3. NQuire
Figure 3. nQuire analysis

nQuire is a separate step, used to estimate the ploidy level of each sample.

- I created a new directory to store the output:


mkdir sample_nquire
- nano [Link]
In the samples field I entered the file that lists my samples: sample_list3_biodata
For the input I added my own path
In the output field I added my nQuire directory: sample_nquire
- When creating the sample list here, the full path must be written.
- bsub < ./[Link]
- bjobs

sample_list3_biodata:

/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_11.2-1 BAM_1.1-1

/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023/trimmed/BAM_12.1-1 BAM_2.1-1 ......

To see the results: pick one of the error files and check it

- less nquire_9_4404824_error.txt
- …sample_nquire$ ls -lah

Main file: lrmodel_output_denoised.txt

- cat lrmodel_output_denoised.txt

- The results are printed as a list. Save this list to a file on your computer (you can paste it into Notes).

- After the necessary edits, we read it into R.

- The list has diploid, triploid and tetraploid values. Whichever is largest indicates the ploidy level of the sample; for
example, if the value in the tetraploid column is the largest, the plant is tetraploid. This comparison is done in R.
- Uliana sent the R code below, and we adapt it as needed. In today's run the file I copied the samples into was named
[Link]; after adjusting the code accordingly, R read the list in my file and ran the analysis.

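setwd('/netscratch/dep_mercier/grp_novikova/[Link]/map_feb23_to_NT1/nquire')
# Read the cleaned nQuire lrmodel output (tab-separated, with a header line)
data <- read.table('lrmodel_output_denoised_clean1.txt', header = TRUE)
PL <- c()
for (i in 1:dim(data)[1]) {
  # Sample name = 8th element of the file path split on "/"
  name <- unlist(strsplit(as.character(data$file[i]), split = '/'))[8]
  # Columns 3:5 hold the diploid, triploid and tetraploid values;
  # the largest one gives the ploidy (position 1, 2, 3 -> 2n, 3n, 4n)
  pl <- which(data[i, 3:5] == max(data[i, 3:5])) + 1
  PL <- rbind(PL, c(name, pl))
}
write.table(as.data.frame(PL), file = 'inferred_ploidy.txt',
            quote = FALSE, sep = "\t", eol = "\n", na = "NA", dec = ".",
            row.names = FALSE, col.names = TRUE)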


Figure 4. R part of the nQuire analysis
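The same decision rule as the R code (the highest of the diploid, triploid and tetraploid values wins) can also be applied directly on the cluster with awk. A sketch, assuming a tab-separated file with a header line where columns 3 to 5 hold those values, as the R code assumes (as we understand the lrmodel output, these are the log-likelihoods of the three models). Unlike which(), it keeps only the first column on a tie, and it names samples by the file name rather than the 8th path element:

```shell
# For each sample, report the ploidy whose column (dip/tri/tet) is largest.
infer_ploidy() {
  awk -F'\t' 'FNR > 1 {
    best = 3                                   # start with the diploid column
    for (c = 4; c <= 5; c++) if ($c > $best) best = c
    name = $1; sub(/.*\//, "", name)           # keep only the file name
    print name "\t" (best - 1)                 # column 3 -> 2, 4 -> 3, 5 -> 4
  }' "$@"
}
```

Usage: infer_ploidy lrmodel_output_denoised.txt > inferred_ploidy.txt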


Result:

20.07.2023

- Today we also ran nQuire for the NT samples.


- The code did not work at first because I had not written the sample names in the list as full paths. It worked once I
corrected this.
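When a tool expects full paths but you only have sample names, the paths can be generated instead of typed. A minimal sketch, assuming one name per line and that all samples sit in the same directory; the directory and file names in the usage line are hypothetical:

```shell
# Turn bare sample names (one per line, read from stdin) into full paths.
# $1 = directory that holds the samples.
add_full_paths() {
  # ${1%/} drops a trailing slash; NF skips empty lines
  awk -v dir="${1%/}" 'NF { print dir "/" $1 }'
}
```

Usage: add_full_paths /path/to/trimmed < names.txt > sample_list_paths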

Creating a VCF File

Creating a VCF (Variant Call Format) file involves representing genetic variations in a standard format used for storing
and exchanging genomic data.

Creating a .bashrc profile

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano ~/.bashrc

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/.bashrc

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ module load picard/2.23.5

- We copied this module line from [Link] (opened with nano).


Figure 6. Creating vcf file by using [Link]
The list for the VCF step is made like this: sample names only.

- As a result, the accession will look like this: [Link]
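The VCF list needs the opposite of the nQuire list: names only. A sketch that strips any leading directories from the first column, assuming one entry per line; paths.txt and vcf_sample_names in the usage line are hypothetical names:

```shell
# Print only the last path component of the first column on each line.
names_only() {
  awk 'NF { n = $1; sub(/.*\//, "", n); print n }' "$@"
}
```

Usage: names_only paths.txt > vcf_sample_names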

21.07.2023

Delete the cov and nQuire output and error files.

VCF File

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls *[Link]*

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls -lah *[Link]*

- To check the status of the error and output files.

We created the GVCF files with the script in Figure 6

- BAM_9.[Link]

The next step is to combine them.

24.07.2023

Picard

25.07.2023

I worked on analysing the FastQC reports; the details are in the Notes file.

26.07.2023

Once the VCF files are ready, we need to combine them.


Figure 7. Combining GVCF files into a VCF file

I used the script I got from Uliana.

Creating a new script file as [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ vi [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

If you do not write this at the top of the file, the script will not work. It must always be there, and you write your code below it.
27.07.2023

After the combination is finished, the genotyping process starts.

Figure 8. Genotyping with vcf files

31.07.2023

After genotyping, we use the following commands to check whether there is any problem.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bcftools view braya_combined.[Link] | head

##fileformat=VCFv4.2

##FILTER=<ID=PASS,Description="All filters passed"................

- bcftools - utilities for variant calling and manipulating VCFs and BCFs.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bcftools view -H braya_combined.[Link] | head

Eucl_scaffold1 1 . A . 9.09 LowQual DP=1 GT:AD:DP ./.:1,0:1 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0

Eucl_scaffold1 2 . A . 9.09 LowQual DP=1 GT:AD:DP ./.:1,0:1 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bcftools view -H braya_combined.[Link] | grep '0/0' | head

Eucl_scaffold1 5 . G . 11.94 . AN=2;DP=1 GT:AD:DP:RGQ 0/0:1,[Link] ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,0:0:. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. ./.:0,[Link]. .

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bcftools view -H braya_combined.[Link] | grep '1/1' | head

Eucl_scaffold1 563 . TA T 30.03 . AC=2;AF=1;AN=2;DP=1;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;QD=30.03;SOR=1.609 GT:AD:DP:GQ:PL ./.:0,[Link].:0,0,0 ./.:0,[Link].:0,0,0 ./.:0,[Link].:0,0,0 ./.:0,[Link].:0,0,0 ./.:0,[Link].:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,[Link].:0
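Note that grep '0/0' keeps every line in which at least one sample has that genotype, so it says little about how often each genotype occurs. A sketch that counts genotype calls per sample column instead, assuming standard VCF body lines (sample columns from column 10 on, GT as the first FORMAT field, as in the output above); phased calls such as 0|1 are counted separately from 0/1:

```shell
# Count genotype calls (./., 0/0, 0/1, 1/1, ...) over all sample columns.
count_genotypes() {
  awk -F'\t' '!/^#/ {
    for (i = 10; i <= NF; i++) {     # sample columns start at column 10
      split($i, f, ":")              # GT is the first FORMAT field
      n[f[1]]++
    }
  }
  END { for (g in n) print g "\t" n[g] }' "$@" | LC_ALL=C sort
}
```

Usage: bcftools view -H <combined VCF> | count_genotypes. On a whole-genome VCF this scans every line, so inserting head -n 100000 before it gives a quick sample.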

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ samtools view BAM_1.1-[Link] | head

A0[Link]HMTF5DSX[Link]5 99 Eucl_scaffold1 46 0 87S20M1I18M24S = 200 176


AATCAAAACCACTAAAATCTAGACTTAGCCACAAACAGGTTAGGGTGACAAAAGCCTTTAGAGACAGACCTTCCAAAATACTAGAA
TTACAGCTTCATAAATTGGTTGGTTACTCAGTTGAAGAAGAAGTGGAAGTCCTTGCTGAATTGC
FF:FFFFFFFFF:FFFF[Link]FFFFF,FFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFF[Link],FFF:FFFFFFF:FFFFF,FFFFFF,FF
[Link],:FFFFFFF,::FFFF,[Link]F NM:i:3 MD:Z:18C8C10 AS:i:21 XS:i:21 RG:Z:bra_BAM_1.1-1
XA:Z:Eucl_scaffold129,+4474810,92S19M39S,0; MQ:i:58 MC:Z:67S22M61S ms:i:4990

A0[Link]HMTF5DSX[Link]7 99 Eucl_scaffold1 46 40 113S20M1I16M = 200 176


CATATAGATTATGGATTCAAGACACTAATCAAAACCACTAAAATCTAGACTTAGCCACAAACAGGTTAGGGTGACAAAAGCCTTTAG
AGACAGACCTACCAAAATACTAGAATTACAGCTTCATAAATTGGTTGGTTACTCAGTTGAAGA
FFFFF,:::FFF,:FF,FFF,::FFFFF:FFFF,F,[Link],
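In the samtools view output above, the second column is the SAM FLAG (99 for both reads). It is a sum of bit values defined in the SAM specification; a small decoder makes it readable. This sketch uses shell arithmetic, and the short names are our own labels for the specification's bits:

```shell
# Decode a SAM FLAG into the names of the bits that are set.
decode_flag() {
  local flag=$1 out=""
  # bit value / label pairs, in SAM specification order
  set -- 1 paired 2 proper_pair 4 unmapped 8 mate_unmapped \
         16 reverse 32 mate_reverse 64 first_in_pair 128 second_in_pair \
         256 secondary 512 qc_fail 1024 duplicate 2048 supplementary
  while [ "$#" -gt 0 ]; do
    if [ $(( flag & $1 )) -ne 0 ]; then
      out="$out $2"
    fi
    shift 2
  done
  printf '%s\n' "${out# }"
}
```

decode_flag 99 prints "paired proper_pair mate_reverse first_in_pair" (1 + 2 + 32 + 64). samtools itself also has a samtools flags subcommand for the same conversion.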

08.08.2023

Generating statistics from a VCF

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mkdir ~/vcftools

- First, create a new directory in the home directory.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --freq2 --out ~/vcftools --max-alleles 2

Job <4417884> is submitted to default queue <normal>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --site-mean-depth --out ~/vcftools

Job <4417887> is submitted to default queue <normal>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --site-quality --out ~/vcftools

Job <4417888> is submitted to default queue <normal>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --missing-indv --out ~/vcftools

Job <4417889> is submitted to default queue <normal>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --missing-site --out ~/vcftools
Job <4417890> is submitted to default queue <normal>.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub vcftools --gzvcf braya_combined.[Link] --het --out ~/vcftools

Job <4417891> is submitted to default queue <normal>.

- For this command, check the [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

eacikgoz@hpcws001:~/vcftools$ ls

[Link] [Link] [Link] [Link] [Link] [Link] [Link]

eacikgoz@hpcws001:~/vcftools$ head [Link]

INDV O(HOM) E(HOM) N_SITES F

BAM_1.1-1 9954497 9140982.1 11985565 0.28599

BAM_10.3-1 9684033 8885312.2 11628027 0.29122

BAM_11.2-1 9904839 9002479.1 11783588 0.32446 ..................

- These data were obtained from the statistical calculations above. Learn how to interpret them.
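The F column can be reproduced from the other three: vcftools reports the method-of-moments inbreeding coefficient F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). For the first row, (9954497 - 9140982.1) / (11985565 - 9140982.1) = 813514.9 / 2844582.9 ≈ 0.28599, matching the table. A sketch that recomputes it for every individual, assuming the whitespace-separated .het layout shown above (INDV, O(HOM), E(HOM), N_SITES, F):

```shell
# Recompute F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)) for each individual.
het_f() {
  awk 'FNR > 1 { printf "%s\t%.5f\n", $1, ($2 - $3) / ($4 - $3) }' "$@"
}
```

Positive F means more homozygous sites than expected under random mating, which is what all three rows shown above report.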

PS C:\Users\eacikgoz\Desktop\MPIPZ> scp eacikgoz@hpcws001:/home/eacikgoz/vcftools/[Link] C:\Users\eacikgoz\Desktop\MPIPZ

eacikgoz@hpcws001's password:

[Link] 100% 6081MB 111.8MB/s 00:54

PS C:\Users\eacikgoz\Desktop\MPIPZ>

03.08.2023

Individual based statistics

As well as a our per variant statistics we generated earlier, we also calculated some individual metrics too. WE can
look at the distribution of these to get an idea whether some of our individuals have not sequenced or mapped as
well as others. This is good practice to do with a new dataset. A lot of these statistics can be compared to other
measures generated from the data (i.e. principal components as a measure of population structure) to see if they
drive any apparent patterns in the data.

Variant missingness

library(tidyverse)  # provides read_delim (readr) and ggplot (ggplot2)

var_miss <- read_delim("./cichlid_subset.lmiss", delim = "\t",
col_names = c("chr", "pos", "nchr", "nfiltered", "nmiss", "fmiss"), skip = 1)

Then we plot the data with ggplot2. One thing to keep in mind here is that different datasets will likely have different
missingness profiles. RAD-sequencing data, for example, is likely to have a slightly higher mean missingness than
whole-genome resequencing data because it is a random sample of RAD sites from each individual genome, meaning
it is very unlikely that all individuals will share exactly the same loci (although you would hope the majority share a
subset).

a <- ggplot(var_miss, aes(fmiss)) + geom_density(fill = "dodgerblue1", colour = "black", alpha = 0.3)
a + theme_light()

Mean depth per individual

First we will look at the distribution of mean depth among individuals. We read the data in with read_delim:

ind_depth <- read_delim("./cichlid_subset.idepth", delim = "\t",
col_names = c("ind", "nsites", "depth"), skip = 1)

Then we plot the distribution as a histogram using ggplot and geom_histogram.

a <- ggplot(ind_depth, aes(depth)) + geom_histogram(fill = "dodgerblue1", colour = "black", alpha = 0.3)
a + theme_light()

Figure 9. Statistical analysis results of mean depth per individual

Proportion of missing data per individual


Next we will look at the proportion of missing data per individual. We read in the data below:

ind_miss <- read_delim("./cichlid_subset.imiss", delim = "\t",
col_names = c("ind", "ndata", "nfiltered", "nmiss", "fmiss"), skip = 1)

This is very similar to the missing data per site. Here we will focus on the fmiss column, i.e. the proportion of missing
data.

a <- ggplot(ind_miss, aes(fmiss)) + geom_histogram(fill = "dodgerblue1", colour = "black", alpha = 0.3)
a + theme_light()

Figure 10. Statistical analysis results of proportion of missing data per individual.

- The sample at 0.80 is unusual because it lies far away from all the others.
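To find outliers such as the sample near 0.80 by name rather than by eye, the .imiss file can be filtered directly. A sketch, assuming the five-column vcftools .imiss layout read in R above (individual name first, F_MISS fifth) and a threshold that you would choose from the histogram; 0.5 below is arbitrary:

```shell
# Print individuals whose fraction of missing genotypes exceeds a threshold.
# $1 = threshold; remaining arguments = .imiss file(s), or stdin if none.
high_missing() {
  local thr=$1
  shift
  awk -v thr="$thr" 'FNR > 1 && $5 + 0 > thr + 0 { print $1 "\t" $5 }' "$@"
}
```

Usage: high_missing 0.5 <file>.imiss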

Heterozygosity and inbreeding coefficient per individual


ind_het <- read_delim("./cichlid_subset.het", delim = "\t",
col_names = c("ind","ho", "he", "nsites", "f"), skip = 1)

a <- ggplot(ind_het, aes(f)) + geom_histogram(fill = "dodgerblue1", colour = "black", alpha = 0.3)
a + theme_light()
Figure 11. Statistical analysis results of Heterozygosity and inbreeding coefficient per individual.

var_qual <- read_delim("C:/Users/eacikgoz/Documents/[Link]", delim = "\t",

col_names = c("chr", "pos", "qual"), skip = 1)

a <- ggplot(var_qual, aes(qual)) + geom_density(fill = "dodgerblue1", colour = "black", alpha = 0.3)

a + theme_light()

04.08.2023

Population structure: PCA

In population genetics, PCA is used to visualize genetic similarity and structure among individuals and populations.

plink --vcf $VCF --double-id --allow-extra-chr \
--set-missing-var-ids @:# \
--indep-pairwise 50 10 0.1 --out cichlids


plink --vcf $VCF --double-id --allow-extra-chr \ --set-missing-var-ids @:# \

> plink --vcf $/netscratch/dep_mercier/grp_novikova/eacikgoz/braya_combined.[Link] --double-id --allow-extra-chr >


--set-missing-var-ids @:# \

braya_combined.[Link]

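plink --vcf $/netscratch/dep_mercier/grp_novikova/eacikgoz/braya_combined.[Link] --double-id --allow-extra-chr \
--set-missing-var-ids @:# \
--indep-pairwise 50 10 0.1 --out brayas

- Note: the "$" in front of the path should not be there. bash leaves a "$" that is followed by "/" unchanged, so plink receives the literal text "$/netscratch/..." as the file name. Either write the path without "$", or first set VCF=/netscratch/.../braya_combined.[Link] and use --vcf $VCF.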

plink --vcf $VCF --double-id --allow-extra-chr \
--set-missing-var-ids @:# \
--indep-pairwise 50 10 0.1 --out braya

bsub -M 27539072 plink --vcf $VCF --double-id --allow-extra-chr \
--set-missing-var-ids @:# \
--indep-pairwise 50 10 0.1 --out brayas

07.08.2023

Heterozygosity Analysis
Figure 12. Heterozygosity analysis

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub python3 [Link] /netscratch/dep_mercier/grp_novikova/eacikgoz/braya_combined.[Link]

Job <4420503> is submitted to default queue <normal>.

- Because this is a Python script, we added python3 in front of it.


- It gave many syntax errors, so we tried a new [Link] script. The collapsed one at the bottom is the one that did not work.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub -M 27539 -q bigmem plink --vcf $/netscratch/dep_mercier/grp_novikova/eacikgoz/braya_combined.[Link] --double-id --allow-extra-chr --set-missing-var-ids @:# \--indep-pairwise 50 10 0.1 --out brayas

- This command ran; I will keep track of it. We entered a smaller value for the memory limit ('27539'); if it does not
work, we will enter the real size and try again.

plink --vcf /netscratch/dep_mercier/grp_novikova/eacikgoz/braya_combined.[Link] --double-id --allow-extra-chr \
--set-missing-var-ids @:# \
--extract [Link] \
--make-bed --pca --out brayas
08.08.2023

Figure 12. Rplot results of PCA analysis of braya samples in barplot.


Figure 13. The main PCA analysis

Most samples came out very close to each other. This shows that they have a very similar genetic makeup, which is a good sign.

- However, 3 samples lie far away from the cluster, so their genetic makeup is different. This may be due to mislabeling or to the wrong plants being grown. This should not have happened.
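The three distant samples were identified by eye on the plot. As a hedged sketch of how such outliers could be flagged reproducibly, the code below reads a PLINK ".eigenvec" file. It assumes the layout FID, IID, then one column per PC, whitespace-separated; a PLINK 2 "#FID ..." header line is skipped. It then flags samples whose distance from the median position on the first PCs exceeds median + k × MAD. The MAD rule and k = 3 are arbitrary assumptions and were not part of the original analysis.

```python
import statistics


def read_eigenvec(path, n_pcs=2):
    """Parse a PLINK .eigenvec file into {IID: [PC1, PC2, ...]}."""
    coords = {}
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if not parts or parts[0].lstrip("#") in ("FID", "IID"):
                continue  # skip blank lines and an optional header line
            coords[parts[1]] = [float(v) for v in parts[2:2 + n_pcs]]
    return coords


def flag_outliers(coords, k=3.0):
    """Return samples whose distance from the median PC position exceeds median + k * MAD."""
    n_pcs = len(next(iter(coords.values())))
    centre = [statistics.median(c[i] for c in coords.values()) for i in range(n_pcs)]
    dist = {s: sum((a - b) ** 2 for a, b in zip(c, centre)) ** 0.5 for s, c in coords.items()}
    med = statistics.median(dist.values())
    mad = statistics.median(abs(d - med) for d in dist.values())
    return sorted(s for s, d in dist.items() if d > med + k * mad)
```

The median and MAD are used instead of the mean and standard deviation so that the outliers themselves do not shift the threshold much.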
Figure 14. PCA analysis results without wrong samples.

When the genetically distant samples were removed from the plot, a more detailed picture emerged. The NT samples are very close to each other, as they should be, but the BAM samples are further apart than expected. This distance is very small, but is it still worth considering?

Genotyping for Tetraploid Samples


Figure 15. Genotyping for tetraploid samples

We used [Link] for this. The genotyping is done with it.

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ pip3 install -e git+[Link]

Obtaining khmer from git+[Link]

Cloning [Link] to ./src/khmer

Collecting bz2file (from khmer)


Downloading
[Link]
84f2d/[Link]

Collecting screed>=1.0 (from khmer)

Downloading
[Link]
fb/[Link] (95kB)

100% |████████████████████████████████| 102kB 2.7MB/s

Building wheels for collected packages: bz2file

Running [Link] bdist_wheel for bz2file ... done

Stored in directory:
/home/eacikgoz/.cache/pip/wheels/81/75/d6/e1317bf09bf1af5a30befc2a007869fa6e1f516b8f7c591cb9

Successfully built bz2file

Installing collected packages: bz2file, screed, khmer

The script screed is installed in '/home/eacikgoz/.local/bin' which is not on PATH.

Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

Running [Link] develop for khmer

Successfully installed bz2file-0.98 khmer screed-1.1.2

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mkdir hashes

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd hashes

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano rename_merge.sh

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano samtools_cov.sh

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd hashes

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano samtools_cov.sh

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd hashes

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ ls

[Link]
(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano [Link]

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ nano [Link]

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ cd ..

(kwip) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ ls

(kwip) eacikgoz@hpcws001:/biodata/dep_mercier/grp_novikova/[Link]/lyrata_raw_February2023$ cat sample_accessions.tsv

Library number    Library Name    Batch
5717_A            IRKU048360      5
5717_B            IRKU049793      5

09.08.2023

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/software$ conda activate kwip

(kwip) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/software$ ls

admixture Bandage_Ubuntu-x86-64_v0.[Link] CNVcaller

Figure 16. kWIP analysis.

- We named the files according to this list: each file takes its name from the list, followed by the [Link] extension.

BAM_1.[Link]

BAM_6.[Link]

BAM_2.[Link] .........

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz/hashes$ ./[Link]

- It gave an error:

PARAMETERS:

- kmer size = 20 (-k)

- n tables = 1 (-N)
- max tablesize = 1e+09 (-x)

Estimated memory usage is 1.0 Gb (1e+09 bytes = 1 bytes x 1e+09 entries / 1 entries per byte)

--------

Saving k-mer countgraph to NT11_1_1.[Link]

Loading kmers from sequences in ['../NT11_1_1.[Link]']

making countgraph

consuming input ../NT11_1_1.[Link]

Exception in thread Thread-1:

Traceback (most recent call last):

File "/netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/[Link]", line 932, in _bootstrap_inner

[Link]()

File "/netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/[Link]", line 870, in run

self._target(*self._args, **self._kwargs)

TypeError: argument 1 must be str, not _khmer.ReadParser

Total number of unique k-mers: 0

saving NT11_1_1.[Link]

Writing summmary info to NT11_1_1.[Link]

fp rate estimated to be 0.000

DONE.

wrote to: NT11_1_1.[Link]
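The memory line in the log follows from how khmer's countgraph is built. The countgraph is a Count-Min Sketch: -N hash tables, each with up to -x counters of one byte ("1 entries per byte"). Memory is therefore about N × x bytes, so 1 × 1e9 bytes is the 1.0 Gb estimate. The toy counter below is a hedged sketch of that data structure, not khmer's implementation. khmer uses tables of slightly different prime sizes, while this sketch uses equal-sized tables with different CRC32 seeds. Counting canonical k-mers (the smaller of a k-mer and its reverse complement) is a design choice of this sketch.

```python
import zlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


class CountMinKmerCounter:
    """Toy Count-Min Sketch for k-mers: n_tables rows of `tablesize` one-byte counters."""

    def __init__(self, k=20, n_tables=1, tablesize=1000):
        self.k = k
        self.tablesize = tablesize
        self.tables = [bytearray(tablesize) for _ in range(n_tables)]

    def _slots(self, kmer):
        key = canonical(kmer).encode()
        # A different CRC32 seed per table stands in for independent hash functions.
        return [zlib.crc32(key, seed) % self.tablesize for seed in range(len(self.tables))]

    def consume(self, seq):
        seq = seq.upper()
        for i in range(len(seq) - self.k + 1):
            kmer = seq[i:i + self.k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                for table, slot in zip(self.tables, self._slots(kmer)):
                    if table[slot] < 255:  # one-byte counters saturate at 255
                        table[slot] += 1

    def get(self, kmer):
        # Collisions can only inflate counts, so the minimum over tables is the best estimate.
        return min(table[slot] for table, slot in zip(self.tables, self._slots(kmer)))

    def memory_bytes(self):
        return sum(len(table) for table in self.tables)
```

With the parameters from the log (k = 20, N = 1, x = 1e9), memory_bytes() would be 1e9 bytes, matching khmer's estimate. In this run, however, "Total number of unique k-mers: 0" shows that nothing was actually counted because of the TypeError.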

09.08.2023

NJ Trees

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub -q multicore40 bcftools view --threads 40 --max-alleles 2 --min-alleles 2 --types snps -o vcf_name.[Link] vcf_name.[Link]

Job <4422016> is submitted to queue <multicore40>.
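"--min-alleles 2 --max-alleles 2 --types snps" keeps only biallelic SNPs, which is what the SNP matrix for the NJ tree needs. The filter is sketched in plain Python below. This is a simplified illustration, not bcftools' implementation: it only accepts records whose REF and single ALT are each one base, whereas bcftools classifies variant types more carefully.

```python
def is_biallelic_snp(record):
    """True if a VCF data line has a single-base REF and exactly one single-base ALT."""
    fields = record.split("\t", 5)
    ref, alts = fields[3], fields[4].split(",")
    return (len(ref) == 1 and ref.upper() in "ACGT"
            and len(alts) == 1 and len(alts[0]) == 1 and alts[0].upper() in "ACGT")


def filter_biallelic_snps(lines):
    """Yield header lines unchanged and only the biallelic SNP records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line
```

Multi-allelic sites (ALT "T,G"), indels (REF "AT"), sites without an alternative allele (ALT ".") and spanning deletions (ALT "*") are all dropped.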


Figure 17. Script to create NJ trees.

Figure 17.1. Script to obtain matrix file for NJ tree.

After running the first script, we obtained a file named [Link].

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < ./snp-matrix_awk.sh

We changed the VCF file in the script to this new file and ran it again.

- The script gave an error: gzip: [Link]: not in gzip format

- We removed the .gz extension from the file name and then compressed the file properly with bgzip.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ mv [Link] [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bgzip [Link]


eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ jobs

[1]- Done bgzip [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < ./snp-matrix_awk.sh
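The error "not in gzip format" means the file had a .gz name but was not actually compressed. The first two bytes of a file show whether it is real gzip: gzip data starts with 0x1f 0x8b. Output from bgzip (BGZF, which tabix and bcftools indexing require) is also valid gzip. BGZF headers additionally set the FEXTRA flag and carry a "BC" extra subfield, per the BGZF section of the SAM specification. A small hedged checker:

```python
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """True if the file starts with the gzip magic bytes (bgzip output does too)."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def is_bgzf(path):
    """True if the first block header looks like BGZF: gzip + FEXTRA flag + 'BC' subfield."""
    with open(path, "rb") as fh:
        header = fh.read(14)
    return len(header) == 14 and header[:4] == b"\x1f\x8b\x08\x04" and header[12:14] == b"BC"
```

A file compressed with plain gzip passes is_gzip() but not is_bgzf(), which is worth checking before indexing a VCF with tabix.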

Figure 18. Creating NJ tree via R.

- Because the file is very large, at first I could not get a figure from the full dataset. I shrank the data with head(braya_data), tried again, and it worked.
- To work with the full dataset, I then tried a second approach: nj() did not work because the file was too large, so I used njs() instead, and it worked.
- The tree in Figure 18 was obtained using the full data.
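ape's nj() implements the classic neighbor-joining algorithm of Saitou and Nei. njs() is ape's variant that also accepts distance matrices with missing values. To show what the algorithm does with the SNP distance matrix, here is a small, hedged pure-Python version; it is not ape's code. At each step it joins the pair (a, b) that minimises Q(a, b) = (n - 2)·d(a, b) - r(a) - r(b), where r is the row sum of distances. It then computes both branch lengths and replaces the pair by a new node. The distances from the new node u to every other node c are d(u, c) = (d(a, c) + d(b, c) - d(a, b)) / 2.

```python
def neighbor_joining(names, dist):
    """Classic neighbor-joining on a symmetric distance matrix (list of lists).

    Returns the tree as a Newick string with branch lengths.
    """
    D = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a, b] for b in nodes) for a in nodes}  # row sums over active nodes
        pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
        # Join the pair that minimises Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * D[p] - r[p[0]] - r[p[1]])
        length_a = 0.5 * D[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = D[a, b] - length_a
        u = f"({a}:{length_a:g},{b}:{length_b:g})"  # new internal node, named by its subtree
        for c in nodes:
            if c not in (a, b):
                D[u, c] = D[c, u] = 0.5 * (D[a, c] + D[b, c] - D[a, b])
        D[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    half = D[a, b] / 2  # split the last edge in half so the Newick string has a root
    return f"({a}:{half:g},{b}:{half:g});"
```

Each step scans all remaining pairs, so the sketch runs in O(n³) time overall. That cost is one reason large SNP distance matrices are slow to turn into trees.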

15.08.2023

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate /home/eacikgoz/.conda/envs/pixy

(pixy) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init bash

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

modified /home/eacikgoz/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

(base) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source deactivate

DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda create -n pixy

WARNING: A conda environment already exists at '/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy'

Remove existing environment (y/[n])? y

^C

^C

CondaValueError: prefix already exists: /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate pixy

ls: cannot access '/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy/share/gdb/auto-load/replace_this_section_with_absolute_slashed_path_to_CONDA_PREFIX/lib': No such file or directory

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init bash

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /home/eacikgoz/.bashrc

No action taken.

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd pixy

-bash: cd: pixy: No such file or directory

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ cd /home/eacikgoz/.conda/envs/pixy

eacikgoz@build-stretch:~/.conda/envs/pixy$ ls

conda-meta

eacikgoz@build-stretch:~/.conda/envs/pixy$ bsub < ./[Link]

-bash: ./[Link]: No such file or directory

eacikgoz@build-stretch:~/.conda/envs/pixy$ cd /netscratch/dep_mercier/grp_novikova/eacikgoz/

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate my_env

Could not find conda environment: my_env

You can list all discoverable environments with `conda info --envs`.

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda env list

# conda environments:

pixy /home/eacikgoz/.conda/envs/pixy

base * /netscratch/dep_mercier/grp_novikova/software/anaconda3

admix /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/admix
anna_syri /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/anna_syri

braker /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/braker

ema_env /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/ema_env

kwip /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/kwip

pixy /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy

eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda info | grep -i 'base environment'

base environment : /netscratch/dep_mercier/grp_novikova/software/anaconda3 (writable)

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda info --base

/netscratch/dep_mercier/grp_novikova/software/anaconda3

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate /home/eacikgoz/.conda/envs/pixy

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/anaconda3/bin/activate

-bash: /home/eacikgoz/anaconda3/bin/activate: No such file or directory

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/miniconda3/bin/activate

-bash: /home/eacikgoz/miniconda3/bin/activate: No such file or directory

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init bash

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/[Link]
no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /home/eacikgoz/.bashrc

No action taken.

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda deactivate

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda deactivate /netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy

deactivate does not accept arguments

remainder_args: ['/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/pixy']

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda deactivate /home/eacikgoz/.conda/envs/pixy

deactivate does not accept arguments

remainder_args: ['/home/eacikgoz/.conda/envs/pixy']

(pixy) eacikgoz@build-stretch:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init bash

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/condabin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/conda-env

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/activate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/bin/deactivate

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/fish/conf.d/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/Conda.psm1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/shell/condabin/conda-hook.ps1

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/lib/python3.8/site-packages/xontrib/[Link]

no change /netscratch/dep_mercier/grp_novikova/software/anaconda3/etc/profile.d/[Link]

no change /home/eacikgoz/.bashrc

- I tried these, but they did not work.

16.08.2023

I asked Mehmet about the pixy analysis.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ nano [Link]

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda env list


# conda environments:

pixy /home/eacikgoz/.conda/envs/pixy

/netscratch/dep_mercier/grp_novikova/software/anaconda3/envs/kwip

base /opt/share/software/packages/miniconda3-4.12.0

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate pixy

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

- bash – fish - tcsh - xonsh - zsh - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'. -problem

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate /home/eacikgoz/.conda/envs/pixy

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

- bash...................

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init pixy

usage: conda init [-h] [--all] [--user] [--no-user] [--system] [--reverse] [--json] [-v] [-q] [-d] [SHELLS ...]

conda init: error: argument SHELLS: invalid choice: 'pixy' (choose from 'bash', 'fish', 'tcsh', 'xonsh', 'zsh', 'powershell')

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda init bash

no change /opt/share/software/packages/miniconda3-4.12.0/condabin/conda.................

No action taken.

- We got the same errors again. Then we tried running source.

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/.bashrc

-bash: $'source\302': command not found

eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/.bash

.bash_history .bash_profile .bashrc


eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/.bashrc

(base) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate pixy

(pixy) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < ./[Link]

Job <4424768> is submitted to queue <multicore20>.

- Earlier I had been running conda activate bash. Mehmet ran conda activate pixy, and it worked.
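The earlier error "-bash: $'source\302': command not found" most likely came from a pasted command. The command probably contained a non-breaking space (U+00A0, whose UTF-8 bytes are \302\240) instead of an ordinary space, so bash did not recognise "source" as a command. Retyping the line fixed it. A small hedged helper to spot and replace such characters in a copied command follows; the function names are hypothetical.

```python
import unicodedata


def find_suspicious_chars(command):
    """Return (position, character, Unicode name) for every non-ASCII character."""
    return [(i, ch, unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(command) if ord(ch) > 127]


def clean_command(command):
    """Replace any Unicode space separator (category Zs, e.g. NO-BREAK SPACE) with ' '."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in command)
```

Running find_suspicious_chars() on a command copied from a document before pasting it into the terminal shows such hidden characters.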

(pixy) eacikgoz@hpcws001:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

4424768 eacikgo PEND multicore2 [Link] pixy[1] Aug 16 11:37.....................

vim for Scripts

- When you first open a file, Vim is in normal mode, so typing does not change the text. Press i (or Shift + i to start at the beginning of the line) to enter insert mode and edit.
- In insert mode you can edit freely. If mouse support is enabled, clicking where you want to go moves the cursor there.
- To exit, press Esc to return to normal mode, then:
- if you made no changes, quit directly with :q
- if you made changes and want to save them, use :wq
- if you made changes and do not want to save them, use :q!
- To go to the top of the file, press gg (pressing the single quote twice, '', only jumps back to the position before the last jump).
- To go to the end of the file, press Shift + g (G).
- To delete from the cursor to the end of the line, press Shift + d (or D).
- Cut the current line: dd
- Copy (yank) the current line: yy
- Paste after the current line with p, before it with P
- To delete several lines, prefix dd with the count, e.g. 5dd deletes 5 lines; they can be pasted again with p.

*Create a folder on biodata and copy your scripts there so that a backup is kept.

PIXY ANALYSIS

Unbiased estimation of nucleotide diversity within and between populations.

- pixy is a command-line tool for painlessly and correctly estimating average nucleotide diversity within (π) and
between (dxy) populations from a VCF. In particular, pixy facilitates the use of VCFs containing invariant (AKA
monomorphic) sites, which are essential for the correct computation of π and dxy in the face of missing data
(i.e. always).
- Pi within 500 kb windows will be computed.
- Two histograms showing coverage and pi will be made.
- Pi within and pi between??
- Fst shows the differentiation between populations.
- Pi for each population
- Pi for NT11 and BAM+IRK without outliers
Figure 19. Pixy analysis for triploid NT samples and tetraploid BAM and IRKU samples to see nucleotide diversity.

- Uliana gave me this script.

- To run it, pixy had to be installed in my own home directory:

I installed pixy on the [Link] host.

conda install --yes -c conda-forge pixy


conda install --yes -c bioconda htslib

The sample list for pixy is prepared like this: the sample names and their population (group) names.
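pixy's "--populations" file is a headerless, tab-separated file. The first column holds the sample name exactly as it appears in the VCF, and the second column holds the population. A hedged sketch for writing it follows. It assumes, purely hypothetically, that the population can be read from the leading letters of each sample name (BAM_1 -> BAM, NT11_1_1 -> NT). In practice the mapping should come from the sample sheet.

```python
import re


def population_from_name(sample):
    """Hypothetical rule: the population is the leading run of letters in the sample name."""
    match = re.match(r"[A-Za-z]+", sample)
    if match is None:
        raise ValueError(f"cannot infer a population for {sample!r}")
    return match.group(0)


def write_populations_file(samples, path):
    """Write a headerless, tab-separated 'sample<TAB>population' file, one sample per line."""
    with open(path, "w") as fh:
        for sample in samples:
            fh.write(f"{sample}\t{population_from_name(sample)}\n")
```

The sample names must match the VCF header exactly. A name that is missing from the file, or spelled differently, is a common reason for pixy to stop with an error about samples.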

(base) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ source ~/.bash_profile

(base) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate pixy

(pixy) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bsub < [Link]

Job <4424777> is submitted to queue <multicore20>.

(pixy) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

4424777 eacikgo PEND multicore2 [Link] pixy[1] Aug 16 13:48

- The script did not work; check it, and if it runs, continue this way.


SHELL SCRIPT

- When searching for a file whose name consists of two words separated by a space, write the name in quotes. Otherwise you get a "too many arguments" error.
- When creating a new directory with mkdir, a name containing a space must also be written in quotes. Otherwise mkdir creates two separate directories.
- rm -r deletes a directory even if it is not empty.
- To search for a word in a file: cat pixy_4424775_1_output.txt | grep 'error' (or simply grep 'error' pixy_4424775_1_output.txt)
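The quoting rules above can be checked with Python's shlex module. shlex splits a command line into arguments the way a POSIX shell does, covering quotes and backslash escapes only; it does not expand variables or globs. shlex.quote() produces a safely quoted version of a name.

```python
import shlex


def split_like_shell(command):
    """Split a command line into its arguments using POSIX shell quoting rules."""
    return shlex.split(command)


for cmd in ['mkdir my folder', 'mkdir "my folder"', 'mkdir my\\ folder']:
    print(f"{cmd!r:24} -> {split_like_shell(cmd)}")

print(shlex.quote("my folder"))
```

The unquoted form gives mkdir two arguments ("my" and "folder"), so two directories are created. The quoted and the backslash-escaped forms give a single argument, "my folder".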

17.08.2023

Pixy still did not work, so I will deal with it last. The first error was about the installation, and reinstalling solved it. But now the real problem is the VCF files. Pixy reports that it cannot read the VCF files, because the file it needs to read does not exist.

- So I restarted the combining process: the combined files were never created because the process had failed.
- Combining will be redone for the tetraploid and hexaploid calls.
- File naming is important. I had named the file [Link], but it needed to be [Link]. The files will be converted after the combining step finishes.

24.08.2023

Figure 20. Obtaining [Link] file by converting [Link] file. [Link]

- The pixy analysis needs a [Link] file, so we first converted the files.
Figure 20. Combining triploid and tetraploid files. [Link]

- For the pixy analysis we have both triploid and tetraploid samples, so we will merge the two files and run a single analysis.
- Some types of analysis are better done with samples called as diploids. For others we need to know the ploidy and call variants based on nQuire. We decided not to call the samples as hexaploids, because then we would also need to call our tetraploid BAMs as octoploids, which is too much. So we will merge the triploid- and tetraploid-called samples and use these for pixy.

28.08.2023

(base) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda create tabix

(base) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda activate tabix

(tabix) eacikgoz@hpcws002:/netscratch/dep_mercier/grp_novikova/eacikgoz$ conda install -c bioconda tabix

Collecting package metadata (current_repodata.json): done

Solving environment: done

- Didn’t work.
31.08.2023

Pixy is not working. In case the problem comes from the VCF file, we regenerated the VCF files. Some samples are triploid and some are tetraploid, so when creating the VCF file we used the triploid and tetraploid [Link] files of the respective samples.

After this step, we need to convert the gVCF file to a VCF file.

We used this script for that.

- While doing this, the script gave "jar file" and "module not found" warnings. So we went to the mapping script and checked whether the modules listed there were the same.
- They were different. As in the figure below, we re-ran only the module section of the mapping script. We then copied those module lines into the corresponding place in the convert script and ran it again, which solved the problem.
Figure 21. We re-ran the module section to fix the module and jar file errors.
06.06.2023

Figure 22. Uliana's script [Link].

Interpretation of Results

pop   chromosome      window_pos_1  window_pos_2  avg_pi              no_sites  count_diffs  count_comparisons  count_missing
BAM   Eucl_scaffold1  1             10000         0.0144161225206966  4569      3528         244726             1157672
IRKU  Eucl_scaffold1  1             10000         0.0169797391518333  3412      207          12191              125299

pixy is valuable in population genetics and evolutionary biology for several reasons. These are some of the key reasons researchers use it:

Estimation of Nucleotide Diversity and Divergence: pixy is designed to estimate nucleotide diversity (π) within populations and nucleotide divergence (dxy) between populations from a VCF. It can also compute Fst between populations. These measures provide essential insights into genetic variation and differentiation, which are fundamental for understanding evolutionary processes.

Handling Missing Data: One significant advantage of pixy is that it handles missing data correctly. Genetic data often contain missing genotypes due to incomplete sequencing or data-quality filtering. Some tools treat missing genotypes as if they matched the reference, which biases π and dxy downward. pixy avoids this by dividing the number of pairwise differences by the number of pairwise comparisons that were actually possible, counting non-missing genotypes at both variant and invariant sites.

Unbiased Estimations: Because π and dxy are computed directly from these counts of differences and valid comparisons, the estimates remain unbiased even when genotypes are missing. This is critical for accurate and reliable population-genetic analysis, and it is also why the input VCF must contain invariant sites.
Understanding pixy output

Output file contents

pixy outputs a slightly different file type for each summary statistic it calculates. The contents of the columns of
these output files are detailed below.

Within population nucleotide diversity (pi)

pop - The ID of the population from the population file

chromosome - The chromosome/contig

window_pos_1 - The first position of the genomic window

window_pos_2 - The last position of the genomic window

avg_pi - Average per site nucleotide diversity for the window. More specifically, pixy computes the weighted average nucleotide diversity per site for all sites in the window, where the weights are determined by the number of genotyped samples at each site.

no_sites - The total number of sites in the window that have at least one valid genotype. This statistic is included for
the user, and not directly used in any calculations.

count_diffs - The raw number of pairwise differences between all genotypes in the window. This is the numerator of
avg_pi.

count_comparisons - The raw number of non-missing pairwise comparisons between all genotypes in the window
(i.e. cases where two genotypes were compared and both were valid). This is the denominator of avg_pi.

count_missing - The raw number of missing pairwise comparisons between all genotypes in the window (i.e. cases
where two genotypes were compared and at least one was missing).
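By these definitions, avg_pi for a window is the summed count_diffs divided by the summed count_comparisons over its sites. For example, the BAM window in the output above has count_diffs = 3528 and count_comparisons = 244726, and 3528 / 244726 ≈ 0.014416 is the reported avg_pi. The sketch below computes both counts for one site from genotype strings. As a simplification, and not pixy's code, it assumes comparisons are made between all called allele copies, which also covers triploid and tetraploid calls. Invariant sites add comparisons but no differences.

```python
from math import comb


def site_pi_counts(genotypes):
    """Pairwise differences and comparisons among all called allele copies at one site.

    genotypes: VCF-style strings such as '0/1', '0/0/1' or './.'; missing copies are skipped.
    """
    alleles = [a for gt in genotypes for a in gt.replace("|", "/").split("/") if a != "."]
    comparisons = comb(len(alleles), 2)
    same = sum(comb(alleles.count(a), 2) for a in set(alleles))  # pairs carrying the same allele
    return comparisons - same, comparisons


def window_avg_pi(sites):
    """Ratio of summed differences to summed comparisons over the sites of one window."""
    diffs = comps = 0
    for genotypes in sites:
        d, c = site_pi_counts(genotypes)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")
```

Leaving the invariant site out of a window would remove its comparisons from the denominator and inflate π. This is why pixy needs invariant sites in the VCF.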

Between population nucleotide divergence (dxy)

pop1 - The ID of the first population from the population file.

pop2 - The ID of the second population from the population file.

chromosome - The chromosome/contig.

window_pos_1 - The first position of the genomic window.

window_pos_2 - The last position of the genomic window.

avg_dxy - Average per site nucleotide divergence for the window.


no_sites - The total number of sites in the window that have at least one valid genotype in both populations. This
statistic is included for the user, and not directly used in any calculations.

count_diffs - The raw number of pairwise, cross-population differences between all genotypes. This is the numerator
of avg_dxy.

count_comparisons - The raw number of non-missing pairwise cross-population comparisons between all genotypes
in the window (i.e. cases where two genotypes were compared and both were valid). This is the denominator of
avg_dxy.

count_missing - The raw number of missing pairwise cross-population comparisons between all genotypes in the
window (i.e. cases where two genotypes were compared and at least one was missing). This statistic is included for
the user, and not directly used in any calculations.
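In the same way, avg_dxy is count_diffs / count_comparisons, where every comparison pairs one allele copy from the first population with one from the second. A minimal hedged sketch follows, not pixy's code. It assumes genotypes are given as VCF-style strings and skips missing allele copies.

```python
def _alleles(genotypes):
    """Flatten genotype strings ('0/1', '0|0|1', './.') into the called allele copies."""
    return [a for gt in genotypes for a in gt.replace("|", "/").split("/") if a != "."]


def site_dxy_counts(pop1, pop2):
    """Cross-population pairwise differences and comparisons at one site."""
    a1, a2 = _alleles(pop1), _alleles(pop2)
    comparisons = len(a1) * len(a2)
    same = sum(a1.count(a) * a2.count(a) for a in set(a1))  # cross pairs with the same allele
    return comparisons - same, comparisons


def window_avg_dxy(sites):
    """sites: list of (pop1_genotypes, pop2_genotypes) pairs, one per site in the window."""
    diffs = comps = 0
    for pop1, pop2 in sites:
        d, c = site_dxy_counts(pop1, pop2)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")
```

Two populations fixed for different alleles give dxy = 1 at that site. Two populations that are both 0/1 give 0.5, because half of the cross-population pairs differ.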

Figure 23. Plot of the pixy analysis results (pi only); this graph still contains the outliers.

- Now we will fix the sample list and run pixy again with both dxy and fst.

07.09.2023

Scaffolding: After sequencing these smaller segments, the challenge lies in assembling them into the correct order to
reconstruct the entire genome. This is where scaffolds come into play. A scaffold is a hypothetical or provisional
framework that represents the relative positions and orientations of the sequenced segments within the genome.
- The pixy analysis produced pi, dxy and fst output files. Plots were drawn in R using these files.

library(ggplot2)

#Pi
unique(pi$chromosome)
pi1 <- pi[pi$chromosome == 'Eucl_scaffold1',]
ggplot(pi1, aes(x = window_pos_1, y = avg_pi, color = pop)) + geom_point()  # scaffold 1
ggplot(pi, aes(x = window_pos_1, y = avg_pi, color = pop)) + geom_point()   # all scaffolds
#calculate average pi for both populations separately, for each scaffold separately

#dxy between
dxy <- [Link]("braya_tet_1_win500000_pops_dxy.txt", sep="\t", header=TRUE)
unique(dxy$chromosome)
dxy1 <- dxy[dxy$chromosome == 'Eucl_scaffold1',]
# colour by population pair: one geom_point() layer per population column would draw
# the same points twice, and the second layer would hide the first
ggplot(dxy, aes(x = window_pos_1, y = avg_dxy, color = paste(pop1, pop2, sep = " vs "))) +
  geom_point(size = 2) +
  scale_color_manual(values = c("pink", "purple")) +
  labs(color = "Population pair") +
  theme_minimal()

#Fst
fst <- [Link]("braya_tet_1_win500000_pops_fst.txt", sep="\t", header=TRUE)
unique(fst$chromosome)
fst1 <- fst[fst$chromosome == 'Eucl_scaffold1',]
ggplot(fst, aes(x = window_pos_1, y = avg_wc_fst, color = paste(pop1, pop2, sep = " vs "))) +
  geom_point(size = 2) +
  scale_color_manual(values = c("pink", "purple")) +
  labs(color = "Population pair") +
  theme_minimal()
- The average will be calculated for each scaffold; I can do this with a loop.
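For the per-scaffold averages, one approach is to sum count_diffs and count_comparisons over all windows of a scaffold and divide once (a ratio of sums). This weights each window by its number of valid comparisons, consistent with how avg_pi itself is defined. A plain mean of avg_pi would instead give small, data-poor windows the same weight as large ones. The planned loop could be sketched in Python as follows. The input is assumed to be a pixy *_pi.txt file with the header shown above, and windows with NA counts are skipped.

```python
import csv
from collections import defaultdict


def per_scaffold_pi(path):
    """Return {(pop, chromosome): sum(count_diffs) / sum(count_comparisons)} from a pixy pi file."""
    diffs = defaultdict(float)
    comps = defaultdict(float)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["count_diffs"] in ("NA", "") or row["count_comparisons"] in ("NA", ""):
                continue  # window without valid comparisons
            key = (row["pop"], row["chromosome"])
            diffs[key] += float(row["count_diffs"])
            comps[key] += float(row["count_comparisons"])
    return {key: diffs[key] / comps[key] for key in comps if comps[key] > 0}
```

The same function works for the dxy file if the key is changed to (pop1, pop2, chromosome).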
