Trần Vĩnh Bảo Ngọc - BTBTWE21113 - Lab 2
Q1.
African swine fever on the SRA server (NCBI):
The SRA server returns 766 results in total. The left-hand panel lists filters, each followed by
its number of matching results in parentheses, such as Source (DNA (400), RNA (366)), Type, Library
layout, ...
+ Platform: ABI SOLID (4), Illumina (525), Ion Torrent (70), Oxford Nanopore (154),
BGISEQ (9), Capillary (2)
+ Library layout: paired (412), single (354)
+ File type: bam (27), fastq (737)
With the Genome filter applied, the SRA server returns 303 results in total.
+ Platform: ABI SOLID (1), Illumina (107), Ion Torrent (54), Oxford Nanopore (139)
+ Library layout: paired (102), single (201)
+ File type: bam (13), fastq (289)
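The counts above were read from the SRA web page. As a minimal sketch, the overall hit count could also be requested programmatically, assuming NCBI's public E-utilities ESearch endpoint and its JSON response layout. The function names are illustrative, and the number returned may differ from 766 because the database grows and the web page may translate the search term differently.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESearch endpoint (assumed to be reachable from this machine).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="sra"):
    """Return an ESearch URL that asks only for the hit count (retmax=0)."""
    query = urllib.parse.urlencode(
        {"db": db, "term": term, "retmode": "json", "retmax": 0}
    )
    return f"{EUTILS_ESEARCH}?{query}"


def count_hits(term, db="sra"):
    """Query NCBI and return the number of matching records (needs network access)."""
    with urllib.request.urlopen(build_esearch_url(term, db)) as response:
        payload = json.load(response)
    # The hit count is returned as a string under esearchresult -> count.
    return int(payload["esearchresult"]["count"])
```

Calling `count_hits("African swine fever")` would then return the current number of SRA records matching the term.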
African swine fever on the SRA server (Run Selector):
- No Genome filter:
+ Continental geography: Europe (128), North America (110), Africa (92), Asia (63),
Uncalculated (9), Empty (403)
+ Organism: Sus scrofa (406), African swine fever virus (364), Sus scrofa domesticus (25),
Ornithodoros moubata (3), Pig metagenome (2), Ornithodoros erraticus (3), Asfivirus (1),
Porcine sapelovirus 1 (1)
+ Platform: ILLUMINA (564), OXFORD_NANOPORE (154), ION_TORRENT (70), BGISEQ (9),
ABI_SOLID (4), CAPILLARY (2), DNBSEQ (2)
- Genome filter:
+ Continental geography: Europe (54), North America (110), Africa (74), Asia (40),
Uncalculated (6), Empty (5)
+ Organism: African swine fever virus (286), Pig metagenome (2), Asfivirus (1)
+ Platform: ILLUMINA (94), OXFORD_NANOPORE (139), ION_TORRENT (53), ABI_SOLID (1),
DNBSEQ (2)
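A quick way to sanity-check the Run Selector counts listed above is to sum each facet and confirm that the totals agree. The sketch below assumes that every run carries exactly one value per facet (so each facet should add up to the same number of runs). The dictionary and function names are illustrative; the numbers are copied from the listings above.

```python
# Facet counts copied from the Run Selector listings (no Genome filter).
no_filter_facets = {
    "continent": {"Europe": 128, "North America": 110, "Africa": 92,
                  "Asia": 63, "Uncalculated": 9, "Empty": 403},
    "organism": {"Sus scrofa": 406, "African swine fever virus": 364,
                 "Sus scrofa domesticus": 25, "Ornithodoros moubata": 3,
                 "Pig metagenome": 2, "Ornithodoros erraticus": 3,
                 "Asfivirus": 1, "Porcine sapelovirus 1": 1},
    "platform": {"ILLUMINA": 564, "OXFORD_NANOPORE": 154, "ION_TORRENT": 70,
                 "BGISEQ": 9, "ABI_SOLID": 4, "CAPILLARY": 2, "DNBSEQ": 2},
}

# Facet counts copied from the Run Selector listings (Genome filter applied).
genome_facets = {
    "continent": {"Europe": 54, "North America": 110, "Africa": 74,
                  "Asia": 40, "Uncalculated": 6, "Empty": 5},
    "organism": {"African swine fever virus": 286, "Pig metagenome": 2,
                 "Asfivirus": 1},
    "platform": {"ILLUMINA": 94, "OXFORD_NANOPORE": 139, "ION_TORRENT": 53,
                 "ABI_SOLID": 1, "DNBSEQ": 2},
}


def facet_totals(facets):
    """Sum the counts inside each facet."""
    return {name: sum(counts.values()) for name, counts in facets.items()}


def facets_agree(facets):
    """True when every facet adds up to the same number of runs."""
    return len(set(facet_totals(facets).values())) == 1


print(facet_totals(no_filter_facets))
print(facet_totals(genome_facets))
```

For the listings above, every facet sums to 805 runs without the Genome filter and to 289 runs with it, so the three breakdowns are consistent with one another.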
Q2.
Filtering for the Illumina platform, African swine fever virus, and Africa leaves 35 results.
ID: SRR10282409
Forward sequence:
- Roughly the first 95 bases have good quality, and the first 36 bases show almost no variability.
- From position 96 onward the scores become variable but stay good (in the green area).
- Only a few bases at the end drop to reasonable quality, but they have a long lower whisker
(meaning that 25% of the quality scores are lower than 26).
- The farther along the read, the lower the quality score and the higher its variability.
- Some quality values are universally low: a subset of sequences has universally poor quality, often
because they were poorly imaged (on the edge of the field of view).
- The red X mark indicates that something is wrong with this statistic.
- The first 12 base positions show large deviations among the 4 types of nucleotide (possibly
because of bias in either the library or the sequencing).
- At the remaining positions the 4 lines overlap, which is good.
- The GC content of the reads (blue line) nearly matches the theoretical distribution (red line) =>
almost no contamination or biased subset in the genomic dataset.
- A small proportion of Ns appears at positions 115-124 => the sequencer could not make a base call
at those positions.
- Only small amounts of Illumina universal adapter (red) and poly-G sequence remain in the dataset.
Reverse sequence:
- Roughly the first 103 bases have good quality, with scores ranging from good to reasonable.
- The remaining bases all have good quality scores overall, although 25% of their quality scores
fall in the poor-quality range.
- Some quality values are universally low: a subset of sequences has universally poor quality, often
because they were poorly imaged (on the edge of the field of view).
- The red X mark indicates that something is wrong with this statistic.
- The first 12 base positions show large deviations among the 4 types of nucleotide (possibly
because of bias in either the library or the sequencing).
- At the remaining positions the 4 lines overlap, which is good.
- The GC content of the reads (blue line) nearly matches the theoretical distribution (red line) =>
almost no contamination or biased subset in the genomic dataset.
- No Ns appear => the sequencer made no ambiguous base calls at any position.
- Only small amounts of Illumina universal adapter (red) and poly-G sequence remain in the dataset.
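The per-position observations above (quality dropping toward the end of the read, Ns at particular positions) rest on two simple statistics. As a minimal sketch, they can be computed directly from FASTQ records, assuming Phred+33 quality encoding and plain 4-line records. The toy reads and function names are illustrative and are not taken from SRR10282409, and this is not how the quality-control tool itself is implemented.

```python
from statistics import median


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def parse_fastq(lines):
    """Yield (sequence, quality) pairs from plain 4-line FASTQ records."""
    records = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        yield records[i + 1], records[i + 3]


def per_position_summary(reads):
    """Return, for each base position, the median quality and the fraction of N calls."""
    columns = {}
    for seq, qual in reads:
        for pos, (base, q) in enumerate(zip(seq, phred_scores(qual)), start=1):
            col = columns.setdefault(pos, {"quals": [], "n": 0})
            col["quals"].append(q)
            col["n"] += int(base == "N")
    return {
        pos: {
            "median_q": median(col["quals"]),
            "n_fraction": col["n"] / len(col["quals"]),
        }
        for pos, col in sorted(columns.items())
    }


# Two made-up reads: quality falls at the last positions and one read ends in N.
toy_fastq = ["@r1", "ACGTN", "+", "IIII#",
             "@r2", "ACGTA", "+", "III5#"]

for pos, stats in per_position_summary(parse_fastq(toy_fastq)).items():
    print(pos, stats)
```

On real data the same loop would run over the decompressed forward and reverse FASTQ files separately, which makes it possible to compare the two reads position by position.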
Q3.
Read 1: forward sequencing
- minimum overlap: 3