Bioinformatics: Introduction and Methods
Week 3
Sequence Database Search
Due date: Nov 9, 7:59 AM CET
Attempts: 3 every 8 hours
To pass: 80% or higher
Grade: 20% (your highest score is kept)
Graded Quiz • 30 min
Total points: 5
Question 1
Which one of the following statements about the E-value in BLAST results is NOT correct?
1 point
When the E-value is fixed, its corresponding p-value is fixed as well.
It can be larger than 1.
It indicates how much we can trust the corresponding hit sequence.
When it is close to 1, it is nearly identical to its corresponding p-value.
It depends on both the length of the query sequence and the size of the database.
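The statements above rest on the Karlin-Altschul statistics that BLAST uses. The expected number of chance alignments scoring at least S is E = K * m * n * exp(-lambda * S), where m and n are the (effective) lengths of the query and the database, and K and lambda depend on the scoring system. The probability of at least one such alignment is p = 1 - exp(-E). Here is a minimal Python sketch of both formulas. The function names are illustrative, and no real K or lambda values are assumed.

```python
import math


def e_value(score: float, m: int, n: int, k: float, lam: float) -> float:
    """Expected number of chance alignments with score >= `score` (Karlin-Altschul)."""
    return k * m * n * math.exp(-lam * score)


def p_value_from_e(e: float) -> float:
    """Probability of at least one chance alignment, assuming Poisson-distributed hit counts."""
    return 1.0 - math.exp(-e)


# Print p next to a few E values so the two quantities can be compared directly.
for e in (1e-5, 0.01, 1.0, 5.0):
    print(f"E = {e:g}\tp = {p_value_from_e(e):.6g}")
```

In real searches, BLAST reports the K and lambda values for the chosen matrix and gap costs alongside its results. Those are the values to plug into `e_value`.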
Question 2
Which of the following cannot help BLAST reduce false positives?
1 point
Build an index for the database ahead of time
Discard isolated hits and keep only those that form hit clusters
Mask repetitive low-complexity regions
Use the E-value to evaluate the statistical significance of alignments
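Masking is normally done with a dedicated filter. NCBI BLAST uses SEG for protein sequences when low-complexity filtering is enabled. The sketch below is a simplified stand-in, not SEG itself: it replaces with X every residue that falls inside a sliding window whose Shannon entropy is low. The window size, entropy cutoff, and example sequence are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2) -> str:
    """Replace residues in low-entropy windows with 'X'.

    `window` and `min_entropy` are illustrative defaults, not SEG's parameters.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# A made-up sequence containing a poly-Q run, which should end up masked.
print(mask_low_complexity("MKTAYIAKQRQISFVKSHFSRQQQQQQQQQQQQQQQLEDGWNA"))
```

Masked residues cannot take part in seed words, so repetitive stretches no longer produce spurious hits against unrelated sequences that share the same kind of repeat.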
Question 3
Which one of the following cannot improve the speed of BLAST? (The improvement need not be significant compared with earlier pairwise sequence alignment algorithms.)
1 point
Use shorter seed words
Choose only neighborhood words that are highly similar to the current seed word
Build an index for the database ahead of time
Do not compute the p-value; compute the E-value only
Discard isolated hits and keep only those that form hit clusters
Mask repetitive low-complexity regions of the database before running BLAST
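Several of the speed-related options above can be illustrated with a toy seeding stage: the seed word length, the neighborhood-word threshold, and a prebuilt database index. This is a simplified sketch, not NCBI's implementation. The substitution scores are an assumed toy scheme standing in for BLOSUM62, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Toy substitution scores standing in for BLOSUM62 (an assumption for brevity):
# identical residues score 5, residues in the same rough chemical group score 1,
# and everything else scores -2.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def score(a: str, b: str) -> int:
    if a == b:
        return 5
    return 1 if GROUP_OF[a] == GROUP_OF[b] else -2


def neighborhood(word: str, threshold: int) -> list[str]:
    """All words of the same length scoring >= threshold against `word`.

    A higher threshold keeps fewer, more similar neighbors, so fewer index lookups.
    """
    return [
        "".join(cand)
        for cand in product(AMINO_ACIDS, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, cand)) >= threshold
    ]


def build_index(database: dict[str, str], w: int) -> dict[str, list[tuple[str, int]]]:
    """Map every length-w word to the (sequence id, offset) pairs where it occurs.

    Built once, ahead of time, so each query word becomes a dictionary lookup.
    """
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in database.items():
        for i in range(len(seq) - w + 1):
            index[seq[i:i + w]].append((seq_id, i))
    return index


def seed_hits(query: str, index: dict[str, list[tuple[str, int]]],
              w: int, threshold: int) -> list[tuple[int, str, int]]:
    """(query offset, sequence id, database offset) for every neighborhood-word match."""
    hits = []
    for q in range(len(query) - w + 1):
        for word in neighborhood(query[q:q + w], threshold):
            for seq_id, d in index.get(word, []):
                hits.append((q, seq_id, d))
    return hits
```

For example, `seed_hits("KIL", build_index({"s1": "MKVLA"}, 3), 3, 11)` finds the database word `KVL`. It is reached through the neighborhood of `KIL` because I and V fall in the same toy group. Raising the threshold to 15 leaves only exact matches.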
Question 4
Run BLAST on the following protein sequence to find similar protein sequences:
>Protein Sequence
MVRAPCCEKMGLKKGPWTPEEDQILISYIQSNGHGNWRALPKLAGLLRCGKSCRLRWTNYLRPDIKRGNFTREEEDSIIQ
LHEMLGNRWSAIAARLPGRTDNEIKNVWHTHLKKRLKNYQPPQSSKRHSKNKDSKAPCTSQIALKSSNNFSNIKEDGPGL
GSGPNSPQLSSSEMSTVTADSLAVTMDISNSNDQIDSSENFIPEIDESFWTDGLSTSGGGEELQVQFPFHDMKQENVEKD
VGAKLEDDMDFWYSVFIKSGDLLELPEF
BLAST: http://blast.ncbi.nlm.nih.gov
Parameters:
Database: Non-redundant protein sequences (nr)
Algorithm: blastp
Word size: 3
Matrix: BLOSUM62
Gap Costs: Existence: 11 Extension: 1
Leave all other parameters at their defaults.
Q: Which program on the BLAST homepage should you use for this analysis?
1 point
nucleotide blast
protein blast
blastx
tblastn
tblastx
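The web-form settings above can also be expressed as a request to NCBI's BLAST Common URL API. The sketch below only builds the URL-encoded `CMD=Put` submission body and sends nothing over the network. The parameter names (PROGRAM, DATABASE, QUERY, WORD_SIZE, MATRIX_NAME, GAPCOSTS) reflect my understanding of that API and should be checked against NCBI's documentation before use.

```python
from urllib.parse import urlencode

# Endpoint the body would be POSTed to (not contacted by this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query_fasta: str) -> str:
    """URL-encoded body for a BLAST search submission (CMD=Put).

    Mirrors the quiz settings; every other parameter is left to the server defaults.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": "blastp",        # protein query against a protein database
        "DATABASE": "nr",           # non-redundant protein sequences
        "QUERY": query_fasta,
        "WORD_SIZE": "3",
        "MATRIX_NAME": "BLOSUM62",
        "GAPCOSTS": "11 1",         # gap existence 11, extension 1
    }
    return urlencode(params)


# Query truncated here for brevity; a real search would pass the full FASTA record.
body = build_put_request(">Protein Sequence\nMVRAPCCEKMGLKKGPWTPEEDQ")
print(body)
```

As far as I know, submitting this body returns a request ID (RID), which is then polled with `CMD=Get` until the results are ready.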
Question 5
In the BLAST results from Question 4, which species does the sequence with the highest similarity score come from?
1 point
Capsicum annuum (chili pepper)
Datura metel (devil's trumpet)
Petunia x hybrida (garden petunia)
Solanum lycopersicum (tomato)
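One way to answer a question like this programmatically is to take the hit descriptions and scores exported from the results page and pick the hit with the highest score. The sketch below assumes a hypothetical list of (description, bit score) pairs. It also assumes that each nr description ends with the organism name in square brackets, which is the usual convention but is worth verifying against your own export.

```python
import re
from typing import List, Optional, Tuple

# Assumed convention: nr descriptions end with the organism in square brackets,
# e.g. "... [Genus species]".
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_of(description: str) -> Optional[str]:
    """Organism name in trailing square brackets, or None if absent."""
    match = SPECIES_RE.search(description)
    return match.group(1) if match else None


def top_hit_species(hits: List[Tuple[str, float]]) -> Optional[str]:
    """Species of the hit with the highest bit score.

    `hits` is a hypothetical list of (description, bit score) pairs.
    """
    if not hits:
        return None
    best_description, _ = max(hits, key=lambda hit: hit[1])
    return species_of(best_description)
```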