
Worked Example BioMart Ensembl Tutorial

The document provides a step-by-step worked example for using the BioMart data mining tool within Ensembl to retrieve gene and protein information for cow genes located on chromosome 1. It outlines 13 steps for constructing a query in BioMart to find the Ensembl Gene IDs, Entrez Gene IDs, and InterPro protein domains predicted for known genes on this chromosome. The results of the query are displayed in a table that can be viewed or exported.

Uploaded by

Reginald Shoe


Data mining in Ensembl with BioMart: Worked Example

The cow gene encoding Myosin light chain kinase (MYLK_Bovin) is located on chromosome 1. What other known genes are found on cow chromosome 1? What are their Ensembl Gene IDs and Entrez Gene IDs? Do they have any domains predicted by InterPro? Follow the worked example below to answer these questions.

STEP 1: Go to the Ensembl main page www.ensembl.org

STEP 2: Click on BioMart

STEP 3: Select the database Ensembl genes (version 48), then select the species of interest (Bos taurus genes) under Choose Dataset.

STEP 4: Narrow the gene set by clicking Filters on the left. Click on the + in front of REGION to expand the choices.

STEP 5: Select Chromosome 1

STEP 6: Expand the GENE panel and select Status (gene) as known. The filters have determined our gene set. Click Count (at the top) to see how many genes have passed these filters.

STEP 7: Click on Attributes to select output options (i.e. what we would like to know about our selected gene set).

STEP 8: Expand the GENE panel. Ensembl Gene and Transcript IDs are selected by default.

Note the summary of selected options. The order of attributes determines the order of columns in the result table.

STEP 9: Select, along with the default options, Description

STEP 10: Expand the EXTERNAL panel to select EntrezGene ID.

STEP 11: Click RESULTS at the top to preview the output.

To save a file of the complete table, click Go. Alternatively, email the results to any address, or select View All rows as HTML.

STEP 12: Go back to Attributes by clicking on it, and add InterPro Short Description from the PROTEIN section.

STEP 13: Clicking Results should now show a table like the one below. Select View All rows.

Result Table
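The selections made in Steps 3 to 13 (one dataset, two filters, an ordered list of attributes) can also be written down as a BioMart XML query. The following is a minimal sketch, not the tutorial's own method. The service URL and every internal name used here (`btaurus_gene_ensembl`, `chromosome_name`, `status`, `entrezgene`, `interpro_short_description`) are assumptions that vary between Ensembl releases, so check them against the mart you are querying. The sketch only builds the query text and URL; it does not contact any server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed address of the BioMart XML service; verify for your Ensembl release.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Return a BioMart XML query string.

    dataset    -- internal dataset name (STEP 3)
    filters    -- dict of filter name -> value (STEPS 4 to 6)
    attributes -- list of attribute names; their order is the
                  column order of the result table (STEPS 7 to 12)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names in the first row
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        # A list of IDs is sent as one comma-separated value.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# The worked example: known cow genes on chromosome 1.
# All internal names below are assumed, not taken from the tutorial.
xml_query = build_query(
    dataset="btaurus_gene_ensembl",
    filters={"chromosome_name": "1", "status": "KNOWN"},
    attributes=[
        "ensembl_gene_id",            # default (STEP 8)
        "ensembl_transcript_id",      # default (STEP 8)
        "description",                # STEP 9
        "entrezgene",                 # STEP 10
        "interpro_short_description"  # STEP 12
    ],
)

# URL that would return the table if the names match the server's mart.
url = MART_URL + "?" + urlencode({"query": xml_query})
print(xml_query)
```

Because the attribute list is ordered, reordering it changes the column order of the output, which mirrors the note after STEP 8.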

END of BIOMART Worked Example

V) BIOMART - Exercises
These exercises have been designed to familiarise you with the different questions you can answer with this tool, and the types of data you can retrieve with BioMart.

1. Retrieve all SNPs for novel human G-protein coupled receptor genes (GPCRs; use the InterPro domain ID IPR000276) on chromosome 2.

Note: As this is the first exercise, we walk you through BioMart step by step this time (but of course you can also try to do this exercise without our help!).

Start a new BioMart session by clicking New, or go back to the Ensembl homepage and click on Mine Ensembl with BioMart under Ensembl tools.

Choose the database and the dataset for your query as follows:
- Select Ensembl 48.
- Select Homo sapiens genes (NCBI36).

Click on Filters at the left. Filter this dataset to select your genes of interest as follows:
- Expand the REGION section at the right by clicking on the +. Select Chromosome 2. Click [count] at the top of the panel and note the number of Ensembl genes on Homo sapiens chromosome 2.
- In the GENE section, select Status (gene) NOVEL.
- In the PROTEIN section, select the second Limit to genes with these family or domain IDs option. Select Interpro ID(s) and enter IPR000276 in the box. Click [count] again and note that the number of genes is updated.

Click on Attributes (at the left). Select the output for your gene list as follows:
- Select the SNPs Attribute Page.
- In the GENE section, Ensembl Gene ID and Ensembl Transcript ID are selected by default; also select Ensembl Peptide ID and Ensembl Peptide length.
- In the GENE ASSOCIATED SNPs section, select Reference ID, Allele, Peptide location (aa), Location in Gene (coding etc), Synonymous Status and Peptide Shift.

Note: Clicking on count now will not show an altered number. Attribute selections do not affect the count (i.e. the number of genes that have passed the filters).

Click on Results (at the top) to obtain the first 10 rows of your table. To obtain the entire table, select View all rows as HTML or export a file by clicking Go.

Note that the output for this query gives you one row for each SNP, and if there are alternative transcripts then SNP data is given for each transcript. This means that a particular SNP may appear more than once. Find the coding SNPs, and note that you have information about the effect of each SNP and its location within the protein. Synonymous Status is yes for silent mutations. Two amino acids will be shown in the Peptide Shift column if there are two alleles at the protein level. The Peptide location (aa), Synonymous Status and Peptide Shift will all be blank if the SNP is not in a coding region.

2. Click New to start a new query. Retrieve the gene structure (i.e. start and end coordinates of exons) of the mouse gene ENSMUSG00000042351.

3. Retrieve peptide sequences of all chicken genes on chromosome 1.

4. The file http://www.ebi.ac.uk/~xose/Affy_exercise.txt contains a list of probeset IDs from a microarray experiment using the Affymetrix array HG-U133 Plus 2.0 (human). Retrieve the 500 bp upstream of the transcripts matching these probeset IDs.

5. Retrieve the 5' UTR sequence of cow genes on chromosome 5 that possess a UTR.

6. Retrieve the sequence (including reference ID in the header) of all human SNPs that have an ID from The SNP Consortium (TSC), from chromosome 6 between 15 Mb and 15.2 Mb, with 200 bases of flanking sequence.

7. Retrieve the mouse homologues of Homo sapiens genes CASP1, CASP2, CASP3, and CASP4. (These are HGNC symbols for the genes.)

8. Design your own query!
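For exercise 1, the note that a SNP is listed once per transcript matters when counting coding SNPs from an exported file. The sketch below shows one way to collapse the repeated rows and pick out non-synonymous coding SNPs from a tab-separated export. The sample rows, IDs and exact cell values (`coding`, `yes`, `no`) are invented for illustration; a real export may use different column names or capitalisation.

```python
import csv
import io

# Made-up rows in the shape of a TSV export; not real BioMart output.
SAMPLE_TSV = (
    "Ensembl Gene ID\tEnsembl Transcript ID\tReference ID\t"
    "Location in Gene\tSynonymous Status\tPeptide Shift\n"
    "GENE_A\tTRANSCRIPT_1\tsnp001\tcoding\tno\tA/T\n"
    "GENE_A\tTRANSCRIPT_2\tsnp001\tcoding\tno\tA/T\n"   # same SNP, second transcript
    "GENE_A\tTRANSCRIPT_1\tsnp002\tcoding\tyes\tL\n"
    "GENE_A\tTRANSCRIPT_1\tsnp003\tintronic\t\t\n"      # non-coding: blank columns
)


def coding_snps(tsv_text):
    """Return one row per coding SNP, keeping the first occurrence."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    first_seen = {}
    for row in reader:
        if row["Location in Gene"] != "coding":
            continue
        # setdefault keeps the first row and ignores later duplicates.
        first_seen.setdefault(row["Reference ID"], row)
    return list(first_seen.values())


snps = coding_snps(SAMPLE_TSV)
non_synonymous = [r["Reference ID"] for r in snps
                  if r["Synonymous Status"] == "no"]

print(len(snps), "coding SNPs;", len(non_synonymous), "non-synonymous")
```

Keying on Reference ID alone assumes the SNP has the same consequence in every transcript; if that may not hold, key on the pair (Reference ID, Ensembl Transcript ID) instead.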

Answers (BioMart)
1. You should find one novel gene on chromosome 2 with this InterPro domain. (Note: there can be more than one gene with one InterPro domain.) The result set has one transcript and a total of 261 rows of output (to see this, change the option from TSV to XLS under Export all results and click Go, then open the file in Excel so you don't have to count the rows manually). The transcript has 9 coding SNPs (Location in Gene is coding), most of which are non-synonymous (Synonymous Status is no) and thus affect the amino acid sequence of the encoded peptide. One allele is a stop codon. Can you find it?

2. Click New. Select:
Database and dataset: Ensembl 48 and Mus musculus genes (NCBIM36).
Filters: GENE: ID list limit: Ensembl Gene ID(s); enter the mouse gene ID.
Attributes: Structures; in the EXON panel select Ensembl Exon ID, Exon Start and Exon End.
Click Results. You should find 7 exons. Take the link from the Ensembl Gene ID in your output back to the GeneView page to confirm the BioMart data with the gene structure displayed on this page.

3. Database and dataset: Ensembl 48 and Gallus gallus genes (NCBI36).
Filters: REGION: Chromosome 1.
Attributes: Sequences: Peptide Sequences, and add to the header Description and Ensembl Peptide ID along with the default options (Ensembl Gene ID and Transcript ID).
Count should show 2297 Ensembl genes.

4. Database and dataset: Ensembl 48 and Homo sapiens genes (NCBI36).
Filters: GENE: ID list limit: Affy hg u133 plus 2 ID(s); enter the list of probeset IDs.
Attributes: Sequences; select Flank (Transcript), Upstream flank 500. In the header, apart from the options already selected by default, select Ensembl Transcript ID.
You should find upstream sequences for the transcripts of 31 genes. (Hint: click count to see the number of genes!)

5. Database and dataset: Ensembl 48 and Bos taurus genes (NCBI36).
Filters: REGION: Chromosome 5; GENE: Entries with a 5' UTR Only.
Attributes: Sequences; select 5' UTR.
Count should show 547 genes.

FYI: The Flank option in the Sequences Attribute page:

If you choose the option Flank (Gene) you will see only one upstream sequence per gene in the output. Where a gene has multiple transcripts, the upstream sequence of the transcript that extends the furthest at the 5' end is shown. If you want to export the upstream sequences for each transcript, choose the option Flank (Transcript).

6. Database: SNP and dataset: Homo sapiens SNPs (dbSNP127; HGVbase 15; TSC 1; affy GeneChip Mapping Array).
Filters: REGION: Chromosome 6, Base pair Start 15000000, Base pair End 15200000; GENERAL SNP FILTERS: SNP source: SNPs with TSC ID(s) Only.
Attributes: Sequences: SEQUENCES: SNP sequences, Upstream flank 200, Downstream flank 200; SNP: SNP attributes, select Reference ID.
You should find 69 SNPs.

7. Database: Ensembl 48. Dataset: Homo sapiens genes (NCBI36).
Filters: GENE: ID list limit: HGNC Symbol(s). Enter the human HGNC (HUGO) symbols in the box: CASP1, CASP2, CASP3, and CASP4.
Attributes: Under Homologs, select in the MOUSE ORTHOLOGS panel Mouse Ensembl Gene ID and Mouse External ID. Also select Ensembl Gene ID and Transcript ID (default options) and Description in the GENE panel (these will be for the starting dataset, i.e. human).
Results displays the mouse ortholog
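The difference between Flank (Gene) and Flank (Transcript) described in the FYI note after answer 5 can be illustrated with coordinates. This is a sketch of the rule as the note states it, not BioMart's actual implementation. The `Transcript` type, the function names and the example coordinates are all hypothetical, and it assumes 1-based inclusive coordinates with strand given as +1 or -1.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    transcript_id: str
    start: int   # lowest genomic coordinate
    end: int     # highest genomic coordinate
    strand: int  # +1 (forward) or -1 (reverse)


def upstream_flank(t, length):
    """Genomic (start, end) of the region upstream of the 5' end."""
    if t.strand == 1:
        # Forward strand: the 5' end is the start coordinate.
        return (max(1, t.start - length), t.start - 1)
    # Reverse strand: the 5' end is the end coordinate.
    return (t.end + 1, t.end + length)


def transcript_flanks(transcripts, length):
    """Flank (Transcript): one upstream region per transcript."""
    return {t.transcript_id: upstream_flank(t, length) for t in transcripts}


def gene_flank(transcripts, length):
    """Flank (Gene): upstream region of the transcript whose 5' end
    extends the furthest. Assumes all transcripts share one strand."""
    if transcripts[0].strand == 1:
        outermost = min(transcripts, key=lambda t: t.start)
    else:
        outermost = max(transcripts, key=lambda t: t.end)
    return upstream_flank(outermost, length)


# Hypothetical gene with two forward-strand transcripts.
plus = [Transcript("T1", 1000, 5000, 1), Transcript("T2", 1200, 5000, 1)]
# Hypothetical gene with two reverse-strand transcripts.
minus = [Transcript("T3", 2000, 8000, -1), Transcript("T4", 2000, 8300, -1)]

print(gene_flank(plus, 500))         # one region for the whole gene
print(transcript_flanks(plus, 500))  # one region per transcript
print(gene_flank(minus, 500))
```

For the forward-strand gene, the gene-level flank coincides with the flank of T1, the transcript starting furthest upstream, while the transcript-level option also reports a separate region for T2.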
