NCBI Resources: by sequence

Contains information about the NCBI databases to be used as a teaching tool.

Intro to BLAST

BLAST (Basic Local Alignment Search Tool) is a set of algorithms available at NCBI that lets you use a DNA, RNA, or protein sequence to find similar sequences in the NCBI databases. There are many flavors of BLAST that accommodate different types of sequence, but they operate in similar ways. See below for more details.

What is an E-value?

An E-value, aka the “expect value”, is the number of matches scoring at least as well as the one you are looking at that you’d expect to get by random chance in a given database/query combination (false positives). Smaller numbers are better: the closer the E-value is to zero, the less likely the match is a chance hit. The E-value depends on the length of the query and the size of the database.

For example, if you search two databases with the same sequence, the same alignment would get a different E-value in each, depending on the size of the database. The E-value would be bigger in the larger database, because a larger database provides more opportunity for false positives.
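To make the database-size effect concrete, here is a minimal Python sketch of the commonly cited Karlin-Altschul estimate, E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw alignment score. The values of K and lambda below are placeholders chosen for illustration only (real values depend on the scoring system BLAST is using), and the function name is hypothetical.

```python
import math

def expect_value(raw_score, query_len, db_len, k=0.1, lam=0.3):
    """Estimate E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholders, not values reported by BLAST.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Same query and same alignment score, searched against two database sizes.
small_db = expect_value(raw_score=60, query_len=380, db_len=1_000_000)
large_db = expect_value(raw_score=60, query_len=380, db_len=100_000_000)

print(f"small database: E = {small_db:.3g}")
print(f"large database: E = {large_db:.3g}")
```

In this sketch the E-value grows in direct proportion to the database length, and it shrinks quickly as the alignment score goes up.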

What is an Alignment Score?

An alignment score is a measure of how well the query and a given search result (subject) are aligned.

The score is determined by the scoring parameters: match/mismatch scores, which are usually positive for matching bases and negative for mismatched bases, and gap penalties, which subtract from the score for opening and extending gaps in the alignment.

Higher alignment scores are better, but they depend heavily on sequence length, so don't try to compare alignment scores between different BLAST queries.
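As a rough illustration of how these parameters add up, the Python sketch below scores two sequences that have already been aligned, with "-" marking a gap. The function name and the parameter values are assumptions for this example, and the gap convention (the first gap position costs gap_open, each further position costs gap_extend) is one simple choice; BLAST's own scoring schemes may count gaps differently.

```python
def alignment_score(query, subject, match=1, mismatch=-2,
                    gap_open=-5, gap_extend=-2):
    """Score two pre-aligned strings of equal length ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_side = None  # which sequence the current gap is in, if any
    for q, s in zip(query, subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        side = "query" if q == "-" else "subject" if s == "-" else None
        if side is None:
            # Aligned pair of bases: reward a match, penalize a mismatch.
            score += match if q == s else mismatch
        else:
            # Continuing the same gap is cheaper than opening a new one.
            score += gap_extend if side == gap_side else gap_open
        gap_side = side
    return score

print(alignment_score("ACGTTAGC", "ACG--ACC"))
```

Because every aligned position adds to or subtracts from the total, a longer alignment can reach a higher score than a short one of similar quality, which is why scores from different queries are not directly comparable.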

What do you need to know before you can BLAST?

Before you can use BLAST, here's the information you need:

  1. Query: Nucleotide or protein sequence
  2. Type of result (aka subject or sbjct): are you trying to pull out nucleotides or proteins?
  3. Algorithm: depends on the combination of Query and Subject from above
    • Nucleotide (blastn) - search for nucleotides using nucleotides
    • Protein (blastp) - search for proteins using proteins
    • blastx - search for proteins using a translated nucleotide query
    • tblastn - search for translated nucleotides using a protein query
    • tblastx - search for translated nucleotides using a translated nucleotide query
  4. The Database you want to search
    • Organism specific? (human, mouse, rat)
    • Limit to certain sequences
    • Exclude certain sequences
    • nr - all nonredundant sequences
    • You can basically create a custom database by selecting options in "Choose Search Set"
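The choice of algorithm in step 3 can be sketched as a small lookup in Python. The function and argument names here are hypothetical, made up for this illustration; only the program names are the ones listed in step 3.

```python
# (query type, database type) -> program, for the direct comparisons.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}

def choose_program(query_type, database_type, compare_as_protein=False):
    """Pick a BLAST program from the query and database sequence types."""
    pair = (query_type, database_type)
    # tblastx translates both a nucleotide query and a nucleotide database.
    if compare_as_protein and pair == ("nucleotide", "nucleotide"):
        return "tblastx"
    if pair not in PROGRAMS:
        raise ValueError(f"unknown combination: {pair}")
    return PROGRAMS[pair]

print(choose_program("nucleotide", "protein"))
```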

Nucleotide BLAST example

Your advisor walks into the lab and says:

"I have this nucleotide sequence derived from wild-type mouse DNA. What gene is it? What does this gene do?"

CTACCGTAGACATACAAAACTTAAAGCTTTTCACACTAACTCAATTCTTATAGTAATTTTATTTGCCCTGTGCTAAAACCTTAAGACAACCCTTT

TCAATGTAATTACACAAAAGTGCACAGAAACCTAACTGCTAGCTTCTACTCTGTCTTGAGGCTGCATGTAAAGTTCTAGCTTACGTTCCAAGCTT

GTCCAAGTACGACTCCAGTACATGGTGGCTGCTCTTCTTTGTACCAAAGAGTTCAGATGGTCTCTAGTTCCTTCTCTAATGCTCTGATGTTCCTC

CAATGCTATCAGGTCTGCAATAATGAAGCCCAGTGCACCGAGACACTGGATTTGCTTGTGCAAGCCCTCAGACATTCCAGGACACAACTGGTTA

How do you figure this out? USE BLAST! 

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Entering your query:

  1. Choose the type of BLAST (nucleotide in this case)
  2. Paste the sequence into the query box ("Enter Query Sequence")
  3. Name your search in the “Job title” field
  4. Choose search set: (stick with nr for now)
  5. Program Selection: MEGABLAST
  6. Click BLAST!
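Before pasting, it can help to tidy the sequence and confirm that it really is a nucleotide sequence, since that decides step 1. The Python sketch below is one way to do that, using the first two lines of the advisor's sequence as input. All function names and the record name "mouse_query" are made up for this example, and the alphabet check is a rough heuristic (a protein made up only of A, C, G, T, U and N would pass it).

```python
import re

def clean_sequence(raw):
    """Remove whitespace and line breaks, and uppercase the sequence."""
    return re.sub(r"\s+", "", raw).upper()

def looks_like_nucleotide(seq):
    """True if the sequence is non-empty and uses only A, C, G, T, U or N."""
    return bool(seq) and set(seq) <= set("ACGTUN")

def to_fasta(name, seq, width=70):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines)

# First two lines of the example sequence, blank line included.
raw = """
CTACCGTAGACATACAAAACTTAAAGCTTTTCACACTAACTCAATTCTTATAGTAATTTTATTTGCCCTGTGCTAAAACCTTAAGACAACCCTTT

TCAATGTAATTACACAAAAGTGCACAGAAACCTAACTGCTAGCTTCTACTCTGTCTTGAGGCTGCATGTAAAGTTCTAGCTTACGTTCCAAGCTT
"""

query = clean_sequence(raw)
fasta = to_fasta("mouse_query", query)

print("nucleotide" if looks_like_nucleotide(query) else "not nucleotide")
print(fasta)
```

The printed FASTA record can be pasted into the query box as-is.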

Creating a BLAST query and interpreting results