How to update or install your local NCBI BLAST database in a Unix shell using update_blastdb.pl

I recently updated my local BLAST database and I thought I would revisit the process of installing/updating, but this time using the included update_blastdb.pl script.

First, make sure the BLAST+ programs are on your PATH. Because I work on a cluster, all I do is load the module.

 
module load bio/blast+/2.7.1

I then delete the old database folder and make a new folder with exactly the same name (this keeps your old scripts working). If you are doing a fresh install, just create a new folder.

 
module load bio/blast+/2.7.1
rm -r blastdb_folder_name
mkdir blastdb_folder_name
cd blastdb_folder_name

Now, from inside the new folder, use the Perl script to download the database of your choice. The script saves files to the current working directory, and the --decompress option automatically unpacks the downloaded tar.gz archives. Depending on which database you choose and your internet speed, this can be a lengthy process.

 
perl update_blastdb.pl --decompress nt

I also download the taxonomy database (taxdb) so that I can learn more about my BLAST hits. Because I run the script without --decompress here, this archive has to be unpacked manually.

 
module load bio/blast+/2.7.1 
perl update_blastdb.pl taxdb
tar -xzf taxdb.tar.gz
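
A quick sanity check after unpacking can save some confusion later. This is only a sketch: it assumes the taxdb archive unpacks to the files taxdb.btd and taxdb.bti, and the function name taxdb_ready is my own invention, so check what your copy of the archive actually contains.

```shell
# Hypothetical helper: succeed only if both taxonomy files exist and are non-empty.
# Assumption: the taxdb archive unpacks to taxdb.btd and taxdb.bti.
taxdb_ready() {
    local dir="${1:-.}"
    [ -s "$dir/taxdb.btd" ] && [ -s "$dir/taxdb.bti" ]
}

# Usage (commented out so that sourcing this snippet has no side effects):
# taxdb_ready "$PWD" && export BLASTDB="$PWD"
```

Setting the BLASTDB environment variable to the database folder is how the BLAST+ programs find the taxonomy files when you run them from somewhere else.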

That’s it! Now you should have an updated BLAST database.

Installing and querying a local NCBI nucleotide database (nt)

While the online version of the non-redundant nucleotide database (nr/nt) is useful for small-scale applications, checking an assembly for contamination is best done with a local NCBI nt database. Read along for a guide on how I installed and then queried the NCBI nt database on a Unix cluster.

First, you need to make a folder where you will store the entire database, and then enter that folder.

mkdir NCBI_nt_DB
cd NCBI_nt_DB

Next, you need to download the entire nt database from the NCBI website. Note that this database is almost 50 GB in size, so make sure that you have sufficient space. The download may take some time depending on the speed of your connection.

wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz"
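
Before starting a download this large, it can be worth checking that the folder has enough free space. Here is a minimal sketch; the function name has_free_space and the 100 GB figure in the usage line are placeholders of mine, so pick a threshold that covers both the archives and the unpacked files.

```shell
# Hypothetical helper: succeed if DIR has at least NEED_GB gigabytes available.
has_free_space() {
    local dir="$1" need_gb="$2" free_kb
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # column 4 of the second line is the available space.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Usage (commented out so that sourcing this snippet has no side effects):
# has_free_space . 100 || echo "Not enough room for the nt database" >&2
```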

Next, each tar.gz file has to be extracted and then deleted. Doing this manually would take some time, so I used a for loop. After the loop is complete, the database is ready! No formatting is needed because the files are already formatted.

#!/bin/bash
# Extract each archive and delete it only if the extraction succeeded
for file in *.gz
do
    tar -zxvpf "$file" && rm "$file"
done
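
A more defensive variant of the loop checks each archive for download damage before extracting it. NCBI publishes a small .md5 checksum file next to each archive (for example nt.00.tar.gz.md5). The sketch below assumes you downloaded those files as well and that GNU md5sum is available; the function name verify_and_extract is hypothetical.

```shell
# Hypothetical helper: verify ARCHIVE against ARCHIVE.md5, then extract and delete it.
# Assumptions: GNU md5sum is installed and the .md5 file sits next to the archive.
# Run it from inside the database folder, because the .md5 file names the archive
# without any path.
verify_and_extract() {
    local file="$1"
    if md5sum -c "$file.md5" >/dev/null 2>&1; then
        tar -zxpf "$file" && rm "$file" "$file.md5"
    else
        echo "Checksum failed, keeping $file so it can be downloaded again" >&2
        return 1
    fi
}

# Usage (commented out so that sourcing this snippet has no side effects):
# wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz.md5"
# for file in nt.??.tar.gz; do verify_and_extract "$file"; done
```

An archive that fails the check is left in place, so only that one file needs to be fetched again.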

To query the new database, it’s important to point BLAST to the database folder followed by the database name, nt, which is the prefix shared by the database files: path_to_the_folder/nt. An example query looks like this:

blastn -db path_to_the_folder/nt -query fastafile
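
For contamination checks, tabular output is easier to summarize than the default report. The sketch below is one way to do it, not the only one: the function names, the column selection, and the limit of 5 target sequences are my own choices. The staxids and sscinames columns are only filled in when BLAST can find the taxonomy database (taxdb); otherwise they show N/A.

```shell
# Hypothetical helper: run blastn with tab-separated output that includes taxonomy columns.
# $1 = database path including the nt name, $2 = query FASTA, $3 = output file
run_blastn_tax() {
    blastn -db "$1" -query "$2" \
        -outfmt "6 qseqid sseqid pident length evalue bitscore staxids sscinames" \
        -max_target_seqs 5 -out "$3"
}

# Hypothetical helper: count hits per scientific name (column 8 of the output above)
# and list the most frequent names first.
count_hits_by_species() {
    awk -F'\t' '{ n[$8]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' "$1" |
        sort -k1,1nr
}

# Usage (commented out so that sourcing this snippet has no side effects):
# run_blastn_tax path_to_the_folder/nt fastafile hits.tsv
# count_hits_by_species hits.tsv
```

In an assembly from a single organism, names that appear unexpectedly near the top of this list are the ones to look at first.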

That’s it! Everything should be working now. With an offline database, it’s important to decide on a regular update schedule. I plan on updating my database every 6 months or so. This can be done with the update_blastdb.pl Perl script that comes with the BLAST+ software, or by deleting and downloading the entire database anew.
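
To keep to a schedule like that, a small check can flag a database that is due for an update. This is a rough sketch: the function name db_is_stale and the 180-day default are placeholders of mine. It looks at file modification times, and tar normally restores the times stored in the archive, so the result may reflect when the files were built rather than when you downloaded them.

```shell
# Hypothetical helper: succeed if no file directly inside DIR was modified
# within the last DAYS days (default 180, roughly six months).
db_is_stale() {
    local dir="$1" days="${2:-180}"
    # find lists files modified less than DAYS days ago; no output means stale.
    [ -z "$(find "$dir" -maxdepth 1 -type f -mtime -"$days" | head -n 1)" ]
}

# Usage (commented out so that sourcing this snippet has no side effects):
# db_is_stale path_to_the_folder 180 && echo "Time to update the nt database"
```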

A primer on PCR

The PCR gods are a fickle sort and it’s an art to appease them. This strange intersection between science and the occult can be at the best of times trying, but fear not for I have braved the trials of PCR and write to offer advice.

To start, I am sharing my standard PCR reaction. 10 μL may seem like a tiny amount of product, but it’s enough to allow for testing on an agarose gel, amplicon cleanup for sequencing, and some loss to evaporation.

I use standard 10-μL PCR reactions, containing:

  • 5 μL of PCR MasterMix (Promega, Madison, Wisconsin, USA)
  • 2.5 μL of DNA-free H2O
  • 0.5 μL of MgCl2
  • 0.5 μL of forward primer
  • 0.5 μL of reverse primer
  • 1 μL of template DNA
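
When setting up several reactions at once, the per-reaction volumes above can be scaled into a single mix of everything except the template. The sketch below does that arithmetic; the function name master_mix and the default 10% overage for pipetting loss are assumptions of mine, not part of the recipe.

```shell
# Hypothetical helper: print master-mix volumes (in μL) for N reactions.
# Per-reaction volumes come from the recipe above; the overage percentage
# (default 10) is an assumption to cover pipetting loss.
master_mix() {
    local n="$1" overage_pct="${2:-10}"
    awk -v n="$n" -v o="$overage_pct" 'BEGIN {
        f = n * (100 + o) / 100    # scaling factor including overage
        printf "PCR MasterMix\t%.1f\n",  5.0 * f
        printf "DNA-free H2O\t%.1f\n",   2.5 * f
        printf "MgCl2\t%.1f\n",          0.5 * f
        printf "Forward primer\t%.1f\n", 0.5 * f
        printf "Reverse primer\t%.1f\n", 0.5 * f
        # Each tube gets 9 μL of mix, then 1 μL of template is added separately.
        printf "Aliquot per tube\t%.1f\n", 9.0
    }'
}

# Usage (commented out so that sourcing this snippet has no side effects):
# master_mix 10        # ten reactions with the default 10% overage
# master_mix 24 5      # twenty-four reactions with 5% overage
```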

 

I then use touchdown PCR programs optimized for each primer pair to maximize primer specificity. The PCR product is then checked on agarose gels. If the reaction fails, I troubleshoot by trying the steps below. Most of the time, diluting the template solves the problem.

  1. Dilute the template
  2. Decrease the specificity of the PCR program
  3. Increase the amount of MasterMix
  4. Increase the size of the reaction
  5. Re-extract template DNA

No-boil Chelex 100 proteinase K genomic extraction.

I previously posted a Chelex 100 extraction protocol that, while easy, cheap, and moderately effective, calls for a boiling step to inactivate the proteinase K. That step also denatures the template DNA into single strands, which are unsuitable for genomic work and, in my experience, make PCR more difficult.

A protocol published in Molecular Ecology Resources by Casquet et al. (2012) removes the boiling step. Their published protocol allows for high throughput at minimal cost. To their protocol I have added a grinding step, which I believe increases DNA yield. I have successfully extracted and used DNA from just 1 leg of a microlepidopteran species.

 

Modified Chelex without boiling from Casquet et al. (2012)

  1. Add 10 μL of proteinase K (20 mg/mL) to each tube.
  2. Add 150 μL of 10% Chelex 100 to each tube.
  3. Grind specimens with melted and sterilized pipette tips.
  4. Incubate for 24 hours at 55 °C, swirling occasionally.
  5. Spin down to pellet the Chelex, then pull the DNA from the top.

DNA extraction using Chelex 100 and proteinase K

Let me start by saying that the following extraction protocol is far from perfect. The product is dirty and often degrades faster than DNA from other extraction methods. For my work on insect genetics, though, I find this protocol ideal because it is fast and cheap.

The extraction process consists of the addition of proteinase K, Chelex 100, and heat. Proteinase K digests proteins to free the DNA from the cells, while Chelex protects the DNA by binding metal ions that would otherwise help degrade it. Heat first speeds up the enzymatic digestion of the cells and then, at the boiling step, inactivates the enzyme. The extraction protocol is as follows:

  1. Prepare samples by placing tissues into 1.5 mL Eppendorf tubes, taking care to allow ethanol to evaporate (if the sample was stored in it).
  2. Add 200 μL of 10% Chelex 100 per tube.
  3. Crush samples in Chelex 100 (I use melted pipette tips as tiny pestles).
  4. Add 1 μL of proteinase K (10 mg/mL) to each tube.
  5. Crush samples again.
  6. Vortex tubes for 10 seconds.
  7. Incubate on plate at 57 °C for 1 hour (up to 24 hours).
  8. Vortex for 10 seconds.
  9. Boil at 95 °C for 5 minutes.
  10. Vortex for 10 seconds.
  11. Centrifuge at max speed (14,000 RPM) for 10 minutes.
  12. Transfer the supernatant to new, labeled Eppendorf tubes.
  13. Create a working dilution (I usually do 1:10 or 1:50).
  14. Store at 4 °C.
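
The working dilution in step 13 is simple arithmetic, sketched below. It assumes that 1:10 means 1 part DNA in 10 parts total (1 part DNA plus 9 parts water); if you read the ratio as 1 part DNA plus 10 parts water, the numbers change slightly. The function name dilution is hypothetical.

```shell
# Hypothetical helper: print DNA and water volumes (in μL) for a 1:PARTS dilution
# with a given final volume. Assumption: 1:10 means 1 part DNA in 10 parts total.
dilution() {
    local parts="$1" total_ul="$2"
    awk -v p="$parts" -v t="$total_ul" 'BEGIN {
        dna = t / p
        printf "DNA\t%.1f\nWater\t%.1f\n", dna, t - dna
    }'
}

# Usage (commented out so that sourcing this snippet has no side effects):
# dilution 10 50    # 1:10 dilution, 50 μL final volume
# dilution 50 50    # 1:50 dilution, 50 μL final volume
```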

Evening Primroses

Many North American Mompha species feed on members of the evening primrose family, Onagraceae. Onagraceae is a cosmopolitan family with 17 genera and 655 species, with its center of diversity in the North American Southwest. This region is also a hotspot for the largest genus within the evening primrose family, Oenothera, with 120 or so species.

For a master’s thesis, surveying Mompha diversity across the entirety of Onagraceae would be impossible because it would take years to complete! I have decided to instead focus my efforts on the Oenothera species that occur in the North American Southwest.

Overlapping ranges, isolated populations, and multiple feeding niches within Oenothera will allow me to evaluate how host plants and biogeography have shaped speciation and distribution of associated Mompha.  

Pictures of Oenothera serrulata