Installing and querying a local NCBI nucleotide database (nt)

While the online version of the non-redundant nucleotide database (nr/nt) is useful for small-scale applications, checking for contamination in an assembly is best done with a local copy of the NCBI nt database. Read on for a guide to how I installed and then queried the NCBI nt database on a Unix cluster.

First, make a folder to hold the entire database and move into it.

mkdir NCBI_nt_DB
cd NCBI_nt_DB

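Next, download the entire nt database from the NCBI FTP site. Note that this database is almost 50 GB in size, so make sure that you have sufficient space. The download may take some time depending on the speed of your connection. The quotes around the URL keep the shell from expanding the ?? wildcard, so wget can match it against the numbered volumes on the server.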

wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz"
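Before starting a download of this size, it may be worth confirming that the target filesystem has room. The following is a minimal sketch, not part of my original workflow; the function name has_free_space is made up for illustration, and the threshold you pass is up to you.

```shell
#!/bin/bash
# has_free_space DIR REQUIRED_GB  (hypothetical helper)
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB gigabytes free.
has_free_space() {
    local dir="$1" required_gb="$2"
    local avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}
```

For example, has_free_space . 100 || echo "Not enough space here" >&2 would warn when fewer than 100 GB are free in the current folder. The 100 GB figure is only an assumed safety margin, since the archives and their extracted contents briefly sit on disk side by side.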

Next, each tar.gz file has to be extracted and then deleted. Doing this manually would take some time, so I used a for loop. Once the loop is complete, the database is ready! No formatting is needed because the archives already contain formatted BLAST databases.

#!/bin/bash
# Extract each archive, deleting it only if the extraction succeeded
for file in *.gz
do
    tar -zxvpf "$file" && rm "$file"
done
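After the loop finishes, a quick sanity check can catch an interrupted extraction. This is a sketch under two assumptions of mine: the helper name extraction_complete is hypothetical, and I am assuming the multi-volume nt database is tied together by an alias file called nt.nal.

```shell
#!/bin/bash
# extraction_complete DIR  (hypothetical helper)
# Succeeds when DIR holds no leftover nt.*.tar.gz archives and contains nt.nal.
extraction_complete() {
    local dir="$1" archive
    for archive in "$dir"/nt.*.tar.gz; do
        # An unmatched glob stays literal, so test that the file really exists.
        if [ -e "$archive" ]; then
            echo "Archive still present: $archive" >&2
            return 1
        fi
    done
    # Assumption: nt.nal is the alias file listing the nt volumes.
    if [ ! -e "$dir/nt.nal" ]; then
        echo "Alias file not found: $dir/nt.nal" >&2
        return 1
    fi
    return 0
}
```

BLAST+ also ships blastdbcmd, and running blastdbcmd -db path_to_the_folder/nt -info should print a summary of the database if it can be read.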

To query the new database, it's important to point BLAST to the database folder followed by the database name nt, which is the shared prefix of the index files: path_to_the_folder/nt. An example query looks like this:

blastn -db path_to_the_folder/nt -query fastafile
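For contamination screening, tabular output written to a file is often easier to work with than the default report. Here is a rough sketch of a wrapper; the function name query_nt and the particular values (e-value cutoff of 1e-5, 5 targets per query, 4 threads) are arbitrary choices of mine and not recommendations from NCBI.

```shell
#!/bin/bash
# query_nt DB QUERY OUT  (hypothetical wrapper around blastn)
# DB    : database path including the nt prefix, e.g. path_to_the_folder/nt
# QUERY : FASTA file with the sequences to search
# OUT   : file that receives the tab-separated hits
query_nt() {
    local db="$1" query="$2" out="$3"
    # -outfmt 6 selects tabular output; the other values are illustrative.
    blastn -db "$db" -query "$query" \
        -outfmt 6 -evalue 1e-5 -max_target_seqs 5 -num_threads 4 \
        -out "$out"
}
```

It would be called as query_nt path_to_the_folder/nt fastafile hits.tsv, with -num_threads adjusted to however many cores your cluster job has been given.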

That’s it! Everything should be working now. With an offline database, it’s important to decide on a regular update schedule. I plan on updating my database every 6 months or so. This can be done with the update_blastdb.pl Perl script that ships with the BLAST+ software, or by deleting the database and downloading it anew.
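To keep to an update schedule, a small check can report when the local copy has grown old. The sketch below rests on assumptions of mine: the name db_is_stale is hypothetical, and it looks at the modification time of an alias file assumed to be called nt.nal. Because tar normally restores the timestamps stored in the archive, this likely reflects the age of the NCBI build and not the day you downloaded it.

```shell
#!/bin/bash
# db_is_stale DIR MAX_AGE_DAYS  (hypothetical helper)
# Succeeds if DIR/nt.nal was last modified more than MAX_AGE_DAYS days ago.
# A missing nt.nal is reported as "not stale", so check for it separately.
db_is_stale() {
    local dir="$1" max_age_days="$2"
    # find prints the file only when it is older than the threshold.
    [ -n "$(find "$dir/nt.nal" -mtime +"$max_age_days" 2>/dev/null)" ]
}
```

For a six-month schedule, db_is_stale path_to_the_folder 180 && echo "Time to refresh nt" could run from a cron job. The refresh itself can then be done from inside the database folder with update_blastdb.pl --decompress nt.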
