While the online version of the NCBI non-redundant nucleotide database (nr/nt) is useful for small-scale searches, checking an assembly for contamination is best done against a local copy of the NCBI nt database. Read along for a guide on how I installed and then queried the NCBI nt database on a Unix cluster.
First, create a folder to hold the entire database and change into it:
mkdir NCBI_nt_DB
cd NCBI_nt_DB
Next, download the entire nt database from the NCBI website. Note that at the time of writing this database was almost 50 GB, so make sure you have sufficient disk space. The download may take a while depending on the speed of your connection.
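One way to script this step is sketched below, assuming GNU/Linux tools (df, awk, wget) and that the pre-formatted nt volumes are still published as nt.*.tar.gz under ftp://ftp.ncbi.nlm.nih.gov/blast/db/. The function names enough_space and download_nt are hypothetical; the free-space check is there because of the size warning above.

```shell
#!/bin/bash
# Sketch only: assumes GNU/Linux tools and that NCBI still serves the
# pre-formatted nt archives at ftp://ftp.ncbi.nlm.nih.gov/blast/db/.

# Hypothetical helper: succeed if DIR has at least GB gibibytes free.
enough_space() {
    local dir=$1 gb=$2 avail_kb
    # df -P prints POSIX format; column 4 of line 2 is the free space in KiB
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( gb * 1024 * 1024 )) ]
}

# Hypothetical helper: fetch every nt volume plus its .md5 checksum file
# into the current directory.
download_nt() {
    # The quoted wildcard is expanded by wget against the FTP directory listing
    wget --no-verbose "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*.tar.gz*"
}
```

After sourcing the file from inside NCBI_nt_DB, something like `enough_space . 100 && download_nt` would start the download only if at least 100 GB are free. That threshold is a placeholder; pick one comfortably above the current size of the archives listed on the FTP site. The pattern also picks up the .md5 file NCBI publishes next to each archive.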
Next, each tar.gz archive has to be extracted and then deleted. Doing this by hand is tedious, so I used a for loop. Once the loop finishes, the database is ready! No formatting (e.g. with makeblastdb) is needed, because the archives already contain pre-formatted BLAST database files.
#!/bin/bash
# Extract each archive, and delete it only if extraction succeeded
for file in *.tar.gz
do
    tar -zxvpf "$file" && rm "$file"
done
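If the matching .md5 files were downloaded as well (NCBI publishes one next to each archive; having fetched them is an assumption here), a variant of the extraction loop can check each archive before unpacking and deleting it. This is a minimal sketch assuming GNU md5sum (macOS ships `md5` instead); the function name extract_verified is hypothetical.

```shell
#!/bin/bash
# Sketch: assumes GNU md5sum and that every nt.XX.tar.gz sits next to
# its nt.XX.tar.gz.md5 file from NCBI.
shopt -s nullglob   # no archives -> no iterations, instead of a literal '*.tar.gz'

# Hypothetical helper: verify, extract, then delete each archive.
# Returns non-zero if any archive failed the checksum or extraction.
extract_verified() {
    local file status=0
    for file in *.tar.gz; do
        # md5sum -c reads "<hash>  <filename>" from the .md5 file and checks that file
        if md5sum -c --status "$file.md5" 2>/dev/null; then
            tar -zxpf "$file" && rm "$file" "$file.md5" || status=1
        else
            echo "skipping $file: checksum mismatch or missing .md5" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `extract_verified` inside NCBI_nt_DB leaves any archive that fails the check in place, so it can simply be downloaded again.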
To query the new database, it's important to point BLAST at the database folder plus the database name nt (not at an individual file), i.e. path_to_database_folder/nt. An example query looks like this:
blastn -db path_to_database_folder/nt -query fastafile
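For contamination screening it usually helps to request tabular output and then look at the best hit per contig. The sketch below is one possible setup, not a recommendation: the function names, file names, e-value cut-off and thread count are placeholders. The staxids column assumes the nt volumes carry taxonomy IDs; scientific names (sscinames) would additionally need NCBI's taxdb files, which are not covered here.

```shell
#!/bin/bash
# Sketch: the e-value cut-off, hit limit and thread count are placeholders.

# Hypothetical helper: run blastn against the local nt database with
# tab-separated output (qseqid sseqid pident length evalue bitscore staxids).
run_nt_blast() {
    local db=$1 query=$2 out=$3
    blastn -db "$db" -query "$query" -out "$out" \
        -outfmt "6 qseqid sseqid pident length evalue bitscore staxids" \
        -evalue 1e-10 -max_target_seqs 5 -num_threads 4
}

# Hypothetical helper: keep the best hit (highest bitscore, column 6)
# for each query sequence in a tabular BLAST result file.
best_hit_per_query() {
    sort -t $'\t' -k1,1 -k6,6gr "$1" | awk -F'\t' '!seen[$1]++'
}
```

Usage might look like `run_nt_blast path_to_database_folder/nt assembly.fasta hits.tsv && best_hit_per_query hits.tsv`, after which the staxids column shows which organisms the top hits come from.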
That's it! Everything should be working now. With an offline database it's important to decide on a regular update schedule; I plan on updating mine roughly every 6 months. This can be done with the update_blastdb.pl Perl script that ships with BLAST+, or by deleting the database and downloading it anew.
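A roughly six-monthly update could be automated with cron. The sketch below assumes update_blastdb.pl from BLAST+ is on the PATH; the script name update_nt.sh, the function name and all paths are hypothetical.

```shell
#!/bin/bash
# Hypothetical update_nt.sh (name and paths are placeholders). A crontab entry such as
#   0 3 1 */6 * /path/to/update_nt.sh /path/to/NCBI_nt_DB >> /path/to/update_nt.log 2>&1
# would run it at 03:00 on 1 January and 1 July, i.e. about every 6 months.

update_nt() {
    local db_dir=$1
    cd "$db_dir" || return 1
    # update_blastdb.pl is distributed with BLAST+; --decompress also
    # unpacks the archives it downloads
    update_blastdb.pl --decompress nt
}

# Only act when a database directory is passed, as in the crontab line above
if [ "$#" -eq 1 ]; then
    update_nt "$1"
fi
```

Passing the directory as an argument keeps the script from touching anything when it is run without one.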