Testing the potential contribution of Wolbachia to speciation when cytoplasmic incompatibility becomes associated with host-related reproductive isolation

Check out this new paper! We evaluated reproductive isolation between different allopatric cherry-infesting Rhagoletis populations across the United States. We found moderate allochronic isolation between populations, no premating isolation, but strong postmating isolation in the form of reduced hatch rate in hybrid crosses involving male flies from the Southwest USA. Both the reciprocal hybrid crosses and the parental cross between Southwest USA parents show normal hatch rates. This unidirectional reduction in hatch rate is consistent with Wolbachia-induced cytoplasmic incompatibility (CI). We then genotyped flies for Wolbachia and found that all cherry-infesting Rhagoletis are infected with an A-type Wolbachia strain, wCin2. Additionally, a second, B-type strain, wCin3, was found only in flies from the Southwest USA. It’s likely that this strain causes cytoplasmic incompatibility and the observed reduction in egg hatch. Finally, we evaluated whether Wolbachia CI can couple with the other non-endosymbiont reproductive isolation (RI) to prevent hybridization between the currently allopatric populations in the event of secondary contact. We found that Wolbachia CI will not reduce gene flow enough to prevent hybridization between Southwest USA flies and either Eastern USA or Pacific Northwestern flies. However, Wolbachia CI may reduce gene flow enough to prevent hybridization between Mexican flies and either Eastern USA or Pacific Northwestern flies. This is because Mexican flies have a much longer diapause. These results show that Wolbachia may be able to contribute to population divergence and thus the speciation process!


Comparative genome sequencing reveals insights into the dynamics of Wolbachia in native and invasive cherry fruit flies

Check out this new paper! We used whole-genome comparisons and found that the invasive cherry-infesting fly (Rhagoletis cingulata) did not transfer a Wolbachia strain to the native European cherry-infesting fly (R. cerasi). This contradicts previous findings that used five MLST markers to compare the Wolbachia strains, and it highlights the importance of whole-genome approaches when comparing closely related Wolbachia strains.


Adapter and quality trimming Illumina data with Fastp

I recently made a 150 bp paired-end (PE) Illumina library using a NEBNext Ultra DNA Library Prep Kit for sequencing on the HiSeq X Ten. When my data came back from sequencing, I had no idea how to prepare it for assembly! I had to learn how to trim adapters and low-quality sequences. After evaluating three trimming tools (Trim Galore!, Trimmomatic, and Fastp), I decided on Fastp, mainly because of its speed and ease of use.

First, install Fastp on your cluster or your local system. Either install it with Conda or download a working binary (see the Fastp GitHub page for detailed directions).

# Option 1: install with Conda
conda install -c bioconda fastp
# Option 2: download a prebuilt binary and make it executable
wget http://opengene.org/fastp/fastp
chmod a+x ./fastp

Once it is installed, you should skim the Fastp GitHub page to learn how to use the program. Fastp can trim adapters and filter out low-quality reads. Ideally you know which adapters you used so that you can trim them. After emailing with NEB customer support, I found that NEBNext library adapters resemble TruSeq adapters and can be trimmed similarly.



Essentially, you give Fastp your raw read file(s), the adapter sequences, and the output file name(s), and then set the appropriate flags. Here is an example:

First I set my variables (input and output):
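The variable definitions did not survive in this post, so here is a minimal sketch of what they could look like. Only the variable names (DIR, IN1, IN2, OUT1, OUT2) come from the command further down; the sample name, directory, and file names are placeholders I made up, so swap in your own.

```shell
# Hypothetical sample name and working directory; replace with your own.
SAMPLE="sample"
DIR="$HOME/fastp_demo"

# Raw paired-end reads (assumed naming scheme: <sample>_R1/_R2.fastq.gz)
IN1="${SAMPLE}_R1.fastq.gz"
IN2="${SAMPLE}_R2.fastq.gz"

# Trimmed output files written by Fastp
OUT1="${SAMPLE}_R1.trimmed.fastq.gz"
OUT2="${SAMPLE}_R2.trimmed.fastq.gz"

echo "Input:  ${IN1} ${IN2}"
echo "Output: ${OUT1} ${OUT2}"
```

Deriving every file name from one SAMPLE variable makes it easy to reuse the same script for another library.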


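Then I call Fastp, pass in my variables, and set flags. The command below does not pass adapter sequences explicitly; for paired-end data Fastp can trim adapters by analyzing the overlap of each read pair. Here I set Q20 as the threshold for a base to count as qualified (-q 20; reads with too many unqualified bases are discarded) and dropped reads shorter than 80 bp (--length_required 80). I also moved a sliding window from front to tail and from tail to front (--cut_front, --cut_tail), trimming when the mean quality in the window drops below Q20 (--cut_mean_quality 20). I chose rather stringent settings here because I have over 100X coverage. You should modify your settings depending on your data and goals.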

cd "${DIR}"
fastp -i "${IN1}" -I "${IN2}" -o "${OUT1}" -O "${OUT2}" \
-q 20 --length_required 80 --cut_tail --cut_front \
--cut_mean_quality 20
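If you would rather name the adapters yourself, the same call can be extended with --adapter_sequence and --adapter_sequence_r2. This is only a sketch: the two sequences are the TruSeq read 1 and read 2 adapters as listed in the Fastp README, which I am assuming apply to NEBNext libraries as well, so check them against your kit's documentation. The default file names are placeholders, and the command is only printed unless Fastp and both input files are actually present.

```shell
# TruSeq-style adapters (assumed to match the NEBNext kit; verify for your library)
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Reuse IN1/IN2/OUT1/OUT2 if already set; otherwise fall back to placeholder names
IN1="${IN1:-sample_R1.fastq.gz}"
IN2="${IN2:-sample_R2.fastq.gz}"
OUT1="${OUT1:-sample_R1.trimmed.fastq.gz}"
OUT2="${OUT2:-sample_R2.trimmed.fastq.gz}"

# Collect the arguments once so the same list can be printed or executed
set -- -i "$IN1" -I "$IN2" -o "$OUT1" -O "$OUT2" \
    --adapter_sequence "$ADAPTER_R1" --adapter_sequence_r2 "$ADAPTER_R2" \
    -q 20 --length_required 80 --cut_front --cut_tail --cut_mean_quality 20

if command -v fastp >/dev/null 2>&1 && [ -f "$IN1" ] && [ -f "$IN2" ]; then
    fastp "$@"
else
    # Dry run: show the command that would have been executed
    echo "fastp $*"
fi
```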

That’s it! Hope this helped! Fastp is a very powerful program and can do a lot more than what I demonstrated here, so be sure to check out its other options.

How to update or install your local NCBI BLAST database in a Unix shell using update_blastdb.pl

I recently updated my local BLAST database and I thought I would revisit the process of installing/updating, but this time using the included update_blastdb.pl script.

First, make sure the BLAST programs are on your PATH. Because I work on a cluster, all I do is load the module.

module load bio/blast+/2.7.1

I then delete the old database folder, make a new folder with exactly the same name (this keeps your old scripts working), and move into it. If you are doing a fresh install, just create a new folder.

module load bio/blast+/2.7.1
rm -r blastdb_folder_name
mkdir blastdb_folder_name
cd blastdb_folder_name

Now use the Perl script to download the database of your choice. The --decompress option automatically unpacks the tar.gz files. Depending on which database you choose to download and your internet speed, this could be a lengthy process.

perl update_blastdb.pl --decompress nt

I am also downloading the taxonomy database to know more about my BLAST hits. You have to manually unpack this database.

module load bio/blast+/2.7.1 
perl update_blastdb.pl taxdb
tar -xzf taxdb.tar.gz

That’s it! Now you should have an updated BLAST database.

Installing and querying a local NCBI nucleotide database (nt)

While the online version of the non-redundant nucleotide database (nr/nt) is useful for small-scale applications, checking for contamination in an assembly is best done with a local NCBI nt database. Read along for a guide on how I installed and then queried the NCBI nt database on a Unix cluster.

First, you need to make a folder where you will store your entire database, and then enter that folder.

mkdir NCBI_nt_DB
cd NCBI_nt_DB

Next you need to download the entire nt database from the NCBI website. Note that this database is almost 50 GB in size so make sure that you have sufficient space. This download may take some time depending on the speed of your connection.

wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz"

Next, each tar.gz file has to be extracted and then deleted. Doing this manually would take some time, so I used a for loop. After the loop is complete, the database is ready! No formatting is needed because it is already formatted.

# Extract each archive and delete it only if extraction succeeded
for file in *.gz
do
tar -zxvpf "$file" && rm "$file"
done

To query the new database, it’s important to point BLAST to the database folder and to the nt index file, like so: path_to_database_folder/nt. An example query looks like this:

blastn -db path_to_the_folder/nt -query fastafile
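For a contamination check, a tabular report is usually easier to work with than the default output. The sketch below is one way to get it, not a tested pipeline: the query and output file names, the e-value cutoff, the number of hits, and the thread count are all values I picked for illustration. The taxonomy columns (staxids, sscinames) are only filled in if the taxdb files sit in a folder listed in BLASTDB, which I assume here is the same folder as nt.

```shell
# Hypothetical paths; point these at your own database folder and FASTA file
DB_DIR="${DB_DIR:-path_to_the_folder}"
QUERY="${QUERY:-assembly.fasta}"
HITS="${HITS:-blast_hits.tsv}"

# blastn searches the folders in BLASTDB for the taxdb files (assumed to be in DB_DIR)
export BLASTDB="$DB_DIR"

# Tabular output (format 6) with hit statistics plus taxonomy ID and scientific name
OUTFMT="6 qseqid sseqid pident length evalue bitscore staxids sscinames"

if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    blastn -db "$DB_DIR/nt" -query "$QUERY" \
        -outfmt "$OUTFMT" -evalue 1e-5 -max_target_seqs 5 \
        -num_threads 4 -out "$HITS"
else
    echo "blastn or $QUERY not found; nothing was run."
fi
```

The resulting tab-separated file has one line per hit, so contigs whose best hits come from an unexpected organism are easy to spot with sort or grep.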

That’s it! Everything should be working now. With an offline database it’s important to decide on a regular update schedule. I plan on updating my database every 6 months or so. This can be done with a Perl script that ships with the BLAST+ software or by deleting the database and downloading it anew.

A primer on PCR

The PCR gods are a fickle sort and it’s an art to appease them. This strange intersection between science and the occult can be at the best of times trying, but fear not for I have braved the trials of PCR and write to offer advice.

To start I am sharing my standard PCR reaction. 10μL may seem like a tiny amount of product, but it’s enough to allow for testing on an agarose gel, amplicon cleanup for sequencing, and evaporation.

I use standard 10-μL PCR reactions, containing:

  • 5μL of PCR MasterMix (Promega, Madison, Wisconsin, USA)
  • 2.5μL of DNA-free H2O
  • 0.5μL MgCl2
  • 0.5μL forward primer
  • 0.5μL reverse primer
  • 1μL of template DNA
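When running many reactions at once, it helps to scale the recipe into a single master mix and add the template to each tube separately. The sketch below does that arithmetic with awk; the 8 reactions and the 10% overage for pipetting loss are assumed values for illustration, not part of my protocol.

```shell
# Assumed values: 8 reactions plus 10% extra volume for pipetting loss
N_REACTIONS=8
OVERAGE=1.1

# Print the volume (in μL) of each shared component for n reactions.
# Template DNA (1 μL) is left out because it is added to each tube individually.
master_mix_table() {
    awk -v n="$1" -v extra="$2" 'BEGIN {
        scale = n * extra
        printf "MasterMix %.1f\n", 5.0 * scale
        printf "H2O %.1f\n",       2.5 * scale
        printf "MgCl2 %.1f\n",     0.5 * scale
        printf "FwdPrimer %.1f\n", 0.5 * scale
        printf "RevPrimer %.1f\n", 0.5 * scale
    }'
}

MIX=$(master_mix_table "$N_REACTIONS" "$OVERAGE")
printf '%s\n' "$MIX"
```

Each tube then gets 9μL of the mix and 1μL of template.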


I then use touchdown PCR programs optimized for each primer pair to maximize primer specificity. The PCR product is then checked on agarose gels. If the reaction fails, I troubleshoot by trying each step below in order. Most of the time, diluting the template solves the problem.

  1. Dilute the template
  2. Decrease the specificity of the PCR program
  3. Increase the amount of MasterMix
  4. Increase the size of the reaction
  5. Re-extract template DNA

No-boil Chelex 100 proteinase K genomic extraction

I previously posted a Chelex 100 extraction protocol that, while easy, cheap, and moderately effective, calls for a boiling step to degrade the proteinase. That boiling step also denatures the template DNA, leaving a single-stranded product that is unsuitable for genomic work and, in my experience, makes PCR more difficult.

A protocol published in Molecular Ecology Resources by Casquet et al. (2012) removes the boiling step. Their published protocol allows for high throughput at minimal cost. To their protocol I have added a grinding step, which I believe increases DNA yield. I have successfully extracted and used DNA from just 1 leg of a microlepidopteran species.


Modified Chelex without boiling from Casquet et al. (2012)

  1. Add 10μL of proteinase K (20 mg/ml) to each tube.
  2. Add 150μL of 10% Chelex 100 to each tube.
  3. Grind specimens with melted and sterilized pipet tips.
  4. Incubate for 24 hours at 55 °C, swirling occasionally.
  5. Spin down to pellet the Chelex, then pull the DNA from the top.