Adapter and quality trimming Illumina data with Fastp

I recently made a 150 bp paired-end (PE) Illumina library using a NEBNext Ultra DNA Library Prep Kit for sequencing on the HiSeq X Ten. When my data came back from sequencing, I had no idea how to prepare it for assembly! I had to learn how to trim adapters and low-quality sequences. After evaluating three trimming tools (Trim Galore!, Trimmomatic, and Fastp), I decided on Fastp, mainly because of its speed and ease of use.

First, install Fastp on your cluster or your local system. Install it with Conda or download a prebuilt binary (see the Fastp GitHub page for detailed directions):

conda install -c bioconda fastp
#or
wget http://opengene.org/fastp/fastp
chmod a+x ./fastp

Once it is installed, skim the Fastp GitHub page to learn how to use the program. Fastp can both trim adapters and trim or filter low-quality reads. Ideally you know which adapters you used so you can tell Fastp to trim them. After emailing NEB customer support, I found that NEBNext library adapters resemble TruSeq adapters and can be trimmed similarly.

NEBNext Adapter Read1:   AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

NEBNext Adapter Read2:   AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
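
Before trimming, it can be reassuring to confirm that these adapters actually show up in the raw reads. The sketch below is one rough way to do that, not part of Fastp: it counts how many of the first reads in a FASTQ file contain the 13 bases that both adapters share (AGATCGGAAGAGC). The function name `count_adapter_hits` and the default of 100,000 reads are my own choices for illustration.

```shell
#!/bin/bash
# Count reads that contain the prefix shared by both NEBNext adapters.
# Usage: count_adapter_hits reads.fastq[.gz] [number_of_reads_to_scan]
count_adapter_hits() {
    local fastq=$1
    local n_reads=${2:-100000}      # hypothetical default: scan the first 100,000 reads
    local prefix=AGATCGGAAGAGC      # first 13 bases, identical in the Read1 and Read2 adapters
    local reader=cat
    case $fastq in
        *.gz) reader="gzip -dc" ;;  # decompress gzipped input on the fly
    esac
    # A FASTQ record is 4 lines; the sequence is line 2 of each record.
    $reader "$fastq" | head -n $((n_reads * 4)) |
        awk -v p="$prefix" 'NR % 4 == 2 && index($0, p) { hits++ } END { print hits + 0 }'
}

# Example (path is a placeholder):
# count_adapter_hits /path_to_raw_pe_reads_1
```

A count well above zero suggests read-through into the adapter. This only finds exact, full-length matches of the prefix, so treat it as a rough lower bound rather than a real adapter-content estimate.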

Essentially, you give Fastp your raw read file(s), the adapter sequences, and the output file name(s), and then set the appropriate flags. Here is an example:

First I set my variables (input and output):

#!/bin/bash
DIR=/path_to_working_directory
IN1=/path_to_raw_pe_reads_1
IN2=/path_to_raw_pe_reads_2
# Prefix only the file name, not the whole path
OUT1=Clean.$(basename "${IN1}")
OUT2=Clean.$(basename "${IN2}")

Then I call Fastp, pass in my variables and the adapters, and set flags. Here I told Fastp to count bases below Q20 as unqualified (-q 20), so reads with too many such bases are filtered out, and to discard reads shorter than 80 bp (--length_required 80). I also moved a sliding window from front to tail (--cut_front) and from tail to front (--cut_tail), trimming where the mean quality in the window drops below Q20 (--cut_mean_quality 20). I chose rather stringent settings here because I have over 100X coverage. You should adjust your settings depending on your data and goals.
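
To make the sliding-window idea concrete, here is a deliberately simplified sketch in plain bash. It is not Fastp's implementation, and Fastp's exact behavior may differ in the details. It assumes Phred+33 quality encoding, a window of 4 bases, and a threshold of Q20, and the function names `mean_phred` and `trim_tail` are made up for this illustration.

```shell
#!/bin/bash
# Mean Phred score of a quality string (Phred+33 encoding assumed).
mean_phred() {
    local qual=$1 sum=0 i code
    for ((i = 0; i < ${#qual}; i++)); do
        printf -v code '%d' "'${qual:i:1}"   # ASCII code of the quality character
        sum=$((sum + code - 33))
    done
    echo $((sum / ${#qual}))                 # integer mean, rounded down
}

# Simplified tail trimming: drop one base at a time from the 3' end while
# the mean quality of the last WINDOW bases is below THRESHOLD.
trim_tail() {
    local seq=$1 qual=$2 window=${3:-4} threshold=${4:-20}
    while (( ${#qual} >= window )) &&
          (( $(mean_phred "${qual: -window}") < threshold )); do
        seq=${seq:0:${#seq}-1}
        qual=${qual:0:${#qual}-1}
    done
    echo "$seq $qual"
}

# 'I' is Q40 and '#' is Q2 in Phred+33.
trim_tail ACGTACGTAC 'IIIIII####'
# prints: ACGTACGT IIIIII##
```

Notice that two low-quality bases survive in this toy version, because the good bases next to them pull the window mean up to Q21. That is a property of averaging over a window, and one reason to look at the quality report after trimming rather than assume every remaining base is above the threshold.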

cd "${DIR}" || exit 1
fastp -i "${IN1}" -I "${IN2}" -o "${OUT1}" -O "${OUT2}" \
--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
--adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-q 20 --length_required 80 --cut_tail --cut_front \
--cut_mean_quality 20
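
If you have several libraries, the same call can be wrapped in a small loop. The sketch below is one possible way to do it, under assumptions that are mine and not from my actual run: the files sit in the current directory and are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and `sampleA` and `sampleB` are placeholder sample names. It adds Fastp's `--thread`, `--json`, and `--html` options so each sample gets its own report. By default it only prints the commands (a dry run); set `DRY_RUN=0` to actually run Fastp.

```shell
#!/bin/bash
# Build the fastp command for one sample as an array, so odd file names stay intact.
# The <sample>_R1/_R2.fastq.gz naming scheme is an assumption for this sketch.
build_fastp_cmd() {
    local sample=$1 threads=${2:-4}
    FASTP_CMD=(
        fastp
        -i "${sample}_R1.fastq.gz" -I "${sample}_R2.fastq.gz"
        -o "Clean.${sample}_R1.fastq.gz" -O "Clean.${sample}_R2.fastq.gz"
        --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
        --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
        -q 20 --length_required 80 --cut_tail --cut_front
        --cut_mean_quality 20
        --thread "$threads"
        --json "${sample}.fastp.json" --html "${sample}.fastp.html"
    )
}

# Print (DRY_RUN=1, the default here) or run (DRY_RUN=0) the command for each sample.
run_fastp() {
    local sample
    for sample in "$@"; do
        build_fastp_cmd "$sample"
        if [[ ${DRY_RUN:-1} == 1 ]]; then
            printf '%q ' "${FASTP_CMD[@]}"; echo
        else
            "${FASTP_CMD[@]}"
        fi
    done
}

run_fastp sampleA sampleB   # placeholder sample names
```

Printing the commands first is a cheap way to catch a swapped R1/R2 or a wrong output name before spending cluster time on it.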

That’s it! Hope this helped! Fastp is a very powerful program and can do a lot more than what I demonstrated here, so be sure to check out its other options.
