I recently made a 150 bp paired-end (PE) Illumina library using a NEBNext Ultra DNA Library Prep Kit for sequencing on the HiSeq X Ten. When my data came back from sequencing, I had no idea how to prepare it for assembly! I had to learn how to trim adapters and low-quality sequences. After evaluating three trimming tools (Trim Galore!, Trimmomatic, and Fastp), I decided on Fastp, mainly because of its speed and ease of use.
First, install Fastp on your cluster or your local system. Install it with Conda or download a prebuilt binary (see the Fastp GitHub page for detailed directions).
conda install -c bioconda fastp
# or
wget http://opengene.org/fastp/fastp
chmod a+x ./fastp
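Before going further, it is worth confirming that the shell can actually find the program. The snippet below is a minimal sketch of such a check; `check_tool` is a helper name I made up for illustration, not part of Fastp.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# If the check fails, print a hint instead of stopping the script.
check_tool fastp || echo "If you downloaded the binary, call it as ./fastp or add its directory to PATH."
```

If the check succeeds, running `fastp --version` should print the installed version.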
Once it is installed, skim the Fastp GitHub page to learn how to use the program. Fastp can trim both adapters and low-quality sequence. Ideally, you know which adapters you used so you can trim them explicitly. After emailing with NEB customer support, I found that NEBNext library adapters resemble TruSeq adapters and can be trimmed similarly.
NEBNext Adapter Read1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
NEBNext Adapter Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
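If you want to see whether these adapters actually show up in your raw reads, you can count the reads that contain the start of the adapter. Both sequences above begin with the same 13 bases (AGATCGGAAGAGC), so searching for that prefix works for either file. This is only a rough sketch: `count_adapter_reads` is a hypothetical helper, it assumes standard four-line FASTQ records, and it only finds exact matches, so it will undercount adapters that contain sequencing errors.

```shell
# Hypothetical helper: count reads whose sequence line contains an adapter string.
# $1 = FASTQ file (plain or .gz), $2 = adapter sequence or prefix
count_adapter_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -v adapter="$2" '
        NR % 4 == 2 { total++; if (index($0, adapter) > 0) hits++ }
        END { printf "%d of %d reads contain %s\n", hits, total, adapter }'
}

# Tiny made-up FASTQ so the example runs anywhere: read1 carries the adapter, read2 does not.
demo=$(mktemp)
printf '%s\n' \
    '@read1' 'ACGTACGTAGATCGGAAGAGCACACGTC' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    '@read2' 'ACGTACGTACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIII' > "$demo"
count_adapter_reads "$demo" AGATCGGAAGAGC
rm -f "$demo"
```

On real data you would point the function at your raw read files instead of the demo file. Running it again on the trimmed output is a quick way to see whether the adapters are gone.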
Essentially, you give Fastp your raw reads, the adapter sequences, and the output file name(s), and then set the appropriate flags. Here is an example:
First I set my variables (input and output):
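#!/bin/bash
DIR=/path_to_working_directory
IN1=/path_to_raw_pe_reads_1
IN2=/path_to_raw_pe_reads_2
# Prefix the file name (not the full path) so the outputs land in the working directory
OUT1=Clean.$(basename "${IN1}")
OUT2=Clean.$(basename "${IN2}")
Then I call Fastp, pass in my variables and adapters, and set flags. Here I set the base quality threshold to Q20 (-q 20; by default Fastp discards a read when more than 40% of its bases fall below this), discarded reads shorter than 80 bp (--length_required 80), and moved a sliding window from front to tail (--cut_front) and from tail to front (--cut_tail), trimming where the window's mean quality drops below Q20 (--cut_mean_quality 20). I chose rather stringent settings here because I have over 100X coverage. You should modify your settings depending on your data and goals.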
cd "${DIR}"
fastp -i "${IN1}" -I "${IN2}" -o "${OUT1}" -O "${OUT2}" \
  --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  -q 20 --length_required 80 --cut_tail --cut_front \
  --cut_mean_quality 20
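To get a feel for what a mean quality of Q20 means inside a sliding window, it helps to decode a few quality strings by hand. The sketch below is not how Fastp is implemented; it simply converts a FASTQ quality string to Phred scores and averages them. `mean_phred` is a hypothetical helper, and it assumes Phred+33 encoding, which is what current Illumina instruments write.

```shell
# Hypothetical helper: mean Phred score of a quality string (assumes Phred+33).
mean_phred() {
    printf '%s\n' "$1" | awk '
        # Build a character-to-ASCII-code lookup for the printable range.
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            sum = 0
            n = length($0)
            for (i = 1; i <= n; i++) sum += ord[substr($0, i, 1)] - 33
            if (n > 0) printf "%.1f\n", sum / n
        }'
}

# Two made-up 4-base windows: "I" is Q40, "5" is Q20, "#" is Q2.
mean_phred 'II5#'   # (40 + 40 + 20 + 2) / 4 = 25.5, above a Q20 cutoff
mean_phred '5###'   # (20 + 2 + 2 + 2) / 4 = 6.5, below a Q20 cutoff
```

According to the Fastp documentation, the default window is 4 bases wide, which is why the examples use four characters. A window like the second one is the kind that gets trimmed with --cut_mean_quality 20.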
That’s it! I hope this helped! Fastp is a very powerful program and can do a lot more than what I demonstrated here, so be sure to check out its other options.