My previous notebook entry covers the file manipulation steps, as well as creating and preparing the genome index for Bismark. This entry mostly covers the quality trimming and analysis steps.
First, let's run FastQC on the raw reads.
setwd("~/Documents/C-virginica-BSSeq/")
system("mkdir untrimmed_fastqc")
mkdir: cannot create directory ‘untrimmed_fastqc’: File exists
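The "File exists" message is harmless here; it only means the directory was left over from an earlier run. A minimal sketch of a quieter alternative, assuming base R's `dir.create()` is acceptable in place of shelling out to `mkdir` (the temporary path `out_dir` is hypothetical and used only for illustration):

```r
# Create an output directory only if it is missing, so reruns stay silent.
# `out_dir` is a hypothetical temporary path, not the real project directory.
out_dir <- file.path(tempdir(), "untrimmed_fastqc")

for (attempt in 1:2) {
  # showWarnings = FALSE suppresses the "already exists" warning on the rerun
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
}

dir.exists(out_dir)
```

Calling it twice mimics rerunning the notebook chunk; the second call does nothing and prints nothing.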
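# NOTE: `pattern` is a regular expression, not a shell glob, so it can also
# match entries that are not FASTQ files.
raw.file.list <- list.files(pattern = "*.fastq")
# Run FastQC on each listed entry, writing reports to untrimmed_fastqc
for(i in seq_along(raw.file.list)) {
  system(paste0("fastqc ", raw.file.list[i], " -o ~/Documents/C-virginica-BSSeq/untrimmed_fastqc"))
}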
Started analysis of 2112_lane1_ACAGTG_L001_R1.fastq
Approx 5% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 10% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 15% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 20% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 25% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 30% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 35% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 40% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 45% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 50% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 55% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 60% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 65% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 70% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 75% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 80% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 85% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 90% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 95% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Analysis complete for 2112_lane1_ACAGTG_L001_R1.fastq
Started analysis of 2112_lane1_ATCACG_R1.fastq
Approx 5% complete for 2112_lane1_ATCACG_R1.fastq
Approx 10% complete for 2112_lane1_ATCACG_R1.fastq
Approx 15% complete for 2112_lane1_ATCACG_R1.fastq
Approx 20% complete for 2112_lane1_ATCACG_R1.fastq
Approx 25% complete for 2112_lane1_ATCACG_R1.fastq
Approx 30% complete for 2112_lane1_ATCACG_R1.fastq
Approx 35% complete for 2112_lane1_ATCACG_R1.fastq
Approx 40% complete for 2112_lane1_ATCACG_R1.fastq
Approx 45% complete for 2112_lane1_ATCACG_R1.fastq
Approx 50% complete for 2112_lane1_ATCACG_R1.fastq
Approx 55% complete for 2112_lane1_ATCACG_R1.fastq
Approx 60% complete for 2112_lane1_ATCACG_R1.fastq
Approx 65% complete for 2112_lane1_ATCACG_R1.fastq
Approx 70% complete for 2112_lane1_ATCACG_R1.fastq
Approx 75% complete for 2112_lane1_ATCACG_R1.fastq
Approx 80% complete for 2112_lane1_ATCACG_R1.fastq
Approx 85% complete for 2112_lane1_ATCACG_R1.fastq
Approx 90% complete for 2112_lane1_ATCACG_R1.fastq
Approx 95% complete for 2112_lane1_ATCACG_R1.fastq
Analysis complete for 2112_lane1_ATCACG_R1.fastq
Started analysis of 2112_lane1_CAGATC_R1.fastq
Approx 5% complete for 2112_lane1_CAGATC_R1.fastq
Approx 10% complete for 2112_lane1_CAGATC_R1.fastq
Approx 15% complete for 2112_lane1_CAGATC_R1.fastq
Approx 20% complete for 2112_lane1_CAGATC_R1.fastq
Approx 25% complete for 2112_lane1_CAGATC_R1.fastq
Approx 30% complete for 2112_lane1_CAGATC_R1.fastq
Approx 35% complete for 2112_lane1_CAGATC_R1.fastq
Approx 40% complete for 2112_lane1_CAGATC_R1.fastq
Approx 45% complete for 2112_lane1_CAGATC_R1.fastq
Approx 50% complete for 2112_lane1_CAGATC_R1.fastq
Approx 55% complete for 2112_lane1_CAGATC_R1.fastq
Approx 60% complete for 2112_lane1_CAGATC_R1.fastq
Approx 65% complete for 2112_lane1_CAGATC_R1.fastq
Approx 70% complete for 2112_lane1_CAGATC_R1.fastq
Approx 75% complete for 2112_lane1_CAGATC_R1.fastq
Approx 80% complete for 2112_lane1_CAGATC_R1.fastq
Approx 85% complete for 2112_lane1_CAGATC_R1.fastq
Approx 90% complete for 2112_lane1_CAGATC_R1.fastq
Approx 95% complete for 2112_lane1_CAGATC_R1.fastq
Analysis complete for 2112_lane1_CAGATC_R1.fastq
Started analysis of 2112_lane1_GCCAAT_L001_R1.fastq
Approx 5% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 10% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 15% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 20% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 25% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 30% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 35% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 40% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 45% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 50% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 55% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 60% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 65% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 70% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 75% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 80% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 85% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 90% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 95% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Analysis complete for 2112_lane1_GCCAAT_L001_R1.fastq
Started analysis of 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 5% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 10% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 15% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 20% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 25% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 30% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 35% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 40% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 45% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 50% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 55% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 60% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 65% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 70% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 75% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 80% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 85% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 90% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 95% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Analysis complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Started analysis of 2112_lane1_TTAGGC_L001_R1.fastq
Approx 5% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 10% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 15% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 20% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 25% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 30% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 35% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 40% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 45% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 50% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 55% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 60% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 65% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 70% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 75% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 80% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 85% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 90% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 95% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Analysis complete for 2112_lane1_TTAGGC_L001_R1.fastq
Failed to process untrimmed_fastqc
java.io.FileNotFoundException: untrimmed_fastqc (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:129)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:102)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
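The FileNotFoundException above suggests that the file listing also picked up the `untrimmed_fastqc` directory. That would make sense: `list.files()` treats `pattern` as a regular expression, and the unanchored `.fastq` can match the `_fastq` inside `untrimmed_fastqc`. A minimal sketch of a stricter listing, run against a hypothetical temporary directory with made-up file names rather than the real data:

```r
# Build a throwaway directory that mimics the situation: two FASTQ files
# plus an output directory whose name contains "fastq".
# All names here are hypothetical and only for illustration.
demo_dir <- file.path(tempdir(), "fastq_listing_demo")
dir.create(file.path(demo_dir, "untrimmed_fastqc"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("sampleA_R1.fastq", "sampleB_R1.fastq")))

# Escape the dot and anchor at the end of the name so only *.fastq matches
candidates <- list.files(demo_dir, pattern = "\\.fastq$", full.names = TRUE)

# Belt and braces: drop anything that is a directory
fastq.files <- candidates[!dir.exists(candidates)]

basename(fastq.files)
```

With a listing like this, the later subsetting to the first six entries should not be needed.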
Then do some trimming with Trim Galore. Trim Galore is nice in that it can run FastQC on the trimmed reads in the same call (via --fastqc). Saves a step!
setwd("~/Documents/C-virginica-BSSeq/")
system("mkdir ~/Documents/C-virginica-BSSeq/trimmed_fastqc")
mkdir: cannot create directory ‘/home/srlab/Documents/C-virginica-BSSeq/trimmed_fastqc’: File exists
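# Keep only the first six entries (the FASTQ files); this drops the trailing
# entry that FastQC could not process above.
raw.file.list <- raw.file.list[1:6]
# Quality-trim (Phred cutoff 20) and adapter-trim each file, then run FastQC
for(i in seq_along(raw.file.list)) {
  system(paste0("/home/shared/trimgalore/trim_galore --fastqc --fastqc_args \" -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ \" -q 20 ~/Documents/C-virginica-BSSeq/", raw.file.list[i]))
}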
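The nested escaped quotes in the call above are easy to get wrong. Here is a sketch of one way to assemble the same command with `shQuote()`. The helper name `build_trim_cmd` and the example paths are hypothetical; the flags (`--fastqc`, `--fastqc_args`, `-q`) are the ones used above. It only builds and prints the string, so it runs without Trim Galore installed.

```r
# Hypothetical helper: assemble a Trim Galore command line as a string.
build_trim_cmd <- function(fastq, fastqc_dir, quality = 20,
                           trim_galore = "/home/shared/trimgalore/trim_galore") {
  # Expand "~" in R, because a shell will not expand it inside quotes
  fastqc_args <- paste("-o", path.expand(fastqc_dir))
  paste(
    shQuote(trim_galore),
    "--fastqc",
    "--fastqc_args", shQuote(fastqc_args),
    "-q", quality,
    shQuote(path.expand(fastq))
  )
}

# Example paths are made up for illustration
cmd <- build_trim_cmd("reads/sample_R1.fastq", "qc/trimmed_fastqc")
cat(cmd, "\n")

# To actually run it and stop on failure (left commented out here):
# status <- system(cmd)
# if (status != 0) stop("trim_galore failed for: ", cmd)
```

Checking the exit status returned by `system()` would also make a failed sample stop the loop instead of scrolling past in the log.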
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 43268 AGATCGGAAGAGC 1000000 4.33
smallRNA 0 TGGAATTCTCGG 1000000 0.00
Nextera 0 CTGTCTCTTATA 1000000 0.00
Using Illumina adapter for trimming (count: 43268). Second best hit was smallRNA (count: 0)
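The choice reported above is simply the adapter with the most perfect matches in the first million reads. A small sketch that reproduces the ranking and the percentage column from the counts in this log (this is my own arithmetic on the reported numbers, not Trim Galore's actual code):

```r
# Counts copied from the auto-detection table above
adapters <- data.frame(
  type  = c("Illumina", "smallRNA", "Nextera"),
  count = c(43268, 0, 0),
  stringsAsFactors = FALSE
)
analysed <- 1e6  # reads inspected by the auto-detection step

# Percentage of inspected reads containing each adapter
adapters$percent <- round(100 * adapters$count / analysed, 2)

# Rank by count, highest first; the top row is the adapter that gets used
ranked <- adapters[order(-adapters$count), ]
ranked$type[1]
ranked$percent[1]
```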
Writing report to '2112_lane1_ACAGTG_L001_R1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_ACAGTG_L001_R1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq <<<
10000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 684.55 s (40 us/read; 1.52 M reads/minute).
=== Summary ===
Total reads processed: 17,327,210
Reads with adapters: 10,263,059 (59.2%)
Reads written (passing filters): 17,327,210 (100.0%)
Total basepairs processed: 1,750,048,210 bp
Quality-trimmed: 185,921,084 bp (10.6%)
Total written (filtered): 1,261,527,856 bp (72.1%)
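To sanity-check the summary, the counts can be pulled out of the report lines and compared. A minimal sketch, assuming the summary lines keep the "label: number" layout shown above (`parse_count` is a hypothetical helper name):

```r
# Summary lines copied from the cutadapt report above
summary_lines <- c(
  "Total reads processed: 17,327,210",
  "Reads with adapters: 10,263,059 (59.2%)",
  "Reads written (passing filters): 17,327,210 (100.0%)",
  "Total basepairs processed: 1,750,048,210 bp",
  "Quality-trimmed: 185,921,084 bp (10.6%)",
  "Total written (filtered): 1,261,527,856 bp (72.1%)"
)

# Hypothetical helper: grab the first number after the colon, drop commas
parse_count <- function(line) {
  value <- sub("^[^:]*:\\s*([0-9,]+).*$", "\\1", line)
  as.numeric(gsub(",", "", value))
}

counts <- vapply(summary_lines, parse_count, numeric(1), USE.NAMES = FALSE)
names(counts) <- sub(":.*$", "", summary_lines)

# Bases per read before trimming, and share of bases kept after trimming
read_length <- counts[["Total basepairs processed"]] / counts[["Total reads processed"]]
pct_written <- round(100 * counts[["Total written (filtered)"]] /
                       counts[["Total basepairs processed"]], 1)
read_length
pct_written
```

The bases-per-read figure comes out at 101, which agrees with the longest removed-sequence length in the adapter table, and the share of bases written matches the 72.1% in the report.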
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10263059 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 28.4%
C: 14.1%
G: 6.2%
T: 13.4%
none/other: 38.0%
Overview of removed sequences
length count expect max.err error counts
1 4187128 4331802.5 0 4187128
2 454105 1082950.6 0 454105
3 173887 270737.7 0 173887
4 132573 67684.4 0 132573
5 112014 16921.1 0 112014
6 112058 4230.3 0 112058
7 118966 1057.6 0 118966
8 75644 264.4 0 75644
9 82132 66.1 0 81258 874
10 91566 16.5 1 90662 904
11 49449 4.1 1 48579 870
12 74402 1.0 1 73557 845
13 54356 0.3 1 53474 882
14 54152 0.3 1 53335 817
15 51633 0.3 1 50745 888
16 45761 0.3 1 44877 884
17 50139 0.3 1 48938 1201
18 49514 0.3 1 48856 658
19 20534 0.3 1 20080 454
20 33290 0.3 1 32710 580
21 29372 0.3 1 28697 675
22 33080 0.3 1 32419 661
23 19223 0.3 1 18693 530
24 21075 0.3 1 20530 545
25 21912 0.3 1 21297 615
26 18258 0.3 1 16907 1351
27 19242 0.3 1 17472 1770
28 22623 0.3 1 19941 2682
29 13645 0.3 1 12102 1543
30 17134 0.3 1 15866 1268
31 8771 0.3 1 7489 1282
32 15705 0.3 1 11988 3717
33 11427 0.3 1 7518 3909
34 11995 0.3 1 9523 2472
35 8576 0.3 1 6341 2235
36 14389 0.3 1 7992 6397
37 11073 0.3 1 7217 3856
38 10008 0.3 1 8264 1744
39 4031 0.3 1 2275 1756
40 8813 0.3 1 5164 3649
41 7361 0.3 1 2676 4685
42 8430 0.3 1 3157 5273
43 5950 0.3 1 3071 2879
44 4902 0.3 1 2756 2146
45 4218 0.3 1 2813 1405
46 4226 0.3 1 2704 1522
47 3516 0.3 1 1742 1774
48 3322 0.3 1 1428 1894
49 3799 0.3 1 1840 1959
50 4609 0.3 1 1432 3177
51 5145 0.3 1 1432 3713
52 5288 0.3 1 1634 3654
53 3974 0.3 1 1619 2355
54 4262 0.3 1 522 3740
55 4224 0.3 1 812 3412
56 5448 0.3 1 816 4632
57 8442 0.3 1 1008 7434
58 8455 0.3 1 857 7598
59 5746 0.3 1 910 4836
60 9462 0.3 1 529 8933
61 12883 0.3 1 591 12292
62 34190 0.3 1 673 33517
63 60413 0.3 1 1245 59168
64 20284 0.3 1 1108 19176
65 25390 0.3 1 305 25085
66 57515 0.3 1 449 57066
67 203411 0.3 1 694 202717
68 453643 0.3 1 1657 451986
69 1544843 0.3 1 2603 1542240
70 698942 0.3 1 5353 693589
71 251952 0.3 1 1270 250682
72 90056 0.3 1 392 89664
73 25840 0.3 1 109 25731
74 13657 0.3 1 61 13596
75 7878 0.3 1 44 7834
76 6202 0.3 1 49 6153
77 8053 0.3 1 48 8005
78 8207 0.3 1 47 8160
79 7538 0.3 1 57 7481
80 6582 0.3 1 19 6563
81 5978 0.3 1 16 5962
82 5811 0.3 1 12 5799
83 5628 0.3 1 13 5615
84 5305 0.3 1 15 5290
85 5343 0.3 1 17 5326
86 5444 0.3 1 29 5415
87 5906 0.3 1 18 5888
88 5690 0.3 1 25 5665
89 5649 0.3 1 40 5609
90 6103 0.3 1 16 6087
91 6211 0.3 1 2 6209
92 6664 0.3 1 13 6651
93 6982 0.3 1 4 6978
94 7732 0.3 1 11 7721
95 7979 0.3 1 4 7975
96 9450 0.3 1 2 9448
97 10273 0.3 1 1 10272
98 11315 0.3 1 2 11313
99 13205 0.3 1 1 13204
100 26252 0.3 1 2 26250
101 108226 0.3 1 4 108222
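The `expect` and `max.err` columns in the table above can be reproduced from the run parameters. As far as I can tell, `expect` is the number of reads that would match by chance, total reads × 0.25^min(length, adapter length), and `max.err` is the error rate times the matched length, rounded down. A sketch under that assumption, using the values for this sample:

```r
# Run parameters taken from the log above
total_reads <- 17327210
adapter_len <- 13    # length of AGATCGGAAGAGC
error_rate  <- 0.1   # maximum trimming error rate

len <- 1:101
# Matches longer than the adapter are capped at the adapter length
effective <- pmin(len, adapter_len)

# Assumed formulas; they agree with the values printed in the table
expect  <- total_reads * 0.25^effective
max_err <- floor(error_rate * effective)

head(data.frame(length = len,
                expect = round(expect, 1),
                max.err = max_err), 14)
```

For lengths 1 to 8 the observed counts are close to or only modestly above the expected values at the short end, so most one- and two-base "adapter" hits are probably chance matches rather than real adapter.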
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
=============================================
17327210 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 3919105 (22.6%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 5% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 10% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 15% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 20% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 25% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 30% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 35% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 40% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 45% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 50% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 55% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 60% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 65% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 70% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 75% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 80% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 85% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 90% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 95% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 33011 AGATCGGAAGAGC 1000000 3.30
smallRNA 1 TGGAATTCTCGG 1000000 0.00
Nextera 1 CTGTCTCTTATA 1000000 0.00
Using Illumina adapter for trimming (count: 33011). Second best hit was smallRNA (count: 1)
Writing report to '2112_lane1_ATCACG_R1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_ATCACG_R1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1309.46 s (39 us/read; 1.53 M reads/minute).
=== Summary ===
Total reads processed: 33,409,414
Reads with adapters: 17,611,436 (52.7%)
Reads written (passing filters): 33,409,414 (100.0%)
Total basepairs processed: 3,374,350,814 bp
Quality-trimmed: 220,159,975 bp (6.5%)
Total written (filtered): 2,847,951,372 bp (84.4%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17611436 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 36.3%
C: 16.1%
G: 7.6%
T: 18.9%
none/other: 21.1%
Overview of removed sequences
length count expect max.err error counts
1 10442109 8352353.5 0 10442109
2 858531 2088088.4 0 858531
3 323059 522022.1 0 323059
4 208999 130505.5 0 208999
5 156468 32626.4 0 156468
6 150095 8156.6 0 150095
7 150683 2039.1 0 150683
8 112087 509.8 0 112087
9 121119 127.4 0 119789 1330
10 120185 31.9 1 118496 1689
11 82522 8.0 1 80895 1627
12 102396 2.0 1 100840 1556
13 83339 0.5 1 81773 1566
14 80199 0.5 1 78593 1606
15 80591 0.5 1 78931 1660
16 71183 0.5 1 69586 1597
17 78307 0.5 1 76360 1947
18 77111 0.5 1 75828 1283
19 37842 0.5 1 36894 948
20 53140 0.5 1 51938 1202
21 48734 0.5 1 47537 1197
22 51325 0.5 1 50222 1103
23 35379 0.5 1 34409 970
24 35467 0.5 1 34472 995
25 36003 0.5 1 34999 1004
26 29489 0.5 1 27823 1666
27 30967 0.5 1 28456 2511
28 35485 0.5 1 31877 3608
29 22082 0.5 1 19860 2222
30 27130 0.5 1 25572 1558
31 12798 0.5 1 11146 1652
32 21740 0.5 1 16029 5711
33 20199 0.5 1 13434 6765
34 19003 0.5 1 13472 5531
35 14389 0.5 1 11153 3236
36 11969 0.5 1 10095 1874
37 13574 0.5 1 8671 4903
38 13729 0.5 1 9035 4694
39 10171 0.5 1 6936 3235
40 11802 0.5 1 6610 5192
41 10036 0.5 1 4373 5663
42 10201 0.5 1 3897 6304
43 7546 0.5 1 3982 3564
44 7277 0.5 1 3278 3999
45 6507 0.5 1 3634 2873
46 5774 0.5 1 3093 2681
47 5356 0.5 1 1998 3358
48 4427 0.5 1 1669 2758
49 4043 0.5 1 1934 2109
50 5248 0.5 1 1453 3795
51 7087 0.5 1 1453 5634
52 6646 0.5 1 1797 4849
53 4822 0.5 1 1667 3155
54 5450 0.5 1 473 4977
55 5173 0.5 1 783 4390
56 6833 0.5 1 755 6078
57 10395 0.5 1 884 9511
58 11086 0.5 1 751 10335
59 7377 0.5 1 898 6479
60 12412 0.5 1 523 11889
61 16010 0.5 1 614 15396
62 40681 0.5 1 636 40045
63 71709 0.5 1 1313 70396
64 22353 0.5 1 1201 21152
65 27766 0.5 1 296 27470
66 60315 0.5 1 483 59832
67 205212 0.5 1 736 204476
68 445189 0.5 1 1601 443588
69 1354272 0.5 1 2421 1351851
70 645444 0.5 1 4623 640821
71 244073 0.5 1 1185 242888
72 91085 0.5 1 384 90701
73 26529 0.5 1 148 26381
74 13928 0.5 1 82 13846
75 8001 0.5 1 49 7952
76 6494 0.5 1 53 6441
77 8716 0.5 1 54 8662
78 8523 0.5 1 58 8465
79 8002 0.5 1 52 7950
80 6802 0.5 1 17 6785
81 6065 0.5 1 10 6055
82 5812 0.5 1 13 5799
83 5192 0.5 1 17 5175
84 4934 0.5 1 16 4918
85 4964 0.5 1 16 4948
86 5251 0.5 1 23 5228
87 5652 0.5 1 26 5626
88 5278 0.5 1 23 5255
89 5331 0.5 1 34 5297
90 5719 0.5 1 8 5711
91 6034 0.5 1 9 6025
92 6322 0.5 1 18 6304
93 6631 0.5 1 8 6623
94 7224 0.5 1 12 7212
95 7883 0.5 1 10 7873
96 9041 0.5 1 4 9037
97 9978 0.5 1 1 9977
98 11212 0.5 1 6 11206
99 12879 0.5 1 5 12874
100 25791 0.5 1 9 25782
101 108043 0.5 1 10 108033
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
=============================================
33409414 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 3771783 (11.3%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_ATCACG_R1_trimmed.fq
Approx 5% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 10% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 15% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 20% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 25% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 30% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 35% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 40% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 45% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 50% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 55% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 60% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 65% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 70% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 75% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 80% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 85% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 90% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 95% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Analysis complete for 2112_lane1_ATCACG_R1_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 54855 AGATCGGAAGAGC 1000000 5.49
smallRNA 2 TGGAATTCTCGG 1000000 0.00
Nextera 0 CTGTCTCTTATA 1000000 0.00
Using Illumina adapter for trimming (count: 54855). Second best hit was smallRNA (count: 2)
Writing report to '2112_lane1_CAGATC_R1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_CAGATC_R1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1305.38 s (33 us/read; 1.83 M reads/minute).
=== Summary ===
Total reads processed: 39,780,221
Reads with adapters: 23,668,832 (59.5%)
Reads written (passing filters): 39,780,221 (100.0%)
Total basepairs processed: 4,017,802,321 bp
Quality-trimmed: 316,184,920 bp (7.9%)
Total written (filtered): 3,168,106,786 bp (78.9%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 23668832 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 33.0%
C: 15.7%
G: 6.0%
T: 18.1%
none/other: 27.2%
Overview of removed sequences
length count expect max.err error counts
1 11462116 9945055.2 0 11462116
2 913629 2486263.8 0 913629
3 452027 621566.0 0 452027
4 350618 155391.5 0 350618
5 306933 38847.9 0 306933
6 287673 9712.0 0 287673
7 281622 2428.0 0 281622
8 223594 607.0 0 223594
9 240116 151.7 0 238205 1911
10 227281 37.9 1 224750 2531
11 163868 9.5 1 161288 2580
12 192839 2.4 1 190444 2395
13 162948 0.6 1 160373 2575
14 155026 0.6 1 152612 2414
15 156135 0.6 1 153474 2661
16 136091 0.6 1 133382 2709
17 150211 0.6 1 146876 3335
18 140793 0.6 1 138808 1985
19 77443 0.6 1 75897 1546
20 101254 0.6 1 99368 1886
21 93085 0.6 1 91238 1847
22 92927 0.6 1 91104 1823
23 72275 0.6 1 70493 1782
24 72096 0.6 1 70374 1722
25 70407 0.6 1 68697 1710
26 59212 0.6 1 56603 2609
27 60402 0.6 1 56761 3641
28 69540 0.6 1 64651 4889
29 44622 0.6 1 41133 3489
30 56254 0.6 1 53841 2413
31 26735 0.6 1 24281 2454
32 44698 0.6 1 38863 5835
33 37973 0.6 1 24758 13215
34 33790 0.6 1 29226 4564
35 32081 0.6 1 27180 4901
36 29448 0.6 1 22077 7371
37 24821 0.6 1 20313 4508
38 22225 0.6 1 17979 4246
39 18973 0.6 1 16137 2836
40 18644 0.6 1 12986 5658
41 17506 0.6 1 12014 5492
42 18109 0.6 1 11098 7011
43 14861 0.6 1 10066 4795
44 15499 0.6 1 8463 7036
45 14330 0.6 1 9724 4606
46 11858 0.6 1 7524 4334
47 10333 0.6 1 4854 5479
48 9714 0.6 1 3768 5946
49 9608 0.6 1 4719 4889
50 9483 0.6 1 3478 6005
51 14031 0.6 1 3385 10646
52 13293 0.6 1 4130 9163
53 8936 0.6 1 3526 5410
54 9880 0.6 1 942 8938
55 9118 0.6 1 1651 7467
56 13037 0.6 1 1504 11533
57 18891 0.6 1 1702 17189
58 20513 0.6 1 1377 19136
59 11927 0.6 1 1601 10326
60 19127 0.6 1 774 18353
61 23847 0.6 1 954 22893
62 75729 0.6 1 1023 74706
63 106543 0.6 1 2066 104477
64 28953 0.6 1 1537 27416
65 37265 0.6 1 412 36853
66 83783 0.6 1 586 83197
67 294910 0.6 1 1013 293897
68 680760 0.6 1 2305 678455
69 2319080 0.6 1 3770 2315310
70 1329812 0.6 1 8452 1321360
71 474084 0.6 1 2222 471862
72 166586 0.6 1 691 165895
73 46783 0.6 1 227 46556
74 24031 0.6 1 123 23908
75 13845 0.6 1 81 13764
76 11129 0.6 1 88 11041
77 15028 0.6 1 90 14938
78 14602 0.6 1 103 14499
79 13582 0.6 1 91 13491
80 11830 0.6 1 47 11783
81 10270 0.6 1 32 10238
82 9814 0.6 1 16 9798
83 9005 0.6 1 15 8990
84 8662 0.6 1 18 8644
85 8888 0.6 1 26 8862
86 9233 0.6 1 24 9209
87 9845 0.6 1 42 9803
88 9350 0.6 1 37 9313
89 9508 0.6 1 55 9453
90 10077 0.6 1 20 10057
91 10706 0.6 1 17 10689
92 11285 0.6 1 25 11260
93 11770 0.6 1 9 11761
94 12970 0.6 1 17 12953
95 13873 0.6 1 6 13867
96 16173 0.6 1 6 16167
97 17692 0.6 1 7 17685
98 19791 0.6 1 6 19785
99 22589 0.6 1 7 22582
100 45074 0.6 1 7 45067
101 187596 0.6 1 13 187583
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
=============================================
39780221 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 6470690 (16.3%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_CAGATC_R1_trimmed.fq
Approx 5% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 10% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 15% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 20% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 25% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 30% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 35% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 40% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 45% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 50% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 55% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 60% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 65% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 70% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 75% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 80% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 85% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 90% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 95% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Analysis complete for 2112_lane1_CAGATC_R1_trimmed.fq
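To compare how much each library lost to the 20 bp length cutoff, the run statistics for the three samples finished so far can be collected in one table. A small sketch using the numbers copied from the logs above (sample labels are just the index sequences from the file names):

```r
# Totals and removed-read counts copied from the RUN STATISTICS blocks above
trim_stats <- data.frame(
  sample  = c("ACAGTG", "ATCACG", "CAGATC"),
  total   = c(17327210, 33409414, 39780221),
  removed = c(3919105, 3771783, 6470690),
  stringsAsFactors = FALSE
)

# Reads surviving the length filter, and percentage removed
trim_stats$retained    <- trim_stats$total - trim_stats$removed
trim_stats$pct_removed <- round(100 * trim_stats$removed / trim_stats$total, 1)

trim_stats
```

The ACAGTG library loses roughly twice the fraction of reads that ATCACG does, which lines up with its higher quality-trimmed percentage in the cutadapt summary.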
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 34053 AGATCGGAAGAGC 1000000 3.41
Nextera 5 CTGTCTCTTATA 1000000 0.00
smallRNA 0 TGGAATTCTCGG 1000000 0.00
Using Illumina adapter for trimming (count: 34053). Second best hit was Nextera (count: 5)
Writing report to '2112_lane1_GCCAAT_L001_R1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_GCCAAT_L001_R1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 611.36 s (25 us/read; 2.43 M reads/minute).
=== Summary ===
Total reads processed: 24,792,684
Reads with adapters: 13,652,723 (55.1%)
Reads written (passing filters): 24,792,684 (100.0%)
Total basepairs processed: 2,504,061,084 bp
Quality-trimmed: 162,739,913 bp (6.5%)
Total written (filtered): 2,103,508,344 bp (84.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13652723 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 37.8%
C: 17.3%
G: 6.2%
T: 17.8%
none/other: 21.1%
Overview of removed sequences
length count expect max.err error counts
1 8285474 6198171.0 0 8285474
2 559416 1549542.8 0 559416
3 216338 387385.7 0 216338
4 143815 96846.4 0 143815
5 117955 24211.6 0 117955
6 111642 6052.9 0 111642
7 108539 1513.2 0 108539
8 86050 378.3 0 86050
9 91517 94.6 0 90714 803
10 88443 23.6 1 87249 1194
11 62625 5.9 1 61504 1121
12 74455 1.5 1 73328 1127
13 63753 0.4 1 62538 1215
14 59311 0.4 1 58269 1042
15 59539 0.4 1 58231 1308
16 55250 0.4 1 54149 1101
17 58732 0.4 1 57225 1507
18 58521 0.4 1 57593 928
19 27839 0.4 1 27131 708
20 40399 0.4 1 39530 869
21 37610 0.4 1 36794 816
22 34021 0.4 1 33233 788
23 30471 0.4 1 29631 840
24 29774 0.4 1 28829 945
25 28103 0.4 1 27162 941
26 24228 0.4 1 22988 1240
27 23744 0.4 1 22120 1624
28 25066 0.4 1 22672 2394
29 19955 0.4 1 18259 1696
30 20569 0.4 1 19245 1324
31 12893 0.4 1 11639 1254
32 17686 0.4 1 14475 3211
33 16816 0.4 1 10050 6766
34 13947 0.4 1 12215 1732
35 10733 0.4 1 8237 2496
36 10850 0.4 1 8775 2075
37 10956 0.4 1 8744 2212
38 8624 0.4 1 6206 2418
39 7255 0.4 1 5843 1412
40 7962 0.4 1 5111 2851
41 8308 0.4 1 4244 4064
42 8798 0.4 1 4247 4551
43 6029 0.4 1 3510 2519
44 6198 0.4 1 3128 3070
45 5386 0.4 1 3247 2139
46 4678 0.4 1 2600 2078
47 3677 0.4 1 1794 1883
48 3427 0.4 1 1496 1931
49 3113 0.4 1 1630 1483
50 3613 0.4 1 1203 2410
51 5273 0.4 1 1201 4072
52 4675 0.4 1 1313 3362
53 3187 0.4 1 1168 2019
54 3635 0.4 1 364 3271
55 3449 0.4 1 596 2853
56 5070 0.4 1 589 4481
57 7323 0.4 1 647 6676
58 8103 0.4 1 525 7578
59 4762 0.4 1 627 4135
60 7877 0.4 1 311 7566
61 10083 0.4 1 362 9721
62 26621 0.4 1 392 26229
63 44918 0.4 1 850 44068
64 14573 0.4 1 727 13846
65 18391 0.4 1 213 18178
66 41246 0.4 1 289 40957
67 143024 0.4 1 520 142504
68 320571 0.4 1 1156 319415
69 1097289 0.4 1 1900 1095389
70 537660 0.4 1 4063 533597
71 189910 0.4 1 942 188968
72 67056 0.4 1 296 66760
73 18900 0.4 1 81 18819
74 10062 0.4 1 58 10004
75 5888 0.4 1 47 5841
76 4721 0.4 1 38 4683
77 5866 0.4 1 38 5828
78 5884 0.4 1 60 5824
79 5522 0.4 1 47 5475
80 4957 0.4 1 19 4938
81 4479 0.4 1 12 4467
82 4324 0.4 1 9 4315
83 4003 0.4 1 12 3991
84 3995 0.4 1 13 3982
85 3853 0.4 1 12 3841
86 4026 0.4 1 16 4010
87 4437 0.4 1 22 4415
88 4100 0.4 1 15 4085
89 4291 0.4 1 20 4271
90 4505 0.4 1 5 4500
91 4791 0.4 1 9 4782
92 5060 0.4 1 16 5044
93 5354 0.4 1 4 5350
94 5880 0.4 1 6 5874
95 6341 0.4 1 5 6336
96 7141 0.4 1 4 7137
97 8080 0.4 1 1 8079
98 8935 0.4 1 1 8934
99 10431 0.4 1 2 10429
100 20510 0.4 1 3 20507
101 87588 0.4 1 7 87581
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
=============================================
24792684 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 2906782 (11.7%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
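A note on reading the "Overview of removed sequences" table above. As far as I understand the cutadapt report, the `expect` column is the number of reads expected to match the adapter purely by chance: roughly the read count times 0.25^length, with the length capped at the adapter length (13 bp here). Because the stringency is 1 bp, a large share of the 8,285,474 single-base trims are probably chance matches (about 6.2 million are expected). Here is a quick sketch that reproduces a few values of that column for this sample; `expected_random_matches` is a helper name I made up, not something cutadapt provides.

```r
# Expected number of chance adapter matches for a given match length,
# assuming each base matches with probability 0.25 and that the match
# length is capped at the adapter length (13 bp for AGATCGGAAGAGC).
expected_random_matches <- function(n_reads, match_length, adapter_length = 13) {
  n_reads * 0.25^pmin(match_length, adapter_length)
}

# Total reads processed for 2112_lane1_GCCAAT_L001_R1.fastq (from the summary above)
n_reads <- 24792684

# Match lengths 1, 2, 12, 13 and 14, to compare against the expect column
expected <- expected_random_matches(n_reads, c(1, 2, 12, 13, 14))
print(round(expected, 1))
```

The rounded values should line up with the `expect` column for the same lengths, and everything past 13 bp stays flat because of the cap.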
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type   Count   Sequence        Sequences analysed   Percentage
Illumina       91573   AGATCGGAAGAGC   1000000              9.16
smallRNA       1       TGGAATTCTCGG    1000000              0.00
Nextera        0       CTGTCTCTTATA    1000000              0.00
Using Illumina adapter for trimming (count: 91573). Second best hit was smallRNA (count: 1)
Writing report to '2112_lane1_TGACCA_L001_R1_001.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq <<<
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 245.43 s (27 us/read; 2.25 M reads/minute).
=== Summary ===
Total reads processed: 9,188,940
Reads with adapters: 7,714,396 (84.0%)
Reads written (passing filters): 9,188,940 (100.0%)
Total basepairs processed: 928,082,940 bp
Quality-trimmed: 208,253,890 bp (22.4%)
Total written (filtered): 346,199,587 bp (37.3%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 7714396 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 20.0%
C: 5.5%
G: 4.6%
T: 5.6%
none/other: 64.4%
Overview of removed sequences
length count expect max.err error counts
1 1492673 2297235.0 0 1492673
2 128598 574308.8 0 128598
3 46570 143577.2 0 46570
4 41159 35894.3 0 41159
5 37536 8973.6 0 37536
6 41569 2243.4 0 41569
7 48187 560.8 0 48187
8 28646 140.2 0 28646
9 34011 35.1 0 33552 459
10 40836 8.8 1 40446 390
11 21070 2.2 1 20614 456
12 39375 0.5 1 38852 523
13 28223 0.1 1 27712 511
14 30477 0.1 1 30009 468
15 31572 0.1 1 30949 623
16 34270 0.1 1 33591 679
17 32392 0.1 1 31586 806
18 37144 0.1 1 36578 566
19 16557 0.1 1 16121 436
20 27628 0.1 1 27119 509
21 27318 0.1 1 26715 603
22 29792 0.1 1 29190 602
23 20283 0.1 1 19662 621
24 23742 0.1 1 23010 732
25 27134 0.1 1 26401 733
26 21110 0.1 1 19750 1360
27 24652 0.1 1 22661 1991
28 29900 0.1 1 27077 2823
29 20898 0.1 1 18877 2021
30 26378 0.1 1 24966 1412
31 11675 0.1 1 10195 1480
32 24841 0.1 1 20740 4101
33 19835 0.1 1 13337 6498
34 22072 0.1 1 17701 4371
35 17506 0.1 1 15765 1741
36 19797 0.1 1 15998 3799
37 16923 0.1 1 12762 4161
38 14306 0.1 1 12662 1644
39 26208 0.1 1 13199 13009
40 19199 0.1 1 10916 8283
41 20520 0.1 1 10636 9884
42 18073 0.1 1 9522 8551
43 14909 0.1 1 9560 5349
44 16890 0.1 1 8049 8841
45 15620 0.1 1 10542 5078
46 12395 0.1 1 8215 4180
47 10591 0.1 1 5173 5418
48 9609 0.1 1 4173 5436
49 9523 0.1 1 5244 4279
50 9295 0.1 1 3891 5404
51 13901 0.1 1 3851 10050
52 13243 0.1 1 5125 8118
53 8982 0.1 1 4127 4855
54 9199 0.1 1 1093 8106
55 8068 0.1 1 1833 6235
56 12312 0.1 1 1891 10421
57 16344 0.1 1 2128 14216
58 16997 0.1 1 1515 15482
59 9772 0.1 1 1717 8055
60 15043 0.1 1 825 14218
61 18869 0.1 1 914 17955
62 60104 0.1 1 885 59219
63 81495 0.1 1 1695 79800
64 21625 0.1 1 1087 20538
65 27271 0.1 1 267 27004
66 62437 0.1 1 500 61937
67 216127 0.1 1 644 215483
68 512969 0.1 1 1417 511552
69 1759541 0.1 1 2378 1757163
70 1031227 0.1 1 5431 1025796
71 385531 0.1 1 1504 384027
72 136018 0.1 1 455 135563
73 37324 0.1 1 158 37166
74 18788 0.1 1 90 18698
75 10612 0.1 1 56 10556
76 8749 0.1 1 51 8698
77 11805 0.1 1 34 11771
78 11502 0.1 1 51 11451
79 10571 0.1 1 56 10515
80 9179 0.1 1 14 9165
81 7981 0.1 1 17 7964
82 7509 0.1 1 12 7497
83 6703 0.1 1 18 6685
84 6606 0.1 1 19 6587
85 6518 0.1 1 27 6491
86 6887 0.1 1 43 6844
87 7238 0.1 1 38 7200
88 6963 0.1 1 48 6915
89 7086 0.1 1 61 7025
90 7376 0.1 1 7 7369
91 7796 0.1 1 10 7786
92 8320 0.1 1 10 8310
93 8758 0.1 1 3 8755
94 9533 0.1 1 11 9522
95 10141 0.1 1 3 10138
96 11512 0.1 1 3 11509
97 13094 0.1 1 5 13089
98 14343 0.1 1 3 14340
99 16413 0.1 1 1 16412
100 32725 0.1 1 10 32715
101 135802 0.1 1 12 135790
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
=============================================
9188940 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 4981933 (54.2%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Analysis complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type   Count   Sequence        Sequences analysed   Percentage
Illumina       75439   AGATCGGAAGAGC   1000000              7.54
smallRNA       0       TGGAATTCTCGG    1000000              0.00
Nextera        0       CTGTCTCTTATA    1000000              0.00
Using Illumina adapter for trimming (count: 75439). Second best hit was smallRNA (count: 0)
Writing report to '2112_lane1_TTAGGC_L001_R1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '
Writing final adapter and quality trimmed output to 2112_lane1_TTAGGC_L001_R1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 648.12 s (25 us/read; 2.38 M reads/minute).
=== Summary ===
Total reads processed: 25,752,634
Reads with adapters: 15,393,563 (59.8%)
Reads written (passing filters): 25,752,634 (100.0%)
Total basepairs processed: 2,601,016,034 bp
Quality-trimmed: 168,252,428 bp (6.5%)
Total written (filtered): 2,140,459,636 bp (82.3%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15393563 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 35.3%
C: 17.2%
G: 7.1%
T: 19.5%
none/other: 20.9%
Overview of removed sequences
length count expect max.err error counts
1 7645997 6438158.5 0 7645997
2 534336 1609539.6 0 534336
3 291105 402384.9 0 291105
4 245902 100596.2 0 245902
5 222809 25149.1 0 222809
6 221201 6287.3 0 221201
7 227983 1571.8 0 227983
8 176189 393.0 0 176189
9 191707 98.2 0 190234 1473
10 196407 24.6 1 194560 1847
11 135415 6.1 1 133356 2059
12 172560 1.5 1 170422 2138
13 142990 0.4 1 140803 2187
14 139835 0.4 1 137590 2245
15 140839 0.4 1 138493 2346
16 128017 0.4 1 125580 2437
17 141709 0.4 1 138594 3115
18 145963 0.4 1 144065 1898
19 61855 0.4 1 60451 1404
20 97268 0.4 1 95653 1615
21 90319 0.4 1 88627 1692
22 85905 0.4 1 84316 1589
23 72813 0.4 1 71243 1570
24 73060 0.4 1 71571 1489
25 65690 0.4 1 64250 1440
26 57685 0.4 1 55768 1917
27 54892 0.4 1 52716 2176
28 60126 0.4 1 57259 2867
29 41472 0.4 1 39372 2100
30 49205 0.4 1 47594 1611
31 23857 0.4 1 22268 1589
32 37275 0.4 1 33618 3657
33 26224 0.4 1 21303 4921
34 29760 0.4 1 25969 3791
35 22547 0.4 1 20747 1800
36 23176 0.4 1 19483 3693
37 18674 0.4 1 15930 2744
38 16053 0.4 1 13708 2345
39 13968 0.4 1 12793 1175
40 11681 0.4 1 9163 2518
41 10475 0.4 1 8039 2436
42 10739 0.4 1 7353 3386
43 8516 0.4 1 6429 2087
44 7285 0.4 1 5372 1913
45 6868 0.4 1 5558 1310
46 6169 0.4 1 4828 1341
47 4628 0.4 1 3103 1525
48 4038 0.4 1 2397 1641
49 4377 0.4 1 2706 1671
50 4702 0.4 1 2022 2680
51 5287 0.4 1 1938 3349
52 5055 0.4 1 1990 3065
53 3894 0.4 1 1797 2097
54 3728 0.4 1 633 3095
55 3840 0.4 1 948 2892
56 4945 0.4 1 854 4091
57 7122 0.4 1 904 6218
58 7223 0.4 1 746 6477
59 5119 0.4 1 898 4221
60 7815 0.4 1 497 7318
61 10538 0.4 1 562 9976
62 26673 0.4 1 635 26038
63 53981 0.4 1 963 53018
64 17769 0.4 1 917 16852
65 22283 0.4 1 264 22019
66 49178 0.4 1 385 48793
67 173176 0.4 1 558 172618
68 380184 0.4 1 1354 378830
69 1270634 0.4 1 1947 1268687
70 552446 0.4 1 4168 548278
71 201233 0.4 1 913 200320
72 72352 0.4 1 272 72080
73 21277 0.4 1 91 21186
74 11114 0.4 1 52 11062
75 6478 0.4 1 37 6441
76 5114 0.4 1 24 5090
77 6703 0.4 1 33 6670
78 6637 0.4 1 40 6597
79 6056 0.4 1 46 6010
80 5622 0.4 1 26 5596
81 5152 0.4 1 13 5139
82 4865 0.4 1 8 4857
83 4602 0.4 1 14 4588
84 4362 0.4 1 19 4343
85 4613 0.4 1 17 4596
86 4742 0.4 1 29 4713
87 5120 0.4 1 19 5101
88 4899 0.4 1 32 4867
89 4742 0.4 1 24 4718
90 5284 0.4 1 5 5279
91 5575 0.4 1 11 5564
92 5872 0.4 1 12 5860
93 6133 0.4 1 4 6129
94 6580 0.4 1 8 6572
95 7236 0.4 1 6 7230
96 8258 0.4 1 3 8255
97 9049 0.4 1 5 9044
98 10113 0.4 1 1 10112
99 11510 0.4 1 0 11510
100 22958 0.4 1 2 22956
101 96156 0.4 1 5 96151
RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
=============================================
25752634 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 3234303 (12.6%)
>>> Now running FastQC on the data <<<
Started analysis of 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
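Scrolling through this much output to compare samples is painful, so here is a rough sketch for pulling the headline numbers into one table. `parse_trim_report` is my own helper name, and I am assuming the `*_trimming_report.txt` files that Trim Galore writes contain the same summary lines as the console output above (worth checking against one file before trusting it). The example input is the TGACCA sample, which lost 54.2% of its reads to the 20 bp length cutoff, far more than the other two samples in this section.

```r
# Pull the headline numbers out of one Trim Galore report, given its lines.
parse_trim_report <- function(lines) {
  # Return the first number after the colon on the first line containing `label`.
  grab <- function(label) {
    hit <- grep(label, lines, value = TRUE, fixed = TRUE)
    if (length(hit) == 0) return(NA_real_)
    num <- sub("^.*:[[:space:]]*([0-9,]+).*$", "\\1", hit[1])
    as.numeric(gsub(",", "", num))  # drop thousands separators
  }
  total   <- grab("Total reads processed")
  adapter <- grab("Reads with adapters")
  short   <- grab("Sequences removed because they became shorter")
  data.frame(total_reads = total,
             reads_with_adapter = adapter,
             removed_too_short = short,
             pct_removed = round(100 * short / total, 1))
}

# Lines copied from the TGACCA output above
example_lines <- c(
  "Total reads processed: 9,188,940",
  "Reads with adapters: 7,714,396 (84.0%)",
  "Sequences removed because they became shorter than the length cutoff of 20 bp: 4981933 (54.2%)"
)
tgacca <- parse_trim_report(example_lines)
print(tgacca)

# Same idea over every report in the working directory (assumed file naming).
report_files <- list.files(pattern = "_trimming_report\\.txt$")
summary_table <- do.call(rbind, lapply(report_files, function(f) {
  data.frame(file = f, parse_trim_report(readLines(f)))
}))
```

If no report files are present, `summary_table` is simply `NULL`, so the chunk is safe to run from any directory.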