My previous notebook entry covers the file manipulation steps, as well as creating and preparing the genome index for Bismark. This entry mostly covers the quality trimming and analysis steps.

First, let's run FastQC on the raw reads.

setwd("~/Documents/C-virginica-BSSeq/")
system("mkdir untrimmed_fastqc")
mkdir: cannot create directory ‘untrimmed_fastqc’: File exists
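# list.files() takes a regular expression, not a shell glob. The pattern originally
# used here, "*.fastq", also matched the untrimmed_fastqc directory, which explains the
# "Failed to process untrimmed_fastqc" error at the end of the log below.
raw.file.list <- list.files(pattern = "\\.fastq$")
for (f in raw.file.list) {
  system(paste0("fastqc ", f, " -o ~/Documents/C-virginica-BSSeq/untrimmed_fastqc"))
}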
Started analysis of 2112_lane1_ACAGTG_L001_R1.fastq
Approx 5% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 10% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 15% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 20% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 25% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 30% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 35% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 40% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 45% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 50% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 55% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 60% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 65% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 70% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 75% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 80% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 85% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 90% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Approx 95% complete for 2112_lane1_ACAGTG_L001_R1.fastq
Analysis complete for 2112_lane1_ACAGTG_L001_R1.fastq
Started analysis of 2112_lane1_ATCACG_R1.fastq
Approx 5% complete for 2112_lane1_ATCACG_R1.fastq
Approx 10% complete for 2112_lane1_ATCACG_R1.fastq
Approx 15% complete for 2112_lane1_ATCACG_R1.fastq
Approx 20% complete for 2112_lane1_ATCACG_R1.fastq
Approx 25% complete for 2112_lane1_ATCACG_R1.fastq
Approx 30% complete for 2112_lane1_ATCACG_R1.fastq
Approx 35% complete for 2112_lane1_ATCACG_R1.fastq
Approx 40% complete for 2112_lane1_ATCACG_R1.fastq
Approx 45% complete for 2112_lane1_ATCACG_R1.fastq
Approx 50% complete for 2112_lane1_ATCACG_R1.fastq
Approx 55% complete for 2112_lane1_ATCACG_R1.fastq
Approx 60% complete for 2112_lane1_ATCACG_R1.fastq
Approx 65% complete for 2112_lane1_ATCACG_R1.fastq
Approx 70% complete for 2112_lane1_ATCACG_R1.fastq
Approx 75% complete for 2112_lane1_ATCACG_R1.fastq
Approx 80% complete for 2112_lane1_ATCACG_R1.fastq
Approx 85% complete for 2112_lane1_ATCACG_R1.fastq
Approx 90% complete for 2112_lane1_ATCACG_R1.fastq
Approx 95% complete for 2112_lane1_ATCACG_R1.fastq
Analysis complete for 2112_lane1_ATCACG_R1.fastq
Started analysis of 2112_lane1_CAGATC_R1.fastq
Approx 5% complete for 2112_lane1_CAGATC_R1.fastq
Approx 10% complete for 2112_lane1_CAGATC_R1.fastq
Approx 15% complete for 2112_lane1_CAGATC_R1.fastq
Approx 20% complete for 2112_lane1_CAGATC_R1.fastq
Approx 25% complete for 2112_lane1_CAGATC_R1.fastq
Approx 30% complete for 2112_lane1_CAGATC_R1.fastq
Approx 35% complete for 2112_lane1_CAGATC_R1.fastq
Approx 40% complete for 2112_lane1_CAGATC_R1.fastq
Approx 45% complete for 2112_lane1_CAGATC_R1.fastq
Approx 50% complete for 2112_lane1_CAGATC_R1.fastq
Approx 55% complete for 2112_lane1_CAGATC_R1.fastq
Approx 60% complete for 2112_lane1_CAGATC_R1.fastq
Approx 65% complete for 2112_lane1_CAGATC_R1.fastq
Approx 70% complete for 2112_lane1_CAGATC_R1.fastq
Approx 75% complete for 2112_lane1_CAGATC_R1.fastq
Approx 80% complete for 2112_lane1_CAGATC_R1.fastq
Approx 85% complete for 2112_lane1_CAGATC_R1.fastq
Approx 90% complete for 2112_lane1_CAGATC_R1.fastq
Approx 95% complete for 2112_lane1_CAGATC_R1.fastq
Analysis complete for 2112_lane1_CAGATC_R1.fastq
Started analysis of 2112_lane1_GCCAAT_L001_R1.fastq
Approx 5% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 10% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 15% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 20% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 25% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 30% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 35% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 40% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 45% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 50% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 55% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 60% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 65% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 70% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 75% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 80% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 85% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 90% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Approx 95% complete for 2112_lane1_GCCAAT_L001_R1.fastq
Analysis complete for 2112_lane1_GCCAAT_L001_R1.fastq
Started analysis of 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 5% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 10% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 15% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 20% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 25% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 30% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 35% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 40% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 45% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 50% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 55% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 60% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 65% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 70% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 75% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 80% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 85% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 90% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Approx 95% complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Analysis complete for 2112_lane1_TGACCA_L001_R1_001.fastq
Started analysis of 2112_lane1_TTAGGC_L001_R1.fastq
Approx 5% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 10% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 15% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 20% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 25% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 30% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 35% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 40% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 45% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 50% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 55% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 60% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 65% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 70% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 75% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 80% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 85% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 90% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Approx 95% complete for 2112_lane1_TTAGGC_L001_R1.fastq
Analysis complete for 2112_lane1_TTAGGC_L001_R1.fastq
Failed to process untrimmed_fastqc
java.io.FileNotFoundException: untrimmed_fastqc (Is a directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:129)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:102)
    at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
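
The `Failed to process untrimmed_fastqc` error at the end of this log is consistent with `list.files()` treating `pattern` as a regular expression rather than a shell glob, so an unanchored pattern can also match the output directory. A minimal sketch of the difference, using a made-up vector of names (`entries` is hypothetical and only mirrors the naming seen above) instead of the real directory listing:

```r
# Hypothetical directory listing, mirroring the names seen in the log above.
entries <- c("2112_lane1_ACAGTG_L001_R1.fastq",
             "2112_lane1_ATCACG_R1.fastq",
             "untrimmed_fastqc")

# Unanchored regex: any character followed by "fastq", anywhere in the name.
loose <- grepl(".fastq", entries)

# Anchored regex: the name must end in a literal ".fastq".
strict <- grepl("\\.fastq$", entries)

# glob2rx() converts a shell-style glob into an equivalent anchored regex.
from_glob <- grepl(utils::glob2rx("*.fastq"), entries)

print(data.frame(entries, loose, strict, from_glob))
```

With an anchored pattern such as `"\\.fastq$"`, only the FASTQ files are selected, so the directory never reaches FastQC.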

Then do some trimming with Trim Galore. Trim Galore is nice in that it runs FastQC on the trimmed reads in line when given the --fastqc flag. Saves a step!

setwd("~/Documents/C-virginica-BSSeq/")
system("mkdir ~/Documents/C-virginica-BSSeq/trimmed_fastqc")
mkdir: cannot create directory ‘/home/srlab/Documents/C-virginica-BSSeq/trimmed_fastqc’: File exists
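# Keep only the six FASTQ files, in case list.files() also picked up the untrimmed_fastqc directory.
raw.file.list <- raw.file.list[1:6]
for (f in raw.file.list) {
  # --fastqc runs FastQC on each trimmed file; -q 20 sets the Phred quality cutoff.
  system(paste0("/home/shared/trimgalore/trim_galore --fastqc --fastqc_args \" -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ \" -q 20 ~/Documents/C-virginica-BSSeq/", f))
}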
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    43268   AGATCGGAAGAGC   1000000 4.33
smallRNA    0   TGGAATTCTCGG    1000000 0.00
Nextera 0   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 43268). Second best hit was smallRNA (count: 0)

Writing report to '2112_lane1_ACAGTG_L001_R1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_ACAGTG_L001_R1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq <<< 
10000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 684.55 s (40 us/read; 1.52 M reads/minute).

=== Summary ===

Total reads processed:              17,327,210
Reads with adapters:                10,263,059 (59.2%)
Reads written (passing filters):    17,327,210 (100.0%)

Total basepairs processed: 1,750,048,210 bp
Quality-trimmed:             185,921,084 bp (10.6%)
Total written (filtered):  1,261,527,856 bp (72.1%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10263059 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 28.4%
  C: 14.1%
  G: 6.2%
  T: 13.4%
  none/other: 38.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   4187128 4331802.5   0   4187128
2   454105  1082950.6   0   454105
3   173887  270737.7    0   173887
4   132573  67684.4 0   132573
5   112014  16921.1 0   112014
6   112058  4230.3  0   112058
7   118966  1057.6  0   118966
8   75644   264.4   0   75644
9   82132   66.1    0   81258 874
10  91566   16.5    1   90662 904
11  49449   4.1 1   48579 870
12  74402   1.0 1   73557 845
13  54356   0.3 1   53474 882
14  54152   0.3 1   53335 817
15  51633   0.3 1   50745 888
16  45761   0.3 1   44877 884
17  50139   0.3 1   48938 1201
18  49514   0.3 1   48856 658
19  20534   0.3 1   20080 454
20  33290   0.3 1   32710 580
21  29372   0.3 1   28697 675
22  33080   0.3 1   32419 661
23  19223   0.3 1   18693 530
24  21075   0.3 1   20530 545
25  21912   0.3 1   21297 615
26  18258   0.3 1   16907 1351
27  19242   0.3 1   17472 1770
28  22623   0.3 1   19941 2682
29  13645   0.3 1   12102 1543
30  17134   0.3 1   15866 1268
31  8771    0.3 1   7489 1282
32  15705   0.3 1   11988 3717
33  11427   0.3 1   7518 3909
34  11995   0.3 1   9523 2472
35  8576    0.3 1   6341 2235
36  14389   0.3 1   7992 6397
37  11073   0.3 1   7217 3856
38  10008   0.3 1   8264 1744
39  4031    0.3 1   2275 1756
40  8813    0.3 1   5164 3649
41  7361    0.3 1   2676 4685
42  8430    0.3 1   3157 5273
43  5950    0.3 1   3071 2879
44  4902    0.3 1   2756 2146
45  4218    0.3 1   2813 1405
46  4226    0.3 1   2704 1522
47  3516    0.3 1   1742 1774
48  3322    0.3 1   1428 1894
49  3799    0.3 1   1840 1959
50  4609    0.3 1   1432 3177
51  5145    0.3 1   1432 3713
52  5288    0.3 1   1634 3654
53  3974    0.3 1   1619 2355
54  4262    0.3 1   522 3740
55  4224    0.3 1   812 3412
56  5448    0.3 1   816 4632
57  8442    0.3 1   1008 7434
58  8455    0.3 1   857 7598
59  5746    0.3 1   910 4836
60  9462    0.3 1   529 8933
61  12883   0.3 1   591 12292
62  34190   0.3 1   673 33517
63  60413   0.3 1   1245 59168
64  20284   0.3 1   1108 19176
65  25390   0.3 1   305 25085
66  57515   0.3 1   449 57066
67  203411  0.3 1   694 202717
68  453643  0.3 1   1657 451986
69  1544843 0.3 1   2603 1542240
70  698942  0.3 1   5353 693589
71  251952  0.3 1   1270 250682
72  90056   0.3 1   392 89664
73  25840   0.3 1   109 25731
74  13657   0.3 1   61 13596
75  7878    0.3 1   44 7834
76  6202    0.3 1   49 6153
77  8053    0.3 1   48 8005
78  8207    0.3 1   47 8160
79  7538    0.3 1   57 7481
80  6582    0.3 1   19 6563
81  5978    0.3 1   16 5962
82  5811    0.3 1   12 5799
83  5628    0.3 1   13 5615
84  5305    0.3 1   15 5290
85  5343    0.3 1   17 5326
86  5444    0.3 1   29 5415
87  5906    0.3 1   18 5888
88  5690    0.3 1   25 5665
89  5649    0.3 1   40 5609
90  6103    0.3 1   16 6087
91  6211    0.3 1   2 6209
92  6664    0.3 1   13 6651
93  6982    0.3 1   4 6978
94  7732    0.3 1   11 7721
95  7979    0.3 1   4 7975
96  9450    0.3 1   2 9448
97  10273   0.3 1   1 10272
98  11315   0.3 1   2 11313
99  13205   0.3 1   1 13204
100 26252   0.3 1   2 26250
101 108226  0.3 1   4 108222


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ACAGTG_L001_R1.fastq
=============================================
17327210 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  3919105 (22.6%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 5% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 10% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 15% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 20% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 25% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 30% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 35% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 40% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 45% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 50% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 55% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 60% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 65% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 70% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 75% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 80% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 85% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 90% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Approx 95% complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_ACAGTG_L001_R1_trimmed.fq
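
The `expect` column in the cutadapt table above appears to be the number of reads expected to match the adapter purely by chance: total reads multiplied by 0.25 per matched base, with the exponent capped at the adapter length (13). This is inferred from the numbers in the log, not taken from the cutadapt source, so treat it as an assumption. A short sketch that reproduces a few of the values for this sample (`expected_random_matches` is a made-up helper name):

```r
# Sketch: reproduce cutadapt's "expect" column, assuming each base matches
# by chance with probability 0.25 and the exponent is capped at the adapter length.
expected_random_matches <- function(total_reads, match_length, adapter_length = 13) {
  total_reads * 0.25 ^ pmin(match_length, adapter_length)
}

# "Total reads processed" for 2112_lane1_ACAGTG_L001_R1.fastq, copied from the log.
total_reads <- 17327210
lengths <- c(1, 2, 3, 4, 9, 13, 14)
expect <- round(expected_random_matches(total_reads, lengths), 1)

print(data.frame(length = lengths, expect = expect))
```

Read this way, the length 1 removals (4,187,128 observed against 4,331,802.5 expected) look like chance matches, while the 54,356 removals at length 13 against 0.3 expected point to real adapter sequence.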
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    33011   AGATCGGAAGAGC   1000000 3.30
smallRNA    1   TGGAATTCTCGG    1000000 0.00
Nextera 1   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 33011). Second best hit was smallRNA (count: 1)

Writing report to '2112_lane1_ATCACG_R1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_ATCACG_R1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq <<< 
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1309.46 s (39 us/read; 1.53 M reads/minute).

=== Summary ===

Total reads processed:              33,409,414
Reads with adapters:                17,611,436 (52.7%)
Reads written (passing filters):    33,409,414 (100.0%)

Total basepairs processed: 3,374,350,814 bp
Quality-trimmed:             220,159,975 bp (6.5%)
Total written (filtered):  2,847,951,372 bp (84.4%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17611436 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 36.3%
  C: 16.1%
  G: 7.6%
  T: 18.9%
  none/other: 21.1%

Overview of removed sequences
length  count   expect  max.err error counts
1   10442109    8352353.5   0   10442109
2   858531  2088088.4   0   858531
3   323059  522022.1    0   323059
4   208999  130505.5    0   208999
5   156468  32626.4 0   156468
6   150095  8156.6  0   150095
7   150683  2039.1  0   150683
8   112087  509.8   0   112087
9   121119  127.4   0   119789 1330
10  120185  31.9    1   118496 1689
11  82522   8.0 1   80895 1627
12  102396  2.0 1   100840 1556
13  83339   0.5 1   81773 1566
14  80199   0.5 1   78593 1606
15  80591   0.5 1   78931 1660
16  71183   0.5 1   69586 1597
17  78307   0.5 1   76360 1947
18  77111   0.5 1   75828 1283
19  37842   0.5 1   36894 948
20  53140   0.5 1   51938 1202
21  48734   0.5 1   47537 1197
22  51325   0.5 1   50222 1103
23  35379   0.5 1   34409 970
24  35467   0.5 1   34472 995
25  36003   0.5 1   34999 1004
26  29489   0.5 1   27823 1666
27  30967   0.5 1   28456 2511
28  35485   0.5 1   31877 3608
29  22082   0.5 1   19860 2222
30  27130   0.5 1   25572 1558
31  12798   0.5 1   11146 1652
32  21740   0.5 1   16029 5711
33  20199   0.5 1   13434 6765
34  19003   0.5 1   13472 5531
35  14389   0.5 1   11153 3236
36  11969   0.5 1   10095 1874
37  13574   0.5 1   8671 4903
38  13729   0.5 1   9035 4694
39  10171   0.5 1   6936 3235
40  11802   0.5 1   6610 5192
41  10036   0.5 1   4373 5663
42  10201   0.5 1   3897 6304
43  7546    0.5 1   3982 3564
44  7277    0.5 1   3278 3999
45  6507    0.5 1   3634 2873
46  5774    0.5 1   3093 2681
47  5356    0.5 1   1998 3358
48  4427    0.5 1   1669 2758
49  4043    0.5 1   1934 2109
50  5248    0.5 1   1453 3795
51  7087    0.5 1   1453 5634
52  6646    0.5 1   1797 4849
53  4822    0.5 1   1667 3155
54  5450    0.5 1   473 4977
55  5173    0.5 1   783 4390
56  6833    0.5 1   755 6078
57  10395   0.5 1   884 9511
58  11086   0.5 1   751 10335
59  7377    0.5 1   898 6479
60  12412   0.5 1   523 11889
61  16010   0.5 1   614 15396
62  40681   0.5 1   636 40045
63  71709   0.5 1   1313 70396
64  22353   0.5 1   1201 21152
65  27766   0.5 1   296 27470
66  60315   0.5 1   483 59832
67  205212  0.5 1   736 204476
68  445189  0.5 1   1601 443588
69  1354272 0.5 1   2421 1351851
70  645444  0.5 1   4623 640821
71  244073  0.5 1   1185 242888
72  91085   0.5 1   384 90701
73  26529   0.5 1   148 26381
74  13928   0.5 1   82 13846
75  8001    0.5 1   49 7952
76  6494    0.5 1   53 6441
77  8716    0.5 1   54 8662
78  8523    0.5 1   58 8465
79  8002    0.5 1   52 7950
80  6802    0.5 1   17 6785
81  6065    0.5 1   10 6055
82  5812    0.5 1   13 5799
83  5192    0.5 1   17 5175
84  4934    0.5 1   16 4918
85  4964    0.5 1   16 4948
86  5251    0.5 1   23 5228
87  5652    0.5 1   26 5626
88  5278    0.5 1   23 5255
89  5331    0.5 1   34 5297
90  5719    0.5 1   8 5711
91  6034    0.5 1   9 6025
92  6322    0.5 1   18 6304
93  6631    0.5 1   8 6623
94  7224    0.5 1   12 7212
95  7883    0.5 1   10 7873
96  9041    0.5 1   4 9037
97  9978    0.5 1   1 9977
98  11212   0.5 1   6 11206
99  12879   0.5 1   5 12874
100 25791   0.5 1   9 25782
101 108043  0.5 1   10 108033


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_ATCACG_R1.fastq
=============================================
33409414 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  3771783 (11.3%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_ATCACG_R1_trimmed.fq
Approx 5% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 10% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 15% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 20% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 25% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 30% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 35% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 40% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 45% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 50% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 55% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 60% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 65% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 70% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 75% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 80% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 85% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 90% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Approx 95% complete for 2112_lane1_ATCACG_R1_trimmed.fq
Analysis complete for 2112_lane1_ATCACG_R1_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    54855   AGATCGGAAGAGC   1000000 5.49
smallRNA    2   TGGAATTCTCGG    1000000 0.00
Nextera 0   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 54855). Second best hit was smallRNA (count: 2)

Writing report to '2112_lane1_CAGATC_R1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_CAGATC_R1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq <<< 
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1305.38 s (33 us/read; 1.83 M reads/minute).

=== Summary ===

Total reads processed:              39,780,221
Reads with adapters:                23,668,832 (59.5%)
Reads written (passing filters):    39,780,221 (100.0%)

Total basepairs processed: 4,017,802,321 bp
Quality-trimmed:             316,184,920 bp (7.9%)
Total written (filtered):  3,168,106,786 bp (78.9%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 23668832 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 33.0%
  C: 15.7%
  G: 6.0%
  T: 18.1%
  none/other: 27.2%

Overview of removed sequences
length  count   expect  max.err error counts
1   11462116    9945055.2   0   11462116
2   913629  2486263.8   0   913629
3   452027  621566.0    0   452027
4   350618  155391.5    0   350618
5   306933  38847.9 0   306933
6   287673  9712.0  0   287673
7   281622  2428.0  0   281622
8   223594  607.0   0   223594
9   240116  151.7   0   238205 1911
10  227281  37.9    1   224750 2531
11  163868  9.5 1   161288 2580
12  192839  2.4 1   190444 2395
13  162948  0.6 1   160373 2575
14  155026  0.6 1   152612 2414
15  156135  0.6 1   153474 2661
16  136091  0.6 1   133382 2709
17  150211  0.6 1   146876 3335
18  140793  0.6 1   138808 1985
19  77443   0.6 1   75897 1546
20  101254  0.6 1   99368 1886
21  93085   0.6 1   91238 1847
22  92927   0.6 1   91104 1823
23  72275   0.6 1   70493 1782
24  72096   0.6 1   70374 1722
25  70407   0.6 1   68697 1710
26  59212   0.6 1   56603 2609
27  60402   0.6 1   56761 3641
28  69540   0.6 1   64651 4889
29  44622   0.6 1   41133 3489
30  56254   0.6 1   53841 2413
31  26735   0.6 1   24281 2454
32  44698   0.6 1   38863 5835
33  37973   0.6 1   24758 13215
34  33790   0.6 1   29226 4564
35  32081   0.6 1   27180 4901
36  29448   0.6 1   22077 7371
37  24821   0.6 1   20313 4508
38  22225   0.6 1   17979 4246
39  18973   0.6 1   16137 2836
40  18644   0.6 1   12986 5658
41  17506   0.6 1   12014 5492
42  18109   0.6 1   11098 7011
43  14861   0.6 1   10066 4795
44  15499   0.6 1   8463 7036
45  14330   0.6 1   9724 4606
46  11858   0.6 1   7524 4334
47  10333   0.6 1   4854 5479
48  9714    0.6 1   3768 5946
49  9608    0.6 1   4719 4889
50  9483    0.6 1   3478 6005
51  14031   0.6 1   3385 10646
52  13293   0.6 1   4130 9163
53  8936    0.6 1   3526 5410
54  9880    0.6 1   942 8938
55  9118    0.6 1   1651 7467
56  13037   0.6 1   1504 11533
57  18891   0.6 1   1702 17189
58  20513   0.6 1   1377 19136
59  11927   0.6 1   1601 10326
60  19127   0.6 1   774 18353
61  23847   0.6 1   954 22893
62  75729   0.6 1   1023 74706
63  106543  0.6 1   2066 104477
64  28953   0.6 1   1537 27416
65  37265   0.6 1   412 36853
66  83783   0.6 1   586 83197
67  294910  0.6 1   1013 293897
68  680760  0.6 1   2305 678455
69  2319080 0.6 1   3770 2315310
70  1329812 0.6 1   8452 1321360
71  474084  0.6 1   2222 471862
72  166586  0.6 1   691 165895
73  46783   0.6 1   227 46556
74  24031   0.6 1   123 23908
75  13845   0.6 1   81 13764
76  11129   0.6 1   88 11041
77  15028   0.6 1   90 14938
78  14602   0.6 1   103 14499
79  13582   0.6 1   91 13491
80  11830   0.6 1   47 11783
81  10270   0.6 1   32 10238
82  9814    0.6 1   16 9798
83  9005    0.6 1   15 8990
84  8662    0.6 1   18 8644
85  8888    0.6 1   26 8862
86  9233    0.6 1   24 9209
87  9845    0.6 1   42 9803
88  9350    0.6 1   37 9313
89  9508    0.6 1   55 9453
90  10077   0.6 1   20 10057
91  10706   0.6 1   17 10689
92  11285   0.6 1   25 11260
93  11770   0.6 1   9 11761
94  12970   0.6 1   17 12953
95  13873   0.6 1   6 13867
96  16173   0.6 1   6 16167
97  17692   0.6 1   7 17685
98  19791   0.6 1   6 19785
99  22589   0.6 1   7 22582
100 45074   0.6 1   7 45067
101 187596  0.6 1   13 187583


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_CAGATC_R1.fastq
=============================================
39780221 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  6470690 (16.3%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_CAGATC_R1_trimmed.fq
Approx 5% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 10% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 15% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 20% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 25% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 30% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 35% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 40% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 45% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 50% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 55% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 60% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 65% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 70% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 75% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 80% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 85% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 90% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Approx 95% complete for 2112_lane1_CAGATC_R1_trimmed.fq
Analysis complete for 2112_lane1_CAGATC_R1_trimmed.fq
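
The run statistics for the three samples finished so far can be pulled together for comparison. The sketch below recomputes the percentage of reads dropped by the 20 bp length cutoff from the counts in the logs above. The numbers are copied by hand from this entry, and the data frame layout and column names are just one convenient choice, not something Trim Galore produces.

```r
# Counts copied from the "RUN STATISTICS" blocks in the logs above.
trim_stats <- data.frame(
  sample = c("ACAGTG", "ATCACG", "CAGATC"),
  reads_processed = c(17327210, 33409414, 39780221),
  reads_too_short = c(3919105, 3771783, 6470690)
)

# Reads surviving the length filter, and the share that was removed.
trim_stats$reads_kept <- trim_stats$reads_processed - trim_stats$reads_too_short
trim_stats$pct_removed <- round(100 * trim_stats$reads_too_short / trim_stats$reads_processed, 1)

print(trim_stats)
```

The recomputed percentages agree with the values Trim Galore reports (22.6%, 11.3% and 16.3%), which is a quick check that the counts were copied correctly.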
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    34053   AGATCGGAAGAGC   1000000 3.41
Nextera 5   CTGTCTCTTATA    1000000 0.00
smallRNA    0   TGGAATTCTCGG    1000000 0.00
Using Illumina adapter for trimming (count: 34053). Second best hit was Nextera (count: 5)

Writing report to '2112_lane1_GCCAAT_L001_R1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_GCCAAT_L001_R1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq <<< 
10000000 sequences processed
20000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 611.36 s (25 us/read; 2.43 M reads/minute).

=== Summary ===

Total reads processed:              24,792,684
Reads with adapters:                13,652,723 (55.1%)
Reads written (passing filters):    24,792,684 (100.0%)

Total basepairs processed: 2,504,061,084 bp
Quality-trimmed:             162,739,913 bp (6.5%)
Total written (filtered):  2,103,508,344 bp (84.0%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13652723 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 37.8%
  C: 17.3%
  G: 6.2%
  T: 17.8%
  none/other: 21.1%

Overview of removed sequences
length  count   expect  max.err error counts
1   8285474 6198171.0   0   8285474
2   559416  1549542.8   0   559416
3   216338  387385.7    0   216338
4   143815  96846.4 0   143815
5   117955  24211.6 0   117955
6   111642  6052.9  0   111642
7   108539  1513.2  0   108539
8   86050   378.3   0   86050
9   91517   94.6    0   90714 803
10  88443   23.6    1   87249 1194
11  62625   5.9 1   61504 1121
12  74455   1.5 1   73328 1127
13  63753   0.4 1   62538 1215
14  59311   0.4 1   58269 1042
15  59539   0.4 1   58231 1308
16  55250   0.4 1   54149 1101
17  58732   0.4 1   57225 1507
18  58521   0.4 1   57593 928
19  27839   0.4 1   27131 708
20  40399   0.4 1   39530 869
21  37610   0.4 1   36794 816
22  34021   0.4 1   33233 788
23  30471   0.4 1   29631 840
24  29774   0.4 1   28829 945
25  28103   0.4 1   27162 941
26  24228   0.4 1   22988 1240
27  23744   0.4 1   22120 1624
28  25066   0.4 1   22672 2394
29  19955   0.4 1   18259 1696
30  20569   0.4 1   19245 1324
31  12893   0.4 1   11639 1254
32  17686   0.4 1   14475 3211
33  16816   0.4 1   10050 6766
34  13947   0.4 1   12215 1732
35  10733   0.4 1   8237 2496
36  10850   0.4 1   8775 2075
37  10956   0.4 1   8744 2212
38  8624    0.4 1   6206 2418
39  7255    0.4 1   5843 1412
40  7962    0.4 1   5111 2851
41  8308    0.4 1   4244 4064
42  8798    0.4 1   4247 4551
43  6029    0.4 1   3510 2519
44  6198    0.4 1   3128 3070
45  5386    0.4 1   3247 2139
46  4678    0.4 1   2600 2078
47  3677    0.4 1   1794 1883
48  3427    0.4 1   1496 1931
49  3113    0.4 1   1630 1483
50  3613    0.4 1   1203 2410
51  5273    0.4 1   1201 4072
52  4675    0.4 1   1313 3362
53  3187    0.4 1   1168 2019
54  3635    0.4 1   364 3271
55  3449    0.4 1   596 2853
56  5070    0.4 1   589 4481
57  7323    0.4 1   647 6676
58  8103    0.4 1   525 7578
59  4762    0.4 1   627 4135
60  7877    0.4 1   311 7566
61  10083   0.4 1   362 9721
62  26621   0.4 1   392 26229
63  44918   0.4 1   850 44068
64  14573   0.4 1   727 13846
65  18391   0.4 1   213 18178
66  41246   0.4 1   289 40957
67  143024  0.4 1   520 142504
68  320571  0.4 1   1156 319415
69  1097289 0.4 1   1900 1095389
70  537660  0.4 1   4063 533597
71  189910  0.4 1   942 188968
72  67056   0.4 1   296 66760
73  18900   0.4 1   81 18819
74  10062   0.4 1   58 10004
75  5888    0.4 1   47 5841
76  4721    0.4 1   38 4683
77  5866    0.4 1   38 5828
78  5884    0.4 1   60 5824
79  5522    0.4 1   47 5475
80  4957    0.4 1   19 4938
81  4479    0.4 1   12 4467
82  4324    0.4 1   9 4315
83  4003    0.4 1   12 3991
84  3995    0.4 1   13 3982
85  3853    0.4 1   12 3841
86  4026    0.4 1   16 4010
87  4437    0.4 1   22 4415
88  4100    0.4 1   15 4085
89  4291    0.4 1   20 4271
90  4505    0.4 1   5 4500
91  4791    0.4 1   9 4782
92  5060    0.4 1   16 5044
93  5354    0.4 1   4 5350
94  5880    0.4 1   6 5874
95  6341    0.4 1   5 6336
96  7141    0.4 1   4 7137
97  8080    0.4 1   1 8079
98  8935    0.4 1   1 8934
99  10431   0.4 1   2 10429
100 20510   0.4 1   3 20507
101 87588   0.4 1   7 87581


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_GCCAAT_L001_R1.fastq
=============================================
24792684 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  2906782 (11.7%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 5% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 10% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 15% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 20% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 25% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 30% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 35% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 40% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 45% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 50% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 55% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 60% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 65% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 70% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 75% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 80% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 85% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 90% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Approx 95% complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_GCCAAT_L001_R1_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    91573   AGATCGGAAGAGC   1000000 9.16
smallRNA    1   TGGAATTCTCGG    1000000 0.00
Nextera 0   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 91573). Second best hit was smallRNA (count: 1)

Writing report to '2112_lane1_TGACCA_L001_R1_001.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_TGACCA_L001_R1_001_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq <<< 
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 245.43 s (27 us/read; 2.25 M reads/minute).

=== Summary ===

Total reads processed:               9,188,940
Reads with adapters:                 7,714,396 (84.0%)
Reads written (passing filters):     9,188,940 (100.0%)

Total basepairs processed:   928,082,940 bp
Quality-trimmed:             208,253,890 bp (22.4%)
Total written (filtered):    346,199,587 bp (37.3%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 7714396 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 20.0%
  C: 5.5%
  G: 4.6%
  T: 5.6%
  none/other: 64.4%

Overview of removed sequences
length  count   expect  max.err error counts
1   1492673 2297235.0   0   1492673
2   128598  574308.8    0   128598
3   46570   143577.2    0   46570
4   41159   35894.3 0   41159
5   37536   8973.6  0   37536
6   41569   2243.4  0   41569
7   48187   560.8   0   48187
8   28646   140.2   0   28646
9   34011   35.1    0   33552 459
10  40836   8.8 1   40446 390
11  21070   2.2 1   20614 456
12  39375   0.5 1   38852 523
13  28223   0.1 1   27712 511
14  30477   0.1 1   30009 468
15  31572   0.1 1   30949 623
16  34270   0.1 1   33591 679
17  32392   0.1 1   31586 806
18  37144   0.1 1   36578 566
19  16557   0.1 1   16121 436
20  27628   0.1 1   27119 509
21  27318   0.1 1   26715 603
22  29792   0.1 1   29190 602
23  20283   0.1 1   19662 621
24  23742   0.1 1   23010 732
25  27134   0.1 1   26401 733
26  21110   0.1 1   19750 1360
27  24652   0.1 1   22661 1991
28  29900   0.1 1   27077 2823
29  20898   0.1 1   18877 2021
30  26378   0.1 1   24966 1412
31  11675   0.1 1   10195 1480
32  24841   0.1 1   20740 4101
33  19835   0.1 1   13337 6498
34  22072   0.1 1   17701 4371
35  17506   0.1 1   15765 1741
36  19797   0.1 1   15998 3799
37  16923   0.1 1   12762 4161
38  14306   0.1 1   12662 1644
39  26208   0.1 1   13199 13009
40  19199   0.1 1   10916 8283
41  20520   0.1 1   10636 9884
42  18073   0.1 1   9522 8551
43  14909   0.1 1   9560 5349
44  16890   0.1 1   8049 8841
45  15620   0.1 1   10542 5078
46  12395   0.1 1   8215 4180
47  10591   0.1 1   5173 5418
48  9609    0.1 1   4173 5436
49  9523    0.1 1   5244 4279
50  9295    0.1 1   3891 5404
51  13901   0.1 1   3851 10050
52  13243   0.1 1   5125 8118
53  8982    0.1 1   4127 4855
54  9199    0.1 1   1093 8106
55  8068    0.1 1   1833 6235
56  12312   0.1 1   1891 10421
57  16344   0.1 1   2128 14216
58  16997   0.1 1   1515 15482
59  9772    0.1 1   1717 8055
60  15043   0.1 1   825 14218
61  18869   0.1 1   914 17955
62  60104   0.1 1   885 59219
63  81495   0.1 1   1695 79800
64  21625   0.1 1   1087 20538
65  27271   0.1 1   267 27004
66  62437   0.1 1   500 61937
67  216127  0.1 1   644 215483
68  512969  0.1 1   1417 511552
69  1759541 0.1 1   2378 1757163
70  1031227 0.1 1   5431 1025796
71  385531  0.1 1   1504 384027
72  136018  0.1 1   455 135563
73  37324   0.1 1   158 37166
74  18788   0.1 1   90 18698
75  10612   0.1 1   56 10556
76  8749    0.1 1   51 8698
77  11805   0.1 1   34 11771
78  11502   0.1 1   51 11451
79  10571   0.1 1   56 10515
80  9179    0.1 1   14 9165
81  7981    0.1 1   17 7964
82  7509    0.1 1   12 7497
83  6703    0.1 1   18 6685
84  6606    0.1 1   19 6587
85  6518    0.1 1   27 6491
86  6887    0.1 1   43 6844
87  7238    0.1 1   38 7200
88  6963    0.1 1   48 6915
89  7086    0.1 1   61 7025
90  7376    0.1 1   7 7369
91  7796    0.1 1   10 7786
92  8320    0.1 1   10 8310
93  8758    0.1 1   3 8755
94  9533    0.1 1   11 9522
95  10141   0.1 1   3 10138
96  11512   0.1 1   3 11509
97  13094   0.1 1   5 13089
98  14343   0.1 1   3 14340
99  16413   0.1 1   1 16412
100 32725   0.1 1   10 32715
101 135802  0.1 1   12 135790


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TGACCA_L001_R1_001.fastq
=============================================
9188940 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  4981933 (54.2%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 5% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 10% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 15% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 20% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 25% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 30% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 35% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 40% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 45% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 50% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 55% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 60% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 65% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 70% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 75% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 80% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 85% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 90% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Approx 95% complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
Analysis complete for 2112_lane1_TGACCA_L001_R1_001_trimmed.fq
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Path to Cutadapt set as: 'cutadapt' (default)
1.13
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    75439   AGATCGGAAGAGC   1000000 7.54
smallRNA    0   TGGAATTCTCGG    1000000 0.00
Nextera 0   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 75439). Second best hit was smallRNA (count: 0)

Writing report to '2112_lane1_TTAGGC_L001_R1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.13
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: ' -o ~/Documents/C-virginica-BSSeq/trimmed_fastqc/ '

Writing final adapter and quality trimmed output to 2112_lane1_TTAGGC_L001_R1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq <<< 
10000000 sequences processed
20000000 sequences processed
This is cutadapt 1.13 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 648.12 s (25 us/read; 2.38 M reads/minute).

=== Summary ===

Total reads processed:              25,752,634
Reads with adapters:                15,393,563 (59.8%)
Reads written (passing filters):    25,752,634 (100.0%)

Total basepairs processed: 2,601,016,034 bp
Quality-trimmed:             168,252,428 bp (6.5%)
Total written (filtered):  2,140,459,636 bp (82.3%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15393563 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 35.3%
  C: 17.2%
  G: 7.1%
  T: 19.5%
  none/other: 20.9%

Overview of removed sequences
length  count   expect  max.err error counts
1   7645997 6438158.5   0   7645997
2   534336  1609539.6   0   534336
3   291105  402384.9    0   291105
4   245902  100596.2    0   245902
5   222809  25149.1 0   222809
6   221201  6287.3  0   221201
7   227983  1571.8  0   227983
8   176189  393.0   0   176189
9   191707  98.2    0   190234 1473
10  196407  24.6    1   194560 1847
11  135415  6.1 1   133356 2059
12  172560  1.5 1   170422 2138
13  142990  0.4 1   140803 2187
14  139835  0.4 1   137590 2245
15  140839  0.4 1   138493 2346
16  128017  0.4 1   125580 2437
17  141709  0.4 1   138594 3115
18  145963  0.4 1   144065 1898
19  61855   0.4 1   60451 1404
20  97268   0.4 1   95653 1615
21  90319   0.4 1   88627 1692
22  85905   0.4 1   84316 1589
23  72813   0.4 1   71243 1570
24  73060   0.4 1   71571 1489
25  65690   0.4 1   64250 1440
26  57685   0.4 1   55768 1917
27  54892   0.4 1   52716 2176
28  60126   0.4 1   57259 2867
29  41472   0.4 1   39372 2100
30  49205   0.4 1   47594 1611
31  23857   0.4 1   22268 1589
32  37275   0.4 1   33618 3657
33  26224   0.4 1   21303 4921
34  29760   0.4 1   25969 3791
35  22547   0.4 1   20747 1800
36  23176   0.4 1   19483 3693
37  18674   0.4 1   15930 2744
38  16053   0.4 1   13708 2345
39  13968   0.4 1   12793 1175
40  11681   0.4 1   9163 2518
41  10475   0.4 1   8039 2436
42  10739   0.4 1   7353 3386
43  8516    0.4 1   6429 2087
44  7285    0.4 1   5372 1913
45  6868    0.4 1   5558 1310
46  6169    0.4 1   4828 1341
47  4628    0.4 1   3103 1525
48  4038    0.4 1   2397 1641
49  4377    0.4 1   2706 1671
50  4702    0.4 1   2022 2680
51  5287    0.4 1   1938 3349
52  5055    0.4 1   1990 3065
53  3894    0.4 1   1797 2097
54  3728    0.4 1   633 3095
55  3840    0.4 1   948 2892
56  4945    0.4 1   854 4091
57  7122    0.4 1   904 6218
58  7223    0.4 1   746 6477
59  5119    0.4 1   898 4221
60  7815    0.4 1   497 7318
61  10538   0.4 1   562 9976
62  26673   0.4 1   635 26038
63  53981   0.4 1   963 53018
64  17769   0.4 1   917 16852
65  22283   0.4 1   264 22019
66  49178   0.4 1   385 48793
67  173176  0.4 1   558 172618
68  380184  0.4 1   1354 378830
69  1270634 0.4 1   1947 1268687
70  552446  0.4 1   4168 548278
71  201233  0.4 1   913 200320
72  72352   0.4 1   272 72080
73  21277   0.4 1   91 21186
74  11114   0.4 1   52 11062
75  6478    0.4 1   37 6441
76  5114    0.4 1   24 5090
77  6703    0.4 1   33 6670
78  6637    0.4 1   40 6597
79  6056    0.4 1   46 6010
80  5622    0.4 1   26 5596
81  5152    0.4 1   13 5139
82  4865    0.4 1   8 4857
83  4602    0.4 1   14 4588
84  4362    0.4 1   19 4343
85  4613    0.4 1   17 4596
86  4742    0.4 1   29 4713
87  5120    0.4 1   19 5101
88  4899    0.4 1   32 4867
89  4742    0.4 1   24 4718
90  5284    0.4 1   5 5279
91  5575    0.4 1   11 5564
92  5872    0.4 1   12 5860
93  6133    0.4 1   4 6129
94  6580    0.4 1   8 6572
95  7236    0.4 1   6 7230
96  8258    0.4 1   3 8255
97  9049    0.4 1   5 9044
98  10113   0.4 1   1 10112
99  11510   0.4 1   0 11510
100 22958   0.4 1   2 22956
101 96156   0.4 1   5 96151
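A quick note on the `expect` column in the table above. The logged values look consistent with the number of reads times the chance of a random match at each length, taking each base to match with probability 1/4 and capping the length at the 13 bp adapter. This is inferred from the numbers in this log rather than taken from the Cutadapt documentation, so treat the sketch below as an assumption. The variable names are mine.

```r
# Assumption: expect = total reads * 0.25^min(match length, adapter length).
# This formula is inferred from the logged values, not from Cutadapt docs.
total.reads <- 25752634   # "Total reads processed" for the TTAGGC library
adapter.length <- 13      # length of AGATCGGAAGAGC

match.length <- 1:15
expect <- total.reads * 0.25^pmin(match.length, adapter.length)

# Compare against the "expect" column of the log
expect.table <- data.frame(length = match.length, expect = round(expect, 1))
print(expect.table)
```

If that reading is right, matches of 1 to 3 bp occur at roughly the rate expected by chance, which is what you would expect from running with a minimum adapter overlap of 1 bp.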


RUN STATISTICS FOR INPUT FILE: /home/srlab/Documents/C-virginica-BSSeq/2112_lane1_TTAGGC_L001_R1.fastq
=============================================
25752634 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  3234303 (12.6%)


  >>> Now running FastQC on the data <<<

Started analysis of 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 5% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 10% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 15% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 20% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 25% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 30% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 35% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 40% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 45% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 50% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 55% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 60% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 65% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 70% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 75% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 80% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 85% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 90% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Approx 95% complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
Analysis complete for 2112_lane1_TTAGGC_L001_R1_trimmed.fq
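To compare the libraries at a glance, the run statistics reported above for the GCCAAT, TGACCA and TTAGGC files can be collected into one table. This is a minimal sketch: the read counts are copied by hand from the log, and the 25% flagging threshold is an arbitrary choice of mine, not something Trim Galore recommends.

```r
# Read counts copied from the "RUN STATISTICS" blocks in the log above
trim.stats <- data.frame(
  sample  = c("GCCAAT", "TGACCA", "TTAGGC"),
  total   = c(24792684, 9188940, 25752634),  # sequences processed in total
  removed = c(2906782, 4981933, 3234303)     # shorter than 20 bp after trimming
)

# Reads surviving the 20 bp length cutoff, and percentage removed
trim.stats$retained <- trim.stats$total - trim.stats$removed
trim.stats$pct.removed <- round(100 * trim.stats$removed / trim.stats$total, 1)
print(trim.stats)

# Flag libraries that lose more than 25% of reads (assumed, arbitrary threshold)
flagged <- as.character(trim.stats$sample[trim.stats$pct.removed > 25])
print(flagged)
```

The TGACCA library stands out: it loses 54.2% of its reads to the length cutoff, against 11.7% and 12.6% for the other two, so it is worth a closer look in the FastQC reports before alignment.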