{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Transcriptome analysis - _Geoduck_\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_As part of this module I will take you through the steps of basic transcriptome analysis starting with raw data._ " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###_Noteworthy_\n", "_This is a lot of data, and simply moving it around is not trivial. It is good to have a plan. \n", "Our lab works from the Data management plan: \n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Based on the above DMP, the raw data is located at:** \n", "\n", "(Screenshot: Index_of__nightingales_P_generosa_1BBC1C9F.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is paired-end 100 bp data, and the files are large. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Step 1 - Download" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this course we might not actually download all of the data." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1-a) Grab url \"Index_of__nightingales_P_generosa_and_1Password_1BBC1DD9.png\"/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1-b) use `curl`, `wget` or GUI download to get data on local machine" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 9465M 100 9465M 0 0 54.3M 0 0:02:54 0:02:54 --:--:-- 41.7M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/nightingales/P_generosa/Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz \\\n", "-o /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl http://owl.fish.washington.edu/nightingales/P_generosa/Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz \\\n", "-o /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl http://owl.fish.washington.edu/nightingales/P_generosa/Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz \\\n", "-o /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl http://owl.fish.washington.edu/nightingales/P_generosa/Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz \\\n", "-o /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Step 2 - Check integrity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###View md5 checksum hash \n", "(in OSX Terminal)\n", "\n", "A md5 checksum hash is a unique \"fingerprint\" for 
each and every file. This hash (which is a string of characters) is used to check file integrity after transferring a file to a new location (or before and after compression). If the file transferred successfully (i.e. has not been corrupted), the md5 hash will be the same as it was before the transfer.\n", "\n", "Before copying/moving a file (particularly a large file): `$md5 /original/path/to/file`\n", "\n", "After copying/moving your file: `$md5 /new/path/to/file`\n", "\n", "Compare the hashes from before and after moving the files. If the hashes do NOT match, then the file was modified during transfer and should be transferred again." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/sr320/data-genomic/tentacle\n" ] } ], "source": [ "cd /Users/sr320/data-genomic/tentacle/" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "^C\r\n" ] } ], "source": [ "!md5 Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz \\\n", "Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz \\\n", "Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz \\\n", "Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz >> Geo_Pool_0915.md5" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz) = 67a8ec79f655b903b7afa32e4dc3727d\r\n", "MD5 (Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz) = f51d8644ba9280cb6f005b96e080666a\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz) = ae44d4dc794b8cd20ba90d7426ef9123\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz) = 03fcf100e0e59cb6f36f50aaec1407b6\r\n" ] } ], "source": [ "!head Geo_Pool_0915.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Now, compare pre- and post-transfer 
md5 hash files_\n", "\n", "`$diff PreTransferMD5hashes.md5 PostTransferMD5hashes.md5`\n", "\n", "Code explanation:\n", "\n", "- `diff` - Compares files and lists any lines that differ between them.\n", "- `PreTransferMD5hashes.md5` - The name of the md5 file containing the hashes of the files pre-transfer.\n", "- `PostTransferMD5hashes.md5` - The name of the md5 file containing the hashes of the files post-transfer.\n", "\n", "If `diff` prints nothing, the two md5 files are identical. If there are any differences in the hashes in the two md5 files, those files were corrupted during transfer and should be retransferred." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!diff Geo_Pool_0915.md5 /Volumes/web-1/nightingales/P_generosa/checksums.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**md5 from newly downloaded files**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz) = 67a8ec79f655b903b7afa32e4dc3727d\r\n", "MD5 (Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz) = f51d8644ba9280cb6f005b96e080666a\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz) = ae44d4dc794b8cd20ba90d7426ef9123\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz) = 03fcf100e0e59cb6f36f50aaec1407b6\r\n" ] } ], "source": [ "!cat Geo_Pool_0915.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**md5 from original files**" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz) = 67a8ec79f655b903b7afa32e4dc3727d\r\n", "MD5 (Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz) = f51d8644ba9280cb6f005b96e080666a\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz) = ae44d4dc794b8cd20ba90d7426ef9123\r\n", "MD5 (Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz) = 03fcf100e0e59cb6f36f50aaec1407b6\r\n" ] } ], "source": [ "!cat 
/Volumes/web-1/nightingales/P_generosa/checksums.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**They match!**" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#Step 3 - Quality Trimming" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/sr320/data-genomic/tentacle/Geoduck_v3\n" ] } ], "source": [ "cd /Users/sr320/data-genomic/tentacle/Geoduck_v3" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)\n", "\n", "Path to Cutadapt set as: '/Users/sr320/.local/bin/cutadapt' (user defined)\n", "1.8.1\n", "Cutadapt seems to be working fine (tested command '/Users/sr320/.local/bin/cutadapt --version')\n", "\n", "\n", "AUTO-DETECTING ADAPTER TYPE\n", "===========================\n", "Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq <<)\n", "\n", "Found perfect matches for the following adapter sequences:\n", "Adapter type\tCount\tSequence\tSequences analysed\tPercentage\n", "Illumina\t5574\tAGATCGGAAGAGC\t1000000\t0.56\n", "Nextera\t1\tCTGTCTCTTATA\t1000000\t0.00\n", "smallRNA\t0\tATGGAATTCTCG\t1000000\t0.00\n", "Using Illumina adapter for trimming (count: 5574). 
Second best hit was Nextera (count: 1)\n", "\n", "Writing report to 'Geo_Pool_M_CTTGTA_L006_R2_001.fastq_trimming_report.txt'\n", "\n", "SUMMARISING RUN PARAMETERS\n", "==========================\n", "Input filename: /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq\n", "Trimming mode: paired-end\n", "Trim Galore version: 0.4.0\n", "Cutadapt version: 1.8.1\n", "Quality Phred score cutoff: 20\n", "Quality encoding type selected: ASCII+33\n", "Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)\n", "Maximum trimming error rate: 0.1 (default)\n", "Minimum required adapter overlap (stringency): 1 bp\n", "Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp\n", "Length cut-off for read 1: 35 bp (default)\n", "Length cut-off for read 2: 35 bb (default)\n", "\n", "Writing final adapter and quality trimmed output to Geo_Pool_M_CTTGTA_L006_R2_001_trimmed.fq\n", "\n", "\n", " >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq <<< \n", "10000000 sequences processed\n", "20000000 sequences processed\n", "30000000 sequences processed\n", "40000000 sequences processed\n", "50000000 sequences processed\n", "60000000 sequences processed\n", "70000000 sequences processed\n", "80000000 sequences processed\n", "90000000 sequences processed\n", "100000000 sequences processed\n", "110000000 sequences processed\n", "This is cutadapt 1.8.1 with Python 2.7.10\n", "Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq\n", "Trimming 1 adapter with at most 10.0% errors in single-end mode ...\n", "Finished in 1443.61 s (12 us/read; 4.89 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 117,663,240\n", "Reads with adapters: 35,501,056 (30.2%)\n", 
"Reads written (passing filters): 117,663,240 (100.0%)\n", "\n", "Total basepairs processed: 11,883,987,240 bp\n", "Quality-trimmed: 237,425,663 bp (2.0%)\n", "Total written (filtered): 11,577,940,979 bp (97.4%)\n", "\n", "=== Adapter 1 ===\n", "\n", "Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 35501056 times.\n", "\n", "No. of allowed errors:\n", "0-9 bp: 0; 10-13 bp: 1\n", "\n", "Bases preceding removed adapters:\n", " A: 31.4%\n", " C: 27.6%\n", " G: 21.1%\n", " T: 19.9%\n", " none/other: 0.0%\n", "\n", "Overview of removed sequences\n", "length\tcount\texpect\tmax.err\terror counts\n", "1\t24450304\t29415810.0\t0\t24450304\n", "2\t7075603\t7353952.5\t0\t7075603\n", "3\t1919893\t1838488.1\t0\t1919893\n", "4\t497991\t459622.0\t0\t497991\n", "5\t161689\t114905.5\t0\t161689\n", "6\t140394\t28726.4\t0\t140394\n", "7\t131039\t7181.6\t0\t131039\n", "8\t97843\t1795.4\t0\t97843\n", "9\t105867\t448.8\t0\t105078 789\n", "10\t101622\t112.2\t1\t100317 1305\n", "11\t65604\t28.1\t1\t64606 998\n", "12\t81956\t7.0\t1\t81028 928\n", "13\t54306\t1.8\t1\t53471 835\n", "14\t88001\t1.8\t1\t86892 1109\n", "15\t26956\t1.8\t1\t26334 622\n", "16\t46221\t1.8\t1\t45427 794\n", "17\t72447\t1.8\t1\t71198 1249\n", "18\t14532\t1.8\t1\t14131 401\n", "19\t52653\t1.8\t1\t51900 753\n", "20\t21994\t1.8\t1\t21627 367\n", "21\t21346\t1.8\t1\t20978 368\n", "22\t28174\t1.8\t1\t27647 527\n", "23\t24146\t1.8\t1\t23473 673\n", "24\t28835\t1.8\t1\t28231 604\n", "25\t12030\t1.8\t1\t11754 276\n", "26\t17003\t1.8\t1\t16660 343\n", "27\t15193\t1.8\t1\t14899 294\n", "28\t17107\t1.8\t1\t16893 214\n", "29\t8289\t1.8\t1\t8063 226\n", "30\t22127\t1.8\t1\t21904 223\n", "31\t368\t1.8\t1\t296 72\n", "32\t12935\t1.8\t1\t12842 93\n", "33\t1642\t1.8\t1\t1588 54\n", "34\t6090\t1.8\t1\t6020 70\n", "35\t6540\t1.8\t1\t6417 123\n", "36\t5260\t1.8\t1\t5191 69\n", "37\t5160\t1.8\t1\t5047 113\n", "38\t4361\t1.8\t1\t4287 74\n", "39\t4410\t1.8\t1\t4308 102\n", "40\t3904\t1.8\t1\t3848 56\n", 
"41\t3923\t1.8\t1\t3745 178\n", "42\t6239\t1.8\t1\t6117 122\n", "43\t661\t1.8\t1\t627 34\n", "44\t2682\t1.8\t1\t2593 89\n", "45\t4635\t1.8\t1\t4562 73\n", "46\t660\t1.8\t1\t623 37\n", "47\t1770\t1.8\t1\t1752 18\n", "48\t1684\t1.8\t1\t1656 28\n", "49\t1691\t1.8\t1\t1661 30\n", "50\t1906\t1.8\t1\t1859 47\n", "51\t2614\t1.8\t1\t2583 31\n", "52\t790\t1.8\t1\t762 28\n", "53\t764\t1.8\t1\t748 16\n", "54\t1098\t1.8\t1\t1084 14\n", "55\t1178\t1.8\t1\t1152 26\n", "56\t734\t1.8\t1\t706 28\n", "57\t908\t1.8\t1\t884 24\n", "58\t1053\t1.8\t1\t1030 23\n", "59\t822\t1.8\t1\t796 26\n", "60\t904\t1.8\t1\t865 39\n", "61\t887\t1.8\t1\t848 39\n", "62\t877\t1.8\t1\t791 86\n", "63\t1074\t1.8\t1\t950 124\n", "64\t1719\t1.8\t1\t1136 583\n", "65\t3080\t1.8\t1\t1439 1641\n", "66\t2169\t1.8\t1\t1205 964\n", "67\t859\t1.8\t1\t564 295\n", "68\t309\t1.8\t1\t173 136\n", "69\t85\t1.8\t1\t54 31\n", "70\t37\t1.8\t1\t24 13\n", "71\t42\t1.8\t1\t25 17\n", "72\t43\t1.8\t1\t31 12\n", "73\t62\t1.8\t1\t31 31\n", "74\t81\t1.8\t1\t59 22\n", "75\t51\t1.8\t1\t33 18\n", "76\t54\t1.8\t1\t27 27\n", "77\t55\t1.8\t1\t21 34\n", "78\t36\t1.8\t1\t17 19\n", "79\t41\t1.8\t1\t14 27\n", "80\t32\t1.8\t1\t22 10\n", "81\t25\t1.8\t1\t12 13\n", "82\t52\t1.8\t1\t8 44\n", "83\t82\t1.8\t1\t10 72\n", "84\t38\t1.8\t1\t12 26\n", "85\t38\t1.8\t1\t6 32\n", "86\t29\t1.8\t1\t5 24\n", "87\t19\t1.8\t1\t5 14\n", "88\t12\t1.8\t1\t4 8\n", "89\t20\t1.8\t1\t4 16\n", "90\t9\t1.8\t1\t3 6\n", "91\t19\t1.8\t1\t2 17\n", "92\t13\t1.8\t1\t3 10\n", "93\t24\t1.8\t1\t2 22\n", "94\t4\t1.8\t1\t1 3\n", "95\t4\t1.8\t1\t2 2\n", "96\t12\t1.8\t1\t0 12\n", "97\t11\t1.8\t1\t0 11\n", "98\t27\t1.8\t1\t2 25\n", "99\t22\t1.8\t1\t3 19\n", "100\t78\t1.8\t1\t0 78\n", "101\t374\t1.8\t1\t3 371\n", "\n", "\n", "RUN STATISTICS FOR INPUT FILE: /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq\n", "=============================================\n", "117663240 sequences processed in total\n", "The length threshold of paired-end sequences gets evaluated 
later on (in the validation step)\n", "\n", "Writing report to 'Geo_Pool_M_CTTGTA_L006_R1_001.fastq_trimming_report.txt'\n", "\n", "SUMMARISING RUN PARAMETERS\n", "==========================\n", "Input filename: /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq\n", "Trimming mode: paired-end\n", "Trim Galore version: 0.4.0\n", "Cutadapt version: 1.8.1\n", "Quality Phred score cutoff: 20\n", "Quality encoding type selected: ASCII+33\n", "Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)\n", "Maximum trimming error rate: 0.1 (default)\n", "Minimum required adapter overlap (stringency): 1 bp\n", "Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp\n", "Length cut-off for read 1: 35 bp (default)\n", "Length cut-off for read 2: 35 bb (default)\n", "\n", "Writing final adapter and quality trimmed output to Geo_Pool_M_CTTGTA_L006_R1_001_trimmed.fq\n", "\n", "\n", " >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq <<< \n", "10000000 sequences processed\n", "20000000 sequences processed\n", "30000000 sequences processed\n", "40000000 sequences processed\n", "50000000 sequences processed\n", "60000000 sequences processed\n", "70000000 sequences processed\n", "80000000 sequences processed\n", "90000000 sequences processed\n", "100000000 sequences processed\n", "110000000 sequences processed\n", "This is cutadapt 1.8.1 with Python 2.7.10\n", "Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq\n", "Trimming 1 adapter with at most 10.0% errors in single-end mode ...\n", "Finished in 1323.17 s (11 us/read; 5.34 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 117,663,240\n", "Reads with adapters: 36,573,174 (31.1%)\n", "Reads 
written (passing filters): 117,663,240 (100.0%)\n", "\n", "Total basepairs processed: 11,883,987,240 bp\n", "Quality-trimmed: 93,468,521 bp (0.8%)\n", "Total written (filtered): 11,719,003,078 bp (98.6%)\n", "\n", "=== Adapter 1 ===\n", "\n", "Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 36573174 times.\n", "\n", "No. of allowed errors:\n", "0-9 bp: 0; 10-13 bp: 1\n", "\n", "Bases preceding removed adapters:\n", " A: 30.2%\n", " C: 27.7%\n", " G: 22.0%\n", " T: 20.0%\n", " none/other: 0.1%\n", "\n", "Overview of removed sequences\n", "length\tcount\texpect\tmax.err\terror counts\n", "1\t25314985\t29415810.0\t0\t25314985\n", "2\t7083781\t7353952.5\t0\t7083781\n", "3\t2055289\t1838488.1\t0\t2055289\n", "4\t528362\t459622.0\t0\t528362\n", "5\t168427\t114905.5\t0\t168427\n", "6\t141699\t28726.4\t0\t141699\n", "7\t126831\t7181.6\t0\t126831\n", "8\t103210\t1795.4\t0\t103210\n", "9\t103800\t448.8\t0\t102998 802\n", "10\t97754\t112.2\t1\t95922 1832\n", "11\t73805\t28.1\t1\t72519 1286\n", "12\t80967\t7.0\t1\t79627 1340\n", "13\t60342\t1.8\t1\t59215 1127\n", "14\t59644\t1.8\t1\t58560 1084\n", "15\t52364\t1.8\t1\t51290 1074\n", "16\t45994\t1.8\t1\t44529 1465\n", "17\t43596\t1.8\t1\t42339 1257\n", "18\t49349\t1.8\t1\t47755 1594\n", "19\t30364\t1.8\t1\t29445 919\n", "20\t35193\t1.8\t1\t34163 1030\n", "21\t29228\t1.8\t1\t28373 855\n", "22\t29103\t1.8\t1\t28224 879\n", "23\t23037\t1.8\t1\t22244 793\n", "24\t21212\t1.8\t1\t20357 855\n", "25\t17349\t1.8\t1\t16828 521\n", "26\t16326\t1.8\t1\t15741 585\n", "27\t15511\t1.8\t1\t14971 540\n", "28\t15182\t1.8\t1\t14584 598\n", "29\t11906\t1.8\t1\t11596 310\n", "30\t11697\t1.8\t1\t11371 326\n", "31\t7589\t1.8\t1\t7364 225\n", "32\t9443\t1.8\t1\t9211 232\n", "33\t7108\t1.8\t1\t6940 168\n", "34\t7346\t1.8\t1\t7143 203\n", "35\t6915\t1.8\t1\t6724 191\n", "36\t5646\t1.8\t1\t5486 160\n", "37\t5643\t1.8\t1\t5479 164\n", "38\t5219\t1.8\t1\t5075 144\n", "39\t4201\t1.8\t1\t4068 133\n", "40\t4217\t1.8\t1\t4081 136\n", 
"41\t4405\t1.8\t1\t4263 142\n", "42\t3254\t1.8\t1\t3168 86\n", "43\t2637\t1.8\t1\t2572 65\n", "44\t2372\t1.8\t1\t2276 96\n", "45\t2682\t1.8\t1\t2571 111\n", "46\t2483\t1.8\t1\t2356 127\n", "47\t1993\t1.8\t1\t1905 88\n", "48\t1877\t1.8\t1\t1819 58\n", "49\t1900\t1.8\t1\t1828 72\n", "50\t1600\t1.8\t1\t1527 73\n", "51\t1565\t1.8\t1\t1479 86\n", "52\t1652\t1.8\t1\t1535 117\n", "53\t1577\t1.8\t1\t1515 62\n", "54\t951\t1.8\t1\t900 51\n", "55\t1053\t1.8\t1\t980 73\n", "56\t974\t1.8\t1\t902 72\n", "57\t1060\t1.8\t1\t989 71\n", "58\t943\t1.8\t1\t843 100\n", "59\t1061\t1.8\t1\t983 78\n", "60\t764\t1.8\t1\t652 112\n", "61\t815\t1.8\t1\t728 87\n", "62\t819\t1.8\t1\t642 177\n", "63\t1077\t1.8\t1\t798 279\n", "64\t883\t1.8\t1\t704 179\n", "65\t606\t1.8\t1\t444 162\n", "66\t715\t1.8\t1\t487 228\n", "67\t939\t1.8\t1\t560 379\n", "68\t1163\t1.8\t1\t513 650\n", "69\t3967\t1.8\t1\t565 3402\n", "70\t3654\t1.8\t1\t959 2695\n", "71\t3086\t1.8\t1\t630 2456\n", "72\t1986\t1.8\t1\t362 1624\n", "73\t1579\t1.8\t1\t207 1372\n", "74\t1220\t1.8\t1\t129 1091\n", "75\t300\t1.8\t1\t76 224\n", "76\t250\t1.8\t1\t31 219\n", "77\t211\t1.8\t1\t17 194\n", "78\t154\t1.8\t1\t19 135\n", "79\t133\t1.8\t1\t18 115\n", "80\t81\t1.8\t1\t15 66\n", "81\t59\t1.8\t1\t7 52\n", "82\t75\t1.8\t1\t5 70\n", "83\t87\t1.8\t1\t11 76\n", "84\t49\t1.8\t1\t15 34\n", "85\t31\t1.8\t1\t6 25\n", "86\t21\t1.8\t1\t5 16\n", "87\t29\t1.8\t1\t4 25\n", "88\t31\t1.8\t1\t8 23\n", "89\t25\t1.8\t1\t7 18\n", "90\t18\t1.8\t1\t6 12\n", "91\t44\t1.8\t1\t6 38\n", "92\t12\t1.8\t1\t4 8\n", "93\t27\t1.8\t1\t1 26\n", "94\t24\t1.8\t1\t0 24\n", "95\t30\t1.8\t1\t2 28\n", "96\t65\t1.8\t1\t2 63\n", "97\t57\t1.8\t1\t2 55\n", "98\t121\t1.8\t1\t4 117\n", "99\t182\t1.8\t1\t5 177\n", "100\t494\t1.8\t1\t0 494\n", "101\t1618\t1.8\t1\t1 1617\n", "\n", "\n", "RUN STATISTICS FOR INPUT FILE: /Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq\n", "=============================================\n", "117663240 sequences processed in total\n", "The 
length threshold of paired-end sequences gets evaluated later on (in the validation step)\n", "\n", "Validate paired-end files Geo_Pool_M_CTTGTA_L006_R2_001_trimmed.fq and Geo_Pool_M_CTTGTA_L006_R1_001_trimmed.fq\n", "file_1: Geo_Pool_M_CTTGTA_L006_R2_001_trimmed.fq, file_2: Geo_Pool_M_CTTGTA_L006_R1_001_trimmed.fq\n", "\n", "\n", ">>>>> Now validing the length of the 2 paired-end infiles: Geo_Pool_M_CTTGTA_L006_R2_001_trimmed.fq and Geo_Pool_M_CTTGTA_L006_R1_001_trimmed.fq <<<<<\n", "Writing validated paired-end read 1 reads to Geo_Pool_M_CTTGTA_L006_R2_001_val_1.fq\n", "Writing validated paired-end read 2 reads to Geo_Pool_M_CTTGTA_L006_R1_001_val_2.fq\n", "\n", "Writing unpaired read 1 reads to Geo_Pool_M_CTTGTA_L006_R2_001_unpaired_1.fq\n", "Writing unpaired read 2 reads to Geo_Pool_M_CTTGTA_L006_R1_001_unpaired_2.fq\n", "\n", "Total number of sequences analysed: 117663240\n", "\n", "Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 951930 (0.81%)\n", "\n", "Deleting both intermediate output files Geo_Pool_M_CTTGTA_L006_R2_001_trimmed.fq and Geo_Pool_M_CTTGTA_L006_R1_001_trimmed.fq\n", "\n", "====================================================================================================\n", "\n", "Writing report to 'Geo_Pool_F_GGCTAC_L006_R2_001.fastq_trimming_report.txt'\n", "\n", "SUMMARISING RUN PARAMETERS\n", "==========================\n", "Input filename: /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq\n", "Trimming mode: paired-end\n", "Trim Galore version: 0.4.0\n", "Cutadapt version: 1.8.1\n", "Quality Phred score cutoff: 20\n", "Quality encoding type selected: ASCII+33\n", "Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)\n", "Maximum trimming error rate: 0.1 (default)\n", "Minimum required adapter overlap (stringency): 1 bp\n", "Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp\n", "Length cut-off 
for read 1: 35 bp (default)\n", "Length cut-off for read 2: 35 bb (default)\n", "\n", "Writing final adapter and quality trimmed output to Geo_Pool_F_GGCTAC_L006_R2_001_trimmed.fq\n", "\n", "\n", " >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq <<< \n", "10000000 sequences processed\n", "20000000 sequences processed\n", "30000000 sequences processed\n", "40000000 sequences processed\n", "50000000 sequences processed\n", "60000000 sequences processed\n", "70000000 sequences processed\n", "80000000 sequences processed\n", "90000000 sequences processed\n", "100000000 sequences processed\n", "This is cutadapt 1.8.1 with Python 2.7.10\n", "Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq\n", "Trimming 1 adapter with at most 10.0% errors in single-end mode ...\n", "Finished in 1234.28 s (12 us/read; 5.06 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 104,070,998\n", "Reads with adapters: 36,698,854 (35.3%)\n", "Reads written (passing filters): 104,070,998 (100.0%)\n", "\n", "Total basepairs processed: 10,511,170,798 bp\n", "Quality-trimmed: 150,777,819 bp (1.4%)\n", "Total written (filtered): 10,286,886,259 bp (97.9%)\n", "\n", "=== Adapter 1 ===\n", "\n", "Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 36698854 times.\n", "\n", "No. 
of allowed errors:\n", "0-9 bp: 0; 10-13 bp: 1\n", "\n", "Bases preceding removed adapters:\n", " A: 34.0%\n", " C: 27.6%\n", " G: 17.1%\n", " T: 21.2%\n", " none/other: 0.1%\n", "\n", "Overview of removed sequences\n", "length\tcount\texpect\tmax.err\terror counts\n", "1\t26153939\t26017749.5\t0\t26153939\n", "2\t6228369\t6504437.4\t0\t6228369\n", "3\t1938619\t1626109.3\t0\t1938619\n", "4\t582435\t406527.3\t0\t582435\n", "5\t209124\t101631.8\t0\t209124\n", "6\t148620\t25408.0\t0\t148620\n", "7\t141477\t6352.0\t0\t141477\n", "8\t118367\t1588.0\t0\t118367\n", "9\t119274\t397.0\t0\t117193 2081\n", "10\t112287\t99.2\t1\t110157 2130\n", "11\t79231\t24.8\t1\t77576 1655\n", "12\t88909\t6.2\t1\t87742 1167\n", "13\t62889\t1.6\t1\t61296 1593\n", "14\t88212\t1.6\t1\t87141 1071\n", "15\t32297\t1.6\t1\t31572 725\n", "16\t50140\t1.6\t1\t49221 919\n", "17\t73362\t1.6\t1\t72127 1235\n", "18\t18878\t1.6\t1\t18390 488\n", "19\t56164\t1.6\t1\t55364 800\n", "20\t25818\t1.6\t1\t25377 441\n", "21\t25577\t1.6\t1\t25209 368\n", "22\t30079\t1.6\t1\t29504 575\n", "23\t27440\t1.6\t1\t26515 925\n", "24\t31423\t1.6\t1\t30791 632\n", "25\t15234\t1.6\t1\t14903 331\n", "26\t18175\t1.6\t1\t17626 549\n", "27\t16789\t1.6\t1\t16340 449\n", "28\t19705\t1.6\t1\t19455 250\n", "29\t10913\t1.6\t1\t10351 562\n", "30\t29223\t1.6\t1\t28927 296\n", "31\t589\t1.6\t1\t419 170\n", "32\t16451\t1.6\t1\t16294 157\n", "33\t2311\t1.6\t1\t2176 135\n", "34\t7978\t1.6\t1\t7814 164\n", "35\t8906\t1.6\t1\t8469 437\n", "36\t7161\t1.6\t1\t6974 187\n", "37\t7135\t1.6\t1\t6704 431\n", "38\t5977\t1.6\t1\t5705 272\n", "39\t6110\t1.6\t1\t5791 319\n", "40\t5684\t1.6\t1\t5569 115\n", "41\t6169\t1.6\t1\t5450 719\n", "42\t9074\t1.6\t1\t8730 344\n", "43\t1019\t1.6\t1\t967 52\n", "44\t4070\t1.6\t1\t3829 241\n", "45\t6450\t1.6\t1\t6334 116\n", "46\t1059\t1.6\t1\t990 69\n", "47\t2522\t1.6\t1\t2472 50\n", "48\t2376\t1.6\t1\t2324 52\n", "49\t2482\t1.6\t1\t2400 82\n", "50\t2639\t1.6\t1\t2542 97\n", "51\t3781\t1.6\t1\t3720 61\n", 
"52\t1078\t1.6\t1\t1044 34\n", "53\t1123\t1.6\t1\t1099 24\n", "54\t1612\t1.6\t1\t1566 46\n", "55\t1690\t1.6\t1\t1638 52\n", "56\t1080\t1.6\t1\t1042 38\n", "57\t1340\t1.6\t1\t1252 88\n", "58\t1278\t1.6\t1\t1219 59\n", "59\t1014\t1.6\t1\t955 59\n", "60\t1079\t1.6\t1\t1021 58\n", "61\t1137\t1.6\t1\t1038 99\n", "62\t1170\t1.6\t1\t955 215\n", "63\t1591\t1.6\t1\t1147 444\n", "64\t3625\t1.6\t1\t1372 2253\n", "65\t7172\t1.6\t1\t1690 5482\n", "66\t4560\t1.6\t1\t1414 3146\n", "67\t1640\t1.6\t1\t634 1006\n", "68\t582\t1.6\t1\t180 402\n", "69\t177\t1.6\t1\t76 101\n", "70\t124\t1.6\t1\t50 74\n", "71\t56\t1.6\t1\t32 24\n", "72\t67\t1.6\t1\t33 34\n", "73\t114\t1.6\t1\t37 77\n", "74\t266\t1.6\t1\t228 38\n", "75\t44\t1.6\t1\t27 17\n", "76\t66\t1.6\t1\t27 39\n", "77\t61\t1.6\t1\t25 36\n", "78\t32\t1.6\t1\t17 15\n", "79\t52\t1.6\t1\t7 45\n", "80\t59\t1.6\t1\t20 39\n", "81\t30\t1.6\t1\t12 18\n", "82\t36\t1.6\t1\t13 23\n", "83\t42\t1.6\t1\t18 24\n", "84\t97\t1.6\t1\t8 89\n", "85\t57\t1.6\t1\t12 45\n", "86\t39\t1.6\t1\t10 29\n", "87\t32\t1.6\t1\t5 27\n", "88\t23\t1.6\t1\t9 14\n", "89\t25\t1.6\t1\t1 24\n", "90\t16\t1.6\t1\t6 10\n", "91\t65\t1.6\t1\t11 54\n", "92\t64\t1.6\t1\t8 56\n", "93\t27\t1.6\t1\t4 23\n", "94\t16\t1.6\t1\t3 13\n", "95\t14\t1.6\t1\t2 12\n", "96\t46\t1.6\t1\t3 43\n", "97\t16\t1.6\t1\t1 15\n", "98\t72\t1.6\t1\t4 68\n", "99\t79\t1.6\t1\t2 77\n", "100\t230\t1.6\t1\t1 229\n", "101\t1027\t1.6\t1\t8 1019\n", "\n", "\n", "RUN STATISTICS FOR INPUT FILE: /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq\n", "=============================================\n", "104070998 sequences processed in total\n", "The length threshold of paired-end sequences gets evaluated later on (in the validation step)\n", "\n", "Writing report to 'Geo_Pool_F_GGCTAC_L006_R1_001.fastq_trimming_report.txt'\n", "\n", "SUMMARISING RUN PARAMETERS\n", "==========================\n", "Input filename: /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq\n", "Trimming mode: 
paired-end\n", "Trim Galore version: 0.4.0\n", "Cutadapt version: 1.8.1\n", "Quality Phred score cutoff: 20\n", "Quality encoding type selected: ASCII+33\n", "Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)\n", "Maximum trimming error rate: 0.1 (default)\n", "Minimum required adapter overlap (stringency): 1 bp\n", "Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp\n", "Length cut-off for read 1: 35 bp (default)\n", "Length cut-off for read 2: 35 bb (default)\n", "\n", "Writing final adapter and quality trimmed output to Geo_Pool_F_GGCTAC_L006_R1_001_trimmed.fq\n", "\n", "\n", " >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq <<< \n", "10000000 sequences processed\n", "20000000 sequences processed\n", "30000000 sequences processed\n", "40000000 sequences processed\n", "50000000 sequences processed\n", "60000000 sequences processed\n", "70000000 sequences processed\n", "80000000 sequences processed\n", "90000000 sequences processed\n", "100000000 sequences processed\n", "This is cutadapt 1.8.1 with Python 2.7.10\n", "Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq\n", "Trimming 1 adapter with at most 10.0% errors in single-end mode ...\n", "Finished in 1244.27 s (12 us/read; 5.02 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 104,070,998\n", "Reads with adapters: 37,295,736 (35.8%)\n", "Reads written (passing filters): 104,070,998 (100.0%)\n", "\n", "Total basepairs processed: 10,511,170,798 bp\n", "Quality-trimmed: 60,385,363 bp (0.6%)\n", "Total written (filtered): 10,371,962,798 bp (98.7%)\n", "\n", "=== Adapter 1 ===\n", "\n", "Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 37295736 times.\n", "\n", "No. 
of allowed errors:\n", "0-9 bp: 0; 10-13 bp: 1\n", "\n", "Bases preceding removed adapters:\n", " A: 33.2%\n", " C: 27.3%\n", " G: 18.0%\n", " T: 21.3%\n", " none/other: 0.2%\n", "\n", "Overview of removed sequences\n", "length\tcount\texpect\tmax.err\terror counts\n", "1\t26607903\t26017749.5\t0\t26607903\n", "2\t6205626\t6504437.4\t0\t6205626\n", "3\t2015304\t1626109.3\t0\t2015304\n", "4\t605105\t406527.3\t0\t605105\n", "5\t213369\t101631.8\t0\t213369\n", "6\t149182\t25408.0\t0\t149182\n", "7\t135972\t6352.0\t0\t135972\n", "8\t121962\t1588.0\t0\t121962\n", "9\t119594\t397.0\t0\t117688 1906\n", "10\t105176\t99.2\t1\t102213 2963\n", "11\t88007\t24.8\t1\t85816 2191\n", "12\t85821\t6.2\t1\t83926 1895\n", "13\t67361\t1.6\t1\t65457 1904\n", "14\t62670\t1.6\t1\t61353 1317\n", "15\t54158\t1.6\t1\t52922 1236\n", "16\t52418\t1.6\t1\t50649 1769\n", "17\t45872\t1.6\t1\t44547 1325\n", "18\t53653\t1.6\t1\t51881 1772\n", "19\t32783\t1.6\t1\t31807 976\n", "20\t38147\t1.6\t1\t36991 1156\n", "21\t33876\t1.6\t1\t32792 1084\n", "22\t29957\t1.6\t1\t29027 930\n", "23\t26946\t1.6\t1\t25903 1043\n", "24\t24287\t1.6\t1\t23345 942\n", "25\t20639\t1.6\t1\t19936 703\n", "26\t17259\t1.6\t1\t16587 672\n", "27\t16749\t1.6\t1\t16162 587\n", "28\t16867\t1.6\t1\t16228 639\n", "29\t15343\t1.6\t1\t14818 525\n", "30\t15312\t1.6\t1\t14839 473\n", "31\t11469\t1.6\t1\t11036 433\n", "32\t11638\t1.6\t1\t11297 341\n", "33\t9508\t1.6\t1\t9202 306\n", "34\t9567\t1.6\t1\t9247 320\n", "35\t8649\t1.6\t1\t8336 313\n", "36\t7529\t1.6\t1\t7227 302\n", "37\t7203\t1.6\t1\t6931 272\n", "38\t6549\t1.6\t1\t6284 265\n", "39\t5949\t1.6\t1\t5672 277\n", "40\t6630\t1.6\t1\t6219 411\n", "41\t7612\t1.6\t1\t7372 240\n", "42\t3156\t1.6\t1\t2969 187\n", "43\t4344\t1.6\t1\t4203 141\n", "44\t3313\t1.6\t1\t2980 333\n", "45\t4063\t1.6\t1\t3670 393\n", "46\t3833\t1.6\t1\t3560 273\n", "47\t2700\t1.6\t1\t2496 204\n", "48\t2610\t1.6\t1\t2426 184\n", "49\t2761\t1.6\t1\t2577 184\n", "50\t2249\t1.6\t1\t2072 177\n", 
"51\t2525\t1.6\t1\t2172 353\n", "52\t2566\t1.6\t1\t2157 409\n", "53\t2534\t1.6\t1\t2364 170\n", "54\t1290\t1.6\t1\t1091 199\n", "55\t1618\t1.6\t1\t1328 290\n", "56\t1541\t1.6\t1\t1287 254\n", "57\t1607\t1.6\t1\t1360 247\n", "58\t1465\t1.6\t1\t1005 460\n", "59\t1456\t1.6\t1\t1271 185\n", "60\t1376\t1.6\t1\t671 705\n", "61\t1132\t1.6\t1\t867 265\n", "62\t1482\t1.6\t1\t748 734\n", "63\t2263\t1.6\t1\t993 1270\n", "64\t1402\t1.6\t1\t752 650\n", "65\t1083\t1.6\t1\t436 647\n", "66\t1426\t1.6\t1\t509 917\n", "67\t2172\t1.6\t1\t580 1592\n", "68\t3517\t1.6\t1\t595 2922\n", "69\t13060\t1.6\t1\t594 12466\n", "70\t12407\t1.6\t1\t1062 11345\n", "71\t10499\t1.6\t1\t732 9767\n", "72\t6011\t1.6\t1\t498 5513\n", "73\t4624\t1.6\t1\t266 4358\n", "74\t3658\t1.6\t1\t232 3426\n", "75\t909\t1.6\t1\t85 824\n", "76\t757\t1.6\t1\t40 717\n", "77\t654\t1.6\t1\t19 635\n", "78\t449\t1.6\t1\t12 437\n", "79\t324\t1.6\t1\t13 311\n", "80\t236\t1.6\t1\t11 225\n", "81\t160\t1.6\t1\t13 147\n", "82\t139\t1.6\t1\t10 129\n", "83\t117\t1.6\t1\t21 96\n", "84\t115\t1.6\t1\t8 107\n", "85\t77\t1.6\t1\t17 60\n", "86\t66\t1.6\t1\t17 49\n", "87\t59\t1.6\t1\t6 53\n", "88\t50\t1.6\t1\t11 39\n", "89\t51\t1.6\t1\t6 45\n", "90\t41\t1.6\t1\t9 32\n", "91\t123\t1.6\t1\t15 108\n", "92\t87\t1.6\t1\t7 80\n", "93\t70\t1.6\t1\t3 67\n", "94\t101\t1.6\t1\t1 100\n", "95\t118\t1.6\t1\t3 115\n", "96\t352\t1.6\t1\t7 345\n", "97\t271\t1.6\t1\t3 268\n", "98\t402\t1.6\t1\t3 399\n", "99\t679\t1.6\t1\t3 676\n", "100\t1933\t1.6\t1\t4 1929\n", "101\t6032\t1.6\t1\t14 6018\n", "\n", "\n", "RUN STATISTICS FOR INPUT FILE: /Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq\n", "=============================================\n", "104070998 sequences processed in total\n", "The length threshold of paired-end sequences gets evaluated later on (in the validation step)\n", "\n", "Validate paired-end files Geo_Pool_F_GGCTAC_L006_R2_001_trimmed.fq and Geo_Pool_F_GGCTAC_L006_R1_001_trimmed.fq\n", "file_1: 
Geo_Pool_F_GGCTAC_L006_R2_001_trimmed.fq, file_2: Geo_Pool_F_GGCTAC_L006_R1_001_trimmed.fq\n", "\n", "\n", ">>>>> Now validing the length of the 2 paired-end infiles: Geo_Pool_F_GGCTAC_L006_R2_001_trimmed.fq and Geo_Pool_F_GGCTAC_L006_R1_001_trimmed.fq <<<<<\n", "Writing validated paired-end read 1 reads to Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Writing validated paired-end read 2 reads to Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "\n", "Writing unpaired read 1 reads to Geo_Pool_F_GGCTAC_L006_R2_001_unpaired_1.fq\n", "Writing unpaired read 2 reads to Geo_Pool_F_GGCTAC_L006_R1_001_unpaired_2.fq\n", "\n", "Total number of sequences analysed: 104070998\n", "\n", "Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 857078 (0.82%)\n", "\n", "Deleting both intermediate output files Geo_Pool_F_GGCTAC_L006_R2_001_trimmed.fq and Geo_Pool_F_GGCTAC_L006_R1_001_trimmed.fq\n", "\n", "====================================================================================================\n", "\n" ] } ], "source": [ "!/Applications/bioinfo/trim_galore_zip/trim_galore \\\n", "--paired \\\n", "--retain_unpaired \\\n", "--path_to_cutadapt /Users/sr320/.local/bin/cutadapt \\\n", "/Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq \\\n", "/Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq \\\n", "/Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq \\\n", "/Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Files generated\n", "\n", "For each input pair, Trim Galore writes the validated reads to `*_val_1.fq` and `*_val_2.fq` and, because `--retain_unpaired` was set, reads that lost their mate to `*_unpaired_1.fq` and `*_unpaired_2.fq`; the intermediate `*_trimmed.fq` files are deleted. Because R2 was listed before R1 in the command, the R2 files carry the `_1` suffix and the R1 files the `_2` suffix.\n", "\n", "(Screenshot: `tentacle_1BBC74BF.png`)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'/Users/sr320/data-genomic/tentacle/Geoduck_v3'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"#Step 4 - QC (FastQC)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Started analysis of Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 5% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 10% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 15% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 20% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 25% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 30% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 35% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 40% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 45% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 50% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 55% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 60% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 65% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 70% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 75% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 80% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 85% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 90% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Approx 95% complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Analysis complete for Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq\n", "Started analysis of Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 5% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 10% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 15% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 20% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 25% complete for 
Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 30% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 35% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 40% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 45% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 50% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 55% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 60% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 65% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 70% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "Approx 75% complete for Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq\n", "^C\n" ] } ], "source": [ "!/Applications/bioinfo/FastQC/fastqc \\\n", "Geo_Pool_F_GGCTAC_L006_R2_001_val_1.fq \\\n", "Geo_Pool_F_GGCTAC_L006_R1_001_val_2.fq \\\n", "Geo_Pool_M_CTTGTA_L006_R2_001_val_1.fq \\\n", "Geo_Pool_M_CTTGTA_L006_R1_001_val_2.fq\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }