{ "metadata": { "name": "", "signature": "sha256:1df038518d3036220f50974b485b84ca2c37722a42cba7031dafaafecda84373" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Trying to bedgraph from tophat output" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!genomeCoverageBed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\r\n", "Tool: bedtools genomecov (aka genomeCoverageBed)\r\n", "Version: v2.22.0\r\n", "Summary: Compute the coverage of a feature file among a genome.\r\n", "\r\n", "Usage: bedtools genomecov [OPTIONS] -i -g \r\n", "\r\n", "Options: \r\n", "\t-ibam\t\tThe input file is in BAM format.\r\n", "\t\t\tNote: BAM _must_ be sorted by position\r\n", "\r\n", "\t-d\t\tReport the depth at each genome position (with one-based coordinates).\r\n", "\t\t\tDefault behavior is to report a histogram.\r\n", "\r\n", "\t-dz\t\tReport the depth at each genome position (with zero-based coordinates).\r\n", "\t\t\tReports only non-zero positions.\r\n", "\t\t\tDefault behavior is to report a histogram.\r\n", "\r\n", "\t-bg\t\tReport depth in BedGraph format. For details, see:\r\n", "\t\t\tgenome.ucsc.edu/goldenPath/help/bedgraph.html\r\n", "\r\n", "\t-bga\t\tReport depth in BedGraph format, as above (-bg).\r\n", "\t\t\tHowever with this option, regions with zero \r\n", "\t\t\tcoverage are also reported. This allows one to\r\n", "\t\t\tquickly extract all regions of a genome with 0 \r\n", "\t\t\tcoverage by applying: \"grep -w 0$\" to the output.\r\n", "\r\n", "\t-split\t\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\r\n", "\t\t\twhen computing coverage.\r\n", "\t\t\tFor BAM files, this uses the CIGAR \"N\" and \"D\" operations \r\n", "\t\t\tto infer the blocks for computing coverage.\r\n", "\t\t\tFor BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds\r\n", "\t\t\tfields (i.e., columns 10,11,12).\r\n", "\r\n", "\t-strand\t\tCalculate coverage of intervals from a specific strand.\r\n", "\t\t\tWith BED files, requires at least 6 columns (strand is column 6). \r\n", "\t\t\t- (STRING): can be + or -\r\n", "\r\n", "\t-5\t\tCalculate coverage of 5\" positions (instead of entire interval).\r\n", "\r\n", "\t-3\t\tCalculate coverage of 3\" positions (instead of entire interval).\r\n", "\r\n", "\t-max\t\tCombine all positions with a depth >= max into\r\n", "\t\t\ta single bin in the histogram. Irrelevant\r\n", "\t\t\tfor -d and -bedGraph\r\n", "\t\t\t- (INTEGER)\r\n", "\r\n", "\t-scale\t\tScale the coverage by a constant factor.\r\n", "\t\t\tEach coverage value is multiplied by this factor before being reported.\r\n", "\t\t\tUseful for normalizing coverage by, e.g., reads per million (RPM).\r\n", "\t\t\t- Default is 1.0; i.e., unscaled.\r\n", "\t\t\t- (FLOAT)\r\n", "\r\n", "\t-trackline\tAdds a UCSC/Genome-Browser track line definition in the first line of the output.\r\n", "\t\t\t- See here for more details about track line definition:\r\n", "\t\t\t http://genome.ucsc.edu/goldenPath/help/bedgraph.html\r\n", "\t\t\t- NOTE: When adding a trackline definition, the output BedGraph can be easily\r\n", "\t\t\t uploaded to the Genome Browser as a custom track,\r\n", "\t\t\t BUT CAN NOT be converted into a BigWig file (w/o removing the first line).\r\n", "\r\n", "\t-trackopts\tWrites additional track line definition parameters in the first line.\r\n", "\t\t\t- Example:\r\n", "\t\t\t -trackopts 'name=\"My Track\" visibility=2 color=255,30,30'\r\n", "\t\t\t Note the use of single-quotes if you have spaces in your parameters.\r\n", "\t\t\t- (TEXT)\r\n", "\r\n", "Notes: \r\n", "\t(1) The genome file should tab delimited and structured as follows:\r\n", "\t \r\n", "\r\n", "\tFor example, Human (hg19):\r\n", "\tchr1\t249250621\r\n", "\tchr2\t243199373\r\n", "\t...\r\n", "\tchr18_gl000207_random\t4262\r\n", "\r\n", "\t(2) The input BED (-i) file must be grouped by chromosome.\r\n", "\t A simple \"sort -k 1,1 > .sorted\" will suffice.\r\n", "\r\n", "\t(3) The input BAM (-ibam) file must be sorted by position.\r\n", "\t A \"samtools sort \" should suffice.\r\n", "\r\n", "Tips: \r\n", "\tOne can use the UCSC Genome Browser's MySQL database to extract\r\n", "\tchromosome sizes. For example, H. sapiens:\r\n", "\r\n", "\tmysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \\\r\n", "\t\"select chrom, size from hg19.chromInfo\" > hg19.genome\r\n", "\r\n" ] } ], "prompt_number": 2 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "bam file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ls /Volumes/web/halfshell/qdod3/hs-rna/2M" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[31maccepted_hits.bam\u001b[m\u001b[m* \u001b[31malign_summary.txt\u001b[m\u001b[m* \u001b[31mdeletions.bed\u001b[m\u001b[m* \u001b[31minsertions.bed\u001b[m\u001b[m* \u001b[31mjunctions.bed\u001b[m\u001b[m* \u001b[34mlogs\u001b[m\u001b[m/ \u001b[31mprep_reads.info\u001b[m\u001b[m* \u001b[31munmapped.bam\u001b[m\u001b[m*\r\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/halfshell/qdod3/cgigas_v9_genome03.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "scaffold360 280\r\n", "scaffold18356 567\r\n", "scaffold20428 1021\r\n", "scaffold18720 628\r\n", "scaffold23246 1949\r\n", "scaffold23910 2448\r\n", "scaffold19422 826\r\n", "scaffold480 447\r\n", "scaffold350 10047\r\n", "scaffold29842 7041\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc -l /Volumes/web/halfshell/qdod3/cgigas_v9_genome03.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 11969 /Volumes/web/halfshell/qdod3/cgigas_v9_genome03.tab\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!genomeCoverageBed \\\n", "-bg \\\n", "-split \\\n", "-ibam /Volumes/web/halfshell/qdod3/hs-rna/2M/accepted_hits.bam \\\n", "-g /Volumes/web/halfshell/qdod3/cgigas_v9_genome03.tab \\\n", "> /Volumes/web/halfshell/qdod3/hs-rna/2M/genomecoverage" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "!head Volumes/web/halfshell/qdod3/hs-rna/2M/genomecoverage" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "head: Volumes/web/halfshell/qdod3/hs-rna/2M/genomecoverage: No such file or directory\r\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }