{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Running in Docker container on Swoose\n", "\n", "Started Docker container with the following command:\n", "\n", "```docker run -p 8888:8888 -v /home/sam/data/pacbio_oly/:/home/data -it 9ce16ff93ef9 /bin/bash```\n", "\n", "The command allows access to Jupyter Notebook over port 8888 and makes my data files in ```/home/sam/data/pacbio_oly/``` accessible to the Docker container as ```/home/data```.\n", "\n", "Once the container was started, started Jupyter Notebook with the following command inside the Docker container:\n", "\n", "```jupyter notebook --allow-root```\n", "\n", "This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.\n", "The Docker container is running on an image created from this [Dockerfile (Git commit 832008c)](https://github.com/RobertsLab/code/commit/832008c0160d71ae3470200756bbcbc87bb25fb6)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Thu Sep 7 22:03:11 UTC 2017\n" ] } ], "source": [ "%%bash\n", "date" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ff9e68310edc\n" ] } ], "source": [ "%%bash\n", "hostname" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 24\n", "On-line CPU(s) list: 0-23\n", "Thread(s) per core: 2\n", "Core(s) per socket: 6\n", "Socket(s): 2\n", "NUMA node(s): 1\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 44\n", "Model name: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz\n", "Stepping: 2\n", "CPU MHz: 2926.129\n", "BogoMIPS: 5851.98\n", "Virtualization: VT-x\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 256K\n", "L3 
cache: 12288K\n", "NUMA node0 CPU(s): 0-23\n" ] } ], "source": [ "%%bash\n", "lscpu" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " total used free shared buffers cached\n", "Mem: 70G 32G 38G 178M 447M 25G\n", "-/+ buffers/cache: 6.1G 64G\n", "Swap: 4.7G 0B 4.7G\n" ] } ], "source": [ "%%bash\n", "free -mh" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "UsageError: %%bash is a cell magic, but the cell body is empty.\n" ] } ], "source": [ "%%bash" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/data\n" ] } ], "source": [ "%%bash\n", "pwd" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 24738112\n", "-rwxrwxr-x 1 1000 1000 2852947472 Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3126996263 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2843320527 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3114876304 Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2960438946 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2995066419 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2092190052 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1842836662 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1672061431 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1831019208 Sep 7 21:30 
170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 3799 Sep 7 22:12 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run [minimap2](https://github.com/lh3/minimap2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Minimap2 is a fast sequence aligner that can be used with PacBio data. \n", "\n", "#### Using as part of pipeline: minimap2/miniasm/racon" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/local/bioinformatics/minimap2-2.1.1_x64-linux/minimap2\n" ] } ], "source": [ "%%bash\n", "which minimap2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run minimap2 with the ```-x ava-pb``` option. This is listed in the minimap2 manual as a preset for \"PacBio all-vs-all overlap mapping.\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::44.362*1.37] collected minimizers\n", "[M::mm_idx_gen::53.324*1.64] sorted minimizers\n", "[M::main::53.324*1.64] loaded/built the index for 220331 target sequence(s)\n", "[M::mm_mapopt_update::59.437*1.57] mid_occ = 150; max_occ = 1012\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 220331\n", "[M::mm_idx_stat::61.424*1.56] distinct minimizers: 143263923 (59.16% are singletons); average occurrences: 2.317; average spacing: 4.270\n", "[M::worker_pipeline::99.245*2.10] mapped 61679 sequences\n", "[M::worker_pipeline::136.871*2.35] mapped 61589 sequences\n", "[M::worker_pipeline::173.658*2.49] mapped 61193 sequences\n", "[M::worker_pipeline::177.803*2.50] mapped 6791 sequences\n", "[M::worker_pipeline::216.438*2.58] mapped 61406 sequences\n", "[M::worker_pipeline::254.175*2.65] mapped 61190 sequences\n", 
"[M::worker_pipeline::285.102*2.69] mapped 50466 sequences\n", "[M::worker_pipeline::323.885*2.72] mapped 61057 sequences\n", "[M::worker_pipeline::360.993*2.75] mapped 60462 sequences\n", "[M::worker_pipeline::398.020*2.78] mapped 60668 sequences\n", "[M::worker_pipeline::401.684*2.78] mapped 6051 sequences\n", "[M::worker_pipeline::440.719*2.79] mapped 74133 sequences\n", "[M::worker_pipeline::478.095*2.81] mapped 72450 sequences\n", "[M::worker_pipeline::513.634*2.83] mapped 69360 sequences\n", "[M::worker_pipeline::552.319*2.84] mapped 72499 sequences\n", "[M::worker_pipeline::589.710*2.85] mapped 73630 sequences\n", "[M::worker_pipeline::626.184*2.86] mapped 71104 sequences\n", "[M::worker_pipeline::666.204*2.86] mapped 76847 sequences\n", "[M::worker_pipeline::705.398*2.87] mapped 76285 sequences\n", "[M::worker_pipeline::708.543*2.87] mapped 6132 sequences\n", "[M::worker_pipeline::748.995*2.88] mapped 77534 sequences\n", "[M::worker_pipeline::782.140*2.88] mapped 64988 sequences\n", "[M::worker_pipeline::823.985*2.89] mapped 78021 sequences\n", "[M::worker_pipeline::850.287*2.89] mapped 51988 sequences\n", "[M::worker_pipeline::891.063*2.89] mapped 78098 sequences\n", "[M::worker_pipeline::922.855*2.90] mapped 63965 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq 170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "[M::main] Real time: 923.248 sec; CPU: 2673.740 sec\n", "\n", "real\t15m23.428s\n", 
"user\t44m21.720s\n", "sys\t0m12.200s\n" ] } ], "source": [ "%%bash\n", "time minimap2 -x ava-pb \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq \\\n", "170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq \\\n", "170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq \\\n", "170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq \\\n", "170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq \\\n", "170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq \\\n", "170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq \\\n", "170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq \\\n", "170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq \\\n", "170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq \\\n", "> 20170905_minimap2_pacibio_oly.paf" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 24738120\n", "-rwxrwxr-x 1 1000 1000 2852947472 Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3126996263 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2843320527 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3114876304 Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2960438946 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2995066419 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2092190052 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1842836662 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1672061431 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1831019208 Sep 7 21:30 
170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 0 Sep 7 22:40 20170905_minimap2_pacibio_oly.paf\n", "-rw-r--r-- 1 root root 10901 Sep 11 17:48 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Well, the output file (```20170905_minimap2_pacibio_oly.paf```) appears to be empty... Let's verify." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "head 20170905_minimap2_pacibio_oly.paf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Not really sure why this is the case. There don't appear to be any error messages. Will try some Googling to see if I can find out anything." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Well, a glance at the manual suggests that this program can only align two sequence files:\n", "\n", ">SYNOPSIS\n", " * Indexing the target sequences (optional):\n", " minimap2 [-x preset] -d target.mmi target.fa\n", " minimap2 [-H] [-k kmer] [-w miniWinSize] [-I batchSize] -d target.mmi target.fa\n", "\n", "> * Long-read alignment with CIGAR:\n", " minimap2 -a [-x preset] target.mmi query.fa > output.sam\n", " minimap2 -c [-H] [-k kmer] [-w miniWinSize] [...] 
target.fa query.fa > output.paf\n", "\n", "> * Long-read overlap without CIGAR:\n", " minimap2 -x ava-ont [-t nThreads] target.fa query.fa > output.paf\n", " \n", "Each of the usage examples only specifies a single target and a single query.\n", "\n", "So, let's test it out by only aligning two sequence files and see if the output file actually contains some data.\n", "\n", "Of course, I guess there's always the possibility that none of the PacBio data actually overlaps with each other (which could explain the empty output file in the initial run that used all of the files)?\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::44.417*1.37] collected minimizers\n", "[M::mm_idx_gen::53.100*1.63] sorted minimizers\n", "[M::main::53.100*1.63] loaded/built the index for 220331 target sequence(s)\n", "[M::mm_mapopt_update::59.320*1.57] mid_occ = 150; max_occ = 1012\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 220331\n", "[M::mm_idx_stat::61.303*1.55] distinct minimizers: 143263923 (59.16% are singletons); average occurrences: 2.317; average spacing: 4.270\n", "[M::worker_pipeline::98.772*2.09] mapped 61679 sequences\n", "[M::worker_pipeline::136.002*2.35] mapped 61589 sequences\n", "[M::worker_pipeline::173.195*2.49] mapped 61193 sequences\n", "[M::worker_pipeline::177.334*2.50] mapped 6791 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "[M::main] Real time: 177.408 sec; CPU: 443.432 sec\n", "\n", "real\t2m57.995s\n", "user\t7m16.116s\n", "sys\t0m7.900s\n" ] } ], "source": [ "%%bash\n", "time minimap2 -x ava-pb \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq \\\n", "170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq \\\n", "> 
20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 24738124\n", "-rwxrwxr-x 1 1000 1000 2852947472 Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3126996263 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2843320527 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3114876304 Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2960438946 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2995066419 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2092190052 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1842836662 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1672061431 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1831019208 Sep 7 21:30 170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 0 Sep 7 22:40 20170905_minimap2_pacibio_oly.paf\n", "-rw-r--r-- 1 root root 16091 Sep 11 18:08 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-r--r-- 1 root root 0 Sep 11 18:03 20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, this output file (20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf) is also empty. I'll just post an issue to the [minimap2 GitHub page](https://github.com/lh3/minimap2) and see if I get any help there." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Looking at the same manual entry, I just realized something else - the manual uses FASTA files as the example query/target files! Could this really be the issue? Let's find out..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Copied and gunzipped two fasta files from the PacBio data folders (nightingales/O_lurida/20170323_pacbio)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-rw-r-- 1 1000 1000 1463327832 Sep 11 18:51 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "-rw-rw-r-- 1 1000 1000 1601947458 Sep 11 18:49 170228_PCB-CC_AL_20kb_P6v2_C01_1.fasta\n" ] } ], "source": [ "%%bash\n", "ls -l *.fasta" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mv 170228_PCB-CC_AL_20kb_P6v2_C01_1.fasta 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-rw-r-- 1 1000 1000 1463327832 Sep 11 18:51 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "-rw-rw-r-- 1 1000 1000 1601947458 Sep 11 18:49 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta\n" ] } ], "source": [ "%%bash\n", "ls -l *.fasta" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::44.207*1.36] collected minimizers\n", "[M::mm_idx_gen::52.985*1.63] sorted minimizers\n", "[M::main::52.985*1.63] loaded/built the index for 220331 target sequence(s)\n", "[M::mm_mapopt_update::59.202*1.57] mid_occ = 150; max_occ = 1012\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 220331\n", "[M::mm_idx_stat::61.148*1.55] distinct minimizers: 143263923 (59.16% are singletons); average 
occurrences: 2.317; average spacing: 4.270\n", "[M::worker_pipeline::99.409*2.10] mapped 61679 sequences\n", "[M::worker_pipeline::136.320*2.35] mapped 61589 sequences\n", "[M::worker_pipeline::172.797*2.49] mapped 61193 sequences\n", "[M::worker_pipeline::176.903*2.50] mapped 6791 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta\n", "[M::main] Real time: 176.953 sec; CPU: 442.168 sec\n", "\n", "real\t2m57.459s\n", "user\t7m15.560s\n", "sys\t0m7.112s\n" ] } ], "source": [ "%%bash\n", "time minimap2 -x ava-pb \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta \\\n", "170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta \\\n", "> 20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 27731572\n", "-rw-rw-r-- 1 1000 1000 1463327832 Sep 11 18:51 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 2852947472 Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rw-rw-r-- 1 1000 1000 1601947458 Sep 11 18:49 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 3126996263 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2843320527 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3114876304 Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2960438946 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2995066419 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2092190052 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", 
"-rwxrwxr-x 1 1000 1000 1842836662 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1672061431 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1831019208 Sep 7 21:30 170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 0 Sep 7 22:40 20170905_minimap2_pacibio_oly.paf\n", "-rw-r--r-- 1 root root 20221 Sep 11 18:54 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-r--r-- 1 root root 0 Sep 11 18:54 20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, that didn't do anything. Like a doofus, I should've thought of the next step as my initial test!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Align a file against itself to see if this thing is actually working! Duh! \n", "\n", "(I'm also bumping this up to 12 - added thread argument ```-t 12``` to use 12 computing threads)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::44.598*1.36] collected minimizers\n", "[M::mm_idx_gen::47.690*2.02] sorted minimizers\n", "[M::main::47.690*2.02] loaded/built the index for 220331 target sequence(s)\n", "[M::mm_mapopt_update::53.712*1.90] mid_occ = 150; max_occ = 1012\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 220331\n", "[M::mm_idx_stat::55.658*1.87] distinct minimizers: 143263923 (59.16% are singletons); average occurrences: 2.317; average spacing: 4.270\n", "[M::worker_pipeline::79.382*4.78] mapped 78070 sequences\n", "[M::worker_pipeline::103.565*6.50] mapped 77485 sequences\n", "[M::worker_pipeline::118.661*7.11] mapped 64776 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb -t 12 
170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "[M::main] Real time: 119.067 sec; CPU: 844.484 sec\n", "\n", "real\t1m59.437s\n", "user\t13m53.832s\n", "sys\t0m10.868s\n" ] } ], "source": [ "%%bash\n", "time minimap2 -x ava-pb -t 12 \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta \\\n", "> 20170911_minimap2_pacibio_oly_170210.paf" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 29050584\n", "-rw-rw-r-- 1 1000 1000 1463327832 Sep 11 18:51 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 2852947472 Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rw-rw-r-- 1 1000 1000 1601947458 Sep 11 18:49 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 3126996263 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2843320527 Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3114876304 Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2960438946 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2995066419 Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2092190052 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1842836662 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1672061431 Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1831019208 Sep 7 21:30 170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 0 Sep 7 22:40 
20170905_minimap2_pacibio_oly.paf\n", "-rw-r--r-- 1 root root 25434 Sep 11 19:04 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-r--r-- 1 root root 1350653569 Sep 11 19:04 20170911_minimap2_pacibio_oly_170210.paf\n", "-rw-r--r-- 1 root root 0 Sep 11 18:54 20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -l" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 root root 1.3G Sep 11 19:04 20170911_minimap2_pacibio_oly_170210.paf\n" ] } ], "source": [ "%%bash\n", "ls -lh 20170911_minimap2_pacibio_oly_170210.paf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hey, it worked! For fun, I'm just going to test with FASTQ files and see what happens..." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::44.018*1.37] collected minimizers\n", "[M::mm_idx_gen::47.074*2.04] sorted minimizers\n", "[M::main::47.074*2.04] loaded/built the index for 220331 target sequence(s)\n", "[M::mm_mapopt_update::53.133*1.92] mid_occ = 150; max_occ = 1012\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 220331\n", "[M::mm_idx_stat::55.089*1.89] distinct minimizers: 143263923 (59.16% are singletons); average occurrences: 2.317; average spacing: 4.270\n", "[M::worker_pipeline::79.106*4.84] mapped 78070 sequences\n", "[M::worker_pipeline::103.637*6.56] mapped 77485 sequences\n", "[M::worker_pipeline::118.680*7.17] mapped 64776 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb -t 12 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "[M::main] Real time: 118.727 sec; CPU: 851.000 sec\n", "\n", "real\t1m59.282s\n", "user\t14m0.536s\n", "sys\t0m11.016s\n" ] } ], "source": 
[ "%%bash\n", "time minimap2 -x ava-pb -t 12 \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq \\\n", "170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq \\\n", "> 20170911_minimap2_pacbio_oly_170210_fq.paf" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 root root 1.3G Sep 11 19:11 20170911_minimap2_pacbio_oly_170210_fq.paf\n" ] } ], "source": [ "%%bash\n", "ls -lh 20170911_minimap2_pacbio_oly_170210_fq.paf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, this worked, too! So, this leads me to believe I need to run a query against itself in order for us to get the data we need to proceed to the next step in this pipeline (miniasm). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Concatenate all FASTQ files into a single file" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t1m58.624s\n", "user\t0m0.116s\n", "sys\t0m20.216s\n" ] } ], "source": [ "%%bash\n", "time for i in *.fastq\n", "do cat \"$i\" >> 201709011_oly_pacbio_cat.fastq\n", "done" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 53G\n", "-rw-rw-r-- 1 1000 1000 1.4G Sep 11 18:51 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 2.7G Sep 7 21:26 170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rw-rw-r-- 1 1000 1000 1.5G Sep 11 18:49 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fasta\n", "-rwxrwxr-x 1 1000 1000 3.0G Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2.7G Sep 7 21:27 170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 3.0G Sep 7 21:28 170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq\n", 
"-rwxrwxr-x 1 1000 1000 2.8G Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2.8G Sep 7 21:28 170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 2.0G Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1.8G Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1.6G Sep 7 21:29 170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq\n", "-rwxrwxr-x 1 1000 1000 1.8G Sep 7 21:30 170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq\n", "-rw-r--r-- 1 root root 24G Sep 11 19:31 201709011_oly_pacbio_cat.fastq\n", "-rw-r--r-- 1 root root 0 Sep 7 22:40 20170905_minimap2_pacibio_oly.paf\n", "-rw-r--r-- 1 root root 31K Sep 11 19:32 20170907_docker_pacbio_oly_minimap2.ipynb\n", "-rw-r--r-- 1 root root 1.3G Sep 11 19:11 20170911_minimap2_pacbio_oly_170210_fq.paf\n", "-rw-r--r-- 1 root root 1.3G Sep 11 19:04 20170911_minimap2_pacibio_oly_170210.paf\n", "-rw-r--r-- 1 root root 0 Sep 11 18:54 20170911_minimap2_pacibio_oly_170210_vs_170228C01.paf\n", "-rw-rw-r-- 1 1000 1000 902 Sep 7 21:30 md5sums.txt\n" ] } ], "source": [ "%%bash\n", "ls -lh" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mv 201709011_oly_pacbio_cat.fastq 20170911_oly_pacbio_cat.fastq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Map concatenated PacBio data against itself" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[M::mm_idx_gen::130.358*1.51] collected minimizers\n", "[M::mm_idx_gen::136.461*2.38] sorted minimizers\n", "[M::main::136.461*2.38] loaded/built the index for 537516 target sequence(s)\n", "[M::mm_mapopt_update::151.239*2.25] mid_occ = 216; max_occ = 1259\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 537516\n", 
"[M::mm_idx_stat::155.085*2.22] distinct minimizers: 263663554 (42.00% are singletons); average occurrences: 3.579; average spacing: 4.239\n", "[M::worker_pipeline::188.540*5.77] mapped 78070 sequences\n", "[M::worker_pipeline::219.029*8.19] mapped 77485 sequences\n", "[M::worker_pipeline::242.542*9.64] mapped 75465 sequences\n", "[M::worker_pipeline::259.888*10.54] mapped 61141 sequences\n", "[M::worker_pipeline::280.680*11.47] mapped 61571 sequences\n", "[M::worker_pipeline::299.354*12.19] mapped 61587 sequences\n", "[M::worker_pipeline::316.902*12.80] mapped 61048 sequences\n", "[M::worker_pipeline::335.060*13.35] mapped 61149 sequences\n", "[M::worker_pipeline::350.760*13.78] mapped 60782 sequences\n", "[M::worker_pipeline::365.411*14.16] mapped 60930 sequences\n", "[M::worker_pipeline::380.131*14.50] mapped 60739 sequences\n", "[M::worker_pipeline::394.948*14.82] mapped 62543 sequences\n", "[M::worker_pipeline::410.157*15.13] mapped 73661 sequences\n", "[M::worker_pipeline::425.195*15.41] mapped 72563 sequences\n", "[M::worker_pipeline::440.318*15.67] mapped 73439 sequences\n", "[M::worker_pipeline::455.319*15.91] mapped 72620 sequences\n", "[M::worker_pipeline::470.257*16.14] mapped 73624 sequences\n", "[M::worker_pipeline::485.364*16.35] mapped 73660 sequences\n", "[M::worker_pipeline::501.031*16.56] mapped 76838 sequences\n", "[M::worker_pipeline::516.778*16.76] mapped 76351 sequences\n", "[M::worker_pipeline::532.511*16.94] mapped 77711 sequences\n", "[M::worker_pipeline::548.370*17.12] mapped 78261 sequences\n", "[M::worker_pipeline::564.239*17.28] mapped 78012 sequences\n", "[M::worker_pipeline::579.895*17.44] mapped 78098 sequences\n", "[M::worker_pipeline::595.449*17.58] mapped 78393 sequences\n", "[M::worker_pipeline::598.265*17.61] mapped 14176 sequences\n", "[M::mm_idx_gen::723.834*14.81] collected minimizers\n", "[M::mm_idx_gen::729.497*14.86] sorted minimizers\n", "[M::main::729.497*14.86] loaded/built the index for 537277 target sequence(s)\n", 
"[M::mm_mapopt_update::750.838*14.47] mid_occ = 193; max_occ = 1097\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 537277\n", "[M::mm_idx_stat::754.770*14.40] distinct minimizers: 269906881 (41.64% are singletons); average occurrences: 3.508; average spacing: 4.225\n", "[M::worker_pipeline::793.777*14.55] mapped 78070 sequences\n", "[M::worker_pipeline::819.415*14.82] mapped 77485 sequences\n", "[M::worker_pipeline::843.192*15.05] mapped 75465 sequences\n", "[M::worker_pipeline::860.856*15.22] mapped 61141 sequences\n", "[M::worker_pipeline::881.694*15.41] mapped 61571 sequences\n", "[M::worker_pipeline::902.016*15.58] mapped 61587 sequences\n", "[M::worker_pipeline::924.371*15.76] mapped 61048 sequences\n", "[M::worker_pipeline::946.292*15.93] mapped 61149 sequences\n", "[M::worker_pipeline::967.784*16.09] mapped 60782 sequences\n", "[M::worker_pipeline::988.697*16.24] mapped 60930 sequences\n", "[M::worker_pipeline::1010.371*16.39] mapped 60739 sequences\n", "[M::worker_pipeline::1030.331*16.52] mapped 62543 sequences\n", "[M::worker_pipeline::1049.598*16.64] mapped 73661 sequences\n", "[M::worker_pipeline::1068.946*16.76] mapped 72563 sequences\n", "[M::worker_pipeline::1086.343*16.86] mapped 73439 sequences\n", "[M::worker_pipeline::1101.711*16.94] mapped 72620 sequences\n", "[M::worker_pipeline::1116.774*17.03] mapped 73624 sequences\n", "[M::worker_pipeline::1130.965*17.10] mapped 73660 sequences\n", "[M::worker_pipeline::1144.947*17.17] mapped 76838 sequences\n", "[M::worker_pipeline::1159.121*17.24] mapped 76351 sequences\n", "[M::worker_pipeline::1173.292*17.32] mapped 77711 sequences\n", "[M::worker_pipeline::1187.516*17.38] mapped 78261 sequences\n", "[M::worker_pipeline::1201.661*17.45] mapped 78012 sequences\n", "[M::worker_pipeline::1215.707*17.52] mapped 78098 sequences\n", "[M::worker_pipeline::1229.665*17.58] mapped 78393 sequences\n", "[M::worker_pipeline::1232.203*17.59] mapped 14176 sequences\n", "[M::mm_idx_gen::1379.041*15.87] 
collected minimizers\n", "[M::mm_idx_gen::1384.750*15.89] sorted minimizers\n", "[M::main::1384.750*15.89] loaded/built the index for 612555 target sequence(s)\n", "[M::mm_mapopt_update::1400.855*15.72] mid_occ = 283; max_occ = 1800\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 612555\n", "[M::mm_idx_stat::1404.738*15.68] distinct minimizers: 251016217 (43.01% are singletons); average occurrences: 3.728; average spacing: 4.274\n", "[M::worker_pipeline::1453.808*15.78] mapped 78070 sequences\n", "[M::worker_pipeline::1487.401*15.94] mapped 77485 sequences\n", "[M::worker_pipeline::1513.014*15.97] mapped 75465 sequences\n", "[M::worker_pipeline::1562.314*15.88] mapped 61141 sequences\n", "[M::worker_pipeline::1586.832*15.99] mapped 61571 sequences\n", "[M::worker_pipeline::1610.594*16.09] mapped 61587 sequences\n", "[M::worker_pipeline::1637.485*16.21] mapped 61048 sequences\n", "[M::worker_pipeline::1662.537*16.31] mapped 61149 sequences\n", "[M::worker_pipeline::1686.189*16.37] mapped 60782 sequences\n", "[M::worker_pipeline::1716.947*16.48] mapped 60930 sequences\n", "[M::worker_pipeline::1739.231*16.54] mapped 60739 sequences\n", "[M::worker_pipeline::1767.526*16.65] mapped 62543 sequences\n", "[M::worker_pipeline::1795.989*16.75] mapped 73661 sequences\n", "[M::worker_pipeline::1821.595*16.84] mapped 72563 sequences\n", "[M::worker_pipeline::1846.132*16.88] mapped 73439 sequences\n", "[M::worker_pipeline::1881.615*16.95] mapped 72620 sequences\n", "[M::worker_pipeline::1909.401*17.04] mapped 73624 sequences\n", "[M::worker_pipeline::1936.466*17.13] mapped 73660 sequences\n", "[M::worker_pipeline::1969.038*17.22] mapped 76838 sequences\n", "[M::worker_pipeline::1998.795*17.31] mapped 76351 sequences\n", "[M::worker_pipeline::2023.857*17.38] mapped 77711 sequences\n", "[M::worker_pipeline::2048.376*17.45] mapped 78261 sequences\n", "[M::worker_pipeline::2070.530*17.51] mapped 78012 sequences\n", "[M::worker_pipeline::2090.002*17.56] mapped 78098 
sequences\n", "[M::worker_pipeline::2107.277*17.61] mapped 78393 sequences\n", "[M::worker_pipeline::2110.051*17.61] mapped 14176 sequences\n", "[M::mm_idx_gen::2133.572*17.43] collected minimizers\n", "[M::mm_idx_gen::2134.303*17.43] sorted minimizers\n", "[M::main::2134.303*17.43] loaded/built the index for 92569 target sequence(s)\n", "[M::mm_mapopt_update::2137.287*17.41] mid_occ = 96; max_occ = 702\n", "[M::mm_idx_stat] kmer size: 19; skip: 5; is_HPC: 1; #seq: 92569\n", "[M::mm_idx_stat::2138.223*17.40] distinct minimizers: 79280749 (71.19% are singletons); average occurrences: 1.736; average spacing: 4.295\n", "[M::worker_pipeline::2150.329*17.42] mapped 78070 sequences\n", "[M::worker_pipeline::2158.745*17.42] mapped 77485 sequences\n", "[M::worker_pipeline::2169.431*17.43] mapped 75465 sequences\n", "[M::worker_pipeline::2175.999*17.44] mapped 61141 sequences\n", "[M::worker_pipeline::2201.521*17.31] mapped 61571 sequences\n", "[M::worker_pipeline::2210.181*17.31] mapped 61587 sequences\n", "[M::worker_pipeline::2219.166*17.32] mapped 61048 sequences\n", "[M::worker_pipeline::2230.005*17.32] mapped 61149 sequences\n", "[M::worker_pipeline::2236.932*17.34] mapped 60782 sequences\n", "[M::worker_pipeline::2243.696*17.36] mapped 60930 sequences\n", "[M::worker_pipeline::2250.382*17.38] mapped 60739 sequences\n", "[M::worker_pipeline::2257.541*17.39] mapped 62543 sequences\n", "[M::worker_pipeline::2265.596*17.42] mapped 73661 sequences\n", "[M::worker_pipeline::2272.442*17.43] mapped 72563 sequences\n", "[M::worker_pipeline::2279.949*17.45] mapped 73439 sequences\n", "[M::worker_pipeline::2286.867*17.47] mapped 72620 sequences\n", "[M::worker_pipeline::2294.249*17.49] mapped 73624 sequences\n", "[M::worker_pipeline::2301.783*17.51] mapped 73660 sequences\n", "[M::worker_pipeline::2311.212*17.53] mapped 76838 sequences\n", "[M::worker_pipeline::2320.151*17.56] mapped 76351 sequences\n", "[M::worker_pipeline::2329.772*17.58] mapped 77711 sequences\n", 
"[M::worker_pipeline::2338.490*17.60] mapped 78261 sequences\n", "[M::worker_pipeline::2347.768*17.62] mapped 78012 sequences\n", "[M::worker_pipeline::2354.384*17.64] mapped 78098 sequences\n", "[M::worker_pipeline::2364.010*17.66] mapped 78393 sequences\n", "[M::worker_pipeline::2364.329*17.66] mapped 14176 sequences\n", "[M::main] Version: 2.1.1-r341\n", "[M::main] CMD: minimap2 -x ava-pb -t 23 20170911_oly_pacbio_cat.fastq 20170911_oly_pacbio_cat.fastq\n", "[M::main] Real time: 2364.639 sec; CPU: 41757.964 sec\n", "\n", "real\t39m25.094s\n", "user\t690m58.980s\n", "sys\t4m59.240s\n" ] } ], "source": [ "%%bash\n", "time minimap2 -x ava-pb -t 23 \\\n", "20170911_oly_pacbio_cat.fastq \\\n", "20170911_oly_pacbio_cat.fastq \\\n", "> 20170911_minimap2_pacbio_oly.paf" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 root root 40G Sep 11 22:02 20170911_minimap2_pacbio_oly.paf\n" ] } ], "source": [ "%%bash\n", "ls -lh 20170911_minimap2_pacbio_oly.paf" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mon Sep 18 13:57:31 UTC 2017\n" ] } ], "source": [ "%%bash\n", "date" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Well, I think this worked. Will proceed to next step in pipeline (miniasm) for de-novo assembly. 
Here's a link to that notebook: [https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20170907_docker_pacbio_oly_minimap2.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20170907_docker_pacbio_oly_minimap2.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }