--- title: "MethylKit Notebook (Steven's Oly Samples 1-8)" output: html_notebook --- This is a combination of testing Methylkit as well as R Notebook, there may be some rough spots. We begin by loading some required packages, Rcpp may not be required, but methylkit documentation suggests it, so it got loaded. Also sets our working directory, which is both where our output will be directed towards, as well as the location of our data files. ```{r} invisible(require("Rcpp")) invisible(source("http://bioconductor.org/biocLite.R")) invisible(biocLite(c("GenomeRanges", "IRanges"))) invisible(require("devtools")) invisible(install_github("al2na/methylKit", build_vignettes = FALSE, repos = BiocInstaller::biocinstallRepos(), dependencies = TRUE)) invisible(setwd("/home/sean/Documents/Bismark/BismarkData/BismarkOutput/cov_working")) ``` Next we set up some variables for methylkit. olyLables are the individual sample names, file_labels removes the .fastq.gz and appends the appropriate bismark generated text for coverage files. Sample IDs denote the 8 different samples, and treatment is a factor for treatment. ```{r} olyLabels <- c("1_ATCACG_L001_R1_001.fastq.gz", "2_CGATGT_L001_R1_001.fastq.gz", "3_TTAGGC_L001_R1_001.fastq.gz", "4_TGACCA_L001_R1_001.fastq.gz", "5_ACAGTG_L001_R1_001.fastq.gz", "6_GCCAAT_L001_R1_001.fastq.gz", "7_CAGATC_L001_R1_001.fastq.gz", "8_ACTTGA_L001_R1_001.fastq.gz") file_labels <- as.list(paste0(gsub(".fastq.gz", "", olyLabels), "_bismark_bt2.bismark.cov")) sample_id <- list("1", "2", "3", "4", "5", "6", "7", "8") treatment <- c(0,0,0,0,1,1,1,1) ``` Next blob runs initial Methylkit data importation step, creating a methylrawlistDB object. The mincov argument sets minimum coverage level required, 1 includes all data. Next filterByCoverage filters based on high and low read cutoffs, high to eliminate PCR bias, and low to increase statistical efficacy. Picked 3 based on talking to Hollie. The dim commands show how many samples are removed via filtering. ```{r} testObj.1 <- methRead( file_labels, sample_id, assembly = "test", mincov = 1, pipeline = 'bismarkCoverage', context = "CpG", treatment = treatment) dim(testObj.1[[1]]) filtered.testObj.1.3 <- filterByCoverage(testObj.1, lo.count = 3, low.perc = NULL, hi.count = NULL, high.perc = 95) dim(filtered.testObj.1.3[[1]]) ``` Plots a histogram of coverage by base for each sample. Uncomment lines to get output ```{r} getMethylationStats(filtered.testObj.1.3[[1]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[2]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[3]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[4]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[5]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[6]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[7]], plot = T, both.strands = F) getMethylationStats(filtered.testObj.1.3[[8]], plot = T, both.strands = F) ``` Plots coverage histogram ```{r} getCoverageStats(filtered.testObj.1.3[[1]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[2]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[3]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[4]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[5]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[6]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[7]], plot = T, both.strands = F) getCoverageStats(filtered.testObj.1.3[[8]], plot = T, both.strands = F) ``` Merge samples into single object. ```{r} meth <- unite(filtered.testObj.1.3, destrand = F) dim(meth) getCorrelation(meth, plot = F) ``` Plots clustergram and PCA, uncomment lines to output to file in working directory ```{r} par(mfrow = c(1,1)) #pdf(file = "clustergram.pdf") CpGDendro <- clusterSamples(meth, dist = "correlation", method = "ward", plot = T) #dev.off() #pdf(file = "PCA.pdf") PCA <- PCASamples(meth, scale = T, center = T) #dev.off() #pdf(filename = "screeplot.pdf") PCASamples(meth, screeplot = T) #dev.off() ```