--- title: "Geoduck transcriptome" output: html_notebook --- Purpose of this notebook is to develop some graphs of CpGo/e for the Geoduck transcriptome. lets load up some data! ```{r} library(forestplot) data <- read_delim("~/Documents/Roberts Lab/geoduck transcriptome/Geoduck-transcriptome_v3_bigtable_v08.tab.txt", "\t", escape_double = FALSE, trim_ws = TRUE) GOslim_bin <- read_delim("~/Documents/Roberts Lab/geoduck transcriptome/GOslim-bin.txt", "\t", escape_double = FALSE, trim_ws = TRUE) ``` ```{r} dim(data) head(data) colnames(data) dim(GOslim_bin) head(GOslim_bin) ``` Lets weld these two together, at least where they match. ```{r} data_merged <- merge(data, GOslim_bin, by.x = "ContigID", by.y = "Column1", all = FALSE) ``` ```{r} data_merged$GOSlim_bin <- as.factor(data_merged$GOSlim_bin) GO_means[1] <- levels(data_merged$GOSlim_bin) GO_means[2] <- as.data.frame(tapply(data_merged$`CpG_o/e`, data_merged$GOSlim_bin, mean)) GO_means[3] <- tapply(data_merged$`CpG_o/e`, data_merged$GOSlim_bin, sd) GO_means[4] <- GO_means[3]/sqrt(summary(data_merged$GOSlim_bin)) GO_means[5] <- GO_means[2] - GO_means[4] GO_means[6] <- GO_means[2] + GO_means[4] colnames(GO_means) <- c("Gene Types", "CpG O/E Mean", "CpG O/E SD", "CpG O/E SEM", "Lower", "Upper") head(GO_means) ``` Lets do a couple quick spot checks, to make sure my logic for calculating the SEM works (I have the right S and sqrt(N) lined up.) ```{r} summary(data_merged$GOSlim_bin) factor(data_merged$GOSlim_bin)[1] transport_n <- length(data_merged$GOSlim_bin[which(data_merged$GOSlim_bin == factor(data_merged$GOSlim_bin)[1])]) transport_SEM <- sd(data_merged$`CpG_o/e`[which(data_merged$GOSlim_bin == factor(data_merged$GOSlim_bin)[1])]) / sqrt(transport_n) transport_SEM == GO_means[14,4] protein_met_n <- length(data_merged$GOSlim_bin[which(data_merged$GOSlim_bin == factor(data_merged$GOSlim_bin)[2])]) protein_met_SEM <- sd(data_merged$`CpG_o/e`[which(data_merged$GOSlim_bin == factor(data_merged$GOSlim_bin)[2])]) / sqrt(protein_met_n) protein_met_SEM == GO_means[10,4] ``` Good deal, looks like it worked! Or I got really lucky twice. Now to play with the forestplot command. I had some issues with the column gapping, originally it wanted to right justify the graph, leading to a large gap between the labels and the graph. I cheated around this by using the colgap argument with a negative number. This led to some weird spacing issues with the graph label. I'll likely look for a better tool for making the forest plot, but this is a functioning first pass. ```{r} forestplot(labeltext = GO_means$`Gene Types`, mean = GO_means$`CpG O/E Mean`, lower = GO_means$Lower, upper = GO_means$Upper, xticks = c(0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75), boxsize = 0.08, grid = TRUE, lwd.ci = 5, ci.verticies = TRUE, lty.ci = 1, colgap = unit(-140, 'mm'), xlab = " CpG O/E") ``` Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Cmd+Option+I*.