This workflow outlines analysis of data from mapping bisulfite treated DNA (BS-seq) to the oyster genome. Initially sequencing reads, in this case from oyster sperm, are mapped to the genome using BS-MAP. ./bsmap -a /Volumes/betty/filtered_174gm_A_NoIndex_L006_R1.fastq.gz -b /Volumes/betty/filtered_174gm_A_NoIndex_L006_R2.fastq.gz -d /Volumes/betty/oyster_v9.fa -o /Volumes/betty/BiGO_Betty_plain.sam -p 4 _output_: Total number of aligned reads: pairs: 90102067 (53%) single a: 17033002 (9.9%) single b: 15975991 (9.3%) Resulting SAM file then subjected to methratio python script python methratio.py -d /Volumes/betty/oyster_v9.fa -u -z -g -o /Volumes/betty/BiGO_betty_plain_methratio_v1.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/betty/BiGO_Betty_plain.sam _output_: total 167774088 valid mappings, 127192776 covered cytosines, average coverage: 13.23 fold. This file was uploaded to SQLShare using python script python singleupload.py sr320@washington.edu c8APIKEYAPIKEYAPIKEY5c15c /Volumes/web/cnidarian/BiGO_betty_plain_methratio_v1.txt
Screenshot%205/24/13%2010:34%20AM directlink:
### Interested in how many CpG loci have methylation information? The following query will get all lines where CG are in the 3rd and 4th position. SELECT * FROM [sr320@washington.edu].[BiGO_betty_plain_methratio_v1.txt]​ betty where context like '__CG_'​​​​​​​​​​​​​ --_=single character wildcard​​​​​​​​​​​​​​​​​​​​​​​​​​​​​​​

### To count the number of instances in this query
### How to convert methratio file to gff format In this case will make GFF only of CpGs, where there is at least 10x coverage. SELECT chr as seqname, 'methratio' as source, 'CpG' as feature, pos as start, pos + 1 as [end], ratio as score, strand, '.' as frame, '.' as attribute FROM [sr320@washington.edu]. [BiGO_betty_plain_methratio_v1.txt] betty where context like '__CG_' --_=single character wildcard and CT_Count > 9 Screenshot%205/25/13%207:47%20AM
Screenshot%205/25/13%207:52%20AM