# Annotation of *Acropora hyacinthus* transcriptome 

This workflow details the annotation of an *Acropora hyacinthus* [transcriptome](http://palumbi.stanford.edu/data/33496_Ahyacinthus_CoralContigs.fasta.zip).

This notebook requires the following:
- [NCBI Blast: 2.2.3](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/)
- [SQLShare](https://sqlshare.escience.washington.edu/accounts/login/?next=/sqlshare/%3F__hash__%3D)

The annotation also requires a UniProt/Swiss-Prot BLAST database. Instructions for setting up this database can be found [here](https://github.com/jldimond/Coral-CpG-ratio-MS/blob/master/README.md).

The original analysis was carried out on Mac OS X v10.10.3 running Python 2.7.9 and IPython 3.1.0.

This workflow is structured so that anyone can reproduce the analysis by downloading the repository and executing the notebook cells in order.

In [2]:
cd ../data/Ahya

/Users/jd/Documents/Projects/Coral-CpG-ratio-MS/data/Ahya


In [28]:
#Obtain FASTA file
!curl -O http://palumbi.stanford.edu/data/33496_Ahyacinthus_CoralContigs.fasta.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4601k  100 4601k    0     0  3749k      0  0:00:01  0:00:01 --:--:-- 3750k


In [29]:
!unzip -o 33496_Ahyacinthus_CoralContigs.fasta.zip 

Archive:  33496_Ahyacinthus_CoralContigs.fasta.zip
  inflating: 33496_Ahyacinthus_CoralContigs.fasta  


In [4]:
!head 33496_Ahyacinthus_CoralContigs.fasta

>contig27
CAAAATTCCAGCACTCCGTTTTGCATGGTAAACTTGTCTTAGTAGGACACTGTGGAAGATGTACAGCGCAAGACATCACAGTTGCAAGCGCCGACGAACAGCTGTTAAACTCTCCTCTCATATTCTCGAACAAACCAAATATTTCTTCCTCTCTGTTGTTGCTAACCTTTGAATATATGAAGCTGGCATTAGCACAGGACTCAAAGTTTCCGCCGAGCAGTTT
>contig88
TGTCCTGTGTTAGAGGCCAGCTTCAACCTCTTGCTTTCCCTGTCAGCCGAGTTTTCTTCTCCTTCAATAAGCTGGGATTTTCGATCTCTACTCAATGTTTCCATCAAACACCTGAGAGTTAAATCTGCCAGATAACGAAGAAATCCTCTTGCTAGAATACTTTTCAAAAGCCCTTCTTCATACATTGATCTTATCCCATTGCAAATTGCGTTGG
>contig100
TTCAGAACTATTCTCCGCCACACAGGGATAAATGGTCTTCACTTTCTTAGGTGTTTTTGTCTGTGGTGATGGTGTGGGTTCCTCCTGTGGGGGAGGTTCTCTTGGGAGTGGGGGTGGTCGTGTTCCCCATGACTGTATCCCCCTGTTTAGGCTCCCACCCTGACTGCTGACACGGCGTATTGCATGGGCAGAGCCCTTGTCATTCCTGCCCTTGTCATTC
>contig211
TGGGGCGATCAGGTCACCAACGAAATTGTCCGACAAGTCATGGAGATGAAAGGGTTTTATAGTCTCGACAAACCCGGTGAATTTACCAGCATAGTGGATCTTCAGTTTGTGGCAGCTATGATCCAGCCCGGTGGGGGCCGCAATGACATCCCAAGTCGTCTGAAAAGGCAGTTTACCATCCTCAACTGCACGCTTCCCGCAAATGCGTCCATCGATAAGATCTTTAGCTCTATCGGCTGTGGTTACTTCAACACTGAGCGCGGTTTCC
>contig405
ATTTTTAAT

In [5]:
#Count number of seqs
!fgrep -c ">" 33496_Ahyacinthus_CoralContigs.fasta

33496
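The same count can be reproduced without `fgrep`. The sketch below assumes Python 3 and counts only lines that begin with `>`, which is slightly stricter than `fgrep -c ">"` (that counts any line containing `>`). The function name and the two toy sequences are illustrative and are not taken from the real file.

```python
import io

def count_fasta_records(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))

# Toy in-memory FASTA (hypothetical sequences) standing in for the real file
example = io.StringIO(">contigA\nACGT\n>contigB\nGGCC\nTTAA\n")
print(count_fasta_records(example))  # prints 2
```

To run it on the downloaded file, open `33496_Ahyacinthus_CoralContigs.fasta` with `open(...)` and pass the file handle to `count_fasta_records`.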


### Blastx query

In [None]:
#Flags used in the command below:
#  -query            FASTA file
#  -db               path to your Swiss-Prot BLAST database (replace with your own)
#  -max_target_seqs  maximum number of target sequences = 1
#  -max_hsps         maximum number of high-scoring pairs = 1
#  -outfmt 6         output format = tabular
#  -evalue           E-value cutoff = 10^-5
#  -num_threads      number of threads = 8
#  -out / 2>         results and standard error go to the analyses directory
!blastx \
-query 33496_Ahyacinthus_CoralContigs.fasta \
-db ~blast/db/uniprot_sprot \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt 6 \
-evalue 1E-05 \
-num_threads 8 \
-out ../../analyses/Ahya/Ahya_blastx_uniprot.tab \
2> ../../analyses/Ahya/Ahya_blastx_uniprot.error
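For readers who prefer to launch the search from Python rather than a `!` shell cell, the following is a minimal sketch that assembles the same flags as an argument list. It assumes Python 3 and that `blastx` is on the `PATH`. The helper names `build_blastx_command` and `run_blastx`, and the database path `path/to/uniprot_sprot`, are placeholders and not part of the original workflow.

```python
import subprocess

def build_blastx_command(query, db, out, threads=8, evalue="1E-05"):
    """Return the blastx call as a list of arguments (one hit, one HSP, tabular)."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-max_target_seqs", "1",
        "-max_hsps", "1",
        "-outfmt", "6",
        "-evalue", evalue,
        "-num_threads", str(threads),
        "-out", out,
    ]

def run_blastx(cmd, error_log):
    """Run the command and send standard error to its own file (not called here)."""
    with open(error_log, "w") as err:
        subprocess.run(cmd, stderr=err, check=True)

cmd = build_blastx_command(
    "33496_Ahyacinthus_CoralContigs.fasta",
    "path/to/uniprot_sprot",  # placeholder: point this at your own database
    "../../analyses/Ahya/Ahya_blastx_uniprot.tab",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues. Calling `run_blastx(cmd, "../../analyses/Ahya/Ahya_blastx_uniprot.error")` would start the actual search.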

In [3]:
cd ../../analyses/Ahya

/Users/jd/Documents/Projects/Coral-CpG-ratio-MS/analyses/Ahya


In [33]:
#Check the first 10 lines of the output file.
!head -10 Ahya_blastx_uniprot.tab

head: ../../analyses/Ahya/Ahya_blastx_uniprot.tab: No such file or directory


In [4]:
!wc Ahya_blastx_uniprot.tab

wc: Ahya_blastx_uniprot.tab: open: No such file or directory


In [12]:
#Replace pipes with tabs to produce a tab-delimited file
!tr '|' "\t" < Ahya_blastx_uniprot.tab > Ahya_blastx_uniprot_sql.tab
!head -1 Ahya_blastx_uniprot.tab
!echo SQLShare ready version has Pipes converted to Tabs ....
!head -1 Ahya_blastx_uniprot_sql.tab

contig211	sp|Q96JB1|DYH8_HUMAN	77.53	89	20	0	1	267	2533	2621	4e-44	  158
SQLShare ready version has Pipes converted to Tabs ....
contig211	sp	Q96JB1	DYH8_HUMAN	77.53	89	20	0	1	267	2533	2621	4e-44	  158
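The effect of the pipe-to-tab conversion on a single record can be sketched in Python (assuming Python 3). The example splits the tabular line on tabs and then expands the `sp|accession|entry name` subject ID into three fields, using the first hit shown above. The function name `split_blast_line` is illustrative. Unlike `tr`, this sketch also strips the padding around the bit score.

```python
def split_blast_line(line):
    """Split one outfmt 6 line and expand the 'sp|ACC|NAME' subject ID."""
    fields = line.rstrip("\n").split("\t")
    db, accession, entry_name = fields[1].split("|")
    rest = [f.strip() for f in fields[2:]]
    return [fields[0], db, accession, entry_name] + rest

line = ("contig211\tsp|Q96JB1|DYH8_HUMAN\t77.53\t89\t20\t0\t1\t267"
        "\t2533\t2621\t4e-44\t  158")
row = split_blast_line(line)
print(row[0], row[2], row[3])  # prints: contig211 Q96JB1 DYH8_HUMAN
print(len(row))                # prints: 14
```

The 12 tabular columns become 14 once the subject ID is expanded, with the Swiss-Prot accession in the third position.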


# Joining with GOSlim using SQLShare

### First upload dataset
![screen shot1](https://github.com/jldimond/Coral-CpG-ratio-MS/blob/master/images/Screen%20Shot%202015-09-25%20at%2012.01.38%20PM.png?raw=true)

### Then find the dataset, execute query, and download the new dataset
![screen shot](https://github.com/jldimond/Coral-CpG-ratio-MS/blob/master/images/Screen%20Shot%202015-09-25%20at%2012.29.18%20PM.png?raw=true)

## Query (note: use your own SQLShare account in place of jldimond@washington.edu)
`SELECT DISTINCT Column2 AS ContigID, GOSlim_bin
  FROM [jldimond@washington.edu].[Ahya_blastx_uniprot_sql.tab] anno
  LEFT JOIN [sr320@washington.edu].[SPID and GO Numbers] go
  ON anno.Column7 = go.SPID
  LEFT JOIN [sr320@washington.edu].[GO_to_GOslim] slim
  ON go.GOID = slim.GO_id
  WHERE aspect LIKE 'P'`
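The logic of this query (BLAST hit to GO term to GOSlim bin, keeping only biological-process terms) can be tried locally as a rough sketch using Python's built-in `sqlite3` module. Everything in the example is a stand-in: the table names, column names, accessions, and GO IDs are hypothetical toy values, not the SQLShare tables or real annotations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anno (contig TEXT, spid TEXT);
CREATE TABLE spid_go (SPID TEXT, GOID TEXT);
CREATE TABLE go_slim (GO_id TEXT, GOSlim_bin TEXT, aspect TEXT);

-- Hypothetical toy rows: contigC has no GO annotation at all
INSERT INTO anno VALUES ('contigA', 'ACC1'), ('contigB', 'ACC2'), ('contigC', 'ACC3');
INSERT INTO spid_go VALUES ('ACC1', 'GO:x1'), ('ACC1', 'GO:x2'), ('ACC2', 'GO:x3');
INSERT INTO go_slim VALUES
    ('GO:x1', 'protein metabolism', 'P'),
    ('GO:x2', 'protein metabolism', 'P'),
    ('GO:x3', 'nucleus', 'C');
""")

rows = conn.execute("""
    SELECT DISTINCT anno.contig AS ContigID, slim.GOSlim_bin
    FROM anno
    LEFT JOIN spid_go AS g ON anno.spid = g.SPID
    LEFT JOIN go_slim AS slim ON g.GOID = slim.GO_id
    WHERE slim.aspect LIKE 'P'
    ORDER BY ContigID
""").fetchall()

print(rows)  # prints: [('contigA', 'protein metabolism')]
conn.close()
```

Two points are visible in the toy result. `DISTINCT` collapses the two GO terms of `contigA` that fall in the same bin into one row. The `WHERE` filter on `aspect` also drops contigs with no matching GO term, so the left joins behave like inner joins here.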

## Output file downloaded to ./analyses/Ahya: Ahya_GOSlim.csv

In [5]:
!wc Ahya_blastx_uniprot_sql.tab

   11594  162316  899613 Ahya_blastx_uniprot_sql.tab


In [7]:
!python /Users/jd/sqlshare-pythonclient-master/tools/singleupload.py \
-d Ahya_uniprot \
Ahya_blastx_uniprot_sql.tab

#Uploads the tab-delimited BLAST file created above to SQLShare



Traceback (most recent call last):
  File "/Users/jd/sqlshare-pythonclient-master/tools/singleupload.py", line 6, in <module>
    import sqlshare
ImportError: No module named sqlshare


In [16]:
#### Output file downloaded to ./analyses/Ahya: Ahya_GOSlim.csv

In [None]:
#Replace commas with tabs
!tr ',' "\t" < Ahya_GOSlim.csv > Ahya_GOSlim.tab
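A plain `tr` substitution replaces every comma, including any that sit inside a quoted field. As a hedged alternative, the sketch below uses Python's `csv` module (assuming Python 3), which respects CSV quoting. The function name `csv_to_tsv` and the bin name `transport, other` are made up for illustration and do not come from the real output file.

```python
import csv
import io

def csv_to_tsv(src, dst):
    """Copy rows from a CSV handle to a tab-delimited handle."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# Toy input with a quoted field that contains a comma (hypothetical bin name)
src = io.StringIO('ContigID,GOSlim_bin\ncontigA,"transport, other"\n')
dst = io.StringIO()
csv_to_tsv(src, dst)
print(dst.getvalue())
```

For the real files, open `Ahya_GOSlim.csv` for reading and `Ahya_GOSlim.tab` for writing (both with `newline=""`) and pass the two handles to `csv_to_tsv`.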

In [19]:
!head -10 Ahya_GOSlim.tab

ContigID	GOSlim_bin
contig135011_153678_153601	cell organization and biogenesis
contig135011_153678_153601	other biological processes
contig135011_153678_153601	developmental processes
contig69684	protein metabolism
contig113621	protein metabolism
contig97647	protein metabolism
contig199902	protein metabolism
contig78855	other biological processes
contig8505_94477	DNA metabolism
