genefish - Emma's Summer Notebook 2013

August 26, 2013
Secondary stress: proteomics
Based on the Venn diagram (8/19/13), there are 7 proteins that are shared between the two mechanical stress responses that are not shared with the OA response. All of these proteins change expression in the same direction for the two mechanical stress responses (at different pCO2). Two of them were unnanotated and both of these were down-regulated: CGI_1004918 and CGI_10027073. Prohormone-4, kyphoscoliosis peptidase (cytoskeleton), and SAM domain and HD domain-containing protein 1 (immune) were all expressed less after mech stress. Apolipophorin (lipid transport), and 60S ribosomal protein L13 were both expressed more after mech stress.

Editing supplementary file 4.
Averaged NSAF across biological reps for each treatment:
https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20with%20averages%20per%20treatment
Calculated fold change for each treatment comparison:
(this step to be done at a later date, for now will be done in Excel)
Joined with file indicating which proteins are >5-fold differently expressed:
https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20with%20averages%2C%205-fold
Joined with SPID annotations:
https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20averages%2C%205-fold%2C%20SPID
Joined with file indicating which proteins contribute to enrichment in >5-fold differently expressed protein sets: https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%2540washington.edu/NSAF%20with%20averages%252C%205-fold%252C%20SPIDs%252C%20enrichment&q=

Manually added column (in excel) or proteins that have q-value < 0.1.

Made xy plots for comparing expression (similar to 8/23/13). NSAF have been transformed to log(NSAF*10000). Points are color-coded according to fold change.
external image OA%20xy%20col%20by%20fold.jpg

external image OA%20xy%20col%20by%20fold.jpg

August 23, 2013
Secondary stress: proteomics
Made xy plots of the average expression across oysters for treatment comparisons: 2800 vs 400 µatm, 400 vs 400 + mech stress, 2800 vs 2800 + mech stress. The blue line on the plots is the 1:1 line.
external image Ocean%20Acidification%20xy%20all%20proteins.jpg

external image Ocean%20Acidification%20xy%20all%20proteins.jpg

external image 400%20Mech%20stress%20xy%20all%20proteins.jpg

external image 2800%20Mech%20stress%20xy%20all%20proteins.jpg

Made heat maps for each treatment comparison with all proteins (except for each set of treatments removed proteins that had 0 expression across all 8 oysters, on the order of ~160 proteins per dataset). The data were log transformed before making the heat maps. Oysters did not cluster within treatment groups and the heat maps were mostly entirely blue.

August 20, 2013
Secondary stress: proteomics
Used proteomics data of proteins that have at least 8 spec counts across all injections, but did not use the additional cutoff of a minimum of 2 unique peptides per protein. Did NMDS and calculated q-values. NMDS showed no difference between treatments -
https://www.evernote.com/shard/s242/sh/40edb789-22fc-47ca-b86b-49822b68a789/b8b11a712c5453f53bccd8d47b0f700d. Only one q-value was significant - response to mechanical stress at 400 µatm. This is technically an unannotated protein since the e-value = 5e-6, but it might be natterin-4 with SPID Q66S13.

August 19, 2013
Secondary stress: Proteomics
Made a file of proteins that are at least 5-fold different (including proteins that are expressed in only one treatment and not the other). This is 339 proteins across all 3 treatment comparisons. Not all of these proteins were annotated with SPIDs which could bias downstream analyses.
SELECT * FROM [emmats@washington.edu].[5-fold diff expressed proteins.txt]
LEFT JOIN [emmats@washington.edu].[NSAF based on avg SpC with SPIDs]
ON [emmats@washington.edu].[5-fold diff expressed proteins.txt].Protein=[emmats@washington.edu].[NSAF based on avg SpC with SPIDs].[All Proteins]

Removed proteins from dataset that are expressed in only one oyster. This does not represent true differential expression, but rather may reflect an anomalous oyster. This results in 69 proteins expressed higher in response to mechanical stress at 2800 µatm and 53 proteins expressed lower. 64 proteins are elevated in response to ocean acidification and 56 are decreased at least 5-fold. 47 proteins are higher in response to mechanical stress at 400 µatm and 54 are lower.
external image 5-fold%20venn.jpg

Proteins expression values were log-transformed before making heat maps. Euclidean distance was used to cluster rows (proteins) and columns (oysters). For OA response, low and high pCO2 oysters clustered within treatment groups. Below is the heat map for response to ocean acidification. Oysters labeled NSAF.CG2-11 are high pCO2 are NSAF.CG221-230 are low pCO2.
external image OA%205fold%20labeled.jpg

Making a Venn diagram to visualize overlap of proteins involved in >5-fold response to different treatments. Put protein names corresponding to each treatment comparison in new columns.
SELECT [Protein], [Comparison],
(CASE WHEN [Comparison]='OA' THEN [Protein] end) AS [OA5x],
(CASE WHEN [Comparison]='400MechS' THEN [Protein] end) AS [400Mech5x],
(CASE WHEN [Comparison]='2800MechS' THEN [Protein] end) AS [2800Mech5x]
FROM [emmats@washington.edu].[5-fold diff expressed proteins.txt]
GROUP BY [Protein], [Comparison]

Redid enrichment analysis in DAVID v. 6.7. Used 5-fold proteins (n=339 across all treatments) as gene lists and entire gill proteome (n=1616) as background. Below is the enrichment plot of >5-fold OA response proteins.

external image OA%20response%205%20fold%20proteins%20enrichment.jpg

external image OA%20response%205%20fold%20proteins%20enrichment.jpg

August 16, 2013
Secondary stress: Proteomics
Power point containing all xy plots by GO slim term can be found here: http://eagle.fish.washington.edu/oyster/proteomics/go%20slim%20xy%20plots.pptx

Created fold change file in SQLshare.
Replaced all NSAF values = 0 with 1E-20: https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20avg%20SpC%20no%200
Created new columns with average NSAF for each treatment group:
https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20avg%20SpC%20with%20avg%20NSAF
Created new columns with fold change for each treatment comparison:
https://sqlshare.escience.washington.edu/sqlshare#s=query/emmats%40washington.edu/NSAF%20avg%20SpC%20with%20fold%20change
NB: this file is problematic because the 1e-20 will have skewed the average expression values which are used for the fold change. To use need to go back and add 1e-20 after expression is averaged.

Used this file to create a list of proteins that are at least 2-fold differentially expressed between treatment groups. Joined this list with SPIDs and gene descriptions. Used DAVID to find enrichment of functional groups in the >2-fold differentially expressed proteins. There were no groups enriched for mechanical stress response at 400 µatm. Enriched groups for the ocean acidification response were homophilic cell adhesion, transcription, morphogenesis of embryonic epithelium, and regulation of transcription. At 2800 µatm response to mech stress, the enriched groups were polysaccharide metabolic process, transcription, vitamin metabolic process, regulation of catabolic process, regulation of protein catabolic process, neuromuscular process, polysaccharide biosynthetic process.
These results are not too different from the 5-fold differentially expressed proteins (see 8/14/13), but fewer categories are enriched.

Identified enzymes involved in glycogenolysis (and glycogenesis) in sequenced proteins. 11 enzymes in this pathway were identified. See diagram below. Summary: glycogen synthase and glycogenin (involved in glycogen synthesis) are downregulated whereas hexokinase and glycogen synthase kinase are upregulated. however, not all proteins are obviously up- or down-regulated in each pathway, but there may be a trend towards increased glycogen catabolism. The major player is catabolizing glycogen, the debranching enzyme, is down-regulated, but not hugely (only ~2.5x lower at high pCO2). The inhibitor of glycogen synthase (glyc. syn. kinase) is highly upregulated.
external image glycogen%20pathway.jpg

August 15, 2013
Secondary Stress: Proteomics
Redid query from yesterday so that also joined number of unique peptides. From this table, remade the table for the manuscript with numbers of proteins identified for each replicate.
SELECT * FROM [emmats@washington.edu].[proteins that pass cutoffs.txt]
LEFT JOIN [emmats@washington.edu].[NSAF tech reps]
ON [emmats@washington.edu].[proteins that pass cutoffs.txt].[All Proteins]=[emmats@washington.edu].[NSAF tech reps].[All Proteins]
LEFT JOIN [emmats@washington.edu].[Average spec counts with cutoffs]
ON [emmats@washington.edu].[proteins that pass cutoffs.txt].[All Proteins]=[emmats@washington.edu].[Average spec counts with cutoffs].[All Proteins]

Annotated file of NSAF expression values with SPIDs with GO and GO Slim terms.
SELECT distinct * FROM [emmats@washington.edu].[NSAF based on avg SpC with SPIDs]
LEFT JOIN [dhalperi@washington.edu].[SPID_GOnumber.txt]
ON [emmats@washington.edu].[NSAF based on avg SpC with SPIDs].SPID=[dhalperi@washington.edu].[SPID_GOnumber.txt].A0A000

SELECT distinct * FROM [emmats@washington.edu].[NSAF avg SpC with GO]
LEFT JOIN [sr320@washington.edu].[GO_to_GOslim]
ON [emmats@washington.edu].[NSAF avg SpC with GO].[GO:0003824]=[sr320@washington.edu].[GO_to_GOslim].GO_id

Keep only unique entries that are biological processes
SELECT DISTINCT * FROM [emmats@washington.edu].[NSAF avg SpC with GO slim]
WHERE [aspect]='P'

Created file of unique GO Slim associations with proteins

SELECT DISTINCT [All Proteins], [NSAF CG2], [NSAF CG5], [NSAF CG8], [NSAF CG11], [NSAF CG26], [NSAF CG29], [NSAF CG32], [NSAF CG35], [NSAF CG221], [NSAF CG224], [NSAF CG227], [NSAF CG230], [NSAF CG242], [NSAF CG245], [NSAF CG248], [NSAF CG251], [SPID], [evalue], [Gene Name], [GOSlim_bin]

FROM [emmats@washington.edu].[NSAF avg SpC biological processes]

August 14, 2013
Secondary stress: Proteomics
Annotated entire gill proteome (just proteins that made cutoffs of at least 2 unique peptides and at least 8 spec counts) with SPIDs.

SELECT * FROM [emmats@washington.edu].[NSAF avg SpC.csv]

LEFT JOIN [emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt]

ON [emmats@washington.edu].[NSAF avg SpC.csv].[All Proteins]=[emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt].[CGI Protein]

Created list of unique SPIDs corresponding to gill proteome and used this as background in DAVID. Created plots in Revigo of enriched categories of differentially expressed proteins for treatment comparisons (differentially = >5-fold up or down).
Enrichment of 5-fold different OA response proteins:
https://www.evernote.com/shard/s242/sh/151a6b3c-65cb-4308-8c19-77a50dc099a4/61f71f805da2d42e7b5e4406226121e0
Enrichment of 5-fold different mechanical stress response proteins at 400 µatm:
https://www.evernote.com/shard/s242/sh/3938fc75-b118-4f71-a204-c50c03dfaf71/0513e7d89c29505c3670d2f67cebda13
Enrichment of 5-fold different mechanical stress response proteins at 2800 µatm:
https://www.evernote.com/shard/s242/sh/3fc062ba-efc2-4d24-92df-4db038f31429/d74514ba124e1b3c7a928702829fa93d

Joined together list of 1616 proteins for final analysis with technical replicate data to create table of number of proteins identified in each replicate...This is incorrect because some of the proteins in the technical replicate data have <2 unique peptides. Need to fix this discrepancy and redo.

August 13, 2013
Secondary stress: Proteomics
Annotated proteins with >5-fold change between treatment groups with SPIDs in SQLshare.
SELECT * FROM [emmats@washington.edu].[greater than 5 fold change.txt]
LEFT JOIN [emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt]
ON [emmats@washington.edu].[greater than 5 fold change.txt].Protein=[emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt].[CGI Protein]

Removed annotations with evalue >1E-10. Some SPIDs didn't have protein descriptions so manually filled those in using Uniprot website.

August 12, 2013
Secondary stress: Proteomics
using qprot to find differentially expressed proteins
Downloaded and installed GNU (http://www.gnu.org/software/gsl/).
./configure
make all
sudo make install
Then navigated to qprot 1.2.2 folder and did sudo make all. In qprot bin folder ran: ./qprot-param qspecOA 2000 10000 1.
Got Segmentation error after running above code. NB: command line does not seem to want <>, extension on input file, or indication of number of threads to run.

Continuation of analysis of NSAF data for spec counts averaged across technical replicates.
In Excel, got p-values for differential expression between 400 and 2800 µatm and ran through q-value in R. One protein had a q-value ~0.008 and another had a q-value ~0.08, all others >0.3.
To get fold change, found average expression within each treatment group. If average was 0, changed to 1E-20 (minimum average for each treatment group - 400 and 2800 µatm - was E-6). Divided average expression between treatment groups to get fold change. 193 proteins were expressed >5-fold more at 2800 µatm and 175 were expression >5-fold less at 2800 µatm. For mechanical stress at 400 µatm, 190 were up-regulated at least 5-fold and 192 were down-regulated at least 5-fold. 2 proteins had q-value = 0.05 in response to mech stress, the rest were >0.46. 205 proteins were >5-fold upregulated in response to mech stress at 2800 µatm and 168 were >5-fold down-regulated. 9 proteins had a q-value <0.07 (5 were <0.05) in response to mechanical stress at 2800 µatm.

August 8, 2013
Secondary stress: Proteomics
To create an input file for qspec, uploaded all proteins (n=1616) that pass spectral count cutoffs to SQLshare. Joined with protein lengths and average SpC per protein. Still couldn't get qspec online to accept my input file and could not get the command line to work.

In R, plotted expression at high pCO2 against low pCO2 with 95% CIs and line of best fit. This is to identify potentially differentially expressed proteins. I'm not sure why so many proteins fall outside the 95% CI polygon, more to come...
external image correlation%20with%20CIs%20OA.jpg

external image correlation%20with%20CIs%20OA.jpg

August 6, 2013
Secondary stress: Proteomics
Continued analysis of NSAF data with technical replicates separated so n=12 per treatment instead of n=4. There was a significant proteomic shift between treatment groups using this approach (p<0.05 ANOSIM).
external image NMDS%20with%20tech%20reps%20separate.jpg

external image NMDS%20with%20tech%20reps%20separate.jpg

Redid workflow to produce NSAF based on average of spectral counts across technical replicates instead of summing them. In Excel, deleted any NSAF values (per oyster) for proteins that had <2 unique peptide matches (1616 proteins). With this new calculation of NSAF (tech reps averaged for each oyster) there is no significant difference in proteome expression between treatment comparisons (ANOSIM p>0.05).

external image NMDS%20based%20on%20avg%20spc.jpg

external image NMDS%20based%20on%20avg%20spc.jpg

Plotted expression for each protein, treatment against control (see figure in evernote: https://www.evernote.com/shard/s242/sh/f8c250d4-63e4-4abb-aa75-84224820115c/6856dd51777057ebc1705a973b197abe). Points circled in pink are potential outliers from the 1:1 line, i.e. they could be over- or under-expressed in response to treatment.

August 5, 2013
Secondary stress: Proteomics
New cut-offs for proteins loaded on NMDS axis 2 as significant for differences between treatment groups: p-value < 0.05, MDS2 loading >|0.2|. 6 proteins pass these criteria for response to ocean acidification, 15 for response to mechanical stress at low pCO2, and 16 for response to mech stress at high pCO2. Joined file of significant NMDS proteins with SwissProt descriptions and NSAF values for the proteins.
SELECT * FROM [emmats@washington.edu].[highly sig proteins from NMDS.txt]
LEFT JOIN [emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt]
ON [emmats@washington.edu].[highly sig proteins from NMDS.txt].Protein=[emmats@washington.edu].[table_TJGR_Gene_SPID_evalue_Description.txt].[CGI Protein]

SELECT * FROM [emmats@washington.edu].[highly sig proteins NMDS with SPID]

LEFT JOIN [emmats@washington.edu].[NSAF all oysters]

ON [emmats@washington.edu].[highly sig proteins NMDS with SPID].[Protein]=[emmats@washington.edu].[NSAF all oysters].[All Proteins]

Proteins contributing to trend in OA response = tubulin alpha, serine/threonine protein phosphatase, aldose reductase-related protein 2, peroxiredoxin-5, Rab GDP dissociation inhibitor beta, unknown (annotated as SOD but evalue is a little > 1E-10). All proteins except for possible SOD (2-fold upregulated) have almost the exact same expression between treatment groups.
Proteins contributing to trend in mech stress response at 400 µatm = tubulin alpha, triosephosphate isomerase, universal stress protein, 40S ribosomal protein, outer dense fiber protein, golgi-associated plant pathogenesis-related, isocitrate dehydrogenase, cytochrom b6, fatty acid-binding protein, ADP-ribosylation factor 2, coactosin-like protein, glutathione S-transferase, guanine nucelotide-binding protein, 2 unannotated (1 is possible SOD). Except for coactosin-like and SOD, all proteins were either slightly down-regulated or showed no real change in expression.
Did ANOSIM on proteins significantly loaded. No significant differences between treatment (p>>0.05).

All treatment comparisons show fold changes over 5-fold, however none of these differences are significant when corrected for multiple comparisons.
Calculated q-values for the different treatment comparisons. All q-values >0.8 except at 2800 mechanical stress response where all are >0.2.

NMDS without combining technical replicates. Joined together file of spec counts for each technical replicate with file that has proteins with at least 2 unique peptide IDs and at least 8 spec counts across replicates. For now I am going to ignore that 2 unique peptide rule and just keep proteins that have at least 8 spec counts.
SELECT * FROM [emmats@washington.edu].[Unique peptides all biological reps]
LEFT JOIN [emmats@washington.edu].[All SpC for 16 oysters with 0]
ON [emmats@washington.edu].[Unique peptides all biological reps].[All Proteins]=[emmats@washington.edu].[All SpC for 16 oysters with 0].[All Proteins]

File saved as "spec counts tech reps.csv" and re-uploaded into SQL. Joined file with protein lengths.

SELECT * FROM [emmats@washington.edu].[spec counts tech reps.csv]

LEFT JOIN [emmats@washington.edu].[table_protein length.txt]

ON [emmats@washington.edu].[spec counts tech reps.csv].[All Proteins]=[emmats@washington.edu].[table_protein length.txt].protein

Calculate spectral counts/protein length.

SELECT [All Proteins],

CAST([CG2_01] AS FLOAT)/[protein length] AS [CG2_01 SpC/L],

CAST([CG2_02] AS FLOAT)/[protein length] AS [CG2_02 SpC/L],

CAST([CG2_03] AS FLOAT)/[protein length] AS [CG2_03 SpC/L],

CAST([CG5_01] AS FLOAT)/[protein length] AS [CG5_01 SpC/L],

CAST([CG5_02] AS FLOAT)/[protein length] AS [CG5_02 SpC/L],

CAST([CG5_03] AS FLOAT)/[protein length] AS [CG5_03 SpC/L],

CAST([CG8_01] AS FLOAT)/[protein length] AS [CG8_01 SpC/L],

CAST([CG8_02] AS FLOAT)/[protein length] AS [CG8_02 SpC/L],

CAST([CG8_03] AS FLOAT)/[protein length] AS [CG8_03 SpC/L],

CAST([CG11_01] AS FLOAT)/[protein length] AS [CG11_01 SpC/L],

CAST([CG11_02] AS FLOAT)/[protein length] AS [CG11_02 SpC/L],

CAST([CG11_03] AS FLOAT)/[protein length] AS [CG11_03 SpC/L],

CAST([CG26_01] AS FLOAT)/[protein length] AS [CG26_01 SpC/L],

CAST([CG26_02] AS FLOAT)/[protein length] AS [CG26_02 SpC/L],

CAST([CG26_03] AS FLOAT)/[protein length] AS [CG26_03 SpC/L],

CAST([CG29_01] AS FLOAT)/[protein length] AS [CG29_01 SpC/L],

CAST([CG29_02] AS FLOAT)/[protein length] AS [CG29_02 SpC/L],

CAST([CG29_03] AS FLOAT)/[protein length] AS [CG29_03 SpC/L],

CAST([CG32_01] AS FLOAT)/[protein length] AS [CG32_01 SpC/L],

CAST([CG32_02] AS FLOAT)/[protein length] AS [CG32_02 SpC/L],

CAST([CG32_03] AS FLOAT)/[protein length] AS [CG32_03 SpC/L],

CAST([CG35_01] AS FLOAT)/[protein length] AS [CG35_01 SpC/L],

CAST([CG35_02] AS FLOAT)/[protein length] AS [CG35_02 SpC/L],

CAST([CG35_03] AS FLOAT)/[protein length] AS [CG35_03 SpC/L],

CAST([CG221_01] AS FLOAT)/[protein length] AS [CG221_01 SpC/L],

CAST([CG221_02] AS FLOAT)/[protein length] AS [CG221_02 SpC/L],

CAST([CG221_03] AS FLOAT)/[protein length] AS [CG221_03 SpC/L],

CAST([CG224_01] AS FLOAT)/[protein length] AS [CG224_01 SpC/L],

CAST([CG224_02] AS FLOAT)/[protein length] AS [CG224_02 SpC/L],

CAST([CG224_03] AS FLOAT)/[protein length] AS [CG224_03 SpC/L],

CAST([CG227_01] AS FLOAT)/[protein length] AS [CG227_01 SpC/L],

CAST([CG227_02] AS FLOAT)/[protein length] AS [CG227_02 SpC/L],

CAST([CG227_03] AS FLOAT)/[protein length] AS [CG227_03 SpC/L],

CAST([CG230_01] AS FLOAT)/[protein length] AS [CG230_01 SpC/L],

CAST([CG230_02] AS FLOAT)/[protein length] AS [CG230_02 SpC/L],

CAST([CG230_03] AS FLOAT)/[protein length] AS [CG230_03 SpC/L],

CAST([CG242_01] AS FLOAT)/[protein length] AS [CG242_01 SpC/L],

CAST([CG242_02] AS FLOAT)/[protein length] AS [CG242_02 SpC/L],

CAST([CG242_03] AS FLOAT)/[protein length] AS [CG242_03 SpC/L],

CAST([CG245_01] AS FLOAT)/[protein length] AS [CG245_01 SpC/L],

CAST([CG245_02] AS FLOAT)/[protein length] AS [CG245_02 SpC/L],

CAST([CG245_03] AS FLOAT)/[protein length] AS [CG245_03 SpC/L],

CAST([CG248_01] AS FLOAT)/[protein length] AS [CG248_01 SpC/L],

CAST([CG248_02] AS FLOAT)/[protein length] AS [CG248_02 SpC/L],

CAST([CG248_03] AS FLOAT)/[protein length] AS [CG248_03 SpC/L],

CAST([CG251_01] AS FLOAT)/[protein length] AS [CG251_01 SpC/L],

CAST([CG251_02] AS FLOAT)/[protein length] AS [CG251_02 SpC/L],

CAST([CG251_03] AS FLOAT)/[protein length] AS [CG251_03 SpC/L]

FROM [emmats@washington.edu].[tech reps with protein length]

Calculate the sum of all SpC/L for each technical replicate.

SELECT

SUM([CG2_01 SpC/L]) AS [CG2_01sum],

SUM([CG2_02 SpC/L]) AS [CG2_02sum],

SUM([CG2_03 SpC/L]) AS [CG2_03sum],

SUM([CG5_01 SpC/L]) AS [CG5_01sum],

SUM([CG5_02 SpC/L]) AS [CG5_02sum],

SUM([CG5_03 SpC/L]) AS [CG5_03sum],

SUM([CG8_01 SpC/L]) AS [CG8_01sum],

SUM([CG8_02 SpC/L]) AS [CG8_02sum],

SUM([CG8_03 SpC/L]) AS [CG8_03sum],

SUM([CG11_01 SpC/L]) AS [CG11_01sum],

SUM([CG11_02 SpC/L]) AS [CG11_02sum],

SUM([CG11_03 SpC/L]) AS [CG11_03sum],

SUM([CG26_01 SpC/L]) AS [CG26_01sum],

SUM([CG26_02 SpC/L]) AS [CG26_02sum],

SUM([CG26_03 SpC/L]) AS [CG26_03sum],

SUM([CG29_01 SpC/L]) AS [CG29_01sum],

SUM([CG29_02 SpC/L]) AS [CG29_02sum],

SUM([CG29_03 SpC/L]) AS [CG29_03sum],

SUM([CG32_01 SpC/L]) AS [CG32_01sum],

SUM([CG32_02 SpC/L]) AS [CG32_02sum],

SUM([CG32_03 SpC/L]) AS [CG32_03sum],

SUM([CG35_01 SpC/L]) AS [CG35_01sum],

SUM([CG35_02 SpC/L]) AS [CG35_02sum],

SUM([CG35_03 SpC/L]) AS [CG35_03sum],

SUM([CG221_01 SpC/L]) AS [CG221_01sum],

SUM([CG221_02 SpC/L]) AS [CG221_02sum],

SUM([CG221_03 SpC/L]) AS [CG221_03sum],

SUM([CG224_01 SpC/L]) AS [CG224_01sum],

SUM([CG224_02 SpC/L]) AS [CG224_02sum],

SUM([CG224_03 SpC/L]) AS [CG224_03sum],

SUM([CG227_01 SpC/L]) AS [CG227_01sum],

SUM([CG227_02 SpC/L]) AS [CG227_02sum],

SUM([CG227_03 SpC/L]) AS [CG227_03sum],

SUM([CG230_01 SpC/L]) AS [CG230_01sum],

SUM([CG230_02 SpC/L]) AS [CG230_02sum],

SUM([CG230_03 SpC/L]) AS [CG230_03sum],

SUM([CG242_01 SpC/L]) AS [CG242_01sum],

SUM([CG242_02 SpC/L]) AS [CG242_02sum],

SUM([CG242_03 SpC/L]) AS [CG242_03sum],

SUM([CG245_01 SpC/L]) AS [CG245_01sum],

SUM([CG245_02 SpC/L]) AS [CG245_02sum],

SUM([CG245_03 SpC/L]) AS [CG245_03sum],

SUM([CG248_01 SpC/L]) AS [CG248_01sum],

SUM([CG248_02 SpC/L]) AS [CG248_02sum],

SUM([CG248_03 SpC/L]) AS [CG248_03sum],

SUM([CG251_01 SpC/L]) AS [CG251_01sum],

SUM([CG251_02 SpC/L]) AS [CG251_02sum],

SUM([CG251_03 SpC/L]) AS [CG251_03sum]

FROM [emmats@washington.edu].[tech reps SpC-L]

Calculated NSAF

SELECT [All Proteins],

SPC.[CG2_01 SpC/L]/allspc.[CG2_01sum] AS [CG2_01 NSAF],

SPC.[CG2_02 SpC/L]/allspc.[CG2_02sum] AS [CG2_02 NSAF],

SPC.[CG2_03 SpC/L]/allspc.[CG2_03sum] AS [CG2_03 NSAF],

SPC.[CG5_01 SpC/L]/allspc.[CG5_01sum] AS [CG5_01 NSAF],

SPC.[CG5_02 SpC/L]/allspc.[CG5_02sum] AS [CG5_02 NSAF],

SPC.[CG5_03 SpC/L]/allspc.[CG5_03sum] AS [CG5_03 NSAF],

SPC.[CG8_01 SpC/L]/allspc.[CG8_01sum] AS [CG8_01 NSAF],

SPC.[CG8_02 SpC/L]/allspc.[CG8_02sum] AS [CG8_02 NSAF],

SPC.[CG8_03 SpC/L]/allspc.[CG8_03sum] AS [CG8_03 NSAF],

SPC.[CG11_01 SpC/L]/allspc.[CG11_01sum] AS [CG11_01 NSAF],

SPC.[CG11_02 SpC/L]/allspc.[CG11_02sum] AS [CG11_02 NSAF],

SPC.[CG11_03 SpC/L]/allspc.[CG11_03sum] AS [CG11_03 NSAF],

SPC.[CG26_01 SpC/L]/allspc.[CG26_01sum] AS [CG26_01 NSAF],

SPC.[CG26_02 SpC/L]/allspc.[CG26_02sum] AS [CG26_02 NSAF],

SPC.[CG26_03 SpC/L]/allspc.[CG26_03sum] AS [CG26_03 NSAF],

SPC.[CG29_01 SpC/L]/allspc.[CG29_01sum] AS [CG29_01 NSAF],

SPC.[CG29_02 SpC/L]/allspc.[CG29_02sum] AS [CG29_02 NSAF],

SPC.[CG29_03 SpC/L]/allspc.[CG29_03sum] AS [CG29_03 NSAF],

SPC.[CG32_01 SpC/L]/allspc.[CG32_01sum] AS [CG32_01 NSAF],

SPC.[CG32_02 SpC/L]/allspc.[CG32_02sum] AS [CG32_02 NSAF],

SPC.[CG32_03 SpC/L]/allspc.[CG32_03sum] AS [CG32_03 NSAF],

SPC.[CG35_01 SpC/L]/allspc.[CG35_01sum] AS [CG35_01 NSAF],

SPC.[CG35_02 SpC/L]/allspc.[CG35_02sum] AS [CG35_02 NSAF],

SPC.[CG35_03 SpC/L]/allspc.[CG35_03sum] AS [CG35_03 NSAF],

SPC.[CG221_01 SpC/L]/allspc.[CG221_01sum] AS [CG221_01 NSAF],

SPC.[CG221_02 SpC/L]/allspc.[CG221_02sum] AS [CG221_02 NSAF],

SPC.[CG221_03 SpC/L]/allspc.[CG221_03sum] AS [CG221_03 NSAF],

SPC.[CG224_01 SpC/L]/allspc.[CG224_01sum] AS [CG224_01 NSAF],

SPC.[CG224_02 SpC/L]/allspc.[CG224_02sum] AS [CG224_02 NSAF],

SPC.[CG224_03 SpC/L]/allspc.[CG224_03sum] AS [CG224_03 NSAF],

SPC.[CG227_01 SpC/L]/allspc.[CG227_01sum] AS [CG227_01 NSAF],

SPC.[CG227_02 SpC/L]/allspc.[CG227_02sum] AS [CG227_02 NSAF],

SPC.[CG227_03 SpC/L]/allspc.[CG227_03sum] AS [CG227_03 NSAF],

SPC.[CG230_01 SpC/L]/allspc.[CG230_01sum] AS [CG230_01 NSAF],

SPC.[CG230_02 SpC/L]/allspc.[CG230_02sum] AS [CG230_02 NSAF],

SPC.[CG230_03 SpC/L]/allspc.[CG230_03sum] AS [CG230_03 NSAF],

SPC.[CG242_01 SpC/L]/allspc.[CG242_01sum] AS [CG242_01 NSAF],

SPC.[CG242_02 SpC/L]/allspc.[CG242_02sum] AS [CG242_02 NSAF],

SPC.[CG242_03 SpC/L]/allspc.[CG242_03sum] AS [CG242_03 NSAF],

SPC.[CG245_01 SpC/L]/allspc.[CG245_01sum] AS [CG245_01 NSAF],

SPC.[CG245_02 SpC/L]/allspc.[CG245_02sum] AS [CG245_02 NSAF],

SPC.[CG245_03 SpC/L]/allspc.[CG245_03sum] AS [CG245_03 NSAF],

SPC.[CG248_01 SpC/L]/allspc.[CG248_01sum] AS [CG248_01 NSAF],

SPC.[CG248_02 SpC/L]/allspc.[CG248_02sum] AS [CG248_02 NSAF],

SPC.[CG248_03 SpC/L]/allspc.[CG248_03sum] AS [CG248_03 NSAF],

SPC.[CG251_01 SpC/L]/allspc.[CG251_01sum] AS [CG251_01 NSAF],

SPC.[CG251_02 SpC/L]/allspc.[CG251_02sum] AS [CG251_02 NSAF],

SPC.[CG251_03 SpC/L]/allspc.[CG251_03sum] AS [CG251_03 NSAF]

FROM [emmats@washington.edu].[tech reps SpC-L] spc,

[emmats@washington.edu].[SpC-L sum tech reps] allspc

external image tech%20rep%20NMDS%20by%20oyster.jpg

external image tech%20rep%20NMDS%20by%20oyster.jpg

solid blue squares = 400 µtam
open blue squares = 400 µatm + Mech stress
solid orange triangles = 2800 µatm
open orange triangles = 2800 µatm + mech stress
Most of the oysters's technical replicates cluster pretty well, except for a few notable exceptions showing poor technical replication: 230, 245, 227, 221, 8.

July 31, 2013
Secondary stress: Proteomics
Skyline is not suitable for my proteomics data so I am switching back to NSAF. I did a complete workflow for the NSAF data in SQL. Briefly, all tech rep files for each oyster were joined and then filtered to keep only proteins that had at least 2 unique peptides. Spec counts were summed across all biological replicates and only proteins with at least 8 spec counts across all reps were kept in the dataset. NSAF was calculated. See google doc NSAF workflow for more detailed explanations and links to files in SQL.
https://docs.google.com/document/d/1ivmzGPdJA40WpEsi-7OLpXqdKlcKtxG0xfNdWq35C1M/edit
Did NMDS using bray-curtis dissimilarity on log(x+1) transformed data for the 1956 proteins. Below are the results. All treatment comparisons completely overlap (no proteome expression difference between treatment groups). There is still a bit of a trend towards separation between treatments for OA comparison and mech stress at low pCO2, but complete overlap at high pCO2 (however p>0.05 for all ANOSIM at 1956 protein level). I need to do next step and see if any highly loaded proteins are significant for ANOSIM.
Plots are in order: 400 vs 2800 µatm, 400 µatm mech stress, 2800 µatm mech stress.
external image nmds%20oa%20080113.jpg

external image nmds%20low%20MS%20080113.jpg

external image nmds%20high%20ms%20080113.jpg

July 29, 2013
Secondary stress: proteomics
With the same cut-offs for significance for loadings as before (p-value < 0.01, MDS2 loading >0.9), 35 proteins contribute to the OA response, 37 proteins contribute to the mechanical stress response at 400 µatm, and 28 contribute to mech stress response at 2800 µatm (although the groups almost completely overlap on the NMDS). ANOSIM at level of all 2677 proteins is still not significant for all 3 treatment comparisons.
Joined file of sig NMDS proteins with SPID descriptions and protein expression values.
SELECT * FROM [emmats@washington.edu].[sig proteins from NMDS.csv]
LEFT JOIN [table_TJGR_Gene_SPID_evalue_Description.txt]
ON [emmats@washington.edu].[sig proteins from NMDS.csv].protein=[table_TJGR_Gene_SPID_evalue_Description.txt].[CGI protein]

SELECT * FROM [emmats@washington.edu].[sig NMDS proteins with SPID]

LEFT JOIN [3 peps per protein area avgd]

ON [sig NMDS proteins with SPID].protein=[3 peps per protein area avgd].protein

For ANOSIM based on just the proteins with sig loadings on the second MDS axis, the proteomic expression between high and high MS is still non sig, non significant for MS response at low pCO2 and significant response to OA.
Made a heat map of the 35 proteins in the OA response (annotated and unannotated).

July 26, 2013
Secondary stress: proteomics
Original input file had some peptides of charge state >2, so had to redo everything with fixed input file.
SR discovered that for some proteins, a peptide was sequenced multiple times and so had multiple expression values. From the unique protein associations file in SQLshare, I summed the expression values for all identical peptides.
SELECT [peptide sequence], SUM([2_01 TotalArea]) AS CG2_01, SUM([2_02 TotalArea]) AS CG2_02, SUM([2_03 TotalArea]) AS CG2_03, SUM([5_01 TotalArea]) AS CG5_01, SUM([5_02 TotalArea]) AS CG5_02, SUM([5_03 TotalArea]) AS CG5_03, SUM([8_01 TotalArea]) AS CG8_01, SUM([8_02 TotalArea]) AS CG8_02, SUM([8_03 TotalArea]) AS CG8_03, SUM([11_01 TotalArea]) AS CG11_01, SUM([11_02 TotalArea]) AS CG11_02, SUM([11_03 TotalArea]) AS CG11_03, SUM([26_01 TotalArea]) AS CG26_01, SUM([26_02 TotalArea]) AS CG26_02, SUM([26_03 TotalArea]) AS CG26_03, SUM([29_01 TotalArea]) AS CG29_01, SUM([29_02 TotalArea]) AS CG29_02, SUM([29_03 TotalArea]) AS CG29_03, SUM([32_01 TotalArea]) AS CG32_01, SUM([32_02 TotalArea]) AS CG32_02, SUM([32_03 TotalArea]) AS CG32_03, SUM([35_01 TotalArea]) AS CG35_01, SUM([35_02 TotalArea]) AS CG35_02, SUM([35_03 TotalArea]) AS CG35_03, SUM([221_01 TotalArea]) AS CG221_01, SUM([221_02 TotalArea]) AS CG221_02, SUM([221_03 TotalArea]) AS CG221_03, SUM([224_01 TotalArea]) AS CG224_01, SUM([224_02 TotalArea]) AS CG224_02, SUM([224_03 TotalArea]) AS CG224_03, SUM([227_01 TotalArea]) AS CG227_01, SUM([227_02 TotalArea]) AS CG227_02, SUM([227_03 TotalArea]) AS CG227_03, SUM([230_01 TotalArea]) AS CG230_01, SUM([230_02 TotalArea]) AS CG230_02, SUM([230_03 TotalArea]) AS CG230_03,
SUM([242_01 TotalArea]) AS CG242_01, SUM([242_02 TotalArea]) AS CG242_02, SUM([242_03 TotalArea]) AS CG242_03, SUM([245_01 TotalArea]) AS CG245_01, SUM([245_02 TotalArea]) AS CG245_02, SUM([245_03 TotalArea]) AS CG245_03, SUM([248_01 TotalArea]) AS CG248_01, SUM([248_02 TotalArea]) AS CG248_02, SUM([248_03 TotalArea]) AS CG248_03, SUM([251_01 TotalArea]) AS CG251_01, SUM([251_02 TotalArea]) AS CG251_02, SUM([251_03 TotalArea]) AS CG251_03

FROM [emmats@washington.edu].[unique protein associations]
GROUP BY [peptide sequence]

The file now needs to be rejoined with the unique peptide-protein association file.

SELECT * FROM [emmats@washington.edu].[Total peptide areas]

LEFT JOIN (SELECT protein, [peptide sequence] FROM [emmats@washington.edu].[unique protein associations])X

ON [emmats@washington.edu].[Total peptide areas].[peptide sequence]= X.[peptide sequence]

Now I need to redo all the analyses...(Files saved in one folder titled Re-analysis 072613)
To calculate q-values for all the proteins I did t-tests between treatments: 400 vs. 2800, 400 with mechanical stress, and 2800 with mechanical stress. 24 proteins were differentially expressed in response to OA (q-value <0.20), 9 were differentially expressed in response to mech stress at 400 µatm, and 0 were differentially expressed in response to mech stress at 2800 µatm. Joined these differentially expressed proteins with swissprot IDs and gene descriptions.

SELECT * FROM [emmats@washington.edu].[sig qvalues OA and lowMS.txt]

LEFT JOIN [table_TJGR_Gene_SPID_evalue_Description.txt]

ON [emmats@washington.edu].[sig qvalues OA and lowMS.txt].protein=[table_TJGR_Gene_SPID_evalue_Description.txt].[CGI Protein]

For response to mech stress at 400, all 10 proteins are down-regulated (1.5-2.7 fold). For response to OA, 5 proteins are up-regulated (2.4-5.6 fold) and 57 are down-regulated (1.1-5.9 fold).

I redid the NMDS with the new data. The plots look pretty much the same .

July 25, 2013
secondary stress: proteomics
citric acid/krebs cycle investigation
Some of the proteins that had highly significant loadings in the NMDS of 400 vs. 2800 µatm oysters are involved in the citric acid cycle. TCA produces CO2 and electron donors for oxidative phosphorylation in the e- transport chain that synthesize ATP. Citrate synthase and isocitrate dehydrogenase were both down-regulated and significant for the NMDS. Malate dehydrogenase was identified as a protein in the gill proteome but its expression was not affected by pCO2. Downstream of the cycle, NADH dehydrogenase (transfers e- to respiratory chain) was also down-regulated as well as mitochondrial ATP synthase, which were both implicated as important to OA response in the NMDS. Succinyl-CoA ligase subunit B, mitochondrial catalyzes a reaction in the opposite direction in TCA and was also down-regulated, but was no implicated in the significant loadings for the NMDS. Both malate dehydrogenase (CGI_10015004) and succinyl-CoA ligase (CGI_10003696) had eigenvector p-values <0.05 but >0.01 of 0.03 and 0.50, respectively. They also had strong, positive loadings on the second MDS axis of 0.82 for malate dehydrogenase and 0.62 for succinyl CoA ligase. Qvalues for each were 0.60 (MD) and 0.30 (SCL).

July 22, 2013
secondary stress: proteomics
Took file created May 15, 2013 that joined proteins significant from NMDS with KEGG IDs. Going to create 1 input file for iPath2 for significant proteins from OA response and mechanical stress response at low pCO2. Proteins that are differentially expressed >10-fold have a line width of 100, >5-fold have width 75, >2-fold have width 50, and <2-fold have width 25. For pCO2 response, proteins expressed less at high pCO2 are highlighted in yellow and those expressed more at high pCO2 are highlighted in orange. Proteins expressed more during mech stress at 400 µatm are in dark blue and those expressed less are in light blue. Only proteins with e-values from blast >1E-10 are used.
This didn't end up working very well so I'm going to stick with the original iPath2 output for now (treatment responses on separate plots).

July 19, 2013
Secondary stress: proteomics
Redid SQL workflow so that no part of it is done in excel. All files that are part of the workflow have the tag "published" so I know not to edit or delete them. Below is the workflow with the code. File names are in brackets.

Input file #1 [__pep peak areas all oysters.txt__]: expression values per peptide for each technical replicate (n=48 columns of data). This file is derived from the raw output from Skyline (see __Supp Data__ SX)

Input file #2[__ProtPep for all oysters.txt]__: associations between peptides and proteins, derived from ProteinProphet output (see __Supp Data__ SX+1)

Query 1
Join input file #1 to input file #2 to create [__peptide peak areas with protein associations__]

SELECT * FROM [emmats@washington.edu].[ProtPep for all oysters.txt]
LEFT JOIN [pep peak areas all oysters.txt]
ON [ProtPep for all oysters.txt].[peptide sequence]=[pep peak areas all oysters.txt].PeptideSequence

Query 2
2. From file peptide peak areas with protein associations (see step 1), remove peptides that match to multiple proteins [__unique protein associations__]

SELECT * FROM [emmats@washington.edu].[peptide peak areas with protein associations] WHERE [peptide sequence] IN
(SELECT [peptide sequence]
FROM [emmats@washington.edu].[peptide peak areas with protein associations]
GROUP BY [peptide sequence]
HAVING COUNT (*) <2)

Query 3
3. Remove retention time data from the file from step 2 [__Peptide peak areas for unique peptides__]

SELECT protein, [peptide sequence], [2_01 TotalArea], [2_02 TotalArea], [2_03 TotalArea], [5_01 TotalArea], [5_02 TotalArea], [5_03 TotalArea], [8_01 TotalArea],[8_02 TotalArea], [8_03 TotalArea],[11_01 TotalArea], [11_02 TotalArea], [11_03 TotalArea], [26_01 TotalArea], [26_02 TotalArea], [26_03 TotalArea], [29_01 TotalArea], [29_02 TotalArea], [29_03 TotalArea], [32_01 TotalArea], [32_02 TotalArea], [32_03 TotalArea], [35_01 TotalArea], [35_02 TotalArea], [35_03 TotalArea], [221_01 TotalArea], [221_02 TotalArea], [221_03 TotalArea], [224_01 TotalArea], [224_02 TotalArea], [224_03 TotalArea], [227_01 TotalArea], [227_02 TotalArea], [227_03 TotalArea], [230_01 TotalArea], [230_02 TotalArea], [230_02 TotalArea], [242_01 TotalArea], [242_02 TotalArea], [242_03 TotalArea], [245_01 TotalArea], [245_02 TotalArea], [245_03 TotalArea], [248_01 TotalArea], [248_02 TotalArea], [248_03 TotalArea], [251_01 TotalArea], [251_02 TotalArea], [251_03 TotalArea]
FROM [emmats@washington.edu].[unique protein associations]

Query 4
4. Average peptide expression (peak area) values across technical replicates for each oyster [__Average peptide expression__]

SELECT protein, [peptide sequence], ([2_01 TotalArea]+[2_02 TotalArea]+[2_03 TotalArea])/3 AS CG2, ([5_01 TotalArea]+[5_02 TotalArea]+[5_03 TotalArea])/3 AS CG5, ([8_01 TotalArea]+[8_02 TotalArea]+[8_03 TotalArea])/3 AS CG8,([11_01 TotalArea]+[11_02 TotalArea]+[11_03 TotalArea])/3 AS CG11, ([26_01 TotalArea]+[26_02 TotalArea]+[26_03 TotalArea])/3 AS CG26, ([29_01 TotalArea]+[29_02 TotalArea]+[29_03 TotalArea])/3 AS CG29, ([32_01 TotalArea]+[32_02 TotalArea]+[32_03 TotalArea])/3 AS CG32, ([35_01 TotalArea]+[35_02 TotalArea]+[35_03 TotalArea])/3 AS CG35, ([221_01 TotalArea]+[221_02 TotalArea]+[221_03 TotalArea])/3 AS CG221, ([224_01 TotalArea]+[224_02 TotalArea]+[224_03 TotalArea])/3 AS CG224, ([227_01 TotalArea]+[227_02 TotalArea]+[227_03 TotalArea])/3 AS CG227, ([230_01 TotalArea]+[230_02 TotalArea]+[230_02 TotalArea])/3 AS CG230, ([242_01 TotalArea]+[242_02 TotalArea]+[242_03 TotalArea])/3 AS CG242, ([245_01 TotalArea]+[245_02 TotalArea]+[245_03 TotalArea])/3 AS CG245, ([248_01 TotalArea]+[248_02 TotalArea]+[248_03 TotalArea])/3 AS CG248, ([251_01 TotalArea]+[251_02 TotalArea]+[251_03 TotalArea])/3 AS CG251
FROM [emmats@washington.edu].[unique protein associations]

Queries 5a and 5b
5. Keep only the 3 most abundant peptides per protein. Average peptide expression across all biological replicates [__Average expression across all oysters__] and rank by most abundant peptides per protein. This step is meant to distill the true expression profile of the protein. [__3 peps per protein__]

SELECT protein, [peptide sequence], (CG2+CG5+CG8+CG11+CG26+CG29+CG32+CG35+CG221+CG224+CG227+CG230+CG242+CG245+CG248+CG251)/16 AS avgallpeps
FROM [emmats@washington.edu].[Average peptide expression]

SELECT * FROM

(SELECT *, ROW_NUMBER ()

OVER (PARTITION BY protein ORDER BY avgallpeps DESC) AS pepabundance

FROM [emmats@washington.edu].[Average expression across all oysters])X

WHERE pepabundance <=3

Query 6
6. Join list of 3 most abundant peptides per protein (step 5b) to list of average peptide expression (step 4). [__3 peps per protein with expression__]

SELECT * FROM [emmats@washington.edu].[3 peps per protein]
LEFT JOIN [emmats@washington.edu].[Average peptide expression]
ON [emmats@washington.edu].[3 peps per protein].[peptide sequence]=[emmats@washington.edu].[Average peptide expression].[peptide sequence]

Query 7
7. Average the expression values of the 3 most abundant peptides per protein to go from expression based on peptides to expression based on proteins. [__3 peps per protein area avgd__]

SELECT protein
,avg(CAST(CG2 AS FLOAT))
,avg(CAST(CG5 AS FLOAT))
,avg(CAST(CG8 AS FLOAT))
,avg(CAST(CG11 AS FLOAT))
,avg(CAST(CG26 AS FLOAT))
,avg(CAST(CG29 AS FLOAT))
,avg(CAST(CG32 AS FLOAT))
,avg(CAST(CG35 AS FLOAT))
,avg(CAST(CG221 AS FLOAT))
,avg(CAST(CG224 AS FLOAT))
,avg(CAST(CG227 AS FLOAT))
,avg(CAST(CG230 AS FLOAT))
,avg(CAST(CG242 AS FLOAT))
,avg(CAST(CG245 AS FLOAT))
,avg(CAST(CG248 AS FLOAT))
,avg(CAST(CG251 AS FLOAT))
FROM [emmats@washington.edu].[3 peps per protein with expression]
GROUP BY protein

July 16, 2013
Secondary stress: proteomics
Took list of proteins that have a qvalue < 0.2 (see 7/15/13) and joined them with SPID and SPID descriptions in SQLshare.
SELECT * FROM [emmats@washington.edu].[proteins significant by qvalue.txt]
LEFT JOIN [table_Cg proteome db evalue -10.txt]
ON [proteins significant by qvalue.txt].protein=[table_Cg proteome db evalue -10.txt].Protein

SELECT * FROM [emmats@washington.edu].[proteins sig by qvalue with SPID]
LEFT JOIN [sr320@washington.edu].[qDOD Cgigas Gene Descriptions (Swiss-prot)]
ON [proteins sig by qvalue with SPID].SPID=[qDOD Cgigas Gene Descriptions (Swiss-prot)].SPID

Removed all annotations with evalue > 1E-10.

July 15, 2013
Secondary stress: proteomics
Calculating q-values associated with false discovery rate for differential expression between treatments (see 7/2 and 7/3/13). Based on Storey 2002 (http://onlinelibrary.wiley.com/store/10.1111/1467-9868.00346/asset/1467-9868.00346.pdf?v=1&t=hj5vvfya&s=3d50dcd3825c0a5c826fc723ebf74863174bcf1d) and using
definition of qvalue from the paper: "minimum pFDR [positive false discovery rate] that can occur when rejecting a statistic with a value t for the set of nested rejection regions"

Also of use, Storey & Tibshirani 2003, Statistical significance for genomewide studies:
"The q value for a particular feature is the expected proportion of
false positives incurred when calling that feature significant. Therefore, calculating theqvalues for each feature and thresholding them
at q-value level produces a set of significant features so that a
proportion of is expected to be false positives. Typically, the p
value is described as the probability of a null feature being as or
more extreme than the observed one. ‘‘As or more extreme’’ in this
setup means that it would appear higher on the list. The q value of
a particular feature can be described as the expected proportion of
false positives among all features as or more extreme than the
observed one"

"In our analysis, thresholding genes with q values 0.05 yields 160
genes significant for differential expression. This means that 8 of
the 160 genes called significant are expected to be false positives."

"In the example
presented above MSH2 has a qvalue equal to 0.013. This value does
not imply that MSH2 is a false positive with probability 0.013.
Rather, 0.013 is the expected proportion of false positives incurred
if we call MSH2 significant."

Bioconductor's qvalue package:
Alan Dabney, John D. Storey and with assistance from Gregory R. Warnes (). qvalue: Q-value estimation for false discovery rate control. R package version 1.32.0

I ran the code as indicated in the manual but 3 of my plots are blank for the pCO2 p-values. Everything worked for the other comparisons (LowMS and HighMS).
If I choose a q-value cut-off of 0.2 (which seems very liberal), then 64 proteins would be called significantly differentially expressed between 400 and 2800 µatm, but 12.8 (13) of these would be false positives. For response to mechanical stress at 400 µatm, 10 proteins would be differentially expressed with a total of 2 false positives. No proteins would be differentially expressed in response to mech stress at 2800 µatm. With this q-value cutoff, calcium/sodium exchanger 3 and UTP-1-glucose-phosphate uridylyltransferase are no longer considered differentially expressed.
Below is the output from qvalue for mech stress response at 400 µatm comparison between treatments.
external image qvalue%20lowMS%20results%20071513.jpg

external image qvalue%20lowMS%20results%20071513.jpg

July 9, 2013
Secondary stress: proteomics
Downloaded supp table 24 from Zhang et al. 2012 (proteins identified in shell) to see if any showed up in the differentially expressed proteins in the different treatment comparisons (see 7/2/13 and 7/3/13). In SQLshare joined the list of zhang proteins to the differential expression files for all treatments (2 steps because 2 different files).
SELECT * FROM [emmats@washington.edu].[Zhang Supp 24 shell proteins.txt]
LEFT JOIN [table_differential expression pCO2 and lowMS skyline.txt]
ON [Zhang Supp 24 shell proteins.txt].[Shell Protein]=[table_differential expression pCO2 and lowMS skyline.txt].protein

SELECT * FROM [emmats@washington.edu].[shell proteins with pCO2 lowMS diff exp]
LEFT JOIN [table_differential expression high MS.txt]
ON [shell proteins with pCO2 lowMS diff exp].[Shell Protein]=[table_differential expression high MS.txt].protein

final file is called shell proteins with all treatment comparisons.
For each treatment, a handful of these proteins were differentially expressed (for t.test p<0.050), but none of them showed a large fold-difference between treatment comparisons (all <3-fold).

July 8, 2013
Primer design for Etilet & Ahmed
Most C. gigas primers are from published papers:
HSP90 - Choi et al.
CYP1A1 - Boutet et al.
myc homolog - David et al. 2005
Barnacle primers (Balanus glandula) designed from congener NCBI sequences in NCBI. Arginine kinase and elongation factor are based on B. glandula sequences and hsp70 is B. amphitrite.
Arginine kinase: product length = 221 bp, Tm = 60°C, primer lengths = 20 bp (42-61 and 262-243), based on sequence AY543686

Elongation factor: product length = 178 bp, Tm = 60°C, primer lengths = 20bp (115-134, 292-273), based on sequence AY543685
Hsp70: product length = 99 bp, Tm = 60°C, primer lengths = 20 bp (1315-1334, 1413-1394), based on sequence AY150182

Papers on Na/Ca exchanger:
http://www.annualreviews.org/doi/abs/10.1146/annurev.physiol.62.1.111
http://physrev.physiology.org/content/86/1/155.short

July 3, 2013
Secondary stress: proteomics
Redid plots from 7/1/13 with error bars (standard deviation ^-7) and annotated with GO terms.
external image up-down%20pCO2%20with%20GO.jpg

external image up-down%20pCO2%20with%20GO.jpg

external image up-down%20lowMS%20with%20GO.jpg

197 proteins are differentially expressed in mechanically stressed oysters at 2800 µatm (p-value for t-test <0.50). 10 of these are more highly expressed in stressed oysters (all less than 4-fold) and 187 are down-regulated in stressed oysters (5 of these are > 10-fold down-regulated: Zinc finger MYND domain-containing protein 12, sodium/calcium exchanger 3, UTP-glucose-1-phosphate uridylyltransferase, uncharacterized protein C45G9.7 in C. elegans, PITH domain-containing protein GA19395). Both Na/Ca exchanger and UTP-glucose-etc. were expressed significantly higher at 2800 compared to 400 µatm, so their down-regulation during mechanical stress is interesting (see 7/2/13).

Explored the correlation between spec counts and NSAF. Ranked each protein for both methods of counts so that 1 = greatest expression value. See plot here. (https://www.evernote.com/shard/s242/sh/05e99501-2200-4ecd-b300-37227b1393e0/a19eb3eaf9b985e535ad7b0070347870)

July 2, 2013
Secondary stress: proteomics
I did a type 2, 2-tailed t-test of the Skyline data for 2800 vs 400 µatm and stress vs unstressed at 400 µatm. Not all of the highly up- or down-regulated proteins (see 7/1/13) were significantly differentially expressed. 295 proteins were differentially expressed in response to OA (p<0.050), 23 of these are expressed highly at 2800 µatm compared to 400 and 272 are expressed less at 2800 µatm. Guanine nucleotide-binding protein G(s) subunit alpha is significantly differentially expressed (23-fold lower at 2800 µatm, p=0.014) as well as sodium/calcium exchanger 3 (40-fold higher, p=0034), and UTP-glucose-1-phosphate uridylyltransferase (13-fold higher, p=0.036).
272 proteins are differentially expressed in response to stress at 400 µatm, 9 are expressed highly during mechanical stress and 263 are less expressed during stress. None of the highly differentially regulated proteins (see 7/1/13) are significantly differentially expressed.

July 1, 2014
Secondary stress: proteomics
Further investigation into Skyline data. I am working on comparing Skyline and NSAF results.
For all the Skyline proteins that contributed to a stress response (either high pCO2 or mech stress at 400 µatm), I made up-down plots showing the expression of each protein (summed across biological replicates). Bars in blue are proteins that are upregulated in the treatment condition and red are proteins that are downregulated.
external image up-down%20pCO2%2057%20proteins.jpg

external image up-down%20pCO2%2057%20proteins.jpg

Across all proteins, for both NSAF and Skyline, I performed the following: [sum expression for high pCO2]/[sum expression for low pCO2]. The 3 proteins that were > 10-fold up-regulated at 2800 µatm were not ones that had shown up in the NMDS: sodium/calcium exchanger 3, vacuolar protein sorting-associated protein 33A, UTP-glucose-1-phosphate uridylyltransferase. The same was true for the 3 proteins that were down-regulated >10-fold in the Skyline data: pre-mRNA splicing factor 38A, aldo-keto reductase family 1 member B10, guanine nucleotide-binding protein G(s) subunit alpha.
Out of the 20 proteins that were most up-regulated in the Skyline data for response to pCO2, only 3 of them were also upregulated in the NSAF data. Of the 10 proteins most up-regulated in the NSAF data, only 3 were also upregulated in the Skyline data.

Repeated the same steps for proteins expressed at 400 µatm + mech stress and 400 µatm (no stress).
external image up-down%20low%20MS%2022%20proteins.jpg

external image up-down%20low%20MS%2022%20proteins.jpg

Again, for the mech stress at 400 µatm like at the response to high pCO2, the NSAF results did not agree with the Skyline results. 1 protein was downregulated >10-fold in Skyling: Protein DEK; and 7 were upregulated >10-fold: microtubule-associated protein 2, megakaryocyte-associated tyrosine-protein kinase, Na/Ca exchanger 3, transcription elongation factor SPT5, 1,2-dihydroxy-3-keto-metylthiopentene dioxygenase, unidentified protein, sperm flagellar protein 2. The 4 proteins that were upregulated >5-fold in the NSAF data in response to MS were isoleucine tRNA ligase, muscle M-line assembly protein unc-89, dynamin-like 120 kDa protein, v-type proton ATPase subunit H.
*in the bar graphs, p.=putative.

June 24, 2013
Olympia oyster epigenetics
Added data analyzed 6/21/13 for primer set 2 to the previous primer set 2 data. Some of the data needs a little more attention, i.e. I think that allele 59 for the new data is actually 61 for the old data (I moved all of these over to the 61 bp allele column). The samples added are: CAS009_Hpa2, CAS010_Hpa2, DAB087_Hpa2, DAB088_Hpa2, DAB089_Hpa2, DAB090_Hpa2, DAB091_Hpa2, DAB092_Hpa2, DAB094_Msp2, FID093_Msp2, FID094_Msp2, DAB096_Msp2.
Checked precision of genotyping with the following samples for primer set 2: DAB093_Msp2, DAB095_Msp2, FID091_Mps2, FID092_Msp2, FID095_Msp2, FID096_Msp2, FID097_Msp2, FID098_Msp2, FID099_Msp2, FID100_Msp2. There was ~100% error rate in genotyping between the samples run on 4/1/13 and 6/19/13.

June 21, 2013**
Olympia oyster epigenetics
Fragment analysis of data run 6/19/13
no amplification: CAS007_Msp3. should also redo FID099_Hpa3
analysis of primer set 4: updated analysis method primer 4 ETS, panel = primer 4 032513, size standard = GS500 040113. no amplification: CAS007_Msp4. for primer set 4 added bins and re-analyzed.