Pacific oyster

TJGR

things just got real
Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030

Other Resources: OysterDB (Blast, Downloads) | Related manuscript: doi:10.1038/nature11413

.......................................................................................................................

oyster.v9_M
Sequence
FASTA
60 Sequences; 75 MB
BSMAP
SAM
BS MBD
RNAseq
SAM
PROPS: DH
RNAseq
SAM
PROPS: BBC
RefMap
SAM
MBD Library
Annotation
BED
BS MBD CG features (no scores)
Annotation
BED
BS MBD CG features (with scores)
Annotation
BED
BS MBD CG features (0-20%)
Annotation
BED
BS MBD CG features (20-70%)
Annotation
BED
BS MBD CG features (70-100%)
Annotation
GFF
Tandem Repeats [Phobos] (imperfect search)
Annotation
BED
Tandem Repeats (interval and ID only)
Annotation
BED
Complement to CDS features
Annotation
GFF
CG motifs
Annotation
GFF
CG motifs non overlap w BS MBD CG (~250k)
Annotation
GFF
CG motifs with methylation status (~500k)
Annotation
BED
CG motifs(~500k) (interval and ID only)
Annotation
BED
50bp region (up and down) flanking CG motifs(~500k)
Sequence
FASTA
50bp region (up and down) flanking CG motifs(~500k)
Annotation
GFF
CG motifs (of 500k) that overlap with Tandem Repeats
Annotation
Blast Table
Transposable Elements: RepBase inv Blast
Annotation
BED
Transposable Elements: evalue 1E-10
Annotation
BED
Transposable Elements: evalue 1E-10; single directionality
Data
Tab Text
Interval Join: CG motifs(~500k) and TEs (RepBase)
Data
Tab Text
Interval Join: CG motifs(~500k) and Repeats (Phobos)
Data
Tab Text
Interval Join: CG motifs(~500k) and exons (CDS)
Data
Tab Text
Interval Join: CG motifs(~500k) and closest non overlapping exons
Data
Tab Text
Interval Join: CG motifs(~500k) not overlapping w/ mRNA (extragenic)
Description: Genomic scaffolds longer than 1 Mbp
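
The "Interval Join" files listed above pair CG motifs with overlapping features (TEs, repeats, exons). The page does not say which tool produced them, so the following is only a minimal sketch of what such a join computes, assuming BED-style 0-based, half-open coordinates. The function name `interval_join` and the example records are hypothetical.

```python
from collections import defaultdict


def interval_join(motifs, features):
    """Pair each motif with every feature it overlaps on the same scaffold.

    Both inputs are lists of (scaffold, start, end, name) tuples.
    Assumption: BED-style coordinates (0-based start, exclusive end).
    """
    # Index features by scaffold so motifs are only compared within a scaffold.
    by_scaffold = defaultdict(list)
    for scaffold, start, end, name in features:
        by_scaffold[scaffold].append((start, end, name))

    joined = []
    for scaffold, m_start, m_end, m_name in motifs:
        for f_start, f_end, f_name in by_scaffold.get(scaffold, []):
            # Half-open intervals overlap when each starts before the other ends.
            if m_start < f_end and f_start < m_end:
                joined.append((scaffold, m_name, f_name))
    return joined


# Hypothetical example records, not taken from the real files.
motifs = [
    ("scaffold1", 100, 102, "CG_1"),
    ("scaffold1", 500, 502, "CG_2"),
    ("scaffold2", 10, 12, "CG_3"),
]
exons = [
    ("scaffold1", 90, 200, "exon_A"),
    ("scaffold2", 12, 50, "exon_B"),
]
print(interval_join(motifs, exons))
```

The nested loop is quadratic within each scaffold, which is acceptable for illustration; a sorted sweep or a dedicated interval tool would be the usual choice for ~500k motifs.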


oyster.v9_90
Sequence
FASTA
1670 Sequences; 488 MB
Annotation
Tab Text
Tiling Array Design v1
Description: Longest genomic scaffolds (1670; 14%) that cover over 90% of genome.
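
As an illustration of how a subset like this can be chosen, the sketch below picks the longest scaffolds until their combined length reaches a target fraction of the assembly. It is not the procedure actually used for oyster.v9_90, which the page does not describe; the function name `scaffolds_covering` and the toy lengths are hypothetical.

```python
def scaffolds_covering(lengths, fraction=0.9):
    """Return names of the longest scaffolds whose combined length
    reaches at least `fraction` of the total assembly length.

    `lengths` maps scaffold name -> length in bp.
    """
    total = sum(lengths.values())
    target = fraction * total
    selected, covered = [], 0
    # Walk scaffolds from longest to shortest, stopping once the target is met.
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target:
            break
        selected.append(name)
        covered += length
    return selected


# Toy lengths for illustration only.
lengths = {"s1": 500, "s2": 300, "s3": 150, "s4": 40, "s5": 10}
print(scaffolds_covering(lengths))
```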


oyster.v9_proteome
Sequence
FASTA
[.gz] 28027 protein sequences (via gigadb.org)
Sequence
FASTA
28027 protein sequences (local)
Annotation
Blast Table
Blastp Swiss-Prot evalue: 10
Annotation
Blast Table
Blastp Swiss-Prot evalue: 1E-05
Annotation
Blast Table
Blastp Estrogen Biosynthetic Process; 1E-20, 4 max_targets, (details)
Description: Proteins

oyster.v9_genes
Sequence
FASTA
[.gz] 28027 gene (CDS only) sequences (via gigadb.org)
Sequence
FASTA
28027 gene (CDS only) sequences (local)
Annotation
Blast Table
Blastx Swiss-Prot evalue: 1E-05
Data
Tab Text
Number of exons per gene
Annotation
Tab Text
Corresponding SPID and evalues
Annotation
Tab Text
Corresponding SPID, evalues, and descriptions
Annotation
Tab Text
Corresponding SPID, evalues, and GO# (using recent GO file)
Annotation
Tab Text
Corresponding SPID, evalues, and GO and GOslim
Description: Coding sequence



oyster.v9
Sequence
FASTA
[.gz] 11969 Sequences (via gigadb.org)
Sequence
FASTA
[.gz] 11969 Sequences (local)
Sequence
FASTA
11969 Sequences (local) 560MB
Annotation
GFF
[.gz] gene features (via gigadb.org)
Annotation
GFF
gene features (CDS and mRNA)
Annotation
GFF
gene features (CDS only)
Annotation
BED
gene features (CDS only) (interval and ID)
Annotation
GFF
gene features (mRNA only)
Annotation
BED
gene features (mRNA only)
Annotation
GFF
promoter region (1000bp 5' of mRNA)
Annotation
GFF
mRNA GOslim = cell adhesion or signal transduction or cell-cell signaling
Annotation
GFF
mRNA GOslim = DNA metabolism or RNA metabolism or protein metabolism
Annotation
GFF
DESeq (p<0.05) Gill v Male Gonad RNA-seq from Zhang et al
Description: Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030
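
The "promoter region (1000bp 5' of mRNA)" track above can be derived from mRNA coordinates. The sketch below assumes the region is strand-aware (upstream of the start on the plus strand, downstream of the end coordinate on the minus strand) and uses GFF-style 1-based inclusive coordinates. The page does not confirm how strand was handled in the actual file, and `promoter_interval` is a hypothetical helper name.

```python
def promoter_interval(start, end, strand, upstream=1000):
    """Return (start, end) of the region `upstream` bp 5' of an mRNA.

    Coordinates are GFF-style: 1-based and inclusive. Returns None when a
    plus-strand mRNA begins at position 1 and no upstream bases exist.
    Assumption: the region is computed strand-aware.
    """
    if strand == "+":
        if start <= 1:
            return None
        # Clip at position 1 so the interval never runs off the scaffold.
        return (max(1, start - upstream), start - 1)
    if strand == "-":
        # On the minus strand the 5' end is the larger coordinate.
        # Not clipped here: the scaffold length would be needed for that.
        return (end + 1, end + upstream)
    raise ValueError(f"unknown strand: {strand!r}")


print(promoter_interval(5000, 8000, "+"))
print(promoter_interval(5000, 8000, "-"))
```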




TGAGA (alpha)

towards getting a gigas assembly (alpha)
................................................................................................................................................




cgigas_alpha_v0.4.0
Sequence
FASTA

230,270 Sequences
Description: Independent assembly of 19 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.
Combination of assemblies v0.2.0 and v0.3.1.

cgigas_alpha_v0.3.2
Sequence
FASTA
272 Sequences
Annotation
GFF v1 (gff)
blastn Sigenae8
Annotation
BED v1 (bed)
blastn Sigenae8
Visual
Blast Directionality (htm)
based on BED v1
Annotation
GFF v2 (gff)
blastn Sigenae8 (includes gene description)
Annotation
GFF v3 (gff)
blastn Sigenae8 GOslim= cell adhesion or signal transduction or cell-cell signaling
Annotation
GFF v3 (bed)
blastn Sigenae8 GOslim= cell adhesion or signal transduction or cell-cell signaling
Annotation
GFF v4 (gff)
blastn Sigenae8 GOslim= DNA metabolism or RNA metabolism or protein metabolism
Annotation
GFF v4 (bed)
blastn Sigenae8 GOslim= DNA metabolism or RNA metabolism or protein metabolism
Annotation
MicroSatellite (GFF3)
MicroSatellite Regions (HP)
Annotation
Repeat_Regions(GFF3)
Regions that are repeated 10 or more times(HP)
Annotation
Repeat_Regions v1 (GFF3)
Regions that are repeated 10 or more times, derivative regions omitted (HP)
Annotation
Repeat_Regions v2 (GFF3)
Fixed orientation and score issues (HP)
Annotation
Repeat_Regions v3 (GFF3)
Identified some Retrotransposon regions from Repbase (HP)
Annotation
Repeat_Regions(BED)
Regions that are repeated 10 or more times(HP)
Annotation
Repeat_Regions v1 (BED)
Regions that are repeated 10 or more times, derivative regions omitted(HP)
Annotation
GFF v5 (gff)
Tandem repeats (Geneious:Phobos)
Annotation
GFF v6 (gff)
Transcription Factors (Geneious)
Alignment
RefMap MBD_meth (bed)
gill tissue
Alignment
RefMap MBD_unmeth (bed)
gill tissue
Alignment
RNAseq BB3 (bed)
gill tissue
Alignment
RNAseq DH3 (bed)
gill tissue
Alignment
RefMap larvae_SRP002286 (bed)
larvae
Alignment
RefMap larvae_SRP004696 (bed)
Two 454 SRA files - larvae
Motif
CG motifs (gff)

BSMAP
BSMAP_MBD_all (gff)
Bisulfite-seq 10x cov ; all
BSMAP
BSMAP_MBD_0 (gff)
Bisulfite-seq 10x cov ; 0% methylated
BSMAP
BSMAP_MBD_1_9 (gff)
Bisulfite-seq 10x cov ; 1-9% methylated
BSMAP
BSMAP_MBD_10_49 (gff)
Bisulfite-seq 10x cov ; 10-49% methylated
BSMAP
BSMAP_MBD_50_89 (gff)
Bisulfite-seq 10x cov ; 50-89% methylated
BSMAP
BSMAP_MBD_90_100 (gff)
Bisulfite-seq 10x cov ; 90-100% methylated
Visual
Track Visualization BS (Galaxy)
Tracks include: exon annotation, RNA-seq, select GO terms, CpG methylation
Visual
Track Visualization 454 (Galaxy)
Tracks include: RefMap larvae_SRP004696 (bed)
Annotation
bigwig 1

Annotation
bigwig 2

Description: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset. Sequences greater than 20k bp retained. Cgigas BAC clones from NCBI (60) reduced to 53 clusters. Plus 12 select sequences with known genomic structure.
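
The BSMAP_MBD tracks above split CpG sites into percent-methylation bins (0, 1-9, 10-49, 50-89, 90-100) at 10x coverage. A minimal sketch of that kind of binning follows. Two assumptions are not stated on this page: that "10x cov" means a minimum of 10 reads per site, and that percentages are rounded to the nearest whole number before binning. The function name `methylation_bin` and the bin labels are hypothetical.

```python
def methylation_bin(methylated, total, min_coverage=10):
    """Assign a CpG site to a percent-methylation bin.

    `methylated` is the number of reads supporting methylation and `total`
    is the read depth at the site. Returns None when depth is below
    `min_coverage` (assumed meaning of "10x cov").
    """
    if total < min_coverage:
        return None
    # Assumption: round to a whole percent before assigning a bin.
    percent = round(100 * methylated / total)
    if percent == 0:
        return "0"
    if percent <= 9:
        return "1_9"
    if percent <= 49:
        return "10_49"
    if percent <= 89:
        return "50_89"
    return "90_100"


print(methylation_bin(19, 20))
```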



cgigas_alpha_v0.3.1
Sequence
FASTA
500MB
203,216 Sequences
Description: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.


cgigas_alpha_v0.3.0
Sequence
FASTA
12.3MB
272 Sequences
Annotation
GFF v1 (gff)

blastn Sigenae8
Annotation
BED v1 (bed)

blastn Sigenae8
Alignment
MBD-meth reads (sam)


Alignment
MBD-unmeth reads (sam)


Alignment
RNAseq BB3 (sam)


Alignment
RNAseq DH3 (sam)


Description: Independent assembly of 10 fosmids. Consensus sequences >20,000bp along with publicly available BAC sequences were analyzed using CD-HIT EST to obtain non-redundant dataset. Plus 12 select sequences with known genomic structure.


cgigas_alpha_v0.2.0
Sequence
FASTA
615MB
205,903 Sequences
Annotation







Description: Independent assembly of nine fosmids. Consensus sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.


cgigas_alpha_v0.1.3
Sequence
FASTA
6MB
262 Sequences
Annotation







Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Remaining sequences greater than 20,000 bp are included. Plus 12 select sequences with known genomic structure.


cgigas_alpha_v0.1.2
Sequence
FASTA
215MB
28,538 Sequences
Annotation
BLASTn Sigenae v8 (tsv)
800MB
*rev
Annotation
GFF v1 (gff)
9MB

Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Plus 12 select sequences with known genomic structure.


cgigas_alpha_v0.1.1
Sequence
FASTA
215MB
28,526 Sequences
Annotation







Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset.


cgigas_alpha_v0.1.0
Sequence
FASTA
5.8MB
250 Sequences
Annotation
BLASTn Sigenae v8 (csv)

top hit
Annotation
SW Sigenae v8 (tsv)

top hit
Annotation
GFF v0 (gff)

12,582 total hits
Annotation
GFF v1 (gff)

3891 hits, gene names
Annotation
GFF v2 (gff)

1105 hits, used rev blast
Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Remaining sequences greater than 20,000 bp are included.


cgigas_PREalpha_v0.2.0
Sequence
FASTA

12 sequences
Annotation
GFF v1 (gff)


Annotation
GFF v2 (gff)

Exons only
Alignment
MBD-BS reads (sam)




*BLAST option: blastn, megablast; select database from menu


Related Resources

various other genomic resources for the oyster.






GigasBase

Public Sigenae Contig Browser: Oyster

The Oyster EST contig browser aims to produce and maintain an automatic annotation of Oyster EST libraries. The database, GigasDatabase, was initiated within the framework of the AquaFirst European project; it now gathers EST sequences produced by a Marine Genomics Europe project (GOCE-CT-2004-505403) and a Genoscope project. GigasDatabase is regularly updated in the context of the ANR project "Gametogenes" (ANR-08-GENM-041).


GigasDatabase Assemblies

Version 8
cgigas_all_contigs_v8.fa
cgigas_all_contigs_v8_BESTHIT.tsv
cgigas_all_contigs_v8_ontology.tsv


Archived GigasDatabase Assemblies

Version 6
Sigenae_v6_assembly.fa
Sigenae_v6_SPhits.xls




NCBI

Crassostrea gigas Entrez Records
Crassostrea virginica Entrez Records

NCBI: SRA

Crassostrea gigas
FASTA export of quality trimmed reads (export date 02/10/11)

Misc

Cgigas_BAConly.fa (export date 01/31/11)
Cgigas_genomic_NCBI.fa (export date 01/31/11)

Roberts Lab Submissions

Crassostrea Nucleotide Database entries
Crassostrea gigas EST Database entries







