Week 3

October 13

Bioinformatics

Definition via Wikipedia: Bioinformatics and computational biology involve the use or development of techniques, including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually on the molecular level. The primary goal of bioinformatics is to increase our understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques (e.g., data mining, and machine learning algorithms) to achieve this goal. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.



NCBI Science Primer on Bioinformatics




For more information see:
Roberts Lab wiki (includes several nice video tutorials)
FISH507: Bioinformatics Course
Bioinformatics Cheat Sheet.doc


Paper:
A hitchhiker’s guide to expressed sequence tag (EST) analysis



BLAST: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
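To make "regions of local similarity" concrete, here is a minimal sketch of local alignment scoring using the Smith-Waterman dynamic programming method. This is not how BLAST itself works internally (BLAST uses faster heuristics and also reports statistical significance); it only illustrates the idea of finding the best-matching region between two sequences. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    Smith-Waterman with a simple linear gap penalty. The end positions are
    1-based indices of the last aligned character in each sequence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence b
            left = H[i][j - 1] + gap    # gap in sequence a
            # Flooring at 0 is what makes the alignment local:
            # a poor region never drags down a good one that follows it.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


if __name__ == "__main__":
    # The shared region "ACG" is found even though the flanks differ.
    print(local_alignment_score("TTTACGTTT", "GGACGGG"))
```

Two sequences that share only a short stretch still get a positive score for that stretch, which is the behavior that makes local alignment useful for spotting conserved domains or gene family members.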

Nice Tutorial via Geospiza



Interesting Databases:
KEGG PATHWAY
Enzymes


Short demo of what one might do with a fragment obtained by PCR, DEG, SSH, etc.




Numerous examples of aggregated gene-related information (image: Picture_1.png)





Another primary use is finding sequences of interest that are publicly available.
How might you go about doing this?
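One possible approach, besides searching the NCBI website by hand, is to query the public sequence databases programmatically through the NCBI Entrez E-utilities. The sketch below only builds an ESearch URL and does not send it. The helper name, the default database, and the example search term (organism and gene name) are assumptions for illustration, not part of the course material.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Build an ESearch URL that would return record IDs matching `term`.

    db     : Entrez database to search (assumed default: nucleotide)
    retmax : maximum number of IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query: heat shock protein sequences from one organism.
    url = build_esearch_url("Crassostrea gigas[Organism] AND heat shock protein", retmax=5)
    print(url)
```

Pasting the printed URL into a browser should return a list of matching record IDs, which can then be looked up individually. The same search can be done interactively by typing the term into the NCBI search box.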





Text from Lisa's Cheat Sheet
Bioinformatics: field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned (NCBI).
EST: small pieces of DNA sequence (usually 200 to 500 nucleotides long) that are generated by sequencing either one or both ends of an expressed gene. The idea is to sequence bits of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms and use these "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs. The challenge associated with identifying genes from genomic sequences varies among organisms and is dependent upon genome size as well as the presence or absence of introns, the intervening DNA sequences interrupting the protein coding sequence of a gene (NCBI).




How are ESTs Generated?
mRNA (expressed genes) → reverse transcriptase → cDNA (double stranded) → cloned → sequenced (single pass or full length)
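The pipeline above can be mimicked at the sequence level with a small sketch. It is a simplification under stated assumptions: the first-strand cDNA is modeled as the DNA reverse complement of the mRNA, and a single-pass EST is modeled as an error-free read of fixed length from one end of the clone. The function names and the default read length are hypothetical.

```python
# Pairing rules for copying RNA into complementary DNA.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3'): the DNA reverse complement of the mRNA."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


def single_pass_ests(mrna, read_length=300):
    """Return (five_prime_est, three_prime_est) for one cDNA clone.

    The 5' EST reads the sense strand from the start of the transcript;
    the 3' EST reads the complementary strand starting from the transcript's end.
    Real reads also contain vector sequence and sequencing errors, ignored here.
    """
    sense = mrna.upper().replace("U", "T")   # second-strand cDNA matches the mRNA
    antisense = first_strand_cdna(mrna)
    return sense[:read_length], antisense[:read_length]


if __name__ == "__main__":
    five, three = single_pass_ests("AUGGCCUUU", read_length=4)
    print(five, three)
```

Because each read covers only one end of the clone, a long transcript is usually represented by two partial "tags" rather than its full sequence, which is why ESTs are typically only a few hundred nucleotides long.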
Data Sources:








High Performance Computing


Chris Dwan speaks on genomics and high performance computing at Grey Thumb Boston on September 7th, 2007.