{
 "metadata": {
  "name": "",
  "signature": "sha256:bdbbea3081fa555e3bbe4d5235a17adf45c0e997e88eaecd2cfc64fc6a4edce3"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "See http://www.biomedcentral.com/1471-2164/13/644"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Methods\n",
      "Identification of putative TEs within the flax WGS assembly\n",
      "An unmasked WGS assembly of flax comprising 318,250,901 bases was used as input for TE detection [40]. De-novo identification of transposable elements was performed using RepeatScout [108], PILER [109], LTR_finder [110] and LTR_STRUC [111]. Repeats identified by RepeatScout, under default parameters, were filtered for low complexity using Tandem Repeats Finder [112] and nseg [113]. Repeats with less than 10 hits in the genome were eliminated from the library. For PILER-DF [109] analysis the full genome was compared to itself using PALS (part of the PILER implementation) using the default parameters. Families of dispersed repeats were created using a minimum family size of 3 members and a maximum length difference of 5% between all family members. The consensus sequence for each family was created after aligning the sequences with MUSCLE [114]. LTR TEs were found using LTR_finder using the option \u2013w 2 to get a table output that could be parsed to obtain the sequences corresponding to the elements. LTR_STRUC was used under default parameters. The sequences output by all of these programs were used to create a unified repeat library that could be compared to previously characterized elements. Annotation of the repeats was performed comparing the library to a Viridiplantae TEs database downloaded from Repbase (http://www.girinst.org/repbase/ webcite - update 20110920) and a Plant Repeat Database (http://plantrepeats.plantbiology.msu.edu/ webcite) of TEs created from the families Brassicaceae, Fabaceae, Gramineae and Solanaceae (v2_1_0 update 20112006), using tBLASTx and BLASTn, and to the RepeatPeps database of TEs that comes with RepeatMasker (update 20110920) using BLASTx. To test whether TEs might have captured fragments of other genes or belong to gene families instead of TE families, BLASTx was performed against the Genbank nr database. Repeats were classified in a TE superfamily [49] if they showed E values of at least 1e-5 with a common annotation in at least two of the databases to which they were compared. Repeats characterized as putative TEs by the previous approach were joined to the Viridiplantae database of TEs (update 20110920) to use as a library for comparison to find the distribution and coverage TEs in the genome assembly using RepeatMasker v-3.3.0 (http://www.repeatmasker.org/ webcite). RMBlast was used as search algorithm with Smith-Waterman cutoff of 225 (this cutoff was used for all RepeatMasker analyses). To automatically annotate the masked regions (matches of the TEs in the genome) in their respective TE superfamilies a custom Perl script was used (kindly provided by Robert Hubley - Institute for Systems Biology, http://www.systemsbiology.org/ webcite -). A table for TEs abundance and coverage was built after filtering and annotation. The percentages were calculated for the elements based on the total number of bases including runs of Xs and Ns since some elements can also include at times undetermined bases; therefore total percentages may differ a slightly from those reported in the original description of the flax genome [40]. The TE values of the WGS assembly were compared to BAC-end sequencing TEs [47]."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}