<rdf:RDF xmlns:burst="http://xmlns.com/burst/0.1/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:cc="http://web.resource.org/cc/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:swrc="http://swrc.ontoware.org/ontology#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><channel rdf:about="http://www.bibsonomy.org/burst/user/dzerbino/DNA,"><title>BibSonomy publications for /user/dzerbino/DNA,</title><link>http://www.bibsonomy.org/burst/user/dzerbino/DNA,</link><description>BibSonomy BuRST Feed for /user/dzerbino/DNA,</description><dc:date>2008-09-05T07:25:28+02:00</dc:date><items><rdf:Seq><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2094f2ea8388f71a5fb448d2e3517059b/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/208726a1ee302ce26b55d2d0ae419d2b4/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/239fe09991ee839c5c69c0ad2c0461d33/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2502e0622b4d381412eafa06bb77d377e/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/26fefee7c3e5e1c27b6a90750a6c4c153/dzerbino"/></rdf:Seq></items></channel><item rdf:about="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"><title>Generating consensus sequences from partial order multiple sequence alignment graphs</title><link>http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Consensus Analysis, Algorithms, Gene Alignment, Sequence DNA, Software, Humans, Profiling, Expression </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Christopher &lt;a href=&#034;http://www.bibsonomy.org/author/Lee&#034;&gt;Lee&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;19(8):999--1008&lt;/em&gt;&lt;em&gt;May2003. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Consensus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>May</swrc:month><swrc:number>8</swrc:number><swrc:pages>999--1008</swrc:pages><swrc:title>Generating consensus sequences from partial order multiple sequence alignment graphs</swrc:title><swrc:volume>19</swrc:volume><swrc:year>2003</swrc:year><swrc:keywords>Consensus Analysis, Algorithms, Gene Alignment, Sequence DNA, Software, Humans, Profiling, Expression </swrc:keywords><swrc:abstract>MOTIVATION: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption-that the aligned sequences form a single linear consensus-is not true? 
RESULTS: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveals the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions. AVAILABILITY: The heaviest_bundle source code is available at http://www.bioinformatics.ucla.edu/poa</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="12761063" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="8" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute Department of Chemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA. 
leec@mbi.ucla.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p4" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2003/Lee/Bioinformatics%202003%20Lee.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Christopher Lee"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2094f2ea8388f71a5fb448d2e3517059b/dzerbino"><title>An Eulerian path approach to local multiple alignment for DNA sequences</title><link>http://www.bibsonomy.org/bibtex/2094f2ea8388f71a5fb448d2e3517059b/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Acid, Nucleic Repetitive DNA, Genetic, Molecular Homology, Base Alignment, Sequence, Models, Acid Data, Sequences, Sequence </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Yu &lt;a href=&#034;http://www.bibsonomy.org/author/Zhang&#034;&gt;Zhang&lt;/a&gt;  and Michael S &lt;a href=&#034;http://www.bibsonomy.org/author/Waterman&#034;&gt;Waterman&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Proc Natl Acad Sci USA&lt;/em&gt;&lt;em&gt;102(5):1285--90&lt;/em&gt;&lt;em&gt;Feb2005. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Nucleic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Repetitive"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Homology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequences,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2094f2ea8388f71a5fb448d2e3517059b/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2094f2ea8388f71a5fb448d2e3517059b/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Proc Natl Acad Sci USA</swrc:journal><swrc:month>Feb</swrc:month><swrc:number>5</swrc:number><swrc:pages>1285--90</swrc:pages><swrc:title>An Eulerian path approach to local multiple alignment for DNA sequences</swrc:title><swrc:volume>102</swrc:volume><swrc:year>2005</swrc:year><swrc:keywords>Acid, Nucleic Repetitive DNA, Genetic, Molecular Homology, Base Alignment, Sequence, Models, Acid Data, Sequences, Sequence </swrc:keywords><swrc:abstract>Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. 
We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach is approximately linear to the total size of sequences analyzed; hence, it can handle thousands of sequences or millions of letters simultaneously. By constructing a De Bruijn graph, most of the conserved segments are amplified as heavy Eulerian paths in the graph, and the original patterns distributed in sequences are recovered even if they do not exist in any single sequence. This approach can accurately detect unknown conserved regions, for both short and long, conserved and degenerate patterns. We further present a Poisson heuristic to estimate the significance of a local multiple alignment. The performance of our method is demonstrated by finding Alu repeats in the human genome. We compare the results with Alus marked by repeatmasker, where the two programs are in good agreement. Our method is robust under various conditions and superior to other methods in terms of efficiency and accuracy.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="0409240102" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15668398" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="5" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB289, Los Angeles, CA 90089-1113, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p7" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2005/Zhang/Proc%20Natl%20Acad%20Sci%20USA%202005%20Zhang.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1073/pnas.0409240102" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Yu Zhang"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Michael S Waterman"/></rdf:_2></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"><title>Minimus: a fast, lightweight genome assembler</title><link>http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Algorithms, Alignment, Sequence, Sequence Data, User-Computer Molecular Base Chromosome Analysis, Software, Software Mapping, DNA, Design, Interface </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Daniel D &lt;a href=&#034;http://www.bibsonomy.org/author/Sommer&#034;&gt;Sommer&lt;/a&gt;  and Arthur L &lt;a href=&#034;http://www.bibsonomy.org/author/Delcher&#034;&gt;Delcher&lt;/a&gt;  and Steven L &lt;a href=&#034;http://www.bibsonomy.org/author/Salzberg&#034;&gt;Salzberg&lt;/a&gt;  and Mihai &lt;a href=&#034;http://www.bibsonomy.org/author/Pop&#034;&gt;Pop&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;BMC Bioinformatics&lt;/em&gt;&lt;em&gt;Feb2007. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/User-Computer"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Design,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Interface"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>BMC Bioinformatics</swrc:journal><swrc:month>Feb</swrc:month><swrc:pages>64</swrc:pages><swrc:title>Minimus: a fast, lightweight genome assembler</swrc:title><swrc:volume>8</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Algorithms, Alignment, Sequence, Sequence Data, User-Computer Molecular Base Chromosome Analysis, Software, Software Mapping, DNA, Design, Interface </swrc:keywords><swrc:abstract>BACKGROUND: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome 
sequencing projects. Many of the most common uses of assemblers, however, are best served by a simpler type of assembler that requires fewer software components, uses less memory, and is far easier to install and run. RESULTS: We have developed the Minimus assembler to address these issues, and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus&#039; performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, unlike other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly. CONCLUSION: We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design Minimus is perfectly suited to be a component of complex assembly pipelines. Minimus is released as an open-source software project and the code is available as part of the AMOS project at Sourceforge.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="1471-2105-8-64" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17324286" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA. 
dsommer@umiacs.umd.edu &lt;dsommer@umiacs.umd.edu&gt;" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p22" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Sommer/BMC%20Bioinformatics%202007%20Sommer.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1186/1471-2105-8-64" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Daniel D Sommer"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Arthur L Delcher"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Steven L Salzberg"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Mihai Pop"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"><title>Multiple sequence alignment using partial order graphs</title><link>http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Glucose-1-Phosphate Humans, Algorithms, Homology, Nucleotidyltransferases, Plant Sequence RNA, Data, Expressed Databases, Adenylyltransferase, Models, Factors, Genetic, Molecular Messenger, Sensitivity Software, Statistical, Proteins, Sequence, Specificity, Alignment, Tags, Time DNA, Base and Control Quality </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Christopher &lt;a href=&#034;http://www.bibsonomy.org/author/Lee&#034;&gt;Lee&lt;/a&gt;  and Catherine &lt;a href=&#034;http://www.bibsonomy.org/author/Grasso&#034;&gt;Grasso&lt;/a&gt;  and Mark F &lt;a href=&#034;http://www.bibsonomy.org/author/Sharlow&#034;&gt;Sharlow&lt;/a&gt;  
&lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;18(3):452--64&lt;/em&gt;&lt;em&gt;Mar2002. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Glucose-1-Phosphate"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Homology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Nucleotidyltransferases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Plant"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/RNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expressed"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Adenylyltransferase,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Factors,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Messenger,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sensitivity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Proteins,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Specificity,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Tags,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Time"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/and"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/Control"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Quality"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Mar</swrc:month><swrc:number>3</swrc:number><swrc:pages>452--64</swrc:pages><swrc:title>Multiple sequence alignment using partial order graphs</swrc:title><swrc:volume>18</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Glucose-1-Phosphate Humans, Algorithms, Homology, Nucleotidyltransferases, Plant Sequence RNA, Data, Expressed Databases, Adenylyltransferase, Models, Factors, Genetic, Molecular Messenger, Sensitivity Software, Statistical, Proteins, Sequence, Specificity, Alignment, Tags, Time DNA, Base and Control Quality </swrc:keywords><swrc:abstract>MOTIVATION: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. 
an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism. AVAILABILITY: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11934745" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="3" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA. leec@mbi.ucla.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p9" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Lee/Bioinformatics%202002%20Lee.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Christopher Lee"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Catherine Grasso"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Mark F Sharlow"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"><title>Fragment assembly with short reads</title><link>http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Molecular Gene Feasibility DNA, Contig Analysis, Base Sequence Data, Mapping, Expression Algorithms, Profiling, Alignment, Studies, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Mark &lt;a 
href=&#034;http://www.bibsonomy.org/author/Chaisson&#034;&gt;Chaisson&lt;/a&gt;  and Pavel &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  and Haixu &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;20(13):2067--74&lt;/em&gt;&lt;em&gt;Sep2004. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Feasibility"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Studies,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Sep</swrc:month><swrc:number>13</swrc:number><swrc:pages>2067--74</swrc:pages><swrc:title>Fragment assembly with short reads</swrc:title><swrc:volume>20</swrc:volume><swrc:year>2004</swrc:year><swrc:keywords>Molecular Gene 
Feasibility DNA, Contig Analysis, Base Sequence Data, Mapping, Expression Algorithms, Profiling, Alignment, Studies, </swrc:keywords><swrc:abstract>MOTIVATION: Current DNA sequencing technology produces reads of about 500-750 bp, with typical coverage under 10x. New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30x and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads. RESULTS: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts. AVAILABILITY: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="bth205" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15059830" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="13" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Bioinformatics Program, University of California San Diego, La Jolla, CA 92093, USA. 
mchaisso@bioinf.ucsd.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p25" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2004/Chaisson/Bioinformatics%202004%20Chaisson.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/bth205" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Mark Chaisson"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Pavel Pevzner"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Haixu Tang"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"><title>The fragment assembly string graph</title><link>http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Mapping Fragmentation, Analysis, Chromosome Algorithms, Data, DNA Sequence, Base Sequence Molecular DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Eugene W &lt;a href=&#034;http://www.bibsonomy.org/author/Myers&#034;&gt;Myers&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;Sep2005. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragmentation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Sep</swrc:month><swrc:pages>ii79--85</swrc:pages><swrc:title>The fragment assembly string graph</swrc:title><swrc:volume>21 Suppl 2</swrc:volume><swrc:year>2005</swrc:year><swrc:keywords>Mapping Fragmentation, Analysis, Chromosome Algorithms, Data, DNA Sequence, Base Sequence Molecular DNA, </swrc:keywords><swrc:abstract>We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. 
The result demonstrates that the decomposition of reads into kmers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="21/suppl_2/ii79" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="16204131" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science, University of California Berkeley, CA, USA. gene@eecs.berkeley.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p35" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2005/Myers/Bioinformatics%202005%20Myers.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/bti1114" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Eugene W Myers"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/208726a1ee302ce26b55d2d0ae419d2b4/dzerbino"><title>Reconstructing large regions of an ancestral mammalian genome in silico</title><link>http://www.bibsonomy.org/bibtex/208726a1ee302ce26b55d2d0ae419d2b4/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Functions, Genetic Regulator, Alignment, Fibrosis Base 
Likelihood Computer (Genetics), Mammals, Simulation, Cystic Variation Conductance Transmembrane Animals, Molecular Sequence, Molecular, Analysis, Evolution, Phylogeny, Sequence Data, Genome, Models, DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Mathieu &lt;a href=&#034;http://www.bibsonomy.org/author/Blanchette&#034;&gt;Blanchette&lt;/a&gt;  and Eric D &lt;a href=&#034;http://www.bibsonomy.org/author/Green&#034;&gt;Green&lt;/a&gt;  and Webb &lt;a href=&#034;http://www.bibsonomy.org/author/Miller&#034;&gt;Miller&lt;/a&gt;  and David &lt;a href=&#034;http://www.bibsonomy.org/author/Haussler&#034;&gt;Haussler&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;14(12):2412--23&lt;/em&gt;&lt;em&gt;Dec2004. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Functions,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Regulator,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fibrosis"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Likelihood"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computer"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/(Genetics),"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mammals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Simulation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Cystic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Variation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Conductance"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Transmembrane"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/Molecular,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Evolution,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Phylogeny,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/208726a1ee302ce26b55d2d0ae419d2b4/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/208726a1ee302ce26b55d2d0ae419d2b4/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Dec</swrc:month><swrc:number>12</swrc:number><swrc:pages>2412--23</swrc:pages><swrc:title>Reconstructing large regions of an ancestral mammalian genome in silico</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2004</swrc:year><swrc:keywords>Functions, Genetic Regulator, Alignment, Fibrosis Base Likelihood Computer (Genetics), Mammals, Simulation, Cystic Variation Conductance Transmembrane Animals, Molecular Sequence, Molecular, Analysis, Evolution, Phylogeny, Sequence Data, Genome, Models, DNA, </swrc:keywords><swrc:abstract>It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. 
Simulations suggest that with methods currently available, we can expect to get 98% of the bases correct in reconstructing megabase-scale euchromatic regions of a eutherian ancestral genome from the genomes of approximately 20 optimally chosen modern mammals. Using actual genomic sequences from 19 extant mammals, we reconstruct 1.1 Mb of ancient genome sequence around the CFTR locus. Detailed examination suggests the reconstruction is accurate and that it allows us to identify features in modern species, such as remnants of ancient transposon insertions, that were not identified by direct analysis. Tracing the predicted evolutionary history of the bases in the reconstructed region, estimates are made of the amount of DNA turnover due to insertion, deletion, and substitution in the different placental mammalian lineages since the common eutherian ancestor, showing considerable variation between lineages. In coming years, such reconstructions may help in identifying and understanding the genetic features common to eutherian mammals and may shed light on the evolution of human or primate-specific traits.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="14/12/2412" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15574820" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="12" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA. 
blanchem@mcb.mcgill.ca" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p36" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2004/Blanchette/Genome%20Res%202004%20Blanchette.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.2800104" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Mathieu Blanchette"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Eric D Green"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Webb Miller"/></rdf:_3><rdf:_4><swrc:Person swrc:name="David Haussler"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/239fe09991ee839c5c69c0ad2c0461d33/dzerbino"><title>Toward simplifying and accurately formulating fragment assembly</title><link>http://www.bibsonomy.org/bibtex/239fe09991ee839c5c69c0ad2c0461d33/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Statistical Results, of Reproducibility Probability, Mathematics, Models, Sequence, Base DNA, Oligodeoxyribonucleotides, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;E W &lt;a href=&#034;http://www.bibsonomy.org/author/Myers&#034;&gt;Myers&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;2(2):275--90&lt;/em&gt;&lt;em&gt;Jan1995. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Results,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/of"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Reproducibility"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Probability,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mathematics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Oligodeoxyribonucleotides,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/239fe09991ee839c5c69c0ad2c0461d33/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/239fe09991ee839c5c69c0ad2c0461d33/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>2</swrc:number><swrc:pages>275--90</swrc:pages><swrc:title>Toward simplifying and accurately formulating fragment assembly</swrc:title><swrc:volume>2</swrc:volume><swrc:year>1995</swrc:year><swrc:keywords>Statistical Results, of Reproducibility Probability, Mathematics, Models, Sequence, Base DNA, Oligodeoxyribonucleotides, </swrc:keywords><swrc:abstract>The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. 
In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="7497129" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="2" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science, University of Arizona, Tucson 85721, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p33" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/1995/Myers/J%20Comput%20Biol%201995%20Myers.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="E W Myers"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"><title>Assembling millions of short DNA sequences using SSAKE</title><link>http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Contig Molecular Data, Mapping Sequence, Sequence Analysis, Software, Chromosome Algorithms, DNA, Mapping, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Ren&amp;#233; L &lt;a href=&#034;http://www.bibsonomy.org/author/Warren&#034;&gt;Warren&lt;/a&gt;  and Granger G &lt;a href=&#034;http://www.bibsonomy.org/author/Sutton&#034;&gt;Sutton&lt;/a&gt;  and Steven J M &lt;a href=&#034;http://www.bibsonomy.org/author/Jones&#034;&gt;Jones&lt;/a&gt;  and Robert A &lt;a href=&#034;http://www.bibsonomy.org/author/Holt&#034;&gt;Holt&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;23(4):500--1&lt;/em&gt;&lt;em&gt;Feb2007. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Feb</swrc:month><swrc:number>4</swrc:number><swrc:pages>500--1</swrc:pages><swrc:title>Assembling millions of short DNA sequences using SSAKE</swrc:title><swrc:volume>23</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Base Contig Molecular Data, Mapping Sequence, Sequence Analysis, Software, Chromosome Algorithms, DNA, Mapping, </swrc:keywords><swrc:abstract>Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. 
Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="btl629" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17158514" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="British Columbia Cancer Agency, Genome Sciences Centre, 675 West 10th Avenue, Vancouver, BC V5Z 1L3, Canada. 
rwarren@bcgsc.ca" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p21" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Warren/Bioinformatics%202007%20Warren.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/btl629" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ren&#233; L Warren"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Granger G Sutton"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Steven J M Jones"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Robert A Holt"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"><title>A whole-genome assembly of Drosophila</title><link>http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Acid, Biology Animals, Chromosome Genes, Molecular Sequences, Repetitive Sequence Heterochromatin, Insect, Analysis, Drosophila Tagged Contig Sites, Euchromatin, Physical melanogaster, Chromatin, Computational Mapping, Data, DNA, Algorithms, Nucleic Genome, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;E W &lt;a href=&#034;http://www.bibsonomy.org/author/Myers&#034;&gt;Myers&lt;/a&gt;  and G G &lt;a href=&#034;http://www.bibsonomy.org/author/Sutton&#034;&gt;Sutton&lt;/a&gt;  and A L &lt;a href=&#034;http://www.bibsonomy.org/author/Delcher&#034;&gt;Delcher&lt;/a&gt;  and I M &lt;a href=&#034;http://www.bibsonomy.org/author/Dew&#034;&gt;Dew&lt;/a&gt;  and D P &lt;a 
href=&#034;http://www.bibsonomy.org/author/Fasulo&#034;&gt;Fasulo&lt;/a&gt;  and M J &lt;a href=&#034;http://www.bibsonomy.org/author/Flanigan&#034;&gt;Flanigan&lt;/a&gt;  and S A &lt;a href=&#034;http://www.bibsonomy.org/author/Kravitz&#034;&gt;Kravitz&lt;/a&gt;  and C M &lt;a href=&#034;http://www.bibsonomy.org/author/Mobarry&#034;&gt;Mobarry&lt;/a&gt;  and K H &lt;a href=&#034;http://www.bibsonomy.org/author/Reinert&#034;&gt;Reinert&lt;/a&gt;  and K A &lt;a href=&#034;http://www.bibsonomy.org/author/Remington&#034;&gt;Remington&lt;/a&gt;  and E L &lt;a href=&#034;http://www.bibsonomy.org/author/Anson&#034;&gt;Anson&lt;/a&gt;  and R A &lt;a href=&#034;http://www.bibsonomy.org/author/Bolanos&#034;&gt;Bolanos&lt;/a&gt;  and H H &lt;a href=&#034;http://www.bibsonomy.org/author/Chou&#034;&gt;Chou&lt;/a&gt;  and C M &lt;a href=&#034;http://www.bibsonomy.org/author/Jordan&#034;&gt;Jordan&lt;/a&gt;  and A L &lt;a href=&#034;http://www.bibsonomy.org/author/Halpern&#034;&gt;Halpern&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Lonardi&#034;&gt;Lonardi&lt;/a&gt;  and E M &lt;a href=&#034;http://www.bibsonomy.org/author/Beasley&#034;&gt;Beasley&lt;/a&gt;  and R C &lt;a href=&#034;http://www.bibsonomy.org/author/Brandon&#034;&gt;Brandon&lt;/a&gt;  and L &lt;a href=&#034;http://www.bibsonomy.org/author/Chen&#034;&gt;Chen&lt;/a&gt;  and P J &lt;a href=&#034;http://www.bibsonomy.org/author/Dunn&#034;&gt;Dunn&lt;/a&gt;  and Z &lt;a href=&#034;http://www.bibsonomy.org/author/Lai&#034;&gt;Lai&lt;/a&gt;  and Y &lt;a href=&#034;http://www.bibsonomy.org/author/Liang&#034;&gt;Liang&lt;/a&gt;  and D R &lt;a href=&#034;http://www.bibsonomy.org/author/Nusskern&#034;&gt;Nusskern&lt;/a&gt;  and M &lt;a href=&#034;http://www.bibsonomy.org/author/Zhan&#034;&gt;Zhan&lt;/a&gt;  and Q &lt;a href=&#034;http://www.bibsonomy.org/author/Zhang&#034;&gt;Zhang&lt;/a&gt;  and X &lt;a href=&#034;http://www.bibsonomy.org/author/Zheng&#034;&gt;Zheng&lt;/a&gt;  and G M &lt;a 
href=&#034;http://www.bibsonomy.org/author/Rubin&#034;&gt;Rubin&lt;/a&gt;  and M D &lt;a href=&#034;http://www.bibsonomy.org/author/Adams&#034;&gt;Adams&lt;/a&gt;  and J C &lt;a href=&#034;http://www.bibsonomy.org/author/Venter&#034;&gt;Venter&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Science&lt;/em&gt;&lt;em&gt;287(5461):2196--204&lt;/em&gt;&lt;em&gt;Mar2000. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genes,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequences,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Repetitive"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Heterochromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Insect,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Drosophila"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Tagged"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sites,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Euchromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Physical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/melanogaster,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/Nucleic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Science</swrc:journal><swrc:month>Mar</swrc:month><swrc:number>5461</swrc:number><swrc:pages>2196--204</swrc:pages><swrc:title>A whole-genome assembly of Drosophila</swrc:title><swrc:volume>287</swrc:volume><swrc:year>2000</swrc:year><swrc:keywords>Acid, Biology Animals, Chromosome Genes, Molecular Sequences, Repetitive Sequence Heterochromatin, Insect, Analysis, Drosophila Tagged Contig Sites, Euchromatin, Physical melanogaster, Chromatin, Computational Mapping, Data, DNA, Algorithms, Nucleic Genome, </swrc:keywords><swrc:abstract>We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly&#039;s sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. 
As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="8395" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10731133" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="5461" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Celera Genomics, Inc., 45 West Gude Drive, Rockville, MD 20850, USA. Gene.Myers@celera.com" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p20" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2000/Myers/Science%202000%20Myers.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="E W Myers"/></rdf:_1><rdf:_2><swrc:Person swrc:name="G G Sutton"/></rdf:_2><rdf:_3><swrc:Person swrc:name="A L Delcher"/></rdf:_3><rdf:_4><swrc:Person swrc:name="I M Dew"/></rdf:_4><rdf:_5><swrc:Person swrc:name="D P Fasulo"/></rdf:_5><rdf:_6><swrc:Person swrc:name="M J Flanigan"/></rdf:_6><rdf:_7><swrc:Person swrc:name="S A Kravitz"/></rdf:_7><rdf:_8><swrc:Person swrc:name="C M Mobarry"/></rdf:_8><rdf:_9><swrc:Person swrc:name="K H Reinert"/></rdf:_9><rdf:_10><swrc:Person swrc:name="K A Remington"/></rdf:_10><rdf:_11><swrc:Person swrc:name="E L Anson"/></rdf:_11><rdf:_12><swrc:Person swrc:name="R A Bolanos"/></rdf:_12><rdf:_13><swrc:Person swrc:name="H H Chou"/></rdf:_13><rdf:_14><swrc:Person swrc:name="C M Jordan"/></rdf:_14><rdf:_15><swrc:Person swrc:name="A L Halpern"/></rdf:_15><rdf:_16><swrc:Person swrc:name="S Lonardi"/></rdf:_16><rdf:_17><swrc:Person swrc:name="E M Beasley"/></rdf:_17><rdf:_18><swrc:Person swrc:name="R C 
Brandon"/></rdf:_18><rdf:_19><swrc:Person swrc:name="L Chen"/></rdf:_19><rdf:_20><swrc:Person swrc:name="P J Dunn"/></rdf:_20><rdf:_21><swrc:Person swrc:name="Z Lai"/></rdf:_21><rdf:_22><swrc:Person swrc:name="Y Liang"/></rdf:_22><rdf:_23><swrc:Person swrc:name="D R Nusskern"/></rdf:_23><rdf:_24><swrc:Person swrc:name="M Zhan"/></rdf:_24><rdf:_25><swrc:Person swrc:name="Q Zhang"/></rdf:_25><rdf:_26><swrc:Person swrc:name="X Zheng"/></rdf:_26><rdf:_27><swrc:Person swrc:name="G M Rubin"/></rdf:_27><rdf:_28><swrc:Person swrc:name="M D Adams"/></rdf:_28><rdf:_29><swrc:Person swrc:name="J C Venter"/></rdf:_29></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"><title>Efficiently detecting polymorphisms during the fragment assembly process</title><link>http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Algorithms, Fragmentation DNA Expression Fragment Restriction Profiling, Length, Base Polymorphism, Molecular Alignment, Analysis, Gene Variation Data, DNA, Sequence, (Genetics), Consensus Sequence Genetic, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Daniel &lt;a href=&#034;http://www.bibsonomy.org/author/Fasulo&#034;&gt;Fasulo&lt;/a&gt;  and Aaron &lt;a href=&#034;http://www.bibsonomy.org/author/Halpern&#034;&gt;Halpern&lt;/a&gt;  and Ian &lt;a href=&#034;http://www.bibsonomy.org/author/Dew&#034;&gt;Dew&lt;/a&gt;  and Clark &lt;a href=&#034;http://www.bibsonomy.org/author/Mobarry&#034;&gt;Mobarry&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;Jan2002. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragmentation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragment"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Restriction"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Length,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Polymorphism,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Variation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/(Genetics),"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Consensus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Jan</swrc:month><swrc:pages>S294--302</swrc:pages><swrc:title>Efficiently detecting polymorphisms during the fragment assembly 
process</swrc:title><swrc:volume>18 Suppl 1</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Algorithms, Fragmentation DNA Expression Fragment Restriction Profiling, Length, Base Polymorphism, Molecular Alignment, Analysis, Gene Variation Data, DNA, Sequence, (Genetics), Consensus Sequence Genetic, </swrc:keywords><swrc:abstract>MOTIVATION: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases. RESULTS: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="12169559" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Informatics Research, Celera Genomics, 45 W. Gude Dr., Rockville MD 20850, USA. 
daniel.fasulo@celera.com" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p17" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Fasulo/Bioinformatics%202002%20Fasulo.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Daniel Fasulo"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Aaron Halpern"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Ian Dew"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Clark Mobarry"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"><title>Occupancy modeling of coverage distribution for whole genome shotgun DNA sequencing</title><link>http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Algorithms, Genome, Models, Genomics, Humans, DNA, Analysis, Statistical Animals, Sequence </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Michael C &lt;a href=&#034;http://www.bibsonomy.org/author/Wendl&#034;&gt;Wendl&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bull Math Biol&lt;/em&gt;&lt;em&gt;68(1):179--96&lt;/em&gt;&lt;em&gt;Jan2006. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genomics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bull Math Biol</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>1</swrc:number><swrc:pages>179--96</swrc:pages><swrc:title>Occupancy modeling of coverage distribution for whole genome shotgun DNA sequencing</swrc:title><swrc:volume>68</swrc:volume><swrc:year>2006</swrc:year><swrc:keywords>Algorithms, Genome, Models, Genomics, Humans, DNA, Analysis, Statistical Animals, Sequence </swrc:keywords><swrc:abstract>Expected-value models have long provided a rudimentary theoretical foundation for random DNA sequencing. Here, we are interested in improving characterization of genome coverage in terms of its underlying probability distributions. We find that the mathematical notion of occupancy serves as a good model for evolution of the coverage distribution function and reveals new insights related to sequence redundancy. 
Established concepts, such as &#034;full shotgun depth,&#034; have been assumed invariant, but actually depend on project size and decrease over time. For most microbial projects, the full shotgun milestone should be revised downward by about 30%. Accordingly, many already-completed genomes appear to have been over-sequenced. Results also suggest that read lengths for emerging high-throughput sequencing methods must be increased substantially before they can be considered as possible successors to the standard Sanger method. In particular, gains in throughput and sequence depth cannot be made to compensate for diminished read length. Limits are well approximated by a simple logarithmic equation, which should be useful in estimating maximum coverage-based redundancy for future projects.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="16794926" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="1" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Genome Sequencing Center, Washington University, 4444 Forest Park Boulevard, Campus Box 8501, St. Louis, MO 63108, USA. 
mwendl@wustl.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p32" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2006/Wendl/Bull%20Math%20Biol%202006%20Wendl.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1007/s11538-005-9021-4" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Michael C Wendl"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2502e0622b4d381412eafa06bb77d377e/dzerbino"><title>An analysis of the feasibility of short read sequencing</title><link>http://www.bibsonomy.org/bibtex/2502e0622b4d381412eafa06bb77d377e/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Humans, Studies, Human, Genomics, Analysis, Pair Feasibility Genome, 1, Chromosomes, DNA, Sequence Viral, Bacterial </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Nava &lt;a href=&#034;http://www.bibsonomy.org/author/Whiteford&#034;&gt;Whiteford&lt;/a&gt;  und Niall &lt;a href=&#034;http://www.bibsonomy.org/author/Haslam&#034;&gt;Haslam&lt;/a&gt;  und Gerald &lt;a href=&#034;http://www.bibsonomy.org/author/Weber&#034;&gt;Weber&lt;/a&gt;  und Adam &lt;a href=&#034;http://www.bibsonomy.org/author/Pr{\&amp;#034;u}gel-Bennett&#034;&gt;Pr&amp;#252;gel-Bennett&lt;/a&gt;  und Jonathan W &lt;a href=&#034;http://www.bibsonomy.org/author/Essex&#034;&gt;Essex&lt;/a&gt;  und Peter L &lt;a href=&#034;http://www.bibsonomy.org/author/Roach&#034;&gt;Roach&lt;/a&gt;  und Mark &lt;a href=&#034;http://www.bibsonomy.org/author/Bradley&#034;&gt;Bradley&lt;/a&gt;  und Cameron &lt;a 
href=&#034;http://www.bibsonomy.org/author/Neylon&#034;&gt;Neylon&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Nucleic Acids Res&lt;/em&gt;&lt;em&gt;33(19):e171&lt;/em&gt;&lt;em&gt;Nov2005. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Studies,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Human,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genomics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Pair"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Feasibility"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/1,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosomes,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Viral,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2502e0622b4d381412eafa06bb77d377e/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2502e0622b4d381412eafa06bb77d377e/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Nucleic Acids Res</swrc:journal><swrc:month>Nov</swrc:month><swrc:number>19</swrc:number><swrc:pages>e171</swrc:pages><swrc:title>An analysis of the feasibility of short read sequencing</swrc:title><swrc:volume>33</swrc:volume><swrc:year>2005</swrc:year><swrc:keywords>Humans, Studies, Human, Genomics, Analysis, Pair Feasibility Genome, 1, Chromosomes, DNA, Sequence Viral, Bacterial </swrc:keywords><swrc:abstract>Several methods for ultra high-throughput DNA sequencing are currently under investigation. 
Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20-30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="33/19/e171" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="16275781" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="19" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p10" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2005/Whiteford/Nucleic%20Acids%20Res%202005%20Whiteford.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/nar/gni170" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Nava Whiteford"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Niall Haslam"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Gerald Weber"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Adam Pr{\&#034;u}gel-Bennett"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Jonathan W Essex"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Peter L Roach"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Mark Bradley"/></rdf:_7><rdf:_8><swrc:Person swrc:name="Cameron 
Neylon"/></rdf:_8></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"><title>Gene maps linearization using genomic rearrangement distances</title><link>http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Software DNA, Analysis, Sequence Genomics, Computational Genome, Biology, Algorithms, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Guillaume &lt;a href=&#034;http://www.bibsonomy.org/author/Blin&#034;&gt;Blin&lt;/a&gt;  und Eric &lt;a href=&#034;http://www.bibsonomy.org/author/Blais&#034;&gt;Blais&lt;/a&gt;  und Danny &lt;a href=&#034;http://www.bibsonomy.org/author/Hermelin&#034;&gt;Hermelin&lt;/a&gt;  und Pierre &lt;a href=&#034;http://www.bibsonomy.org/author/Guillon&#034;&gt;Guillon&lt;/a&gt;  und Mathieu &lt;a href=&#034;http://www.bibsonomy.org/author/Blanchette&#034;&gt;Blanchette&lt;/a&gt;  und Nadia &lt;a href=&#034;http://www.bibsonomy.org/author/El-Mabrouk&#034;&gt;El-Mabrouk&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;14(4):394--407&lt;/em&gt;&lt;em&gt;May2007. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genomics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>May</swrc:month><swrc:number>4</swrc:number><swrc:pages>394--407</swrc:pages><swrc:title>Gene maps linearization using genomic rearrangement distances</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Software DNA, Analysis, Sequence Genomics, Computational Genome, Biology, Algorithms, </swrc:keywords><swrc:abstract>A preliminary step to most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Different genetic mapping techniques often give rise to different maps with unequal gene content and sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order genes of a closely related species characterized by a partial order P. Our goal is to find a linearization of P that is as close as possible to O, in term of a given genomic distance. 
We first prove NP-completeness complexity results considering the breakpoint and the common interval distances. We then focus on the breakpoint distance and give a dynamic programming algorithm whose running time is exponential for general partial orders, but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case and is empirically shown to produce solutions within 10% of the optimal solution, on simulated data. Applications to the analysis of grass genomes are presented.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="17572019" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="IGM-LabInfo, UMR CNRS 8049, Universit{\&#039;e" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p37" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Blin/J%20Comput%20Biol%202007%20Blin.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1089/cmb.2007.A002" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Guillaume Blin"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Eric Blais"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Danny Hermelin"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Pierre Guillon"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Mathieu Blanchette"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Nadia El-Mabrouk"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"><title>SSAHA: a fast search method for 
large DNA databases</title><link>http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Sensitivity Alignment, Management Sequence, Composition, Specificity and Base DNA, Databases, Factual, Database Systems, Sequence Algorithms, Software, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Z &lt;a href=&#034;http://www.bibsonomy.org/author/Ning&#034;&gt;Ning&lt;/a&gt;  und A J &lt;a href=&#034;http://www.bibsonomy.org/author/Cox&#034;&gt;Cox&lt;/a&gt;  und J C &lt;a href=&#034;http://www.bibsonomy.org/author/Mullikin&#034;&gt;Mullikin&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;11(10):1725--9&lt;/em&gt;&lt;em&gt;Oct2001. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sensitivity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Management"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Composition,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Specificity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/and"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Factual,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Database"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Systems,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"><owl:sameAs 
rdf:resource="http://www.bibsonomy.org/uri/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Oct</swrc:month><swrc:number>10</swrc:number><swrc:pages>1725--9</swrc:pages><swrc:title>SSAHA: a fast search method for large DNA databases</swrc:title><swrc:volume>11</swrc:volume><swrc:year>2001</swrc:year><swrc:keywords>Sensitivity Alignment, Management Sequence, Composition, Specificity and Base DNA, Databases, Factual, Database Systems, Sequence Algorithms, Software, </swrc:keywords><swrc:abstract>We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the &#034;hits&#034; for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. 
Also, it provides Web-based sequence search facilities for Ensembl projects.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11591649" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p5" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2001/Ning/Genome%20Res%202001%20Ning.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.194201" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Z Ning"/></rdf:_1><rdf:_2><swrc:Person swrc:name="A J Cox"/></rdf:_2><rdf:_3><swrc:Person swrc:name="J C Mullikin"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"><title>A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates</title><link>http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Sequence Duplication, Bacterial Algorithms, Genome, Software, Gene Computational Gammaproteobacteria, DNA, Biology, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;S&amp;#233;bastien &lt;a href=&#034;http://www.bibsonomy.org/author/Angibaud&#034;&gt;Angibaud&lt;/a&gt;  und Guillaume &lt;a 
href=&#034;http://www.bibsonomy.org/author/Fertin&#034;&gt;Fertin&lt;/a&gt;  und Irena &lt;a href=&#034;http://www.bibsonomy.org/author/Rusu&#034;&gt;Rusu&lt;/a&gt;  und St&amp;#233;phane &lt;a href=&#034;http://www.bibsonomy.org/author/Vialette&#034;&gt;Vialette&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;14(4):379--93&lt;/em&gt;&lt;em&gt;May2007. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Duplication,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gammaproteobacteria,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>May</swrc:month><swrc:number>4</swrc:number><swrc:pages>379--93</swrc:pages><swrc:title>A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Sequence Duplication, Bacterial Algorithms, Genome, Software, Gene Computational Gammaproteobacteria, DNA, Biology, 
Analysis, </swrc:keywords><swrc:abstract>Computing genomic distances between whole genomes is a fundamental problem in comparative genomics. Recent researches have resulted in different genomic distance definitions, for example, number of breakpoints, number of common intervals, number of conserved intervals, and Maximum Adjacency Disruption number. Unfortunately, it turns out that, in presence of duplications, most problems are NP-hard, and hence several heuristics have been recently proposed. However, while it is relatively easy to compare heuristics between them, until now very little is known about the absolute accuracy of these heuristics. Therefore, there is a great need for algorithmic approaches that compute exact solutions for these genomic distances. In this paper, we present a novel generic pseudo-boolean approach for computing the exact genomic distance between two whole genomes in presence of duplications, and put strong emphasis on common intervals under the maximum matching model. 
Of particular importance, we show three heuristics which provide very good results on a well-known public dataset of gamma-Proteobacteria.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="17572018" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Laboratoire d&#039;Informatique de Nantes-Atlantique, FRE CNRS 2729, Universit{\&#039;e" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p39" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Angibaud/J%20Comput%20Biol%202007%20Angibaud.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1089/cmb.2007.A001" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="S{\&#039;e}bastien Angibaud"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Guillaume Fertin"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Irena Rusu"/></rdf:_3><rdf:_4><swrc:Person swrc:name="St{\&#039;e}phane Vialette"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"><title>An Eulerian path approach to DNA fragment assembly</title><link>http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Models, Contig Software, Analysis, lactis Sequence Mapping, DNA, Algorithms, Theoretical, Alignment, Bacterial, Neisseria Campylobacter Genome, jejuni, Lactococcus meningitidis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;P A &lt;a 
href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  und H &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  und M S &lt;a href=&#034;http://www.bibsonomy.org/author/Waterman&#034;&gt;Waterman&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Proc Natl Acad Sci USA&lt;/em&gt;&lt;em&gt;98(17):9748--53&lt;/em&gt;&lt;em&gt;Aug2001. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/lactis"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Theoretical,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Neisseria"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Campylobacter"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/jejuni,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Lactococcus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/meningitidis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Proc Natl Acad Sci 
USA</swrc:journal><swrc:month>Aug</swrc:month><swrc:number>17</swrc:number><swrc:pages>9748--53</swrc:pages><swrc:title>An Eulerian path approach to DNA fragment assembly</swrc:title><swrc:volume>98</swrc:volume><swrc:year>2001</swrc:year><swrc:keywords>Models, Contig Software, Analysis, lactis Sequence Mapping, DNA, Algorithms, Theoretical, Alignment, Bacterial, Neisseria Campylobacter Genome, jejuni, Lactococcus meningitidis, </swrc:keywords><swrc:abstract>For the last 20 years, fragment assembly in DNA sequencing followed the &#034;overlap-layout-consensus&#034; paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical &#034;overlap-layout-consensus&#034; approach in favor of a new euler algorithm that, for the first time, resolves the 20-year-old &#034;repeat problem&#034; in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. euler, in contrast to the celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="98/17/9748" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="11504945" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science and Engineering, University of California, San Diego, La Jolla, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p15" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2001/Pevzner/Proc%20Natl%20Acad%20Sci%20USA%202001%20Pevzner.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1073/pnas.171285098" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="P A Pevzner"/></rdf:_1><rdf:_2><swrc:Person swrc:name="H Tang"/></rdf:_2><rdf:_3><swrc:Person swrc:name="M S Waterman"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/26fefee7c3e5e1c27b6a90750a6c4c153/dzerbino"><title>Using guide trees to construct multiple-sequence evolutionary HMMs</title><link>http://www.bibsonomy.org/bibtex/26fefee7c3e5e1c27b6a90750a6c4c153/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Gene Sequence Analysis, Algorithms, Alignment, Expression Genetic, Regulation, Chains, Profiling, DNA, Markov Molecular, Cluster Homology Evolution, Statistical, Protein, Models, Software, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;I &lt;a href=&#034;http://www.bibsonomy.org/author/Holmes&#034;&gt;Holmes&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;Jan2003. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Regulation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chains,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Markov"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Cluster"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Homology"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Evolution,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Protein,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/26fefee7c3e5e1c27b6a90750a6c4c153/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/26fefee7c3e5e1c27b6a90750a6c4c153/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Jan</swrc:month><swrc:pages>i147--57</swrc:pages><swrc:title>Using guide trees to construct multiple-sequence evolutionary HMMs</swrc:title><swrc:volume>19 Suppl 1</swrc:volume><swrc:year>2003</swrc:year><swrc:keywords>Gene Sequence Analysis, Algorithms, Alignment, 
Expression Genetic, Regulation, Chains, Profiling, DNA, Markov Molecular, Cluster Homology Evolution, Statistical, Protein, Models, Software, </swrc:keywords><swrc:abstract>MOTIVATION: Score-based progressive alignment algorithms do dynamic programming on successive branches of a guide tree. The analogous probabilistic construct is an Evolutionary HMM. This is a multiple-sequence hidden Markov model (HMM) made by combining transducers (conditionally normalised Pair HMMs) on the branches of a phylogenetic tree. METHODS: We present general algorithms for constructing an Evolutionary HMM from any Pair HMM and for doing dynamic programming to any Multiple-sequence HMM. RESULTS: Our prototype implementation, Handel, is based on the Thorne-Kishino-Felsenstein evolutionary model and is benchmarked using structural reference alignments.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="12855451" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Statistics, University of Oxford. 1 South Parks Road, Oxford OX1 3TG, UK." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p6" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2003/Holmes/Bioinformatics%202003%20Holmes.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="I Holmes"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item></rdf:RDF>