<rdf:RDF xmlns:burst="http://xmlns.com/burst/0.1/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:cc="http://web.resource.org/cc/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:swrc="http://swrc.ontoware.org/ontology#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><channel rdf:about="http://www.bibsonomy.org/burst/user/dzerbino/Algorithms,"><title>BibSonomy publications for /user/dzerbino/Algorithms,</title><link>http://www.bibsonomy.org/burst/user/dzerbino/Algorithms,</link><description>BibSonomy BuRST Feed for /user/dzerbino/Algorithms,</description><dc:date>2008-10-13T08:55:09+02:00</dc:date><items><rdf:Seq><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/217ec918bb871c3d8239479bd26364a71/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2e2924e542de58d497e99b128d8659faa/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/24296f5595607517ac76189523adbafdb/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/251d90db8174d7d1da830b654951e5ea0/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2ccc1431d60d762580edc392082a74be9/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/bibtex/2d9334e2d25ffcac3e74f3caa0122db9e/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2003fc7a74e62f697dad1196f70c46da8/dzerbino"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"/></rdf:Seq></items></channel><item rdf:about="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"><title>Generating consensus sequences from partial order multiple sequence alignment graphs</title><link>http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Profiling, Gene Software, Humans, Consensus Expression Sequence Alignment, Algorithms, DNA, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Christopher &lt;a href=&#034;http://www.bibsonomy.org/author/Lee&#034;&gt;Lee&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;19(8):999--1008&lt;/em&gt;&lt;em&gt;May2003. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Consensus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2aa8bc1f2986f316dcdd470da0aa1588d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>May</swrc:month><swrc:number>8</swrc:number><swrc:pages>999--1008</swrc:pages><swrc:title>Generating consensus sequences from partial order multiple sequence alignment graphs</swrc:title><swrc:volume>19</swrc:volume><swrc:year>2003</swrc:year><swrc:keywords>Profiling, Gene Software, Humans, Consensus Expression Sequence Alignment, Algorithms, DNA, Analysis, </swrc:keywords><swrc:abstract>MOTIVATION: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption-that the aligned sequences form a single linear consensus-is not true? 
RESULTS: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveals the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions. AVAILABILITY: The heaviest_bundle source code is available at http://www.bioinformatics.ucla.edu/poa</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="12761063" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="8" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute Department of Chemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA. 
leec@mbi.ucla.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p4" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2003/Lee/Bioinformatics%202003%20Lee.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Christopher Lee"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/217ec918bb871c3d8239479bd26364a71/dzerbino"><title>ARACHNE: a whole-genome shotgun assembler</title><link>http://www.bibsonomy.org/bibtex/217ec918bb871c3d8239479bd26364a71/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Fungal, Humans Human, melanogaster, influenzae, Consensus Haemophilus Algorithms, Alignment, Contig Animals, Genome, cerevisiae, Software, Saccharomyces Sequence, Mapping, Sequence Bacterial, Drosophila </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Serafim &lt;a href=&#034;http://www.bibsonomy.org/author/Batzoglou&#034;&gt;Batzoglou&lt;/a&gt;  and David B &lt;a href=&#034;http://www.bibsonomy.org/author/Jaffe&#034;&gt;Jaffe&lt;/a&gt;  and Ken &lt;a href=&#034;http://www.bibsonomy.org/author/Stanley&#034;&gt;Stanley&lt;/a&gt;  and Jonathan &lt;a href=&#034;http://www.bibsonomy.org/author/Butler&#034;&gt;Butler&lt;/a&gt;  and Sante &lt;a href=&#034;http://www.bibsonomy.org/author/Gnerre&#034;&gt;Gnerre&lt;/a&gt;  and Evan &lt;a href=&#034;http://www.bibsonomy.org/author/Mauceli&#034;&gt;Mauceli&lt;/a&gt;  and Bonnie &lt;a href=&#034;http://www.bibsonomy.org/author/Berger&#034;&gt;Berger&lt;/a&gt;  and Jill P &lt;a href=&#034;http://www.bibsonomy.org/author/Mesirov&#034;&gt;Mesirov&lt;/a&gt;  and Eric S &lt;a 
href=&#034;http://www.bibsonomy.org/author/Lander&#034;&gt;Lander&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;12(1):177--89&lt;/em&gt;&lt;em&gt;Jan2002. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fungal,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Human,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/melanogaster,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/influenzae,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Consensus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Haemophilus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/cerevisiae,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Saccharomyces"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Drosophila"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/217ec918bb871c3d8239479bd26364a71/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/217ec918bb871c3d8239479bd26364a71/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome 
Res</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>1</swrc:number><swrc:pages>177--89</swrc:pages><swrc:title>ARACHNE: a whole-genome shotgun assembler</swrc:title><swrc:volume>12</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Fungal, Humans Human, melanogaster, influenzae, Consensus Haemophilus Algorithms, Alignment, Contig Animals, Genome, cerevisiae, Software, Saccharomyces Sequence, Mapping, Sequence Bacterial, Drosophila </swrc:keywords><swrc:abstract>We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing approximately 10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded approximately 98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of approximately 1 kb in size), with a very small number of other misassemblies. 
The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11779843" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="1" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p14" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Batzoglou/Genome%20Res%202002%20Batzoglou.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.208902" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Serafim Batzoglou"/></rdf:_1><rdf:_2><swrc:Person swrc:name="David B Jaffe"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Ken Stanley"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Jonathan Butler"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Sante Gnerre"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Evan Mauceli"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Bonnie Berger"/></rdf:_7><rdf:_8><swrc:Person swrc:name="Jill P Mesirov"/></rdf:_8><rdf:_9><swrc:Person swrc:name="Eric S Lander"/></rdf:_9></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"><title>Minimus: a fast, lightweight genome assembler</title><link>http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Software Base 
Chromosome User-Computer Algorithms, Alignment, Analysis, Interface Software, Mapping, Sequence, Sequence Design, Data, Molecular DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Daniel D &lt;a href=&#034;http://www.bibsonomy.org/author/Sommer&#034;&gt;Sommer&lt;/a&gt;  and Arthur L &lt;a href=&#034;http://www.bibsonomy.org/author/Delcher&#034;&gt;Delcher&lt;/a&gt;  and Steven L &lt;a href=&#034;http://www.bibsonomy.org/author/Salzberg&#034;&gt;Salzberg&lt;/a&gt;  and Mihai &lt;a href=&#034;http://www.bibsonomy.org/author/Pop&#034;&gt;Pop&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;BMC Bioinformatics&lt;/em&gt;&lt;em&gt;Feb2007. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/User-Computer"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Interface"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Design,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2150cd10c40aace2c238aaa20c8480e08/dzerbino"/><rdf:type 
rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>BMC Bioinformatics</swrc:journal><swrc:month>Feb</swrc:month><swrc:pages>64</swrc:pages><swrc:title>Minimus: a fast, lightweight genome assembler</swrc:title><swrc:volume>8</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Software Base Chromosome User-Computer Algorithms, Alignment, Analysis, Interface Software, Mapping, Sequence, Sequence Design, Data, Molecular DNA, </swrc:keywords><swrc:abstract>BACKGROUND: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome sequencing projects. Many of the most common uses of assemblers, however, are best served by a simpler type of assembler that requires fewer software components, uses less memory, and is far easier to install and run. RESULTS: We have developed the Minimus assembler to address these issues, and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus&#039; performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, unlike other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly. CONCLUSION: We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design Minimus is perfectly suited to be a component of complex assembly pipelines. 
Minimus is released as an open-source software project and the code is available as part of the AMOS project at Sourceforge.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="1471-2105-8-64" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17324286" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA. dsommer@umiacs.umd.edu &lt;dsommer@umiacs.umd.edu&gt;" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p22" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Sommer/BMC%20Bioinformatics%202007%20Sommer.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1186/1471-2105-8-64" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Daniel D Sommer"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Arthur L Delcher"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Steven L Salzberg"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Mihai Pop"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2e2924e542de58d497e99b128d8659faa/dzerbino"><title>PCAP: a whole-genome assembly program</title><link>http://www.bibsonomy.org/bibtex/2e2924e542de58d497e99b128d8659faa/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Animals, Genome, Software, Humans Mapping, Computational Sequence Alignment, Algorithms, Mice, Contig Biology, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Xiaoqiu &lt;a 
href=&#034;http://www.bibsonomy.org/author/Huang&#034;&gt;Huang&lt;/a&gt;  and Jianmin &lt;a href=&#034;http://www.bibsonomy.org/author/Wang&#034;&gt;Wang&lt;/a&gt;  and Srinivas &lt;a href=&#034;http://www.bibsonomy.org/author/Aluru&#034;&gt;Aluru&lt;/a&gt;  and Shiaw-Pyng &lt;a href=&#034;http://www.bibsonomy.org/author/Yang&#034;&gt;Yang&lt;/a&gt;  and LaDeana &lt;a href=&#034;http://www.bibsonomy.org/author/Hillier&#034;&gt;Hillier&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;13(9):2164--70&lt;/em&gt;&lt;em&gt;Sep2003. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mice,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2e2924e542de58d497e99b128d8659faa/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2e2924e542de58d497e99b128d8659faa/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Sep</swrc:month><swrc:number>9</swrc:number><swrc:pages>2164--70</swrc:pages><swrc:title>PCAP: a whole-genome assembly program</swrc:title><swrc:volume>13</swrc:volume><swrc:year>2003</swrc:year><swrc:keywords>Animals, Genome, 
Software, Humans Mapping, Computational Sequence Alignment, Algorithms, Mice, Contig Biology, </swrc:keywords><swrc:abstract>We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="13/9/2164" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="12952883" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="9" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science Iowa State University, Ames, Iowa 50011-1040, USA. 
xqhuang@cs.iastate.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p27" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2003/Huang/Genome%20Res%202003%20Huang.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.1390403" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Xiaoqiu Huang"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Jianmin Wang"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Srinivas Aluru"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Shiaw-Pyng Yang"/></rdf:_4><rdf:_5><swrc:Person swrc:name="LaDeana Hillier"/></rdf:_5></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/24296f5595607517ac76189523adbafdb/dzerbino"><title>A novel method for multiple alignment of sequences with repeated and shuffled elements</title><link>http://www.bibsonomy.org/bibtex/24296f5595607517ac76189523adbafdb/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Software, Homology, Acid Protein, Databases, Amino Sequence Genetic, Alignment, Algorithms, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Benjamin &lt;a href=&#034;http://www.bibsonomy.org/author/Raphael&#034;&gt;Raphael&lt;/a&gt;  and Degui &lt;a href=&#034;http://www.bibsonomy.org/author/Zhi&#034;&gt;Zhi&lt;/a&gt;  and Haixu &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  and Pavel &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;14(11):2336--46&lt;/em&gt;&lt;em&gt;Nov2004. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Homology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Protein,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Amino"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/24296f5595607517ac76189523adbafdb/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/24296f5595607517ac76189523adbafdb/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Nov</swrc:month><swrc:number>11</swrc:number><swrc:pages>2336--46</swrc:pages><swrc:title>A novel method for multiple alignment of sequences with repeated and shuffled elements</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2004</swrc:year><swrc:keywords>Software, Homology, Acid Protein, Databases, Amino Sequence Genetic, Alignment, Algorithms, Analysis, </swrc:keywords><swrc:abstract>We describe ABA (A-Bruijn alignment), a new method for multiple alignment of biological sequences. The major difference between ABA and existing multiple alignment methods is that ABA represents an alignment as a directed graph, possibly containing cycles. 
This representation provides more flexibility than does a traditional alignment matrix or the recently introduced partial order alignment (POA) graph by allowing a larger class of evolutionary relationships between the aligned sequences. Our graph representation is particularly well-suited to the alignment of protein sequences with shuffled and/or repeated domain structure, and allows one to construct multiple alignments of proteins containing (1) domains that are not present in all proteins, (2) domains that are present in different orders in different proteins, and (3) domains that are present in multiple copies in some proteins. In addition, ABA is useful in the alignment of genomic sequences that contain duplications and inversions. We provide several examples illustrating the applications of ABA.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="14/11/2336" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15520295" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="11" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA. 
braphael@ucsd.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p8" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2004/Raphael/Genome%20Res%202004%20Raphael.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.2657504" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Benjamin Raphael"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Degui Zhi"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Haixu Tang"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Pavel Pevzner"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"><title>Multiple sequence alignment using partial order graphs</title><link>http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Humans, Proteins, Quality Databases, Nucleotidyltransferases, Alignment, Algorithms, Control Specificity, RNA, Time Software, Homology, Tags, Factors, Genetic, Sequence Glucose-1-Phosphate Molecular Sensitivity Plant Adenylyltransferase, Messenger, and Models, Expressed Statistical, Sequence, Data, DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Christopher &lt;a href=&#034;http://www.bibsonomy.org/author/Lee&#034;&gt;Lee&lt;/a&gt;  and Catherine &lt;a href=&#034;http://www.bibsonomy.org/author/Grasso&#034;&gt;Grasso&lt;/a&gt;  and Mark F &lt;a href=&#034;http://www.bibsonomy.org/author/Sharlow&#034;&gt;Sharlow&lt;/a&gt;  
&lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;18(3):452--64&lt;/em&gt;&lt;em&gt;Mar2002. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Proteins,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Quality"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Nucleotidyltransferases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Control"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Specificity,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/RNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Time"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Homology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Tags,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Factors,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Glucose-1-Phosphate"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sensitivity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Plant"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Adenylyltransferase,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Messenger,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/and"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expressed"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/281238c67b322e3148e22776c9d9cbf3d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Mar</swrc:month><swrc:number>3</swrc:number><swrc:pages>452--64</swrc:pages><swrc:title>Multiple sequence alignment using partial order graphs</swrc:title><swrc:volume>18</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Base Humans, Proteins, Quality Databases, Nucleotidyltransferases, Alignment, Algorithms, Control Specificity, RNA, Time Software, Homology, Tags, Factors, Genetic, Sequence Glucose-1-Phosphate Molecular Sensitivity Plant Adenylyltransferase, Messenger, and Models, Expressed Statistical, Sequence, Data, DNA, </swrc:keywords><swrc:abstract>MOTIVATION: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. 
an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism. AVAILABILITY: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11934745" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="3" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA. leec@mbi.ucla.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p9" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Lee/Bioinformatics%202002%20Lee.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Christopher Lee"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Catherine Grasso"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Mark F Sharlow"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/251d90db8174d7d1da830b654951e5ea0/dzerbino"><title>De novo repeat classification and fragment assembly</title><link>http://www.bibsonomy.org/bibtex/251d90db8174d7d1da830b654951e5ea0/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Humans, Human, Multigene Linkage (Genetics), Algorithms, Alignment, Cluster Sequences, Nucleic Analysis, Biology, Contig Acid, Genome, Mapping, Computational Family Artificial, Chromosomes, Sequence Repetitive Bacterial, 
</dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Pavel A &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  and Paul A &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  and Haixu &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  and Glenn &lt;a href=&#034;http://www.bibsonomy.org/author/Tesler&#034;&gt;Tesler&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;14(9):1786--96&lt;/em&gt;&lt;em&gt;Sep2004. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Human,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Multigene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Linkage"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/(Genetics),"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Cluster"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequences,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Nucleic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Family"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Artificial,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosomes,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Repetitive"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/Bacterial,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/251d90db8174d7d1da830b654951e5ea0/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/251d90db8174d7d1da830b654951e5ea0/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Sep</swrc:month><swrc:number>9</swrc:number><swrc:pages>1786--96</swrc:pages><swrc:title>De novo repeat classification and fragment assembly</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2004</swrc:year><swrc:keywords>Humans, Human, Multigene Linkage (Genetics), Algorithms, Alignment, Cluster Sequences, Nucleic Analysis, Biology, Contig Acid, Genome, Mapping, Computational Family Artificial, Chromosomes, Sequence Repetitive Bacterial, </swrc:keywords><swrc:abstract>Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly of BACs and bacterial genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="14/9/1786" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15342561" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="9" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p26" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2004/Pevzner/Genome%20Res%202004%20Pevzner.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.2395204" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Pavel A Pevzner"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Paul A Pevzner"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Haixu Tang"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Glenn Tesler"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2ccc1431d60d762580edc392082a74be9/dzerbino"><title>ProbCons: Probabilistic consistency-based multiple sequence alignment</title><link>http://www.bibsonomy.org/bibtex/2ccc1431d60d762580edc392082a74be9/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Software Acid Protein, Databases, Benchmarking, Algorithms, Alignment, Biology, Software, (Genetics) Amino Sequence, Computational Variation Validation, Sequence Data, Molecular </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Chuong B &lt;a href=&#034;http://www.bibsonomy.org/author/Do&#034;&gt;Do&lt;/a&gt;  and Mahathi S P &lt;a href=&#034;http://www.bibsonomy.org/author/Mahabhashyam&#034;&gt;Mahabhashyam&lt;/a&gt;  and Michael &lt;a href=&#034;http://www.bibsonomy.org/author/Brudno&#034;&gt;Brudno&lt;/a&gt;  and Serafim &lt;a href=&#034;http://www.bibsonomy.org/author/Batzoglou&#034;&gt;Batzoglou&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;15(2):330--40&lt;/em&gt;&lt;em&gt;Feb2005. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Protein,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Benchmarking,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/(Genetics)"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Amino"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Variation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Validation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2ccc1431d60d762580edc392082a74be9/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2ccc1431d60d762580edc392082a74be9/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Feb</swrc:month><swrc:number>2</swrc:number><swrc:pages>330--40</swrc:pages><swrc:title>ProbCons: Probabilistic consistency-based multiple sequence alignment</swrc:title><swrc:volume>15</swrc:volume><swrc:year>2005</swrc:year><swrc:keywords>Software Acid Protein, Databases, Benchmarking, Algorithms, Alignment, Biology, Software, (Genetics) Amino Sequence, Computational Variation 
Validation, Sequence Data, Molecular </swrc:keywords><swrc:abstract>To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present ProbCons, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark data sets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, ProbCons achieves statistically significant improvement over other leading methods while maintaining practical speed. ProbCons is publicly available as a Web resource.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="15/2/330" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15687296" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="2" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science, Stanford University, Stanford, California 94305, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p2" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2005/Do/Genome%20Res%202005%20Do.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.2821705" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Chuong B Do"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Mahathi S P Mahabhashyam"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Michael Brudno"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Serafim Batzoglou"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"><title>Fragment assembly with short reads</title><link>http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Feasibility Gene Profiling, Studies, Algorithms, Alignment, Analysis, Contig Mapping, Expression Sequence Data, Molecular DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Mark &lt;a href=&#034;http://www.bibsonomy.org/author/Chaisson&#034;&gt;Chaisson&lt;/a&gt;  and Pavel &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  and Haixu &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;20(13):2067--74&lt;/em&gt;&lt;em&gt;Sep2004. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Feasibility"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Studies,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2187f0308ae9e5fa2f47476b9ab180a20/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Sep</swrc:month><swrc:number>13</swrc:number><swrc:pages>2067--74</swrc:pages><swrc:title>Fragment assembly with short reads</swrc:title><swrc:volume>20</swrc:volume><swrc:year>2004</swrc:year><swrc:keywords>Base Feasibility Gene Profiling, Studies, Algorithms, Alignment, Analysis, Contig Mapping, Expression Sequence Data, Molecular DNA, </swrc:keywords><swrc:abstract>MOTIVATION: Current DNA sequencing technology produces reads of about 500-750 bp, with typical coverage under 10x. 
New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30x and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads. RESULTS: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts. AVAILABILITY: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="bth205" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="15059830" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="13" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Bioinformatics Program, University of California San Diego, La Jolla, CA 92093, USA. 
mchaisso@bioinf.ucsd.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p25" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2004/Chaisson/Bioinformatics%202004%20Chaisson.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/bth205" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Mark Chaisson"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Pavel Pevzner"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Haixu Tang"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"><title>The fragment assembly string graph</title><link>http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Fragmentation, Sequence, DNA Chromosome Mapping Sequence Algorithms, Molecular Data, DNA, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Eugene W &lt;a href=&#034;http://www.bibsonomy.org/author/Myers&#034;&gt;Myers&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;Sep2005. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragmentation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/238f73cc8ed9f2f976ae6a7360b532cfe/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Sep</swrc:month><swrc:pages>ii79--85</swrc:pages><swrc:title>The fragment assembly string graph</swrc:title><swrc:volume>21 Suppl 2</swrc:volume><swrc:year>2005</swrc:year><swrc:keywords>Base Fragmentation, Sequence, DNA Chromosome Mapping Sequence Algorithms, Molecular Data, DNA, Analysis, </swrc:keywords><swrc:abstract>We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. 
The result demonstrates that the decomposition of reads into kmers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="21/suppl_2/ii79" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="16204131" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science, University of California Berkeley, CA, USA. gene@eecs.berkeley.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p35" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2005/Myers/Bioinformatics%202005%20Myers.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/bti1114" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Eugene W Myers"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2d9334e2d25ffcac3e74f3caa0122db9e/dzerbino"><title>Genome-scale evolution: reconstructing gene orders in the ancestral species</title><link>http://www.bibsonomy.org/bibtex/2d9334e2d25ffcac3e74f3caa0122db9e/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Evolution, Gene Humans, Human, Order, 
Rearrangement, Algorithms, Animals, Cats, Molecular, Chromosomes Genome, Models, Chromosomes, Genetic, Mice, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Guillaume &lt;a href=&#034;http://www.bibsonomy.org/author/Bourque&#034;&gt;Bourque&lt;/a&gt;  and Pavel A &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;12(1):26--36&lt;/em&gt;&lt;em&gt;Jan2002. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Evolution,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Human,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Order,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Rearrangement,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Cats,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosomes"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosomes,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mice,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2d9334e2d25ffcac3e74f3caa0122db9e/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2d9334e2d25ffcac3e74f3caa0122db9e/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome 
Res</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>1</swrc:number><swrc:pages>26--36</swrc:pages><swrc:title>Genome-scale evolution: reconstructing gene orders in the ancestral species</swrc:title><swrc:volume>12</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Evolution, Gene Humans, Human, Order, Rearrangement, Algorithms, Animals, Cats, Molecular, Chromosomes Genome, Models, Chromosomes, Genetic, Mice, </swrc:keywords><swrc:abstract>Recent progress in genome-scale sequencing and comparative mapping raises new challenges in studies of genome rearrangements. Although the pairwise genome rearrangement problem is well-studied, algorithms for reconstructing rearrangement scenarios for multiple species are in great need. The previous approaches to multiple genome rearrangement problem were largely based on the breakpoint distance rather than on a more biologically accurate rearrangement (reversal) distance. Another shortcoming of the existing software tools is their inability to analyze rearrangements (inversions, translocations, fusions, and fissions) of multichromosomal genomes. This paper proposes a new multiple genome rearrangement algorithm that is based on the rearrangement (rather than breakpoint) distance and that is applicable to both unichromosomal and multichromosomal genomes. We further apply this algorithm for genome-scale phylogenetic tree reconstruction and deriving ancestral gene orders. In particular, our analysis suggests a new improved rearrangement scenario for a very difficult Campanulaceae cpDNA dataset and a putative rearrangement scenario for human, mouse and cat genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11779828" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="1" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Mathematics, University of Southern California, California 90089, USA. 
gbourque@usc.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p38" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Bourque/Genome%20Res%202002%20Bourque.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Guillaume Bourque"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Pavel A Pevzner"/></rdf:_2></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"><title>Assembling millions of short DNA sequences using SSAKE</title><link>http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Chromosome Algorithms, Analysis, Contig Software, Mapping, Sequence, Mapping Sequence Data, Molecular DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Ren&#233; L &lt;a href=&#034;http://www.bibsonomy.org/author/Warren&#034;&gt;Warren&lt;/a&gt;  and Granger G &lt;a href=&#034;http://www.bibsonomy.org/author/Sutton&#034;&gt;Sutton&lt;/a&gt;  and Steven J M &lt;a href=&#034;http://www.bibsonomy.org/author/Jones&#034;&gt;Jones&lt;/a&gt;  and Robert A &lt;a href=&#034;http://www.bibsonomy.org/author/Holt&#034;&gt;Holt&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;23(4):500--1&lt;/em&gt;&lt;em&gt;Feb2007. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/292574a7dfe4924f76351f0ac33eae32d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Feb</swrc:month><swrc:number>4</swrc:number><swrc:pages>500--1</swrc:pages><swrc:title>Assembling millions of short DNA sequences using SSAKE</swrc:title><swrc:volume>23</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Base Chromosome Algorithms, Analysis, Contig Software, Mapping, Sequence, Mapping Sequence Data, Molecular DNA, </swrc:keywords><swrc:abstract>Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. 
Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="btl629" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17158514" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="British Columbia Cancer Agency, Genome Sciences Centre, 675 West 10th Avenue, Vancouver, BC V5Z 1L3, Canada. 
rwarren@bcgsc.ca" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p21" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Warren/Bioinformatics%202007%20Warren.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1093/bioinformatics/btl629" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ren{\&#039;e} L Warren"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Granger G Sutton"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Steven J M Jones"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Robert A Holt"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"><title>A whole-genome assembly of Drosophila</title><link>http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Genes, Tagged melanogaster, Chromosome Algorithms, Contig Analysis, Nucleic Sites, Computational Mapping, Sequence Repetitive Molecular Chromatin, Heterochromatin, Insect, Euchromatin, Sequences, Animals, Physical Acid, Genome, Drosophila Data, DNA, Biology </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;E W &lt;a href=&#034;http://www.bibsonomy.org/author/Myers&#034;&gt;Myers&lt;/a&gt;  and G G &lt;a href=&#034;http://www.bibsonomy.org/author/Sutton&#034;&gt;Sutton&lt;/a&gt;  and A L &lt;a href=&#034;http://www.bibsonomy.org/author/Delcher&#034;&gt;Delcher&lt;/a&gt;  and I M &lt;a href=&#034;http://www.bibsonomy.org/author/Dew&#034;&gt;Dew&lt;/a&gt;  and D P &lt;a 
href=&#034;http://www.bibsonomy.org/author/Fasulo&#034;&gt;Fasulo&lt;/a&gt;  and M J &lt;a href=&#034;http://www.bibsonomy.org/author/Flanigan&#034;&gt;Flanigan&lt;/a&gt;  and S A &lt;a href=&#034;http://www.bibsonomy.org/author/Kravitz&#034;&gt;Kravitz&lt;/a&gt;  and C M &lt;a href=&#034;http://www.bibsonomy.org/author/Mobarry&#034;&gt;Mobarry&lt;/a&gt;  and K H &lt;a href=&#034;http://www.bibsonomy.org/author/Reinert&#034;&gt;Reinert&lt;/a&gt;  and K A &lt;a href=&#034;http://www.bibsonomy.org/author/Remington&#034;&gt;Remington&lt;/a&gt;  and E L &lt;a href=&#034;http://www.bibsonomy.org/author/Anson&#034;&gt;Anson&lt;/a&gt;  and R A &lt;a href=&#034;http://www.bibsonomy.org/author/Bolanos&#034;&gt;Bolanos&lt;/a&gt;  and H H &lt;a href=&#034;http://www.bibsonomy.org/author/Chou&#034;&gt;Chou&lt;/a&gt;  and C M &lt;a href=&#034;http://www.bibsonomy.org/author/Jordan&#034;&gt;Jordan&lt;/a&gt;  and A L &lt;a href=&#034;http://www.bibsonomy.org/author/Halpern&#034;&gt;Halpern&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Lonardi&#034;&gt;Lonardi&lt;/a&gt;  and E M &lt;a href=&#034;http://www.bibsonomy.org/author/Beasley&#034;&gt;Beasley&lt;/a&gt;  and R C &lt;a href=&#034;http://www.bibsonomy.org/author/Brandon&#034;&gt;Brandon&lt;/a&gt;  and L &lt;a href=&#034;http://www.bibsonomy.org/author/Chen&#034;&gt;Chen&lt;/a&gt;  and P J &lt;a href=&#034;http://www.bibsonomy.org/author/Dunn&#034;&gt;Dunn&lt;/a&gt;  and Z &lt;a href=&#034;http://www.bibsonomy.org/author/Lai&#034;&gt;Lai&lt;/a&gt;  and Y &lt;a href=&#034;http://www.bibsonomy.org/author/Liang&#034;&gt;Liang&lt;/a&gt;  and D R &lt;a href=&#034;http://www.bibsonomy.org/author/Nusskern&#034;&gt;Nusskern&lt;/a&gt;  and M &lt;a href=&#034;http://www.bibsonomy.org/author/Zhan&#034;&gt;Zhan&lt;/a&gt;  and Q &lt;a href=&#034;http://www.bibsonomy.org/author/Zhang&#034;&gt;Zhang&lt;/a&gt;  and X &lt;a href=&#034;http://www.bibsonomy.org/author/Zheng&#034;&gt;Zheng&lt;/a&gt;  and G M &lt;a 
href=&#034;http://www.bibsonomy.org/author/Rubin&#034;&gt;Rubin&lt;/a&gt;  and M D &lt;a href=&#034;http://www.bibsonomy.org/author/Adams&#034;&gt;Adams&lt;/a&gt;  and J C &lt;a href=&#034;http://www.bibsonomy.org/author/Venter&#034;&gt;Venter&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Science&lt;/em&gt;&lt;em&gt;287(5461):2196--204&lt;/em&gt;&lt;em&gt;Mar2000. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genes,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Tagged"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/melanogaster,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromosome"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Nucleic"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sites,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Repetitive"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Heterochromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Insect,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Euchromatin,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequences,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Physical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Acid,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Drosophila"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li 
rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/211f0d08db68b4f73ce33ffab95ffef98/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Science</swrc:journal><swrc:month>Mar</swrc:month><swrc:number>5461</swrc:number><swrc:pages>2196--204</swrc:pages><swrc:title>A whole-genome assembly of Drosophila</swrc:title><swrc:volume>287</swrc:volume><swrc:year>2000</swrc:year><swrc:keywords>Genes, Tagged melanogaster, Chromosome Algorithms, Contig Analysis, Nucleic Sites, Computational Mapping, Sequence Repetitive Molecular Chromatin, Heterochromatin, Insect, Euchromatin, Sequences, Animals, Physical Acid, Genome, Drosophila Data, DNA, Biology </swrc:keywords><swrc:abstract>We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly&#039;s sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. 
As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="8395" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10731133" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="5461" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Celera Genomics, Inc., 45 West Gude Drive, Rockville, MD 20850, USA. Gene.Myers@celera.com" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p20" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2000/Myers/Science%202000%20Myers.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="E W Myers"/></rdf:_1><rdf:_2><swrc:Person swrc:name="G G Sutton"/></rdf:_2><rdf:_3><swrc:Person swrc:name="A L Delcher"/></rdf:_3><rdf:_4><swrc:Person swrc:name="I M Dew"/></rdf:_4><rdf:_5><swrc:Person swrc:name="D P Fasulo"/></rdf:_5><rdf:_6><swrc:Person swrc:name="M J Flanigan"/></rdf:_6><rdf:_7><swrc:Person swrc:name="S A Kravitz"/></rdf:_7><rdf:_8><swrc:Person swrc:name="C M Mobarry"/></rdf:_8><rdf:_9><swrc:Person swrc:name="K H Reinert"/></rdf:_9><rdf:_10><swrc:Person swrc:name="K A Remington"/></rdf:_10><rdf:_11><swrc:Person swrc:name="E L Anson"/></rdf:_11><rdf:_12><swrc:Person swrc:name="R A Bolanos"/></rdf:_12><rdf:_13><swrc:Person swrc:name="H H Chou"/></rdf:_13><rdf:_14><swrc:Person swrc:name="C M Jordan"/></rdf:_14><rdf:_15><swrc:Person swrc:name="A L Halpern"/></rdf:_15><rdf:_16><swrc:Person swrc:name="S Lonardi"/></rdf:_16><rdf:_17><swrc:Person swrc:name="E M Beasley"/></rdf:_17><rdf:_18><swrc:Person swrc:name="R C 
Brandon"/></rdf:_18><rdf:_19><swrc:Person swrc:name="L Chen"/></rdf:_19><rdf:_20><swrc:Person swrc:name="P J Dunn"/></rdf:_20><rdf:_21><swrc:Person swrc:name="Z Lai"/></rdf:_21><rdf:_22><swrc:Person swrc:name="Y Liang"/></rdf:_22><rdf:_23><swrc:Person swrc:name="D R Nusskern"/></rdf:_23><rdf:_24><swrc:Person swrc:name="M Zhan"/></rdf:_24><rdf:_25><swrc:Person swrc:name="Q Zhang"/></rdf:_25><rdf:_26><swrc:Person swrc:name="X Zheng"/></rdf:_26><rdf:_27><swrc:Person swrc:name="G M Rubin"/></rdf:_27><rdf:_28><swrc:Person swrc:name="M D Adams"/></rdf:_28><rdf:_29><swrc:Person swrc:name="J C Venter"/></rdf:_29></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"><title>Efficiently detecting polymorphisms during the fragment assembly process</title><link>http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Base Profiling, Gene Consensus (Genetics), Algorithms, Alignment, Restriction Analysis, Fragmentation Sequence, DNA Variation Length, Expression Polymorphism, Sequence Genetic, Molecular Data, DNA, Fragment </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Daniel &lt;a href=&#034;http://www.bibsonomy.org/author/Fasulo&#034;&gt;Fasulo&lt;/a&gt;  and Aaron &lt;a href=&#034;http://www.bibsonomy.org/author/Halpern&#034;&gt;Halpern&lt;/a&gt;  and Ian &lt;a href=&#034;http://www.bibsonomy.org/author/Dew&#034;&gt;Dew&lt;/a&gt;  and Clark &lt;a href=&#034;http://www.bibsonomy.org/author/Mobarry&#034;&gt;Mobarry&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bioinformatics&lt;/em&gt;&lt;em&gt;Jan2002. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Profiling,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Consensus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/(Genetics),"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Restriction"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragmentation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Variation"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Length,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Expression"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Polymorphism,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genetic,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Data,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Fragment"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2ee9d2aa5c67377101e1ad619b08eac15/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bioinformatics</swrc:journal><swrc:month>Jan</swrc:month><swrc:pages>S294--302</swrc:pages><swrc:title>Efficiently detecting polymorphisms during the fragment assembly 
process</swrc:title><swrc:volume>18 Suppl 1</swrc:volume><swrc:year>2002</swrc:year><swrc:keywords>Base Profiling, Gene Consensus (Genetics), Algorithms, Alignment, Restriction Analysis, Fragmentation Sequence, DNA Variation Length, Expression Polymorphism, Sequence Genetic, Molecular Data, DNA, Fragment </swrc:keywords><swrc:abstract>MOTIVATION: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases. RESULTS: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="12169559" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Informatics Research, Celera Genomics, 45 W. Gude Dr., Rockville MD 20850, USA. 
daniel.fasulo@celera.com" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p17" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2002/Fasulo/Bioinformatics%202002%20Fasulo.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Daniel Fasulo"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Aaron Halpern"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Ian Dew"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Clark Mobarry"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"><title>Occupancy modeling of coverage distribution for whole genome shotgun DNA sequencing</title><link>http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Animals, Statistical Genome, Models, Humans, Sequence Algorithms, Genomics, DNA, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Michael C &lt;a href=&#034;http://www.bibsonomy.org/author/Wendl&#034;&gt;Wendl&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Bull Math Biol&lt;/em&gt;&lt;em&gt;68(1):179--96&lt;/em&gt;&lt;em&gt;Jan2006. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Animals,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Statistical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Humans,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genomics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/29acd2b8b071ac8bb83b8007ef69b9d88/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Bull Math Biol</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>1</swrc:number><swrc:pages>179--96</swrc:pages><swrc:title>Occupancy modeling of coverage distribution for whole genome shotgun DNA sequencing</swrc:title><swrc:volume>68</swrc:volume><swrc:year>2006</swrc:year><swrc:keywords>Animals, Statistical Genome, Models, Humans, Sequence Algorithms, Genomics, DNA, Analysis, </swrc:keywords><swrc:abstract>Expected-value models have long provided a rudimentary theoretical foundation for random DNA sequencing. Here, we are interested in improving characterization of genome coverage in terms of its underlying probability distributions. We find that the mathematical notion of occupancy serves as a good model for evolution of the coverage distribution function and reveals new insights related to sequence redundancy. 
Established concepts, such as &#034;full shotgun depth,&#034; have been assumed invariant, but actually depend on project size and decrease over time. For most microbial projects, the full shotgun milestone should be revised downward by about 30%. Accordingly, many already-completed genomes appear to have been over-sequenced. Results also suggest that read lengths for emerging high-throughput sequencing methods must be increased substantially before they can be considered as possible successors to the standard Sanger method. In particular, gains in throughput and sequence depth cannot be made to compensate for diminished read length. Limits are well approximated by a simple logarithmic equation, which should be useful in estimating maximum coverage-based redundancy for future projects.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="16794926" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="1" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Genome Sequencing Center, Washington University, 4444 Forest Park Boulevard, Campus Box 8501, St. Louis, MO 63108, USA. 
mwendl@wustl.edu" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p32" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2006/Wendl/Bull%20Math%20Biol%202006%20Wendl.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1007/s11538-005-9021-4" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Michael C Wendl"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"><title>Gene maps linearization using genomic rearrangement distances</title><link>http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Software Genome, Computational Sequence Algorithms, Genomics, DNA, Biology, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Guillaume &lt;a href=&#034;http://www.bibsonomy.org/author/Blin&#034;&gt;Blin&lt;/a&gt;  and Eric &lt;a href=&#034;http://www.bibsonomy.org/author/Blais&#034;&gt;Blais&lt;/a&gt;  and Danny &lt;a href=&#034;http://www.bibsonomy.org/author/Hermelin&#034;&gt;Hermelin&lt;/a&gt;  and Pierre &lt;a href=&#034;http://www.bibsonomy.org/author/Guillon&#034;&gt;Guillon&lt;/a&gt;  and Mathieu &lt;a href=&#034;http://www.bibsonomy.org/author/Blanchette&#034;&gt;Blanchette&lt;/a&gt;  and Nadia &lt;a href=&#034;http://www.bibsonomy.org/author/El-Mabrouk&#034;&gt;El-Mabrouk&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;14(4):394--407&lt;/em&gt;&lt;em&gt;May2007. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genomics,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/21dc921e2ef4587944697d75bf48c2db4/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>May</swrc:month><swrc:number>4</swrc:number><swrc:pages>394--407</swrc:pages><swrc:title>Gene maps linearization using genomic rearrangement distances</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Software Genome, Computational Sequence Algorithms, Genomics, DNA, Biology, Analysis, </swrc:keywords><swrc:abstract>A preliminary step to most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Different genetic mapping techniques often give rise to different maps with unequal gene content and sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order genes of a closely related species characterized by a partial order P. Our goal is to find a linearization of P that is as close as possible to O, in terms of a given genomic distance. 
We first prove NP-completeness complexity results considering the breakpoint and the common interval distances. We then focus on the breakpoint distance and give a dynamic programming algorithm whose running time is exponential for general partial orders, but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case and is empirically shown to produce solutions within 10% of the optimal solution, on simulated data. Applications to the analysis of grass genomes are presented.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="17572019" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="IGM-LabInfo, UMR CNRS 8049, Universit{\&#039;e" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p37" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Blin/J%20Comput%20Biol%202007%20Blin.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1089/cmb.2007.A002" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Guillaume Blin"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Eric Blais"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Danny Hermelin"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Pierre Guillon"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Mathieu Blanchette"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Nadia El-Mabrouk"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"><title>SSAHA: a fast search method for 
large DNA databases</title><link>http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Systems, Base Sensitivity Management Specificity Database Databases, Algorithms, Alignment, and Composition, Software, Factual, Sequence, Sequence DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Z &lt;a href=&#034;http://www.bibsonomy.org/author/Ning&#034;&gt;Ning&lt;/a&gt;  und A J &lt;a href=&#034;http://www.bibsonomy.org/author/Cox&#034;&gt;Cox&lt;/a&gt;  und J C &lt;a href=&#034;http://www.bibsonomy.org/author/Mullikin&#034;&gt;Mullikin&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Genome Res&lt;/em&gt;&lt;em&gt;11(10):1725--9&lt;/em&gt;&lt;em&gt;Oct2001. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Systems,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Base"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sensitivity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Management"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Specificity"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Database"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Databases,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/and"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Composition,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Factual,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"><owl:sameAs 
rdf:resource="http://www.bibsonomy.org/uri/bibtex/2d35d83616ff5162f0dc9ae73792e90bf/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Genome Res</swrc:journal><swrc:month>Oct</swrc:month><swrc:number>10</swrc:number><swrc:pages>1725--9</swrc:pages><swrc:title>SSAHA: a fast search method for large DNA databases</swrc:title><swrc:volume>11</swrc:volume><swrc:year>2001</swrc:year><swrc:keywords>Systems, Base Sensitivity Management Specificity Database Databases, Algorithms, Alignment, and Composition, Software, Factual, Sequence, Sequence DNA, </swrc:keywords><swrc:abstract>We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the &#034;hits&#034; for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. 
Also, it provides Web-based sequence search facilities for Ensembl projects.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="11591649" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p5" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2001/Ning/Genome%20Res%202001%20Ning.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1101/gr.194201" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Z Ning"/></rdf:_1><rdf:_2><swrc:Person swrc:name="A J Cox"/></rdf:_2><rdf:_3><swrc:Person swrc:name="J C Mullikin"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"><title>A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates</title><link>http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Duplication, Genome, Gene Software, Computational Gammaproteobacteria, Sequence Bacterial Algorithms, DNA, Biology, Analysis, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Sébastien &lt;a href=&#034;http://www.bibsonomy.org/author/Angibaud&#034;&gt;Angibaud&lt;/a&gt;  and Guillaume &lt;a 
href=&#034;http://www.bibsonomy.org/author/Fertin&#034;&gt;Fertin&lt;/a&gt;  and Irena &lt;a href=&#034;http://www.bibsonomy.org/author/Rusu&#034;&gt;Rusu&lt;/a&gt;  and Stéphane &lt;a href=&#034;http://www.bibsonomy.org/author/Vialette&#034;&gt;Vialette&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;14(4):379--93&lt;/em&gt;&lt;em&gt;May2007. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Duplication,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gene"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Computational"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Gammaproteobacteria,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Biology,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2e05ca61552c8d9f5951978da7619860d/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>May</swrc:month><swrc:number>4</swrc:number><swrc:pages>379--93</swrc:pages><swrc:title>A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates</swrc:title><swrc:volume>14</swrc:volume><swrc:year>2007</swrc:year><swrc:keywords>Duplication, Genome, Gene Software, Computational Gammaproteobacteria, Sequence Bacterial Algorithms, DNA, Biology, 
Analysis, </swrc:keywords><swrc:abstract>Computing genomic distances between whole genomes is a fundamental problem in comparative genomics. Recent research has produced several genomic distance definitions, for example the number of breakpoints, the number of common intervals, the number of conserved intervals, and the Maximum Adjacency Disruption number. Unfortunately, in the presence of duplications most of these problems are NP-hard, and hence several heuristics have recently been proposed. However, while it is relatively easy to compare heuristics with one another, very little is known so far about their absolute accuracy. Therefore, there is a great need for algorithmic approaches that compute exact solutions for these genomic distances. In this paper, we present a novel generic pseudo-boolean approach for computing the exact genomic distance between two whole genomes in the presence of duplications, and put strong emphasis on common intervals under the maximum matching model. 
Of particular importance, we show three heuristics which provide very good results on a well-known public dataset of gamma-Proteobacteria.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="17572018" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="4" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Laboratoire d&#039;Informatique de Nantes-Atlantique, FRE CNRS 2729, Universit{\&#039;e" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p39" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2007/Angibaud/J%20Comput%20Biol%202007%20Angibaud.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1089/cmb.2007.A001" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="S{\&#039;e}bastien Angibaud"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Guillaume Fertin"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Irena Rusu"/></rdf:_3><rdf:_4><swrc:Person swrc:name="St{\&#039;e}phane Vialette"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2003fc7a74e62f697dad1196f70c46da8/dzerbino"><title>Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model</title><link>http://www.bibsonomy.org/bibtex/2003fc7a74e62f697dad1196f70c46da8/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>Molecular, Folding, Models, Conformation, Chemical Algorithms, Protein </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;R &lt;a 
href=&#034;http://www.bibsonomy.org/author/Agarwala&#034;&gt;Agarwala&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Batzoglou&#034;&gt;Batzoglou&lt;/a&gt;  and V &lt;a href=&#034;http://www.bibsonomy.org/author/Danc{\&amp;#039;\i}k&#034;&gt;Danc&#237;k&lt;/a&gt;  and S E &lt;a href=&#034;http://www.bibsonomy.org/author/Decatur&#034;&gt;Decatur&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Hannenhalli&#034;&gt;Hannenhalli&lt;/a&gt;  and M &lt;a href=&#034;http://www.bibsonomy.org/author/Farach&#034;&gt;Farach&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Muthukrishnan&#034;&gt;Muthukrishnan&lt;/a&gt;  and S &lt;a href=&#034;http://www.bibsonomy.org/author/Skiena&#034;&gt;Skiena&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;J Comput Biol&lt;/em&gt;&lt;em&gt;4(3):275--96&lt;/em&gt;&lt;em&gt;Jan 1997. &lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Molecular,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Folding,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Conformation,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Chemical"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Protein"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2003fc7a74e62f697dad1196f70c46da8/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2003fc7a74e62f697dad1196f70c46da8/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>J Comput Biol</swrc:journal><swrc:month>Jan</swrc:month><swrc:number>3</swrc:number><swrc:pages>275--96</swrc:pages><swrc:title>Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP 
model</swrc:title><swrc:volume>4</swrc:volume><swrc:year>1997</swrc:year><swrc:keywords>Molecular, Folding, Models, Conformation, Chemical Algorithms, Protein </swrc:keywords><swrc:abstract>We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar and treats hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve similar constant factor approximation guarantees on the triangular lattice as were achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="3" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="National Human Genome Research Institute/National Institutes of Health, Bethesda, Maryland 20892, USA. 
richa@helix.nih.gov" swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p19" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/1997/Agarwala/J%20Comput%20Biol%201997%20Agarwala.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="R Agarwala"/></rdf:_1><rdf:_2><swrc:Person swrc:name="S Batzoglou"/></rdf:_2><rdf:_3><swrc:Person swrc:name="V Danc{\&#039;\i}k"/></rdf:_3><rdf:_4><swrc:Person swrc:name="S E Decatur"/></rdf:_4><rdf:_5><swrc:Person swrc:name="S Hannenhalli"/></rdf:_5><rdf:_6><swrc:Person swrc:name="M Farach"/></rdf:_6><rdf:_7><swrc:Person swrc:name="S Muthukrishnan"/></rdf:_7><rdf:_8><swrc:Person swrc:name="S Skiena"/></rdf:_8></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"><title>An Eulerian path approach to DNA fragment assembly</title><link>http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino</link><dc:creator>dzerbino</dc:creator><dc:date>2007-09-17T20:19:41+02:00</dc:date><dc:subject>meningitidis, lactis Neisseria Campylobacter Theoretical, Algorithms, Alignment, Analysis, Contig jejuni, Genome, Models, Software, Mapping, Lactococcus Sequence Bacterial, DNA, </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;P A &lt;a href=&#034;http://www.bibsonomy.org/author/Pevzner&#034;&gt;Pevzner&lt;/a&gt;  and H &lt;a href=&#034;http://www.bibsonomy.org/author/Tang&#034;&gt;Tang&lt;/a&gt;  and M S &lt;a href=&#034;http://www.bibsonomy.org/author/Waterman&#034;&gt;Waterman&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Proc Natl Acad Sci USA&lt;/em&gt;&lt;em&gt;98(17):9748--53&lt;/em&gt;&lt;em&gt;Aug 2001. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/meningitidis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/lactis"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Neisseria"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Campylobacter"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Theoretical,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Algorithms,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Alignment,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Analysis,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Contig"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/jejuni,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Genome,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Models,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Software,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Mapping,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Lactococcus"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Sequence"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/Bacterial,"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/DNA,"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/285853a4fe7db3494508d6631d10f55ca/dzerbino"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Mon Sep 17 20:19:41 CEST 2007</swrc:date><swrc:journal>Proc Natl Acad Sci USA</swrc:journal><swrc:month>Aug</swrc:month><swrc:number>17</swrc:number><swrc:pages>9748--53</swrc:pages><swrc:title>An Eulerian path approach to DNA fragment assembly</swrc:title><swrc:volume>98</swrc:volume><swrc:year>2001</swrc:year><swrc:keywords>meningitidis, lactis Neisseria Campylobacter Theoretical, Algorithms, Alignment, Analysis, Contig jejuni, Genome, Models, Software, Mapping, Lactococcus 
Sequence Bacterial, DNA, </swrc:keywords><swrc:abstract>For the last 20 years, fragment assembly in DNA sequencing followed the &#034;overlap-layout-consensus&#034; paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical &#034;overlap-layout-consensus&#034; approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old &#034;repeat problem&#034; in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the Celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="98/17/9748" swrc:key="pii"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="11504945" swrc:key="pmid"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="17" swrc:key="issue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Department of Computer Science and Engineering, University of California, San Diego, La Jolla, USA." 
swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="papers://055852FE-1648-42FE-91D0-8CA474D2B905/Paper/p15" swrc:key="uri"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="file://localhost/Users/danielzerbino/Documents/Papers/2001/Pevzner/Proc%20Natl%20Acad%20Sci%20USA%202001%20Pevzner.pdf" swrc:key="url"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1073/pnas.171285098" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="P A Pevzner"/></rdf:_1><rdf:_2><swrc:Person swrc:name="H Tang"/></rdf:_2><rdf:_3><swrc:Person swrc:name="M S Waterman"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item></rdf:RDF>