-
[
International Worm Meeting,
2013]
Since its first release in 2001, WormBase
(http://www.wormbase.org) has grown from a small database serving the specific research community of a single species to a resource encompassing the breadth of the nematode phylum and serving as a fundamental tool for broader biomedical and agricultural research. We now include the genome sequences of over twenty nematode species, around half of which are parasitic worms implicated in animal or plant disease. We have begun to engage directly with parasitic nematode research communities, and have recently collaborated on the annotation of a new version of the Brugia malayi reference genome, manually curating around one fifth of the gene models. We also continue to scale up our literature curation workflows and develop our phenotype, life-stage and anatomy ontologies to apply them beyond the Caenorhabditis genus. In parallel with this diversification, we continue to increase the depth and detail of information for C. elegans. We have adopted new approaches to how we represent and analyse genomic variation data in response to continued growth in whole-genome sequencing of mutant and wild isolate strains; we have enriched our data sets of gene expression, transcriptional regulation, pathways, and interactions; and we now provide both predicted and experimentally confirmed associations between C. elegans genes and human disease genes, collaborating with other model organism databases in the adoption of a common ontology for human disease. Finally, continued expansion of the resource, both in breadth and depth, has motivated us to completely rethink how data is presented to and accessed by users. The new WormBase website, demonstrated in 2011, is now in full production, resulting in greater speed, stability and flexibility. We are now pleased to introduce our new platform for data mining, WormMine.
Conceptually similar to our previous data-mining tool (WormMart), WormMine allows custom queries to be composed from pre-prepared query templates, greatly improving the speed and ease with which users can obtain the precise data they need.
-
[
International Worm Meeting,
2013]
Since 2011, the number of nematode species represented in WormBase has increased by more than half, from 12 to 19. These species can be divided into close relatives of Caenorhabditis elegans and parasitic nematodes. In addition, the in-depth curation of a limited set of core species has been extended, with the support of the wider research community, to include Brugia malayi, a clade III nematode and one of the causative agents of lymphatic filariasis in humans. Support for multiple alternative reference genomes for a single species (such as the Ascaris suum somatic and germline assemblies) has been added. To support the extension of WormBase to non-Caenorhabditis species and provide a standardised nomenclature across nematodes, we have also extended our gene-naming service to include B. malayi and helped to provide a first-pass gene nomenclature based on a combination of publications and predictions through orthologs. Another development focus was the mapping procedure for RNAi and expression probes, which now allows for more accurate mappings in a species-agnostic way. WormBase's use of RNASeq data has also increased: mappings of RNASeq libraries deposited in the Short Read Archive are not only provided as BAM files, but are also post-processed and used in the curation process, providing information on TSL sites, splicing, expression asymmetry, polycistronic transcripts, and tissue- and life-stage-specific expression levels (available through SPELL). This is supplemented by the integration of sequence features from modENCODE data, such as transcription factor, RNA polymerase II and histone binding sites, as well as genelets and transcripts.
-
[
International Worm Meeting,
2013]
We continue to see growth in the volume and diversity of nematode genomic variation data, in large part due to increasing research effort in whole-genome sequencing (WGS) of numerous C. elegans mutant and wild-isolate strains. WormBase have responded to the challenges presented by this growth by making changes to the way in which we curate, store and display variation data. One significant change has been to distinguish more clearly between naturally occurring polymorphisms and laboratory-induced mutations at the display level. These are now shown in two separate tables on Gene Summary pages, with laboratory-induced alleles identified by the allele designation of the laboratory of origin, and naturally occurring polymorphisms identified by WormBase variation accessions. We have also begun to consolidate redundant data from independent wild-isolate sequencing projects. Previously, if a specific molecular variation had been identified by multiple independent projects, and/or in multiple strains, a separate variation object would have been created for each. Now, a single reference variation is created which cross-references all studies that have characterised that variation and all strains that carry it. A new version of the WormBase website was launched in early 2012, and we continue to refine and improve the display of variation data. Coloured fields in the Strain widget on the Variation Summary Page clearly show which strains carry a variation and whether the strain is available from the CGC. The Gene Summary Page now allows customisation of the way Variations are viewed; both variation tables can be sorted by various properties, including type of molecular change, effect on the protein, and the number of associated phenotypes.
We have also increased the complement of variation tracks on the genome browser, clearly separating classical alleles from those generated by large-scale sequencing projects, and creating additional tracks for single-nucleotide variants that confer a putative change-of-function on a protein. We continue to refine the presentation of this data, and welcome feedback from the C. elegans research community.
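The consolidation of redundant variation records described above can be illustrated with a minimal Python sketch. It is not WormBase's implementation: the record fields, the class and function names, and the choice of (chromosome, position, reference allele, alternate allele) as the merge key are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One report of a molecular change, by one study, in one strain (hypothetical record)."""
    chromosome: str
    position: int
    ref: str
    alt: str
    strain: str
    study: str


@dataclass
class ReferenceVariation:
    """A single consolidated variation that cross-references studies and strains."""
    chromosome: str
    position: int
    ref: str
    alt: str
    strains: set = field(default_factory=set)
    studies: set = field(default_factory=set)


def consolidate(observations):
    """Merge observations of the same molecular change into one reference variation.

    Assumption: two observations describe the same variation when chromosome,
    position, reference allele and alternate allele all match.
    """
    merged = {}
    for obs in observations:
        key = (obs.chromosome, obs.position, obs.ref, obs.alt)
        if key not in merged:
            merged[key] = ReferenceVariation(*key)
        # Record every strain that carries the change and every study that reported it.
        merged[key].strains.add(obs.strain)
        merged[key].studies.add(obs.study)
    return list(merged.values())
```

Under these assumptions, the same change reported by two projects in two strains yields one reference variation listing both strains and both studies, rather than two separate variation objects.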
-
Berriman, Matthew, Sternberg, Paul, Rodgers, Faye, Le, Tuan, Howe, Kevin, Bazant, Wojtek
[
International Worm Meeting,
2019]
WormBase ParaSite
(http://parasite.wormbase.org) is a sister site to WormBase that aims to integrate, organise and present data for nearly all parasitic nematode and platyhelminth genomes. Five years on from the start of the project, we now include data for over 150 genomes, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), and comparative analysis (e.g. orthologues and paralogues). We provide several ways to explore the data, including genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, a choice of genome browsers, and a programmatic interface. We will describe recent updates to WormBase ParaSite. In particular, WBPS13 (released in spring 2019) includes a major update: a gene expression sub-portal. This has been underpinned by three key components of work: (a) curation of the majority of published helminth RNASeq studies, inferring the experimental design and standardising sample labels and descriptions; (b) quantification of samples on a per-gene and per-transcript basis using established tools; and (c) for selected studies, differential expression analysis. The results are made available via a number of new views in WormBase ParaSite, including a per-genome study explorer, and per-gene summaries of expression across all studies.
-
Paulini, Michael, Howe, Kevin, Williams, Gary, Rodgers, Faye, Davis, Paul, Russell, Matthew
[
International Worm Meeting,
2019]
WormBase
(http://www.wormbase.org) is a central data repository for genetic and genomic data relating to C. elegans and other well-studied nematodes. WormBase is also a founding member of the Alliance of Genome Resources (https://www.alliancegenome.org), an effort to align data representation and curation workflows between the major model organism databases (MODs). We will present an update on the curation of nematode genomes outside the Caenorhabditis genus, demonstrate how we use data-mining to add information from current and historical publications, and show examples of the use and benefit to the nematode community. The genome of Trichuris muris has been added by WormBase to act as a representative reference genome for the Blaxter clade I group. WormBase have performed first-pass annotation of the genome and manual curation of selected gene structures, with the aim of helping with annotation efforts for closely related Trichuris species. We have also added an updated version of the Pristionchus pacificus genome. This represents the start of a formal collaboration between WormBase and pristionchus.org on the joint maintenance and curation of the reference annotation. Finally, the curation of the Brugia malayi genome is ongoing, and reference annotation provided by WormBase is being used by research groups to analyse the evolution of filarial nematodes and the diseases they cause.
-
Schedl, Tim, Paulini, Michael, Howe, Kevin, Williams, Gary, Davis, Paul, Russell, Matthew
[
International Worm Meeting,
2019]
WormBase
(http://www.wormbase.org) is a central data repository for nematode biologists and scientists, enabling experimental research in C. elegans. WormBase is also a founding member of the Alliance of Genome Resources (https://www.alliancegenome.org), an effort to align data representation and curation workflows between the major model organism databases (MODs). We will describe how WormBase acquires Strain and Variant data from a variety of sources, keeps up to date and feeds improvements back to third parties. One particular change has been to preserve all strain data in perpetuity, so as to not lose information about historical Strains. A project that we will be undertaking in the near future is to formally accession Strains to allow for inter-resource stability and reliability. This is particularly important for the representation of cross-species strain and population data in the Alliance of Genome Resources. Two of the primary sources of data are the CGC (Caenorhabditis Genetics Center) and CeNDR (Caenorhabditis elegans Natural Diversity Resource). WormBase uses a variety of techniques to distribute the data from these centers and advertise the availability of Strains from these resources. We will describe the status of C. elegans variation data, summarising the different types and the completeness of information from these data types. Common representation of variation data between the MODs is an important project for the Alliance of Genome Resources, and we will provide an update of progress and plans in that area. Finally, nomenclature is a key area within WormBase and the C. elegans community as a whole. For decades the Worm Community has been leading the way, having formal nomenclature for genes, variations and strains for all labs actively working long-term in the field. We will summarise some of the aspects of this activity and how the various processes work.
-
Williams, Gary, Down, Thomas, Bolt, Bruce, Howe, Kevin, Paulini, Michael, Davis, Paul, Lomax, Jane
[
International Worm Meeting,
2015]
WormBase is the major public online database resource for the Caenorhabditis research community. The database was developed primarily for the nematode C. elegans but has expanded to host genomes and biological data from closely related nematodes and nematodes of agricultural or medical significance. WormBase has a team of curators who are responsible for gathering, incorporating and curating a large collection of scientific data types from a variety of sources. One area of curation targets sequence-based data types, which are used to confirm gene structures and their products or to allow curators to improve their quality. Within this area there has been significant manual curation of the genes of Caenorhabditis elegans, with "topic"-based bursts of curation on other core species and user-driven, responsive curation of species not currently in primary focus. This curatorial activity draws on the large amount of data that has been incorporated to give as complete a representation of the gene as possible. Often a curator will have to deal with conflicting or problematic data to resolve the correct structure. This poster illustrates some of the data that has been incorporated, interesting observations made by curators, and some of the challenges faced by curators and MODs in general.
-
Down, Thomas, Bolt, Bruce, Paulini, Michael, Howe, Kevin, Lomax, Jane, Davis, Paul, Williams, Gareth
[
International Worm Meeting,
2015]
WormBase currently uses a well-established tool-set for curation of gene structures and other sequence features on nematode reference genomes. This comprises the AceDB Fmap software, augmented by our own graphical curation tool that continues to be developed as we incorporate new types of data into the curation workflows. Although this system has worked extremely well for a number of years, it is limited in two ways: (a) it does not support the direct visualisation of external supporting evidence tracks; and (b) it does not support direct contribution of curation effort from scientists outside of WormBase. We have recently embarked on a major internal project to migrate our complete database and curation infrastructure to more modern, scalable, cloud-oriented solutions. As part of this project, we have been piloting the use of WebApollo, a plug-in for the JBrowse genome browser which supports visualisation of external supporting evidence tracks and distributed (i.e. community) curation. Here, we summarise the work done in migrating WormBase curation workflows to WebApollo and the challenges that remain, and propose a prototype working model for WormBase-facilitated community annotation of gene structures in nematode genomes.
-
Berriman, Matt, Howe, Kevin, Sternberg, Paul, Stein, Lincoln, Kersey, Paul, Harris, Todd, Schedl, Tim
[
International Worm Meeting,
2015]
WormBase has existed for 15 years and has evolved in many ways. The new website is fully operational and has made the process of adding new data types, displays, and tools easier. Behind the scenes we are piloting an overhaul of the underlying database infrastructure to allow us to handle the ever increasing data, have the website perform faster, and allow more frequent updates of information. This is a critical time for the project, as we face considerable pressure from two directions. The first is that our funders really want us to do more with less. We are responding to this by leading the way in making curation (the process of extracting information from papers and data sets into computable form) more efficient using a new version of Textpresso (to be released later this calendar year); by discussing with other model organism information resources ways to work together to be more efficient and inter-connected; and by seeking additional sources of funding. The second, delightful, pressure is an increase in data and results generated by the C. elegans and nematode communities. While we are handling this increase by changes in our software for curation, the database infrastructure, and the website, we do need your help. Many of you have helped us over the last few years to identify data in your papers or by sending us data directly. We now need you to help with a few types of information by submitting the data via specially designed, user-friendly forms that ensure good quality and the use of standard terminology. In particular, we have a large backlog of uncurated information associating alleles with phenotypes. We pledge to make this process as painless as possible, and to improve WormBase's description of phenotypes with your feedback, starting at this meeting at the WormBase booth, workshops and posters. 
With your help, continual improvement of our efficiency, and additional sources of funding, we are optimistic that we can do much more with even somewhat less effort. Consortium: Paul Davis, Michael Paulini, Gary Williams, Bruce Bolt, Thomas Down, Jane Lomax, Todd Harris, Sibyl Gao, Scott Cain, Xiaodong Wang, Karen Yook, Juancarlos Chan, Wen Chen, Chris Grove, Mary Ann Tuli, Kimberly Van Auken, D. Wang, Ranjana Kishore, Raymond Lee, John DeModena, James Done, Yuling Li, H.-M. Mueller, Cecilia Nakamura, Daniela Raciti, Gary Schindelman.
-
Howe, Kevin, Davis, Paul, Bolt, Bruce, Down, Thomas, Paulini, Michael, Lomax, Jane, Williams, Gary
[
International Worm Meeting,
2015]
Debilitating disease in humans caused by parasitic worms (helminths) is estimated to be equivalent to the loss of at least fifty million productive years of life, whilst agricultural losses can be measured in hundreds of millions of dollars. As international efforts to sequence helminth genomes and transcriptomes gather pace, a systematic approach to the curation, integration and presentation of this data is needed to provide maximum utility for biomedical and agricultural research. WormBase
(http://www.wormbase.org) has approached this problem from two directions. Firstly, we have expanded our sequence curation activities to include selected parasitic nematodes with defined research communities, namely: Brugia malayi (causative agent of lymphatic filariasis), Onchocerca volvulus (river blindness), and Strongyloides ratti (rat model for strongyloidiasis). We now administer the reference genomes for these species, and actively curate the gene models and other annotations. Secondly, we have created a sub-portal, WormBase ParaSite
(http://parasite.wormbase.org), aimed at researchers engaged in parasitic worm genomics. WormBase ParaSite encompasses flatworms as well as nematodes, and provides genome sequence, genome browsers, semi-automatic annotation and comparative genomics data for approximately one hundred species. Additional tools include a cross-species data-mining platform, protein and nucleotide sequence search, and a variant effect predictor to enable the analysis of different strain/isolate genomes in the context of the reference.