Search pages
- Proteins: search for proteins with certain domains or domain arrangements in the following species: C. elegans, C. briggsae, C. remanei, C. brenneri, C. japonica, P. pacificus, B. malayi, O. volvulus and S. ratti.
- Mutations: find and display the location of mutations in C. elegans genes of interest or identify mutations affecting particular protein domains or splice variants.
- Genes: search for and display domain organization, phenotype, expression, Gene Ontology terms and map position of C. elegans genes.
- Expression: search for genes with a particular expression profile or show the expression profiles of a list of genes of interest.
- Compare: a simple interface to compare two sets of genes to identify common and unique genes in the sets.
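The comparison offered by the Compare page can be pictured as plain set arithmetic. The following is a minimal sketch, not the site's actual implementation; the function name `compare_gene_sets` and the output keys are hypothetical.

```python
def compare_gene_sets(set_a, set_b):
    """Return genes common to both inputs and genes unique to each one."""
    a, b = set(set_a), set(set_b)
    return {
        "common": sorted(a & b),   # present in both lists
        "only_a": sorted(a - b),   # present only in the first list
        "only_b": sorted(b - a),   # present only in the second list
    }


result = compare_gene_sets(
    ["rig-6", "unc-54", "hlh-1"],
    ["hlh-1", "tbx-37", "rig-6"],
)
print(result)
```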
Search fields
Protein domains
Protein domain(s)
Enter one or more of the domain abbreviations from the table of abbreviations or SMART (or Pfam) identifiers. Comma separated entries are combined for an 'or' search.
Domain pattern
Use this field to search for certain domain combinations. Think of a protein as a linear sequence of domains like 'IG IG IG FN3 FN3 TM 324'. Domain abbreviations in this field (but not in the domain field) are case-sensitive! You can use wildcards and numbers (but no boolean terms) in the following way:
- 'IG IG IG' means three IGs in a row with no other domain in between.
- 'IG*FN3' is IG eventually followed by FN3 (* acting as wildcard for 0 or more domains of any kind between the specified domains).
- '3IG' will be expanded to 'IG IG IG', i.e. 3 IG domains in a row. Make sure you don't have any space between the number and the domain abbreviation.
- '3IG%' means 'IG*IG*IG', i.e. three IGs potentially with other domains interspersed (% simply acting as * wildcards during expansion of the search term).
- '3-5IG' not surprisingly means 'IG IG IG' or 'IG IG IG IG' or 'IG IG IG IG IG'.
- '<3IG FN3' is translated as 'IG FN3' or 'IG IG FN3', i.e. at least one, but not more than 2 followed by ...
- '>3IG FN3' is actually 'IG IG IG IG*FN3', i.e. at least 4 IGs and potentially more (but I'm not really checking) followed by ...
- and of course you can combine everything.
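To make the shorthand rules above concrete, here is a small sketch that expands a domain pattern into the plain patterns it stands for. It is an illustration written from the rules listed above, not the code the site runs. The names `TOKEN`, `expand_token` and `expand_pattern` are hypothetical, and it assumes that domain abbreviations start with a letter and that '*' is written without surrounding spaces.

```python
import itertools
import re

# Optional count prefix ('3', '3-5', '<3', '>3'), a domain name, optional '%'.
TOKEN = re.compile(r"(?:([<>])?(\d+)(?:-(\d+))?)?([A-Za-z]\w*)(%)?")


def expand_token(token):
    """Expand one token into (text, open_ended) alternatives."""
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"cannot parse token: {token!r}")
    op, low, high, name, percent = m.groups()
    joiner = "*" if percent else " "      # '%' puts a wildcard between copies
    if low is None:
        counts, open_ended = [1], False
    elif op == "<":
        counts, open_ended = range(1, int(low)), False   # 1 .. n-1 copies
    elif op == ">":
        counts, open_ended = [int(low) + 1], True        # n+1 copies, maybe more
    else:
        counts, open_ended = range(int(low), int(high or low) + 1), False
    return [(joiner.join([name] * n), open_ended) for n in counts]


def expand_pattern(pattern):
    """Return every plain pattern (names, spaces, '*') the input stands for."""
    parts = re.split(r"(\s+|\*)", pattern.strip())
    tokens, seps = parts[0::2], parts[1::2]
    results = []
    for combo in itertools.product(*(expand_token(t) for t in tokens)):
        text = combo[0][0]
        for prev, sep, cur in zip(combo, seps, combo[1:]):
            # After a '>' token more copies may follow, so the gap becomes '*'.
            glue = "*" if (sep == "*" or prev[1]) else " "
            text += glue + cur[0]
        results.append(text)
    return results


for example in ["3IG", "3IG%", "3-5IG", "<3IG FN3", ">3IG FN3"]:
    print(example, "->", expand_pattern(example))
```

The sketch leaves a trailing '>' token (one with nothing after it) as a plain run of copies, since the rules above only describe the case where another domain follows.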
Suppress gaps smaller than ...
Any part of the protein not assigned to a domain and smaller than the number of amino acids given in this field is not displayed. Effectively this suppresses small gaps between adjacent domains due to predicted domains often being slightly smaller than the actual domains. The default of 30 amino acids comes from the idea that anything smaller than 30 amino acids is too small to be a real domain (i.e. an independently self-folding structure).
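The effect of this field can be sketched as follows. This is only an illustration of the rule described above; the function name `visible_segments`, the tuple layout and the example coordinates are assumptions made for the example, not the site's data model.

```python
def visible_segments(domains, protein_length, min_gap=30):
    """List domains plus unassigned stretches of at least min_gap residues.

    domains: (name, start, end) tuples, 1-based inclusive, non-overlapping
    (an assumed representation).
    """
    segments, position = [], 1
    for name, start, end in sorted(domains, key=lambda d: d[1]):
        # Unassigned stretch between the previous domain and this one.
        if start - position >= min_gap:
            segments.append(("gap", position, start - 1))
        segments.append((name, start, end))
        position = end + 1
    # Unassigned stretch after the last domain.
    if protein_length - position + 1 >= min_gap:
        segments.append(("gap", position, protein_length))
    return segments


# Made-up coordinates: only the 89-residue stretch is long enough to be shown.
shown = visible_segments(
    [("IG", 25, 110), ("IG", 120, 210), ("FN3", 300, 390)],
    protein_length=400,
)
print(shown)
```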
Genes
Gene/sequence name(s)
Enter one or more gene names here. You can separate gene names by space, comma or newline characters. You can mix gene names (rig-6) with sequence names (C33F10.5). Names will be stripped of splice designations. You can paste large lists of genes (i.e. several thousand) in here.
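As a rough picture of how such input might be normalized, the sketch below splits a pasted list on spaces, commas and newlines and strips splice designations. It assumes that a splice designation is a single lowercase letter appended to a sequence name (for example 'C33F10.5a'), and it also drops duplicates, which the description above does not state. The names `SPLICE` and `parse_gene_list` are hypothetical.

```python
import re

# Assumed form of a splice variant: sequence name plus one trailing lowercase letter.
SPLICE = re.compile(r"^([A-Za-z0-9_]+\.\d+)[a-z]$")


def parse_gene_list(text):
    """Split on spaces, commas and newlines; strip splice designations."""
    names = []
    for raw in re.split(r"[\s,]+", text.strip()):
        if not raw:
            continue
        m = SPLICE.match(raw)
        names.append(m.group(1) if m else raw)
    # Dropping duplicates (order preserved) is an assumption of this sketch.
    return list(dict.fromkeys(names))


genes = parse_gene_list("rig-6, C33F10.5a\nC33F10.5b unc-54")
print(genes)
```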
Allele(s)
Enter one or more allele names here.
Chromosome
Specify a chromosomal region either as a genetic or a physical interval.
Orthologs
Ortholog names
Enter one or more ENSEMBL IDs here (comma separated). The database contains human, mouse, Drosophila and yeast (S. cerevisiae) homologs of C. elegans genes. Orthologs were assembled from ortholog lists provided by Wormbase, which generally contain only one ortholog per species.
Gene Ontology
Gene Ontology terms
Enter one or more Gene Ontology terms. Note that this is a highly specialized vocabulary. Substitute SPACE with _ (underscore) for multi-word terms. Auto-suggest/auto-complete might be helpful here - it only offers the 2570 terms actually used for C. elegans proteins. You can use boolean search logic as well as wildcards here (in fact you don't even have to use wildcards, just use 'receptor' for all terms containing 'receptor'). Note that for all species other than C. elegans, GO terms are currently mainly derived from protein sequence information.
Gene Ontology IDs
If you are an expert on Gene Ontology terms, you might find it easier to enter a list of GO term identifiers in this field instead of using the more verbose term search field.
Phenotype, Description, Expression
Gene description full text search
Searches the 'concise description' that comes with the gene. Multi-word search terms like 'body wall muscle' are currently translated as 'body OR wall OR muscle'.
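The practical consequence of the OR translation is that a description containing any one of the words is returned. The sketch below illustrates this with a simple whole-word match. The matching rule and the names `to_or_query` and `matches` are assumptions for illustration; the actual full text search may tokenize differently.

```python
def to_or_query(term):
    """Rewrite a multi-word term the way the search field is described to."""
    return " OR ".join(term.split())


def matches(description, term):
    """True if any word of the term occurs in the description (assumed rule)."""
    words = description.lower().split()
    return any(word.lower() in words for word in term.split())


query = to_or_query("body wall muscle")
print(query)
# A description that mentions only 'wall' still matches the three-word term.
print(matches("required for cell wall integrity", "body wall muscle"))
```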
Phenotypes
Uses the phenotype vocabulary from Wormbase.
Expression pattern full text search
Use plain English and be aware of the complications (searching for 'muscle' might also return '... and was not expressed in muscle').
Expression profiles (life stages)
Data
Expression values are given in DCPM (depth of coverage per million reads of length 35). To convert to the more common RPKM (reads per kilobase per million), multiply by 1000/35. For more details, see here. Reads were parsed against the WS220 gene models; regular gene names (but not gene models!) were updated using WS250.
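The conversion stated above follows from the read length: each read contributes 35 bases of coverage, so dividing depth by 35 gives reads per base and multiplying by 1000 gives reads per kilobase. A minimal sketch (the function name `dcpm_to_rpkm` is hypothetical):

```python
READ_LENGTH = 35  # read length in bases, as stated for these data


def dcpm_to_rpkm(dcpm, read_length=READ_LENGTH):
    """Convert depth of coverage per million reads to RPKM."""
    return dcpm * 1000 / read_length


print(dcpm_to_rpkm(7.0))  # 7 * 1000 / 35 = 200.0
```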
The times are in minutes of embryonic development from the first cell division, scaled to the times given in Sulston et al. (1983). The "sterile adult" sample is from synchronized spe-9(hc88) L1 worms cultured at 23°C for 8 days after L4 molt. The spe-9(hc88) animals have defective sperm and fail to fertilize oocytes. The "L4 soma" sample is from glp-1 mutant animals, which were used to generate L4 animals lacking germ cells.
Graphs
The line segments in the right-hand graph show the weighted average of the individual samples for the post-embryonic series, with the individual samples shown as points. The dauer time course is superimposed on the hermaphrodite stages, as are the soma, dissected gonad and L4 male samples. The sterile sample is from older spe-9 mutant adults, but is plotted with the young adults for simplicity.
The line segments on the left-hand graph show the unified expression values derived from 4 different time series (see here for a more complete description of the methods) with the data points from the individual series shown as different colored points. The polyA series was collected selecting for polyA+ message and the 0223, 0411 and 0419 series were done using ribozero to deplete the sample of rRNA. As a result these latter series include polyA- mRNA, most noticeably histone messages. Also, for reference, a sample from 250 hand-picked 4-cell embryos is included here, but was not used to derive the unified plot. The times are in minutes of development from the first cell division, scaled to the times given in Sulston et al. (1983).
Expression profiles (tissues and cell types)
Data
Cell types were identified in a single cell RNA-seq experiment that profiled the transcriptomes of synchronized L2 worms. This was accomplished by performing t-SNE dimensionality reduction, identifying clusters using the density peak algorithm, and then identifying the cell type each cluster corresponded to based on expression of marker genes from the literature. Data for single cells of a given cell type are then aggregated to give a consensus expression profile. Expression values are given in transcripts per million (TPM).
Graphs
Bar plots show the mean expression of the gene in a given cell type. Error bars show the 95% confidence interval for the estimate of the mean. Grey bars labeled with an asterisk indicate expression level estimates that are based on a very small number of reads, for which a confidence interval cannot be estimated.
Expression profiles (embryonic tissues and cell lineages)
Data
FACS was used to isolate fluorescently labeled tissues/cell lineages at discrete time points during embryonic development from synchronized populations of embryos. The initial time point was ~120 minutes after embryo isolation, about halfway through gastrulation. The last time point (~480 minutes after isolation) was approximately when the embryo reached the three-fold stage, when most cells are beginning terminal differentiation, but before the cuticle forms. For each tissue/cell lineage, 2 replicates were obtained, with the exception of the ABa lineage (tbx-37) and muscle (hlh-1), which had 3 and 4 replicates respectively. RNA-seq libraries were generated from rRNA-depleted RNA samples, and PCR duplicates were removed prior to gene expression calculation. Gene expression values are in Transcripts Per Million (TPM).
Graphs
Each tissue/cell lineage is represented by its replicates (open symbols) at each time point, along with an average of the replicates (closed circles). Time points are labeled 0 through 4 with timing as follows: 0 (120 min), 1 (210 min), 2 (300 min), 3 (390 min), and 4 (480 min).
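The time point labels and the replicate average can be sketched as below. The label-to-minute mapping is taken from the text above; the use of a plain arithmetic mean, the function name `replicate_averages` and the TPM values are assumptions for illustration.

```python
from statistics import mean

# Time point label -> minutes after embryo isolation (from the text above).
TIME_POINT_MINUTES = {0: 120, 1: 210, 2: 300, 3: 390, 4: 480}


def replicate_averages(replicates):
    """Average replicates per time point.

    replicates: one list of TPM values per replicate, ordered by time
    point label 0 through 4 (an assumed layout).
    """
    return {
        TIME_POINT_MINUTES[label]: mean(values)
        for label, values in enumerate(zip(*replicates))
    }


# Made-up TPM values for two replicates of one tissue.
averages = replicate_averages([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
])
print(averages)
```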