CTD is a robust, publicly available database that aims to advance understanding about how
environmental exposures affect human health. It provides manually curated information
about chemical–gene/protein interactions, chemical–disease and gene–disease relationships.
These data are integrated with functional and pathway data to aid in development of
hypotheses about the mechanisms underlying environmentally influenced diseases.
We also have additional ongoing projects involving manual curation of exposome data
and chemical–phenotype relationships to help identify pre–disease biomarkers resulting
from environmental exposures.
The initial release of CTD was on November 12, 2004. We’re grateful for our strong community support and encourage
you to give us feedback so we can continue to evolve with your research needs.
This program is supported by funds from the National Institute of Environmental Health Sciences:
- R01ES014065, Comparative Toxicogenomics Database
- R01ES019604, Generation of a centralized and integrated resource for exposure data
- R01ES023788, Advancing mechanism–based studies with cross–species chemical–phenotype data
- R01ES019604-04S1, Integrating Big Data and curated literature to advance discoveries about disease
We’re also proud to be part of the NIEHS Environmental Health Science Center at NC State, the
Center for Human Health and the Environment (P30ES025128).
CTD integrates a chemical subset of the Medical Subject
Headings (MeSH®), the hierarchical vocabulary from the
U.S. National Library of Medicine. You can view diverse information about chemicals, including chemical
structures, curated interacting genes and proteins, curated and inferred disease relationships, and enriched
pathways and functional annotations. You can
browse chemicals, or use them to formulate
chemical–gene interaction, or
CTD's MEDIC disease vocabulary is a modified subset
of descriptors from the “Diseases” category of the U.S. National Library of Medicine (NLM)
Medical Subject Headings (MeSH®),
combined with genetic disorders from the Online Mendelian Inheritance in Man® (OMIM®) database.
CTD biocurators mapped OMIM diseases to terms within the hierarchical MeSH disease vocabulary to expand our
representation. This combined vocabulary is used to curate gene–disease and chemical–disease
associations. You can browse diseases, or use them to
formulate gene or
The CTD cross-species gene vocabulary (symbols, names,
and synonyms) is derived from the Gene database at the National
Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine. You can view
diverse information about genes, including curated interacting chemicals, curated and inferred disease
relationships, and associated pathways and functional annotations. You can
browse genes, or access them using the Keyword
search or by formulating advanced queries.
At CTD, we distinguish between diseases and phenotypes, wherein a phenotype refers to a non-disease-term
biological event: e.g., abnormal cell cycle arrest (phenotype) vs. lung cancer (disease),
increased fat cell differentiation (phenotype) vs. obesity (disease), decreased spermatogenesis (phenotype) vs.
male infertility (disease), etc. CTD uses the Gene Ontology (GO) as a source of phenotype vocabulary
terms for biological outcomes. All GO terms have comprehensive definitions and stable accession
identifiers, the latter of which allows GO annotations to act as a nexus to connect, integrate,
and harmonize knowledge from domains curated across a variety of databases. You can
browse Gene Ontology
terms directly, or access them through the Keyword search, or you can perform
Chemical-Phenotype Interaction queries.
- Chemical–Gene/Protein Interactions
To improve understanding about the mechanisms of chemical actions, we manually curate
chemical–gene and –protein interactions in vertebrates and invertebrates from the published
literature. These interactions are both direct (e.g., “chemical binds to protein”) and indirect
(e.g., “chemical results in increased phosphorylation of a protein” via intermediate events).
We curate interactions using a hierarchical
interaction-type vocabulary that
characterizes common physical, regulatory, and biochemical interactions between chemicals and genes or proteins.
This vocabulary comprises 70 terms including actions (e.g., “binds to”, “imports”),
operators that describe the degree of a chemical's effect (e.g., “increases”),
and qualifiers that specify the form of the gene or chemical involved in an interaction (e.g.,
“protein” or “chemical metabolite,” respectively).
You can search chemical–gene interactions directly via the
chemical–gene interaction query,
or access them via a gene, chemical, disease, or reference.
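As a rough illustration of how an interaction built from this vocabulary might be represented, here is a minimal Python sketch. The class name `Interaction`, the `problems` helper, and the three small term sets are hypothetical; the sets contain only the example terms quoted above, not the full 70-term vocabulary, and nothing here reflects CTD's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical term sets: only the examples quoted in the text above.
ACTIONS = {"binds to", "imports"}
OPERATORS = {"increases"}
GENE_FORMS = {"protein"}


@dataclass(frozen=True)
class Interaction:
    chemical: str
    action: str                      # what happens, e.g. "binds to"
    gene: str
    operator: Optional[str] = None   # degree of the chemical's effect, if any
    gene_form: str = "protein"       # qualifier for the form of the gene


def problems(ix: Interaction) -> List[str]:
    """Return a list of terms in `ix` that are not in the sketch vocabulary."""
    found = []
    if ix.action not in ACTIONS:
        found.append(f"unknown action: {ix.action!r}")
    if ix.operator is not None and ix.operator not in OPERATORS:
        found.append(f"unknown operator: {ix.operator!r}")
    if ix.gene_form not in GENE_FORMS:
        found.append(f"unknown gene form: {ix.gene_form!r}")
    return found


# Placeholder chemical and gene names, for illustration only.
direct = Interaction("chemical X", "binds to", "GENE1")
indirect = Interaction("chemical X", "imports", "GENE1", operator="increases")
bad = Interaction("chemical X", "teleports", "GENE1")
```

With these definitions, `problems(direct)` and `problems(indirect)` return empty lists, while `problems(bad)` reports the unknown action.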
- Gene–Disease Associations
CTD contains curated and inferred gene–disease associations.
Curated gene–disease associations are extracted from the published literature by CTD biocurators,
or are derived from the OMIM database using the mim2gene file from the
NCBI Gene database. Inferred associations (see
figure) are established via CTD–curated chemical–gene interactions (e.g., gene A is associated with
disease B because gene A has a curated interaction with chemical C, and chemical C has a curated association with
disease B). Curated and inferred associations are each identified as such, and both help users develop hypotheses
about mechanisms underlying environmental diseases.
Inference scores are calculated for all inferred relationships. These scores reflect the degree of
similarity between CTD chemical–gene–disease networks and a similar scale-free random network. The higher the
score, the more likely the inference network has atypical connectivity. Many biological networks, such as disease
and metabolic networks, have been shown to be scale-free random networks.
The inference score is calculated as the log-transformed product of two common-neighbor statistics used to assess
the functional relationships between proteins in a protein–protein interaction network.
The first statistic takes into account the connectivity of the chemical and disease along with the number of genes
used to make the inference. The second statistic takes into account the connectivity of each of the genes used
to make the inference.
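The inference rule described above (gene A is linked to disease B through a shared chemical C) amounts to a join over two curated relations. The following Python sketch shows that join under simple assumptions: the function name `infer_gene_disease` and the tuple-based inputs are hypothetical, the identifiers are placeholders, and the inference score is deliberately left out because its exact formula is not given here.

```python
from collections import defaultdict


def infer_gene_disease(chem_gene, chem_disease):
    """Infer gene-disease pairs that share at least one chemical.

    chem_gene:    iterable of (chemical, gene) curated interactions
    chem_disease: iterable of (chemical, disease) curated associations
    Returns a dict mapping (gene, disease) to the set of bridging chemicals.
    """
    diseases_by_chem = defaultdict(set)
    for chem, disease in chem_disease:
        diseases_by_chem[chem].add(disease)

    inferred = defaultdict(set)
    for chem, gene in chem_gene:
        # .get avoids adding empty entries for chemicals with no diseases
        for disease in diseases_by_chem.get(chem, ()):
            inferred[(gene, disease)].add(chem)
    return dict(inferred)


# Placeholder identifiers, for illustration only.
chem_gene = [("C1", "G1"), ("C2", "G1"), ("C2", "G2")]
chem_disease = [("C1", "D1"), ("C2", "D1"), ("C3", "D2")]
result = infer_gene_disease(chem_gene, chem_disease)
```

Here `result` links G1 to D1 through chemicals C1 and C2, and G2 to D1 through C2 alone; D2 receives no inferred gene because C3 has no curated gene interaction. Swapping the roles of gene and chemical gives the corresponding chemical–disease case, where the bridging set holds genes.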
- Chemical–Disease Associations
CTD contains curated and inferred chemical–disease associations. Curated
chemical–disease associations are extracted from the published literature by CTD biocurators.
Inferred associations (see figure) are established via CTD–curated chemical–gene
interactions (e.g., chemical A is associated with disease B because chemical A has a curated interaction with gene
C, and gene C has a curated association with disease B). Curated and inferred associations are each
identified as such, and both help users develop hypotheses about mechanisms underlying environmental diseases.
- Chemical–Phenotype Interactions
A CTD chemical-phenotype interaction statement includes 8 types of data (C-Q-E-A-T-M-S-P) annotated using 8
controlled vocabularies, including, at a minimum: C, a chemical from the CTD Chemical Vocabulary; Q, a CTD action
qualifier that reflects the direction of the interaction ("increases," "decreases," or "affects," when not
specified by the authors); E, the entity phenotype from GO; A, an anatomical term from the MeSH "Anatomy [A]" branch;
T, an organism from NCBI Taxonomy; M, a CTD method code (in vivo, in vitro); S, the CTD information source
code (abstract, full text); and P, the article identifier (PMID) from NCBI PubMed. "Not reported" is allowed
for both taxon and anatomy fields if the authors do not provide this information.
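To make the eight-part statement concrete, here is a minimal Python sketch of one possible record type. The class and field names are hypothetical, as are the example values; the only rules encoded are the ones stated above: three allowed qualifiers, and "Not reported" as the fallback for anatomy and taxon.

```python
from dataclasses import dataclass

NOT_REPORTED = "Not reported"
QUALIFIERS = {"increases", "decreases", "affects"}


@dataclass(frozen=True)
class ChemicalPhenotypeStatement:
    chemical: str                  # C: CTD chemical term
    qualifier: str                 # Q: direction of the interaction
    phenotype: str                 # E: GO term
    method: str                    # M: e.g. "in vivo" or "in vitro"
    source: str                    # S: e.g. "abstract" or "full text"
    pmid: str                      # P: PubMed identifier
    anatomy: str = NOT_REPORTED    # A: MeSH anatomy term, if the authors give one
    taxon: str = NOT_REPORTED      # T: NCBI Taxonomy organism, if the authors give one

    def __post_init__(self):
        if self.qualifier not in QUALIFIERS:
            raise ValueError(f"qualifier must be one of {sorted(QUALIFIERS)}")


# Placeholder values, for illustration only (the PMID is not a real article).
stmt = ChemicalPhenotypeStatement(
    chemical="chemical X",
    qualifier="increases",
    phenotype="fat cell differentiation",
    method="in vitro",
    source="abstract",
    pmid="00000000",
)
```

Because anatomy and taxon were omitted, both fields on `stmt` hold "Not reported"; passing a qualifier outside the three allowed values raises `ValueError`.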
Chemical-phenotype content can be accessed using the Keyword Search Box in the upper right-hand corner of
any CTD page by querying either the "Chemical" or "GO" field (from the drop-down pick-list) with a
term-of-interest. A phenotype icon identifies the retrieved matching terms that have chemical-phenotype
associated data. Clicking the icon, or going to the "Chemical Interactions" tab on a respective GO page,
shows all the curated chemical-phenotype interactions in a tabular web-display. Users can sort the
information by clicking on any column header. Any co-mentioned terms (e.g., chemicals, genes,
and other phenotypes) are hyperlinked to their respective CTD pages, allowing users to easily traverse the database.
- Gene–Gene Interactions
CTD represents gene–gene interactions from BioGRID
that consist of genetic and protein interactions curated from primary literature for all major model organisms.
These interactions are available for each gene and reference, and for the inference networks underlying each
chemical–disease association. In addition, you can generate pathways for custom collections of genes using
the Set Analyzer tool.
CTD contains reference articles related to
toxicologically significant vertebrate and invertebrate genes, diseases, and associated chemicals. References were
identified by information retrieval methods, and comprise a subset of
MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.
CTD's hierarchical organism vocabulary consists of the
vertebrate and invertebrate branch of the Taxonomy Database from
the National Center for Biotechnology Information (NCBI), a division of
the U.S. National Library of Medicine. You can
browse organisms, or use them to formulate
- Gene Ontology
Gene Ontology (GO)
annotations are integrated with gene data in CTD. In addition, GO terms that are statistically enriched among
genes/proteins that interact with a chemical are displayed for each chemical. You can
browse GO and use it to formulate
KEGG and REACTOME
pathway data describe known molecular interaction and reaction networks. These data are integrated with chemicals,
genes, and diseases in CTD to provide insights into molecular networks that may be affected by chemicals, and possible
mechanisms underlying environmental diseases.
You can browse pathways, or use them to formulate
chemical–gene interaction queries. Pathway
information is provided for chemical, gene, and disease detail pages. Pathways that are statistically enriched
among genes/proteins that interact with a chemical are displayed for each chemical.
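The text does not say which statistical test is used to decide that a pathway is enriched. One common choice for this kind of question is a hypergeometric upper-tail test, sketched below in Python purely as an assumption; the function name and parameters are hypothetical, and a real analysis would also correct for testing many pathways at once.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, overlap):
    """P(X >= overlap) for a hypergeometric variable X.

    population: total number of genes considered
    annotated:  genes in the population that belong to the pathway
    drawn:      genes that interact with the chemical
    overlap:    interacting genes that also belong to the pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Toy numbers, for illustration only: 10 genes, 5 in the pathway,
# 5 interacting with the chemical, all 5 of which are in the pathway.
p_value = hypergeom_upper_tail(population=10, annotated=5, drawn=5, overlap=5)
```

In this toy case only one of the 252 possible draws of five genes contains all five pathway genes, so `p_value` is 1/252. A small value like this suggests the overlap is larger than chance alone would usually produce.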
CTD is working to enhance the capacity to identify environment–disease connections by developing an Exposure
Ontology (ExO) that will be used to curate and present exposure data.