Use the Set Analyzer tool to perform analyses such as set-based enrichment for collections of chemicals or genes, and pathway generation for collections of genes.
Select this option to analyze a set of chemicals that you enter in Step 2. You may specify chemicals by MeSH® name, synonym, or accession ID (“MESH:…”), or by CAS RN.
Select this option to analyze a set of genes that you enter in Step 2. You may specify genes by NCBI symbols or accession IDs (“GENE:…”).
Enter or paste the names/symbols or accession IDs (see above) for your set of chemicals or genes.
You may separate terms by returns, tabs, or vertical bars (|).
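The separator rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's own parser: the function name `parse_terms` is hypothetical, and trimming whitespace and dropping duplicate terms are assumptions about how you might want to tidy a list before pasting it.

```python
import re


def parse_terms(raw: str) -> list[str]:
    """Split pasted input on returns, tabs, or vertical bars.

    Assumption (not stated in the help text): surrounding whitespace is
    trimmed, and blank entries and repeated terms are dropped.
    """
    parts = re.split(r"[\r\n\t|]+", raw)
    seen = set()
    terms = []
    for part in parts:
        term = part.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


if __name__ == "__main__":
    print(parse_terms("TP53\nBRCA1|GENE:672\tTP53\r\n"))
```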
Displays the diseases (MEDIC terms) that are statistically enriched among your input genes/proteins. A disease is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.
Many of the genes/proteins with curated chemical interactions in CTD are associated with human diseases. For example, to gain insight into the diseases that may be influenced by a particular chemical, you can run this analysis on the genes/proteins that have curated interactions with that chemical.
Displays the GO terms that are statistically enriched among your input genes/proteins. The enrichment calculations consider only human GO annotations. A GO term is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.
Many genes/proteins with curated chemical interactions in CTD have GO annotations that provide information about their associated biological processes, molecular functions, and cellular components. For example, to gain insight into the biological properties that may be affected by a chemical, you can run this analysis on the genes/proteins that have curated interactions with that chemical.
Displays the pathways that are statistically enriched among your input genes/proteins. A pathway is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.
Many of the genes/proteins with curated chemical interactions in CTD appear in KEGG and REACTOME pathway maps, which represent molecular interaction and reaction networks. For example, to gain insight into the pathways and networks that may be affected by a particular chemical, you can run this analysis on the genes/proteins that have curated interactions with that chemical.
Displays the gene–gene and protein–protein interactions (from BioGRID) among your input genes/proteins.
CTD includes gene–gene interactions from BioGRID[2]. These comprise genetic and protein interactions that BioGRID curators have curated from the primary literature for all major model organisms.
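As a rough illustration of what "interactions among your input genes/proteins" means, the sketch below keeps only those interaction pairs whose two partners are both in the input set. The function name `interactions_within_set` and the sample pairs are hypothetical and are not taken from CTD or BioGRID; requiring both partners to be in the set is an assumption about how the tool selects interactions.

```python
def interactions_within_set(interactions, input_genes):
    """Return the (gene_a, gene_b) pairs whose two partners are both input genes.

    `interactions` is an iterable of 2-tuples of gene symbols; this layout is
    an assumption made for the example, not a BioGRID file format.
    """
    genes = set(input_genes)
    return [(a, b) for a, b in interactions if a in genes and b in genes]


if __name__ == "__main__":
    # Made-up example pairs, for illustration only.
    pairs = [("TP53", "MDM2"), ("TP53", "EP300"), ("BRCA1", "BARD1")]
    print(interactions_within_set(pairs, ["TP53", "MDM2", "BRCA1"]))
```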
For enrichment analyses, you must specify a significance threshold—a corrected or raw p-value—for the results. Only enriched terms with a p-value less than your setting will appear.
We recommend that you interpret the significance of whether a particular disease, GO term, or pathway is enriched based on the corrected p-value. By default, the tool uses a corrected p-value threshold of 0.01.
The significance of enrichment is calculated using the hypergeometric distribution and adjusted for multiple testing using the Bonferroni method.[1] Depending on whether you selected enrichment for diseases, GO terms, or pathways, Set Analyzer iterates over the list of diseases, GO terms, or pathways annotated to the gene set. For each one, the hypergeometric distribution is used to test whether the fraction of input genes annotated to that disease, GO term, or pathway is significantly higher than the fraction of all human genes annotated to it in the genome.
For a particular disease, GO term, or pathway, the probability from the hypergeometric distribution is the raw p-value. As with any enrichment analysis, the raw p-value needs to be corrected for multiple testing, because the number of false positives is proportional to the number of enrichment tests performed and the raw p-value threshold applied. The most conservative approach to multiple testing correction is the Bonferroni method, in which the raw p-value is multiplied by the number of tests. In this case, the number of tests is the number of diseases, GO terms, or pathways that are annotated to one or more genes in your input list. Corrected p-values greater than 1.0 are displayed as 1.0.
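The calculation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Set Analyzer's actual code: the function names and the `counts` layout are hypothetical, and the raw p-value is assumed to be the upper-tail hypergeometric probability (the chance of seeing at least the observed number of annotated genes in a random set of the same size).

```python
from math import comb


def hypergeom_tail(genome_size, genome_annotated, set_size, set_annotated):
    """Raw p-value: chance of drawing at least `set_annotated` annotated genes
    when `set_size` genes are picked at random from `genome_size` genes, of
    which `genome_annotated` carry the annotation (upper tail is an assumption)."""
    max_hits = min(genome_annotated, set_size)
    favorable = sum(
        comb(genome_annotated, k) * comb(genome_size - genome_annotated, set_size - k)
        for k in range(set_annotated, max_hits + 1)
    )
    return favorable / comb(genome_size, set_size)


def bonferroni(raw_p, n_tests):
    """Multiply the raw p-value by the number of tests, capped at 1.0."""
    return min(1.0, raw_p * n_tests)


def enriched_terms(counts, genome_size, set_size, threshold=0.01):
    """`counts` maps a term to (genome_annotated, set_annotated).

    Returns {term: (raw_p, corrected_p)} for terms whose corrected p-value
    is below `threshold` (0.01 mirrors the default mentioned in the text).
    """
    # Only terms annotated to one or more input genes count as tests.
    tested = {term: c for term, c in counts.items() if c[1] >= 1}
    results = {}
    for term, (genome_annotated, set_annotated) in tested.items():
        raw = hypergeom_tail(genome_size, genome_annotated, set_size, set_annotated)
        corrected = bonferroni(raw, len(tested))
        if corrected < threshold:
            results[term] = (raw, corrected)
    return results
```

Exact integer arithmetic with `math.comb` keeps the sketch dependency-free. For large gene sets, a statistics library's hypergeometric survival function would be a more practical choice.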
For each enriched disease, GO term, or pathway with a p-value less than the value you specified, the following information is displayed:
You may visualize the pathways of interacting genes by either:
The XGMML file can be viewed using Cytoscape, an open-source pathway visualization application:
The following data are presented for each gene/protein interaction:
Sort these data differently by clicking a column heading.
Save these data into a comma-separated values (CSV), Excel, XML, or tab-separated values (TSV) file by clicking a Download link at the bottom of the table.