In the previous article, Connecting Clinical Trials to Research Articles, we saw how to search the PubMed database by clinical trial ID(s) and retrieve all the relevant journal articles. In this article, let’s learn about the association between symptoms and diseases, and about phenotype-genotype relationships.
Importance of the symptom-disease relationship
A disease is an abnormal condition that negatively affects the functioning of an organism. A symptom is a physical or mental feature that can indicate a condition. The relationship between diseases and their symptoms is important for diagnosis, and it is also useful for medical research.
Each article in PubMed is associated with metadata that includes the major topics of the article. By using a Perl script with the NCBI E-utilities, we can retrieve the PubMed identifiers for any symptom and disease terms. The symptom and disease terms are defined by MeSH. We can then find an association between a symptom and a disease by looking for PubMed IDs that the two terms share.
Program: The program below gives the PubMed identifiers of articles in which a symptom and a disease co-occur.
Input file for Diseases:
Input file for Symptoms:
Output file: The output file contains the list of PubMed identifiers of articles in which the diseases and symptoms co-occur.
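For readers who prefer Python, the workflow described above can be sketched roughly as follows. This is not the Perl script from the article: the function names are hypothetical, and restricting the search with the `[MeSH Major Topic]` field tag is an assumption based on the mention of "major topics" above. The ESearch endpoint and its `db`, `term`, `retmax` and `retmode` parameters are part of the NCBI E-utilities.

```python
import itertools
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(disease, symptom, field="MeSH Major Topic"):
    """Combine one disease and one symptom into a PubMed co-occurrence query.

    The field tag is an assumption; "MeSH Terms" would be a broader choice.
    """
    return f'"{disease}"[{field}] AND "{symptom}"[{field}]'


def build_esearch_url(term, retmax=100):
    """Return the ESearch URL that asks PubMed for PMIDs matching the term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "xml"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(xml_text):
    """Extract the PMIDs from the IdList of an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_node.text for id_node in root.findall("./IdList/Id")]


def fetch_pmids(disease, symptom, retmax=100):
    """Query PubMed over the network for one disease/symptom pair."""
    url = build_esearch_url(build_term(disease, symptom), retmax)
    with urllib.request.urlopen(url) as response:
        return parse_pmids(response.read().decode("utf-8"))


def cooccurrence_pairs(diseases, symptoms):
    """Return every (disease, symptom) combination from the two input lists."""
    return itertools.product(diseases, symptoms)
```

In practice, the lines of the disease and symptom input files would be passed to `cooccurrence_pairs`, and `fetch_pmids` would be called once per pair, with a short pause between requests to stay within NCBI's usage limits.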
Once the associations between diseases and symptoms are identified, we can look up phenotype and genotype information based on the symptoms. Let’s take a look at the “Phenotype-Genotype” integrator.
Phenotype is the composite of an organism’s observable characteristics.
Genotype is the part of the genetic makeup of a cell that determines one of its characteristics.
What is PheGenI?
PheGenI is a web interface that integrates various genomic databases with genome-wide association study (GWAS) data.
The genomic databases are from the National Center for Biotechnology Information (NCBI), and the association data are from the National Human Genome Research Institute (NHGRI). Here, the phenotype terms are MeSH terms.
- A GWAS is a study of a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a phenotype or trait.
- Clinicians and epidemiologists are interested in the results of GWAS because they help with study design considerations and the generation of biological hypotheses.
- The GWAS association results include the following fields: SNP rs, Gene, Gene ID, Gene 2, Gene ID 2, Chromosome and PubMed IDs.
PheGenI association results file:
All association results can be downloaded from the PheGenI browser; a sample file looks as below.
The association results can be accessed here: https://www.ncbi.nlm.nih.gov/gap/phegeni
Program: The program below gives the SNP rs, Gene, Gene ID, Gene 2, Gene ID 2, Chromosome and PubMed IDs for each phenotype term.
Input file: The input file contains a list of phenotype search terms based on MeSH; a sample file looks as below.
Output file: The output file contains the SNP rs, Gene, Gene ID, Gene 2, Gene ID 2, Chromosome and PubMed IDs for each phenotype term.
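The lookup described above can be approximated in Python as shown here. This is a minimal sketch rather than the script from the repository: it assumes the downloaded association results are tab-delimited with a header row, and the column names (`Trait`, `SNP rs`, `Gene 2`, `PubMed` and so on) are assumptions that should be checked against the real file. The sample rows are made-up placeholders, not real associations.

```python
import csv
import io

# Assumed column names; check them against the header of the real download.
OUTPUT_FIELDS = ["SNP rs", "Gene", "Gene ID", "Gene 2", "Gene ID 2", "Chromosome", "PubMed"]


def filter_by_phenotype(tsv_file, phenotype_terms, trait_column="Trait"):
    """Return rows whose trait matches one of the phenotype terms (case-insensitive)."""
    wanted = {term.strip().lower() for term in phenotype_terms if term.strip()}
    reader = csv.DictReader(tsv_file, delimiter="\t")
    matches = []
    for row in reader:
        trait = (row.get(trait_column) or "").strip()
        if trait.lower() in wanted:
            # Keep the trait plus only the fields named in the article.
            selected = {field: row.get(field, "") for field in OUTPUT_FIELDS}
            matches.append({"Trait": trait, **selected})
    return matches


# Made-up placeholder rows standing in for the downloaded results file.
SAMPLE = (
    "Trait\tSNP rs\tGene\tGene ID\tGene 2\tGene ID 2\tChromosome\tPubMed\n"
    "Asthma\trs0000001\tGENE_A\t1\tGENE_B\t2\t5\t0000001\n"
    "Obesity\trs0000002\tGENE_C\t3\tGENE_D\t4\t16\t0000002\n"
)

rows = filter_by_phenotype(io.StringIO(SAMPLE), ["asthma"])
for row in rows:
    print("\t".join(row[field] for field in ["Trait"] + OUTPUT_FIELDS))
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the phenotype terms would be read line by line from the input file.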
In this way, we can retrieve the genetic variants related to any phenotype(s).
All the files (input, script and results) used in the symptom-disease relationship example above are available on GitHub and can be downloaded from https://github.com/VaidhyaMegha/SymptomDiseaseRelationships
All the files (input, script and results) used in the PheGenI example above are available on GitHub and can be downloaded from https://github.com/VaidhyaMegha/Phegeni