NCBI Resources: dbGaP

Information about the NCBI databases, intended as a teaching tool.

The database of Genotypes and Phenotypes (dbGaP) contains studies that have investigated the interaction of genotype and phenotype. 

dbGaP Unique Identifiers

  • Analysis ID: phaXXXXXX
  • Dataset ID: phtXXXXXX
  • Document ID: phdXXXXXX
  • Study ID: phsXXXXXX (e.g., phs000159.v7.p4)
  • Variable ID: phvXXXXXX
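
If you handle many identifiers, it can help to check them programmatically. The following is a minimal Python sketch, assuming an identifier is a three-letter prefix plus six digits with optional ".vN" and ".pN" suffixes, as in the example phs000159.v7.p4. The helper name parse_accession and the key names "version" and "participant_set" are illustrative choices, not part of any NCBI tool; the suffixes are commonly read as the version and the participant set.

```python
import re

# Prefixes taken from the identifier list above.
RECORD_TYPES = {
    "pha": "Analysis",
    "pht": "Dataset",
    "phd": "Document",
    "phs": "Study",
    "phv": "Variable",
}

# Assumed shape: prefix + six digits, then optional ".v<N>" and ".p<N>".
ACCESSION_RE = re.compile(
    r"(pha|pht|phd|phs|phv)(\d{6})(?:\.v(\d+))?(?:\.p(\d+))?"
)


def parse_accession(accession):
    """Split a dbGaP-style identifier into its parts (illustrative helper)."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a recognized dbGaP identifier: {accession!r}")
    prefix, number, version, participant_set = match.groups()
    return {
        "type": RECORD_TYPES[prefix],
        "number": int(number),
        "version": int(version) if version else None,
        "participant_set": int(participant_set) if participant_set else None,
    }


print(parse_accession("phs000159.v7.p4"))
# {'type': 'Study', 'number': 159, 'version': 7, 'participant_set': 4}
```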

Boolean Operators

Boolean operators allow you to combine search terms: 

  • AND: Finds documents that contain both terms.
  • OR: Finds documents that contain either term. 
  • NOT: Finds documents that contain the term on the left but not the term on the right.
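
The three operators behave like set operations. The sketch below illustrates this with Python sets; the search terms and record numbers are made up for the example and do not come from any NCBI database. When you type operators into an Entrez search box, write them in uppercase (AND, OR, NOT).

```python
# Hypothetical mini-index: search term -> set of record numbers containing it.
index = {
    "asthma": {1, 2, 3},
    "smoking": {2, 3, 4},
}

both = index["asthma"] & index["smoking"]        # asthma AND smoking
either = index["asthma"] | index["smoking"]      # asthma OR smoking
left_only = index["asthma"] - index["smoking"]   # asthma NOT smoking

print(sorted(both))       # [2, 3]
print(sorted(either))     # [1, 2, 3, 4]
print(sorted(left_only))  # [1]
```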

Advanced Search

By default, Entrez searches your text in "All Fields", which matches the text anywhere in a record. If you are getting irrelevant results, try limiting your search to a particular field.
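
In the search box, a field limit is written as the term followed by the field name in square brackets. Here is a small Python sketch of that pattern; the helper name field_term is hypothetical, and Organism is used only as an example of a field in the Gene database.

```python
def field_term(term, field):
    """Return an Entrez-style field-limited term such as human[Organism]."""
    # Quote multi-word terms so they are searched as a phrase.
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


print(field_term("human", "Organism"))         # human[Organism]
print(field_term("homo sapiens", "Organism"))  # "homo sapiens"[Organism]
```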

Not sure which fields you can search? Use the Advanced search.

1. Click Advanced beneath the search bar.

2. Click the drop-down menu under Builder to see which fields are stored in Gene records.

The menu lists every field you can search. Fields vary by database; the example shown here is for the Gene database.
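
You can also look up a database's fields outside the web page. NCBI's E-utilities include an EInfo service that describes a database, including its search fields. The sketch below only builds the request URL, which you can open in a browser; the helper name einfo_url is illustrative, and the exact layout of the response is not shown here.

```python
from urllib.parse import urlencode

EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def einfo_url(database):
    """Build an EInfo request URL for one Entrez database (illustrative helper)."""
    # retmode=json asks for JSON instead of the default XML.
    return f"{EINFO_URL}?{urlencode({'db': database, 'retmode': 'json'})}"


print(einfo_url("gene"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=gene&retmode=json
```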

Searching dbGaP

You can search dbGaP by field, and you can combine fields with Boolean operators to refine your search strategy.
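
A combined search can also be sent to dbGaP through NCBI's E-utilities ESearch service, where dbGaP is addressed as the "gap" database. The sketch below builds such a request URL in Python. The helper name dbgap_search_url is hypothetical, and the field name Disease is an assumption used for illustration; check the Advanced search field menu for the fields dbGaP actually offers.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dbgap_search_url(term, retmax=20):
    """Build an ESearch request URL for dbGaP (illustrative helper)."""
    params = {"db": "gap", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# "Disease" is an assumed field name, used here only as an example.
term = "asthma[Disease] NOT cancer[Disease]"
print(dbgap_search_url(term))
```

The brackets and spaces in the term are percent-encoded by urlencode, so the query can be pasted into a browser as-is.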

Search Results

The search results page has tabs for the Studies, Variables, Study documents, Analyses, and Datasets records that match your search. We will focus on the Studies tab.

The Studies tab contains:

  • Study ID and name - the unique identifier for the study and the study's title
  • Embargo Release - whether the data are available yet. Submitters can embargo the data before publication.
  • Details - the colored square is shaded if information isn't available for this study
    • V - Variables: what was measured in the study
    • D - Documents: documentation about the study
    • A - Analyses: types of analyses performed on the data
    • S - Sequence Read Archive (SRA) data
  • Number of participants - how many people participated in the study
  • Type of Study - study design (case-control, twin study, etc.)
  • Links - links to other NCBI databases with information about this study (BioProject, BioSample, MeSH, PubMed, etc.)
  • Platform - the technical platform used to identify the SNPs for GWAS

Unlike most NCBI databases, dbGaP does not provide filters in the left sidebar. You can create your own filters, which will appear at the top of the right sidebar. See the My NCBI tutorial for details.

Click on the Study ID or name to get more information about the study.

Study Page

The study page has six sections:

  • Study - general information about the study, access, investigators, and links to other NCBI resources
  • Variables - information about the variables assessed in the study
  • Documents - usually questionnaires and forms
  • Analyses - how the data were processed to reach conclusions
  • Datasets - description, unique IDs, and a listing of all available data for each subject
  • Molecular Data - a tabular list of the molecular data generated in each arm of the study

Click the following link to see an example record: http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001058.v1.p1
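
If you already know a Study ID, you can build a link of the same form yourself. The Python sketch below follows the URL pattern of the example record above; the helper name study_url is hypothetical, and the pattern is taken from that single example rather than from NCBI documentation.

```python
from urllib.parse import urlencode

STUDY_PAGE = "http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi"


def study_url(study_id):
    """Build a study page link from a Study ID (illustrative helper)."""
    return f"{STUDY_PAGE}?{urlencode({'study_id': study_id})}"


print(study_url("phs001058.v1.p1"))
# http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001058.v1.p1
```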