
NCBI Resources: GEO DataSets

This guide describes the NCBI databases and is intended as a teaching tool.

GEO DataSets: Unique Identifiers

  • GEO DataSet (GDS) – Curated data set derived from investigator-submitted data
  • Series (GSE) – The set of expression profiles generated for an experiment (test, control, replicates)
  • Samples (GSM) – Information about the biological samples used in the experiments, including extraction procedures
  • Platforms (GPL) – The platform the samples were run on (for example, the Affymetrix Mouse Genome 430 2.0 Array)
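The four identifier types can be told apart by their prefix. As a minimal sketch, assuming every accession is one of these prefixes followed only by digits, the Python below maps an accession to its record type. The function name is hypothetical and the accession numbers are arbitrary examples.

```python
import re

# Prefixes of the four identifier types listed above.
GEO_PREFIXES = {
    "GDS": "GEO DataSet (curated)",
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
}

# Assumed accession shape: a three-letter prefix followed by digits only.
ACCESSION_RE = re.compile(r"^(GDS|GSE|GSM|GPL)(\d+)$")


def classify_geo_accession(accession: str) -> str:
    """Return the record type for a GEO accession such as 'GSE12345'."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized GEO accession: {accession!r}")
    return GEO_PREFIXES[match.group(1)]


for acc in ["GDS1234", "GSE12345", "gsm1", "GPL570"]:
    print(acc, "->", classify_geo_accession(acc))
```

Lower-case input is accepted because the sketch upper-cases the accession before matching.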

Boolean Operators

Boolean operators allow you to combine search terms: 

  • AND: Finds documents that contain both terms.
  • OR: Finds documents that contain either term. 
  • NOT: Finds documents that contain the term on the left but not the term on the right.
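One way to picture the three operators is as set operations on the records that match each term. The Python sketch below uses a made-up index of record IDs; it only illustrates the logic and is not how Entrez itself is implemented.

```python
# Hypothetical toy index: which (made-up) records mention each search term.
index = {
    "mouse": {"rec1", "rec2", "rec3"},
    "liver": {"rec2", "rec3", "rec4"},
}

mouse = index["mouse"]
liver = index["liver"]

print("mouse AND liver:", sorted(mouse & liver))  # ['rec2', 'rec3']: both terms
print("mouse OR liver: ", sorted(mouse | liver))  # ['rec1', 'rec2', 'rec3', 'rec4']: either term
print("mouse NOT liver:", sorted(mouse - liver))  # ['rec1']: left term without the right
```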

Advanced Search

By default, Entrez searches your text in "All fields", which looks for the text anywhere in the entry. If you're getting irrelevant results, try limiting your search term to a particular field.

Don't know which fields you can search? Use Advanced search:

1. Click Advanced beneath the search bar.

2. Click the drop-down menu under Builder to see which fields are stored in gene records.

This is a comprehensive list of all the fields you can search. Fields vary by database; the example shown here is for the Gene database.
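In the query box, a field restriction is written as the search value followed by the field name in square brackets, for example `mouse[Organism]`. The Python sketch below builds such queries. The helper names are hypothetical, and the field names `Organism` and `Title` are assumptions you should confirm in the Builder drop-down for GEO DataSets.

```python
VALID_OPERATORS = {"AND", "OR", "NOT"}


def field_term(value: str, field: str) -> str:
    """Restrict one search value to one field, e.g. mouse[Organism]."""
    # Quote multi-word values so they are searched as a single phrase.
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{field}]"


def build_query(*clauses: str, operator: str = "AND") -> str:
    """Join clauses with one Boolean operator, parenthesizing each clause."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"operator must be one of {sorted(VALID_OPERATORS)}")
    return f" {operator} ".join(f"({clause})" for clause in clauses)


query = build_query(
    field_term("Mus musculus", "Organism"),  # assumed field name
    field_term("liver", "Title"),            # assumed field name
)
print(query)  # ("Mus musculus"[Organism]) AND (liver[Title])
```

The resulting string can be pasted into the search box; the parentheses keep each clause together if you later combine it with other terms.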

What does curation mean?

Curation is the process of converting a GEO Series (GSExxx) into a GEO DataSet (GDSxxxx).

A GEO Series (GSExxx) contains all of the experimental data submitted by the investigator. A GEO DataSet is a collection of comparable Samples processed on the same Platform, annotated with the experimental variables. Both kinds of entry are searchable in the GEO DataSets database, but the advanced analysis features are available only for GEO DataSets. To analyze uncurated Series, use GEO2R.
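The definition above says that the Samples in a GEO DataSet share a single Platform. As a rough illustration of that one constraint (not of NCBI's curation process, which also records the experimental variables), the Python sketch below groups a Series' Samples by Platform. The accessions and the helper name are made up.

```python
from collections import defaultdict

# Hypothetical Sample records from one Series: (GSM accession, GPL accession).
samples = [
    ("GSM0001", "GPL0001"),
    ("GSM0002", "GPL0001"),
    ("GSM0003", "GPL0002"),
]


def group_by_platform(records):
    """Map each Platform accession to the Samples that were run on it."""
    groups = defaultdict(list)
    for gsm, gpl in records:
        groups[gpl].append(gsm)
    return dict(groups)


groups = group_by_platform(samples)
print(groups)  # {'GPL0001': ['GSM0001', 'GSM0002'], 'GPL0002': ['GSM0003']}
print("single platform:", len(groups) == 1)  # single platform: False
```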

How do I download the data?

Whether the data in GEO DataSets is curated or not, you can still download the data. 

Curated: on the right of the page under the heat map

Uncurated: at the end of the record
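If you prefer to fetch files outside the browser, GEO also serves records from an FTP site. The sketch below assumes the directory layout used there, in which the last three digits of the accession are replaced by `nnn` to form a parent folder (for example `series/GSE1nnn/GSE1234/`). Treat that layout and the prefix-to-folder mapping as assumptions and check the resulting URL in a browser before relying on it; the function name is hypothetical.

```python
import re

FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo"

# Assumed mapping from accession prefix to top-level FTP folder.
SUBDIRS = {"GSE": "series", "GDS": "datasets", "GSM": "samples", "GPL": "platforms"}


def geo_ftp_dir(accession: str) -> str:
    """Build the (assumed) FTP directory URL for a GEO accession."""
    match = re.fullmatch(r"(GSE|GDS|GSM|GPL)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"unsupported accession: {accession!r}")
    prefix, digits = match.groups()
    # Replace the last three digits with 'nnn': GSE1234 -> GSE1nnn, GSE12 -> GSEnnn.
    stub = prefix + digits[:-3] + "nnn"
    return f"{FTP_ROOT}/{SUBDIRS[prefix]}/{stub}/{prefix}{digits}/"


print(geo_ftp_dir("GSE1234"))
```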


Searching GEO DataSets

You can search GEO DataSets by many different fields. To see the full list of options, go to Advanced search.

The available fields are listed in the drop-down menus on the resulting page, and you can see which values are in each field by clicking Show index list to the right of the search box.
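The field-restricted queries you build in Advanced search can also be run programmatically through NCBI's E-utilities, where GEO DataSets is the `gds` database. The Python sketch below only builds an ESearch URL; the helper name is hypothetical and the field name `Organism` is an assumption to check against the index list. Opening the URL, for example with `urllib.request.urlopen`, requires network access and should follow NCBI's E-utilities usage guidelines.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "gds", retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL; 'gds' is the GEO DataSets database."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url("mouse[Organism] AND liver")
print(url)
```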

GEO DataSet Search Results

After you submit your search, a list of records that match your query will appear. The results page is divided into 3 columns:

  1. Filters: more ways to limit your search, such as by entry type, organism, or study type
  2. Results: a list of the GEO DataSets records that match your query, with basic information
  3. Discovery column: links to other databases and Search details

Use the filters on the left to limit your results to certain entry types (red box), organisms (orange arrow), study types (blue box), or attribute names (purple box).

The records that meet your search criteria are in the center column. The display shows the title of the record, a description of the experiment, and the unique identifiers for the associated records, as well as links to external resources.

The far right column is the discovery column. It contains links to other databases and shows your search strategy (yellow box).

To get to an individual GEO DataSet record of interest, click on the title of the dataset (blue link to the right of the check box).

The result will appear in one of two ways, depending on whether it has been curated by NCBI staff or not.

GEO DataSets Records (curated)

Curated: expression data that NCBI curators have tidied up so that it can be used with analysis tools, such as finding the expression of a particular gene in the DataSet, cluster heat maps, and comparisons between sets of samples (for example, treatment vs. control).


GEO Series (uncurated)

Uncurated: data as submitted by the investigator who generated it. The built-in GEO DataSet analysis tools are not available for these records, but you can analyze them with GEO2R.