Data Sources

Transparency about where our compound data comes from and how we process it.

PubChem

Primary Data Source

All compound data displayed on CompoundLookup is sourced from PubChem, the world's largest free chemistry database. PubChem is maintained by the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH).

PubChem Statistics:

  • Over 115 million unique chemical structures
  • Data from 850+ contributing organizations
  • Updated daily with new compounds and data
  • Freely accessible to everyone worldwide
Visit PubChem

How We Process Data

1. Data Extraction

We download compound data from PubChem's public FTP server. This data includes molecular formulas, IUPAC names, molecular weights, and structural information for millions of compounds.

2. Element Indexing

We parse each compound's molecular formula to identify the elements it contains, then build an index that maps each element combination to the compounds that match it. This index powers our element-based search.
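
The indexing step can be illustrated with a minimal Python sketch. It is not CompoundLookup's actual code: the function names (`elements_in`, `build_element_index`), the regular expression, and the in-memory dictionary are all assumptions made for illustration, and the sketch assumes flat formulas without parentheses or hydrate dots (for example "C6H12O6").

```python
import re
from collections import defaultdict

# Matches an element symbol (one capital letter, optional lowercase letter)
# followed by an optional count, e.g. "C6", "H12", "Na", "Cl2".
# Assumption: formulas are flat strings with no parentheses.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in(formula):
    """Return the set of element symbols that appear in a formula."""
    return frozenset(symbol for symbol, _count in ELEMENT_TOKEN.findall(formula))


def build_element_index(compounds):
    """Map each element combination to the list of matching compound IDs.

    `compounds` is an iterable of (cid, formula) pairs.
    """
    index = defaultdict(list)
    for cid, formula in compounds:
        index[elements_in(formula)].append(cid)
    return index


# A few sample rows: (PubChem CID, molecular formula).
compounds = [
    (962, "H2O"),       # water
    (5793, "C6H12O6"),  # glucose
    (702, "C2H6O"),     # ethanol
    (241, "C6H6"),      # benzene
]

index = build_element_index(compounds)

# Look up every compound made of exactly carbon, hydrogen, and oxygen.
print(index[frozenset({"C", "H", "O"})])
```

Using a `frozenset` as the key makes the lookup independent of the order in which elements are listed, so "Carbon + Hydrogen" and "Hydrogen + Carbon" resolve to the same entry.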

3. Database Organization

Compounds are organized into 200,000+ unique element combinations. Each combination (like "Carbon + Hydrogen" or "Oxygen + Nitrogen + Sulfur") has its own searchable page listing all matching compounds.

4. Regular Updates

We periodically update our database with new compounds from PubChem. This ensures our data stays current with the latest chemical discoveries and additions to PubChem's database.

What Data We Include

For each compound in our database, we display the following information:

Molecular Formula

The chemical formula showing elements and their quantities (e.g., H₂O, C₆H₁₂O₆)

IUPAC Name

The standardized chemical name according to IUPAC nomenclature

Molecular Weight

The mass of one mole of the compound, expressed in grams per mole (g/mol)
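
To show what this number represents, here is a minimal Python sketch that sums atomic weights over a formula. The values displayed on CompoundLookup come from PubChem; this calculation is only an illustration. The `molecular_weight` function and the small `ATOMIC_WEIGHTS` table (rounded standard atomic weights for a handful of elements) are assumptions, and the sketch handles only flat formulas without parentheses.

```python
import re

# Rounded standard atomic weights in g/mol, for a few common elements only.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

# An element symbol followed by an optional count, e.g. "H2" or "O".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Sum atomic weight times atom count for each element in the formula."""
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("H2O"), 3))      # 18.015
print(round(molecular_weight("C6H12O6"), 3))  # 180.156
```

For water, the sum is 2 × 1.008 + 15.999 = 18.015 g/mol.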

PubChem CID

The unique identifier for the compound in PubChem's database

How to Cite

If you use data obtained through CompoundLookup in academic work, please cite PubChem as the original data source:

Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res. 2023 Jan 6;51(D1):D1373-D1380. doi: 10.1093/nar/gkac956. PMID: 36305812; PMCID: PMC9825602.

You may also mention CompoundLookup as the tool used to access the data, but the primary citation should always be to PubChem.

Data Accuracy & Limitations

Important Notice

While we strive for accuracy, CompoundLookup should be used for educational and preliminary research purposes. For critical applications, always verify data directly from PubChem or other primary sources.

  • Data may not be 100% complete or up-to-date at all times
  • Some compound names may be simplified or abbreviated
  • Always double-check data for professional or safety-critical use

External Resources