Biomedical Knowledge Graph Benchmark

100 queries. 4 knowledge graphs. 74 million nodes. 1 billion edges. One query language.

Samyama’s biomedical benchmark runs real-world cross-knowledge-graph queries over the largest open biomedical dataset we know of, unified in a single queryable graph on one commodity cloud instance.

The Dataset

| Knowledge Graph | Source | Nodes | Edges | Key Entities |
|---|---|---|---|---|
| PubMed/MEDLINE | NLM | 66.2M | 1.04B | Article, Author, MeSHTerm, Chemical, Journal, Grant |
| Clinical Trials | AACT/ClinicalTrials.gov | 7.8M | 27M | ClinicalTrial, Intervention, AdverseEvent, Site, Outcome, Sponsor, Condition, Drug |
| Pathways | Reactome | 119K | 835K | Protein, Pathway, Complex, Reaction, GOTerm |
| Drug Interactions | DrugBank + ChEMBL + SIDER + DGIdb | 245K | 388K | Drug, Gene, SideEffect, Indication, AdverseEvent, Bioactivity |
| NCT Bridge | AACT study_references | | 1M+ | REFERENCED_IN (Article → ClinicalTrial) |
| Total | | 74.3M | 1.07B | |
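
As a quick sanity check on the totals row, the per-graph figures can be summed directly. This is a minimal sketch that uses the rounded values from the table, plus the bridge edge count reported under Infrastructure; because the inputs are rounded, the sums only approximate the reported totals.

```python
# Rounded node and edge counts copied from the dataset table.
# The bridge edge count (1,018,483) is the figure reported under Infrastructure.
GRAPHS = {
    "PubMed/MEDLINE":    {"nodes": 66_200_000, "edges": 1_040_000_000},
    "Clinical Trials":   {"nodes": 7_800_000,  "edges": 27_000_000},
    "Pathways":          {"nodes": 119_000,    "edges": 835_000},
    "Drug Interactions": {"nodes": 245_000,    "edges": 388_000},
    "NCT Bridge":        {"nodes": 0,          "edges": 1_018_483},
}

total_nodes = sum(g["nodes"] for g in GRAPHS.values())
total_edges = sum(g["edges"] for g in GRAPHS.values())

print(f"nodes: {total_nodes / 1e6:.2f}M")  # nodes: 74.36M
print(f"edges: {total_edges / 1e9:.2f}B")  # edges: 1.07B
```

The node sum lands slightly above the reported 74.3M, which is what one would expect when the per-graph inputs have already been rounded.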

Results Summary

100 queries executed. 96 returned data. 3 returned empty (specific data not in snapshot). 1 timed out.

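| Category | Queries | Pass | Empty | Error |
|---|---|---|---|---|
| PubMed | 35 | 33 | 1 | 1 |
| Clinical Trials | 20 | 19 | 1 | 0 |
| Pathways | 15 | 15 | 0 | 0 |
| Drug Interactions | 15 | 14 | 1 | 0 |
| Cross-KG | 15 | 15 | 0 | 0 |
| Total | 100 | 96 | 3 | 1 |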
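
The summary table can be checked for internal consistency and turned into pass rates with a few lines. The sketch below copies the counts from the table; the `pass_rate` helper is introduced here for illustration only.

```python
# (queries, pass, empty, error) per category, copied from the results table.
RESULTS = {
    "PubMed":            (35, 33, 1, 1),
    "Clinical Trials":   (20, 19, 1, 0),
    "Pathways":          (15, 15, 0, 0),
    "Drug Interactions": (15, 14, 1, 0),
    "Cross-KG":          (15, 15, 0, 0),
}

def pass_rate(row):
    """Fraction of queries in a row that returned data."""
    queries, passed, _empty, _error = row
    return passed / queries

# Each row should account for every query exactly once.
for name, (queries, passed, empty, error) in RESULTS.items():
    assert queries == passed + empty + error, name

# Column-wise sums reproduce the Total row.
totals = tuple(sum(col) for col in zip(*RESULTS.values()))

for name, row in RESULTS.items():
    print(f"{name:<18} {pass_rate(row):.0%}")
print(f"{'Total':<18} {pass_rate(totals):.0%}")
```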

Infrastructure

  • Instance: r6a.8xlarge (32 vCPU, 256 GB RAM, AMD EPYC)
  • Cost: ~$2.50 (AWS spot, ap-south-1)
  • Import time: 31 minutes from v2 snapshots
  • NCT bridge: 1,018,483 REFERENCED_IN edges created in 109 seconds
  • Index creation: 10 indexes in ~320 seconds
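
To put these timings in perspective, they can be converted to rough throughput figures. This is back-of-the-envelope arithmetic on the numbers above, assuming the 31-minute import covers the full 74.3M nodes and 1.07B edges; loading pre-built snapshots is not comparable to ingesting raw source files.

```python
# Figures copied from the Infrastructure list and the dataset totals.
TOTAL_NODES = 74_300_000
TOTAL_EDGES = 1_070_000_000
IMPORT_SECONDS = 31 * 60       # "31 minutes from v2 snapshots"
BRIDGE_EDGES = 1_018_483
BRIDGE_SECONDS = 109
INDEX_COUNT = 10
INDEX_SECONDS = 320            # approximate ("~320 seconds")

import_edges_per_s = TOTAL_EDGES / IMPORT_SECONDS
import_nodes_per_s = TOTAL_NODES / IMPORT_SECONDS
bridge_edges_per_s = BRIDGE_EDGES / BRIDGE_SECONDS
seconds_per_index = INDEX_SECONDS / INDEX_COUNT

print(f"snapshot import: ~{import_edges_per_s:,.0f} edges/s, "
      f"~{import_nodes_per_s:,.0f} nodes/s")
print(f"bridge creation: ~{bridge_edges_per_s:,.0f} edges/s")
print(f"index creation:  ~{seconds_per_index:.0f} s per index")
```

Under these assumptions, the snapshot import runs at roughly 575K edges per second, while bridge creation, which has to match articles to trials, runs at roughly 9.3K edges per second.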

PubMed Queries (35)

Point Lookups

| ID | Query | Time | Rows | Result |
|---|---|---|---|---|
| PM01 | Article PMID=12345678 | 19.6s | 1 | “Denpasar Declaration on Population and Development” (1994) |
| PM02 | Article PMID=25000000 | 1.3s | 1 | “How to measure technology assessment: an introduction” |
| PM03 | Article PMID=35000000 | 1.3s | 1 | “Hypusinated EIF5A as a feasible drug target…” (2022) |
| PM04 | Article PMID=1 (oldest) | 1.3s | 1 | “Formate assay in body fluids: application in methanol poisoning” |

1-Hop Traversals

| ID | Query | Time | Rows | Result |
|---|---|---|---|---|
| PM06 | Article → Authors | 13.3s | 1 | Arie Hasman |
| PM07 | Article → MeSH terms | 2.4s | 6 | Attitude of Health Personnel, Consumer Behavior, Health Information Systems, Medical Informatics, Technology Assessment |
| PM09 | Article → Journal | 1.3s | 1 | Studies in health technology and informatics |
| PM11 | Author → Articles (reverse) | 1.3s | 10 | Arie Hasman’s publications |
| PM12 | MeSH → Articles | 2.0s | 10 | Articles annotated with “Neoplasms” |
| PM13 | Chemical → Articles | 1.4s | 10 | Articles mentioning “Aspirin” |
| PM14 | Journal → Articles | 1.4s | 10 | Articles in “Nature” |
| PM17 | Articles citing a specific article | 1.3s | 1 | Citation found |

Multi-Hop Analytics

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| PM15 | Co-authors of an article | 1.3s | 10 | Co-author network for PMID 25000000 |
| PM19 | MeSH co-occurrence (Neoplasms) | 17.5s | 10 | Humans (513,845), Female (138,966), Animals (127,482) |
| PM20 | MeSH co-occurrence (Diabetes) | 6.2s | 10 | Humans (139,986) |
| PM21 | Chemical co-occurrence (Aspirin) | 2.3s | 10 | Platelet Aggregation Inhibitors (12,722) |
| PM22 | Author collaboration network | 1.3s | 10 | Jan Talmon (10 co-authored papers) |

Aggregations

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| PM23 | Top authors (Smith*) | 32.1s | 10 | Smith Giri (233 papers) |
| PM24 | Most cited articles | 42.7s | 10 | PMID 20000334 (461 citations) |
| PM25 | ML publication trend | 8.4s | 18 | 2020: 4,974 papers |
| PM26 | Top cancer journals | 5.3s | 10 | Cancer (6,059 articles) |
| PM27 | Cancer funding agencies | 3.9s | 10 | NCI NIH HHS (46,137 papers) |
| PM29 | Diabetes funding agencies | 8.7s | 10 | NIDDK NIH HHS (5,123 papers) |
| PM30 | Most published journals | 6.0s | 10 | Nature (140,152 articles) |
| PM31 | Chemical mentions in Nature | 2.3s | 10 | DNA (3,726 mentions) |
| PM32 | MeSH terms for NCI articles | 63.7s | 10 | Humans (438,053) |
| PM35 | ML prolific authors | 5.4s | 10 | Wei Wang (149 papers) |

Clinical Trials Queries (20)

Sample Queries

| ID | Query | Time | Rows | Result |
|---|---|---|---|---|
| CT01 | Trial interventions | 2.2s | 10 | NCT05524376 → no intervention, NCT03092076 → Ticagrelor |
| CT02 | Trial adverse events | 1.6s | 10 | NCT02028182 → Pruritus, NCT03790111 → Nausea |
| CT03 | Trial sites | 1.6s | 10 | Switzerland, Canada, China, Denmark |
| CT05 | Trial sponsors | 1.6s | 10 | Sun Yat-Sen Memorial Hospital |
| CT06 | Trial conditions | 1.6s | 10 | Sepsis |

Aggregations

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| CT08 | Trials per country | 15.0s | 15 | United States (190,879 trials) |
| CT09 | Most common interventions | 10.1s | 15 | Placebo (41,155 trials) |
| CT10 | Most common adverse events | 16.3s | 15 | Headache (28,130 trials) |
| CT11 | Most studied conditions | 8.2s | 15 | Healthy (10,898 trials) |
| CT12 | Top sponsors | 7.1s | 10 | Assiut University (4,547 trials) |

Complex Queries

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| CT13 | Cancer trial interventions | 159s | 10 | Placebo (1,597) |
| CT14 | Diabetes trial adverse events | 150s | 10 | Headache (970) |
| CT16 | Condition → Intervention → AE chain | 120s | 10 | Hypertension → Placebo → Headache (116) |
| CT17 | Cancer trial sponsors | 163s | 10 | National Cancer Institute (1,320) |
| CT20 | Multi-arm trials with AE | 142s | 10 | NCT01682876 (4 arms) → Nausea |

Pathways Queries (15)

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| PW01 | Protein interactions | 9.7s | 10 | IGKV2D-28 → IGHV3-11, IL17A |
| PW03 | Insulin pathways | 1.4s | 9 | Insulin effects on Xylulose-5-Phosphate synthesis |
| PW04 | Protein lookup (TP53) | 1.2s | 1 | TP53 found |
| PW05 | Largest pathways | 1.8s | 10 | Signal Transduction (2,614 proteins), Disease (2,575), Immune System (2,330) |
| PW06 | Most connected proteins | 1.7s | 10 | TP53 (571 interactions), the “guardian of the genome” |
| PW08 | GO term annotations | 1.4s | 10 | IGKV2D-28 → adaptive immune response |
| PW11 | Pathway hierarchy | 1.4s | 10 | 2-LTR circle formation → Integration of provirus |
| PW14 | Immune system proteins | 1.3s | 20 | Full list |
| PW15 | Protein interaction depth 2 | 1.3s | 10 | 2-hop interaction network from IGKV2D-28 |

Drug Interactions Queries (15)

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| DI01 | Drug side effects | 1.3s | 10 | Bivalirudin → Abdominal pain, Anaemia |
| DI02 | Drug-gene interactions | 1.3s | 10 | Cetuximab → gene targets |
| DI03 | Drug indications | 1.3s | 10 | Bivalirudin → Haemorrhage |
| DI04 | Drug adverse events | 1.3s | 10 | Cetuximab → adverse events |
| DI06 | Drugs with most side effects | 1.4s | 10 | Pregabalin (839 side effects) |
| DI07 | Most common side effects | 1.5s | 10 | Nausea (985 drugs) |
| DI08 | Drugs sharing gene targets | 1.3s | 10 | Cetuximab ↔ Erythropoietin (shared gene) |
| DI09 | Drugs for diabetes | 1.4s | 10 | Desmopressin → Diabetes insipidus |
| DI13 | Drug indications + side effects | 103s | 10 | Bivalirudin: Haemorrhage (indication) + Abdominal pain (side effect) |

Cross-Knowledge-Graph Queries (15)

These traverse REFERENCED_IN edges connecting 747,505 PubMed articles to clinical trials via PMID↔NCT ID mapping from AACT study_references.
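
The gap between the 1,018,483 bridge edges and the 747,505 linked articles follows from the mapping being many-to-many: one article can be referenced by several trials. The sketch below illustrates that bridging step on toy data. It is not the importer used for the benchmark; the `nct_id` and `pmid` keys are assumed from the AACT `study_references` table, and both helper functions are hypothetical.

```python
from collections import defaultdict

def build_bridge_edges(study_references):
    """Turn (nct_id, pmid) rows into deduplicated Article -> ClinicalTrial edges.

    `study_references` is any iterable of dicts with the assumed keys
    "nct_id" and "pmid". Rows without a PMID cannot be bridged and are skipped.
    """
    edges = set()
    for row in study_references:
        pmid = row.get("pmid")
        nct_id = row.get("nct_id")
        if not pmid or not nct_id:
            continue
        edges.add((int(pmid), nct_id.strip().upper()))
    return edges

def trials_per_article(edges):
    """Group bridge edges by article so linked articles can be counted."""
    by_article = defaultdict(set)
    for pmid, nct_id in edges:
        by_article[pmid].add(nct_id)
    return by_article

# Toy rows, not real AACT data (the NCT IDs are placeholders).
rows = [
    {"nct_id": "NCT00000001", "pmid": "101"},
    {"nct_id": "NCT00000002", "pmid": "101"},  # same article, second trial
    {"nct_id": "NCT00000002", "pmid": "101"},  # exact duplicate, dropped
    {"nct_id": "NCT00000003", "pmid": ""},     # no PMID, skipped
    {"nct_id": "nct00000003", "pmid": "202"},  # lower-case ID, normalized
]

edges = build_bridge_edges(rows)
articles = trials_per_article(edges)
print(len(edges), "edges from", len(articles), "articles")  # 3 edges from 2 articles
```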

The Headline Results

-- XK02: What drugs are tested in cancer research trials?
-- Spans: MeSH → Article → ClinicalTrial → Intervention (4 entities, 2 KGs)
MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:TESTS]->(i:Intervention)
WHERE m.name = 'Neoplasms'
RETURN i.name, count(DISTINCT t) AS trials ORDER BY trials DESC LIMIT 10

| Intervention | Trials |
|---|---|
| Placebo | 521 |
| Pembrolizumab | 137 |
| Carboplatin | 106 |
| Paclitaxel | 106 |
| Cyclophosphamide | 98 |

Time: 5.2s. Pembrolizumab (Keytruda) is the most frequent non-placebo intervention among trials linked to “Neoplasms” articles.


-- XK03: What drugs are tested in diabetes research trials?
MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:TESTS]->(i:Intervention)
WHERE m.name = 'Diabetes Mellitus'
RETURN i.name, count(DISTINCT t) AS trials ORDER BY trials DESC LIMIT 10

| Intervention | Trials |
|---|---|
| Placebo | 324 |
| Metformin | 70 |
| Usual care | 50 |
| Insulin | 25 |
| Exercise | 23 |

Time: 2.4s


-- XK04: What adverse events appear in heart disease trials?
MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:REPORTED]->(ae:AdverseEvent)
WHERE m.name = 'Heart Diseases'
RETURN ae.term, count(DISTINCT t) AS trials ORDER BY trials DESC LIMIT 10

| Adverse Event | Trials |
|---|---|
| Headache | 60 |
| Nausea | 56 |
| Syncope | 51 |
| Pneumonia | 49 |

Time: 2.0s


-- XK06: What adverse events appear in Metformin-linked trials?
-- Spans: Chemical → Article → ClinicalTrial → AdverseEvent (4 entities)
MATCH (c:Chemical)<-[:MENTIONS_CHEMICAL]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:REPORTED]->(ae:AdverseEvent)
WHERE c.name = 'Metformin'
RETURN ae.term, count(DISTINCT t) AS trials ORDER BY trials DESC LIMIT 10

| Adverse Event | Trials |
|---|---|
| Headache | 215 |
| Nausea | 207 |
| Nasopharyngitis | 186 |
| Diarrhoea | 185 |

Time: 2.1s. Diarrhoea is a known Metformin side effect, and it surfaces here by joining PubMed with ClinicalTrials.gov.
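
XK02, XK03, and XK04 share one traversal shape and differ only in the MeSH term and the final hop. One way to make that explicit is a small template function that emits the query text. This is a sketch, not part of the benchmark tooling: `mesh_to_trial_query` is a hypothetical helper, it uses a generic variable `x` for the final node, and it assumes the engine accepts backslash-escaped quotes in string literals as openCypher does.

```python
def mesh_to_trial_query(mesh_name, rel_type, target_label, target_prop, limit=10):
    """Build the shared MeSH -> Article -> ClinicalTrial -> <target> query text.

    `rel_type`, `target_label`, and `target_prop` are inserted as-is, so they
    must come from trusted code, never from user input.
    """
    # Escape backslashes and single quotes so the name stays one string literal.
    literal = mesh_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)\n"
        f"      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:{rel_type}]->(x:{target_label})\n"
        f"WHERE m.name = '{literal}'\n"
        f"RETURN x.{target_prop}, count(DISTINCT t) AS trials "
        f"ORDER BY trials DESC LIMIT {int(limit)}"
    )

# The three MeSH-driven headline queries: (MeSH term, relationship, label, property).
HEADLINE = {
    "XK02": ("Neoplasms", "TESTS", "Intervention", "name"),
    "XK03": ("Diabetes Mellitus", "TESTS", "Intervention", "name"),
    "XK04": ("Heart Diseases", "REPORTED", "AdverseEvent", "term"),
}

for query_id, args in HEADLINE.items():
    print(f"-- {query_id}")
    print(mesh_to_trial_query(*args))
```

XK06 follows the same pattern but starts from `Chemical` via `MENTIONS_CHEMICAL`, so covering it would mean parameterizing the first hop as well.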


All Cross-KG Results

| ID | Query | Time | Rows | Top Result |
|---|---|---|---|---|
| XK01 | Article → Trial links | 39.8s | 10 | PMID 1 → NCT03260829 |
| XK02 | Cancer → Trial interventions | 5.2s | 10 | Pembrolizumab (137 trials) |
| XK03 | Diabetes → Trial interventions | 2.4s | 10 | Metformin (70 trials) |
| XK04 | Heart disease → Trial AE | 2.0s | 10 | Headache (60 trials) |
| XK05 | Aspirin → Trials | 1.5s | 10 | NCT00000491 “Aspirin MI study” |
| XK06 | Metformin → Trial AE | 2.1s | 10 | Headache (215), Diarrhoea (185) |
| XK07 | Cancer trial sites | 3.8s | 10 | US (4,062), China (1,170), France (827) |
| XK08 | NCI-funded → Interventions | 19.4s | 10 | Placebo (933), Lab biomarker (614), Cyclophosphamide (517) |
| XK09 | NCT-linked count | 98.6s | 1 | 747,505 articles linked to trials |
| XK11 | HIV → Trial sites | 11.6s | 10 | US (2,384) |
| XK12 | Alzheimer → Interventions | 2.4s | 10 | Placebo (345) |
| XK13 | NHLBI-funded → Trial AE | 18.3s | 10 | Headache (643) |
| XK14 | Paclitaxel → Trial sponsors | 1.9s | 10 | NCI (64 trials) |
| XK15 | Breast cancer → Outcomes | 4.0s | 1 | 6,591 outcome measures |

What These Results Mean

  1. Pembrolizumab dominates cancer trials — The immunotherapy revolution is visible in the data. Across all PubMed articles annotated with “Neoplasms” that link to clinical trials, Keytruda appears in 137 trials, more than any classic chemotherapy agent.

  2. Metformin’s GI side effects confirmed cross-database — Diarrhoea ranks 4th in Metformin-linked trial adverse events (185 trials), consistent with clinical knowledge. This was found by traversing Chemical → Article → Trial → AdverseEvent — four entities across two databases.

  3. The US hosts about 45% of the trials in the snapshot: 190,879 of ~420K total (CT08). In the cancer-linked subset (XK07), the US also leads with 4,062 trials, and China is second with 1,170.

  4. Nature has 140,152 articles in PubMed — making it the most-indexed journal. DNA is its most-mentioned chemical (3,726 articles).

  5. TP53 is the most connected protein — 571 interaction partners in Reactome, confirming its role as the “guardian of the genome.”

  6. NCI NIH HHS funds 46,137 cancer research papers (PM27), and NCI-funded papers link to 933 trials that list a placebo intervention (XK08).
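
Several of the claims above are simple arithmetic on the result tables, so they can be re-derived directly. The sketch below copies the relevant figures from CT08, XK02, and XK06; the ~420K trial total is the approximate figure quoted in the list, not a value from a result table.

```python
# Figures copied from the result tables and the list above.
US_TRIALS = 190_879        # CT08, top row
TOTAL_TRIALS = 420_000     # "~420K total trials" (approximate)
CANCER_INTERVENTIONS = [   # XK02, in ranked order
    ("Placebo", 521), ("Pembrolizumab", 137), ("Carboplatin", 106),
    ("Paclitaxel", 106), ("Cyclophosphamide", 98),
]
METFORMIN_AE = [           # XK06, in ranked order
    ("Headache", 215), ("Nausea", 207), ("Nasopharyngitis", 186), ("Diarrhoea", 185),
]

us_share = US_TRIALS / TOTAL_TRIALS
non_placebo = [(name, n) for name, n in CANCER_INTERVENTIONS if name != "Placebo"]
top_drug = max(non_placebo, key=lambda item: item[1])
diarrhoea_rank = [name for name, _ in METFORMIN_AE].index("Diarrhoea") + 1

print(f"US share of trials: {us_share:.1%}")  # US share of trials: 45.4%
print(f"Top non-placebo cancer intervention: {top_drug[0]} ({top_drug[1]} trials)")
print(f"Diarrhoea rank among Metformin-linked AEs: {diarrhoea_rank}")
```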

Query Files

The full query catalog and results are available as CSV for automated benchmarking.
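
A results CSV of this kind can be summarized with the standard library alone. The sketch below is illustrative only: the column names (`id`, `time_s`, `rows`) are a hypothetical layout, not the schema of the published files, and the sample rows are a handful of timings copied from the tables above.

```python
import csv
import io
from statistics import median

# A few rows copied from the tables above, in a hypothetical CSV layout.
SAMPLE_CSV = """id,time_s,rows
PM01,19.6,1
PM02,1.3,1
PM24,42.7,10
CT08,15.0,15
CT13,159,10
CT17,163,10
PW04,1.2,1
PW06,1.7,10
DI06,1.4,10
DI13,103,10
XK02,5.2,10
XK09,98.6,1
"""

def summarize(csv_text, slow_threshold_s=100.0):
    """Group timings by the two-letter ID prefix and flag slow queries."""
    timings = {}
    slow = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seconds = float(row["time_s"])
        timings.setdefault(row["id"][:2], []).append(seconds)
        if seconds >= slow_threshold_s:
            slow.append(row["id"])
    medians = {prefix: median(values) for prefix, values in timings.items()}
    return medians, slow

medians, slow = summarize(SAMPLE_CSV)
for prefix, value in medians.items():
    print(f"{prefix}: median {value:.1f}s")
print("slow queries:", ", ".join(slow))  # slow queries: CT13, CT17, DI13
```

Grouping by ID prefix keeps the summary aligned with the five categories used throughout this page (PM, CT, PW, DI, XK).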

Reproducing

All knowledge graph snapshots and the benchmark runner are included in Samyama Graph Enterprise Edition. Contact us to access the pre-built snapshots and benchmark tooling.