Public Health Knowledge Graph Benchmark

40 queries. 2 knowledge graphs. 305K nodes. 305K edges. Cross-KG federation with health system capacity data.

The public health benchmark demonstrates Samyama’s ability to load, query, and cross-correlate population health determinants with health system capacity data — answering questions from “Why are populations vulnerable?” to “What capacity exists to respond?”

The Dataset

| Knowledge Graph | Source | Nodes | Edges | Key Entities |
|---|---|---|---|---|
| Health Determinants | World Bank WDI + WHO Air Quality + WHO Water/Sanitation | 285,635 | 285,628 | Country, Region, SocioeconomicIndicator, EnvironmentalFactor, NutritionIndicator, DemographicProfile, WaterResource |
| Health Systems | WHO SPAR v2 + WHO NHWA (workforce) | 19,661 | 19,428 | Country, EmergencyResponse, HealthWorkforce |
| Total | | 305,296 | 305,056 | |

Data coverage:

  • 211 countries with World Bank indicators across 5 categories (1990–2024)
  • 233 countries with IHR SPAR v2 capacity scores + health workforce density (2020–2023)
  • 80 curated WDI indicators: socioeconomic (16), environmental (10), nutrition (10), demographic (15), water (10)
  • 1,880 country-year PM2.5 air quality records from WHO GHO
  • 47,531 water/sanitation records (safely managed + basic water/sanitation)
  • 11,008 health workforce density records (physicians, nurses, dentists, pharmacists)
  • 15 IHR capacities: legislation, coordination, financing, laboratory, surveillance, human resources, emergency management, health services, IPC, risk communication, points of entry, zoonotic, food safety, chemical, radiation

Results Summary

40 queries executed. 40 returned data. 0 empty. 0 errors.

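| Category | Queries | Pass | Empty | Error | Median (ms) |
|---|---|---|---|---|---|
| Health Determinants | 20 | 20 | 0 | 0 | 27.4 |
| Health Systems | 10 | 10 | 0 | 0 | 12.2 |
| Cross-KG / Cross-Indicator | 10 | 10 | 0 | 0 | 190.7 |
| Total | 40 | 40 | 0 | 0 | 15.1 |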
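
The Pass / Empty / Error columns suggest a simple tally per query. The sketch below shows one way such a summary could be produced; it is not the harness used for this benchmark. `execute` is a hypothetical callable standing in for a real client (assumed to take query text and return a list of rows), and the rule that "pass" means at least one row returned is an assumption drawn from the "40 returned data" line above.

```python
import time
from collections import Counter
from statistics import median


def run_benchmark(queries, execute):
    """Run each (query_id, text) pair and record status, row count, and time.

    `execute` is a hypothetical stand-in for a real client call.
    """
    results = []
    for query_id, text in queries:
        start = time.perf_counter()
        try:
            rows = execute(text)
            status = "pass" if rows else "empty"  # assumed pass rule
            n_rows = len(rows)
        except Exception:
            status, n_rows = "error", 0
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append({"id": query_id, "status": status,
                        "rows": n_rows, "ms": elapsed_ms})
    return results


def summarize(results):
    """Count outcomes and take the median latency over all queries."""
    counts = Counter(r["status"] for r in results)
    return {
        "queries": len(results),
        "pass": counts["pass"],
        "empty": counts["empty"],
        "error": counts["error"],
        "median_ms": median(r["ms"] for r in results),
    }


def fake_execute(text):
    """Toy executor for illustration only; it does not talk to any database."""
    if text == "BOOM":
        raise RuntimeError("simulated failure")
    return [("row",)] if text.startswith("MATCH") else []


demo_queries = [("Q1", "MATCH (n) RETURN n"), ("Q2", "RETURN nothing"), ("Q3", "BOOM")]
summary = summarize(run_benchmark(demo_queries, fake_execute))
print(summary["pass"], summary["empty"], summary["error"])
```

With the toy executor, each of the three demo queries lands in a different bucket, which makes the classification rule easy to check before pointing the harness at a real engine.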

Performance

Infrastructure: MacBook Pro M-series, single process, in-memory graph loaded from .sgsnap snapshots.

| Metric | Value |
|---|---|
| Snapshot import (Health Determinants) | 1.6s (239K nodes, 240K edges) |
| Snapshot import (Health Systems) | 0.1s (8.7K nodes, 8.4K edges) |
| Point lookups | 1.3–4.2 ms |
| Single-hop traversals | 1.8–24.7 ms |
| Aggregation scans | 10.6–133.5 ms |
| Cross-indicator joins | 105–478 ms |
| All 40 queries total | ~3.6s |
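
The ~3.6s total can be cross-checked against the per-query times listed under Query Categories. The snippet below simply adds up those 40 values, copied by hand from the HD, HS, and PH tables; the variable names are illustrative.

```python
# Per-query times in milliseconds, copied from the HD, HS, and PH tables.
hd_ms = [4.2, 1.9, 1.3, 18.1, 2.2, 2.2, 1.9, 56.2, 2.0, 36.7,
         133.5, 2.5, 119.9, 53.6, 59.6, 2.3, 1.9, 478.1, 350.9, 238.2]
hs_ms = [1.8, 11.6, 11.4, 10.8, 11.2, 12.2, 10.6, 16.0, 12.7, 18.0]
ph_ms = [11.9, 105.2, 66.3, 274.6, 315.6, 404.6, 309.8, 106.1, 348.5, 2.0]

all_ms = hd_ms + hs_ms + ph_ms
total_s = sum(all_ms) / 1000.0

print(f"{len(all_ms)} queries, total {total_s:.1f} s")          # 40 queries, total 3.6 s
print(f"fastest {min(all_ms)} ms, slowest {max(all_ms)} ms")    # fastest 1.3 ms, slowest 478.1 ms
```

The fastest and slowest values also match the lower bound of the point-lookup range and the upper bound of the cross-indicator join range.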

Query Categories

Health Determinants (HD01–HD20)

Point lookups, indicator traversals, cross-category analysis, and regional aggregations over World Bank WDI data.

| ID | Name | Time (ms) | Rows |
|---|---|---|---|
| HD01 | Country by ISO code | 4.2 | 2 |
| HD02 | Country by name | 1.9 | 2 |
| HD03 | All regions | 1.3 | 7 |
| HD04 | Countries in a region | 18.1 | 6 |
| HD05 | GNI per capita for India (latest) | 2.2 | 1 |
| HD06 | India poverty rate trend | 2.2 | 5 |
| HD07 | PM2.5 air pollution for India | 1.9 | 5 |
| HD08 | Energy use per capita top 10 | 56.2 | 10 |
| HD09 | Stunting prevalence Nigeria | 2.0 | 5 |
| HD10 | Countries with high stunting (>30%) | 36.7 | 8 |
| HD11 | Population top 10 countries | 133.5 | 10 |
| HD12 | Life expectancy Brazil trend | 2.5 | 10 |
| HD13 | Infant mortality worst countries | 119.9 | 10 |
| HD14 | Water stress high countries | 53.6 | 10 |
| HD15 | Safe drinking water lowest | 59.6 | 10 |
| HD16 | India all indicators for 2022 | 2.3 | 11 |
| HD17 | India environmental profile | 1.9 | 4 |
| HD18 | Poverty vs life expectancy | 478.1 | 10 |
| HD19 | Health expenditure vs infant mortality | 350.9 | 10 |
| HD20 | Regional average GNI per capita | 238.2 | 7 |

Health Systems (HS01–HS10)

IHR SPAR capacity queries — country profiles, rankings, gaps, and trends.

| ID | Name | Time (ms) | Rows |
|---|---|---|---|
| HS01 | Country by ISO code | 1.8 | 2 |
| HS02 | SPAR scores for India | 11.6 | 14 |
| HS03 | SPAR scores for Nigeria | 11.4 | 14 |
| HS04 | Top 10 by surveillance capacity | 10.8 | 10 |
| HS05 | Countries with lowest lab capacity | 11.2 | 10 |
| HS06 | Average SPAR score per country | 12.2 | 10 |
| HS07 | Countries with lowest avg SPAR | 10.6 | 10 |
| HS08 | India SPAR trend over years | 16.0 | 4 |
| HS09 | Risk communication capacity gap (<40) | 12.7 | 10 |
| HS10 | Count countries per SPAR year | 18.0 | 4 |

Cross-KG / Cross-Indicator (PH01–PH10)

Queries spanning multiple indicator categories or joining determinants with health system capacity data.

| ID | Name | Time (ms) | Rows |
|---|---|---|---|
| PH01 | India vulnerability + SPAR capacity | 11.9 | 14 |
| PH02 | High poverty countries SPAR scores | 105.2 | 20 |
| PH03 | PM2.5 vs surveillance capacity | 66.3 | 5 |
| PH04 | Water stress + health expenditure | 274.6 | 10 |
| PH05 | Stunting + infant mortality correlation | 315.6 | 10 |
| PH06 | Urbanization vs life expectancy | 404.6 | 10 |
| PH07 | Under-5 mortality + safe water access | 309.8 | 10 |
| PH08 | Low income full profile + SPAR | 106.1 | 20 |
| PH09 | Health expenditure vs infant mortality | 348.5 | 10 |
| PH10 | Nigeria full determinants profile | 2.0 | 8 |

Sample Query Results

PH02: High poverty countries — do they have health capacity?

MATCH (c1:Country)-[:HAS_INDICATOR]->(pov:SocioeconomicIndicator)
WHERE pov.indicator_code = 'SI.POV.DDAY' AND pov.value > 20
WITH c1.iso_code AS iso, pov.value AS poverty
ORDER BY poverty DESC LIMIT 5
MATCH (e:EmergencyResponse)-[:CAPACITY_FOR]->(c2:Country)
WHERE c2.iso_code = iso AND e.year = 2023
RETURN c2.name, poverty, e.capacity_code, e.score
ORDER BY poverty DESC

This query selects the five highest extreme-poverty values above 20% (indicator SI.POV.DDAY), then looks up the 2023 IHR emergency preparedness scores for the matching countries by ISO code. It answers the question: “Are the most vulnerable populations in countries with the least health system capacity?”
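
The two stages of PH02 (filter and rank, then join on ISO code) can be mirrored in a few lines of Python. This is an illustrative sketch over in-memory dictionaries, not how Samyama executes the query. The function name, the field names, and the toy records (placeholder ISO codes `AAA`/`BBB`/`CCC`, placeholder capacity code `X1`, made-up values) are all assumptions and not benchmark data.

```python
def top_poverty_with_capacity(poverty_rows, capacity_rows,
                              threshold=20.0, limit=5, year=2023):
    """Mirror PH02: rank high-poverty rows, then join capacity scores by ISO code."""
    # Stage 1: keep rows above the threshold, highest poverty first, top `limit`.
    vulnerable = sorted(
        (r for r in poverty_rows if r["value"] > threshold),
        key=lambda r: r["value"],
        reverse=True,
    )[:limit]

    # Index capacity records for the requested year by ISO code.
    by_iso = {}
    for e in capacity_rows:
        if e["year"] == year:
            by_iso.setdefault(e["iso_code"], []).append(e)

    # Stage 2: emit one output row per (poverty row, capacity record) pair.
    out = []
    for r in vulnerable:
        for e in by_iso.get(r["iso_code"], []):
            out.append((r["iso_code"], r["value"], e["capacity_code"], e["score"]))
    return out


# Toy records for illustration only (not real countries or scores).
poverty_rows = [
    {"iso_code": "AAA", "value": 45.0},
    {"iso_code": "BBB", "value": 12.0},
    {"iso_code": "CCC", "value": 30.0},
]
capacity_rows = [
    {"iso_code": "AAA", "year": 2023, "capacity_code": "X1", "score": 40},
    {"iso_code": "AAA", "year": 2022, "capacity_code": "X1", "score": 35},
    {"iso_code": "CCC", "year": 2023, "capacity_code": "X1", "score": 60},
    {"iso_code": "BBB", "year": 2023, "capacity_code": "X1", "score": 80},
]

result = top_poverty_with_capacity(poverty_rows, capacity_rows)
print(result)  # [('AAA', 45.0, 'X1', 40), ('CCC', 30.0, 'X1', 60)]
```

As in the Cypher version, the poverty filter carries no year condition, so a country with several qualifying observations could occupy more than one of the top slots.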

PH07: Under-5 mortality + safe water access

MATCH (c:Country)-[:DEMOGRAPHIC_OF]->(d:DemographicProfile)
WHERE d.indicator_code = 'SH.DYN.MORT' AND d.year = 2022
WITH c, d.value AS u5_mortality
MATCH (c)-[:WATER_RESOURCE_OF]->(w:WaterResource)
WHERE w.indicator_code = 'SH.H2O.SMDW.ZS' AND w.year = 2022
RETURN c.name, u5_mortality, w.value AS safe_water_pct
ORDER BY u5_mortality DESC LIMIT 10

The Central African Republic ranks worst on both measures: 387 under-5 deaths per 1,000 live births and only 6.1% access to safely managed drinking water.

6-KG Federation Vision

These two KGs complete the public health trifecta alongside the existing Surveillance KG (217K nodes). Together with the biomedical trifecta (Pathways + Clinical Trials + Drug Interactions), Samyama now supports queries spanning molecular biology to population health:

PUBLIC HEALTH TRIFECTA              BIOMEDICAL TRIFECTA
Surveillance (217K)  ←──────────→  Clinical Trials (7.7M)
         ↓ Country.iso_code               ↑ Drug
Health Determinants (240K)         Drug Interactions (245K)
         ↓ Country.iso_code               ↑ Gene
Health Systems (8.7K) ←─────────→  Pathways (119K)

Bridge property: Country.iso_code (ISO 3166-1 alpha-3) links all public health KGs. Cross-trifecta bridges use Disease.icd_code, Drug.drugbank_id, and Pathogen → Gene mappings.
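
To make the bridge idea concrete, the sketch below merges per-KG country records on a shared `iso_code` key. It is a minimal illustration under stated assumptions, not Samyama's federation mechanism. The function name, the record layout, and the toy data (placeholder codes `AAA`/`BBB`) are hypothetical, and the regular expression only checks the three-uppercase-letter shape of an alpha-3 code, not whether the code is actually assigned.

```python
import re

ALPHA3_SHAPE = re.compile(r"[A-Z]{3}")  # shape check only, not a validity check


def federate_by_iso(**kgs):
    """Merge per-KG country records on the shared iso_code key.

    Each keyword argument maps a KG name to a list of dicts that carry an
    "iso_code" field. Only countries present in every KG are returned.
    """
    indexed = {}
    for kg_name, records in kgs.items():
        table = {}
        for rec in records:
            iso = rec["iso_code"]
            if not ALPHA3_SHAPE.fullmatch(iso):
                raise ValueError(f"{kg_name}: not an alpha-3 shaped code: {iso!r}")
            table[iso] = rec
        indexed[kg_name] = table

    if not indexed:
        return {}
    shared = set.intersection(*(set(t) for t in indexed.values()))
    return {iso: {kg: indexed[kg][iso] for kg in indexed} for iso in sorted(shared)}


# Toy records for illustration only.
merged = federate_by_iso(
    determinants=[{"iso_code": "AAA", "gni": 1.0}, {"iso_code": "BBB", "gni": 2.0}],
    systems=[{"iso_code": "AAA", "avg_spar": 50}],
)
print(list(merged))  # ['AAA']
```

Keeping only the intersection of keys is one design choice; a left join from one KG would instead retain countries that lack capacity data, which matters when the gaps themselves are the finding.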

Data Files

| File | Description |
|---|---|
| health-determinants-queries.csv | 20 Health Determinants queries |
| health-systems-queries.csv | 10 Health Systems queries |
| public-health-cross-kg-queries.csv | 10 Cross-KG queries |
| public-health-results.csv | All 40 query results with timing |

Snapshots

Available for download:

| Snapshot | Size | Source |
|---|---|---|
| health-determinants.sgsnap | 6.5 MB | GitHub Release kg-snapshots-v6 |
| health-systems.sgsnap | 0.5 MB | GitHub Release kg-snapshots-v6 |

Also on S3: s3://samyama-data/snapshots/