Knowledge Graph Catalog

Samyama ships with 8 pre-built knowledge graphs (KGs) spanning biomedicine, public health, sports, and industrial operations. Each KG is distributed as a portable .sgsnap snapshot that loads in seconds. Snapshots are stored on S3 (s3://samyama-data/snapshots/) and GitHub Releases, and all KGs are cataloged in the Supabase kg_registry table.


Catalog Overview

graph TB
    subgraph "Biomedical Trifecta"
        PKG["🧬 Pathways KG<br/>119K nodes · 835K edges"]
        CTKG["💊 Clinical Trials KG<br/>7.8M nodes · 27M edges"]
        DIKG["💉 Drug Interactions KG<br/>245K nodes · 388K edges"]
    end

    subgraph "Public Health Trifecta"
        DSKG["🦠 Disease Surveillance KG<br/>217K nodes · 241K edges"]
        HDKG["📊 Health Determinants KG<br/>286K nodes · 286K edges"]
        HSKG["🏥 Health Systems KG<br/>20K nodes · 19K edges"]
    end

    subgraph "Sports"
        CKG["🏏 Cricket KG<br/>36K nodes · 1.4M edges"]
    end

    subgraph "Industrial"
        AOKG["🏭 AssetOps KG<br/>781 nodes · 955 edges"]
    end

    PKG -.->|"Gene · Protein"| DIKG
    DIKG -.->|"Drug · Intervention"| CTKG
    DSKG -.->|"Disease · Condition"| CTKG
    DSKG -.->|"Drug · AMR"| DIKG
    DSKG -.->|"Region"| HDKG
    DSKG -.->|"Region"| HSKG

    style CKG fill:#3b82f6,stroke:#333,color:#fff
    style PKG fill:#10b981,stroke:#333,color:#fff
    style CTKG fill:#8b5cf6,stroke:#333,color:#fff
    style DIKG fill:#ec4899,stroke:#333,color:#fff
    style AOKG fill:#f59e0b,stroke:#333,color:#fff
    style DSKG fill:#06b6d4,stroke:#333,color:#fff
    style HDKG fill:#84cc16,stroke:#333,color:#fff
    style HSKG fill:#f97316,stroke:#333,color:#fff

| KG | Nodes | Edges | Labels | Edge Types | Snapshot | Source | Status |
|---|---|---|---|---|---|---|---|
| Cricket KG | 36,619 | 1,392,017 | 6 | 12 | 21 MB | Cricsheet | Live |
| Pathways KG | 118,686 | 834,785 | 5 | 9 | 9 MB | Reactome, STRING, GO | Live |
| Clinical Trials KG | 7,774,446 | 26,973,997 | 15 | 25 | 711 MB | AACT, MeSH, RxNorm | Live |
| Drug Interactions KG | 245,000 | 388,000 | 8 | 9 | 8.1 MB | DrugBank, SIDER, ChEMBL, DGIdb, OpenFDA | Live |
| AssetOps KG | 781 | 955 | 8 | 10 | < 1 MB | Synthetic (AssetOpsBench) | Live |
| Disease Surveillance KG | 216,553 | 241,084 | 6 | 5 | 5.7 MB | WHO GHO | Live |
| Health Determinants KG | 285,635 | 285,628 | 7 | 6 | 6.5 MB | World Bank WDI, WHO Air Quality, WHO WASH | Live |
| Health Systems KG | 19,661 | 19,428 | 3 | 2 | 0.5 MB | WHO SPAR, WHO NHWA | Live |
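
As a quick sanity check on the catalog figures, the per-KG counts can be totaled and compared. The sketch below hard-codes the node and edge counts from the table; the `CATALOG` name and its structure are illustrative only and are not part of Samyama.

```python
# Node and edge counts copied from the catalog table on this page.
# CATALOG is an illustrative structure, not a Samyama API.
CATALOG = {
    "Cricket KG": (36_619, 1_392_017),
    "Pathways KG": (118_686, 834_785),
    "Clinical Trials KG": (7_774_446, 26_973_997),
    "Drug Interactions KG": (245_000, 388_000),
    "AssetOps KG": (781, 955),
    "Disease Surveillance KG": (216_553, 241_084),
    "Health Determinants KG": (285_635, 285_628),
    "Health Systems KG": (19_661, 19_428),
}


def totals(catalog):
    """Return (total_nodes, total_edges) summed over every KG."""
    nodes = sum(n for n, _ in catalog.values())
    edges = sum(e for _, e in catalog.values())
    return nodes, edges


def edges_per_node(catalog):
    """Return {kg_name: edges / nodes}, a rough density measure."""
    return {name: e / n for name, (n, e) in catalog.items()}


if __name__ == "__main__":
    total_nodes, total_edges = totals(CATALOG)
    print(f"{total_nodes:,} nodes, {total_edges:,} edges")
    ranked = sorted(edges_per_node(CATALOG).items(), key=lambda kv: -kv[1])
    for name, ratio in ranked:
        print(f"{name:26s} {ratio:6.2f} edges/node")
```

By this measure the Cricket KG is the densest graph in the catalog, at roughly 38 edges per node.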

Cross-KG Federation

The biomedical trifecta (Pathways + Clinical Trials + Drug Interactions) enables queries spanning molecular biology, translational medicine, and pharmacogenomics. With PubMed added, the federation holds 74.3M nodes and 1.07B edges, and 96 of 100 queries pass. BiomedQA benchmark: 98% accuracy with MCP tools.

The public health trifecta (Disease Surveillance + Health Determinants + Health Systems) enables queries that run from disease outbreaks through population vulnerability to health system capacity; 40/40 queries pass. It bridges to the biomedical trifecta via Country.iso_code, Drug.drugbank_id, and Gene.gene_name.

Together, the 6-KG federation spans molecular biology to population health in a single OpenCypher query. See Cross-KG Federation for details.
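
The bridge properties named above (Country.iso_code, Drug.drugbank_id, Gene.gene_name) act as join keys between graphs. Samyama performs federation inside the query engine, so the following is only a client-side illustration of what a bridge key does, not a description of the engine. The function name, the field names `reported_cases` and `physicians_per_10k`, and every row value are invented placeholders.

```python
from collections import defaultdict


def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of row dicts on a shared bridge property."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            # Merge the two rows; the shared key has the same value in both.
            joined.append({**left, **right})
    return joined


# Placeholder rows, NOT real data: they stand in for results returned by
# two separate KGs that both expose Country.iso_code.
surveillance_rows = [
    {"iso_code": "AAA", "reported_cases": 120},
    {"iso_code": "BBB", "reported_cases": 45},
]
workforce_rows = [
    {"iso_code": "AAA", "physicians_per_10k": 8.5},
    {"iso_code": "CCC", "physicians_per_10k": 21.0},
]

combined = join_on_key(surveillance_rows, workforce_rows, "iso_code")
```

Only the country present in both inputs survives the join, which is the behavior a shared key such as iso_code provides when two graphs are queried together.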


Cricket KG

21K international cricket matches from Cricsheet, with ball-by-ball data spanning T20, ODI, and Test formats.

Cricket KG Demo (1:56): Dashboard, Cypher Queries, and Graph Simulation

Schema

graph LR
    Player -->|BATTED_IN| Match
    Player -->|BOWLED_IN| Match
    Player -->|DISMISSED| Player
    Player -->|FIELDED_DISMISSAL| Player
    Player -->|PLAYED_FOR| Team
    Player -->|PLAYER_OF_MATCH| Match
    Team -->|COMPETED_IN| Match
    Team -->|WON| Match
    Team -->|WON_TOSS| Match
    Match -->|HOSTED_AT| Venue
    Match -->|IN_SEASON| Season
    Match -->|PART_OF| Tournament

    style Player fill:#3b82f6,stroke:#333,color:#fff
    style Match fill:#8b5cf6,stroke:#333,color:#fff
    style Team fill:#ef4444,stroke:#333,color:#fff
    style Venue fill:#f59e0b,stroke:#333,color:#fff
    style Tournament fill:#10b981,stroke:#333,color:#fff
    style Season fill:#ec4899,stroke:#333,color:#fff

| Label | Count | Key Properties |
|---|---|---|
| Match | 21,324 | date, match_type, season, winner |
| Player | 12,933 | name |
| Tournament | 1,053 | name |
| Venue | 877 | name, city |
| Team | 383 | name |
| Season | 49 | name |

Example Queries

// Top 10 run scorers across all formats
MATCH (p:Player)-[b:BATTED_IN]->(m:Match)
RETURN p.name AS player, sum(b.runs) AS total_runs
ORDER BY total_runs DESC LIMIT 10

// Bowler-batsman rivalries
MATCH (bowler:Player)-[d:DISMISSED]->(victim:Player)
RETURN bowler.name, victim.name, count(d) AS times
ORDER BY times DESC LIMIT 10

// Venue-team affinity (home advantage)
MATCH (t:Team)-[:WON]->(m:Match)-[:HOSTED_AT]->(v:Venue)
WITH t, v, count(m) AS wins WHERE wins >= 5
RETURN t.name, v.name, wins ORDER BY wins DESC LIMIT 15

Repository: samyama-ai/cricket-kg
Snapshot: kg-snapshots-v1 (cricket.sgsnap, 21 MB)


Pathways KG

Biological pathways knowledge graph combining 5 open-license data sources: Reactome, STRING, Gene Ontology, WikiPathways, and UniProt. Human-only (organism 9606).

Pathways KG Demo (2:06): Dashboard, Cypher Queries, and Graph Simulation

Schema

graph LR
    Protein -->|PARTICIPATES_IN| Pathway
    Protein -->|CATALYZES| Reaction
    Protein -->|COMPONENT_OF| Complex
    Protein -->|ANNOTATED_WITH| GOTerm
    Protein -->|INTERACTS_WITH| Protein
    Pathway -->|CHILD_OF| Pathway
    GOTerm -->|IS_A| GOTerm
    GOTerm -->|PART_OF| GOTerm
    GOTerm -->|REGULATES| GOTerm

    style Protein fill:#3b82f6,stroke:#333,color:#fff
    style Pathway fill:#10b981,stroke:#333,color:#fff
    style GOTerm fill:#8b5cf6,stroke:#333,color:#fff
    style Reaction fill:#f59e0b,stroke:#333,color:#fff
    style Complex fill:#ef4444,stroke:#333,color:#fff

| Label | Count | Key Properties |
|---|---|---|
| GOTerm | 51,897 | go_id, name, namespace, definition |
| Protein | 37,990 | uniprot_id, name, gene_name |
| Complex | 15,963 | reactome_id, name |
| Reaction | 9,988 | reactome_id, name |
| Pathway | 2,848 | reactome_id, name, source |

| Edge Type | Count | Description |
|---|---|---|
| ANNOTATED_WITH | 265,492 | Protein → GO term annotation |
| INTERACTS_WITH | 227,818 | Protein-protein interaction (STRING, score ≥ 700) |
| PARTICIPATES_IN | 140,153 | Protein → Pathway membership |
| CATALYZES | 121,365 | Protein → Reaction catalysis |
| IS_A | 58,799 | GO term hierarchy |
| COMPONENT_OF | 8,186 | Protein → Complex membership |
| PART_OF | 7,122 | GO term part-of relation |
| REGULATES | 2,986 | GO term regulation |
| CHILD_OF | 2,864 | Pathway hierarchy |
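
The Pathways schema maps directly onto queries. As a minimal sketch, the helper below builds an OpenCypher query that lists the pathways a gene's protein participates in. It uses only labels, edge types, and properties from the tables in this section (Protein.gene_name, PARTICIPATES_IN, Pathway.name, Pathway.source). The function names are hypothetical, and inlining an escaped literal is an assumption made because this page does not show whether query parameters are supported.

```python
def cypher_string(value):
    """Quote a Python string as a single-quoted Cypher string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def pathways_for_gene(gene_name, limit=25):
    """Build a query listing pathways that a gene's protein participates in."""
    return (
        "MATCH (p:Protein)-[:PARTICIPATES_IN]->(pw:Pathway) "
        f"WHERE p.gene_name = {cypher_string(gene_name)} "
        "RETURN pw.name AS pathway, pw.source AS source "
        f"ORDER BY pathway LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # TP53 is used purely as an example input.
    print(pathways_for_gene("TP53"))
```

The resulting string can be sent as the `query` field of the `/api/query` request shown in the Quick Start section.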

Repository: samyama-ai/pathways-kg
Snapshot: kg-snapshots-v3 (pathways.sgsnap, 9 MB)


Clinical Trials KG

575K+ clinical studies from ClinicalTrials.gov enriched with MeSH disease hierarchy, RxNorm drug normalization, ATC drug classification, OpenFDA adverse events, and PubMed publications.

Schema

graph LR
    ClinicalTrial -->|STUDIES| Condition
    ClinicalTrial -->|TESTS| Intervention
    ClinicalTrial -->|HAS_ARM| ArmGroup
    ClinicalTrial -->|MEASURES| Outcome
    ClinicalTrial -->|SPONSORED_BY| Sponsor
    ClinicalTrial -->|CONDUCTED_AT| Site
    ClinicalTrial -->|REPORTED| AdverseEvent
    ClinicalTrial -->|PUBLISHED_IN| Publication
    ArmGroup -->|USES| Intervention
    Intervention -->|CODED_AS_DRUG| Drug
    Condition -->|CODED_AS_MESH| MeSHDescriptor
    Drug -->|TARGETS| Protein
    Drug -->|CLASSIFIED_AS| DrugClass
    Drug -->|TREATS| Condition
    Gene -->|ENCODES| Protein
    Gene -->|ASSOCIATED_WITH| Condition
    MeSHDescriptor -->|BROADER_THAN| MeSHDescriptor

    style ClinicalTrial fill:#8b5cf6,stroke:#333,color:#fff
    style Condition fill:#ef4444,stroke:#333,color:#fff
    style Intervention fill:#3b82f6,stroke:#333,color:#fff
    style Drug fill:#10b981,stroke:#333,color:#fff
    style Protein fill:#f59e0b,stroke:#333,color:#fff
    style Gene fill:#ec4899,stroke:#333,color:#fff
    style MeSHDescriptor fill:#06b6d4,stroke:#333,color:#fff
    style Publication fill:#84cc16,stroke:#333,color:#fff

| Label | Key Properties | Source |
|---|---|---|
| ClinicalTrial | nct_id, title, phase, overall_status, enrollment | ClinicalTrials.gov |
| Condition | name, mesh_id, icd10_code | ClinicalTrials.gov |
| Intervention | name, type (DRUG/DEVICE/…), rxnorm_cui | ClinicalTrials.gov |
| Drug | rxnorm_cui, name, drugbank_id | RxNorm |
| Protein | uniprot_id, name, function | UniProt |
| Gene | gene_id, symbol, name | Linked ontologies |
| MeSHDescriptor | descriptor_id, name, tree_numbers | MeSH (NLM) |
| Sponsor | name, class (INDUSTRY/NIH/…) | ClinicalTrials.gov |
| Site | facility, city, country, latitude, longitude | ClinicalTrials.gov |
| Publication | pmid, title, journal, doi | PubMed |
| AdverseEvent | term, organ_system, is_serious | OpenFDA |
| ArmGroup | label, type (EXPERIMENTAL/…) | ClinicalTrials.gov |
| Outcome | measure, time_frame, type | ClinicalTrials.gov |
| DrugClass | atc_code, name, level | ATC |
| LabTest | loinc_code, name | LOINC |

Repository: samyama-ai/clinicaltrials-kg (private)
Snapshot: kg-snapshots-v1 (clinical-trials.sgsnap, 711 MB)


Drug Interactions KG

Drug-gene interactions, side effects, indications, bioactivities, and adverse events from 5 open pharmacology databases. Bridges to Pathways KG (via gene name) and Clinical Trials KG (via DrugBank ID).

Schema

graph LR
    Drug -->|INTERACTS_WITH_GENE| Gene
    Drug -->|HAS_SIDE_EFFECT| SideEffect
    Drug -->|HAS_INDICATION| Indication
    Drug -->|HAS_BIOACTIVITY| Bioactivity
    Drug -->|HAS_ADVERSE_EVENT| AdverseEvent
    Drug -->|CLASSIFIED_AS| DrugClass
    Bioactivity -->|BIOACTIVITY_TARGET| Target

    style Drug fill:#ec4899,stroke:#333,color:#fff
    style Gene fill:#3b82f6,stroke:#333,color:#fff
    style SideEffect fill:#ef4444,stroke:#333,color:#fff
    style Indication fill:#10b981,stroke:#333,color:#fff

Performance

  • Rust native loader: 245,000 nodes + 388,000 edges in 7.7s
  • Snapshot: 8.1 MB (druginteractions.sgsnap)
  • BiomedQA benchmark: 98% accuracy with MCP tools

Sources (5 of 6)

| Source | License | Content |
|---|---|---|
| DrugBank CC0 | CC0 | 19,842 drug vocabulary + synonym mappings |
| DGIdb | Open | Drug-gene interactions from 4,182 genes |
| SIDER | CC-BY-SA | Side effects + indications |
| ChEMBL | CC-BY-SA | 208K bioactivity records |
| OpenFDA FAERS | Public | 1.7K adverse event reports |

Repository: samyama-ai/druginteractions-kg
Rust loader: examples/druginteractions_loader.rs in samyama-graph
Snapshot: druginteractions.sgsnap (8.1 MB)


Disease Surveillance KG

WHO Global Health Observatory data: disease case/death counts, vaccine coverage, and health indicators across 234 countries.

Schema

graph LR
    Country -->|IN_REGION| Region
    Country -->|REPORTED| DiseaseReport
    DiseaseReport -->|REPORT_OF| Disease
    Country -->|HAS_COVERAGE| VaccineCoverage
    Country -->|HAS_INDICATOR| HealthIndicator

    style Country fill:#06b6d4,stroke:#333,color:#fff
    style Region fill:#84cc16,stroke:#333,color:#fff
    style Disease fill:#ef4444,stroke:#333,color:#fff
    style DiseaseReport fill:#8b5cf6,stroke:#333,color:#fff
    style VaccineCoverage fill:#f59e0b,stroke:#333,color:#fff
    style HealthIndicator fill:#ec4899,stroke:#333,color:#fff

| Label | Count | Source |
|---|---|---|
| Country | 234 | WHO GHO |
| Region | 6 | WHO regions (AFR, AMR, SEAR, EUR, EMR, WPR) |
| Disease | 15 | Cholera, Malaria, TB, HIV, Meningitis, etc. |
| DiseaseReport | ~49K | Annual case/death counts per country per disease |
| VaccineCoverage | ~10K | DTP3, MCV1, BCG, Polio3 coverage per country per year |
| HealthIndicator | ~164K | Life expectancy, infant mortality, sanitation, water |

Repository: samyama-ai/surveillance-kg
Rust loader: examples/surveillance_loader.rs in samyama-graph
Snapshot: surveillance.sgsnap (5.7 MB)
Data source: WHO GHO OData API (30 indicators across infectious diseases, vaccines, and health metrics)


Health Determinants KG

Population vulnerability indicators from World Bank WDI, WHO Air Quality, and WHO Water/Sanitation: 80 curated indicators across 211 countries (1990–2024).

Schema

graph LR
    Country -->|IN_REGION| Region
    Country -->|HAS_INDICATOR| SocioeconomicIndicator
    Country -->|ENVIRONMENT_OF| EnvironmentalFactor
    Country -->|NUTRITION_STATUS| NutritionIndicator
    Country -->|DEMOGRAPHIC_OF| DemographicProfile
    Country -->|WATER_RESOURCE_OF| WaterResource

    style Country fill:#84cc16,stroke:#333,color:#fff
    style Region fill:#06b6d4,stroke:#333,color:#fff
    style SocioeconomicIndicator fill:#f59e0b,stroke:#333,color:#fff
    style EnvironmentalFactor fill:#ef4444,stroke:#333,color:#fff
    style NutritionIndicator fill:#10b981,stroke:#333,color:#fff
    style DemographicProfile fill:#8b5cf6,stroke:#333,color:#fff
    style WaterResource fill:#3b82f6,stroke:#333,color:#fff

| Category | Nodes | Indicators | Examples |
|---|---|---|---|
| Countries | 211 | n/a | ISO 3166-1 alpha-3, income level, WB region |
| Regions | 7 | n/a | South Asia, Sub-Saharan Africa, etc. |
| Socioeconomic | 52,367 | 16 | GDP, GNI, poverty, Gini, unemployment, literacy, health expenditure |
| Environmental | 35,667 | 10 + PM2.5 | PM2.5 exposure, forest area, renewable energy, CO2 |
| Nutrition | 14,073 | 10 | Stunting, wasting, obesity, undernourishment, anemia |
| Demographic | 107,088 | 15 | Population, fertility, life expectancy, infant mortality, urbanization |
| Water | 76,222 | 10 + WASH | Drinking water, sanitation, freshwater withdrawal, water stress |

Performance

  • Rust native loader: 285,635 nodes + 285,628 edges in 2.4s
  • Snapshot: 6.5 MB (health-determinants.sgsnap)
  • Benchmark: 40/40 queries pass (median 15ms)

Sources

| Source | License | Content |
|---|---|---|
| World Bank WDI | CC-BY 4.0 | 80 indicators, 211 countries, 1990–2024 |
| WHO GHO Air Quality | Open | PM2.5 annual mean (1,880 records) |
| WHO GHO WASH | Open | Water/sanitation (47K records) |

Repository: samyama-ai/health-determinants-kg
Rust loader: examples/health_determinants_loader.rs in samyama-graph-enterprise
Snapshot: health-determinants.sgsnap (6.5 MB)


Health Systems KG

Health system capacity: IHR emergency preparedness and health workforce density from WHO, across 233 countries.

Schema

graph LR
    EmergencyResponse -->|CAPACITY_FOR| Country
    HealthWorkforce -->|SERVES| Country

    style Country fill:#f97316,stroke:#333,color:#fff
    style EmergencyResponse fill:#ef4444,stroke:#333,color:#fff
    style HealthWorkforce fill:#3b82f6,stroke:#333,color:#fff

| Label | Count | Source |
|---|---|---|
| Country | 233 | WHO GHO |
| EmergencyResponse | 8,430 | WHO SPAR v2: 15 IHR capacities per country per year |
| HealthWorkforce | 10,998 | WHO NHWA: physicians, nurses, dentists, pharmacists |

Performance

  • Rust native loader: 19,661 nodes + 19,428 edges in 149ms
  • Snapshot: 0.5 MB (health-systems.sgsnap)
  • Benchmark: 40/40 queries pass (cross-KG with Health Determinants)
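
The loader timings quoted in the Performance sections of this page can be converted into rough throughput figures. The sketch below copies the node counts, edge counts, and load times as stated for the three Rust loaders. `LOAD_TIMES` and `elements_per_second` are illustrative names, and counting nodes plus edges as "elements" is a simplifying assumption.

```python
# (nodes, edges, seconds) as stated in the Performance sections of this page.
LOAD_TIMES = {
    "Drug Interactions KG": (245_000, 388_000, 7.7),
    "Health Determinants KG": (285_635, 285_628, 2.4),
    "Health Systems KG": (19_661, 19_428, 0.149),
}


def elements_per_second(nodes, edges, seconds):
    """Return graph elements (nodes + edges) loaded per second."""
    return (nodes + edges) / seconds


def throughput(load_times):
    """Return {kg_name: elements per second} for every entry."""
    return {
        name: elements_per_second(n, e, s)
        for name, (n, e, s) in load_times.items()
    }


if __name__ == "__main__":
    for name, rate in throughput(LOAD_TIMES).items():
        print(f"{name:24s} {rate:12,.0f} elements/s")
```

The three runs cover different graphs and the page does not state the hardware used, so the resulting rates are indicative rather than directly comparable.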

Sources

| Source | License | Content |
|---|---|---|
| WHO SPAR v2 | Open | 15 IHR capacity scores, 233 countries |
| WHO NHWA | Open | Health workforce density per 10K population |

Pending sources (loaders built, data not yet downloaded): GAVI vaccine supply, Global Fund disbursements, IHME health expenditure

Repository: samyama-ai/health-systems-kg
Rust loader: examples/health_systems_loader.rs in samyama-graph-enterprise
Snapshot: health-systems.sgsnap (0.5 MB)


AssetOps KG

Synthetic industrial operations graph from the AssetOpsBench benchmark. Models assets, sensors, maintenance schedules, and failure modes for industrial IoT.

| Label | Count | Examples |
|---|---|---|
| Asset | ~200 | Pumps, compressors, turbines |
| Sensor | ~150 | Temperature, vibration, pressure |
| WorkOrder | ~100 | Maintenance tasks |
| FailureMode | ~80 | Bearing failure, seal leak |
| Component | ~100 | Bearings, seals, impellers |
| Location | ~50 | Plants, areas, units |
| Operator | ~50 | Maintenance technicians |
| Schedule | ~50 | Maintenance windows |

Repository: samyama-ai/assetops-kg (private)


Quick Start: Loading Any Snapshot

All snapshots follow the same load pattern:

# 1. Start Samyama Graph (v0.7.0+)
./target/release/samyama

# 2. Create a tenant
curl -X POST http://localhost:8080/api/tenants \
  -H 'Content-Type: application/json' \
  -d '{"id":"TENANT_ID","name":"TENANT_NAME"}'

# 3. Import snapshot into the tenant
curl -X POST http://localhost:8080/api/tenants/TENANT_ID/snapshot/import \
  -F "file=@snapshot.sgsnap"

# 4. Query
curl -X POST http://localhost:8080/api/query \
  -H 'Content-Type: application/json' \
  -d '{"query":"MATCH (n) RETURN labels(n), count(n)","graph":"TENANT_ID"}'

# 5. Explore in Insight
cd samyama-insight && npm run dev
# → http://localhost:5173 (select tenant from dropdown)
# → http://localhost:5173/simulation/TENANT_ID

Note: Use /api/tenants/:id/snapshot/import (tenant-specific endpoint), NOT /api/snapshot/import. The generic endpoint always loads into the default tenant.
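
For scripted use, steps 2 to 4 can also be prepared from Python. The following is a minimal sketch using only the standard library. The endpoints and JSON bodies mirror the curl commands above, while the function names are hypothetical and the `send` helper assumes the server replies with JSON, which this page does not confirm. Step 3 is a multipart file upload (curl's `-F`), so only the tenant-specific URL is built here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # address used in the curl steps above


def _json_post(url, payload):
    """Prepare (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_tenant_request(tenant_id, tenant_name, base_url=BASE_URL):
    """Step 2: POST /api/tenants."""
    payload = {"id": tenant_id, "name": tenant_name}
    return _json_post(f"{base_url}/api/tenants", payload)


def import_url(tenant_id, base_url=BASE_URL):
    """Step 3: tenant-specific import endpoint (not /api/snapshot/import)."""
    return f"{base_url}/api/tenants/{tenant_id}/snapshot/import"


def query_request(cypher, tenant_id, base_url=BASE_URL):
    """Step 4: POST /api/query against the given tenant."""
    payload = {"query": cypher, "graph": tenant_id}
    return _json_post(f"{base_url}/api/query", payload)


def send(request):
    """Send a prepared request; assumes a JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(query_request("MATCH (n) RETURN labels(n), count(n)", "TENANT_ID"))` would issue the same query as step 4.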