Frequently Asked Questions
This FAQ covers common questions about Samyama’s architecture, usage, and capabilities. Use your browser’s search (Ctrl+F / Cmd+F) or the mdBook search bar to quickly find answers.
Getting Started
How do I install and run Samyama?
# Clone and build
git clone https://github.com/samyama-ai/samyama-graph.git
cd samyama-graph
cargo build --release
# Start the server (RESP on :6379, HTTP on :8080)
cargo run --release
# Run a demo
cargo run --example banking_demo
What protocols does Samyama support? Is it Postgres wire protocol?
No, Samyama does not use the Postgres wire protocol. It exposes two protocols:
- RESP (Redis Protocol) on port 6379 — use any Redis client (redis-cli, Jedis, ioredis, etc.)
- HTTP API on port 8080 — RESTful endpoints for queries and status
We chose RESP over Postgres wire protocol because: (1) RESP is simpler and faster (binary protocol, minimal framing overhead), (2) it enables drop-in compatibility with the RedisGraph ecosystem (which was sunset by Redis Ltd), and (3) graph queries are fundamentally different from SQL — we didn’t want to shoehorn Cypher into a SQL-shaped protocol.
Example using redis-cli:
redis-cli GRAPH.QUERY default "CREATE (n:Person {name: 'Alice', age: 30})"
redis-cli GRAPH.QUERY default "MATCH (n:Person) RETURN n.name, n.age"
Example using HTTP:
curl -s -X POST http://localhost:8080/api/query \
-d '{"query": "MATCH (n) RETURN count(n)", "graph": "default"}'
curl -s http://localhost:8080/api/status | python3 -m json.tool
See the SDKs, CLI & API chapter.
What query language does Samyama use?
Samyama supports OpenCypher with ~90% coverage. Supported clauses: MATCH, OPTIONAL MATCH, CREATE, DELETE, SET, REMOVE, MERGE, WITH, UNWIND, UNION, RETURN DISTINCT, ORDER BY, SKIP, LIMIT, EXPLAIN, EXISTS subqueries.
Example — create a small social graph and query it:
CREATE (a:Person {name: 'Alice', age: 30})-[:KNOWS]->(b:Person {name: 'Bob', age: 25})
CREATE (b)-[:KNOWS]->(c:Person {name: 'Charlie', age: 35})
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.age > 28
RETURN p.name, friend.name
See the Query Engine chapter.
What are the minimum system requirements?
Samyama runs on any system with a Rust 1.83+ toolchain:
- CPU: Any x86_64 or ARM64 (M-series Macs fully supported)
- RAM: 512MB minimum; 4GB+ recommended for production
- Disk: Depends on data size; RocksDB with LZ4 compression is space-efficient
- GPU (Enterprise only): Any Metal, Vulkan, or DX12-compatible GPU
What is the difference between Community and Enterprise?
| Feature | Community (OSS) | Enterprise |
|---|---|---|
| License | Apache 2.0 | Commercial (JET token) |
| Core Engine | ✅ Full | ✅ Full |
| Multi-Tenancy | Single namespace (default) | Tenant CRUD API, quotas, isolation |
| Monitoring | Logging only | Prometheus, health checks, audit trail |
| Backup | WAL only | Full/incremental backup, PITR |
| HA | Basic Raft | HTTP/2 transport, snapshot streaming |
| GPU | ❌ | ✅ (wgpu: Metal, Vulkan, DX12) |
See the Enterprise Edition chapter for full details.
Query Engine
What Cypher features are NOT yet supported?
Remaining gaps: list slicing ([1..3]) and pattern comprehensions. The Future Roadmap tracks planned additions.
Added in v0.6.0: Named paths (p = (a)-[]->(b)), CASE expressions, collect(DISTINCT x), datetime({year: 2026, month: 3}) constructor, parameterized queries ($param), and PROFILE.
-- Named paths (v0.6.0):
MATCH p = (a:Person)-[:KNOWS]->(b:Person) RETURN p, length(p)
-- CASE expressions (v0.6.0):
MATCH (n:Person) RETURN n.name, CASE WHEN n.age > 30 THEN 'senior' ELSE 'junior' END AS category
-- collect(DISTINCT x) (v0.6.0):
MATCH (n:Person)-[:LIVES_IN]->(c:City) RETURN collect(DISTINCT c.name) AS cities
-- Parameterized queries (v0.6.0):
MATCH (n:Person {age: $age}) RETURN n
How do I check if my query is using an index?
Use EXPLAIN before your query:
EXPLAIN MATCH (n:Person {name: 'Alice'}) RETURN n
If you see IndexScanOperator in the output, the index is being used. If you see NodeScanOperator, the query is doing a full label scan — consider creating an index:
-- Before: full scan (slow on large graphs)
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Output: NodeScanOperator(Person) → FilterOperator(n.name = 'Alice')
-- Create the index:
CREATE INDEX ON :Person(name)
-- After: index scan (fast O(log n))
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Output: IndexScanOperator(Person.name = 'Alice')
See the Query Optimization chapter.
Can I use EXPLAIN to see estimated costs?
Yes. EXPLAIN returns the operator tree with estimated row counts and graph statistics (label counts, edge type counts, property selectivity):
EXPLAIN MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 25
RETURN a.name, b.name
Output includes:
ProjectOperator [a.name, b.name]
└── FilterOperator [a.age > 25]
└── ExpandOperator [KNOWS]
└── NodeScanOperator [Person]
--- Statistics ---
Person: 10,000 nodes
KNOWS: 45,000 edges
avg_out_degree: 4.5
PROFILE (with actual execution timing and row counts per operator) is supported since v0.6.0:
PROFILE MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 25
RETURN a.name, b.name
How many physical operators does the engine have?
33 operators covering scan, traversal, filter, join, aggregation, sort, write, index, constraint, and specialized operations. See the operator table.
Does Samyama support transactions?
Samyama provides per-query atomicity via RocksDB WriteBatch + WAL. Each write query (CREATE, DELETE, SET, MERGE) executes as an atomic unit — either all changes commit or none do.
-- This entire query is atomic — both nodes and the edge are created together:
CREATE (a:Account {id: 'A1', balance: 1000})-[:TRANSFER {amount: 500}]->(b:Account {id: 'A2', balance: 2000})
Interactive BEGIN...COMMIT transactions (spanning multiple queries) are on the roadmap. See the ACID Guarantees section.
Indexes & Data Access
What types of indexes does Samyama support?
Samyama provides four index types:
| Index Type | Data Structure | Purpose | Created By |
|---|---|---|---|
| Property Index | BTreeMap<PropertyValue, HashSet<NodeId>> | Fast property lookups and range scans | CREATE INDEX |
| Label Index | HashMap<Label, HashSet<NodeId>> | Fast label-based node retrieval | Automatic (built-in) |
| Edge Type Index | HashMap<EdgeType, HashSet<EdgeId>> | Fast edge type lookups | Automatic (built-in) |
| Vector Index | HNSW (Hierarchical Navigable Small World) | Approximate nearest neighbor search | CREATE VECTOR INDEX |
How do property indexes work?
Property indexes use a B-tree (BTreeMap) that maps property values to sets of node IDs. This gives O(log n) lookups for both exact matches and range queries.
Creating a property index:
CREATE INDEX ON :Person(name)
CREATE INDEX ON :Person(age)
CREATE INDEX ON :Transaction(amount)
How it’s used — the planner automatically selects an index scan when a WHERE predicate matches an indexed property:
-- Exact match → index lookup, returns matching NodeIds directly
MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Range query → B-tree range scan
MATCH (n:Person) WHERE n.age > 25 RETURN n.name, n.age
-- Supported comparison operators: =, >, >=, <, <=
MATCH (t:Transaction) WHERE t.amount >= 10000 RETURN t
Performance characteristics:
| Operation | Complexity |
|---|---|
| Exact match (=) | O(log n) |
| Range query (>, >=, <, <=) | O(log n + k) where k = results |
| Insert (on node create/update) | O(log n) |
| Remove (on node delete/update) | O(log n) |
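To make the value-to-node-ids mapping concrete, here is a minimal Python sketch. Python has no BTreeMap, so a sorted key list plus `bisect` stands in for the B-tree (the `PropertyIndex` name and methods are illustrative, not Samyama's API; insert is O(n) here, unlike the real structure):

```python
import bisect
from collections import defaultdict

class PropertyIndex:
    """Toy value -> {node_id} index with O(log n) lookup by value."""

    def __init__(self):
        self.keys = []                    # sorted property values
        self.postings = defaultdict(set)  # value -> set of node ids

    def insert(self, value, node_id):
        if value not in self.postings:
            bisect.insort(self.keys, value)
        self.postings[value].add(node_id)

    def exact(self, value):
        # exact-match lookup, e.g. WHERE n.age = 30
        return self.postings.get(value, set())

    def range_gt(self, low):
        # range scan analogue, e.g. WHERE n.age > 28
        start = bisect.bisect_right(self.keys, low)
        out = set()
        for k in self.keys[start:]:
            out |= self.postings[k]
        return out

idx = PropertyIndex()
for nid, age in [(1, 30), (2, 25), (3, 35), (4, 30)]:
    idx.insert(age, nid)

print(idx.exact(30))     # {1, 4}
print(idx.range_gt(28))  # {1, 3, 4}
```

The real index additionally removes entries on node delete/update, which is the same O(log n) path in reverse.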
Composite indexes (v0.6.0): Multi-property indexes are supported — CREATE INDEX ON :Person(firstName, lastName) creates a composite index used when both properties appear in a WHERE clause.
How do the built-in label and edge type indexes work?
These are automatic indexes maintained internally — you don’t create or manage them.
Label index — maps each label to all nodes with that label:
-- Uses label_index internally to find all Person nodes in O(1)
MATCH (n:Person) RETURN n
-- Statistics show label cardinality:
EXPLAIN MATCH (n:Person) RETURN n
-- Output: NodeScanOperator [Person] (est. 10,000 rows)
Edge type index — maps each edge type to all edges of that type:
-- Uses edge_type_index to find all KNOWS edges
MATCH ()-[r:KNOWS]->() RETURN count(r)
Both indexes use HashMap<Key, HashSet<Id>> for O(1) lookup by label/type and O(m) iteration over all matching entities.
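The shape of these built-in indexes can be sketched in a few lines of Python, with a dict of sets standing in for `HashMap<Key, HashSet<Id>>` (illustrative only):

```python
from collections import defaultdict

# label -> set of node ids; O(1) lookup by label,
# O(m) iteration over the m matching nodes
label_index = defaultdict(set)

def add_node(node_id, labels):
    for label in labels:
        label_index[label].add(node_id)

add_node(1, ["Person"])
add_node(2, ["Person", "Employee"])
add_node(3, ["Company"])

# MATCH (n:Person) starts from this set instead of scanning all nodes:
print(sorted(label_index["Person"]))  # [1, 2]
```

The edge type index follows the same pattern, keyed by edge type instead of label.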
How do vector indexes work?
Vector indexes use HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search, powered by the hnsw_rs crate.
Creating a vector index:
CREATE VECTOR INDEX embedding_idx
FOR (d:Document) ON (d.embedding)
OPTIONS {dimensions: 768, similarity: 'cosine'}
Supported distance metrics:
| Metric | Best For | Formula |
|---|---|---|
| cosine | Text embeddings, normalized vectors | 1.0 - cos(a, b) |
| l2 | Spatial data, raw feature vectors | sqrt(sum((a_i - b_i)^2)) |
| dot_product | Pre-normalized embeddings | 1.0 - dot(a, b) |
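For reference, the three formulas in the table can be written directly in Python (plain per-pair computations for illustration; the index itself evaluates these inside the HNSW search, not one pair at a time):

```python
import math

def cosine_distance(a, b):
    # 1.0 - cos(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a, b):
    # sqrt(sum((a_i - b_i)^2))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product_distance(a, b):
    # 1.0 - dot(a, b); assumes pre-normalized inputs, per the table
    return 1.0 - sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
print(l2_distance(a, b))      # ~1.414
```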
Querying:
-- Find the 5 documents most similar to a query vector
CALL db.index.vector.queryNodes('Document', 'embedding', [0.12, -0.34, ...], 5)
YIELD node, score
RETURN node.title, score
HNSW parameters (compile-time defaults):
- max_elements: 100,000
- M: 16 (connections per layer)
- ef_construction: 200
- ef_search: 2 × k (set at query time)
Via the Rust SDK:
client.create_vector_index("Document", "embedding", 768, DistanceMetric::Cosine).await?;
client.add_vector("Document", "embedding", node_id, &embedding_vec).await?;
let results = client.vector_search("Document", "embedding", &query_vec, 5).await?;
Are composite (multi-property) indexes supported?
Yes, since v0.6.0. Composite indexes cover multiple properties on the same label:
CREATE INDEX ON :Person(firstName, lastName)
-- The planner uses the composite index when both properties appear in WHERE:
MATCH (n:Person) WHERE n.firstName = 'Alice' AND n.lastName = 'Smith' RETURN n
-- Plan: IndexScanOperator(Person.firstName='Alice', Person.lastName='Smith')
Single-property indexes are also supported. When a WHERE clause has multiple indexed predicates with AND, the planner uses AND-chain index selection (v0.6.0) to pick the most selective index.
Are unique constraints supported?
Yes, since v0.6.0. You can enforce property uniqueness within a label:
CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE
Attempting to create a node with a duplicate value on a unique-constrained property will return an error. Use SHOW CONSTRAINTS to list active constraints.
Is DROP INDEX supported?
Yes, since v0.6.0. You can drop indexes via Cypher:
DROP INDEX ON :Person(name)
Can I list all indexes?
Yes, since v0.6.0. Use SHOW INDEXES and SHOW CONSTRAINTS:
SHOW INDEXES
-- Returns: label, property, index type for all active indexes
SHOW CONSTRAINTS
-- Returns: label, property, constraint type for all active constraints
Query Planner & Optimizer
What cost model does the query planner use?
Since v0.6.1, Samyama has a graph-native cost-based planner (ADR-015) with a multiplicative cardinality model. The planner uses two tiers of statistics:
- GraphCatalog (incremental) — triple-level stats per (source_label, edge_type, target_label) pattern, maintained on every edge create/delete
- GraphStatistics (batch) — label counts, edge type counts, per-property selectivity (sampled from the first 1,000 nodes per label)
The cost model in cost_model.rs assigns estimated cardinalities to each operator:
| Operator | Cost Formula |
|---|---|
| LabelScan | label_count (from catalog) |
| IndexLookup | Fixed 10.0 (highly selective) |
| Expand (Forward) | input_cost × avg_out_degree |
| Expand (Reverse) | input_cost × avg_in_degree |
| ExpandInto | input_cost × edge_existence_probability |
| Filter | input_cost × 0.5 (default selectivity) |
| Join | left_cost + right_cost |
| CartesianProduct | left_cost × right_cost |
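The table above can be rendered as a tiny Python sketch of the multiplicative model; the degree, count, and selectivity values are made up for illustration and do not come from a real catalog:

```python
# Illustrative statistics (not real catalog values)
LABEL_COUNT = {"Person": 10_000}
AVG_OUT_DEGREE = {"KNOWS": 4.5}
FILTER_SELECTIVITY = 0.5  # the hardcoded default mentioned below

def label_scan(label):
    return float(LABEL_COUNT[label])

def expand(input_cost, edge_type):
    return input_cost * AVG_OUT_DEGREE[edge_type]

def filter_(input_cost):
    return input_cost * FILTER_SELECTIVITY

# Cost of: MATCH (a:Person)-[:KNOWS]->(b) WHERE a.age > 25
# with the filter pushed below the expand:
cost = filter_(label_scan("Person"))  # 10,000 x 0.5 = 5,000
cost = expand(cost, "KNOWS")          # 5,000 x 4.5 = 22,500
print(cost)  # 22500.0
```

Each operator's cost is its input cardinality times a per-operator factor, which is why pushing the filter below the expand shrinks every downstream estimate.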
The planner pipeline:
- Plan enumeration: for each node in the MATCH pattern, build a candidate plan via BFS (plan_enumerator.rs), choosing the optimal traversal direction at each step
- Logical optimization: apply predicate pushdown and ExpandInto insertion (logical_optimizer.rs)
- Cost estimation: score each candidate using the multiplicative cost model (cost_model.rs)
- Plan selection: sort candidates by cost and pick the cheapest (up to 64 candidates evaluated)
- Physical translation: convert the logical plan to executable operators (physical_planner.rs)
- Index selection: if a property index exists for a WHERE predicate, use IndexScanOperator; for AND-chains, select the most selective index
- Plan caching: plans are cached with generation-based invalidation tied to catalog changes
Example — the planner selects different starting points based on catalog stats:
-- 1,000 Person nodes, 10 Company nodes, each Person works at 1 Company
EXPLAIN MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p.name, c.name
-- Candidate 1 (start from Person): LabelScan(1000) × Expand(1.0) = 1000
-- Candidate 2 (start from Company): LabelScan(10) × ReverseExpand(100.0) = 1000
-- Planner evaluates BOTH, picks cheapest
Example — index selection:
-- Without index: full label scan
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Plan: NodeScanOperator(Person) → FilterOperator(name = 'Alice') → ProjectOperator
-- With index: index scan
CREATE INDEX ON :Person(name)
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Plan: IndexScanOperator(Person.name = 'Alice') → ProjectOperator
See the Query Optimization chapter.
How are individual operator costs estimated?
Since v0.6.1, the graph-native planner assigns a multiplicative cardinality estimate to every operator in a candidate plan via cost_model::estimate_plan_cost(). The cost model is recursive — each operator’s cost depends on its input’s estimated cardinality:
Example: MATCH (p:Person)-[:KNOWS]->(q:Person) WHERE q.age > 30 RETURN q
Plan (start from p):
LabelScan(Person) cost = 1,000 (label count)
→ Expand(:KNOWS, Forward) cost = 1,000 × 5.0 = 5,000 (avg_out_degree)
→ Filter(age > 30) cost = 5,000 × 0.5 = 2,500 (default selectivity)
Total plan cost: 2,500
Plan (start from q):
LabelScan(Person) cost = 1,000
→ Filter(age > 30) cost = 1,000 × 0.5 = 500 (filter pushed down!)
→ Expand(:KNOWS, Reverse) cost = 500 × 5.0 = 2,500
Total plan cost: 2,500
The planner compares all candidate plans and selects the lowest-cost one. EXPLAIN shows the chosen plan with operator descriptions; PLAN_DIAGNOSTICS (accessible in EXPLAIN output) shows how many candidates were evaluated and their costs.
Current limitations:
- Filter selectivity is hardcoded at 0.5 (no property-level histograms yet)
- Sort/Aggregate operators are always appended after the chosen scan+expand plan
- Property-level estimate_equality_selectivity exists in GraphStatistics but is not yet wired into the graph-native cost model
What cardinality estimation techniques are used?
Two tiers of estimation methods:
GraphCatalog (triple-level, used by graph-native planner):
| Method | What It Returns | Complexity |
|---|---|---|
| estimate_label_scan(label) | Exact node count for a label | O(1) |
| estimate_expand_out(src_label, edge_type) | Average outgoing degree (sum across target labels) | O(k) |
| estimate_expand_in(tgt_label, edge_type) | Average incoming degree (sum across source labels) | O(k) |
| estimate_edge_existence(src, et, tgt) | Probability a random (src, tgt) pair has an edge | O(1) |
GraphStatistics (batch, used for EXPLAIN display and legacy planner):
| Method | What It Returns | Complexity |
|---|---|---|
| estimate_label_scan(label) | Exact node count for a label | O(1) |
| estimate_expand(edge_type) | Total edge count for a type | O(1) |
| estimate_equality_selectivity(label, prop) | 1.0 / distinct_count for the property | O(1) |
Example — GraphCatalog triple-level estimation:
Graph: 1,000 Persons, 10 Companies, 1,000 WORKS_AT edges (each person → 1 company)
Catalog TripleStats for (:Person, :WORKS_AT, :Company):
count = 1,000, avg_out_degree = 1.0, avg_in_degree = 100.0
Plan A (start Person): 1,000 × 1.0 = 1,000 cost
Plan B (start Company): 10 × 100.0 = 1,000 cost (same total, different shape)
Example — property selectivity:
10,000 Person nodes, 'name' has 8,000 distinct values:
estimate_equality_selectivity("Person", "name") → 1/8,000 = 0.000125
Estimated rows for WHERE name = 'Alice' → 10,000 × 0.000125 ≈ 1.25
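The property-selectivity arithmetic above is simple enough to sketch directly (the function name mirrors the one in the table; counts are the illustrative figures from the example):

```python
def estimate_equality_selectivity(distinct_count):
    # uniform-distribution assumption: each value is equally likely
    return 1.0 / distinct_count

label_count = 10_000           # Person nodes
sel = estimate_equality_selectivity(8_000)  # 8,000 distinct names
estimated_rows = label_count * sel

print(sel)                          # 0.000125
print(round(estimated_rows, 6))     # 1.25
```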
How are statistics collected and maintained?
Samyama maintains statistics at two levels:
GraphCatalog (incremental, always up-to-date):
The GraphCatalog tracks per-triple-pattern statistics (source_label, edge_type, target_label) and is updated incrementally on every graph mutation:
- on_label_added(label) / on_label_removed(label) — updates label counts
- on_edge_created(src, src_labels, et, tgt, tgt_labels) — updates triple stats for all label combinations
- on_edge_deleted(...) — mirrors edge creation
For each triple pattern, the catalog tracks:
- count — total edges matching this pattern
- avg_out_degree — count / distinct_sources
- avg_in_degree — count / distinct_targets
- distinct_sources / distinct_targets — unique endpoints
- max_out_degree — peak degree for worst-case estimation
A generation counter increments on every change, enabling plan cache invalidation.
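The incremental bookkeeping can be sketched as follows; the class and hook names echo the ones listed above, but the Python rendering is illustrative, not the Rust implementation:

```python
from collections import defaultdict

class TripleStats:
    """Toy per-(src_label, edge_type, tgt_label) statistics."""
    def __init__(self):
        self.count = 0
        self.sources = set()   # distinct_sources
        self.targets = set()   # distinct_targets

    @property
    def avg_out_degree(self):
        return self.count / len(self.sources) if self.sources else 0.0

    @property
    def avg_in_degree(self):
        return self.count / len(self.targets) if self.targets else 0.0

catalog = defaultdict(TripleStats)

def on_edge_created(src, src_label, edge_type, tgt, tgt_label):
    s = catalog[(src_label, edge_type, tgt_label)]
    s.count += 1
    s.sources.add(src)
    s.targets.add(tgt)

# 4 Persons each working at the same single Company:
for person in range(4):
    on_edge_created(person, "Person", "WORKS_AT", 100, "Company")

s = catalog[("Person", "WORKS_AT", "Company")]
print(s.count, s.avg_out_degree, s.avg_in_degree)  # 4 1.0 4.0
```

Because every mutation updates these counters in place, the planner always sees current degree estimates without a batch recompute.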
GraphStatistics (batch, computed on demand):
Computed via GraphStore::compute_statistics():
- Iterates all labels in label_index and counts nodes per label
- Iterates all edge types in edge_type_index and counts edges per type
- Samples the first 1,000 nodes per label to compute per-property stats:
  - null_fraction — fraction of sampled nodes missing the property
  - distinct_count — number of distinct values observed
  - selectivity — 1.0 / distinct_count (uniform distribution assumption)
- Computes avg_out_degree across all nodes
GraphStatistics are recomputed on each EXPLAIN call. Adding histogram-based distributions and wiring property selectivity into the graph-native cost model is on the roadmap.
How does the planner handle cardinality estimation errors?
Since v0.6.0, statistics drive cost-based plan selection (join order, index choice). This means cardinality estimation errors can now cause suboptimal plans — for example, choosing a less selective index or the wrong join order.
-- If the planner estimates 100 rows but there are actually 1,000,000:
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.city = 'Mumbai'
RETURN a.name, b.name
-- The CBO might build the hash table on the wrong side
-- or choose an index that isn't actually the most selective
Mitigations: use EXPLAIN to verify estimates, and ensure statistics are fresh (they are recomputed on each EXPLAIN call). In mature optimizers, cardinality estimation errors can cause severe performance problems. Tools like Picasso visualize these errors as cardinality diagrams, mapping estimation accuracy across the selectivity space to expose where the optimizer’s statistics are most inaccurate.
What about multi-column correlations and compound predicates?
Not yet handled. The current selectivity model assumes independence between properties — selectivity(A AND B) = selectivity(A) × selectivity(B). This is the standard simplifying assumption but can be wildly wrong when properties are correlated.
Example:
MATCH (n:Person) WHERE n.city = 'Mumbai' AND n.country = 'India' RETURN n
-- Independence assumption: selectivity = (1/500 cities) × (1/200 countries) = 1/100,000
-- Reality: everyone in Mumbai is in India, so selectivity = 1/500
-- The estimate is off by 200x!
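The arithmetic behind that 200x figure, using the illustrative distinct counts from the example:

```python
sel_city = 1 / 500      # 500 distinct cities
sel_country = 1 / 200   # 200 distinct countries

# Independence assumption: selectivity(A AND B) = sel(A) * sel(B)
independent_estimate = sel_city * sel_country  # 1/100,000

# Reality: Mumbai implies India, so the country predicate adds nothing
actual = sel_city                              # 1/500

error_factor = actual / independent_estimate
print(round(error_factor))  # 200
```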
Future work includes:
- Multi-column statistics (joint distinct counts or dependency graphs)
- Histogram-based estimation (equi-width or equi-depth histograms per property)
- Sketch-based estimation (HyperLogLog for distinct counts, Count-Min Sketch for frequency estimation)
Does Samyama support parameterized or templatized queries?
Yes, since v0.6.0. Use $param syntax with parameter bindings:
-- Parameterized query:
MATCH (n:Person {age: $age}) RETURN n
-- Pass parameters via the SDK or RESP protocol
-- Literal values also work:
MATCH (n:Person {age: 30}) RETURN n
Parameterized queries enable plan cache reuse across different parameter values, reducing parsing and planning overhead. Prepared statements (PREPARE/EXECUTE) are on the roadmap.
How do parameterized queries affect plan stability?
In optimizers that support parameterized queries, a key concern is plan stability — whether the same query template produces different plans for different parameter values. This is the phenomenon visualized by tools like Picasso as plan diagrams: color-coded maps showing how the optimal plan changes as selectivity varies.
Example of plan instability in a hypothetical future CBO:
-- Template: MATCH (n:Person) WHERE n.age > $threshold RETURN n
-- With $threshold = 99 (selectivity 1%): IndexScan is optimal
-- With $threshold = 10 (selectivity 90%): LabelScan is optimal
-- The optimizer must pick the right plan for each value
Since v0.6.0, parameterized queries are supported and plans are cached. The plan cache uses query string hashing to avoid re-parsing and re-planning for repeated queries. This means the “plan sniffing” concern is relevant — a cached plan may not be optimal for all parameter values. Currently Samyama uses a simple cache with statistics-based invalidation. Adaptive re-planning (when estimated vs. actual cardinalities diverge) is on the roadmap.
What join algorithms does Samyama use?
Three join strategies are available:
| Operator | Algorithm | When Used |
|---|---|---|
| JoinOperator | Hash Join | MATCH clauses share a variable |
| LeftOuterJoinOperator | Left Outer Hash Join | OPTIONAL MATCH |
| CartesianProductOperator | Cross Product | No shared variables |
Example — hash join on a shared variable b:
-- Two patterns sharing variable 'b' → HashJoin
MATCH (a:Person)-[:WORKS_AT]->(b:Company)
MATCH (b)<-[:INVESTED_IN]-(c:Fund)
RETURN a.name, b.name, c.name
-- Plan: HashJoin on 'b'
-- Left: NodeScan(Person) → Expand(WORKS_AT)
-- Right: NodeScan(Fund) → Expand(INVESTED_IN)
Example — cross product with no shared variable:
-- No shared variable → CartesianProduct (expensive!)
MATCH (a:Person), (b:Product)
RETURN a.name, b.name
-- Plan: CartesianProduct (|Person| × |Product| rows)
Example — left outer join for optional patterns:
-- OPTIONAL MATCH → LeftOuterHashJoin (NULLs for non-matches)
MATCH (p:Person)
OPTIONAL MATCH (p)-[:HAS_ADDRESS]->(a:Address)
RETURN p.name, a.city
-- Persons without addresses appear with a.city = NULL
The hash join materializes the left side into a HashMap<Value, Vec<Record>> and probes it for each right-side record.
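That build-then-probe shape can be sketched in Python, with a dict of lists standing in for `HashMap<Value, Vec<Record>>` (records are plain dicts here; this is an illustration of the algorithm, not Samyama's operator):

```python
from collections import defaultdict

def hash_join(left, right, key):
    table = defaultdict(list)       # build side: value -> matching records
    for row in left:
        table[row[key]].append(row)
    out = []
    for row in right:               # probe side: one lookup per record
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

# Shared variable 'b', as in the WORKS_AT / INVESTED_IN example above:
works_at = [{"b": "Acme", "a": "Alice"}, {"b": "Globex", "a": "Bob"}]
invested = [{"b": "Acme", "c": "FundX"}]

print(hash_join(works_at, invested, "b"))
# [{'b': 'Acme', 'a': 'Alice', 'c': 'FundX'}]
```

Building on the smaller side keeps the hash table small, which is exactly the join-reordering decision described in the next answer.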
How is join order determined?
Since v0.6.0, the planner performs join reordering based on cardinality estimates — it places the smaller (more selective) side as the build side of the hash join, regardless of the order in the query text.
-- Both versions now produce the same optimal plan:
MATCH (a:Person), (b:Company) WHERE a.worksAt = b.name RETURN a, b
MATCH (b:Company), (a:Person) WHERE a.worksAt = b.name RETURN a, b
-- Planner puts Company (1K nodes) as build side, Person (1M) as probe side
Not yet implemented: Bushy join trees (the planner always produces left-deep trees) or adaptive joins that switch strategy mid-execution.
Are there additional join strategies on the roadmap?
Yes. Future join strategies under consideration:
| Algorithm | Best For | Complexity |
|---|---|---|
| Nested-Loop Join | Small right side, or when index exists on join key | O(n × m) worst case |
| Merge Join | Both sides already sorted on join key | O(n + m) |
| Index Nested-Loop Join | Right side has index on join key | O(n × log m) |
| Adaptive Join | Switches strategy based on runtime cardinalities | Variable |
What scan and traversal operators are available?
The key scan and traversal operators (see the operator-count question above for the full set):
| Operator | Access Method | When Chosen |
|---|---|---|
| NodeScanOperator | Full label scan via label_index | Default — no index matches the WHERE predicate |
| IndexScanOperator | B-tree range scan on property index | Index exists on (label, property) and WHERE has a matching =, >, >=, <, or <= predicate |
| VectorSearchOperator | HNSW approximate nearest neighbor | CALL db.index.vector.queryNodes(...) |
| ExpandOperator | Adjacency list traversal (outgoing or incoming) | Graph-native planner chooses direction based on catalog stats |
| ExpandIntoOperator | Binary search edge existence check O(log d) | Both endpoints already bound (triangle/clique patterns) |
| NodeByIdOperator | Direct node lookup from pre-computed set | Internal use (subquery results) |
| ShortestPathOperator | BFS shortest path with predicates | shortestPath() function in MATCH |
Example showing the scan selection logic:
-- No index on :Person(age) → NodeScanOperator + FilterOperator
MATCH (n:Person) WHERE n.age > 30 RETURN n
-- Plan: NodeScan(Person) → Filter(age > 30) → Project
-- Scans ALL Person nodes, filters in memory
-- After: CREATE INDEX ON :Person(age)
MATCH (n:Person) WHERE n.age > 30 RETURN n
-- Plan: IndexScan(Person.age > 30) → Project
-- Scans ONLY nodes with age > 30 via B-tree range query
Can multiple indexes be used for a single query (index intersection)?
Since v0.6.0, the planner uses AND-chain index selection to pick the most selective index when a WHERE clause has multiple indexed predicates:
CREATE INDEX ON :Person(age)
CREATE INDEX ON :Person(city)
MATCH (n:Person) WHERE n.age > 30 AND n.city = 'Mumbai' RETURN n
-- Planner picks the more selective index (e.g., city = 'Mumbai' if fewer matches)
-- and applies the other predicate as a post-scan filter
Full index intersection (scanning both indexes independently and intersecting the result sets) is on the roadmap for further optimization.
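AND-chain selection reduces to picking the predicate with the smallest estimated result set and demoting the rest to post-scan filters. A minimal sketch, with made-up row estimates:

```python
# Illustrative estimated match counts per indexed predicate
estimated_rows = {
    ("Person", "age>30"): 6_000,
    ("Person", "city='Mumbai'"): 150,
}

def choose_index(predicates):
    # most selective = fewest estimated matching rows
    return min(predicates, key=lambda p: estimated_rows[p])

preds = [("Person", "age>30"), ("Person", "city='Mumbai'")]
scan = choose_index(preds)
post_filters = [p for p in preds if p != scan]

print(scan)          # ('Person', "city='Mumbai'") drives the index scan
print(post_filters)  # [('Person', 'age>30')] applied as in-memory filter
```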
Are there other scan limitations I should know about?
Yes:
- Only the start node of each MATCH path is considered for index scans — intermediate or end nodes always use label scan + filter:
-- Index on :Person(name) is used for 'a' (start node):
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'}) RETURN b
-- Plan: IndexScan(a) → Expand(KNOWS) → Filter(b.name = 'Bob')
-- Note: b.name = 'Bob' is filtered in memory, not via index
- OR predicates do not trigger index union scans:
MATCH (n:Person) WHERE n.age = 30 OR n.age = 40 RETURN n
-- Falls back to full label scan + filter (even if age is indexed)
- String predicates (CONTAINS, STARTS WITH, ENDS WITH) do not use indexes
To verify which scan your query uses, always prefix with EXPLAIN.
How does the query planner choose between possible plans?
The graph-native planner follows this pipeline:
- Parse the Cypher AST (cached for repeated queries)
- Extract a PatternGraph from the MATCH clause — nodes, edges, labels, directions
- Enumerate candidate plans: for each pattern node as a starting point, BFS through the pattern graph building a logical plan tree; at each edge, choose_direction() compares estimate_expand_out vs estimate_expand_in to pick the cheaper traversal direction
- Optimize each candidate: predicate pushdown (move Filter below Expand when safe) and ExpandInto insertion (when both endpoints are already bound)
- Score each candidate via estimate_plan_cost() using GraphCatalog triple-level stats
- Select the cheapest plan
- Translate to physical operators via logical_to_physical() (direction reversal: Logical Reverse → Physical Incoming)
- Cache the plan with generation-based invalidation
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name
ORDER BY b.name
LIMIT 10
-- Plan: IndexScan(Person.name='Alice') → Expand(KNOWS) → Project(b.name) → Sort(b.name) → Limit(10)
EXPLAIN shows diagnostics including candidates evaluated and chosen plan cost. The planner reorders joins automatically — query text order does not affect plan quality.
What is the graph-native planner and how does it differ from the legacy planner?
Since v0.6.1, Samyama has a graph-native cost-based optimizer (ADR-015) that follows the same fundamental approach as mature systems like PostgreSQL:
- Enumerate candidate plans — one per starting node in the MATCH pattern, with BFS traversal through the pattern graph
- Estimate the cost of each plan using the multiplicative cardinality model and GraphCatalog triple-level statistics
- Optimize each candidate with predicate pushdown and ExpandInto insertion
- Compare all candidates and select the lowest-cost plan (up to 64 evaluated)
Key differences from the legacy planner:
| Aspect | Legacy Planner | Graph-Native Planner |
|---|---|---|
| Starting point | Always leftmost node in AST | Evaluates ALL pattern nodes |
| Direction | Always follows AST direction | Chooses cheapest direction per edge |
| ExpandInto | Not available | O(log d) edge existence check |
| Cost model | Heuristic (no numeric costs) | Multiplicative cardinality estimation |
| Plan candidates | 1 (single greedy plan) | Up to 64 per query |
| Statistics | Batch (GraphStatistics) | Incremental (GraphCatalog) |
| Predicate pushdown | Basic | Cost-aware, below Expand nodes |
Example — the graph-native planner considers multiple plans for a 3-way join:
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.age > 25 AND c.size > 1000
RETURN a.name, c.name
-- Plan A (start a): LabelScan(Person) → Filter(age>25) → Expand(KNOWS) → Expand(WORKS_AT) → Filter(size>1000)
-- Plan B (start c): LabelScan(Company) → Filter(size>1000) → ReverseExpand(WORKS_AT) → ReverseExpand(KNOWS) → Filter(age>25)
-- Plan C (start b): LabelScan(Person) → Expand(KNOWS, Reverse) → Expand(WORKS_AT) → Filter(age>25, size>1000)
-- Planner estimates cost of each via catalog stats, picks cheapest
The ExpandInto operator is a key graph-native optimization. When both endpoints of an edge are already bound, instead of scanning all neighbors (O(degree)), it checks edge existence via binary search on sorted adjacency lists (O(log degree)):
-- Triangle pattern: a→b, b→c, a→c
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person), (a)-[:KNOWS]->(c)
-- Plan: LabelScan(a) → Expand(a→b) → Expand(b→c) → ExpandInto(a→c)
-- ExpandInto checks if edge exists between already-bound a and c
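The O(log d) existence check amounts to a binary search over a sorted adjacency list, which can be sketched with `bisect` (node names and adjacency data are illustrative):

```python
import bisect

# node -> sorted list of KNOWS neighbors (toy data)
adjacency = {
    "a": ["b", "c", "e"],
    "b": ["c"],
}

def edge_exists(src, tgt):
    """ExpandInto-style check: O(log d) binary search, not an O(d) scan."""
    neighbors = adjacency.get(src, [])
    i = bisect.bisect_left(neighbors, tgt)
    return i < len(neighbors) and neighbors[i] == tgt

print(edge_exists("a", "c"))  # True: closes the triangle a->b->c, a->c
print(edge_exists("b", "e"))  # False
```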
Picasso visualization (available in samyama-insight) helps analyze CBO behavior by generating plan diagrams — color-coded maps showing which plan the optimizer selects at each point in the selectivity/parameter space. These visualizations reveal:
- Plan switches: Where the optimizer changes its preferred plan
- Cost cliffs: Sudden spikes in estimated cost at plan boundaries
- Nervous regions: Areas where small selectivity changes cause frequent plan switches
- Robust plans: Plans that perform well across a wide range of selectivities
The graph-native planner is enabled via PlannerConfig { graph_native: true } and falls back gracefully to the legacy planner for unsupported patterns (e.g., variable-length paths).
What are “plan cliffs” and does Samyama have them?
A plan cliff occurs when a small change in data distribution causes the optimizer to switch to a dramatically different (and often worse) plan.
Example in a hypothetical CBO:
Selectivity of WHERE age > $threshold:
threshold=95 → IndexScan (fast, 5% of data) → 2ms
threshold=94 → IndexScan (fast, 6% of data) → 2.4ms
threshold=93 → LabelScan! (slow, full table) → 200ms ← CLIFF!
The optimizer switches from index scan to full scan at a threshold, causing a 100x latency spike. Picasso visualizes these as sudden color changes in plan diagrams or sharp spikes in 3D cost surface plots.
Since v0.6.1, Samyama’s graph-native planner evaluates multiple candidate plans per query. Plan cliffs are possible (e.g., switching starting point or direction as data distribution shifts). samyama-insight’s Picasso tool visualizes these by sweeping parameter or pattern space and coloring cells by chosen plan, revealing plan switches and cost cliffs.
Can I evaluate alternative plans for the same query (Foreign Plan Costing)?
Yes, partially. The graph-native planner stores PlanDiagnostics for each query, accessible via EXPLAIN:
EXPLAIN MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c
Planner diagnostics:
Candidates evaluated: 2
Chosen plan cost: 1000.0
Alternatives:
Plan starting from p: cost 1000.0 ← selected
Plan starting from c: cost 1000.0
samyama-insight’s Picasso page extends this further — sweeping parameter ranges and showing which plan wins at each point in the selectivity space. Full FPC-style “force a specific plan and measure sub-optimality” is on the roadmap.
Can I visualize and compare execution plans (Plan Diffing)?
EXPLAIN outputs a textual operator tree, which can be compared manually between different queries:
-- Query A:
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Output: IndexScanOperator(Person.name = 'Alice') → ProjectOperator
-- Query B:
EXPLAIN MATCH (n:Person) WHERE n.age > 25 RETURN n
-- Output: NodeScanOperator(Person) → FilterOperator(age > 25) → ProjectOperator
-- Manual diff: Query A uses IndexScan, Query B uses NodeScan + Filter
-- → Create an index on :Person(age) to improve Query B
There is no built-in plan diffing tool that automatically highlights differences between two plans. Plan diffing, plan diagram generation, and graphical plan visualization are on the roadmap.
Is there plan caching or AST caching?
Yes, since v0.6.0. Samyama caches both parsed ASTs and execution plans, keyed by query string hash. Repeated queries skip parsing and planning entirely:
-- First execution: parse + plan + execute
MATCH (n:Person) WHERE n.name = 'Alice' RETURN n -- cold: ~40ms
-- Subsequent executions: cache hit, execute only
MATCH (n:Person) WHERE n.name = 'Alice' RETURN n -- warm: ~2ms (cache hit)
The plan cache significantly reduces warm-query latency. LDBC benchmarks show high cache hit rates (e.g., 63 hits vs 21 misses on the SNB Interactive workload).
Prepared statements (PREPARE/EXECUTE syntax) are on the roadmap for explicit cache management.
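A minimal sketch of the caching idea, assuming nothing about Samyama's internals beyond "cache keyed by query-string hash" (class and names are illustrative):

```python
import hashlib

class PlanCache:
    """Toy plan cache keyed by a hash of the query text."""
    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, query: str):
        key = hashlib.sha256(query.encode()).hexdigest()
        if key in self._cache:
            self.hits += 1            # warm path: skip parse + plan
            return self._cache[key]
        self.misses += 1              # cold path: parse + plan, then memoize
        plan = f"Plan({query})"       # stand-in for the real parse/plan work
        self._cache[key] = plan
        return plan

cache = PlanCache()
q = "MATCH (n:Person) WHERE n.name = 'Alice' RETURN n"
cache.get_plan(q)   # cold: parse + plan + cache
cache.get_plan(q)   # warm: cache hit
print(cache.hits, cache.misses)   # 1 1
```

Parameterized queries improve hit rates for exactly this reason: `$name = 'Alice'` and `$name = 'Bob'` hash to the same query text, so they share one cached plan.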
What is predicate pushdown, and does Samyama do it?
Predicate pushdown moves filter conditions as close to the data source as possible — filtering early reduces the number of records flowing through the rest of the plan.
Since v0.6.0, Samyama performs full predicate pushdown across paths and MATCH clauses:
- Index pushdown: When a WHERE predicate matches an indexed property, the IndexScanOperator applies the filter during the scan itself
- Label filtering: NodeScanOperator only scans nodes with the specified label, not all nodes
- Cross-scope pushdown (v0.6.0): WHERE predicates are scoped across paths and MATCH clauses, filtering as early as possible
-- Index pushdown (index on :Person(name)):
MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
-- Plan: IndexScan(name='Alice') ← filter is INSIDE the scan operator
-- Cross-scope pushdown (v0.6.0):
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE b.age > 30
RETURN a.name, b.name
-- Plan: NodeScan(Person) → Expand(KNOWS) → Filter(b.age > 30) [pushed to earliest point]
Not yet implemented:
- Predicates on aggregation results (HAVING-style) are not pushed below the aggregation
- Edge predicates are not pushed into the ExpandOperator
Can I force a specific execution plan or provide optimizer hints?
Not yet. Samyama does not currently support:
- USING INDEX directives (Neo4j-style)
- USING SCAN to force a label scan
- USING JOIN ON to force a specific join variable
- Query hints or optimizer directives of any kind
The only way to influence plan selection today is:
-- 1. Create indexes so the planner automatically uses them:
CREATE INDEX ON :Person(name)
CREATE INDEX ON :Person(age)
-- 2. Reorder MATCH clauses (put most selective first):
-- Slow (scans all 1M persons first):
MATCH (a:Person), (b:Department {name: 'Engineering'}) ...
-- Fast (scans 1 department first):
MATCH (b:Department {name: 'Engineering'}), (a:Person) ...
-- 3. Use EXPLAIN to verify the plan:
EXPLAIN MATCH (n:Person) WHERE n.name = 'Alice' RETURN n
Optimizer hints and plan forcing are planned for a future release.
What is the query optimizer roadmap?
The optimizer roadmap, roughly in priority order:
| Feature | Impact | Status |
|---|---|---|
| AST caching | Eliminate re-parsing (~22ms savings) | Done (v0.6.0) |
| Plan memoization | Eliminate re-planning (~18ms savings) | Done (v0.6.0) |
| Parameterized queries ($param) | Enable plan reuse across parameter values | Done (v0.6.0) |
| PROFILE (runtime statistics) | Actual rows, timing per operator | Done (v0.6.0) |
| DROP INDEX / SHOW INDEXES | Index lifecycle management | Done (v0.6.0) |
| Composite indexes | Multi-property indexes | Done (v0.6.0) |
| AND-chain index selection | Use best index for multi-predicate WHERE | Done (v0.6.0) |
| Predicate pushdown across scopes | Reduce intermediate result sizes | Done (v0.6.0) |
| Cost-based plan selection | Compare alternative plans by estimated cost | Done (v0.6.0) |
| Join reordering | Pick optimal join order based on cardinalities | Done (v0.6.0) |
| Early LIMIT propagation | Push LIMIT down to reduce work | Done (v0.6.0) |
| Index intersection | Combine multiple index scans | Planned |
| USING INDEX / USING SCAN hints | User-controlled plan forcing | Planned |
| Histogram-based statistics | Better selectivity estimates for skewed data | Planned |
| Adaptive query execution | Re-plan mid-execution if estimates are wrong | Research |
How many physical operators does Samyama have?
42 physical operators organized into these categories:
| Category | Operators | Count |
|---|---|---|
| Scan & Traverse | NodeScanOperator, ExpandOperator, ExpandIntoOperator, IndexScanOperator, VectorSearchOperator, NodeByIdOperator, ShortestPathOperator | 7 |
| Relational | FilterOperator, ProjectOperator, JoinOperator, LeftOuterJoinOperator, CartesianProductOperator | 5 |
| Aggregation | AggregateOperator, UnwindOperator, ForeachOperator | 3 |
| Sort & Limit | SortOperator, LimitOperator, SkipOperator, WithBarrierOperator | 4 |
| Write | CreateNodeOperator, CreateEdgeOperator, CreateNodesAndEdgesOperator, MatchCreateEdgeOperator, MatchMergeEdgeOperator, DeleteOperator, SetPropertyOperator, RemovePropertyOperator, MergeOperator | 9 |
| Schema/DDL | CreateIndexOperator, CreateVectorIndexOperator, CompositeCreateIndexOperator, CreateConstraintOperator, DropIndexOperator, ShowIndexesOperator, ShowConstraintsOperator, ShowLabelsOperator, ShowRelationshipTypesOperator, ShowPropertyKeysOperator, SchemaVisualizationOperator | 11 |
| Special | SingleRowOperator, AlgorithmOperator | 2 |
| Navigation | ShortestPathOperator | 1 |
All operators implement the Volcano iterator model (lazy, pull-based) with late materialization (Value::NodeRef instead of full node clones).
How many index types does Samyama have?
6 distinct index types:
| Index | Storage | Use Case | Complexity |
|---|---|---|---|
| PropertyIndex | B-Tree (BTreeMap<PropertyValue, HashSet<NodeId>>) | Point lookups and range scans on (label, property) | O(log n) |
| VectorIndex | HNSW (Hierarchical Navigable Small World) | Approximate nearest neighbor search | O(log n) |
| LabelIndex | HashMap<Label, HashSet<NodeId>> | Fast node lookup by label | O(1) |
| EdgeTypeIndex | HashMap<EdgeType, HashSet<EdgeId>> | Fast edge lookup by type | O(1) |
| SortedAdjacencyLists | Vec-of-Vec + FrozenAdjacency (CSR) | Neighbor traversal, edge_between() binary search | O(log d) |
| ColumnStore | Columnar property storage | Vectorized property reads for late materialization | O(1) |
Additionally:
- Composite indexes create individual PropertyIndex entries per property in the list
- Unique constraints are enforced via PropertyIndex with uniqueness validation on insert
- GraphCatalog maintains triple-level statistics (not an index, but used for cost-based planning)
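The PropertyIndex behavior can be sketched with a sorted key list plus postings sets, a Python stand-in for the BTreeMap (names and structure here are illustrative, not Samyama's code):

```python
import bisect

class PropertyIndex:
    """Toy sorted index: point lookups and range scans in O(log n)."""
    def __init__(self):
        self._keys = []       # sorted property values
        self._postings = {}   # value -> set of node ids

    def insert(self, value, node_id):
        if value not in self._postings:
            bisect.insort(self._keys, value)   # keep keys sorted
            self._postings[value] = set()
        self._postings[value].add(node_id)

    def point_lookup(self, value):
        return self._postings.get(value, set())

    def range_scan(self, low, high):
        """All node ids with low <= value <= high; window found by binary search."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        result = set()
        for key in self._keys[lo:hi]:
            result |= self._postings[key]
        return result

idx = PropertyIndex()
for node_id, age in [(1, 30), (2, 25), (3, 35), (4, 30)]:
    idx.insert(age, node_id)
print(idx.point_lookup(30))     # {1, 4}
print(idx.range_scan(26, 40))   # {1, 3, 4}
```

Unique-constraint enforcement is the same structure with a check that the postings set for a value is empty before insert.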
Graph Algorithms
What algorithms are available?
13 algorithms in the samyama-graph-algorithms crate:
| Category | Algorithms |
|---|---|
| Centrality | PageRank, Local Clustering Coefficient (directed + undirected) |
| Community | WCC, SCC, CDLP, Triangle Counting |
| Pathfinding | BFS, Dijkstra, BFS All Shortest Paths |
| Network Flow | Edmonds-Karp (Max Flow), Prim’s MST |
| Statistical | PCA (Randomized SVD + Power Iteration) |
How do I run PageRank?
Via Cypher:
CALL algo.pagerank({label: 'Person', edge_type: 'KNOWS', damping: 0.85, iterations: 20})
YIELD node, score
Via SDK (Rust):
use samyama_sdk::{AlgorithmClient, PageRankConfig};

// `client` is an AlgorithmClient connected to your graph
let config = PageRankConfig { damping: 0.85, iterations: 20, tolerance: 1e-6 };
let scores = client.page_rank(config, "Person", "KNOWS").await?;
for (node_id, score) in &scores {
    println!("Node {}: {:.4}", node_id, score);
}
How do I find shortest paths?
Using Dijkstra for weighted shortest paths:
CALL algo.dijkstra({
source_label: 'City', source_property: 'name', source_value: 'Mumbai',
target_label: 'City', target_property: 'name', target_value: 'Delhi',
edge_type: 'ROAD', weight_property: 'distance'
})
YIELD path, cost
Using BFS for unweighted shortest paths:
CALL algo.bfs({
source_label: 'Person', source_property: 'name', source_value: 'Alice',
edge_type: 'KNOWS'
})
YIELD node, depth
What is the CSR format and why is it used?
Compressed Sparse Row (CSR) is a cache-efficient array-based representation of a graph. Algorithms project from GraphStore into CSR for OLAP workloads because sequential memory access patterns allow CPU prefetching with ~100% accuracy.
Example — a graph with 4 nodes and 5 edges in CSR:
Adjacency: 0→1, 0→2, 1→2, 2→3, 3→0
out_offsets: [0, 2, 3, 4, 5] ← node i's edges start at out_offsets[i]
out_targets: [1, 2, 2, 3, 0] ← target node IDs, packed contiguously
weights: [1.0, 1.0, ...] ← optional edge weights
To iterate node 0's neighbors: out_targets[0..2] = [1, 2]
To iterate node 1's neighbors: out_targets[2..3] = [2]
This layout is ~10x faster than HashMap<NodeId, Vec<NodeId>> for iterative algorithms because it eliminates pointer chasing and hash lookups. See the Analytical Power chapter.
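The construction above can be reproduced in a few lines of Python; `build_csr` and `neighbors` are hypothetical helper names for illustration:

```python
def build_csr(num_nodes, edges):
    """Build CSR (offsets + targets) from an edge list of (src, dst) pairs."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    # Prefix sum: node i's edges live at targets[offsets[i]:offsets[i+1]]
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]
    targets = [0] * len(edges)
    cursor = list(offsets[:-1])          # next free slot per source node
    for src, dst in sorted(edges):
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def neighbors(offsets, targets, node):
    """Contiguous slice: no pointer chasing, no hash lookups."""
    return targets[offsets[node]:offsets[node + 1]]

offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)])
print(offsets)                          # [0, 2, 3, 4, 5]
print(neighbors(offsets, targets, 0))   # [1, 2]
```

The two-array layout is why iterative algorithms like PageRank stream through memory sequentially instead of hopping between heap allocations.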
Does PCA support auto-selection of the solver?
Yes. PcaSolver::Auto selects Randomized SVD when n > 500 and k < 0.8 * min(n, d), otherwise falls back to Power Iteration.
Example via Cypher:
CALL algo.pca({
label: 'Document',
properties: ['feature1', 'feature2', 'feature3', 'feature4'],
components: 2,
solver: 'auto'
})
YIELD node, components
Via Rust SDK:
let config = PcaConfig { components: 2, solver: PcaSolver::Auto };
let results = client.pca(config, "Document", &["feature1", "feature2", "feature3"]).await?;
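The Auto rule reduces to two comparisons; this Python sketch mirrors the rule as stated above (function name is illustrative):

```python
def select_pca_solver(n, d, k):
    """Randomized SVD for large n with few requested components; else Power Iteration.

    n: number of samples (nodes), d: feature dimensions, k: requested components.
    """
    if n > 500 and k < 0.8 * min(n, d):
        return "RandomizedSVD"
    return "PowerIteration"

print(select_pca_solver(n=10_000, d=128, k=2))   # RandomizedSVD
print(select_pca_solver(n=300, d=128, k=2))      # PowerIteration (n too small)
print(select_pca_solver(n=10_000, d=4, k=4))     # PowerIteration (k not small vs min(n, d))
```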
Vector Search & AI
What distance metrics are supported?
Three metrics: Cosine, L2 (Euclidean), and Dot Product.
Example — choosing the right metric:
-- Cosine: best for text embeddings (direction matters, not magnitude)
CREATE VECTOR INDEX FOR (d:Document) ON (d.embedding) OPTIONS {dimensions: 768, similarity: 'cosine'}
-- L2: best for spatial data (absolute distance matters)
CREATE VECTOR INDEX FOR (p:Point) ON (p.coords) OPTIONS {dimensions: 3, similarity: 'l2'}
-- Dot Product: best for pre-normalized embeddings
CREATE VECTOR INDEX FOR (i:Item) ON (i.features) OPTIONS {dimensions: 128, similarity: 'dot_product'}
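For intuition, the three metrics differ in how they treat vector magnitude; this small Python sketch (plain math, no libraries) shows why cosine ignores magnitude while L2 does not:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Direction only: magnitudes are normalized away.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    # Absolute distance: magnitude matters.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [2.0, 0.0]   # same direction, different magnitude
print(cosine_similarity(a, b))  # 1.0  (identical for cosine)
print(l2_distance(a, b))        # 1.0  (clearly separated for L2)
print(dot(a, b))                # 2.0
```

This is why dot product only makes sense for pre-normalized embeddings: once all vectors have unit length, dot product and cosine rank results identically, and dot product skips the normalization work.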
What is Graph RAG?
Graph RAG combines vector search with graph traversal in a single query. Instead of retrieving vectors and filtering in the application layer, Samyama applies graph filters inside the execution engine.
Example — find documents similar to a query, but only from a specific author’s department:
MATCH (a:Author {name: 'Alice'})-[:WORKS_IN]->(dept:Department)
MATCH (d:Document)-[:AUTHORED_BY]->(colleague)-[:WORKS_IN]->(dept)
CALL db.index.vector.queryNodes('Document', 'embedding', $query_vector, 10)
YIELD node, score
WHERE node = d
RETURN d.title, score, colleague.name
ORDER BY score DESC
This prevents the “filter-out-all-results” problem where a pure vector search returns documents from irrelevant departments. See AI & Vector Search.
How do I generate embeddings? Why is Mock the default?
Samyama indexes and searches vectors but does not bundle an embedding model. The default Mock provider generates random vectors — this is deliberate to keep the binary small (~30MB savings), avoid mandatory model downloads, and let you choose the embedding model that fits your domain.
For real embeddings, choose based on your stack:
| Stack | Provider | Setup |
|---|---|---|
| Python | sentence-transformers | pip install sentence-transformers — best model selection, easiest path |
| Rust | ort crate (ONNX Runtime) | Export model to ONNX, load with ort::Session — fastest, no Python |
| Any language | OpenAI API | HTTP call to /v1/embeddings — simplest, pay-per-use |
| Any language (local) | Ollama | ollama pull nomic-embed-text — free, private, runs anywhere |
Python example with sentence-transformers:
from samyama import SamyamaClient
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2") # 384-dim
client = SamyamaClient.embedded()
client.create_vector_index("Document", "embedding", 384, "cosine")
embedding = model.encode("Graph databases unify structure and search").tolist()
client.add_vector("Document", "embedding", node_id, embedding)
See AI & Vector Search — Embedding Providers for complete examples across all providers.
What is Agentic Enrichment (GAK)?
Generation-Augmented Knowledge (GAK) is the inverse of RAG. Instead of using the database to help an LLM, the database uses an LLM to help build itself.
Example flow:
1. Event: New node created: (:Company {name: 'Acme Corp'})
2. Trigger: AgentRuntime detects missing properties (industry, revenue, CEO)
3. LLM Call: "What industry is Acme Corp in? Who is the CEO?"
4. Result: SET n.industry = 'Manufacturing', n.revenue = 5000000
CREATE (n)-[:LED_BY]->(:Person {name: 'Jane Smith', role: 'CEO'})
5. Safety: Schema validation + destructive query rejection before commit
See Agentic Enrichment.
What LLM providers are supported for NLQ?
The NLQClient supports: OpenAI, Google Gemini, Ollama (local), Anthropic (Claude API), Claude Code, and Azure OpenAI. A Mock provider is also available for testing.
Example — natural language to Cypher:
let pipeline = NLQPipeline::new(NLQConfig {
    enabled: true,
    provider: LLMProvider::OpenAI,
    model: "gpt-4o".to_string(),
    api_key: Some(env::var("OPENAI_API_KEY")?),
    api_base_url: None,
    system_prompt: None,
})?;
let cypher = pipeline.text_to_cypher(
    "Who are Alice's friends that work at Google?",
    &schema_summary
).await?;
// Returns: MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(f:Person)-[:WORKS_AT]->(c:Company {name: 'Google'}) RETURN f.name
The corresponding enum variants are LLMProvider::OpenAI, Ollama, Gemini, Anthropic, ClaudeCode, AzureOpenAI, and Mock.
The pipeline uses a whitelist safety check — only queries starting with MATCH, RETURN, UNWIND, CALL, or WITH are allowed through, preventing accidental mutations from LLM-generated Cypher.
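The whitelist check reduces to a case-insensitive prefix test; a minimal Python sketch of the rule as described (function name is illustrative):

```python
# Read-only clause prefixes allowed through the safety check.
ALLOWED_PREFIXES = ("MATCH", "RETURN", "UNWIND", "CALL", "WITH")

def is_safe_cypher(query: str) -> bool:
    """Reject any generated query that does not start with a whitelisted clause."""
    return query.strip().upper().startswith(ALLOWED_PREFIXES)

print(is_safe_cypher("MATCH (n) RETURN n"))   # True
print(is_safe_cypher("  with x return x"))    # True (whitespace and case ignored)
print(is_safe_cypher("DELETE n"))             # False: mutation rejected
print(is_safe_cypher("CREATE (n:Person)"))    # False
```

A prefix whitelist is deliberately conservative: it blocks every statement that could mutate data at the first token, before any parsing happens.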
Optimization
How many solvers are available?
22 metaheuristic solvers in the samyama-optimization crate:
- Metaphor-less: Jaya, QOJAYA, Rao (1-3), TLBO, ITLBO, GOTLBO
- Swarm/Evolutionary: PSO, DE, GA, GWO, ABC, BAT, Cuckoo, Firefly, FPA
- Physics-based: GSA, SA, HS, BMR, BWR
- Multi-objective: NSGA-II, MOTLBO
How do I run an optimization solver?
Via Cypher:
-- Single-objective: minimize supply chain cost
CALL algo.or.solve({
solver: 'jaya',
dimensions: 5,
bounds: [[0, 100], [0, 100], [0, 100], [0, 100], [0, 100]],
objective: 'minimize',
fitness_function: 'supply_chain_cost',
iterations: 1000,
population: 50
})
YIELD solution, fitness
-- Multi-objective: Pareto-optimal trade-offs
CALL algo.or.solve({
solver: 'nsga2',
dimensions: 3,
bounds: [[0, 1], [0, 1], [0, 1]],
objectives: ['minimize_cost', 'maximize_quality'],
population: 100,
generations: 200
})
YIELD pareto_front
Are the optimization solvers open-source or enterprise-only?
All 22 solvers are in the open-source samyama-optimization crate. Enterprise adds GPU-accelerated constraint evaluation for large-scale problems.
How do I choose the right solver?
| Scenario | Recommended Solver | Why |
|---|---|---|
| Simple optimization, no tuning | Jaya | Parameter-free, good baseline |
| Constraints with penalty functions | PSO or GWO | Good constraint handling |
| Multiple conflicting objectives | NSGA-II | Constrained Dominance Principle, Pareto front |
| High-dimensional search space | DE | Good for 10+ dimensions |
| Need global optimum, avoid local minima | SA (Simulated Annealing) | Probabilistic escape from local minima |
| Teaching/learning-inspired | TLBO | No algorithm-specific parameters |
Performance & Scaling
What are the latest benchmark numbers?
On Mac Mini M4 (16GB RAM), v0.6.0:
| Benchmark | CPU | GPU |
|---|---|---|
| Node Ingestion | 255K/s | 412K/s |
| Edge Ingestion | 4.2M/s | 5.2M/s |
| Cypher OLTP (1M nodes) | 115K QPS | — |
| PageRank (1M nodes) | 92ms | 11ms (8.2x) |
| Vector Search (10K, 128d) | 15K QPS | — |
When should I use GPU acceleration?
GPU acceleration is beneficial for graphs with > 100,000 nodes. Below this threshold, CPU-GPU memory transfer overhead dominates.
Example — PageRank speedup at different scales:
10K nodes: CPU 0.6ms vs GPU 9.3ms → GPU is SLOWER (0.06x)
100K nodes: CPU 8.2ms vs GPU 3.1ms → GPU wins (2.6x faster)
1M nodes: CPU 92ms vs GPU 11ms → GPU wins big (8.2x faster)
For PCA specifically, the threshold is 50,000 nodes and > 32 dimensions.
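The dispatch decision can be sketched as a threshold check (an illustrative function built from the numbers above, not Samyama's actual dispatcher):

```python
def use_gpu(algorithm: str, num_nodes: int, dimensions: int = 0) -> bool:
    """Heuristic: GPU only pays off once the work outweighs CPU-GPU transfer cost."""
    if algorithm == "pca":
        return num_nodes > 50_000 and dimensions > 32
    return num_nodes > 100_000   # general graph algorithms, e.g. PageRank

print(use_gpu("pagerank", 10_000))            # False: transfer overhead dominates
print(use_gpu("pagerank", 1_000_000))         # True
print(use_gpu("pca", 80_000, dimensions=64))  # True
```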
Has Samyama been validated against industry benchmarks?
Yes. Samyama achieved 28/28 (100%) on the LDBC Graphalytics benchmark suite across 6 algorithms (BFS, PageRank, WCC, CDLP, LCC, SSSP) on both XS and S-size datasets.
# Run the validation yourself:
cargo bench --bench graphalytics_benchmark -- --all
S-size datasets include cit-Patents (3.8M vertices), datagen-7_5-fb (633K vertices, 68M edges), and wiki-Talk (2.4M vertices). See Performance & Benchmarks.
What is the bottleneck in query execution?
At 1M nodes, the bottleneck is the language frontend (parsing: 54%, planning: 44%), not execution (2%):
Component Time % of total
─────────────────────────────────────────
Parse (Pest) ~22ms 54%
Plan (AST→Ops) ~18ms 44%
Execute (iterate) <1ms 2% ← actual graph work is sub-millisecond!
As of v0.6.0, a plan cache memoizes compiled execution plans for repeated queries, eliminating the parsing and planning overhead on warm queries. Parameterized queries ($param) further improve cache hit rates by separating query structure from literal values.
Where do the Neo4j and Memgraph comparison numbers come from?
Table 10 in the arxiv paper (2603.08036) compares Samyama against Neo4j 5.x and Memgraph 2.x. Here are the sources for each competitor number:
1-Hop Query Latency — Memgraph ~1.1 ms, Neo4j ~28 ms: From Memgraph’s official benchmark (Expansion 1 query: Memgraph 1.09 ms, Neo4j 27.96 ms).
Node Ingestion — Neo4j ~26K/s, Memgraph ~295K/s: From Memgraph’s write speed analysis — Neo4j took 3.8s to create 100K nodes (~26K/s); Memgraph took ~400ms for 100K nodes (~250K/s).
Memory (1M nodes) — Neo4j ~1,200 MB, Memgraph ~600 MB: Neo4j’s JVM heap sizing recommendations (heap + page cache overhead for graph workloads); Memgraph’s C++ in-memory architecture characteristics.
- Source: Neo4j Memory Configuration
- Source: Memgraph vs Neo4j in 2025
GC Pauses — Neo4j 10-100 ms, Samyama/Memgraph 0 ms: Neo4j’s GC tuning documentation describes old-generation garbage collection pauses; Samyama (Rust) and Memgraph (C++) have no garbage collector.
- Source: Neo4j GC Tuning
Additional resources:
- Memgraph BenchGraph — interactive benchmark comparison tool
- Memgraph White Paper: Performance Benchmark
Note: The memory numbers (~1,200 MB for Neo4j, ~600 MB for Memgraph at 1M nodes) are estimates based on architecture characteristics rather than a single published benchmark at exactly 1M nodes. The ingestion and latency numbers come from Memgraph’s published benchmarks, which were conducted on their hardware and configuration. Samyama numbers are measured on Mac Mini M4 (16 GB RAM). As stated in the paper: “Direct comparison is approximate due to different hardware, datasets, and query optimization levels.”
Architecture Deep Dive
Is Samyama ACID-compliant or eventually consistent?
Samyama provides local ACID guarantees for single-node deployments:
- Atomicity: Each write query (CREATE, DELETE, SET, MERGE) executes as an atomic WriteBatch via RocksDB. Either all changes commit or none do.
- Consistency: Unique constraints (when defined) are enforced before commit. Schema integrity is maintained across labels, edges, and properties.
- Isolation: The in-memory GraphStore uses a RwLock: multiple concurrent readers with exclusive writer access. Queries see a consistent snapshot.
- Durability: The Write-Ahead Log (WAL) persists every mutation before acknowledgement. On crash recovery, uncommitted WAL entries are replayed.
In a Raft cluster (Enterprise), writes go through consensus — a write is acknowledged only after a majority of nodes have persisted the log entry. This provides strong consistency (linearizable writes) at the cost of write latency. There is no “eventually consistent” mode.
Interactive multi-statement transactions (BEGIN...COMMIT) are on the roadmap. Today, each Cypher statement is an implicit transaction.
Is Samyama multi-master? How does Raft synchronization work?
No. Samyama uses single-leader Raft consensus (via the openraft crate):
- One leader accepts all write requests and replicates them to followers.
- Followers can serve read queries (read replicas) for horizontal read scaling.
- If the leader fails, a new leader is automatically elected (typically within 1–2 seconds).
This is not a multi-master architecture. Multi-master would require conflict resolution (CRDTs, last-write-wins, etc.), which adds complexity and weakens consistency guarantees. Single-leader Raft gives us strong consistency without conflict resolution overhead.
Client Write ──► Leader ──► Follower 1 (ack)
└──► Follower 2 (ack)
└──► majority acked → commit → respond to client
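The commit rule in the diagram is just a majority test over acknowledgements (leader included); a minimal sketch:

```python
def commit_decision(acks: int, cluster_size: int) -> bool:
    """An entry commits once a majority of nodes has persisted it."""
    return acks >= cluster_size // 2 + 1

# 3-node cluster: leader + 1 follower ack = 2 of 3 -> committed.
print(commit_decision(acks=2, cluster_size=3))   # True
# Leader alone (follower acks lost) is not a majority of 3.
print(commit_decision(acks=1, cluster_size=3))   # False
```

This is also why a minority partition rejects writes: it can never assemble a majority of acknowledgements, so no entry on its side can commit.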
Does Samyama use the RocksDB C/C++ library or a Rust port?
Samyama uses rust-rocksdb, which is a Rust binding to the original C++ RocksDB library from Meta (Facebook). It is NOT a Rust rewrite — it links against the actual C++ RocksDB via FFI (Foreign Function Interface). This means:
- We get the battle-tested, production-proven RocksDB storage engine (used by Meta, CockroachDB, TiKV, etc.)
- The Rust binding provides safe, idiomatic Rust APIs over the C++ core
- Performance is identical to native RocksDB — no overhead from the binding layer
RocksDB handles compaction, compression (LZ4/Zstd), bloom filters, and sorted string tables (SSTs). Samyama uses RocksDB column families for multi-tenancy isolation.
How does concurrency work?
Samyama uses a readers-writer lock (tokio::sync::RwLock) at the GraphStore level:
- Reads (MATCH queries): Multiple readers can execute concurrently. Each reader acquires a shared read lock.
- Writes (CREATE, DELETE, SET, MERGE): A writer acquires an exclusive lock. No reads or other writes proceed while a write is in progress.
- RESP server: The Tokio async runtime handles thousands of concurrent connections. Read queries are processed concurrently; write queries are serialized.
This model is simple and correct. For read-heavy workloads (typical for graph databases), it provides excellent throughput since reads never block each other. Write throughput is limited to one writer at a time, but individual writes are fast (sub-millisecond for most mutations).
Future work includes finer-grained concurrency (per-partition or MVCC-based), but the current model handles production workloads well because graph queries spend most time in traversal (reading), not mutation.
Are you using SIMD for graph traversal?
Not currently in explicit SIMD intrinsics, but we benefit from auto-vectorization by the LLVM backend (Rust compiles via LLVM). The --release build enables -O3 optimizations which include:
- Auto-vectorized array operations in adjacency list scanning
- SIMD-friendly memory layouts in the CSR (Compressed Sparse Row) representation used by graph algorithms
- Cache-line-aligned data structures for traversal hot paths
For GPU acceleration (Enterprise), we use WGSL compute shaders via wgpu — this is massively parallel computation (thousands of GPU threads), which is a different paradigm from CPU SIMD. GPU shaders handle PageRank, CDLP, LCC, Triangle Counting, and PCA on large graphs (>100K nodes).
Explicit CPU SIMD intrinsics (e.g., for batch property filtering or distance calculations) are on the roadmap but not yet implemented.
How does multi-tenancy work internally? Is there database-level isolation?
Yes, tenants get storage-level isolation via RocksDB Column Families:
- Each tenant gets its own Column Family in a single RocksDB instance. Column families are logically separate key-value namespaces — they have independent memtables, SST files, and compaction schedules.
- One tenant’s heavy writes or compaction do not affect other tenants’ read/write performance.
- Per-tenant quotas are enforced:
max_nodes, max_edges, max_memory_bytes, max_storage_bytes, max_connections, and max_query_time_ms.
┌──────────── Single RocksDB Instance ────────────┐
│ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │
│ │ CF: acme │ │ CF: globex │ │ CF: ... │ │
│ │ memtable │ │ memtable │ │ │ │
│ │ SST files │ │ SST files │ │ │ │
│ │ WAL │ │ WAL │ │ │ │
│ └─────────────┘ └─────────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
We chose a single RocksDB instance with column families over multiple RocksDB instances because:
- Lower resource overhead: One set of background threads, one WAL, shared block cache
- Simpler operations: One database to back up, monitor, and recover
- Proven at scale: TiKV (TiDB’s storage engine) uses the same column-family-per-region approach
If you need stronger isolation (separate processes, separate machines), the Raft cluster topology allows deploying dedicated nodes per tenant.
How does embedding work? Is it a .so file or a Rust library?
Several embedding options are available:
1. Rust library (primary): Add samyama-sdk as a Cargo dependency. The EmbeddedClient runs the full engine in-process: no server, no network, no serialization overhead.
[dependencies]
samyama-sdk = "0.6"

let client = EmbeddedClient::new();
client.query("default", "CREATE (n:Person {name: 'Alice'})").await?;
2. Python binding (PyO3): The Python SDK compiles to a native .so/.dylib shared library via PyO3. Install with pip install samyama (or maturin develop from source). No Rust toolchain needed at runtime.
from samyama import SamyamaClient
client = SamyamaClient.embedded()
result = client.query("default", "MATCH (n) RETURN count(n)")
3. C FFI (planned): A C-compatible shared library (.so/.dll) for embedding from any language with FFI support (Go, Java, C#, etc.) is on the roadmap.
For production services, most users run Samyama as a standalone server (RESP on :6379, HTTP on :8080) and connect via the Rust, Python, or TypeScript SDK using the RemoteClient.
Distributed Deployment & High Availability
Does Samyama support replication?
Yes. Samyama implements Raft consensus (via the openraft Rust crate) for distributed replication. All write operations (CREATE, SET, DELETE, MERGE) are replicated to followers before being committed.
How it works:
- Client sends a write to the Raft leader
- Leader appends to its local log (uncommitted)
- Leader sends AppendEntries to followers in parallel
- Once a quorum (majority) acknowledges, the entry is committed
- Leader applies to the graph store and returns success
- Followers apply in the next heartbeat cycle
Configuration: 500ms heartbeat, 1.5–3s election timeout, log compaction after 5000 entries.
How does a node failure get handled?
| Scenario | Behavior | Downtime |
|---|---|---|
| Follower fails (1 of 3) | Quorum still holds (2/2), writes continue | None |
| Leader fails | Election triggered, new leader elected | 150–300ms |
| Network partition | Majority partition continues; minority rejects writes | Auto-heals on reconnection |
Recovery: When a failed node comes back online, it receives heartbeats from the current leader, requests missing log entries, catches up, and rejoins the cluster. No manual intervention needed.
Data safety: A Raft entry is committed only after replication to a majority. Even if the leader crashes immediately after committing, at least one other node has the data.
How does tenant persistence and restore work?
Each tenant’s data is persisted to RocksDB using column families (one per tenant). The write path is:
- Write-Ahead Log (WAL) — sequential log for durability
- RocksDB — indexed storage with tenant-prefixed keys
- In-memory graph — the live GraphStore
On restart, PersistenceManager::recover(tenant) scans all nodes and edges from RocksDB and rebuilds the in-memory adjacency lists.
Snapshots (.sgsnap) provide an additional backup mechanism:
- Export: POST /api/snapshot/export → gzip-compressed JSON-lines file
- Import: POST /api/snapshot/import → ID remapping allows importing into non-empty stores
- Use cases: disaster recovery, tenant migration, version-controlled deployments
How does this work in a distributed deployment?
In a Raft cluster:
- All nodes hold a full copy of every tenant’s data (full replication, not partitioned)
- The leader processes writes and replicates via Raft log entries
- Followers can serve read queries (if configured for read replicas)
- Snapshot and WAL are per-node; Raft log is the source of truth for consistency
Tenant-level sharding is implemented: a routing layer maps each tenant to a specific Raft cluster. Different tenants can be served by different clusters, providing logical isolation.
Tenant A → Raft Cluster 1 (nodes 1, 2, 3)
Tenant B → Raft Cluster 2 (nodes 4, 5, 6)
Tenant C → Raft Cluster 1 (same cluster as A)
What if a tenant needs 1 billion nodes? Isn’t sharding necessary?
Yes. Today, Samyama’s graph store is in-memory, so a single graph is limited by available RAM on one node. Practical limits:
| Nodes | Edges | Approx. RAM |
|---|---|---|
| 100K | 1M | ~500 MB |
| 1M | 10M | ~5 GB |
| 8M | 28M | ~33 GB |
| 100M | 500M | ~150 GB |
| 1B | 5B | ~1.5 TB |
For 1 billion nodes, you would need either a very large machine (1.5+ TB RAM) or graph-level sharding — partitioning a single graph across multiple nodes.
Current status: Graph-level sharding is designed but not yet implemented (ADR-009). The approach uses graph-aware partitioning (METIS min-cut algorithm) to minimize cross-partition edges, with scatter-gather distributed query execution via Arrow Flight RPC.
Why not yet? It’s a research-level problem with very high complexity. The current Raft replication handles the majority of production use cases. Graph-level sharding will be implemented when customer demand justifies the engineering investment.
Workaround today: For very large graphs, use a machine with sufficient RAM (e.g., AWS r6i.24xlarge with 768 GB, or x2idn.32xlarge with 2 TB). The in-memory architecture means queries are extremely fast on these machines.
What are the recommended cluster sizes?
| Cluster | Quorum | Fault Tolerance | Write Latency |
|---|---|---|---|
| 1 node | 1 | None | ~1.2ms |
| 3 nodes | 2 | 1 failure | ~2.8ms |
| 5 nodes | 3 | 2 failures | ~3.5ms |
Recommendation: 3 nodes for most deployments (balances availability and latency). 5 nodes for critical workloads requiring tolerance of 2 simultaneous failures.
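Quorum and fault tolerance in the table follow directly from majority arithmetic; a quick Python check:

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """Failures survivable while a quorum can still be reached."""
    return n - quorum(n)

for n in (1, 3, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that even cluster sizes buy nothing: 4 nodes need a quorum of 3 and still tolerate only 1 failure, which is why Raft clusters are deployed in odd sizes.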
Enterprise & Operations
How does licensing work?
Enterprise uses JET (JSON Enablement Token)—an Ed25519-signed token containing org, edition, features, expiry, and machine fingerprint. 30-day grace period after expiry.
# Check license status:
redis-cli ADMIN.LICENSE
# Set license file:
SAMYAMA_LICENSE_FILE=/path/to/samyama.license cargo run --release --features gpu
See Enterprise Edition.
How do I create a backup?
# Full snapshot
redis-cli ADMIN.BACKUP CREATE
# List all backups
redis-cli ADMIN.BACKUP LIST
# Verify integrity of backup #5
redis-cli ADMIN.BACKUP VERIFY 5
# Restore from backup
redis-cli ADMIN.BACKUP RESTORE 5
What is Point-in-Time Recovery (PITR)?
PITR replays archived WAL entries against a snapshot to restore the database to an exact moment.
Example scenario:
10:30:00 Backup snapshot taken
10:30:04 Normal writes happening
10:30:05 Accidental: MATCH (n:Customer) WHERE n.region = 'APAC' DETACH DELETE n ← oops!

10:30:06 More writes
# Restore to 10:30:04 (before the accidental delete):
redis-cli ADMIN.PITR RESTORE "2026-03-04T10:30:04.000000"
# All APAC customers are back, writes after 10:30:04 are lost
How does multi-tenancy work?
Each tenant gets a dedicated RocksDB Column Family with per-tenant resource quotas (memory, storage, query time). Compaction is independent per tenant—one tenant’s write-heavy workload won’t affect others.
Example — querying within a specific tenant:
# Create a graph in tenant "acme"
redis-cli GRAPH.QUERY acme "CREATE (n:User {name: 'Alice'})"
# Query within that tenant (isolated from other tenants)
redis-cli GRAPH.QUERY acme "MATCH (n:User) RETURN n.name"
# Different tenant, different data
redis-cli GRAPH.QUERY globex "MATCH (n:User) RETURN n.name"  # returns globex's data only
See Observability & Multi-tenancy.
RDF & SPARQL
What RDF serialization formats are supported?
| Format | Read | Write | Example |
|---|---|---|---|
| Turtle (.ttl) | ✅ | ✅ | @prefix ex: <http://example.org/> . ex:Alice a ex:Person . |
| N-Triples (.nt) | ✅ | ✅ | <http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> . |
| RDF/XML (.rdf) | ✅ | ✅ | <rdf:Description rdf:about="http://example.org/Alice"> |
| JSON-LD (.jsonld) | ❌ | ✅ | {"@id": "http://example.org/Alice", "@type": "Person"} |
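The Turtle and N-Triples rows above describe the same triple: N-Triples is simply Turtle with every prefix expanded and the `a` keyword resolved to `rdf:type`. A minimal, hand-rolled illustration of that expansion (for the table's one-triple example only, not a general Turtle parser):

```python
# Expand the table's one-triple Turtle example into the equivalent N-Triples
# line by resolving the prefix and the 'a' keyword (shorthand for rdf:type).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def turtle_to_ntriples(triple: str, prefixes: dict) -> str:
    subj, pred, obj = triple.rstrip(" .").split()

    def expand(term: str) -> str:
        if term == "a":  # Turtle keyword for rdf:type
            return f"<{RDF_TYPE}>"
        prefix, local = term.split(":", 1)
        return f"<{prefixes[prefix]}{local}>"

    return f"{expand(subj)} {expand(pred)} {expand(obj)} ."

print(turtle_to_ntriples("ex:Alice a ex:Person .", {"ex": "http://example.org/"}))
```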
Is SPARQL fully implemented?
SPARQL parser infrastructure is in place (via the spargebra crate), but query execution is not yet operational. The focus is on the OpenCypher engine.
Example of what will be supported:
PREFIX ex: <http://example.org/>
SELECT ?name ?age
WHERE {
?person a ex:Person .
?person ex:name ?name .
?person ex:age ?age .
FILTER (?age > 25)
}
ORDER BY ?name
See RDF & SPARQL.
Can I use RDF and property graph data together?
A mapping framework (MappingConfig) is defined for converting between RDF triples and property graph nodes/edges. Automatic bidirectional conversion is on the roadmap.
Example of the conceptual mapping:
RDF Triple: <ex:Alice> <ex:knows> <ex:Bob>
↕ ↕ ↕
Property Graph: (:Person {uri: 'ex:Alice'}) -[:knows]-> (:Person {uri: 'ex:Bob'})
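The conceptual mapping above can be sketched in plain data structures. This is illustrative only, using hypothetical dict shapes rather than the actual MappingConfig API:

```python
# Conceptual sketch of the triple ↔ property-graph mapping shown above.
# Dict shapes are illustrative, not the real MappingConfig types.
def triple_to_edge(subj: str, pred: str, obj: str):
    """Map an object-property triple onto two nodes and one connecting edge."""
    src = {"labels": ["Person"], "props": {"uri": subj}}
    dst = {"labels": ["Person"], "props": {"uri": obj}}
    edge = {"type": pred, "from": subj, "to": obj}
    return src, dst, edge

src, dst, edge = triple_to_edge("ex:Alice", "knows", "ex:Bob")
print(edge)  # {'type': 'knows', 'from': 'ex:Alice', 'to': 'ex:Bob'}
```

The reverse direction (node/edge back to triple) follows the same correspondence, which is what makes automatic bidirectional conversion feasible.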
SDKs & Integration
Which SDKs are available?
| SDK | Language | Transport | Install |
|---|---|---|---|
| samyama-sdk | Rust | Embedded + HTTP | cargo add samyama-sdk |
| samyama | Python | Embedded + HTTP (PyO3) | pip install samyama |
| samyama-sdk | TypeScript | HTTP only | npm install samyama-sdk |
| samyama-cli | CLI | HTTP | cargo install samyama-cli |
Can I embed Samyama in my application without running a server?
Yes. The Rust SDK’s EmbeddedClient runs the full engine in-process with zero network overhead:
use samyama_sdk::{EmbeddedClient, SamyamaClient};

async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let client = EmbeddedClient::new();
    // Write data
    client.query("default", "CREATE (n:Person {name: 'Alice', age: 30})").await?;
    client.query("default", "CREATE (n:Person {name: 'Bob', age: 25})").await?;
    // Query data
    let result = client.query("default", "MATCH (n:Person) WHERE n.age > 28 RETURN n.name").await?;
    println!("{:?}", result.rows); // [["Alice"]]
    Ok(())
}
How do I use the CLI?
# Single query
samyama-cli query "MATCH (n:Person) RETURN n.name, n.age" --format table
# Output:
# +--------+-------+
# | n.name | n.age |
# +--------+-------+
# | Alice  | 30    |
# | Bob    | 25    |
# +--------+-------+
# Interactive REPL
samyama-cli shell
samyama> MATCH (n) RETURN count(n)
samyama> CREATE (n:City {name: 'Mumbai', population: 20000000})
# Server status
samyama-cli status --format json
# Health check
samyama-cli ping
Does the Python SDK support algorithms directly?
Yes (v0.6.0+). The Python SDK provides direct method-level algorithm access in embedded mode, in addition to Cypher CALL algo.* queries:
from samyama import SamyamaClient
# Embedded mode (no server required)
client = SamyamaClient.embedded()
# Create data
client.query("default", "CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})")
# Direct algorithm methods (embedded mode only)
scores = client.page_rank("Person", "KNOWS", damping=0.85, iterations=20)
components = client.wcc("Person", "KNOWS")
distances = client.bfs("Person", "KNOWS", start_node_id=0)
shortest = client.dijkstra("Person", "KNOWS", source_id=0, target_id=1, weight_property="weight")
# Also available: scc(), pca(), triangle_count()
# Or via Cypher (works in both embedded and remote mode)
result = client.query("default", """
CALL algo.pagerank({label: 'Person', edge_type: 'KNOWS', iterations: 20})
YIELD node, score
""")
How do I use the TypeScript SDK?
import { SamyamaClient } from 'samyama-sdk';
const client = SamyamaClient.connectHttp('http://localhost:8080');
// Query
const result = await client.query('default', 'MATCH (n:Person) RETURN n.name');
console.log(result.rows);
// Create data
await client.query('default', `
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
`);
Project & Commercial
What is Samyama’s motivation and long-term vision?
Samyama was born from the observation that existing graph databases force users to choose between performance (C++/Rust in-memory engines), features (Cypher, vector search, NLQ, graph algorithms), and operational simplicity (easy deployment, Redis protocol compatibility). We believe a modern graph database should deliver all three.
The name “Samyama” comes from Sanskrit — it means “integration” or “bringing together.” The database integrates property graphs, vector search, natural language queries, graph algorithms, and constrained optimization into a single engine.
Long-term, Samyama aims to be the converged graph + AI database — where graph structure, vector embeddings, and LLM-powered queries work together natively, not as bolted-on features.
How do you plan to maintain this over 6–8 years?
Three pillars:
- Rust as a foundation: Rust’s memory safety, zero-cost abstractions, and absence of garbage collection give us a codebase that is inherently more maintainable than C++ (no memory bugs) and more performant than JVM-based alternatives (no GC pauses). The compiler catches entire classes of bugs at compile time.
- Open-core model: The Community Edition (Apache 2.0) ensures the core engine always has community scrutiny and contributions. Enterprise features (monitoring, backup, GPU, audit) are layered on top — they don’t fork the core. This means maintenance effort focuses on one engine, not two.
- Revenue-funded engineering: The Enterprise tier funds dedicated engineering. We’re not dependent on VC funding cycles. The pricing model (data-scale tiers, not per-seat) ensures revenue grows with customer success.
We also invest heavily in automated quality: 250+ unit tests, 10 benchmark suites, LDBC Graphalytics validation (100% pass rate), and LDBC SNB Interactive/BI benchmarks run on every release.
What features are Enterprise-only vs. open source?
The core principle: Enterprise gates operations, not functionality. The full query engine, all algorithms, vector search, NLQ, persistence, and multi-tenancy are in the open-source Community Edition. Enterprise adds:
| Enterprise-Only Feature | Why Enterprise |
|---|---|
| GPU acceleration (wgpu shaders) | Hardware-specific, driver dependencies |
| Prometheus metrics / health checks | Production monitoring |
| Backup & restore (full/incremental/PITR) | Data protection SLA |
| Audit logging | Compliance (SOC2, GDPR) |
| Enhanced Raft (HTTP/2 transport, snapshot streaming) | Production HA |
| ADMIN commands (CONFIG, STATS, TENANTS) | Operational control |
How is the Enterprise edition priced?
Samyama uses a data-scale + cluster-size pricing model — not per-seat, not per-CPU, not per-query. Pricing is transparent and published:
| Tier | Price | Data Limit | Cluster | Support |
|---|---|---|---|---|
| Community | Free | Unlimited | 1 node | GitHub community |
| Pro | $499/mo ($4,990/yr) | 10M nodes | Up to 3 nodes | Email, 48h SLA |
| Enterprise | $2,499/mo ($24,990/yr) | 100M nodes | Unlimited | 24/7, 4h Sev1 SLA |
| Dedicated Cloud | Contact sales | Unlimited | Unlimited | Named TAM, 1h Sev1 SLA |
Annual commitment saves 17%. Multi-year (3-year) saves 30%.
We deliberately avoid per-CPU/per-core licensing — customers shouldn’t worry about hardware choices. Price scales with the value delivered (data size, operational maturity), not with infrastructure decisions.
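The published annual discount can be checked directly against the table above; both paid tiers discount the annual price by exactly one sixth (about 16.7%, rounded to the quoted 17%):

```python
# Verify the ~17% annual-prepay saving against the published price table.
tiers = {"Pro": (499, 4_990), "Enterprise": (2_499, 24_990)}  # (monthly, annual)

for name, (monthly, annual) in tiers.items():
    savings = 1 - annual / (monthly * 12)
    print(f"{name}: {savings:.1%} saved with annual prepay")  # 16.7% each, ≈17%
```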
Do you provide support? What does it look like?
| Tier | Support Level | Response Time |
|---|---|---|
| Community | GitHub Issues, community forums | Best-effort |
| Pro | Email support | 48h for general, 24h for Sev1 |
| Enterprise | 24/7 support, phone escalation | 4h for Sev1, 8h for Sev2 |
| Dedicated | Named Technical Account Manager | 1h for Sev1, custom SLA |
Add-ons available: dedicated support engineer (+$2,000/mo), premium SLA upgrade (+$500/mo), custom integration/consulting ($250/hr).
Is the pricing recurring or one-time? Per-CPU?
Recurring — monthly or annual subscription. Annual prepay saves 17%.
We explicitly avoid per-CPU/per-core licensing. The pricing model is based on data scale (node count) and cluster size (number of HA nodes). Customers can run on any hardware without license implications — whether it’s a 4-core laptop or a 128-core server.
Do you offer OEM licensing?
Yes. For partners who embed Samyama within their own product or manage it on behalf of their clients, we offer OEM / Embedded licensing with:
- White-label deployment: No Samyama branding visible to end customers
- Volume-based pricing: Per-deployment or per-end-customer pricing rather than per-instance
- Redistribution rights: Bundle Samyama binaries within your product installer
- Dedicated integration support: Engineering assistance for embedding and customization
OEM licensing is structured as a custom annual agreement. Contact sales for terms that match your deployment model (SaaS platform, managed service, on-prem appliance, etc.).