Glossary
Key terms and concepts used throughout this book, organized alphabetically.
- Adjacency List
- A graph representation where each node stores a list of its outgoing and incoming edge IDs. Used in GraphStore for fast neighbor lookups. O(1) access to a node’s neighbors.
- AGE (Autonomous Graph Evolution)
- AI agents that build knowledge graphs from unstructured data through entity extraction and relationship detection. Supports customer-defined regex patterns (deterministic, confidence=1.0), LLM-based extraction (additive, schema-constrained), or both. Configurable per tenant. Tagline: “Paste any text, get a knowledge graph.” See also GAK (which enriches existing graphs rather than building them). Enterprise feature.
- Agentic Enrichment
- See GAK.
- Arena Allocation
- A memory management pattern where objects are allocated in contiguous blocks rather than scattered heap allocations. Samyama uses a versioned arena (Vec<Vec<T>>) for nodes and edges, giving cache-friendly sequential memory access.
- AST (Abstract Syntax Tree)
- The intermediate tree representation produced by the Pest parser after parsing a Cypher query string. Transformed by the QueryPlanner into a physical execution plan.
- Bincode
- A Rust-specific binary serialization format used for RocksDB value encoding. Faster than JSON or Protobuf for Rust-to-Rust communication. Used to serialize StoredNode and StoredEdge structs.
- CAP Theorem
- States that a distributed system can provide at most two of three guarantees at the same time: Consistency, Availability, and Partition Tolerance. Samyama chooses CP (Consistency + Partition Tolerance) via Raft.
- CDLP (Community Detection via Label Propagation)
- A graph algorithm where each node adopts the most frequent label among its neighbors. Converges to natural community boundaries. LDBC Graphalytics standard.
- Column Family
- A RocksDB feature that logically partitions data. Samyama uses column families for tenant isolation (separate compaction, backup, and key namespaces per tenant).
- ColumnStore
- Samyama’s columnar property storage. Stores all values of a given property (e.g., all “ages”) in a contiguous array, enabling cache-efficient analytical queries and late materialization.
- Cost-Based Optimizer (CBO)
- The query planning component that uses GraphStatistics (label counts, edge counts, property selectivity) to choose between execution strategies (e.g., IndexScan vs. NodeScan).
- CSR (Compressed Sparse Row)
- A compact, read-only graph representation using three arrays (out_offsets, out_targets, weights). Used for OLAP algorithm execution because sequential memory access enables CPU prefetching.
- Cypher
- A declarative graph query language originally created by Neo4j. Samyama supports ~90% of the OpenCypher specification.
- EdgeId
- A u64 integer serving as a direct index into the edge storage arena. Like NodeId, this gives O(1) access without hashing.
- Embedded Mode
- Running the Samyama engine in-process (no server) via EmbeddedClient. Zero network overhead, full access to algorithms, vector search, and persistence APIs.
- EXPLAIN
- A Cypher prefix that returns the physical execution plan without executing the query. Shows operator tree, estimated row counts, and graph statistics.
- GAK (Generation-Augmented Knowledge)
- Samyama’s paradigm where the database uses LLMs to autonomously discover and create missing data, inverting the traditional RAG pattern. The database actively builds its own knowledge graph.
- GraphStatistics
- Runtime statistics maintained by GraphStore: label counts, edge type counts, average degree, and property stats (null fraction, distinct count, selectivity). Used by the cost-based optimizer.
- GraphStore
- The core in-memory storage structure. Contains versioned arenas for nodes/edges, adjacency lists, column stores, vector indices, and property indices.
- GraphView
- The CSR representation of a projected subgraph, used as input to all algorithms in samyama-graph-algorithms. Immutable once built, enabling zero-lock parallel processing.
- HNSW (Hierarchical Navigable Small World)
- An approximate nearest neighbor search algorithm for vector indexing. Provides logarithmic search complexity with high recall. Implemented via the hnsw_rs crate.
- JET (JSON Enablement Token)
- The Enterprise license format: base64(header).base64(payload).base64(signature) with Ed25519 signing. Contains org, features, expiry, and machine fingerprint.
- Label
- A string tag on a node that categorizes it (e.g., Person, Account). Nodes can have multiple labels. Labels are indexed for fast scanning.
- Late Materialization
- An optimization where scan operators produce Value::NodeRef(id) references instead of full node clones. Properties are resolved on demand only at the ProjectOperator, reducing memory bandwidth by 4-5x.
- LDBC Graphalytics
- The industry-standard benchmark suite for graph analytics correctness and performance. Samyama passes 28/28 tests across 6 algorithms on XS and S-size datasets.
- LSM-Tree (Log-Structured Merge-Tree)
- The storage engine architecture used by RocksDB. Converts random writes into sequential appends, optimizing for write-heavy workloads like graph databases.
- MCP (Model Context Protocol)
- Anthropic’s open standard for connecting LLMs to external tools and data sources. Samyama exposes graph queries as MCP tools, enabling LLM agents to query the knowledge graph via pre-authored Cypher templates. Achieves 98% accuracy on the BiomedQA benchmark vs. 0% for text-to-Cypher. See sdk/python/samyama_mcp/.
- Mechanical Sympathy
- Designing software to align with hardware characteristics (CPU caches, memory access patterns, SIMD lanes). A core design principle throughout Samyama.
- Metaheuristic
- A nature-inspired optimization algorithm that searches for “good enough” solutions in complex spaces. Samyama implements 22 metaheuristics (Jaya, PSO, DE, GWO, NSGA-II, etc.).
- MVCC (Multi-Version Concurrency Control)
- A concurrency technique where readers see a consistent snapshot while writers create new versions. Samyama implements MVCC via version chains in the node/edge arenas.
- NodeId
- A u64 integer serving as a direct index into the versioned node arena (Vec<Vec<Node>>). This eliminates hash lookups, giving O(1) access with cache-friendly contiguous memory.
- NodeRef
- A lightweight Value::NodeRef(NodeId) used in late materialization. Carries only the ID, not the full node data. Properties are resolved lazily via resolve_property().
- NLQ (Natural Language Query)
- The pipeline that converts natural language questions to Cypher queries using LLMs. Supports OpenAI, Gemini, Ollama, and Claude providers.
- NSGA-II (Non-dominated Sorting Genetic Algorithm II)
- A multi-objective optimization algorithm that finds Pareto-optimal solutions. Used with the Constrained Dominance Principle for feasible-first selection.
- OpenCypher
- The open standard for the Cypher query language, maintained by the openCypher project. Samyama implements ~90% of the specification.
- Pareto Front
- The set of solutions where no objective can be improved without worsening another. NSGA-II and MOTLBO return Pareto fronts for multi-objective optimization.
- PCA (Principal Component Analysis)
- A dimensionality reduction technique that projects high-dimensional data onto principal components. Samyama implements Randomized SVD (Halko et al.) and Power Iteration solvers.
- PEG (Parsing Expression Grammar)
- A formal grammar type that uses ordered choice (tries alternatives left-to-right). Samyama’s Cypher parser uses the Pest PEG library.
- PhysicalOperator
- The trait implemented by all 35 execution operators. Each operator processes RecordBatches in a pull-based Volcano model.
- PITR (Point-in-Time Recovery)
- Enterprise feature that restores the database to an exact timestamp by replaying WAL entries against a snapshot.
- PROFILE
- A planned Cypher prefix (not yet implemented) that will execute the query and return actual row counts and timing per operator, complementing EXPLAIN.
- PropertyValue
- The union type for node/edge properties: String, Integer, Float, Boolean, DateTime, Array, Map, or Null.
- Raft
- A consensus algorithm for distributed systems. Ensures all nodes agree on the log order. Samyama uses the openraft crate for leader election, log replication, and quorum commits.
- Rayon
- A Rust parallel computing library used for data-parallel algorithm execution. Enables zero-overhead parallel iteration over CSR arrays.
- RDF (Resource Description Framework)
- A W3C standard for representing knowledge as subject-predicate-object triples. Samyama supports RDF with SPO/POS/OSP indexing and Turtle/N-Triples/RDF-XML serialization.
- RecordBatch
- The internal data structure passed between operators in the Volcano model. Contains columns of Value entries and supports batch processing of 1,024 records at a time.
- RESP (Redis Serialization Protocol)
- The wire protocol used by Redis clients. Samyama implements RESP3 for backward compatibility with the Redis ecosystem.
- RocksDB
- An embedded key-value store based on LSM-Trees, originally forked from LevelDB by Facebook. Samyama uses it for persistent storage with Column Families for multi-tenancy.
- Selectivity
- The fraction of rows that satisfy a filter predicate. A low selectivity value (e.g., 0.01, meaning 1% of rows match) means the filter is highly selective, which favors index scans.
- Snapshot Isolation
- A concurrency level where each query sees a consistent point-in-time view of the database, regardless of concurrent writes. Achieved via MVCC version chains.
- SPARQL
- The W3C standard query language for RDF data. Parser infrastructure is in place via spargebra; query execution is in development.
- Volcano Model
- A query execution model where operators form a tree and data flows bottom-up via next_batch() calls. Each operator pulls from its children on demand (lazy evaluation).
- WAL (Write-Ahead Log)
- A sequential log where all mutations are written before being applied to the main storage. Ensures durability: if the process crashes, committed changes that had not yet reached the main storage can be replayed from the log.
- wgpu
- The Rust implementation of the WebGPU API. Used in Samyama Enterprise for GPU-accelerated graph algorithms via WGSL compute shaders targeting Metal, Vulkan, and DX12.
- WGSL (WebGPU Shading Language)
- The shader language for WebGPU compute kernels. Samyama Enterprise uses WGSL shaders for PageRank, CDLP, LCC, Triangle Counting, PCA, and vector distance operations.
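
To make the CSR entry above more concrete, here is a minimal sketch of how the three arrays fit together. Only the field names out_offsets, out_targets, and weights come from the glossary; the Csr struct, its methods, and the sample edges are hypothetical illustrations and are not Samyama's actual GraphView API.

```rust
/// Illustrative CSR layout. Field names follow the glossary entry;
/// the struct and its methods are assumptions made for this sketch.
struct Csr {
    out_offsets: Vec<usize>, // length = node_count + 1
    out_targets: Vec<u64>,   // neighbor IDs, grouped by source node
    weights: Vec<f64>,       // weights[i] belongs to the edge at out_targets[i]
}

impl Csr {
    /// Build from (source, target, weight) triples with a counting sort on source.
    fn from_edges(node_count: usize, edges: &[(u64, u64, f64)]) -> Self {
        // Count outgoing edges per node, shifted by one slot.
        let mut out_offsets = vec![0usize; node_count + 1];
        for &(src, _, _) in edges {
            out_offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns counts into start offsets.
        for i in 0..node_count {
            out_offsets[i + 1] += out_offsets[i];
        }
        // Scatter each edge into the next free slot of its source node.
        let mut cursor = out_offsets.clone();
        let mut out_targets = vec![0u64; edges.len()];
        let mut weights = vec![0.0f64; edges.len()];
        for &(src, dst, w) in edges {
            let slot = cursor[src as usize];
            out_targets[slot] = dst;
            weights[slot] = w;
            cursor[src as usize] += 1;
        }
        Csr { out_offsets, out_targets, weights }
    }

    /// Neighbors of `node` are one contiguous slice: no hashing, no pointer chasing.
    fn neighbors(&self, node: u64) -> &[u64] {
        let start = self.out_offsets[node as usize];
        let end = self.out_offsets[node as usize + 1];
        &self.out_targets[start..end]
    }

    /// Weights of the outgoing edges of `node`, in the same order as `neighbors`.
    fn edge_weights(&self, node: u64) -> &[f64] {
        let start = self.out_offsets[node as usize];
        let end = self.out_offsets[node as usize + 1];
        &self.weights[start..end]
    }
}

fn main() {
    let edges: Vec<(u64, u64, f64)> =
        vec![(0, 1, 1.0), (0, 2, 0.5), (2, 0, 2.0), (1, 2, 1.5)];
    let csr = Csr::from_edges(3, &edges);
    for node in 0..3u64 {
        println!(
            "node {} -> targets {:?}, weights {:?}",
            node,
            csr.neighbors(node),
            csr.edge_weights(node)
        );
    }
}
```

Because each node's neighbors occupy a contiguous range of out_targets, an algorithm that visits nodes in order reads memory sequentially, which is the access pattern the CSR entry credits for enabling CPU prefetching.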