VLDB 2027 Systems Paper (In Preparation)
Target: PVLDB Vol. 20, Industry Track (rolling deadlines starting ~April 2026)
Working Title: Samyama: A Unified Graph-Vector Database with Cost-Based Query Planning and Late Materialization
Status: In preparation. Extends Paper 1 (arxiv:2603.08036) with deeper evaluation and VLDB-specific focus.
Strategy
The arXiv preprint covers the whole system broadly (11 pages). For VLDB, we narrow the scope and go deeper on three core database contributions:
- Late materialization for property graphs (NodeRef/EdgeRef) — 4x traversal speedup, 60% memory reduction
- Graph-native cost-based planner (ADR-015) — triple-level statistics, plan enumeration, direction reversal, ExpandInto
- Hybrid CSR adjacency — frozen tier + write buffer, two-phase bulk loading, 1B edges in 3h46m for $2.50
De-emphasize: GPU acceleration, agentic enrichment, optimization solvers, RDF (mention as system features, don’t evaluate deeply).
Add: LDBC SNB comparison, billion-edge evaluation, cross-KG federation at scale.
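To make the "frozen tier + write buffer" idea in the hybrid CSR item concrete, here is a minimal Rust sketch. It is not Samyama's implementation: the type names (`FrozenCsr`, `HybridAdjacency`), the `HashMap`-based buffer, the counting-sort build, and the `compact` step are all assumptions chosen for illustration, and the sketch ignores edge properties, deletions, and concurrency.

```rust
use std::collections::HashMap;

/// Immutable CSR tier (hypothetical layout): the neighbors of node `v`
/// are `targets[offsets[v]..offsets[v + 1]]`.
struct FrozenCsr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl FrozenCsr {
    /// Build CSR with a stable counting sort: count out-degrees, prefix-sum
    /// them into offsets, then scatter targets. Assumes every `src < num_nodes`.
    fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        FrozenCsr { offsets, targets }
    }

    /// Neighbors stored in the frozen tier; empty for nodes it does not know.
    fn neighbors(&self, v: u32) -> &[u32] {
        let v = v as usize;
        if v + 1 >= self.offsets.len() {
            return &[];
        }
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }
}

/// Frozen CSR plus a mutable buffer that absorbs writes (illustrative only).
struct HybridAdjacency {
    frozen: FrozenCsr,
    buffer: HashMap<u32, Vec<u32>>,
    num_nodes: usize,
}

impl HybridAdjacency {
    fn new(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        HybridAdjacency {
            frozen: FrozenCsr::from_edges(num_nodes, edges),
            buffer: HashMap::new(),
            num_nodes,
        }
    }

    /// Writes never touch the frozen arrays; they land in the buffer.
    fn add_edge(&mut self, src: u32, dst: u32) {
        self.num_nodes = self.num_nodes.max(src.max(dst) as usize + 1);
        self.buffer.entry(src).or_default().push(dst);
    }

    /// Reads see the frozen neighbors first, then the buffered ones.
    fn neighbors(&self, v: u32) -> impl Iterator<Item = u32> + '_ {
        let buffered: &[u32] = self.buffer.get(&v).map(|b| b.as_slice()).unwrap_or(&[]);
        self.frozen.neighbors(v).iter().chain(buffered.iter()).copied()
    }

    /// Fold the buffer into a freshly built frozen tier.
    fn compact(&mut self) {
        let mut edges = Vec::with_capacity(self.frozen.targets.len());
        for v in 0..self.frozen.offsets.len() - 1 {
            for &t in self.frozen.neighbors(v as u32) {
                edges.push((v as u32, t));
            }
        }
        for (src, dsts) in self.buffer.drain() {
            edges.extend(dsts.into_iter().map(|dst| (src, dst)));
        }
        self.frozen = FrozenCsr::from_edges(self.num_nodes, &edges);
    }
}
```

In this sketch, traversals over unmodified nodes stay on contiguous arrays, while writes cost a hash lookup and a push. Because the counting sort is stable, `compact` preserves each node's neighbor order (frozen entries first, then buffered ones in insertion order).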
Current State
- LaTeX: research/vldb/samyama-vldb.tex (62KB, most developed draft)
- Plan: research/vldb/PLAN.md
- Formatting: research/vldb/FORMATTING-TODO.md
Key New Results (since arxiv v2)
| Result | Impact |
|---|---|
| 1B-edge biomedical trifecta (74M nodes, $2.50 spot) | Scale credibility |
| WCO TrieJoin for cyclic patterns | Algorithmic contribution |
| Edge stub memory optimization (24GB savings at 1B edges) | Engineering contribution |
| 6-KG federation (biomedical + public health, 305K nodes, 40/40) | Federation story |
| Rayon parallel algorithms (PageRank, LCC, CDLP) | Parallelism |