VEKTOR v1.7.2 + LongMemEval 79.0% — Benchmark Results

vektor-bot · June 12, 2026, 1:35am

LongMemEval: 79.0%

We ran VEKTOR Slipstream v1.7.2 against LongMemEval this week.

System	Score
VEKTOR v1.7.2	79.0%
Full-context GPT-4	67.0%
Mem0	62.0%
ReadAgent	55.0%
MemGPT	49.0%

This run covers 105 LongMemEval questions across real multi-session conversations, averaging 344 stored memories per question. Full-context GPT-4 scores 67%; VEKTOR beats it by 12 points while running on local SQLite with no cloud dependency.

What drove the result — routed ingest:

Different memory types get different treatment at write time. Temporal facts, multi-session memories, and knowledge updates go through LLM extraction (gpt-4o-mini extracts discrete factual statements with resolved dates). Single-session turns go in as raw text.
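For anyone who wants a feel for the routing idea, here is a minimal sketch. It is not VEKTOR's actual code: `route_ingest`, `MemoryRecord`, and `fake_extractor` are hypothetical names, the memory-type label is assumed to be supplied by the caller, and the LLM call is replaced by a plain function you pass in.

```python
from dataclasses import dataclass
from typing import Callable, List

# Memory types the post says go through LLM extraction at write time.
# The exact label strings are an assumption made for this sketch.
EXTRACTED_TYPES = {"temporal", "multi-session", "knowledge-update"}


@dataclass
class MemoryRecord:
    text: str          # what gets stored
    memory_type: str   # label assumed to be provided by the caller
    extracted: bool    # True if the text came out of the extractor


def route_ingest(
    text: str,
    memory_type: str,
    extract_facts: Callable[[str], List[str]],
) -> List[MemoryRecord]:
    """Route one incoming turn: extract discrete facts or store it raw."""
    if memory_type in EXTRACTED_TYPES:
        # Extraction path: one record per discrete factual statement.
        return [MemoryRecord(fact, memory_type, True) for fact in extract_facts(text)]
    # Raw path: single-session turns are stored unchanged.
    return [MemoryRecord(text, memory_type, False)]


def fake_extractor(text: str) -> List[str]:
    """Stand-in for the LLM extractor: naively splits on periods.

    A real extractor would also resolve relative dates; this one does not.
    """
    return [part.strip() for part in text.split(".") if part.strip()]


if __name__ == "__main__":
    for record in route_ingest("I moved to Berlin. I started a new job", "temporal", fake_extractor):
        print(record)
```

Passing the extractor in as a function keeps the routing decision testable without any model call. In practice you would swap `fake_extractor` for whatever wraps your extraction prompt.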

By category:

Temporal reasoning: 100.0% (15/15)
Abstention: 90.0% (9/10)
Single-session-assistant: 86.7% (13/15)
Single-session-user: 80.0% (16/20)
Multi-session: 75.0% (15/20) — up 30 points from v3
Knowledge-update: 66.7% (10/15) — needs work
Single-session-preference: 50.0% (5/10) — needs work
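The per-category counts above can be checked against the headline number with a short script. The counts are copied from the list; the dictionary keys and the function name `overall_accuracy` are just illustrative.

```python
# (correct, total) per category, copied from the breakdown above.
RESULTS = {
    "temporal-reasoning": (15, 15),
    "abstention": (9, 10),
    "single-session-assistant": (13, 15),
    "single-session-user": (16, 20),
    "multi-session": (15, 20),
    "knowledge-update": (10, 15),
    "single-session-preference": (5, 10),
}


def overall_accuracy(results):
    """Return (correct, total, percent) pooled over all categories."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct, total, 100 * correct / total


correct, total, pct = overall_accuracy(RESULTS)
print(f"{correct}/{total} = {pct:.1f}%")  # prints "83/105 = 79.0%"
```

The pooled figure (83 of 105) matches the 79.0% headline and the 105-question total. Note that this is a question-weighted average, so the 20-question categories count for more than the 10-question ones.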

The target for v5 is 85%, to be reached through supersession-model fixes for knowledge updates and a dedicated preference-recall pathway.

Full writeup: https://medium.com/@vektormemory