v14_branchb_learnable_vocab · Status Report
Updated: 2026-05-05 UTC
Status — v14_branchb_learnable_vocab: Vocab Was the Bottleneck. Branch B Confirmed. First Recognizable Dialog Output.

Update 2026-05-05: v14_branchb_learnable_vocab deployed live — single architectural change from v13: vocab_matrix_global is now an nn.Parameter (~5.6M trainable parameters) trained alongside the brain via AdamW with lr=EMBED_LR/2=2.5e-4. Brain was warm-started from v13 final and frozen for epoch 1 so the vocab could adjust against fixed brain predictions, then unfrozen for epochs 2–5 (joint training). Architecture otherwise identical to v13: 12L / 384D / 12H / RoPE / Pre-LN / AdamW, vocab 14,704, context 32, 5.3 MB corpus. Trained 2026-05-05 on Modal L4, $8.00.
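The schedule described above (vocab as a trainable parameter, brain frozen for epoch 1, joint training afterwards) could look roughly like the following PyTorch sketch. It is a minimal illustration under stated assumptions, not the project's training script: `TinyBrain`, the toy sizes, the random data, the one-step "epochs", and the brain learning rate `BRAIN_LR` are all hypothetical. Only the name `vocab_matrix_global`, the use of AdamW, INV_TEMP=30, and the vocab learning rate of 2.5e-4 come from the report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, DIM = 50, 16      # toy sizes (the report uses a 14,704-token vocab)
VOCAB_LR = 2.5e-4        # from the report: EMBED_LR / 2
BRAIN_LR = 1e-4          # hypothetical; the report does not state the brain LR
INV_TEMP = 30.0

class TinyBrain(nn.Module):
    """Hypothetical stand-in for the 12-layer transformer brain."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

brain = TinyBrain(DIM)
# The vocab matrix is a trainable parameter rather than a frozen buffer.
vocab_matrix_global = nn.Parameter(F.normalize(torch.randn(VOCAB, DIM), dim=-1))

optimizer = torch.optim.AdamW([
    {"params": brain.parameters(), "lr": BRAIN_LR},
    {"params": [vocab_matrix_global], "lr": VOCAB_LR},
])

def set_brain_trainable(flag: bool) -> None:
    for p in brain.parameters():
        p.requires_grad_(flag)

tokens = torch.randint(0, VOCAB, (64,))   # random toy "corpus"
inputs, targets = tokens[:-1], tokens[1:]

history = []
for epoch in range(1, 6):                 # one optimizer step stands in for an epoch
    set_brain_trainable(epoch > 1)        # epoch 1: vocab only; epochs 2-5: joint
    brain_before = [p.detach().clone() for p in brain.parameters()]
    vocab_before = vocab_matrix_global.detach().clone()

    optimizer.zero_grad(set_to_none=True)
    pred = brain(vocab_matrix_global[inputs])
    # Tied projection: logits are scaled cosine similarities to the vocab rows.
    logits = INV_TEMP * F.normalize(pred, dim=-1) @ F.normalize(vocab_matrix_global, dim=-1).T
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()

    brain_moved = any(not torch.equal(b, p.detach())
                      for b, p in zip(brain_before, brain.parameters()))
    vocab_moved = not torch.equal(vocab_before, vocab_matrix_global.detach())
    history.append((epoch, brain_moved, vocab_moved))
    print(epoch, round(loss.item(), 4), brain_moved, vocab_moved)
```

Because frozen parameters receive no gradient, AdamW skips them entirely during the first step, so the brain weights stay bit-identical while the vocab rows move.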

Loss curve (the key evidence): v13 final plateau: 5.04 nats. Epoch 1 end (brain frozen, only vocab moving): 4.7854 — already 0.25 nats below v13 with the brain frozen. Epoch 2 (brain unfrozen, joint training): 4.0939. Epoch 3: 3.5592. Epoch 4: 3.0563. Epoch 5 final: 2.7059 — 2.34 nats below v13, ~2.12 nats above the theoretical floor at INV_TEMP=30 (≈0.59).
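The per-epoch improvements and the remaining headroom follow directly from the figures quoted above. A small sketch of that arithmetic, using only numbers from this report (the variable names are illustrative):

```python
V13_PLATEAU = 5.04        # nats, v13 final loss as quoted
FLOOR_AT_TEMP30 = 0.59    # nats, quoted theoretical floor at INV_TEMP=30
epoch_loss = {1: 4.7854, 2: 4.0939, 3: 3.5592, 4: 3.0563, 5: 2.7059}

# Improvement gained in each joint-training epoch over the previous epoch.
drops = {e: round(epoch_loss[e - 1] - epoch_loss[e], 4) for e in range(2, 6)}

# Gap to v13 after the frozen-brain epoch, and headroom left above the floor.
epoch1_gain = round(V13_PLATEAU - epoch_loss[1], 4)
headroom = round(epoch_loss[5] - FLOOR_AT_TEMP30, 4)

print("per-epoch drops:", drops)
print("epoch 1 gain over v13:", epoch1_gain)
print("headroom above floor:", headroom)
```

The per-epoch drop shrinks from roughly 0.69 nats (epoch 2) to roughly 0.35 nats (epoch 5), so the curve is flattening but had not plateaued when training stopped.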

Smoking gun — epoch 1 with brain frozen. The fact that loss broke through v13's 5.04 plateau in epoch 1 with the brain still frozen is the clean ablation: vocab geometry was the bottleneck. The sentence-transformer initialization optimizes for semantic similarity ("cat" near "dog"), not syntactic prediction ("cat" followed by "is"). Letting the vocab learn under the cross-entropy gradient unlocked the descent.
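To see why vocab geometry matters for this loss, here is a minimal sketch of cosine cross-entropy with a tied output projection. It assumes the logits are INV_TEMP times the cosine similarity between the brain's predicted vector and each vocab row, which is a common formulation but is not spelled out in this report; the 2-D toy vocab is invented for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_xent(pred, vocab, target, inv_temp):
    """Cross-entropy in nats over softmax(inv_temp * cos(pred, vocab_row))."""
    logits = [inv_temp * cosine(pred, row) for row in vocab]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# Hypothetical toy vocab: rows 0 and 1 are close neighbours, row 2 is far away.
vocab = [(1.0, 0.0), (0.9, 0.436), (-1.0, 0.0)]
pred = (1.0, 0.05)   # the prediction points almost exactly at row 0

for inv_temp in (10, 30):
    print(inv_temp, round(cosine_xent(pred, vocab, 0, inv_temp), 4))
```

In this toy setup the neighbouring row 1 is what keeps the loss above zero: nearby vocab rows compete with the target for probability mass. A higher INV_TEMP sharpens the softmax, but only moving the rows themselves changes which tokens compete.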

Eval verdict: PARTIAL. Output is qualitatively much improved but not yet fully coherent. The model now produces real English fragments and recognizable conversational and scheduling patterns from the dailydialog corpus. Sample outputs: "hi how about tomorrow coming to account on friday", "okay how about one week user what time bot at ten to p after", "ten minutes walk", "i love you very long". Compare v13's token soup: "today we who had selling how me user". Replies are not yet coherent multi-turn dialogue, but this is by far the cleanest signal so far: dialog-shaped text is appearing for the first time.

Significance of Branch B. Branch B is one of engram's core architectural bets: vocab/brain as separable, swappable components. v14-B is the first run to exercise the "learnable" half of "learnable 96-D coordinates" under cross-entropy training. The frozen sentence-transformer geometry optimized for semantic similarity turns out to be hostile to syntactic prediction — these are different tasks, and the gradient needed to move the vocab to find out.

Next — Branches A, C, D from V14_CANDIDATES.md. Branch A (raise halt-gate cap from 3 → 5 + lower ponder cost) is most likely next: avg_ponder is still pegging at 3.00 (the cap), meaning the adaptive-compute lever is not yet engaged. Branch D (surprise-modulated gradient) and Branch C (episodic memory at training time) are the remaining axes. After A/C/D land, plans/FUTURE_RESEARCH.md picks up at v15+ (∇-Reasoner is the recommended first follow-on, zero training cost). Cumulative spend: ~$73 of $150 budget ceiling.
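To make the avg_ponder observation concrete, here is a minimal ACT-style halting loop. It is a generic sketch, not engram's halt gate: the cumulative-probability rule, the 0.99 threshold, and the example gate outputs are assumptions chosen for illustration.

```python
def ponder_steps(halt_probs, cap, threshold=0.99):
    """Number of loop iterations before halting (generic ACT-style rule).

    halt_probs: per-step halting probabilities emitted by a gate.
    The loop stops once the cumulative probability reaches `threshold`,
    or after `cap` steps, whichever comes first.
    """
    cumulative = 0.0
    for step, p in enumerate(halt_probs[:cap], start=1):
        cumulative += p
        if cumulative >= threshold:
            return step
    return cap

def avg_ponder(batch, cap):
    return sum(ponder_steps(h, cap) for h in batch) / len(batch)

# Hypothetical gate outputs: every token wants 4 steps, more than a cap of 3 allows.
reluctant = [[0.1, 0.1, 0.1, 0.8]] * 4
# Hypothetical gate outputs with differentiated depth (easy and hard tokens).
mixed = [[0.99], [0.5, 0.6], [0.1, 0.1, 0.1, 0.8], [0.99]]

print(avg_ponder(reluctant, cap=3))  # pegged at the cap
print(avg_ponder(reluctant, cap=5))
print(avg_ponder(mixed, cap=5))
```

An average exactly equal to the cap is consistent with every token being truncated by the cap, and it cannot reveal how many steps the gate would have taken otherwise. Raising the cap, as Branch A proposes, is one way to find out.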

Active Deployment
v14_branchb_learnable_vocab
LIVE · PARTIAL · Dialog fragments
Trained 2026-05-05 · Modal L4 GPU · $8.00
Pre-LN · AdamW · RoPE · cosine x-ent · 21.5M brain params · vocab 14,704 · learnable vocab ~5.6M params · INV_TEMP=30 · avg_ponder pegged at 3.00
v14-B Training Metrics
distinct: PASS · avg_ponder: 3.00 · fragments: PARTIAL · final loss: 2.71
Server Status
Status: Live (HTTP 200)
Active model: v14_branchb_learnable_vocab
Architecture: 12L · 384D · 12H · head_dim=32 · RoPE · Pre-LN (frozen since v9)
Loss function: cosine cross-entropy · tied output projection · INV_TEMP=30
Brain params: 21,509,761
Engram module params: 38,696,064
Vocab size: 14,704; now a learnable nn.Parameter (~5.6M trainable params), warm-started from sentence-transformer init
Vocab training schedule: epoch 1: brain frozen, vocab only (lr=2.5e-4); epochs 2–5: joint training (brain + vocab)
Corpus: dailydialog_clean.txt + everyday_conversations.txt (5.3 MB combined)
Training: 5 epochs · Modal L4 · 2026-05-05
Per-epoch loss: ep1 (brain frozen): 4.7854 · ep2: 4.0939 · ep3: 3.5592 · ep4: 3.0563 · ep5: 2.7059
Final loss: 2.7059 nats (theoretical floor at INV_TEMP=30: ≈0.59; 2.12 nats headroom)
Training cost: $8.00 · cumulative ~$73 of $150 ceiling
Platform: Modal (L4 GPU)
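The headline sizes above can be cross-checked with a few lines of arithmetic. This assumes the vocab matrix holds one 384-dimensional row per token (that is, the embedding width equals the model width), which the tied output projection suggests but the report does not state outright.

```python
VOCAB_SIZE = 14_704
MODEL_DIM = 384      # assumed embedding width, taken to equal the model width
NUM_HEADS = 12

vocab_params = VOCAB_SIZE * MODEL_DIM
head_dim = MODEL_DIM // NUM_HEADS

print(f"vocab params: {vocab_params:,} (~{vocab_params / 1e6:.1f}M)")
print(f"head_dim: {head_dim}")
```

Under that assumption the product matches the quoted ~5.6M learnable vocab parameters, and 384 / 12 reproduces the listed head_dim of 32.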
OpenMythos Architecture Transfer — Ablation Results

Five architectural ideas from kyegomez/OpenMythos were evaluated against strict pass/kill criteria using a benchmark harness (bench/run.py). Two were killed, two were skipped because their prerequisite phases failed, and only one (RoPE) shipped.

| Phase | Idea | Status | Why |
| --- | --- | --- | --- |
| 0 | Reproducibility harness + baseline | PASSED | 0.0e+00 loss diff between identical runs; baseline locked |
| 1 | LTI residual injection | KILLED | grad_norm_p99 unchanged (0.572 vs 0.561); no eval gain |
| 2 | Loop-index sinusoidal embedding | KILLED | halt gate completely insensitive to loop signal |
| 3 | Inference-time depth extrapolation | SKIPPED | Phase 1+2 prereq chain broken |
| 4 | Per-loop LoRA | SKIPPED | Phase 1–3 prereq chain broken |
| 5 | RoPE positional encoding | PASSED · SHIPPED | grad_norm_p99 halved (0.561→0.280); zero quality cliff at 3× train context |
| 6 | Lock + document | PASSED | use_rope=True locked as default; README updated |
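The pass/kill decisions in the table rest on comparing a metric against the locked baseline. A minimal sketch of such a check follows. The `verdict` function and its 25% improvement threshold are hypothetical, since the report does not list the actual criteria used in bench/run.py; only the grad_norm_p99 values are taken from the table.

```python
BASELINE_GRAD_NORM_P99 = 0.561   # locked in Phase 0

def relative_change(baseline: float, candidate: float) -> float:
    return (candidate - baseline) / baseline

def verdict(baseline: float, candidate: float, min_improvement: float = 0.25) -> str:
    """Hypothetical rule: pass only if the metric drops by at least min_improvement."""
    return "PASS" if relative_change(baseline, candidate) <= -min_improvement else "KILL"

for name, value in [("LTI residual injection", 0.572), ("RoPE", 0.280)]:
    change = relative_change(BASELINE_GRAD_NORM_P99, value)
    print(f"{name}: {change:+.1%} -> {verdict(BASELINE_GRAD_NORM_P99, value)}")
```

With these numbers LTI moves grad_norm_p99 by about +2% (noise-level), while RoPE cuts it by about 50%, which matches the "unchanged" and "halved" wording in the table.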
What We've Learned — v8 through v14

Each run tested one hypothesis. The Pre-LN + AdamW + RoPE stack has been stable since v6; the 12L / 384D / 12H dimensions were set in v9 and have not changed since.

| Model | Change tested | Final loss | Vocab | Coherent? | Conclusion |
| --- | --- | --- | --- | --- | --- |
| v8_clean | Corpus cleanup: strip numeric artifacts, merge rare tokens | 1.0044 | 9,509 | Real words, no grammar | Cleanup worked: output became recognizable English. Grammar still missing. |
| v9_dialog_big | Capacity: 384D / 12L / 12H, ~21.5M brain params (was ~6M) | ~1.05 | 9,509 | FAIL | Model capacity hypothesis refuted. Word salad at 21.5M same as at 6M. |
| v10_dialog_corpus | Corpus expansion: intended to add everyday_conversations.txt | ~1.05 | 9,509 | NOT TESTED | File not committed. Corpus expansion hypothesis was not tested; effectively a v9 re-run. |
| v11_dialog_2corpus | Corpus expansion: first clean test with both files committed | 1.0480 (MSE) | 14,704 | FAIL | Corpus volume hypothesis refuted. +55% vocab, same coherence level as v9. |
| v12_xent | Loss function: MSE on embeddings → cosine cross-entropy, INV_TEMP=10 | 7.38 nats (x-ent) | 14,704 | REPLACED | Cross-entropy trains cleanly; dialog scaffolding tokens present. INV_TEMP=10 too flat: 5.6 nats above floor. Temperature calibration identified as next test. |
| v13_xent_temp30 | Temperature: INV_TEMP 10 → 30; all else frozen | 5.04 nats (x-ent) | 14,704 | REPLACED | Loss dropped 2.34 nats. avg_ponder saturated at 2.7–2.9 (cap=3). Output still token-soup. Temperature calibration hypothesis partially supported for loss, refuted for coherence. Frozen vocab embeddings identified as bottleneck. |
| v14_branchb_learnable_vocab | Branch B: vocab_matrix_global made nn.Parameter (~5.6M params); brain frozen ep1, joint training ep2–5 | 2.7059 nats (x-ent) | 14,704 (learnable) | PARTIAL · BRANCH B CONFIRMED | Loss broke below v13's 5.04 plateau in epoch 1 with brain frozen: vocab geometry was the bottleneck. Final loss 2.7059 (2.34 nats below v13). Output shows real English fragments and dialog patterns for the first time. avg_ponder pegged at cap=3; adaptive-compute lever still not engaged. Branches A/C/D queued. |
Evaluation Report — Human Judgement

A 16-prompt evaluation across three difficulty buckets (greetings · chitchat · harder Q&A) was run against v4_rope. Result: all 32 generated replies were near-identical sequences of rare/odd words regardless of input prompt. Example prompt/reply pairs:

[USER] hello
[BOT] you for suavely user resigning to the for impressiveness resigning i you freshener that the

[USER] what is the capital of france
[BOT] that slighted resigning to impressiveness you the i and to resigning the defilements for you

[USER] tell me a story
[BOT] you for suavely user resigning to the for impressiveness resigning i you freshener that the
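One way to quantify the collapse visible in these samples is to measure how many replies are unique and how much vocabulary they share. The sketch below is an illustration only: the report does not define how its `distinct` criterion is computed, so both functions are hypothetical. The reply strings are copied from the samples above.

```python
from itertools import combinations

replies = [
    "you for suavely user resigning to the for impressiveness resigning i you freshener that the",
    "that slighted resigning to impressiveness you the i and to resigning the defilements for you",
    "you for suavely user resigning to the for impressiveness resigning i you freshener that the",
]

def unique_ratio(texts):
    """Fraction of replies that are not exact duplicates of another reply."""
    return len(set(texts)) / len(texts)

def mean_jaccard(texts):
    """Average word-set overlap across all reply pairs (1.0 = identical vocab)."""
    sets = [set(t.split()) for t in texts]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

print(f"unique replies: {unique_ratio(replies):.2f}")
print(f"mean Jaccard overlap: {mean_jaccard(replies):.2f}")
```

Replies to unrelated prompts would normally share few content words, so a high overlap across prompts is a cheap signal that the model is ignoring its input.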

Diagnosis: the model isn't conditioning on input prompts. The output collapses to the same vocab cluster regardless of context. Two plausible causes:

Roadmap to Coherent — Cost & Time Tiers

Modal pricing as of 2026-04-26 (Starter plan, $25 included credits). Each tier assumes restart from previous run + bug fixes. Recommendation: don't budget more than $5 until v5 confirms whether undertraining is really the bottleneck.

| Tier | Change | Wall time | GPU | Cost | Expected quality |
| --- | --- | --- | --- | --- | --- |
| v4 (now) | 19M params · 15 MB corpus · 1 epoch | ~1.5h | L4 ($0.80/h) | $1.20 | gibberish (current) |
| v5 | + full dailydialog · 5 epochs | ~6h | L4 | ~$5 | most greetings/chitchat coherent |
| v6 | + bigger model (50M params, 12 layers) | ~12h | A10G ($1.10/h) | ~$13 | usually-coherent short replies |
| v7 | + 200 MB conversational corpus · 80M params | ~30h | L40S ($1.95/h) | ~$60 | actual conversation, occasional weirdness |
| v8 | + 1 GB corpus · 100M params · 5 epochs | ~80h | L40S | ~$160 | GPT-2-tier coherence |
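The cost column follows from wall time multiplied by the hourly GPU rate. A quick check using only the figures in the table (wall times are approximate, so the results are estimates):

```python
HOURLY_RATE = {"L4": 0.80, "A10G": 1.10, "L40S": 1.95}   # USD per hour, from the table

# (tier, approximate wall time in hours, GPU)
tiers = [
    ("v4", 1.5, "L4"),
    ("v5", 6, "L4"),
    ("v6", 12, "A10G"),
    ("v7", 30, "L40S"),
    ("v8", 80, "L40S"),
]

costs = {name: round(hours * HOURLY_RATE[gpu], 2) for name, hours, gpu in tiers}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

The computed values ($1.20, $4.80, $13.20, $58.50, $156.00) agree with the rounded figures in the table.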
Logical Next Steps (updated 2026-05-05)
  1. Branch A — raise halt-gate cap from 3 → 5, lower ponder cost. avg_ponder is pegging at exactly 3.00 (the current cap) in v14-B evals, which means the adaptive-compute lever is not engaged — the model is always running to the cap rather than learning to halt earlier on easy tokens. Raising the cap to 5 and reducing the ponder cost will test whether the model can learn differentiated pondering depth. This is a relatively cheap single-variable change. See plans/V14_CANDIDATES.md.
  2. Branches C and D — episodic memory and surprise-modulated gradient. Branch C adds episodic memory at training time; Branch D modulates the loss gradient by per-token prediction surprise. Both are independent axes that can be tested after Branch A. Order and priority are documented in plans/V14_CANDIDATES.md.
  3. v15+ research backlog. Once Branches A/C/D are resolved, plans/FUTURE_RESEARCH.md picks up at the ∇-Reasoner follow-on (zero training cost) and other v15+ candidates. The core vocab/brain architecture is now validated and coherent dialog fragments are appearing — further work should stack on this foundation rather than revisiting architecture fundamentals.
Recent Runs
| Timestamp | Model | Status | Notes |
| --- | --- | --- | --- |
| 2026-05-05 | v14_branchb_learnable_vocab | LIVE · PARTIAL · Branch B confirmed | vocab_matrix_global → nn.Parameter (~5.6M learnable); brain frozen ep1, joint ep2–5; ep1 loss 4.7854 (below v13's 5.04 with brain frozen: vocab was the bottleneck); final loss 2.7059 nats; distinct PASS, dialog fragments present, not fully coherent; avg_ponder 3.00 (cap); $8.00 · cumulative ~$73 |
| 2026-05-04 | v13_xent_temp30 | REPLACED | INV_TEMP 10 → 30; 5 epochs; final loss 5.04 nats (floor ≈0.59, headroom 4.45); distinct PASS, avg_ponder 2.7–2.9 (near cap saturation), coherent FAIL; $6.00 · cumulative ~$57; frozen vocab identified as bottleneck |
| 2026-05-03 | v12_xent | REPLACED | loss MSE → cosine x-ent tied projection; 5 epochs; final loss 7.38 nats (floor ≈1.77, headroom 5.6); distinct PASS, dialog scaffolding partial, coherent FAIL; $6.00 · commit e281458 |
| 2026-05-02 03:55 | v11_dialog_2corpus | REPLACED | vocab grew 9,509 → 14,704 confirming both corpora ingested; final loss 1.0480; eval 1/3: distinct PASS, english partial (proper-noun leakage), coherent FAIL; avg_ponder collapsed 3.0→1.0; 11h08m · ~$6; corpus-volume hypothesis refuted |
| 2026-04-29 08:52 | v10_dialog_corpus | REPLACED | corpus bug: everyday_conversations.txt not committed, so trained on dailydialog only (same as v9); vocab=9,509 confirms no expansion; effectively a v9 re-run · ~$4 wasted |
| 2026-04-28 19:50 | v9_dialog_big | REPLACED | 21.5M params · 12L · 384D · 5 epochs · ~$4 · eval 1/3 criteria: distinct PASS, english mostly, coherent FAIL |
| 2026-04-28 02:16 | v8_clean | REPLACED | cleaned corpus (9,509-token vocab) · 5 epochs · loss=1.0044 · real English words but word salad |
| 2026-04-26 01:18 | v4_rope | REPLACED | 1 epoch, 15 MB corpus, loss=1.1347, eval failed (see report) |
| 2026-04-25 22:31 | phase5_5b | PASSED | RoPE extrapolation: 5.0% top1 maintained at 2× and 3× train context |
| 2026-04-25 22:12 | phase5_5a | PASSED | RoPE at-distribution: grad_norm_p99 halved (0.561→0.280) |
| 2026-04-25 21:58 | phase2_loopidx | KILLED | halt gate insensitive to loop signal |
| 2026-04-25 21:47 | phase1_lti | KILLED | no grad-norm or eval gain |
| 2026-04-25 21:13 | baseline | LOCKED | eval_cosine_top1=5.0%, grad_norm_p99=0.561, reproducible |
| 2026-04-06 09:02 | large_iter4 | REPLACED | Pre-RoPE model (128D · 5L · ctx=16), kept available for rollback |