Update 2026-05-05: v14_branchb_learnable_vocab deployed live — single architectural change from v13: vocab_matrix_global is now an nn.Parameter (~5.6M trainable parameters) trained alongside the brain via AdamW with lr=EMBED_LR/2=2.5e-4. Brain was warm-started from v13 final and frozen for epoch 1 so the vocab could adjust against fixed brain predictions, then unfrozen for epochs 2–5 (joint training). Architecture otherwise identical to v13: 12L / 384D / 12H / RoPE / Pre-LN / AdamW, vocab 14,704, context 32, 5.3 MB corpus. Trained 2026-05-05 on Modal L4, $8.00.
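The freeze-then-unfreeze schedule described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the project's training code: `TinyBrain`, `build`, `set_brain_frozen`, and `train_step` are hypothetical names, the brain is reduced to a single linear layer, the context is a single token rather than 32, and `BRAIN_LR` is a guess because the text gives only the vocab learning rate.

```python
import torch
import torch.nn.functional as F
from torch import nn

VOCAB_LR = 2.5e-4  # EMBED_LR / 2, as stated in the update
BRAIN_LR = 5e-4    # ASSUMPTION: the brain's learning rate is not given in the text
INV_TEMP = 30.0    # temperature carried over from v13


class TinyBrain(nn.Module):
    """Hypothetical stand-in for the 12L / 384D transformer brain."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def build(vocab_init: torch.Tensor):
    """Wrap the initial embeddings in nn.Parameter and give them their own LR group."""
    vocab = nn.Parameter(vocab_init.clone())
    brain = TinyBrain(vocab_init.shape[1])
    opt = torch.optim.AdamW([
        {"params": brain.parameters(), "lr": BRAIN_LR},
        {"params": [vocab], "lr": VOCAB_LR},
    ])
    return vocab, brain, opt


def set_brain_frozen(brain: nn.Module, frozen: bool) -> None:
    """Epoch 1: frozen=True (only the vocab moves). Epochs 2-5: frozen=False."""
    for p in brain.parameters():
        p.requires_grad_(not frozen)


def train_step(vocab, brain, opt, context_ids, target_ids) -> float:
    pred = brain(vocab[context_ids])  # (batch, dim)
    # Tied projection: the same vocab matrix scores the prediction.
    logits = INV_TEMP * (F.normalize(pred, dim=-1) @ F.normalize(vocab, dim=-1).T)
    loss = F.cross_entropy(logits, target_ids)
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()  # AdamW skips parameters whose .grad is None (the frozen brain)
    return loss.item()
```

In a training loop this would be driven by something like `set_brain_frozen(brain, frozen=(epoch == 1))` at the start of each epoch. Because the vocab feeds both the input lookup and the output projection, it still receives gradient while the brain is frozen.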
Loss curve (the key evidence): v13 final plateau: 5.04 nats. Epoch 1 end (brain frozen, only vocab moving): 4.7854, already 0.25 nats below v13. Epoch 2 (brain unfrozen, joint training): 4.0939. Epoch 3: 3.5592. Epoch 4: 3.0563. Epoch 5 final: 2.7059, about 2.33 nats below v13 and ~2.12 nats above the theoretical floor at INV_TEMP=30 (≈0.59).
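To make the "theoretical floor" concrete, here is a small pure-Python sketch of a cosine cross-entropy and one plausible definition of its floor: the mean loss when every prediction equals its target embedding exactly. That definition is an assumption, as are the function names; the reported floors (≈0.59 at INV_TEMP=30, ≈1.77 at INV_TEMP=10) depend on the real vocab geometry and are not reproduced by the toy vectors used here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_xent(pred, vocab, target_idx, inv_temp):
    """Cross-entropy (nats) with logits = inv_temp * cos(pred, vocab_row)."""
    logits = [inv_temp * cosine(pred, row) for row in vocab]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]


def loss_floor(vocab, inv_temp):
    """ASSUMED definition: mean loss when each prediction is its own target vector."""
    total = sum(cosine_xent(row, vocab, i, inv_temp) for i, row in enumerate(vocab))
    return total / len(vocab)


# Toy vocab (hypothetical): two near-synonyms and one unrelated direction.
toy_vocab = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
floor_10 = loss_floor(toy_vocab, inv_temp=10)
floor_30 = loss_floor(toy_vocab, inv_temp=30)

# Headroom as used in the log: final loss minus floor.
headroom_v14 = 2.7059 - 0.59
```

Under this definition the floor is nonzero because near neighbours in the vocab always keep some probability mass, and a higher inverse temperature shrinks it, which is consistent with the direction of the change reported between INV_TEMP=10 and INV_TEMP=30.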
Smoking gun — epoch 1 with brain frozen. The fact that loss broke through v13's 5.04 plateau in epoch 1 with the brain still frozen is the clean ablation: vocab geometry was the bottleneck. The sentence-transformer initialization optimizes for semantic similarity ("cat" near "dog"), not syntactic prediction ("cat" followed by "is"). Letting the vocab learn under the cross-entropy gradient unlocked the descent.
Eval verdict — PARTIAL. Qualitatively much improved, not yet fully coherent. Output now produces real English fragments and recognizable conversational/scheduling patterns from the dailydialog corpus. Sample outputs: "hi how about tomorrow coming to account on friday", "okay how about one week user what time bot at ten to p after", "ten minutes walk", "i love you very long". Compare v13's token-soup: "today we who had selling how me user". Replies are not yet fully coherent multi-turn dialogue but this is by far the cleanest signal yet — dialog-shaped text is appearing for the first time.
Significance of Branch B. Branch B is one of engram's core architectural bets: vocab and brain as separable, swappable components. v14-B is the first run to exercise the "learnable" half of "learnable 96-D coordinates" under cross-entropy training. The frozen sentence-transformer geometry, optimized for semantic similarity, turns out to be hostile to syntactic prediction. These are different tasks, and it took letting the gradient move the vocab to find that out.
Next: Branches A, C, D from V14_CANDIDATES.md. Branch A (raise the halt-gate cap from 3 → 5 and lower the ponder cost) is most likely next: avg_ponder is still pegged at 3.00 (the cap), meaning the adaptive-compute lever is not yet engaged. Branch D (surprise-modulated gradient) and Branch C (episodic memory at training time) are the remaining axes. After A/C/D land, plans/FUTURE_RESEARCH.md picks up at v15+ (∇-Reasoner is the recommended first follow-on, zero training cost). Cumulative spend: ~$73 of $150 budget ceiling.
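Why an avg_ponder of exactly 3.00 reads as "lever not engaged" can be illustrated with a toy halting loop. This is a hypothetical gate, not engram's: the threshold rule, `ponder_steps`, and `avg_ponder` are all assumptions made for the illustration.

```python
def ponder_steps(halt_probs, cap, threshold=0.5):
    """Loop iterations run for one token: stop at the first step whose halt
    probability reaches `threshold`, otherwise run all `cap` steps."""
    for step, p in enumerate(halt_probs[:cap], start=1):
        if p >= threshold:
            return step
    return cap


def avg_ponder(batch, cap, threshold=0.5):
    """Mean number of loop iterations over a batch of per-token halt traces."""
    return sum(ponder_steps(h, cap, threshold) for h in batch) / len(batch)


# Hypothetical traces: the gate never fires, so every token runs to the cap.
never_halts = [[0.1, 0.2, 0.2, 0.3, 0.3]] * 4
# Hypothetical traces: some tokens halt early, so the average drops below the cap.
mixed = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.1]]

at_cap_3 = avg_ponder(never_halts, cap=3)
at_cap_5 = avg_ponder(never_halts, cap=5)
below_cap = avg_ponder(mixed, cap=3)
```

When the average sits exactly on the cap, every token was cut off by the cap rather than by the gate, so the measurement cannot say how many steps the model would have used. Raising the cap, as Branch A proposes, is one way to find out.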
Five architectural ideas from kyegomez/OpenMythos were ported and tested with strict pass/kill criteria from a benchmark harness (bench/run.py). Only one shipped.
| Phase | Idea | Status | Why |
|---|---|---|---|
| 0 | Reproducibility harness + baseline | PASSED | 0.0e+00 loss diff between identical runs; baseline locked |
| 1 | LTI residual injection | KILLED | grad_norm_p99 unchanged (0.572 vs 0.561); no eval gain |
| 2 | Loop-index sinusoidal embedding | KILLED | halt gate completely insensitive to loop signal |
| 3 | Inference-time depth extrapolation | SKIPPED | Phase 1+2 prereq chain broken |
| 4 | Per-loop LoRA | SKIPPED | Phase 1–3 prereq chain broken |
| 5 | RoPE positional encoding | PASSED · SHIPPED | grad_norm_p99 halved (0.561→0.280); zero quality cliff at 3× train context |
| 6 | Lock + document | PASSED | use_rope=True locked as default; README updated |
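For readers unfamiliar with the one idea that shipped, the following is a minimal pure-Python sketch of rotary position encoding (RoPE) using the interleaved-pair convention and the customary base of 10000. It is a generic illustration; the project's `use_rope=True` implementation may pair dimensions differently, and `rope` and `dot` are names chosen here.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (2) at very different absolute positions.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 102), rope(k, 100))
```

The two scores agree because each pair is rotated by an angle proportional to position, so the query-key dot product depends only on the offset between positions. That property is commonly cited as the reason RoPE degrades gracefully beyond the training context, though the 3× extrapolation result in the table is an empirical finding of the benchmark, not something this sketch demonstrates.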
Each run tested one hypothesis. Architecture has been stable since v6 (Pre-LN + AdamW + RoPE).
| Model | Change tested | Final loss | Vocab | Coherent? | Conclusion |
|---|---|---|---|---|---|
| v8_clean | Corpus cleanup — strip numeric artifacts, merge rare tokens | 1.0044 | 9,509 | Real words, no grammar | Cleanup worked — output became recognizable English. Grammar still missing. |
| v9_dialog_big | Capacity — 384D / 12L / 12H, ~21.5M brain params (was ~6M) | ~1.05 | 9,509 | FAIL | Model capacity hypothesis refuted. Word salad at 21.5M same as at 6M. |
| v10_dialog_corpus | Corpus expansion — intended to add everyday_conversations.txt | ~1.05 | 9,509 | NOT TESTED | File not committed. Corpus expansion hypothesis was not tested — v9 re-run. |
| v11_dialog_2corpus | Corpus expansion — first clean test with both files committed | 1.0480 (MSE) | 14,704 | FAIL | Corpus volume hypothesis refuted. +55% vocab, same coherence level as v9. |
| v12_xent | Loss function — MSE on embeddings → cosine cross-entropy, INV_TEMP=10 | 7.38 nats (x-ent) | 14,704 | REPLACED | Cross-entropy trains cleanly; dialog scaffolding tokens present. INV_TEMP=10 too flat — 5.6 nats above floor. Temperature calibration identified as next test. |
| v13_xent_temp30 | Temperature — INV_TEMP 10 → 30; all else frozen | 5.04 nats (x-ent) | 14,704 | REPLACED | Loss dropped 2.34 nats. avg_ponder saturated at 2.7–2.9 (cap=3). Output still token-soup. Temperature calibration hypothesis partially supported for loss, refuted for coherence. Frozen vocab embeddings identified as bottleneck. |
| v14_branchb_learnable_vocab | Branch B: vocab_matrix_global made nn.Parameter (~5.6M params); brain frozen ep1, joint training ep2–5 | 2.7059 nats (x-ent) | 14,704 (learnable) | PARTIAL · BRANCH B CONFIRMED | Loss broke below v13's 5.04 plateau in epoch 1 with brain frozen, so vocab geometry was the bottleneck. Final loss 2.7059 (about 2.33 nats below v13). Output shows real English fragments and dialog patterns for the first time. avg_ponder pegged at cap=3, so the adaptive-compute lever is still not engaged. Branches A/C/D queued. |
A 16-prompt evaluation across three difficulty buckets (greetings · chitchat · harder Q&A) was run against v4_rope. Result: all 32 generated replies were near-identical sequences of rare/odd words regardless of input prompt. Example prompt/reply pairs:
Diagnosis: the model isn't conditioning on input prompts. The output collapses to the same vocab cluster regardless of context. Two plausible causes:
Modal pricing as of 2026-04-26 (Starter plan, $25 included credits). Each tier assumes restart from previous run + bug fixes. Recommendation: don't budget more than $5 until v5 confirms whether undertraining is really the bottleneck.
| Tier | Change | Wall time | GPU | Cost | Expected quality |
|---|---|---|---|---|---|
| v4 (now) | 19M params · 15 MB corpus · 1 epoch | ~1.5h | L4 ($0.80/h) | $1.20 | gibberish (current) |
| v5 | + full dailydialog · 5 epochs | ~6h | L4 | ~$5 | most greetings/chitchat coherent |
| v6 | + bigger model (50M params, 12 layers) | ~12h | A10G ($1.10/h) | ~$13 | usually-coherent short replies |
| v7 | + 200 MB conversational corpus · 80M params | ~30h | L40S ($1.95/h) | ~$60 | actual conversation, occasional weirdness |
| v8 | + 1 GB corpus · 100M params · 5 epochs | ~80h | L40S | ~$160 | GPT-2-tier coherence |
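The cost column above is wall time multiplied by the hourly GPU rate. A short sketch of that arithmetic, using only the rates and hours listed in the table (`RATES`, `TIERS`, and `run_cost` are names chosen for this example):

```python
# Hourly rates in USD, as listed in the table.
RATES = {"L4": 0.80, "A10G": 1.10, "L40S": 1.95}

# (tier, GPU, approximate wall-time hours), as listed in the table.
TIERS = [
    ("v4", "L4", 1.5),
    ("v5", "L4", 6),
    ("v6", "A10G", 12),
    ("v7", "L40S", 30),
    ("v8", "L40S", 80),
]


def run_cost(gpu: str, hours: float) -> float:
    """Estimated cost in USD for one run: hourly rate times wall time."""
    return RATES[gpu] * hours


for tier, gpu, hours in TIERS:
    print(f"{tier}: {hours}h on {gpu} = ${run_cost(gpu, hours):.2f}")
```

The table's figures are these products rounded to a convenient value, and the wall times themselves are estimates, so the results should be read as rough planning numbers.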
| Timestamp | Model | Status | Notes |
|---|---|---|---|
| 2026-05-05 | v14_branchb_learnable_vocab | LIVE · PARTIAL · Branch B confirmed | vocab_matrix_global → nn.Parameter (~5.6M learnable); brain frozen ep1, joint ep2–5; ep1 loss 4.7854 (below v13's 5.04 with brain frozen — vocab was the bottleneck); final loss 2.7059 nats; distinct PASS, dialog fragments present, not fully coherent; avg_ponder 3.00 (cap); $8.00 · cumulative ~$73 |
| 2026-05-04 | v13_xent_temp30 | REPLACED | INV_TEMP 10 → 30; 5 epochs; final loss 5.04 nats (floor ≈0.59, headroom 4.45); distinct PASS, avg_ponder 2.7–2.9 (near cap saturation), coherent FAIL; $6.00 · cumulative ~$57; frozen vocab identified as bottleneck |
| 2026-05-03 | v12_xent | REPLACED | loss MSE → cosine x-ent tied projection; 5 epochs; final loss 7.38 nats (floor ≈1.77, headroom 5.6); distinct PASS, dialog scaffolding partial, coherent FAIL; $6.00 · commit e281458 |
| 2026-05-02 03:55 | v11_dialog_2corpus | REPLACED | vocab grew 9,509 → 14,704 confirming both corpora ingested; final loss 1.0480; eval 1/3 — distinct PASS, english partial (proper-noun leakage), coherent FAIL; avg_ponder collapsed 3.0→1.0; 11h08m · ~$6; corpus-volume hypothesis refuted |
| 2026-04-29 08:52 | v10_dialog_corpus | REPLACED | corpus bug: everyday_conversations.txt not committed — trained on dailydialog only (same as v9); vocab=9,509 confirms no expansion; effectively a v9 re-run · ~$4 wasted |
| 2026-04-28 19:50 | v9_dialog_big | REPLACED | 21.5M params · 12L · 384D · 5 epochs · ~$4 · eval 1/3 criteria: distinct PASS, english mostly, coherent FAIL |
| 2026-04-28 02:16 | v8_clean | REPLACED | cleaned corpus (9,509-token vocab) · 5 epochs · loss=1.0044 · real English words but word salad |
| 2026-04-26 01:18 | v4_rope | REPLACED | 1 epoch, 15 MB corpus, loss=1.1347, eval failed (see report) |
| 2026-04-25 22:31 | phase5_5b | PASSED | RoPE extrapolation: 5.0% top1 maintained at 2× and 3× train context |
| 2026-04-25 22:12 | phase5_5a | PASSED | RoPE at-distribution: grad_norm_p99 halved (0.561→0.280) |
| 2026-04-25 21:58 | phase2_loopidx | KILLED | halt gate insensitive to loop signal |
| 2026-04-25 21:47 | phase1_lti | KILLED | no grad-norm or eval gain |
| 2026-04-25 21:13 | baseline | LOCKED | eval_cosine_top1=5.0%, grad_norm_p99=0.561, reproducible |
| 2026-04-06 09:02 | large_iter4 | REPLACED | Pre-RoPE model (128D · 5L · ctx=16) — kept available for rollback |