Every GPU run, every bug, every iteration toward a mind
Engram is a small custom language model built from scratch — an AttentionBrain with multi-head self-attention over a sliding context window. It lives in the kent-ai-dev/engram repo. Training migrated from SaladCloud (preemption issues) to Modal.com (L4 GPU, no preemption) as of April 2026.
Weights stored on Modal Volume (persistent). Vocab: 37,591 tokens (full 13-book + DailyDialog corpus, trained Apr 6). Coherence penalty added in commit 5ae5950.
10 runs (+ pre-history) across ~12 days. Same architecture. Many bugs. Getting closer.
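The sliding context window mentioned above amounts to a banded causal attention mask. A minimal, dependency-free sketch (function name and window size are illustrative, not from the repo):

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j:
    causal (j <= i) and within the last `window` positions."""
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]

# With a window of 3, token 4 sees tokens 2, 3 and itself, but not 0, 1 or 5.
mask = sliding_window_mask(6, 3)
```

In the real model this mask would be applied to the attention scores (setting masked positions to -inf before softmax) rather than materialised as booleans.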
Victorian literature corpus (Frankenstein, Dracula, The Time Machine…). ingest.py wiped ChromaDB on every run — so the server always loaded stale embeddings with similarity 0.00. High-probability tokens were Victorian vocabulary: "knight", "wretchedly", "thence". Model couldn't form a coherent sentence, let alone a conversation.
Bug: every ingest.py run wiped ChromaDB
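A fingerprint guard would make ingestion idempotent, skipping the wipe-and-rebuild when the corpus is unchanged. A hypothetical sketch (should_reingest and the state file are illustrative, not part of the actual ingest.py):

```python
import hashlib
import json
import pathlib

def corpus_fingerprint(paths):
    """Hash the corpus files so unchanged content is detectable."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(pathlib.Path(p).read_bytes())
    return h.hexdigest()

def should_reingest(paths, state_file="ingest_state.json"):
    """Return True only when the corpus changed since the last ingest,
    so a rerun no longer wipes and rebuilds the vector store for nothing."""
    fp = corpus_fingerprint(paths)
    state = pathlib.Path(state_file)
    if state.exists() and json.loads(state.read_text()).get("fingerprint") == fp:
        return False  # corpus unchanged: keep existing embeddings
    state.write_text(json.dumps({"fingerprint": fp}))
    return True
```

The second call with the same files returns False, which is exactly the behaviour that would have kept the server's embeddings fresh.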
Three attempts back-to-back on local Windows VPS (CPU only): full corpus (6.2 MB, 3 epochs), then 20% corpus, then 5% corpus. All timed out. CPU-only training requires 4–12 hours minimum for this architecture. Side effect: ingest.py held Ollama's process lock, blocking semantic search for 30+ hours.
Pre-history weights committed to the GitHub repo as a baseline snapshot. Vocabulary: 7,253 tokens. This became the "stale weights" reference point that all subsequent SaladCloud runs attempted to improve upon. Last known good state before the SaladCloud era began.
First attempt on SaladCloud GPU. Multiple bugs surfaced with the container environment: python:3.11-slim ships without bash (scripts failed immediately), containers crash-looped due to wrong base image, S4 uploads failed with curl exit 56, and the restart_policy was incorrectly set. Several containers launched and died before training could complete.
Bugs: python:3.11-slim has no bash; wrong base image; bad restart_policy
Fixes: scripts switched to /bin/sh; moved to pytorch/pytorch:2.5.1 base image; S4 curl fallback added; restart_policy set to "never"
Container engram-1774387064 queued with DailyDialog full corpus (6.2 MB, 3 epochs). SaladCloud API key had silently expired mid-session, blocking the launch. A new key was obtained and stored as an environment variable (not hardcoded). As of Mar 24 23:26 UTC the container was still running — no weights URL ever captured.
Several security and infrastructure fixes landed this session: hardcoded API key removed from git history, S4 upload replaced the old 0x0.st fallback, IMDS JWT auth added for SaladCloud S4. Upload succeeded — but the ntfy notification used single quotes: -d 'Training done! WEIGHTS_URL=$UPLOAD_URL'. Single quotes prevent shell variable expansion. $UPLOAD_URL was never substituted. The URL was in S4 but no one knew where.
Bug: -d arg wrapped in single quotes → $UPLOAD_URL never expanded
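The quoting failure is easy to reproduce from Python by driving sh directly (the URL below is a placeholder):

```python
import os
import subprocess

env = {**os.environ, "UPLOAD_URL": "https://s4.example/weights.pt"}  # placeholder URL

# Single quotes: sh passes the string through untouched, exactly the Run 5 bug.
single = subprocess.run(
    ["sh", "-c", "printf '%s' 'Training done! WEIGHTS_URL=$UPLOAD_URL'"],
    capture_output=True, text=True, env=env,
).stdout

# Double quotes: sh expands $UPLOAD_URL before the command ever sees it.
double = subprocess.run(
    ["sh", "-c", 'printf "%s" "Training done! WEIGHTS_URL=$UPLOAD_URL"'],
    capture_output=True, text=True, env=env,
).stdout

print(single)  # Training done! WEIGHTS_URL=$UPLOAD_URL
print(double)  # Training done! WEIGHTS_URL=https://s4.example/weights.pt
```

The same rule applies to the curl -d argument in the ntfy call: only double quotes let the shell substitute the variable into the notification body.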
Container engram-1774630566 ran for ~8 hours on SaladCloud GPU. Training completed. S4 upload succeeded. But the ntfy single-quote bug from Run 5 was still present — $UPLOAD_URL never expanded in the notification body. Nobody knew the S4 URL. Container was deleted. Weights were gone.
Fix: -d arg switched to double quotes (commit c35bd18)
Container engram-1774677568 launched with the ntfy double-quote fix applied. Cancelled shortly after in favour of running the full-corpus run (Run 8) instead of another DailyDialog-only pass. No training loss to report.
Container engram-1774780608 ran the full corpus: 13 books + DailyDialog (~2.5 M words). SSH access established Mar 30. train_runner.py had hardcoded Windows paths (C:\Python314\python.exe) and timed out on Linux. ingest.py ran instead and produced weights — but the container had cloned the repo, which already had those same weights committed from local runs. Checksums matched exactly: no new training had occurred. Weights were manually pushed to S4.
Bug: C:\Python314\python.exe hardcoded in train_runner.py
Fix: salad_train.py now calls python3 train_runner.py instead of python ingest.py
Container engram-1774884158 on SaladCloud. No confirmed completion — container likely preempted. This was the last SaladCloud-era run before the platform migration.
Container engram-1775425431 launched after cleaning 4 stale containers. Ran for 8.7 hours with 3 GPU node swaps (preemption). The polling script timed out at MAX_TRAINING_WAIT (28800s). An instance was briefly running near the end but no weights were ever produced. This was the decisive failure that triggered the migration to Modal.
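The timeout behaviour reduces to a generic polling loop. A sketch (poll_until is hypothetical; check() stands in for the SaladCloud status query, and only the 28800s constant comes from the run above):

```python
import time

MAX_TRAINING_WAIT = 28800  # seconds, the timeout the run hit

def poll_until(check, timeout=MAX_TRAINING_WAIT, interval=60):
    """Call check() until it returns a truthy result or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        # Never sleep past the deadline, and never pass sleep() a negative value.
        time.sleep(max(0.0, min(interval, deadline - time.monotonic())))
    raise TimeoutError(f"no result after {timeout}s")
```

With three node preemptions in 8.7 hours, check() never returned a weights URL, so the loop above would have ended in the same TimeoutError the real script hit.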
First successful training on Modal.com. L4 GPU (24 GB VRAM), no preemption. Full corpus (13 books + DailyDialog, ~2.5M words), 3 epochs. Training completed in ~1 hour — compared to 8+ hours of failure on SaladCloud. All 3 weight files saved to Modal Volume and deployed to the live frontend. Vocab jumped from 7,253 → 37,591 tokens.
However, model output was still word salad — repeating "dissemble", "outlandish", "xxvii". Surprise score ~1.08 (high = random). The 37K vocab was too large for 3 epochs to converge. Led to launching a 10-epoch follow-up.
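One plausible definition of that surprise score (an assumption, not necessarily the repo's exact metric) is mean per-token negative log-probability normalised by log(vocab size), so that a uniform-random model scores exactly 1.0, which would put 1.08 just past random:

```python
import math

def surprise_score(token_probs, vocab_size):
    """Mean negative log-probability per token, normalised so a
    uniform-random model scores exactly 1.0 (assumed definition)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return nll / math.log(vocab_size)

# A uniform-random model over the 37,591-token vocab:
uniform = [1 / 37591] * 10
print(surprise_score(uniform, 37591))  # 1.0
```

Under this reading, converging training should push the score well below 1.0 as the model concentrates probability on the right tokens.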
Upgraded architecture to 8 layers (from 3) at 256 dimensions, trained for 5 epochs on the Modal L4 GPU. Weights downloaded and deployed; frontend serves vocab=37,591. Output quality is still poor: the model produces grammatically structured gibberish ("drawbridges tzatziki mammiferous"). Eval score: 56.4/100 (passes the threshold but semantically meaningless). Deep analysis identified four weaknesses: a data bottleneck (13 Gutenberg books is insufficient), single-head attention, contradictory ponder objectives, and no gradient clipping.
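Of those fixes, gradient clipping is the most mechanical. This is the idea behind PyTorch's torch.nn.utils.clip_grad_norm_, shown here dependency-free (function name and max_norm value are illustrative):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients in place so their global L2 norm is at most max_norm,
    the same contract as torch.nn.utils.clip_grad_norm_. Returns the
    pre-clip norm. `grads` is a list of flat gradient lists, one per tensor."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total > max_norm:
        scale = max_norm / total
        for vec in grads:
            for i, g in enumerate(vec):
                vec[i] = g * scale
    return total
```

In the actual training loop, the one-line torch call goes between loss.backward() and optimizer.step().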
Full architecture analysis completed. Compared Engram to kent_hologram (Hyperdimensional Computing system). Identified 7 transferable techniques: surprise-gated learning, curriculum training, experience replay, output validation, ventriloquist dual-model generation, salience-weighted loss, and adaptive pondering.
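Of those techniques, surprise-gated learning is the simplest to sketch. One illustrative weighting scheme (an assumption, not kent_hologram's actual implementation) scales each token's contribution to the loss by how surprising it still is:

```python
def surprise_weights(token_losses, cap=2.0):
    """Weight per-token losses by their surprise relative to the batch mean,
    so already-mastered tokens contribute less to the update. The cap keeps
    a single outlier from dominating the gradient."""
    mean = sum(token_losses) / len(token_losses)
    return [min(loss / mean, cap) for loss in token_losses]

# Low-loss tokens are down-weighted; very high-loss tokens are capped.
weights = surprise_weights([0.1, 1.0, 2.9])
```

The weighted loss is then the dot product of these weights with the per-token losses, which also gives salience-weighted loss from the same list almost for free.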
Key finding: The gibberish problem is primarily a data problem. TinyStories (476M tokens, designed for small models) is the recommended unlock. Models under 10M params trained on TinyStories produce coherent stories — Engram is 18.7M params producing gibberish on Gutenberg books.
Next run plan: fix four issues (gradient clipping, the ponder weight, the evaluator contradiction, multi-head attention), then train on a TinyStories + WikiText-2 curriculum for 20–30 epochs.