After That Crash, I Built an External Memory

Ansari

Category: Development

Time to read: 6 mins

By someone who has grepped journalctl at 2:47 AM and questioned every life choice that led to this moment

The Problem

If you've ever owned a server (a VPS, a homelab NAS, an "it's just a side project" Docker stack), you know the ritual:

  1. Something breaks at an hour reserved for sleep or regret.

  2. You SSH in, heart rate elevated, coffee optional, dignity already gone.

  3. You run journalctl -f, grep, tail -f, maybe dmesg if you're feeling spicy.

  4. You scroll through fourteen thousand identical lines until your eyes glaze over.

  5. You fix the symptom. You go back to bed. You forget everything by morning.

The logs knew what happened. You didn't. And next week, when nginx throws 502s again, you're back at step 3 like none of it ever occurred.

I've had Ollama eat my RAM, compositors crash for reasons that rhyme with "Wayland," and systemd units fail in ways that would make a therapist take notes. One thing became clear:

grep is a search tool, not a memory. And I am not a fk*** elephant.

The Idea

At some point I stopped pretending I'd mentally index every "Connection refused" between Tuesday and Thursday.

I wanted something that:

  • Watches logs continuously (stdin, journald, tail, whatever's screaming)

  • Actually remembers them in a structured way, not just appends to a flat file I'll never read

  • Lets me ask questions later like a calm, well-rested version of myself

That's aftermath. Not during. Not preventing. After. Because that's when you're actually motivated to build tooling.

How It Works

The pipeline is embarrassingly simple and sensible once you say it out loud:

  1. Watch — tail logs from wherever the pain originates.

  2. Batch + dedup — group lines, squash repeats, don't ship the same ollama.service: CUDA OOM seventeen hundred times.

  3. Enrich — wrap batches with timestamps, hostname, error counts. Give the downstream brain something to chew on.

  4. Remember — ship to Cognee, which builds a temporal knowledge graph using Ollama for embeddings and completion.

  5. Investigate — when disaster strikes again, ask a question in plain English.

```shell
journalctl -f -p err | ./aftermath watch --dataset systemd

./aftermath investigate "why nginx returning 502 since 3pm?" --dataset systemd
./aftermath investigate "what changed before the compositor crash?"
```

You pipe logs in. An LLM on your own machine turns them into a graph. Later, you interrogate the graph like it's a witness who was awake the whole time.

The Dedup Layer

Here's where it gets personal. Noisy systems don't fail once. They fail rhythmically. Without dedup, you'd feed every repeat into Cognee, burn through your embedding quota, and melt your GPU fan bearings. So aftermath fingerprints each log line (it strips timestamps and hashes the message) and suppresses repeats inside a sliding window. When something repeats enough times to be statistically insulting, you get a summary instead:

[aftermath dedup] ollama.service:model runner (+847 repeats,14:02:11–14:18:44)

One line. Same information. Ollama doesn't have to read the same tragedy 847 times.

There's also a --min-remember-interval flag for when your logs are especially chatty and you need to tell the remember pipeline to chill out. Rate limiting: not just for APIs, also for your own trauma.

How I Built It

  • Go, because I wanted something that compiles to one binary and doesn't need a runtime lecture.

  • Cognee, because "temporal knowledge graph" sounds fancier than "I dumped logs into a vector DB and prayed".

The CLI has two commands. That's it. I'm not building a platform. I'm building a shovel.

  • watch: batches, deduplicates, enriches, and calls Cognee's /api/v1/remember.

  • investigate: hits /api/v1/search with GRAPH_COMPLETION and prints whatever the graph thinks happened.

Does It Work?

Yes, for the kind of incidents where:

  • Something broke after a sequence of earlier warnings you didn't notice

  • The logs actually contain signal, not just [info] still alive

  • You're willing to run Cognee + Ollama locally like a civilized homelab goblin

I've pointed it at journald for ollama.service, tailed syslog, and piped one-off demo errors through stdin. When things go sideways, investigate returns a causal-ish narrative instead of me manually correlating timestamps across three terminal panes.

No, it's not foolproof. It won't:

  • Fix your nginx config

  • Stop you from running docker compose up on a machine with 4GB of RAM

  • Replace a real observability stack if you're running production at scale

  • Guarantee the LLM won't confidently hallucinate a root cause that sounds plausible but is fiction

Smarter operators use Datadog, Grafana Loki, and proper tracing. I use those too, sometimes. But for the homelab, the side project, the "why did Hyprland die when I plugged in a USB hub" investigation, this hits different.

The lazy 3 AM version of me? This tool crushes that use case. And since incidents are driven by physics and bad config, they usually come back. Having memory helps.

Trade-offs and Caveats

  • You need Cognee and Ollama running. This is not curl logs.txt | magic. There's setup.

  • LLM answers are probabilistic. Verify before you systemctl restart the wrong thing.

  • Dedup can hide nuance. If two different failures share the same message shape, they get fingerprinted together. Tune your window.

  • You pay in GPU time and disk, not dollars (unless your electricity bill counts, in which case, sorry).

  • It's post-mortem tooling. The crash already happened. You're reconstructing the crime scene, not preventing the murder.

But for my use case (stopping the "what even happened last Tuesday" spiral), it was more than worth it.

Conclusion: Fighting Amnesia With a Knowledge Graph

You won't find "pipe journald into Cognee and ask qwen2.5 what went wrong" in the SRE handbook. It's a bit rogue. A bit overkill. A bit exactly the kind of thing you build at 1 AM after the fifth OOM kill.

If you're a developer tired of being the only storage layer between your server and institutional memory, maybe it's time to outsource remembering to a graph that was actually awake when it happened.

The crash is over. Welcome to the aftermath.
