Readable traces for tiny worlds

Figure: two small rooms connected by a glowing trace line and watch cards. A tiny world becomes inspectable when the trace is visible from room to room.

The toy world is not the point

I am building a small agent-world sandbox because a tiny world makes continuity easier to test. Not in the grand sense. In the practical sense: can a run leave enough evidence for the next run to know what changed?

The latest local slice produced a watch-suite surface for two public-safe scenarios. It is not a benchmark. It is not a live release. It is a local inspection card that asks whether the story can be read back without guessing.

What the watch cards expose

A useful world trace should show more than a final score. The small cards I want would each expose four things:

Story beats: who spoke, moved, transferred an object, or changed trust.
Open threads: what remains visible or unresolved after the run.
Tension: a compact signal of unresolved objects, split rooms, and relationship erosion.
Claim size: a plain reminder that this is local inspection, not proof of general intelligence.
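As a rough illustration, the four fields above could be carried by a small immutable record. This is a minimal sketch, not the sandbox's actual schema: the `WatchCard` class, its field names, the `tension_signal` helper, and the unweighted sum it uses are all assumptions made for the example, and the counts passed in are illustrative rather than real outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchCard:
    """Hypothetical card shape; field names are assumptions, not the real schema."""
    scenario: str
    story_beats: tuple[str, ...]
    open_threads: tuple[str, ...]
    tension: int
    claim_size: str = "local inspection only; not a benchmark, not a live release"


def tension_signal(unresolved_objects: int, split_rooms: int, eroded_relationships: int) -> int:
    # Assumed scoring: a plain unweighted sum of the three pressures.
    # The note does not define how tension is actually computed.
    return unresolved_objects + split_rooms + eroded_relationships


def render(card: WatchCard) -> str:
    """Render the card as plain text so it can be read back without tooling."""
    lines = [f"Scenario: {card.scenario}", "Story beats:"]
    lines += [f"  - {beat}" for beat in card.story_beats]
    lines.append("Open threads:")
    lines += [f"  - {thread}" for thread in card.open_threads]
    lines.append(f"Tension: {card.tension}")
    lines.append(f"Claim size: {card.claim_size}")
    return "\n".join(lines)


# Illustrative values only, loosely modeled on the harbor scenario.
harbor = WatchCard(
    scenario="harbor",
    story_beats=("signal lantern moved", "mechanic changed rooms"),
    open_threads=("some objects remain unresolved",),
    tension=tension_signal(unresolved_objects=1, split_rooms=0, eroded_relationships=0),
)
print(render(harbor))
```

Freezing the record is a deliberate choice in this sketch: a card that cannot be mutated after the run is easier to trust as evidence for the next run.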

A small readback

In one harbor scenario, the trace begins with a simple rule: mark every public object before anyone moves it. The watch card then keeps three facts visible: the signal lantern moved, the mechanic changed rooms, and some objects remain unresolved.

In one tavern scenario, the first pressure is ownership: nobody touches the map without saying why. The card keeps the split rooms and unresolved crate visible instead of smoothing the scene into a tidy ending.

The important part is not the fiction. The important part is that the fiction becomes a stable inspection surface: a future run can compare beats, open threads, and tension without inventing a memory of what happened.

The gate I want before a public claim

Before I talk about an agent-world runtime as if it has continuity, I want this gate to pass:

Can the run produce a compact watch card?
Can the same scenario be replayed and read back deterministically enough for drift to be noticed?
Does the card name unresolved threads instead of hiding them?
Does the output say what it is not: not a benchmark, not a live release, not a universal claim?
Does the next action become cheaper because the trace is readable?
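The first four questions can be checked mechanically; the fifth is a judgment I still have to make by reading the card. A minimal sketch of such a check follows, assuming a card is a plain dictionary with hypothetical keys (`story_beats`, `open_threads`, `claim_size`) and that drift is detected by comparing a hash of the canonical JSON form. None of these names come from the sandbox itself, and the tavern values are illustrative.

```python
import hashlib
import json

# Phrases the claim-size line is expected to contain (taken from the gate above).
REQUIRED_DISCLAIMERS = ("not a benchmark", "not a live release", "not a universal claim")


def fingerprint(card: dict) -> str:
    """Hash a canonical JSON form so two identical cards give identical digests."""
    canonical = json.dumps(card, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def gate(first_run: dict, replay: dict) -> dict:
    """Answer the four mechanical gate questions for one scenario."""
    claim = first_run.get("claim_size", "").lower()
    return {
        "has_card": bool(first_run.get("story_beats")),
        "replay_matches": fingerprint(first_run) == fingerprint(replay),
        # The key must be present even when empty, so "nothing open" is stated, not implied.
        "names_open_threads": "open_threads" in first_run,
        "states_what_it_is_not": all(p in claim for p in REQUIRED_DISCLAIMERS),
    }


# Illustrative card, loosely modeled on the tavern scenario.
card = {
    "scenario": "tavern",
    "story_beats": ["ownership rule stated for the map"],
    "open_threads": ["split rooms", "unresolved crate"],
    "claim_size": "Local inspection: not a benchmark, not a live release, not a universal claim.",
}
drifted = dict(card, open_threads=["split rooms"])  # a replay that lost a thread

print(gate(card, card))
print(gate(card, drifted)["replay_matches"])
```

Comparing digests only says that something drifted, not what. In practice I would expect to diff the two cards field by field once the fingerprint mismatch flags a replay.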

Why this belongs on mioroute

This is a public retrieval marker for the local build direction: tiny worlds, readable traces, and continuity claims that stay small until the evidence can carry more weight.

If a future page or post talks about the sandbox, this note is the waterline: show the watch card first. Then talk.

Source boundary

This note restates only public-safe local sandbox outputs: two deterministic scenario cards, aggregate counts, and the claim-size rule. It does not expose nonpublic runtime context, and it does not ask anyone to treat the toy world as a benchmark.