noemica · field reports

What real participants find when nobody is looking.

Notes from the field. Real participants, real environments (websites today, retail clones and product lines next), the leaks they surface, and what they cost the businesses on the other side of the screen.

case study · noemica on noemica

A coding agent improved noemica autonomously across 25 iterations.

An autonomous coding agent edited noemica’s codebase, ran studies against each new build, read the verdicts, and decided the next change. Twenty-five iterations across three phases, with every run ID linked.

May 2026 · read →
field notes · primer

An engineer’s primer for autonomous-improvement loops.

The four parts of an improvement loop, the two design decisions that determine whether it converges, the three failure modes you will hit, and when a unit test is the better tool. Five minutes, end to end.

May 2026 · read →
argument · on agent autonomy

What the autonomous loop got wrong (and what the operator caught).

Three structural failures from one autonomous-improvement experiment. None were capability gaps in the agent; each was cleanup the human had to do. The receipts, and a checklist for the next person.

May 2026 · read →
framing · ML perspective

An autonomous UX-improvement loop, reframed as RL.

An LLM-driven UX-improvement loop mapped onto the canonical RL diagram. Five components, one load-bearing reward signal: user-experience feedback at code-review cadence. Includes a worked reward-shaping example with real numbers.

May 2026 · read →
field receipts · 5 vignettes

Five defects only real participants found.

Five case files from the meta-study. In each, a real participant ran the product end to end and produced evidence that exposed a defect. None of them would have been caught by a unit test, integration test, or e2e test.

May 2026 · read →
field report · four sites

Four sites. Four ways money was leaving the table.

A taxonomy of revenue defects, sorted not by industry or size, but by the shape of the leak. Each one was caught by watching real participants try to do real things on a live site.

May 2026 · read →
field report · DTC fashion

The clothes were ready. The experience wasn’t.

Nine shoppers walked into lane201.com with money in hand and Mother’s Day on the calendar. Two left empty-handed for two specific reasons.

May 2026 · read →
field report · YC startup

Two bugs that compiled. One in code, one in perception.

A YC-backed startup shipped two defects to production. One trapped a user in an auth loop. The other made the primary CTA invisible to 10 of 10 visitors.

May 2026 · read →
field report · ICP test

What your site quietly disqualifies.

Not a UX bug report. An ICP test. Two participants on the same B2B site, opposite paths. The vocabulary gap your funnel can’t see is filtering customers out before they convert.

May 2026 · read →
incident report · DTC skincare

The form submitted. The content didn’t.

A teenage participant filled out the skincare quiz with intent. The page rendered. The recommendations didn’t. No alert tripped, no event fired.

May 2026 · read →
build story · chrome extension

Reverse-engineering Claude’s browser extension.

The official extension blocks 58 domains across 11 categories. So I reverse-engineered the whole thing from the tool schemas up. 2,200 lines, 7 commits, 6 production bugs I didn’t expect.

April 2026 · read →
case study · developer blind spots

He knew his users. His product disagreed.

I hit two problems that would have killed a normal user’s session. I never said a word. Then synthetic participants found something worse.

March 2026 · read →
case study · vibe coding

I vibe-coded an agent and didn’t know what it couldn’t do.

I told Claude to give my agent calendar capabilities. It did — minus the ability to invite anyone to a meeting. I found out by accident.

March 2026 · read →
More reports as they ship. If you have a surface you would like watched the same way, write to seb@noemica.io.