He Knew His Users. His Product Disagreed.
Sebastian Sosa · March 2026 · noemica.io
A founder built a skincare app. I was his first real tester. I hit two problems that would have killed a normal user's session — and I never said a word about either one.
Moments later, synthetic personas hit the same two walls. They didn't recover.
Then those same personas discovered something worse: the product he was building for budget-conscious women was systematically ignoring what budget-conscious women asked for.
The Product
The product is a personalized skincare routine generator. You fill out a 13-step questionnaire — skin type, age, location, concerns, allergies — and an AI produces a customized routine. No products to sell, no brand allegiance. Just ingredient-level recommendations tailored to your skin.
The founder had ~800 users on version 1, mostly women aged 25–45 sourced from Reddit skincare communities. He knew his demographic: lower-to-mid income, budget-conscious, tired of buying products that don't work. I came in as a design partner to test the product with Noemica — we spin up the real product in an isolated environment, point synthetic personas at it, and see what happens. But before any of that, I had to get the thing running myself.
What I Didn't Say
I should say upfront: I am not a skincare person. If you saw me in person, you'd arrive at the same conclusion. I was going through the form as fast as possible — clicking whatever was easy, guessing where I had to, skipping what I could.
Step 9 asks for your location. You pick a region. A grid of countries appears. You pick a country. The Continue button is right there — enabled, clickable, inviting. You click it. Nothing. You click again. Nothing. There's no error, no loading spinner, no tooltip, no indication that anything is wrong. The button just silently refuses to work. What you don't know — what nothing on the screen tells you — is that selecting your country caused a third picker to materialize below, off-screen: “Closest Big City.” It's below the fold. You have to scroll past the Continue button you've already been clicking, past the country grid you already completed, to discover that the form isn't done with you yet. Then scroll all the way back up to click Continue again.
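Once you can see the failure, the fix is small. Here's a minimal sketch, assuming a plain DOM form; the element ids and event wiring are my assumptions, not the founder's actual markup. The two ideas: reveal the dependent picker where the user can see it, and make Continue explain itself when it's blocked.

```typescript
// Hypothetical sketch of the step-9 fix. Element ids and event wiring
// are illustrative assumptions, not the founder's actual markup.
const countryGrid = document.querySelector<HTMLElement>("#country-grid")!;
const cityPicker = document.querySelector<HTMLElement>("#city-picker")!;
const continueBtn = document.querySelector<HTMLButtonElement>("#continue")!;
const stepHint = document.querySelector<HTMLElement>("#step-hint")!;

countryGrid.addEventListener("change", () => {
  // Reveal the dependent picker where the user can actually see it,
  // instead of letting it materialize below the fold.
  cityPicker.hidden = false;
  cityPicker.scrollIntoView({ behavior: "smooth", block: "center" });
});

continueBtn.addEventListener("click", (event) => {
  const citySelected = cityPicker.querySelector("input:checked") !== null;
  if (!citySelected) {
    event.preventDefault();
    // A blocked button should say why it's blocked.
    stepHint.textContent = "One more thing: pick your closest big city below.";
    cityPicker.scrollIntoView({ behavior: "smooth", block: "center" });
  }
});
```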
At the end of the form, I hit “Get My Routine.” Nothing. No error message, no red outline, no validation hint. Just a button that stopped working. Turns out, several of the earlier steps had a “Skip” button that let you advance without answering — but the final submission required that data. Skip a step, and the API rejects the final submission with a 400. The UI swallows the error and shows nothing.
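The submit-side fix is the same idea: check the response status and tell the user what's missing, instead of letting a 400 die in the network tab. A sketch, again with a hypothetical endpoint, response shape, and element id:

```typescript
// Hypothetical sketch of the submit-side fix. The endpoint, response
// shape, and element id are illustrative assumptions.
async function submitRoutineForm(
  answers: Record<string, unknown>
): Promise<unknown | null> {
  const errorBox = document.querySelector<HTMLElement>("#form-error")!;
  errorBox.textContent = "";

  const response = await fetch("/api/routine", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(answers),
  });

  if (!response.ok) {
    // A 400 here means a skipped step left required data empty.
    // Tell the user what to fix rather than failing silently.
    const body: { missingFields?: string[] } =
      await response.json().catch(() => ({}));
    const missing = body.missingFields ?? [];
    errorBox.textContent =
      missing.length > 0
        ? `Please complete: ${missing.join(", ")}`
        : "We couldn't build your routine. Check earlier steps and try again.";
    return null;
  }

  return response.json();
}
```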
Now imagine you're not me. Imagine you're one of the actual users — a 28-year-old woman who just spent five minutes on step 12's free-text box writing two careful paragraphs about her skin history, her frustrations with drugstore products, the breakout that won't go away. She hits submit. Nothing. She goes back a step. Forward. Submit. Nothing. Back two steps. Forward. Forward. Submit. Nothing. Each round-trip through the form takes a minute. She's now been staring at a broken button for longer than it took her to write those paragraphs. Eventually she refreshes the page — and the text box is empty. Everything she wrote is gone. She closes the tab. She does not come back.
These problems sound trivial when I describe them. A city picker below the fold. A skip button that shouldn't be there. They sound like minor polish items — the kind of thing you'd put at the bottom of a backlog and never get to. But users don't owe you patience. They don't owe you a bug report. They don't owe you a second attempt. If your form wastes their time and then eats their work, they leave. Silently. Permanently.
Did I mention either problem to the founder? No. I was wearing a developer hat. To me, these were five-second inconveniences on the way to my actual objective. I recovered and moved on. This is what impatient developers do — recover from friction without registering it as a problem worth reporting. Sometimes it gets mentioned in passing, weeks later, buried in a Slack thread. But rarely with the urgency it deserves. To a developer, it's a quirk. To a real user, it's the last thing they see before they leave.
What the Personas Found
I set up browser-based scenarios — synthetic personas interacting with the actual UI through a real browser, inside an isolated copy of the product where no production data gets touched. Each persona knows who they are (age, skin concerns, patience level) but has zero knowledge of the product's implementation. They don't know the form has 13 steps. They don't know about the city picker below the fold. They just try to use it.
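Noemica's actual scenario format isn't reproduced here, but to make the setup concrete, this is roughly the shape a persona carries. Every field name below is my illustration, not the real schema:

```typescript
// Illustrative only: roughly the shape a synthetic persona carries.
// These field names are my sketch, not Noemica's actual schema.
interface Persona {
  name: string;
  age: number;
  location: string;
  skinConcerns: string[];
  budgetSensitivity: "low" | "medium" | "high";
  patience: "low" | "medium" | "high"; // governs how long they retry broken UI
  // Personas also accumulate internal state as they run: not just what
  // they did, but how they feel about what happened.
}

const unguidedTester: Persona = {
  name: "Impatient unguided tester",
  age: 31,
  location: "Austin, TX",
  skinConcerns: ["breakouts"],
  budgetSensitivity: "medium",
  patience: "low",
};
```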
The unguided persona — an impatient 31-year-old from Austin — hit step 9 and gave up:
“This form is broken. I made reasonable progress through 8 of the 13 steps, but the Continue button stopped working at Step 9. I'm not going to waste more time retrying the same broken button.”
On a later run, the same persona actually made it through all 13 steps — but the submit button failed:
“I successfully filled out all 13 form steps with accurate information, but the final ‘Get My Routine’ button simply won't work. The form is broken at the finish line, and I'm not going to keep hammering at a buggy button.”
Same two problems I hit. The difference: I recovered in seconds and said nothing. The persona — behaving like a real user with no prior knowledge — gave up and left. Noemica's personas carry internal state — not just what they did, but how they feel about what happened. That's what turns a test into a finding.
The founder didn't hear about either issue until Noemica auto-filed them as detailed GitHub issues — reproducible steps, root cause, suggested fix. Fifteen issues in total by the end of the engagement.
The Deeper Problem
Those were UX bugs — important but fixable. I baked workaround hints into subsequent scenarios (“a friend warned you about two quirks with this form...”) and ran 21 personas through the full experience: profile creation, routine generation, product analysis, routine tweaking. Each persona reflected after every interaction — genuine reactions to what the product actually returned.
The results split cleanly. Same product, same AI, two completely different experiences:
The delighted personas all had something in common: medical constraints. The product caught real safety issues — parabens in a CeraVe product someone already owned, niacinamide in a “gentle” moisturizer that a rosacea patient trusted. Genuine value.
But the product's core demographic — budget-conscious women who “want to be careful where they spend their money” — got a different experience entirely.
“That routine feels like a lot of steps and pricey actives for my budget — it's a bit overwhelming.”
“Honestly, it still feels like too many products for my budget. I want a 2–3 product routine: a gentle cleanser, one simple acne-treatment, and a moisturizer with SPF.”
— Budget Minimalist persona, two consecutive interactions
She asked for 2–3 products. The system returned 6. She explicitly named four ingredients to remove. The system returned all four. It wasn't ignoring her — it simply had no concept of “fewer.” The AI was built to generate comprehensive routines. Simplification wasn't a feature. It couldn't give her less even if she begged.
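The remedy here isn't a UI patch; it's treating the user's stated constraints as hard inputs to the generator and validating the output against them. A sketch of the idea, with an entirely hypothetical request shape (none of this is the product's actual code):

```typescript
// Illustrative sketch, not the product's actual code: pass the user's
// stated constraints to the generator as hard requirements, then
// validate the output against them.
interface RoutineRequest {
  profile: Record<string, unknown>; // questionnaire answers
  maxProducts: number;              // "I want a 2–3 product routine"
  excludedIngredients: string[];    // the ingredients she named to remove
}

function buildGeneratorPrompt(req: RoutineRequest): string {
  return [
    "Generate a skincare routine for this profile:",
    JSON.stringify(req.profile),
    // Without constraints like these, "comprehensive" wins every time.
    `Hard limit: at most ${req.maxProducts} products. Fewer is better.`,
    `Never include: ${req.excludedIngredients.join(", ")}.`,
  ].join("\n");
}

function respectsConstraints(
  routine: { products: { ingredients: string[] }[] },
  req: RoutineRequest
): boolean {
  // Validation belongs outside the model: reject and regenerate
  // instead of shipping a routine that ignores the user.
  if (routine.products.length > req.maxProducts) return false;
  return routine.products.every((p) =>
    req.excludedIngredients.every((x) => !p.ingredients.includes(x))
  );
}
```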
Two Patterns
This engagement surfaced two patterns that I think apply to anyone building a product.
First: developers self-censor feedback. Not deliberately — we just recover from friction so automatically that we don't register it as a finding. I clicked twice, scrolled around, figured it out. My internal model updated. I never externalized it. The founder didn't hear about it until Noemica filed the issues automatically. The bug stayed until then. And the more you use your own product, the less you see these problems — you've internalized the workarounds so completely they become invisible.
Second: your product can be excellent and still misaligned. The product genuinely works for users with medical constraints. It made a pregnant woman feel safe. It caught allergens in products people already owned. But the users he was building for — budget-conscious women who need a simple, affordable routine — kept getting comprehensive 6-step plans they couldn't afford. The product's AI generates clinically sound routines. It just doesn't listen when the user says “I can't afford all this.”
Neither of these was visible from analytics, customer interviews, or manual testing. The first required a user who wouldn't silently recover. The second required asking users how they felt — not whether the system technically worked.
The pregnant woman who felt safe in 54 seconds? She'll tell her friends. The budget user who got six products when she asked for three? She'll close the tab. Both outcomes were invisible until synthetic personas told us how they felt.
This entire engagement — the 21 personas, the browser runs, the transcript analysis, the product-market insight, the 15 GitHub issues filed to the founder's repo — was produced by Noemica. No manual QA. No user interviews. No beta program. Just a scenario describing who the users are, pointed at the real product.
Two problems I never reported. One product-market misalignment nobody was looking for. 21 personas, 15 filed issues, one report. If you want to know what your users actually experience — not what your team thinks they experience — let's talk.