Skip to content
Alan Regaya

Wall of Mistakes

Lessons

Things I've shipped, broken, or over-engineered — and what I changed in my process because of them. Public because hiding mistakes is how you repeat them, and because the only post-mortems worth writing are the ones someone else can read.

012024

A cache key collision quietly served the wrong product page for hours.

Context
Headless WooCommerce storefront with edge caching keyed by product slug. Two products in different categories ended up with overlapping cache keys after a slug normalization change.
What happened
For a window of about four hours, a small percentage of visitors saw the wrong product detail page. The site looked healthy — no 500s, no error spike — until a customer support ticket surfaced it.
What I missed
I trusted the slug as a unique cache key without re-deriving uniqueness after the normalization change. The test suite didn't cover cache-layer behavior, only the response shape.
What changed
I now treat cache keys as a first-class part of any data-shape change. The repo got a small integration test that hits the cache layer with adjacent product IDs and asserts response identity. PR template now has a "Does this change anything cache-keyed?" line.
022024

I called a migration "done" without running it against a production-sized dataset.

Context
WooCommerce-to-headless migration involving a Typesense reindex of ~50k products with custom attribute mapping.
What happened
It worked perfectly on staging (~2k products). On production it OOM'd the indexer mid-run, left the index in a partial state, and search returned half-empty results until I caught it.
What I missed
I optimized for correctness of the mapping logic and assumed scale was a separate, later concern. The staging dataset wasn't representative — it was 4% the size.
What changed
Migrations now have an explicit "scale rehearsal" step: snapshot of production data, run against it in an isolated environment, measure peak memory and time. No migration ships without that rehearsal artifact in the PR.
032025

A "small" CSS rollback removed the entire add-to-cart button on mobile.

Context
WooCommerce theme on a client storefront. A visual regression PR was reverted to "unblock release" without re-running the visual diff after revert.
What happened
The revert touched a shared utility class. Mobile add-to-cart was display:none for a few hours overnight. Conversion dropped sharply before the team caught it the next morning.
What I missed
I treated 'revert' as a safe operation. It isn't — a revert is just another commit, and it deserves the same visual-regression gate as the original PR.
What changed
Reverts now go through the full CI gate, including visual diff. I'm also more willing to ship a forward fix than a revert when a revert touches shared code.
042023

I built a "flexible" hook system that no one used and I had to maintain.

Context
Early version of an internal plugin. I designed a pluggable filter system anticipating future extension points.
What happened
A year later, the filter system had zero external consumers. It had two: me, calling it from inside the plugin. The indirection made every change harder to trace.
What I missed
I optimized for hypothetical future requirements. I'd convinced myself flexibility was free — it isn't, you pay for it in readability, debuggability, and onboarding cost.
What changed
I now wait for the third real use case before extracting an abstraction. Two looks like a pattern but is usually a coincidence; three is when the right shape becomes obvious.

More context on how I approach this work lives in the principles on the About page.