Wall of Mistakes

Lessons

Things I've shipped, broken, or over-engineered — and what I changed in my process because of them. Public because hiding mistakes is how you repeat them, and because the only post-mortems worth writing are the ones someone else can read.

01 / 2024

A cache key collision quietly served the wrong product page for hours.

Context: Headless WooCommerce storefront with edge caching keyed by product slug. Two products in different categories ended up with overlapping cache keys after a slug normalization change.
What happened: For a window of about four hours, a small percentage of visitors saw the wrong product detail page. The site looked healthy — no 500s, no error spike — until a customer support ticket surfaced it.
What I missed: I trusted the slug as a unique cache key and never re-checked that it was still unique after the normalization change. The test suite covered only the response shape, not cache-layer behavior.
What changed: I now treat cache keys as a first-class part of any data-shape change. The repo got a small integration test that hits the cache layer with adjacent product IDs and asserts that each response belongs to the product that was requested. The PR template now has a "Does this change anything cache-keyed?" line.
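
A minimal sketch of what such a test could look like, under stated assumptions: the normalization rule, the in-memory FakeEdgeCache, the product data, and the idea of putting the product ID into the key are all hypothetical stand-ins, not the actual storefront code.

```python
import re


def normalize_slug(slug: str) -> str:
    """Hypothetical normalization: lowercase, collapse non-alphanumerics to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")


def slug_cache_key(product: dict) -> str:
    # Key derived from the slug only: unique only if normalized slugs are unique.
    return f"product:{normalize_slug(product['slug'])}"


def id_cache_key(product: dict) -> str:
    # Assumed alternative: include the product ID so the key cannot collide.
    return f"product:{product['id']}:{normalize_slug(product['slug'])}"


class FakeEdgeCache:
    """In-memory stand-in for the edge cache, parameterized by key function."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.store = {}

    def get_or_render(self, product: dict) -> dict:
        key = self.key_fn(product)
        if key not in self.store:
            # Cache miss: "render" the page for the requested product.
            self.store[key] = {"id": product["id"], "name": product["name"]}
        return self.store[key]


def find_misserved(cache: FakeEdgeCache, products: list) -> list:
    """Return IDs of products whose cached response belongs to another product."""
    return [p["id"] for p in products if cache.get_or_render(p)["id"] != p["id"]]


# Two made-up products with adjacent IDs whose slugs normalize to the same string.
PRODUCTS = [
    {"id": 101, "slug": "Trail_Jacket", "name": "Trail Jacket (Men)"},
    {"id": 102, "slug": "trail-jacket", "name": "Trail Jacket (Women)"},
]

print(find_misserved(FakeEdgeCache(slug_cache_key), PRODUCTS))
print(find_misserved(FakeEdgeCache(id_cache_key), PRODUCTS))
```

The useful property is that the check compares the identity of what came back with what was asked for, which is the one thing a response-shape test cannot see: a wrong product page has a perfectly valid shape.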

02 / 2024

I called a migration "done" without running it against a production-sized dataset.

Context: WooCommerce-to-headless migration involving a Typesense reindex of ~50k products with custom attribute mapping.
What happened: It worked perfectly on staging (~2k products). On production it OOM'd the indexer mid-run, left the index in a partial state, and search returned half-empty results until I caught it.
What I missed: I optimized for correctness of the mapping logic and assumed scale was a separate, later concern. The staging dataset wasn't representative: at ~2k products it was about 4% of the size of production.
What changed: Migrations now have an explicit "scale rehearsal" step: take a snapshot of production data, run the migration against it in an isolated environment, and measure peak memory and run time. No migration ships without that rehearsal artifact in the PR.
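
The measurement part of a rehearsal can be sketched with the standard library alone. This is an illustration under assumptions, not the actual rehearsal tooling: rehearse, map_doc, index_all_at_once, and fake_products are hypothetical names, and the generated records stand in for a real production snapshot.

```python
import time
import tracemalloc


def rehearse(migration, records) -> dict:
    """Run `migration` over `records`, recording wall time and peak memory.

    tracemalloc only sees allocations made by Python code, so treat the peak
    as a lower bound on what the real process would need.
    """
    tracemalloc.start()
    started = time.perf_counter()
    try:
        processed = migration(records)
    finally:
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {
        "processed": processed,
        "seconds": round(elapsed, 3),
        "peak_bytes": peak_bytes,
    }


def map_doc(record: dict) -> dict:
    # Hypothetical attribute mapping: flatten attributes into the search document.
    return {"id": str(record["id"]), **record["attributes"]}


def index_all_at_once(records) -> int:
    # Stand-in migration that holds every mapped document in memory at once.
    docs = [map_doc(r) for r in records]
    return len(docs)


def fake_products(n: int):
    # Synthetic records; a real rehearsal would read a production snapshot.
    for i in range(n):
        yield {"id": i, "attributes": {"color": "red", "size": "M"}}


staging_sized = rehearse(index_all_at_once, fake_products(2_000))
production_sized = rehearse(index_all_at_once, fake_products(50_000))
print("staging-sized:   ", staging_sized)
print("production-sized:", production_sized)
```

The returned dictionary is the kind of thing that could be pasted into a PR as the rehearsal artifact. Running the same function at both sizes makes the gap visible before production does.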

03 / 2025

A "small" CSS rollback removed the entire add-to-cart button on mobile.

Context: WooCommerce theme on a client storefront. A visual regression PR was reverted to "unblock release" without re-running the visual diff after revert.
What happened: The revert touched a shared utility class. The mobile add-to-cart button was set to display:none for a few hours overnight. Conversion dropped sharply before the team caught it the next morning.
What I missed: I treated "revert" as a safe operation. It isn't: a revert is just another commit, and it deserves the same visual-regression gate as the original PR.
What changed: Reverts now go through the full CI gate, including visual diff. I'm also more willing to ship a forward fix than a revert when a revert touches shared code.
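
As a rough sketch of that policy, assuming a CI step that can see the commit message and the list of changed paths: the path patterns, check names, and function names below are hypothetical, and the only real-world detail relied on is that git's default revert message starts with "Revert ".

```python
from fnmatch import fnmatch

# Hypothetical globs for styles shared across templates; not from the real repo.
SHARED_PATTERNS = ("assets/css/utilities/*", "assets/css/base/*")

# Hypothetical check names; the point is that the list has no revert shortcut.
FULL_GATE = ("lint", "unit", "visual-diff")


def is_revert(commit_message: str) -> bool:
    # `git revert` writes a subject of the form: Revert "<original subject>"
    return commit_message.startswith("Revert ")


def touches_shared_code(changed_paths, patterns=SHARED_PATTERNS) -> bool:
    return any(fnmatch(path, pattern) for path in changed_paths for pattern in patterns)


def plan_change(commit_message: str, changed_paths) -> dict:
    """Every commit gets the full gate; risky reverts also get a nudge."""
    risky_revert = is_revert(commit_message) and touches_shared_code(changed_paths)
    return {
        "checks": FULL_GATE,  # same gate for every commit, reverts included
        "advice": "consider a forward fix" if risky_revert else "proceed",
    }


print(plan_change('Revert "Tweak button spacing"', ["assets/css/utilities/display.css"]))
print(plan_change("Fix typo in footer", ["templates/footer.php"]))
```

The commit message never shortens the list of checks. It is only used to suggest a forward fix when a revert reaches into shared code.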

04 / 2023

I built a "flexible" hook system that no one used and that I still had to maintain.

Context: Early version of an internal plugin. I designed a pluggable filter system anticipating future extension points.
What happened: A year later, the filter system had zero external consumers. It had two internal ones, both of them me, calling it from inside the plugin. The indirection made every change harder to trace.
What I missed: I optimized for hypothetical future requirements. I'd convinced myself flexibility was free. It isn't: you pay for it in readability, debuggability, and onboarding cost.
What changed: I now wait for the third real use case before extracting an abstraction. Two looks like a pattern but is usually a coincidence; three is when the right shape becomes obvious.

More context on how I approach this work lives in the principles on the About page.