🎃 Spooky Halloween Hotfix

On-call shifts are never fun. But that night, production turned into a full-on haunted house.

Oct 31, 2025

Every software engineer has their favorite “worst production bug” story, and since it’s Halloween, I decided to share a very peculiar one… It happened to me on a random October night…

At a previous company (not naming names 👀), we had a massive React Native app and a fleet of backend microservices. It was a content-driven exercise app: videos, images, articles, exercise playlists, and so on.

One day, it was my turn on the on-call rotation — basically, my job was to make sure the app and all the related services stayed healthy. Everything was fine… until it wasn’t.

It was after hours, and I was at the supermarket when my phone buzzed with an urgent alert: the app was barely usable, and tons of requests were failing.

Whispers from production

I rushed home, sat down, opened Datadog — and it was a disaster. Every request related to app content was failing. I tried the app on my phone, and what I saw (and heard) felt like a Halloween nightmare: the app was trapped in an infinite loop of a robotic voice saying,

“You can turn on the audio cues…”

Over and over again.

Wait, what? How did that happen?

The user couldn’t do anything — no exercises, no navigation, no images, no videos. Just that weird voice. The app was completely broken.

The first battle

We jumped on a call, and more and more engineers joined. We dove into the cache and found something interesting: the content service that returns assets was serving the same .m4a file for every asset request.

We thought we had it.
Revert the last PR.
Flush Redis.
Deploy.

✅ The app came back.
✅ Images, videos, and articles loaded again.
✅ Datadog turned green.

We all exhaled. Relief.

Then someone from the content team said:

“Wait… marketing just uploaded a .m4a audio with instructions to enable audio cues.”

We opened the asset link — yep, the creepy robotic voice. That must have triggered something bad. We deleted the cursed file from the CMS and kept investigating.

The curse returns

And then…

📟 Pager notifications again.
All monitors red.
The nightmare returned.

But this time, the voice was gone.
Instead, every image and video in the app had been replaced by the same picture — a person doing the woodpecker exercise.

Two hundred thousand assets — all overwritten with that one image.

We froze. But then we remembered the number one on-call rule: don’t panic.

We panicked a bit, then switched to fix-it mode!

We reverted the database to the previous snapshot, flushed Redis again, and redeployed. It was past 11 PM in Brazil, and it was time to verify the app again, but something still felt off to me…

“It’s going to happen again. We haven’t fixed the issue yet, so it’s going to repeat.”

Right as I said that, someone on the call shouted:

“Guys… Guys… stop the CMS webhooks”

Scooby-Doo moment

Facepalm!! Of course.

Every time someone uploaded new content, the webhooks were re-propagating the corrupted data to the database and Redis.

We paused the webhooks, redeployed, and finally everything worked again. We uploaded a new asset to check: no more overwriting.

🎃 The curse was over…

In the post-mortem, we found the true culprit: a bad PR in the webhook configuration. On every webhook event, it updated all database entries instead of just the new one.

A single missing filter turned the CMS into a content-eating demon.
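
To make that concrete, here is a minimal sketch of what a missing filter can look like. This is not the real code from the incident: the `Asset` and `WebhookEvent` shapes, the file names, and every function name are made up for illustration, and a plain in-memory array stands in for the database.

```typescript
// Hypothetical shapes: the real schema and webhook payload are not part of this story.
type Asset = { id: string; url: string };
type WebhookEvent = { assetId: string; url: string };
type Updater = (db: Asset[], event: WebhookEvent) => Asset[];

// Buggy version: no filter, so every row gets the newly uploaded URL.
const applyWebhookBuggy: Updater = (db, event) =>
  db.map((asset) => ({ ...asset, url: event.url }));

// Fixed version: only the asset named in the event is touched.
const applyWebhookFixed: Updater = (db, event) =>
  db.map((asset) =>
    asset.id === event.assetId ? { ...asset, url: event.url } : asset
  );

// Counts how many rows ended up with a different URL.
function countChanged(before: Asset[], after: Asset[]): number {
  return after.filter((asset, i) => asset.url !== before[i]?.url).length;
}

// Optional guardrail: reject any update that changes more rows than expected.
function withGuard(
  update: Updater,
  db: Asset[],
  event: WebhookEvent,
  maxChanged = 1
): Asset[] {
  const next = update(db, event);
  const changed = countChanged(db, next);
  if (changed > maxChanged) {
    throw new Error(`Refusing update: ${changed} rows would change`);
  }
  return next;
}

const db: Asset[] = [
  { id: "img-1", url: "plank.png" },
  { id: "vid-1", url: "squat.mp4" },
  { id: "aud-1", url: "old-cues.m4a" },
];
const event: WebhookEvent = { assetId: "aud-1", url: "audio-cues.m4a" };

const broken = applyWebhookBuggy(db, event); // every asset now points at the .m4a
const fixed = applyWebhookFixed(db, event); // only "aud-1" changes
```

With the guard in place, `withGuard(applyWebhookBuggy, db, event)` throws instead of silently rewriting the whole table, while `withGuard(applyWebhookFixed, db, event)` goes through. A check like this would not have prevented the bad PR, but it might have turned a haunted house into a single failed webhook.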

So that’s my Halloween software engineer story.
Do you have a spooky hotfix story? Tell us.

