The Slack message came in at 9:47 AM: “Hey, why wasn’t I added to the client portal? I signed up yesterday.” My stomach dropped. I checked the database. No record of their signup. I checked the logs. The form submission had arrived. But the webhook that was supposed to create their account? Never fired.

By lunch, I’d heard from three more people. By the end of the day, I’d identified 12 failed signups over the past week. Twelve potential customers who submitted the form, paid the signup fee, and never got access. One asked for a refund. Can’t really blame them.

How It Started

My signup flow was simple. Beautiful, even. User fills out form, webhook fires, account gets created, welcome email sends, money flows in:

app.post('/signup', async (req, res) => {
  const userData = req.body;
  
  await fetch('https://my-api.com/create-account', {
    method: 'POST',
    body: JSON.stringify(userData)
  });
  
  res.json({ success: true });
});

It worked perfectly in testing. It worked perfectly for the first 50 customers. And then it didn’t.

The Ways Webhooks Fail

The first failure was the most obvious: my API was down for maintenance. 15 minutes of planned downtime. During those 15 minutes, three people signed up. Their webhooks hit the offline API, failed, and disappeared into the void. No retry logic. No queue. No fallback. Just gone.

The second failure was a timeout. Someone signed up. The webhook fired. But my API was slow that day (I was running a database migration). The webhook timed out after 5 seconds. The submission completed on the front end. The account never got created.

The third failure was the most insidious: the webhook succeeded but the API returned an error. HTTP 200 response, but the JSON body said { error: "email already exists" }. My webhook code didn’t check the response body. It saw 200, assumed success, and moved on.
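To make that third failure concrete, here is a minimal sketch of the check my original code was missing. The helper name `checkWebhookResult` and the `{ ok, reason }` return shape are made up for illustration; the only thing taken from the real incident is that a 200 response can still carry an `error` field in its JSON body.

```javascript
// Hypothetical helper: decide whether a webhook call really succeeded.
// `status` is the HTTP status code, `body` is the parsed JSON response (or null).
function checkWebhookResult(status, body) {
  // Anything outside 2xx is a transport-level failure.
  if (status < 200 || status >= 300) {
    return { ok: false, reason: `HTTP ${status}` };
  }
  // A 2xx status is not enough: the API may report an error in the body.
  if (body && body.error) {
    return { ok: false, reason: `API error: ${body.error}` };
  }
  return { ok: true, reason: null };
}

// The case that bit me: status 200, error in the body.
console.log(checkWebhookResult(200, { error: 'email already exists' }));
// API offline during maintenance.
console.log(checkWebhookResult(503, null));
// A genuine success.
console.log(checkWebhookResult(200, { id: 42 }));
```

The point is simply that the status code and the body both have to pass before the signup counts as delivered.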

Building It Right (The Hard Way)

After losing those customers, I spent a week rebuilding the webhook system properly.

First, I added a queue:

import { Queue } from 'bullmq';
const webhookQueue = new Queue('webhooks');

app.post('/signup', async (req, res) => {
  const userData = req.body;
  
  await db.signups.create(userData);
  
  await webhookQueue.add('create-account', userData);
  
  res.json({ success: true });
});

Now submissions got saved immediately. The webhook happened asynchronously. If it failed, the data was still in the database. But queues require infrastructure. I installed Redis. Configured workers. Set up monitoring. Added health checks. Then I added retry logic:

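import { Worker } from 'bullmq';

// BullMQ processes jobs in a Worker (queue.process() is the older Bull API).
// Assumes `connection` holds your Redis connection settings.
const webhookWorker = new Worker('webhooks', async (job) => {
  const response = await fetch(apiUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(job.data),
    // Standard fetch has no `timeout` option; abort after 10 seconds instead
    signal: AbortSignal.timeout(10000)
  });
  
  if (!response.ok) {
    throw new Error(`Webhook failed: ${response.status}`);
  }
  
  // A 200 can still carry an error in the body, so check that too
  const result = await response.json();
  if (result.error) {
    throw new Error(`API error: ${result.error}`);
  }
}, { connection });

// In BullMQ, retry settings are job options, passed when the job is enqueued:
// await webhookQueue.add('create-account', userData, retryOptions);
const retryOptions = {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 2000
  }
};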
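The `attempts` and `backoff` settings above map onto a concrete wait schedule. Here is a small sketch of the arithmetic, assuming the formula BullMQ documents for its built-in exponential strategy (2^(retry - 1) × delay) and assuming `attempts` counts the first try as well. The function name `backoffDelays` is made up for illustration.

```javascript
// Sketch of the wait schedule for exponential backoff.
// `attempts` includes the first try, so there are attempts - 1 retries.
function backoffDelays(attempts, delayMs) {
  const delays = [];
  for (let retry = 1; retry < attempts; retry++) {
    // 2^(retry - 1) * delay: 1x, 2x, 4x, 8x, ...
    delays.push(Math.pow(2, retry - 1) * delayMs);
  }
  return delays;
}

const schedule = backoffDelays(5, 2000);
console.log(schedule); // [ 2000, 4000, 8000, 16000 ]

// Total time covered by the retries, in milliseconds.
console.log(schedule.reduce((sum, ms) => sum + ms, 0)); // 30000
```

Under those assumptions the retries cover about 30 seconds in total, so an outage longer than that (like a 15-minute maintenance window) would still use up every attempt.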

Five attempts in total, with exponential backoff. If the first try fails, wait 2 seconds. Then 4 seconds. Then 8 seconds. Then 16 seconds. This caught most failures. API down briefly? It’ll retry when it’s back up. Slow response? Longer timeout prevents false failures. But I still needed fallback mechanisms:

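// In BullMQ the Worker emits 'failed'; `webhookWorker` is assumed to be the
// Worker that processes the 'webhooks' queue.
webhookWorker.on('failed', async (job, error) => {
  // 'failed' fires on every failed attempt; only act once retries are exhausted
  if (!job || job.attemptsMade < (job.opts.attempts ?? 1)) return;
  
  await db.failedWebhooks.create({
    data: job.data,
    error: error.message,
    attempts: job.attemptsMade
  });
  
  await sendAlert('Webhook failed after all retries');
});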

When everything fails, save it to a failed webhooks table and alert me. At least I can manually process them instead of losing customers. The final system required Redis for the queue, worker processes to handle jobs, a database table for failed webhooks, a monitoring dashboard to track success rates, an alert system for failures, and manual tools to replay failed webhooks. What started as 10 lines of code became an entire subsystem. Hundreds of lines of code. Multiple services. Constant monitoring.
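For a sense of what the "manual replay" tooling boils down to, here is a rough sketch, not my actual tool. It assumes the failed rows look like the records saved by the failure handler, and it builds job descriptors in the `{ name, data, opts }` shape that BullMQ's `queue.addBulk()` accepts. The helper name `buildReplayJobs` and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: turn rows from the failed-webhooks table into
// job descriptors that can be handed back to the queue.
function buildReplayJobs(failedRows) {
  return failedRows.map((row) => ({
    name: 'create-account',
    data: row.data,
    // Give each replayed job a fresh retry budget.
    opts: { attempts: 5, backoff: { type: 'exponential', delay: 2000 } }
  }));
}

// Example rows, shaped like the records written when all retries fail.
const failedRows = [
  { data: { email: 'a@example.com' }, error: 'Webhook failed: 503', attempts: 5 },
  { data: { email: 'b@example.com' }, error: 'Webhook failed: 503', attempts: 5 }
];

const replayJobs = buildReplayJobs(failedRows);
console.log(replayJobs.length); // 2

// With a real queue this would be followed by something like:
//   await webhookQueue.addBulk(replayJobs);
```

In practice you would also want to mark rows as replayed so the same signup does not get pushed through twice.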

The Thing Nobody Tells You

Even with all that infrastructure, webhooks still failed occasionally. Network blips. API changes. Unexpected response formats. Server restarts during processing. Race conditions. Weird edge cases. I spent more time debugging webhook failures than building actual features.

And every failure was stressful. Because failures often meant unhappy customers. Missing data. Lost revenue. The 2 AM Slack notification became routine: “Webhook failure spike detected.” I’d drag myself to the laptop, check the dashboard, identify the issue, fix it, replay the failed webhooks. This was not how I wanted to spend my time.

Why I Built StaticForm’s Webhook System

When I built StaticForm, I made sure the webhook system handled all the stuff I’d built: automatic retries with exponential backoff, submission storage so nothing gets lost, webhook health monitoring, and failover to email if webhooks fail completely.

More importantly, it just works. I haven’t had a webhook failure since building it. Not one. No 2 AM alerts. No lost customers. No debugging sessions. The submissions always get stored, even if your API is completely down. When it comes back up, StaticForm delivers the backlog. You don’t have to build any of that.

What I Learned

Reliable webhooks are really hard to build yourself. Not “kinda tricky” hard. Actually hard. It requires infrastructure, monitoring, error handling, retry logic, fallback mechanisms, and constant maintenance. Or you can use StaticForm and not think about it.

I wish I’d had StaticForm from the start. Would’ve saved me a week of development time, several sleepless nights, and $5,000 in refunds.

Get 10 free credits to test your form at app.staticform.app. Pay as you go, or buy a plan to save money.