Form responses delay
Incident Report for Typeform
Postmortem

The following is the postmortem regarding the "Form responses delay" incident.

Late Tuesday night, 26th of August

An issue prevented part of the Typeform platform (specifically, the processes that insert responses into our database) from operating correctly after a request to a separate service, which was under heavy load, failed. After the failed request, this part of the platform did not restart correctly and could not resume processing messages. Thanks to our backup strategy and a restart of the service, the issue was resolved, the backlog began processing, and we started work on a permanent fix.

Thursday night, 28th of August

Before the fix was applied, our results queue was delayed again after another failed restart. Because our system status alarms were misconfigured, our response took longer than desired, but the issue was soon fixed again and the response backlog resumed processing.

Friday, 29th of August

Around midday on Friday, the fix was applied. We are still monitoring the service to ensure that everything is working as expected.

Further explanation

Both incidents had the same cause: a request to a separate service, which was already under heavy load, failed and left a component of the platform called the workers in a state from which it could not recover and continue inserting responses into the database. As a result, all responses were saved to our backup database and passed into a message queue for processing, but were not inserted into our main database.

This also led to a backlog of unprocessed messages. Once the worker process was restarted, it began processing the backlog, and after about 30 minutes Typeform was back to its normal processing rate.

We followed our standard backup plan and stored all responses on a separate server to ensure that no responses are lost, no matter what goes wrong.

We've taken steps to improve the handling of the request that caused the delay, and the bug that caused the failed restarts has been fixed. We've also improved our alarm system so that it notifies us sooner, and with more information, if any problems occur in the future.

For us, the most important thing is to keep your responses safe and make sure we can always present them to you, our users. This time, we failed in that mission, and we are taking extra measures to improve our service, both now and in the future.

If you have any questions or feedback, you're welcome to contact our customer support team, who will always respond in a timely manner, at the following email addresses:

English: support@typeform.com

Spanish: soporte@typeform.com

Visit our Helpcenter for more information: http://helpcenter.typeform.com/hc/en-us

Or submit a support request directly here: http://helpcenter.typeform.com/hc/en-us/requests/new

Thank you, and our apologies for any inconvenience caused.

Posted Sep 01, 2014 - 12:40 CEST

Resolved
This incident has been resolved.
Posted Aug 29, 2014 - 16:57 CEST
Monitoring
We were experiencing a delay in delivering form responses to users' results tables due to a processing bottleneck in the results queue. The issue is now fixed and results are arriving in real time again. We will share a postmortem soon.

Please note that every response collected on Typeform is always backed up, so rest assured that there was no risk of data loss at any point.
Posted Aug 29, 2014 - 14:29 CEST