Heroku Status

Current Status and Incident Report

US Database & HTTP Latency issues

Production 2h Development 2h

Follow-up

We encountered a large number of failures of the block storage devices that help power our Heroku Postgres databases in one availability zone. Our team immediately began working to restore availability to affected databases, and that work continued for several hours. In our follow-up after the incident, we identified several areas where improvements could reduce the time required to restore databases should similar failures occur in the future. These improvements are currently being implemented.

Furthermore, the Heroku status site is typically updated when a large number of databases are affected, and the incident is closed once availability is back at acceptable levels, provided every affected customer has a ticket opened with the status of their database. During this incident, many of these direct notifications were not opened correctly, which created confusion for customers with affected databases. We're working to put additional safeguards in place to ensure this does not occur in the future.

Resolved

This issue is now resolved.

Monitoring

Our database engineers are restarting the remaining affected database instances. We are continuing to monitor the situation.

Monitoring

Our database engineers have identified all affected databases and are in the process of recovering them. We are monitoring the situation.

Update

We saw a number of customer databases go offline, as well as an increase in ELB latency as some ELB nodes became unhealthy. We did not see any other impact to production applications.

The unhealthy ELBs have now recovered, and our database engineers are working on recovering the remaining customer databases.

Issue

Our engineers have confirmed issues with EBS-backed instances in a single availability zone. We are currently working with our infrastructure provider to resolve this.

Investigating

We are seeing several databases in a single availability zone down at the moment. Our engineers are continuing to investigate the impact on production applications.

Investigating

Our automated systems have detected potential platform errors. We are investigating.
