On Saturday at approximately midnight GMT (6pm EST), our SSL certificate behind our database instances expired causing an outage across our API stack.
During this brief outage, all systems connected to our database (Activity API and our REST API) were temporarily unavailable until we cycled through our database instances with the updated certificates to bring them back online.
This brief outage was mainly a result of human error—through all of our testing in updating our SSL certificates (over a month ago) in MongoDB, it was very difficult to complete a full end-to-end test of updated certificates without taking the system offline. We believed we had executed the migration of SSL certificates properly.
Unfortunately, we had missed a key item in our migration process that caused API calls to error and for our system to go down for a brief time.
We've added multiple new thresholds for alerts to more quickly resolve an outage like this in the future. Additionally, any time key things like SSL certificates expire we're going to earmark these times and dates and will have key staff on call to ensure proper testing goes into place at the right time.