FeedMail was Down

FeedMail was offline for 26 minutes. During this period the website was unavailable and feed updates were not sent.

This outage was caused by our CoreDNS resolver failing. FeedMail continued operating normally for a while, since most operations, such as feed fetching and mail sending, don't rely on the Kubernetes DNS server. However, FeedMail does use the Kubernetes DNS server for a few operations, such as connecting to its own database. When database connections needed to be refreshed, the DNS resolution failure caused FeedMail to become unhealthy and it was unable to continue operating.
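The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not FeedMail's actual code: the `DatabaseClient` class, the `db.example.internal` hostname, the `10.0.0.5` address, and the two stand-in resolvers are all hypothetical. It only shows why an already-open connection keeps working without DNS, while a refresh, which has to resolve the hostname again, fails.

```python
import socket


class DatabaseClient:
    """Toy model of a client that looks up its database host on every refresh."""

    def __init__(self, hostname, port, resolver=socket.getaddrinfo):
        self.hostname = hostname
        self.port = port
        self.resolver = resolver  # injectable so a DNS failure can be simulated
        self.address = None       # address of the currently open connection

    @property
    def healthy(self):
        return self.address is not None

    def refresh(self):
        """Drop the current connection and open a new one, which needs DNS."""
        self.address = None
        try:
            infos = self.resolver(self.hostname, self.port, type=socket.SOCK_STREAM)
        except socket.gaierror:
            # Without an answer from the resolver no new connection can be opened.
            return False
        self.address = infos[0][4][0]
        return True


def working_resolver(host, port, type=0):
    # Hypothetical stand-in for a healthy cluster DNS server.
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("10.0.0.5", port))]


def failing_resolver(host, port, type=0):
    # Hypothetical stand-in for the resolver being down.
    raise socket.gaierror("name resolution failed")


client = DatabaseClient("db.example.internal", 5432, resolver=working_resolver)
client.refresh()
print(client.healthy)  # True: the connection was opened while DNS worked

client.resolver = failing_resolver
print(client.healthy)  # True: the existing connection does not need DNS

client.refresh()
print(client.healthy)  # False: the refresh needed DNS and could not get it
```

The gap between the second and third step is the "continued operating normally for a while" window: nothing breaks until a connection has to be re-established.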

Timeline

All times are in UTC.

13:28  Start: FeedMail goes down. Website is offline and feeds are not being checked.
13:32  Detection: Automated monitoring reported that the FeedMail website was unavailable.
13:38  Automated monitoring reported that feeds were not being fetched.
13:42  Kubernetes cluster update was started.
13:53  Mitigated: FeedMail was restored to operation. The website was again available and feeds started being checked.
13:54  Resolved: All feeds were checked and mail was sent.
Note that WebSub updates that fired during the downtime may take slightly longer to appear, because the retry interval is chosen by the server sending them.

Analysis

CoreDNS was returning 503 to its readiness health check and had the following message repeated in its logs:

plugin/ready: Still waiting on: "kubernetes"
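For readers unfamiliar with this check, here is a minimal sketch of what probing it looks like. It assumes CoreDNS's `ready` plugin is on its default port 8181 and `/ready` path, and that the pod is reachable from wherever the script runs, which may not hold on a managed cluster. The function name and the address in the usage note are hypothetical. The endpoint answers 200 once every plugin reports ready and 503 otherwise, which is what the log line above corresponds to.

```python
import urllib.error
import urllib.request


def coredns_is_ready(base_url, opener=urllib.request.urlopen, timeout=2.0):
    """Return True only if GET <base_url>/ready answers with HTTP 200."""
    try:
        with opener(f"{base_url}/ready", timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises for 4xx/5xx, so a 503 "not ready" answer lands here.
        return False
    except OSError:
        # Connection refused, timeout, or another network-level failure.
        return False
```

A call such as `coredns_is_ready("http://10.0.0.10:8181")` (placeholder address) would have returned `False` throughout this incident. The `opener` parameter exists only so the function can be exercised without a live cluster.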

No changes had recently been made to CoreDNS. Restarting CoreDNS did not help.

This incident was resolved by updating Kubernetes. The update had been announced earlier in the day, and we had planned to wait a few days before applying it in case any bugs were found and fixed in the new version. Instead, we decided to apply it immediately so that it would reconfigure CoreDNS or the Kubernetes API server into a working state. This was a risky maneuver, but because FeedMail runs on a managed Kubernetes cluster we don't configure CoreDNS ourselves, so it seemed safer than manually tweaking settings, especially since the true issue may have been with the Kubernetes API server.

What Went Well

  • Monitoring quickly detected the issue.
  • The service quickly and gracefully recovered once DNS resolution was restored.

What Went Poorly

Nothing.

Where We Got Lucky

  • The Kubernetes update that had been released only hours before fixed the issue.
    • If it hadn't fixed the issue, or hadn't been released yet, we would have had to file a service request, which likely would have taken longer.

Action Items

At this time we don't expect to take any action. This downtime is within our reliability targets. The cost to resolve this issue is not deemed worth it at this time.

One mitigation would be to run multiple Kubernetes clusters. This would give us software version and geographical isolation. However, it would also increase operational complexity as well as costs. Another option may be to run more instances of CoreDNS, but this is managed by our provider so we would prefer not to customize it at this time.

One last option would be to override DNS settings and use our own DNS resolvers for all operations. This is something that we will continue to revisit in the future.
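As a rough illustration of that last option, Kubernetes lets a pod opt out of the cluster DNS service by setting `dnsPolicy` to `"None"` and listing resolvers under `dnsConfig`. The sketch below builds those pod-spec fields. The `pod_dns_override` helper is hypothetical, the addresses are documentation-only placeholders, and none of this is a configuration FeedMail has adopted.

```python
import json

MAX_NAMESERVERS = 3  # Kubernetes accepts at most three nameservers in dnsConfig


def pod_dns_override(nameservers):
    """Build the pod-spec fields that bypass the cluster DNS service."""
    if not 1 <= len(nameservers) <= MAX_NAMESERVERS:
        raise ValueError(f"expected 1 to {MAX_NAMESERVERS} nameservers")
    return {
        "dnsPolicy": "None",  # ignore the DNS settings inherited from the cluster
        "dnsConfig": {"nameservers": list(nameservers)},
    }


# 192.0.2.53 and 198.51.100.53 are documentation-only placeholder addresses.
print(json.dumps(pod_dns_override(["192.0.2.53", "198.51.100.53"]), indent=2))
```

One caveat with this approach: names that only the cluster DNS knows about, such as a Service name used to reach the database, would no longer resolve unless the chosen resolvers can answer for them or those names are replaced with something else.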
