Skip to main content

Update to Date-based Entry Ignoring

TL;DR FeedMail will now ignore new items 7 days older than a previously seen item. This is expected to affect almost no "true" new posts.

In theory checking to find new entries for a feed is a simple process.

  1. Download the feed.
  2. Check the ID of each entry to see if you have seen it before.

However the real world is much messier. It is recommended for feed IDs to be URLs (to ensure global uniqueness) however this results in many feeds just using the URL that the article is available at. However these URLs sometimes change, and poorly designed feed generators update the ID of existing entries to the new URL.

From a protocol point of view these are completely new entries, however to a user these are duplicates. In order to reduce the effect of this common issue on our our users FeedMail has some simple mitigations for posts that have recorded published dates.

  1. If the entry is older than a year always ignore it.
  2. If the entry is older than the 10th newest post in the feed ignore it.
  3. If the entry is more than 7 days older than the newest post in the feed ignore it.

Rule number 3 is a new rule to better filter out old entries for infrequent feeds. In many cases this will ensure that you only receive one or two duplicate entries if the feed accidentally changes their post IDs.

One might think that you can simply ignore any article that isn't newer than the newest article, however in practice many feeds post items with the published timestamp slightly out of order. Based on our analysis a 7 day window will almost never ignore new articles while filtering out a lot of duplicates with new IDs.

Comments

Popular posts from this blog

DNS Outage

From 2024-08-26 19:46 to 2024-08-27 11:21 UTC FeedMail had an outage. Until 2024-06-26 20:34 FeedMail was completely down. For the remainder of the outage most emails not sent. It is expected that no feed updates were lost during this outage. Updates would only be lost if they were only present on the feed within the 50min of total outage. Most feeds ensure that updates are present for days so this would not be an issue. Notifications have been delayed and should be sent by 2024-08-27 12:31. This may take longer if your mail provider applies limits and FeedMail needs to retry delivery at a later time. Update : All delayed notifications have been sent successfully. Timeline All times are in UTC . 2024-08-26 19:46 Start FeedMail goes down.   19:53 Detection Automated monitoring reported that feeds were not being checked. 20:34 The Database IP was hardcoded, restoring most functionality. 2024-08-27 11:21 Resolution FeedMail was switched external DNS. 11:24 Schedule of ...

Digests Now Respect Category Filters

Due to an oversight category filters did not apply to digests. This has been corrected and future digests will be filtered by your selected categories. If you do not want this filtering to occur please update your filters to "Ignore selected categories" and deselect all categories to inactivate the filter.

Digests are now Supported for Owner-Paid Feeds

Owner-paid feeds allow feed publishers to provide FeedMail to their subscribers at no cost. For example the FeedMail Blog is an owner-paid feed. Up until now digest subscriptions were not covered by owner-paid plans. Subscribers could select a digest but they would have to pay for the subscriptions themselves. Digests are now fully supported under owner-paid plans. For users: The owner-paid feeds in your digests no longer count towards the cost of the digest. For publishers: Users will now be able to receive your feed as a digest or included in one of their existing digests. You will be charged one credit for each digest issue containing items from your feed (no matter how many items from your feed are in that issue). Notably this cost will never be more than real-time subscriptions would be.