Skip to main content

Update to Date-based Entry Ignoring

TL;DR FeedMail will now ignore new items 7 days older than a previously seen item. This is expected to affect almost no "true" new posts.

In theory checking to find new entries for a feed is a simple process.

  1. Download the feed.
  2. Check the ID of each entry to see if you have seen it before.

However the real world is much messier. It is recommended for feed IDs to be URLs (to ensure global uniqueness) however this results in many feeds just using the URL that the article is available at. However these URLs sometimes change, and poorly designed feed generators update the ID of existing entries to the new URL.

From a protocol point of view these are completely new entries, however to a user these are duplicates. In order to reduce the effect of this common issue on our our users FeedMail has some simple mitigations for posts that have recorded published dates.

  1. If the entry is older than a year always ignore it.
  2. If the entry is older than the 10th newest post in the feed ignore it.
  3. If the entry is more than 7 days older than the newest post in the feed ignore it.

Rule number 3 is a new rule to better filter out old entries for infrequent feeds. In many cases this will ensure that you only receive one or two duplicate entries if the feed accidentally changes their post IDs.

One might think that you can simply ignore any article that isn't newer than the newest article, however in practice many feeds post items with the published timestamp slightly out of order. Based on our analysis a 7 day window will almost never ignore new articles while filtering out a lot of duplicates with new IDs.

Comments

Popular posts from this blog

Delivery Delays to Gmail

In the past 48 hours Google has started delaying the delivery of some FeedMail notifications. This is currently affecting about 10% of messages to Gmail users. These notifications will be resent with a delay. We also speculate that some notifications will be marked as spam.   Update : As of 2023-05-09 this appears to be resolved. If You Are Affected If you use Gmail you may be affected by this. Notifications may be delayed or marked as spam. If your notifications are marked as spam you can create a filter to avoid this. Use "from:*@feedmail.org" as the rule and select " Never send it to Spam". If your notifications are delayed we are unaware of any action that you can take. However marking notifications that ended up in your spam folder as "Not Spam" may help avoid future delays.  It does appear that these emails are eventually being accepted but we are unsure if that means that they are actually ending up in users' mailboxes (or even their spam folder

No body notification option.

Previously FeedMail provided two options for notification bodies: Content from the feed. Attempt to scrape content from the linked website. A third option is now available which includes no content in notifications. This can be useful to reduce email size or if you prefer reading articles in your browser anyways. This option is especially useful for digests. By setting subscriptions in a digest to "No Content" you can get a headlines-only digest. Or you can keep content for some short content like micro blogging but just get headlines for news sources with longer articles. Simply go to your subscription management page to change this setting. You can find a link at the bottom of each email notification.

Digests Leave Beta

Thanks everyone who has helped evaluate digests over the past weeks. All of the blocking issues are now resolved and we will be releasing them soon. Once digests are officially released there will be links to them from the FeedMail site and pricing information added to our homepage. Price Increase Part of the purpose of the beta was to evaluate the cost of providing digests and see how they would be used. We have decided upon final pricing which we hope will be sustainable for years to come. Digests issues will cost 1 credit per 5 feeds. Note that this is feeds included in an issue , not total feeds that target a particular digest. It also does not matter how many new items a feed has. So if you have a digest with 200 feeds configured but this morning's issue only has new items from 2 of them it will cost 1 credit. If 14 feeds update the next day that issue will cost 3 credits. If the day after has no updates it will cost nothing. This new pricing takes effect no earlier than 202