Drupal Aggregator

Drupal's default Aggregator module does not work very well. The Core module has several problems, one of which (the flatness of the item table) we have not fixed in our version.

Here is a list of the known issues, with our comments and how our version handles each one:

| Problem | Solution in m2osw's version of Aggregator |
| --- | --- |
| Missing XML marker | The <?xml ... ?> marker is missing from some RSS feeds; we add it as required. |
| Spurious data | Some RSS feeds add spurious data around the <?xml ... ?> marker or after the closing </rss> tag (only comments, processing instructions, and whitespace are valid after the closing root tag). |
| XML Stylesheet | The XML parser used does not like XML stylesheets; we remove them before parsing. |
| XML entities | RSS feeds include all sorts of characters that are not valid UTF-8. We attempt to fix all those that we can. Most often we find the Windows-1252 apostrophe and different types of quotes. |
| Lone Ampersand | Feeds frequently contain a lone ampersand instead of &amp;. We fix those so the output becomes a valid &. |
| Date timezone | If we find "UT" instead of "UTC" in a date, we change it. This does happen, and it would otherwise make parsing of the date fail completely. |
| Empty XML files | Some RSS feeds return an empty file instead of nothing when there were no additions since the last check. |
| Large number of Blocks | In Core, each RSS feed added to the Aggregator adds a corresponding block. With this version, you must turn on a flag or the block is not made available, which makes the list of blocks a lot shorter if you have a really large number of RSS feeds. |
| Open in the same window | By default, the Core Drupal aggregator creates links that open all the destinations in the same window. Our version opens the links in a new window, so your website stays open when your users visit the destination link. |
| Lack of logs | We enhanced the logging by adding some watchdog() calls and by adding parameters, so you know the name of the RSS feed that generated problems. |
| Limit the GUID | The database limits the item GUID to 255 characters: VARCHAR(255). We clamp that parameter to make sure we don't get an error (MySQL auto-clamps). Although this could cause problems with duplicates, it doesn't seem to ever happen. |
| Limit author and title fields | Like the GUID, the author and title fields are limited to 255 characters. These two fields are clamped when saved in the database. |
| Deleting old items | Somehow, the Core Aggregator module doesn't properly get rid of older items. We work around that problem by adding a loop over all the existing items. This can be slow the first time if you already have many items in your database. Afterward, it should not be that bad since it only deletes the few old items. |
| Hidden feed identifier | With Core, the feed identifier can only be found by editing the feed. It is at times practical to know the identifier (e.g. to use it in insert_view) without having to click on edit or at least look at the edit link in your status bar. We added an ID column. |
| Default update time | The default update time is set to 1 hour, which seems extremely frequent to us. Most RSS feeds do not get updated more than once a day (if not once a week!), so by default we set the update time to 1 day. |
| Never delete items | We support the concept of never deleting anything from your site. Although this is not recommended (the database quickly grows to a size that slows your site down), it works. Choose the option "Never" in the Delete every feature. |
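
As an illustration only, the feed cleanup steps listed above can be sketched in a few lines of Python rather than in the module's PHP. This is not the module's code: the function names (decode_feed, clean_feed, fix_date, clamp), the regular expressions, the Windows-1252 fallback, and the choice to return None for an empty feed are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

XML_DECL = '<?xml version="1.0" encoding="utf-8"?>'
MAX_LEN = 255  # matches the VARCHAR(255) columns for GUID, author and title

STYLESHEET = re.compile(r'<\?xml-stylesheet\b.*?\?>', re.DOTALL)
XML_MARKER = re.compile(r'<\?xml\s')
# "&" that does not start a named or numeric entity such as &amp; or &#8217;
LONE_AMP = re.compile(r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)')
UT_ZONE = re.compile(r'\bUT\b')


def decode_feed(raw):
    """Decode as UTF-8, falling back to Windows-1252 (assumed fallback)."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('cp1252', errors='replace')


def clean_feed(text):
    """Return feed text that a strict XML parser is more likely to accept."""
    if not text.strip():
        return None  # empty file: treated here as "nothing new" (assumption)
    text = STYLESHEET.sub('', text)  # drop <?xml-stylesheet ... ?>
    marker = XML_MARKER.search(text)
    if marker:
        text = text[marker.start():]  # drop data before <?xml ... ?>
    else:
        root = text.find('<rss')
        # marker missing: drop data before <rss (if found) and add the marker
        text = XML_DECL + '\n' + text[max(root, 0):]
    end = text.rfind('</rss>')
    if end != -1:
        # drop everything after </rss> (this also drops valid trailing comments)
        text = text[:end + len('</rss>')]
    # escape lone ampersands; CDATA sections and undefined named
    # entities are not handled by this sketch
    return LONE_AMP.sub('&amp;', text)


def fix_date(date_string):
    """Replace a standalone "UT" zone with "UTC"."""
    return UT_ZONE.sub('UTC', date_string)


def clamp(value, limit=MAX_LEN):
    """Truncate a field so that it fits the database column."""
    return value[:limit]


raw = (b'junk<rss version="2.0"><channel>'
       b'<title>Tom & Jerry\x92s</title></channel></rss>trailing')
cleaned = clean_feed(decode_feed(raw))
title = ET.fromstring(cleaned.encode('utf-8')).findtext('channel/title')
print(clamp(title))
print(fix_date('Tue, 03 Jun 2008 11:05:30 UT'))
```

The order matters in this sketch: the bytes are decoded first, the text is trimmed to the marker and the closing root tag next, and the ampersands are escaped last so that only the part handed to the parser is touched.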

We also added support for transforming RSS items into nodes, and there are many features in that regard. The transformation happens when RSS items are retrieved; on sites that already hold many items, it can be applied to the existing items via cron.

Finally, we included an attempt at using taxonomies instead of the usual categories (only in the nodes created by the module). At this time, the taxonomy terms must be named the same as the aggregator categories.

You can download our version by clicking on the tarball below.

IMPORTANT NOTES

If you find any problem, don't hesitate to post a comment below, email us, or register and post a ticket. Thank you.

You can also have a look at Drupal issue #350667.

| Attachment | Size |
| --- | --- |
| aggregator-6.20-m2osw-1.1.tar_.gz | 23.19 KB |
| aggregator-6.20-m2osw-1.0.tar_.gz | 23.12 KB |