RSS feeds transformation

Updated on 2009-07-25

QuantShare can download RSS feeds and convert them into CSV content for easier parsing. The downloader performs this task automatically: it loads the specified RSS feeds, converts them into CSV content, and then applies the CSV parsing and insertion rules to add the data to your local databases.

An RSS feed generally comes in a format like the one below. This example shows a single element (item) of the Yahoo news feed for the Google ticker symbol:

<item>
<title>This Week Dumbest Stock Moves (at Motley Fool)</title>
<link>https://www.quantshare.com</link>
<description>The description</description>
<pubDate>Fri, 24 Jul 2009 19:37:46 Etc/GMT</pubDate>
</item>

After reading and parsing the RSS feed content for the Google stock, the downloader converts every 'item' tag into a row. The children of each 'item' node are separated by a string or a character of your choice.

The result will be: (assuming we used a semi-colon ';' as a separator)

This Week Dumbest Stock Moves (at Motley Fool) ; https://www.quantshare.com ; The description ; Fri, 24 Jul 2009 19:37:46 Etc/GMT
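
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not QuantShare's actual implementation: the function name items_to_rows and the wrapping rss/channel elements around the sample item are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The sample item from above, wrapped in a minimal feed (the wrapper is assumed).
RSS_SAMPLE = """<rss version="2.0"><channel>
<item>
<title>This Week Dumbest Stock Moves (at Motley Fool)</title>
<link>https://www.quantshare.com</link>
<description>The description</description>
<pubDate>Fri, 24 Jul 2009 19:37:46 Etc/GMT</pubDate>
</item>
</channel></rss>"""


def items_to_rows(rss_text, separator=";"):
    """Turn every <item> element into one separator-joined row (hypothetical helper)."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        # One field per child node, in document order.
        fields = [(child.text or "").strip() for child in item]
        rows.append(f" {separator} ".join(fields))
    return rows


rows = items_to_rows(RSS_SAMPLE)
print(rows[0])
```

With the semi-colon separator, the printed row has the same shape as the result line shown above.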

The next step is to define the CSV parser settings: add columns and assign a database field to each of them. For example, in the line above, the fourth column must be associated with the 'date' field of the database of your choice. The transformed content is then parsed, and the referenced database fields are filled.
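
As a rough illustration of the column-to-field mapping, the Python sketch below splits the transformed row and converts the fourth column into a date. The field names in COLUMNS and the helper parse_row are hypothetical; in QuantShare the mapping is defined in the CSV parser settings, not in code like this. The sketch also assumes that the 'Etc/GMT' label means UTC.

```python
from datetime import datetime, timezone

ROW = ("This Week Dumbest Stock Moves (at Motley Fool) ; https://www.quantshare.com"
       " ; The description ; Fri, 24 Jul 2009 19:37:46 Etc/GMT")

# Hypothetical mapping: one field name per column, in column order.
COLUMNS = ["title", "link", "description", "date"]


def parse_row(row, separator=";"):
    """Split a transformed row and map each column to a named field."""
    values = [value.strip() for value in row.split(separator)]
    record = dict(zip(COLUMNS, values))
    # Drop the trailing zone label (assumed to be UTC) before parsing the date.
    stamp, _zone = record["date"].rsplit(" ", 1)
    parsed = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    record["date"] = parsed.replace(tzinfo=timezone.utc)
    return record


record = parse_row(ROW)
print(record["date"].isoformat())
```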

Common problems and solutions:

How do I deal with node content that spans more than one line?
Multi-line node content is transformed into a single line of text, and each 'new line' character is replaced with the text '||'. You can write a Pre-Script formula to turn '||' back into a 'new line' character, leave it as it is, or perform any other transformation.
Example: The Google Finance RSS feed, which provides news for stocks, has multi-line node content.
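
The round trip can be pictured with the short Python sketch below. It is not Pre-Script code and does not use any QuantShare API; flatten_lines and restore_lines are hypothetical names that only mimic the '||' substitution described above.

```python
def flatten_lines(text, marker="||"):
    """Collapse multi-line text into one line, marking each line break."""
    return marker.join(text.splitlines())


def restore_lines(text, marker="||"):
    """Reverse the substitution: turn each marker back into a line break."""
    return text.replace(marker, "\n")


original = "First line of the description\nSecond line of the description"
flat = flatten_lines(original)
print(flat)
print(restore_lines(flat) == original)
```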

What if a node’s content contains the separator I choose?
The separator you choose is removed from every node’s content. So whatever separator you pick, your content will always be correctly parsed.
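
A minimal Python sketch of why this keeps parsing safe is shown below. The helpers strip_separator and build_row are assumptions made for illustration, not QuantShare functions: because the separator is removed from each field first, splitting the row always yields exactly one value per node.

```python
def strip_separator(value, separator=";"):
    """Remove every occurrence of the separator from one node's content."""
    return value.replace(separator, "")


def build_row(fields, separator=";"):
    """Join cleaned fields so the separator only ever marks column boundaries."""
    return separator.join(strip_separator(field, separator) for field in fields)


fields = ["Profit up; shares jump", "https://www.quantshare.com"]
row = build_row(fields)
print(row)
print(len(row.split(";")) == len(fields))
```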

What if the description or title contains HTML tags?
All HTML tags are removed from the nodes' content; only the visible text is kept.
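
One way to picture this clean-up is the Python sketch below, which uses the standard html.parser module to keep only text data. It is an illustration under that assumption; the source does not say how QuantShare strips tags, and strip_tags is a hypothetical helper.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return the visible text of an HTML fragment, with all tags dropped."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<p>Shares of <b>Google</b> rose.</p>"))
```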