The downloader plug-in downloads CSV, text,
Excel, or RSS data from any website, parses it, transforms it, and then imports it
into your databases.
N.B. The downloader plug-in uses the ASCII
Importer engine to parse the downloaded data.
Each download item is associated
with one or several URLs, and each URL can contain any number of
fields.
Suppose, for example, that you want to import stock quotes from the server
'www.example.com', and that the complete URL to get stock quotes for the symbol
"goog" is "www.example.com/quotes.php?id=goog".
Suppose also that you have to log in first, using the URL
"http://www.example.com/login.php?user=xx&pass=yy", before you can
download any quotes.
In this case, your download item should contain two URLs:
1 – "http://www.example.com/login.php"
2 – "http://www.example.com/quotes.php"
The first URL should contain two fields, one
for the login and another one for the password.
The second URL should contain
a field named 'id'; set this field's type to 'Symbol'.
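As a rough illustration of the example above, the download item can be modelled as a list of steps, each with its own fields. This is only a sketch in Python, not the plug-in's own code: the class names `Field` and `DownloadStep` and the `build` method are hypothetical and exist only to show how the two URLs and their fields fit together.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlencode


@dataclass
class Field:
    """Hypothetical model of one field attached to a URL."""
    name: str
    value: str
    field_type: str = "Custom"  # "Custom" or "Symbol" in this sketch


@dataclass
class DownloadStep:
    """Hypothetical model of one download step (one URL plus its fields)."""
    url: str
    fields: List[Field] = field(default_factory=list)

    def build(self, symbol: Optional[str] = None) -> str:
        # Symbol fields take the current symbol; other fields keep their value.
        params = {
            f.name: symbol if (f.field_type == "Symbol" and symbol) else f.value
            for f in self.fields
        }
        return f"{self.url}?{urlencode(params)}" if params else self.url


# The two steps from the example: log in first, then request the quotes.
steps = [
    DownloadStep("http://www.example.com/login.php",
                 [Field("user", "xx"), Field("pass", "yy")]),
    DownloadStep("http://www.example.com/quotes.php",
                 [Field("id", "goog", field_type="Symbol")]),
]

urls = [step.build(symbol="goog") for step in steps]
for url in urls:
    print(url)
```

The login step keeps its static values, while the 'id' field of the second step is filled with the symbol being downloaded.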
Open the download manager by clicking on
'Download' in the menu bar, then 'Download Manager'.
Click on 'Add' in the
download manager.
'Download Steps' are the URLs that the
downloader must visit in order to download your content.
Click on 'Add URL'
to add a new download step.
You can
associate one or more fields with each 'Download Step' or URL.
First, add a new
URL, and then click on the 'Fields' button under the 'Fields' column. Fields are
used as URL parameters.
To add new fields, click on 'Add
Field'.
To remove a field, select one, and then click on 'Remove Selected
Field'.
To close the fields form, click on 'Close'.
When adding a field, you should specify
four things:
· Field name: The name that will be used in the URL.
· Field value: The value that will be used in
testing and in static fields such as the login and password fields; the value of a
dynamic field is automatically updated during the downloading process.
· Field type: The type associated with the field.
· Field visibility: Indicates whether to
display the field as a URL parameter or not. Check the box to hide the field.
You can still use the field's data in the URL (please refer to 'How to use
brackets').
Example:
URL: "www.example.com/test.php?a=2"
Field Name: b
Field Value: test
Field Type: Custom
In the testing process, the software
will use this URL: "www.example.com/test.php?a=2&b=test".
In the downloading process, if you put the values EX1 and EX2 (one per line)
in the text box inside the tab associated with the field "b",
the downloader will use these URLs to download your content:
"www.example.com/test.php?a=2&b=EX1"
"www.example.com/test.php?a=2&b=EX2"
How to use brackets:
To use a field inside a URL, insert the field name inside brackets.
Let us take the last example: "www.example.com/test.php?a=2".
You can display the value "2", which corresponds to the
field "a", using the following text: [a]
Example: "www.example.com/test_[a].php"
Keywords:
[DATE]: displays the date
[SYMBOL]: displays the symbol
[field_name]: displays the value of any field
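The bracket substitution described above can be pictured with a short Python sketch. It is only an illustration, not the plug-in's implementation: the function name `expand_brackets` is hypothetical, and leaving unknown names untouched is an assumption made here for simplicity.

```python
import re
from typing import Dict


def expand_brackets(template: str, values: Dict[str, str]) -> str:
    """Replace each [name] in the template with the matching field value."""
    def replace(match):
        name = match.group(1)
        # Assumption: a name with no matching field is left as written.
        return values.get(name, match.group(0))

    return re.sub(r"\[([^\[\]]+)\]", replace, template)


# Field "a" has the value "2", as in the example above.
expanded = expand_brackets("www.example.com/test_[a].php", {"a": "2"})
print(expanded)
```

With the field "a" set to "2", the template "www.example.com/test_[a].php" becomes "www.example.com/test_2.php".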
Open the 'Update Download Item'
form.
Click on the icon within the 'Up' column.
After you have added the URL and specified its fields,
click on the cell within the 'Parser'
column and select the content type that matches the content you are parsing.
Click on 'Parser' to open the
'Parser' form.
This form will load some content to help you fill in the
appropriate parsing settings.
Look at the 'ASCII Import' plug-in for more
information.
Excel: Downloads an Excel file and parses the content of every sheet.
RSS: Downloads an RSS feed, then transforms it to CSV.
Zip: Downloads a compressed archive and parses every file included in the archive.
Click on the "Settings" cell to open the settings form.
Within this form you can specify:
- Whether to submit the fields data with a POST or GET method.
- Whether to execute the URL-Script once or for each combination of fields.
For more information on the URL-Script, please visit the appropriate
section.
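To make the POST or GET choice in the settings form more concrete, here is a small Python sketch using the standard library. It only builds the requests and does not send them; the helper name `build_request` is hypothetical, and the login URL and fields are the ones from the example at the top of this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, fields, method="GET"):
    """Build (but do not send) a request carrying the fields data."""
    encoded = urlencode(fields)
    if method.upper() == "POST":
        # POST: the fields travel in the request body, not in the URL.
        return Request(url, data=encoded.encode("ascii"), method="POST")
    # GET: the fields are appended to the URL as a query string.
    separator = "&" if "?" in url else "?"
    return Request(f"{url}{separator}{encoded}", method="GET")


login_fields = {"user": "xx", "pass": "yy"}
get_request = build_request("http://www.example.com/login.php", login_fields, "GET")
post_request = build_request("http://www.example.com/login.php", login_fields, "POST")

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With GET, the fields are visible in the URL; with POST, the URL stays unchanged and the same fields are sent in the body.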
The URL-Script is executed before downloading any data. This
script lets you define dynamic URLs. It is executed for each combination of
fields: if, for example, you are using 10 symbols and a custom
field with 2 values, the script will be executed 20 times, once for each
symbol-value combination. If you check the option to execute the URL-Script
once, then the script will be executed only one time, and you have to define
the URLs inside this script.
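The count of 20 executions is simply the number of symbol and custom value pairs. The following Python lines are a sketch of that arithmetic only; the symbol names are placeholders invented for the illustration.

```python
from itertools import product

# Placeholder data: 10 symbols and a custom field with 2 values.
symbols = [f"SYM{i}" for i in range(1, 11)]
custom_values = ["EX1", "EX2"]

# One script execution per (symbol, custom value) combination.
combinations = list(product(symbols, custom_values))
print(len(combinations))
```

Each additional field multiplies the number of combinations by its number of values.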
In the 'Update Download Item' form, there is
a button named 'Test'.
Click on this button to see how the application will
browse your URLs to import your content.
This tool is designed to help you
find and correct possible problems.
In the 'Update Download Item' form, there is
a button named 'Detect'.
Click on this button to open a browser page. Visit
your content pages to look for the names of all the form fields.
This tool is
designed to help you create your download item.
- Maximum number of download threads to allow:
Specify how many concurrent threads to use during the downloading
process.
- This item should download data every: Specify
a number of days for your download item, and the application will alert
you each time the download item needs to be run. A column named 'Need a run' in
the 'Download Manager' form tells you whether the download item needs a run or
not.
- Number of seconds to wait between requests
- Database to use for the 'last symbol date': Use
this option to download only missing quotes or data. Select a database so the
downloader can get the last date from the symbol data in the specified
database.
Specify a list of proxy URLs to use when
downloading data.
Each line represents a proxy URL.
Translate symbols before downloading
data.
Change the symbol name in the URL that will be used to download
data.
Each line represents one pair of source and
destination symbols.
The "offset dates" button lets you
offset date components.
If, for example, Yahoo expects the month component to
vary from 0 to 11, where 0 means January and 11 means December, then in 'Offset
Dates', set the 'Offset Month' numeric box to one.
To open the downloader form, select
"Download -> Download Manager".
Select a download
item then click on "Open".
Click on the button "Start" to begin the
downloading process.
The progress column shows you the download progress,
and each time a download is complete, the corresponding row is
unchecked.
In the "download items" grid, the "Last" column indicates the
number of days since the last execution of the corresponding
item.
If you have specified a symbol field in the
'Update Download Item' form, then a 'Symbols' tab will appear.
Select the
symbols you would like to download.
If you have specified a Date field in the
'Update Download Item' form, then the 'Dates' tab will appear.
In this tab, select the start date, the end
date, the format, and the interval, then click on 'Save'.
The 'Dates' text box will show you the
dates that are going to be used in the download.
Format text box:
[Y]: year, example: 2008
[Y2]: year, example: 08
[M]: month, example: 01 or 10
[M2]: month, example: 1 (without the 0) or 10
[D]: day, example: 01 or 22
[D2]: day, example: 1 (without the 0) or 22
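The format keywords listed above can be illustrated with a small Python sketch. It is not the plug-in's own formatter: the function name `format_date` is hypothetical, and the sketch assumes that each keyword is simply replaced by the corresponding text.

```python
from datetime import date


def format_date(pattern: str, d: date) -> str:
    """Replace the [Y], [Y2], [M], [M2], [D], [D2] keywords in a pattern."""
    tokens = {
        "[Y]": f"{d.year:04d}",         # 2008
        "[Y2]": f"{d.year % 100:02d}",  # 08
        "[M]": f"{d.month:02d}",        # 01 or 10
        "[M2]": str(d.month),           # 1 or 10
        "[D]": f"{d.day:02d}",          # 01 or 22
        "[D2]": str(d.day),             # 1 or 22
    }
    for token, text in tokens.items():
        pattern = pattern.replace(token, text)
    return pattern


long_form = format_date("[Y]-[M]-[D]", date(2008, 1, 2))
short_form = format_date("[D2]/[M2]/[Y2]", date(2008, 1, 2))
print(long_form, short_form)
```

For 2 January 2008, the pattern "[Y]-[M]-[D]" gives "2008-01-02" and the pattern "[D2]/[M2]/[Y2]" gives "2/1/08".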
In the 'Update Download Item' form, you can also
specify date components within the URL.
Example for Yahoo: http://ichart.finance.yahoo.com/table.csv?a=[2M2]&b=[2D2]&c=[2Y]&d=[M2]&e=[D2]&f=[Y]&g=d&ignore=.csv
The date formats are a little different here:
[Y], [Y2], [M], [M2], [D], [D2] refer to the start date, while
[2Y], [2Y2], [2M], [2M2], [2D], [2D2] refer to the end date.
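As a sketch of how start and end date components could be filled in, the Python function below replaces the plain keywords with the start date and the keywords prefixed with 2 with the end date. The function name `expand_date_range` and the example.com URL are hypothetical; the plug-in's actual behaviour (for instance with date offsets) may differ.

```python
import re
from datetime import date


def expand_date_range(url: str, start: date, end: date) -> str:
    """Fill [Y]/[M]/[D] style keywords (start) and [2Y]/[2M]/[2D] ones (end)."""
    def component(d: date, letter: str, short: bool) -> str:
        if letter == "Y":
            return f"{d.year % 100:02d}" if short else f"{d.year:04d}"
        number = d.month if letter == "M" else d.day
        return str(number) if short else f"{number:02d}"

    def replace(match):
        # A leading "2" selects the end date; a trailing "2" the short form.
        chosen = end if match.group(1) else start
        return component(chosen, match.group(2), bool(match.group(3)))

    return re.sub(r"\[(2?)([YMD])(2?)\]", replace, url)


template = "http://www.example.com/quotes.php?from=[Y]-[M]-[D]&to=[2Y]-[2M2]-[2D2]"
range_url = expand_date_range(template, date(2008, 1, 5), date(2008, 3, 7))
print(range_url)
```

With a start date of 5 January 2008 and an end date of 7 March 2008, the query string becomes "from=2008-01-05&to=2008-3-7".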
If you have specified date components within URLs, a 'Start & End Dates' tab
will appear in the 'Download Data' form.
Select 'Last symbol data' to download only
missing quotes or data.
The database that will be used to get the last
downloaded date for a particular symbol can be set in the 'Download Settings'
form.
If you have specified a custom field in the
'Update Download Item' form, then a tab containing a text box will
appear.
Each line inside this text box corresponds to a value.
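To show how the lines of this text box turn into download URLs, here is a minimal Python sketch based on the earlier example ("www.example.com/test.php?a=2" with a custom field named "b"). The helper name `append_field` is hypothetical, and the sketch assumes that the field is visible, so it is appended as a URL parameter.

```python
from urllib.parse import urlencode


def append_field(url: str, name: str, value: str) -> str:
    """Append one field to a URL as a query parameter."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({name: value})}"


base_url = "www.example.com/test.php?a=2"

# Testing process: the static field value ("test") is used.
test_url = append_field(base_url, "b", "test")

# Downloading process: one URL per line of the custom field's text box.
text_box = "EX1\nEX2"
download_urls = [append_field(base_url, "b", line)
                 for line in text_box.splitlines() if line.strip()]

print(test_url)
print(download_urls)
```

Each non-empty line produces one URL, so two lines in the text box result in two downloads.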
If you need to specify the URLs to download dynamically,
use the URL-Script to programmatically specify which
URLs or URL paths to use.
A URL path is a sequence of URLs. It is used if,
for example, the website requires that you load some other
pages (a login page, for instance) before loading a URL.
A URL is added using the following function:
Functions.AddURL
A URL path is added using the following functions:
Functions.CreateURLPath: creates a URL path.
AddURL: adds a URL to a URL path.
Functions.AddURLPath: adds a URL path.
The Functions.Net class contains different methods to download, extract,
and parse HTML documents.
The engine looks for the URLs specified in the URL-Script,
downloads them, and splits the content using the parser settings. It then passes
the split content to the Pre-Script, which allows you to modify the
provided data. Finally, the content is parsed and, before it is added to the quotes or
custom databases, passed to the Post-Script.
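The order of these stages can be summarised in a short Python sketch. Everything in it is hypothetical: the function `run_pipeline`, its parameters, and the stub stages stand in for the engine, the parser settings, and the scripts, and no real download takes place.

```python
def run_pipeline(urls, fetch, split, pre_script, parse, post_script):
    """Run the stages in the order described above and return the stored rows."""
    stored = []
    for url in urls:                          # URLs defined by the URL-Script
        rows = split(fetch(url))              # download, then split the content
        rows = pre_script(rows)               # Pre-Script may modify the split data
        records = parse(rows)                 # the content is parsed
        stored.extend(post_script(records))   # Post-Script runs before storage
    return stored


# Stub stages, for illustration only.
def fake_fetch(url):
    return "goog;100\n\ngoog;101"             # pretend this was downloaded

def split_lines(text):
    return [line.split(";") for line in text.splitlines()]

def drop_empty_rows(rows):
    return [row for row in rows if any(cell.strip() for cell in row)]

def parse_rows(rows):
    return [{"symbol": row[0], "close": float(row[1])} for row in rows]

def upper_case_symbols(records):
    return [{**record, "symbol": record["symbol"].upper()} for record in records]


result = run_pipeline(["www.example.com/quotes.php?id=goog"], fake_fetch,
                      split_lines, drop_empty_rows, parse_rows, upper_case_symbols)
print(result)
```

Here the stand-in Pre-Script removes the empty row before parsing, and the stand-in Post-Script adjusts the parsed records just before they would be stored.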