Insight Things

A scientific blog revealing the hidden links which shape our world

Download Historical Share, Trust and Index Quotations

One of the more interesting applications of statistics is the analysis of stock quotation data. A solid analysis can give you an advantage when deciding which shares to buy. Also you can search for patterns; which shares move together with others..? However, the first step is collecting all the data which are required – and this is a cumbersome task!

Let's download historical stock, trust and index quotations for analysis purposes

Let’s download historical stock, trust and index quotations for analysis purposes

I found lots of discussions in various forums, on StackOverflow and so on. But nobody provided a solution for downloading a bulk of data at once. As a consequence, I created my own solution (which you can download here) based on the Google Finance API.

Please notice that using this tool might violate against the Google Finance rules declared in their disclaimer, because it downloads and saves the data.

Update: Google Finance API has been disabled in 2018. We now use Alpha Vantage. You need to get a free API key.

Basics: Input and Output

The tool is a simple command line application. Its input consists of a CSV file where the first three columns are “Exchange” (containing the exchanges’s abbreviation as stated in the Google Finance Disclaimer), the “Symbol” and the “Name” of the shares, trusts or indices. Further columns can be added and will appear in the output as well. You configure the input CSV file in “StockDataDownloader.exe.config” by manipulating the value for the key “StockNames”. In the corresponding subdirectory you will find input files for NYSE, NASDAQ and ETR which I compiled from the exchanges’ official websites. Running the program after download and configuring the API key should result in output similar to the following.

Screenshot of the tool while downloading past NASDAQ quotations

Screenshot of the tool while downloading past NASDAQ quotations

If you didn’t adjust the settings for “OutputDir”, then you will find a CSV file in the directory “Output”. If you want to pause downloading then just hit a key. Restarting the applications will make the process continue with those stocks for which no data is present at this point in time. This is of particular interest when Google Alpha Vantage finds out that you are a bot. The console will show dozens of errors and you are recommended to stop the process and continue later (~1min).

Permanent errors after being perceived as a bot by Google

Permanent errors after being perceived as a bot by Google

Notice that, as a consequence, an update of the data (e. g. some days later) can only be done if you delete the output file or change the output to another name in StockDataDownloader.exe.config. Otherwise Stock Data Downloader won’t collect new quotations.

Advanced Configuration

In the configuration file StockDataDownloader.exe.config you have further options on hand:

  • StockNames gives you the chance to set the input CSV file with format [path]\[filename]
  • StartDate defines the oldest data which will be delivered by the Google Finance API. You receive a configuration with default “Jan+01%2C+2000” which you can modify as long as you keep the format fixed.
  • ParallelConnections allows running parallel queries. This is not recommended, because Google will detect that you’re a bot even faster then.
  • Delay defines the delay (in milliseconds) after querying the data for one share. You can set it to a “human-realistic” value in order to make Google think that you are actually a human being. The default is 5000. Otherwise you will run into the Alpha Vantage download limit.
  • OutputDir lets you set the directory of the tool’s output file.
  • OutputFile names the actual file which will be created.
  • IncludeLatestData can be set to “false” if you don’t want to receive all data available. This could be the case if you want to skip the data of the current day, because Google Finance API may not yield valid high and low price information before trading has stopped.
  • DayLimitToIgnore is enormously important if you want to skip shares for which there has no data been available for a certain number of days (and which therefore may not exist anymore).
  • LocalizeDecimalSeparator defines whether to convert Google’s decimal dot into a comma. This is not recommended, especially if you want to process your data using R – so leave it “false”.
  • Function defines what kind of data should be downloaded. The default is TIME_SERIES_DAILY.
  • Mode can be “full” (default) or “compact”. The latter will limit the number of records sent to ~100.
  • ApiKey should contain your API key. You can get it for free here.

A word on data quality

I guess they have their reasons at Google Finance to list issues regarding data quality in their disclaimer (including missing and invalid data). For me these statements came too late; I found out about the data quality the hard way ๐Ÿ™‚ Actually, for some symbols you won’t find any data although the share is still in trade. Further there are days, weeks and even past years missing for some of the shares which I investigated. My advise for you is to think about what data actualy creates a benefit for your purposes and then clean the data before working with them. Also, try to counter-check each and every result you yield, because it could point to problems with data quality.

6 Comments

  1. Is this based on the Google Finance API which they deprecated some years ago and announce would be removed – though it has not actually been removed yet.

    • Jan Rothkegel

      December 18, 2017 at 3:51 pm

      Yes, your’re right. However, we should take into account that they will actually do it some day. Enjoy the pleasure of market data as long as it is available ๐Ÿ˜‰

  2. Hi! this is just what i needed!
    But..it’s collecting only the last 99 dates? How can I change it to get older data?
    Thanks!

    • Jan Rothkegel

      November 23, 2018 at 5:28 pm

      I’m glad that you like the tool. As mentioned, we migrated to Alpha Vantage API. Most probably you are using the default dummy key of Alpha Vantage which comes with some restrictions (e. g. the number of records delivered by the API). You need to register and enter your key to the configuation. Did you do so already?

      • I also receive only the last 100 databars. (when I use “timeseries_daily”). Reason is not the key because if I download via URL and browser I get what I need. Alphavantage has a variable “full” which means last 20 years or “compact” means last 100 bars.
        The default is obviously “compact”.
        I tried to enter this into the config sheet, however, it seems the program is not designed accept more variables.
        This feature should be a minor change and a major improvent to the program.
        For now I use my own script, written in “autoitscript.com”, but in order to share it I would need to make it more nice and pretty.

        • Jan Rothkegel

          May 8, 2020 at 6:28 pm

          Hi Roland,

          most probably this option was not available back in 2017. Now that you informe me, I updated the tool.

          Thanks for the hint and have fun with the new version ๐Ÿ™‚

          Best
          Jan

Leave a Reply

Your email address will not be published.

*

© 2024 Insight Things

Theme by Anders NorenUp ↑