Skip to content

MaryanMorel/inseeBdmScraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

INSEE BDM Scraper

Python utility based on Scrapy & Pandas for BDM data download automation (as there is no API for now). This utility is meant to download monthly series as they are the most common in the BDM database. If you want to extract quarterly or weekly data, you may have to hack the spider.

Please read the INSEE copyright notice before any scraping.

Tested on ubuntu 13.10

Installation

Just clone the repository wherever you want (e.g. ~/scraping/)

git clone git@github.com:MaryanMorel/inseeBdmScraper.git

or

git clone https://github.com/MaryanMorel/inseeBdmScraper.git

Be sure to have the all the dependencies installed (see below)

Usage

Insee BDM series are identified by an idbank. You can find it when you search for data:

insee_idbank_search.png

When you look at data:

insee_idbank_table.png

Or in the url of the series view page:

http://www.bdm.insee.fr/bdm2/affichageSeries.action?idbank=001565530&codeGroupe=1007

In order to use the scraper, just run the following commands in your terminal:

cd ~/path/to/inseeBdmScraper			
scrapy crawl insee -a idBank=001565530

The first line set your working directory to inseeBdmScraper root directory, the second line execute the scraper. By default, Scraped data will be available in your home directory (.csv, utf8 encoding). You can change this behaviour by editing the OUTPUT_PATH variable in settings.py.

Crawl responsibly by identifying yourself (and your website) on the user-agent, edit the settings.py file :

USER_AGENT = 'inseeBdmScraper (+http://www.yourdomain.com)'

Dependencies

Python 2.7.x
Scrapy 0.22.2
Pandas 0.13.0

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

About

Python utility for INSEE BDM data automatic downloading.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages