Bruna Ferreira Santos
Data Analysis Full Time FEB 2021, Paris, February 22, 2021
The goal of this project was to collect information about Caterpillar's new equipment through web scraping.
GET method:
- Two data accesses with the requests module (headers and JSON normalization)
- One data access with Selenium and BeautifulSoup (webdriver, execute script, find all tables)
- Static
- RESTful API based
- Dynamic
- To access the data, I first retrieved the names of all equipment types displayed on the page, then cleaned them and stored them in a list.
- Then, I looped over that list to access the 3 main characteristics of each model of the selected equipment type, as shown on the page.
- Once the information was in a DataFrame, I started the data cleaning and feature organization.
- With the data in adequate shape, I enabled access to MySQL and created the Excel file.
- To go further, I used Selenium to open the page of each model across all equipment types, where I had to click 'expand' to reveal the information.
- I looped through all the pages, retrieving the information and storing it (concat) in a DataFrame.
- Data Cleaning (in progress)
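
As a rough illustration of the normalize, loop, and concat steps listed above, here is a minimal sketch in Python with pandas. It is not the project's actual code. The payloads, the field names (`model`, `specs`, `net_power`, `operating_weight`), and the helper names `payload_to_frame` and `build_catalog` are hypothetical; the in-memory dictionary stands in for the JSON that the real GET requests would return for each equipment type.

```python
import pandas as pd

# Hypothetical payloads standing in for the JSON returned per equipment type.
# The field names are assumptions, not Caterpillar's real schema.
SAMPLE_PAYLOADS = {
    "Excavators": [
        {"model": "Model A",
         "specs": {"net_power": "100 hp", "operating_weight": "20000 lb"}},
        {"model": "Model B",
         "specs": {"net_power": "150 hp", "operating_weight": "30000 lb"}},
    ],
    "Dozers": [
        {"model": "Model C",
         "specs": {"net_power": "80 hp", "operating_weight": "18000 lb"}},
    ],
}


def payload_to_frame(equipment_type, records):
    """Flatten one equipment type's nested records into a tabular frame."""
    # json_normalize turns nested keys into columns such as "specs_net_power".
    frame = pd.json_normalize(records, sep="_")
    # Keep track of which equipment type each row came from.
    frame.insert(0, "equipment_type", equipment_type)
    return frame


def build_catalog(payloads):
    """Loop over every equipment type and stack the per-type frames."""
    frames = [payload_to_frame(name, records) for name, records in payloads.items()]
    return pd.concat(frames, ignore_index=True)


catalog = build_catalog(SAMPLE_PAYLOADS)
print(catalog)
```

Collecting the per-type frames in a list and calling `pd.concat` once avoids repeatedly copying a growing DataFrame inside the loop. From here, `catalog.to_excel(...)` and `catalog.to_sql(...)` (the latter with a database connection such as a SQLAlchemy engine) would be one possible way to produce the Excel file and load MySQL; the export code actually used in the project is not shown in this summary.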