The Python script extracts all the pages from a given eMag link in less than 20 seconds and saves the data to an Excel file. In the example below, the script extracted all 25 pages, with approximately 1,500 products, from the eMag mobile phones category (https://www.emag.ro/telefoane-mobile/c).
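The fetch-every-page-then-export flow can be sketched with one thread per listing page and a pandas export at the end. This is a minimal sketch, not the script's actual code. The pagination pattern `.../telefoane-mobile/p{page}/c` is an assumption about eMag's URLs, and `fetch_page`, `page_url`, `scrape_all_pages` and `save_to_excel` are hypothetical names. `fetch_page(url)` stands in for whatever downloads and parses one page.

```python
import threading

import pandas as pd

# Assumed URL scheme: page 1 is the bare category URL, later pages add /p{n}/.
BASE_URL = "https://www.emag.ro/telefoane-mobile/c"
PAGE_URL = "https://www.emag.ro/telefoane-mobile/p{page}/c"


def page_url(page):
    """Return the URL of a 1-based listing page (hypothetical pattern)."""
    return BASE_URL if page == 1 else PAGE_URL.format(page=page)


def scrape_all_pages(n_pages, fetch_page):
    """Fetch pages 1..n_pages concurrently, one thread per page.

    fetch_page(url) is a caller-supplied function returning a list of
    product dicts. Rows are stored per page so the final order does not
    depend on which thread finishes first.
    """
    results = {}
    lock = threading.Lock()

    def worker(page):
        rows = fetch_page(page_url(page))
        with lock:  # guard the shared dict while threads write to it
            results[page] = rows

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(1, n_pages + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every page has been downloaded

    # Flatten the per-page lists back into page order.
    return [row for page in sorted(results) for row in results[page]]


def save_to_excel(rows, path):
    """Write the rows to an .xlsx file (pandas needs openpyxl installed for this)."""
    df = pd.DataFrame(rows)
    df.to_excel(path, index=False)
    return df
```

A call such as `save_to_excel(scrape_all_pages(25, my_fetch), "emag_phones.xlsx")` would cover the 25-page example, with `my_fetch` being your own page parser. Starting one thread per page is reasonable for a few dozen pages. For larger crawls, a bounded pool such as `concurrent.futures.ThreadPoolExecutor(max_workers=...)` would cap the number of simultaneous requests.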
By sending a different User-Agent header with each request, using https://httpbin.org/user-agent to check the header being sent, I managed to bypass the anti-spider/anti-scraping measures eMag currently uses. Without this, the website locked me out after several requests.
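Rotating the User-Agent header can be sketched with Requests as follows. This is a minimal sketch under stated assumptions. The three User-Agent strings are illustrative examples rather than the ones the script uses, and `next_headers`, `build_request` and `fetch` are hypothetical helper names. The lock keeps the rotation consistent when the helpers are called from several threads at once.

```python
import itertools
import threading

import requests

# Illustrative desktop User-Agent strings; any realistic pool would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

_ua_cycle = itertools.cycle(USER_AGENTS)
_ua_lock = threading.Lock()


def next_headers():
    """Return request headers carrying the next User-Agent in the rotation."""
    with _ua_lock:  # serialize access to the shared iterator across threads
        ua = next(_ua_cycle)
    return {"User-Agent": ua}


def build_request(url):
    """Prepare (but do not send) a GET request with a rotated User-Agent."""
    return requests.Request("GET", url, headers=next_headers()).prepare()


def fetch(session, url, timeout=10):
    """Send a GET request with a fresh User-Agent and return the response."""
    return session.get(url, headers=next_headers(), timeout=timeout)
```

To check what the server actually receives, `with requests.Session() as s: print(fetch(s, "https://httpbin.org/user-agent").json())` should print a JSON object whose `user-agent` field matches the header that was sent, and each call should show the next string in the pool.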
Libraries used:
- Pandas
- Requests
- Threading
- BeautifulSoup
- Regular expressions (re)
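BeautifulSoup and regular expressions are typically combined so that BeautifulSoup locates each product card and a regex turns the price text into a number. The sketch below is an illustration only. The CSS classes `card-item`, `card-v2-title` and `product-new-price` are assumptions about eMag's markup, which may differ or change. The Romanian price format ("1.299,99 Lei", with a dot for thousands and a comma for decimals) is also assumed. `parse_price` and `parse_products` are hypothetical names.

```python
import re

from bs4 import BeautifulSoup

# Digits with optional '.' thousands separators, then an optional ',dd' decimal part.
PRICE_RE = re.compile(r"(\d[\d.]*)(?:,(\d{2}))?")


def parse_price(text):
    """Convert a price such as '1.299,99 Lei' to 1299.99; return None if no number."""
    m = PRICE_RE.search(text)
    if m is None:
        return None
    whole = m.group(1).replace(".", "")  # drop thousands separators
    cents = m.group(2) or "00"
    return float(f"{whole}.{cents}")


def parse_products(html):
    """Extract name, link and price from each (assumed) product card on a page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.card-item"):
        title = card.select_one("a.card-v2-title")
        price = card.select_one("p.product-new-price")
        if title is None:
            continue  # skip cards without a product title
        products.append({
            "name": title.get_text(strip=True),
            "url": title.get("href"),
            "price_lei": parse_price(price.get_text()) if price else None,
        })
    return products
```

Keeping the numeric conversion in its own function makes it easy to test without any HTML. It also means only the regex has to change if the site switches price formats.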