tutaru99/web-scraper


web-scraper

scraping the world

@TODO

  • Deeper-level scraping (article -> link -> article content)
  • Scraping results are stored in the database
  • Duplicate detection: each item is scraped only once
  • Implement rate limiting where needed
  • Look into CAPTCHAs and how cheerio can handle them
  • Automatic periodic check for new results (?)
  • Optional alternative scraping with Puppeteer for dynamic websites (?)
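
A rough sketch of how the duplicate-detection and rate-limiting items above could fit together. Nothing here comes from the repository: `createScrapeQueue`, `scrapeItem` and `minDelayMs` are hypothetical names, the "seen" set lives in memory only (the database item would replace it), and the actual fetching/parsing is left to whatever function is passed in.

```javascript
// Hypothetical helper, not part of this repo.
// scrapeItem: async function (url) => result, supplied by the caller
//             (e.g. something that fetches a page and parses it).
// minDelayMs: assumed minimum gap between the starts of two requests.
function createScrapeQueue(scrapeItem, { minDelayMs = 1000 } = {}) {
  const seen = new Set();          // URLs already queued or scraped
  let lastStart = 0;               // timestamp of the last request start
  let chain = Promise.resolve();   // runs requests one after another

  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  return function enqueue(url) {
    // Normalise the URL so trivially different spellings share one key.
    const key = new URL(url).href;

    // Duplicate detection: a URL that was seen before is skipped.
    if (seen.has(key)) return Promise.resolve(null);
    seen.add(key);

    const run = chain.then(async () => {
      // Rate limiting: wait until minDelayMs has passed since the last start.
      const wait = lastStart + minDelayMs - Date.now();
      if (wait > 0) await sleep(wait);
      lastStart = Date.now();
      return scrapeItem(key);
    });

    // Keep the queue going even if one item fails.
    chain = run.catch(() => {});
    return run;
  };
}
```

Possible usage, assuming some `scrapeArticle(url)` function exists: `const enqueue = createScrapeQueue(scrapeArticle, { minDelayMs: 2000 });` and then `await enqueue(link)` for every link found. A repeated link resolves to `null` instead of being scraped again.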