
A headline finder for the G1 website: for each news item it collects the headline, link, date, and associated image.


slocksert/g1_WebScrapper


G1 WEBSCRAPER

What does it do?

  • It finds headlines on the G1 website and, for each news item, collects the headline, link, date, and associated image.
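For illustration, each scraped item can be modeled as one record with those four fields. This is a hypothetical model, not the repository's code: the `NewsItem` class, its field names, and the `to_csv` helper are assumptions, and the date is kept as the raw scraped string.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical record for one G1 headline (field names are illustrative)."""
    headline: str
    link: str
    date: str   # raw date string as scraped, not parsed
    image: str  # URL of the associated image


def to_csv(items: list[NewsItem]) -> str:
    """Serialize records to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(NewsItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = NewsItem(
        headline="Example headline",
        link="https://example.com/news/1",
        date="2024-01-01",
        image="https://example.com/news/1.jpg",
    )
    print(to_csv([sample]))
```

Freezing the dataclass makes records hashable, which is convenient if items are later collected into sets for de-duplication.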

How was it made?

  • The software uses SQLAlchemy for database access and FastAPI as its web framework. It scrapes news data from the G1 website, stores it in a CSV file, and then checks whether each news item already exists in the database before inserting the new ones.
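The store-then-deduplicate step can be sketched as follows. This is a minimal illustration, not the repository's implementation: it swaps the project's SQLAlchemy/MySQL setup for Python's built-in `sqlite3` with an in-memory database so it runs on its own, and it assumes (hypothetically) that the `link` column uniquely identifies a news item. The `news` table and the `insert_new_rows` function are illustrative names.

```python
import csv
import io
import sqlite3


def insert_new_rows(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert CSV rows whose link is not yet in the `news` table; return how many were added."""
    # Load the links already stored so each membership check is a set lookup.
    existing = {link for (link,) in conn.execute("SELECT link FROM news")}
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["link"] in existing:
            continue  # already in the database, skip it
        conn.execute(
            "INSERT INTO news (headline, link, date, image) VALUES (?, ?, ?, ?)",
            (row["headline"], row["link"], row["date"], row["image"]),
        )
        existing.add(row["link"])  # also skips duplicates inside the same CSV
        added += 1
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE news (headline TEXT, link TEXT PRIMARY KEY, date TEXT, image TEXT)")
    scraped = (
        "headline,link,date,image\n"
        "First,https://example.com/1,2024-01-01,https://example.com/1.jpg\n"
        "Second,https://example.com/2,2024-01-01,https://example.com/2.jpg\n"
    )
    print(insert_new_rows(conn, scraped))  # 2: both rows are new
    print(insert_new_rows(conn, scraped))  # 0: nothing new on a second run
```

Checking in Python keeps the logic database-agnostic; letting the database enforce a unique constraint on the link column would move the check into the database instead.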

How do I use it?

  • Clone this repository:
$ git clone https://github.com/slocksert/g1_WebScrapper.git
  • Install the dependencies with Poetry and activate the Poetry environment:
$ poetry install

  • Start the MySQL database using a docker-compose file:

    • Create a .env file that defines these variables:
      • MYSQL_HOST
      • MYSQL_ROOT_PASSWORD
      • MYSQL_DATABASE
      • MYSQL_PORT
    • Start the MySQL container:

$ docker compose up
  • Start the WebScraper:
$ python3 app/main.py
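The `.env` variables listed above are what the scraper needs to reach MySQL. A minimal sketch of turning them into a SQLAlchemy-style connection URL, assuming (hypothetically) that the app connects as the `root` user through the `pymysql` driver; neither choice is confirmed by this README, and `build_mysql_url` is an illustrative name.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote_plus


def build_mysql_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a SQLAlchemy-style MySQL URL from the variables defined in .env."""
    host = env["MYSQL_HOST"]
    port = env.get("MYSQL_PORT", "3306")  # 3306 is MySQL's default port
    database = env["MYSQL_DATABASE"]
    # Percent-escape the password so characters like '@' or '/' don't break the URL.
    password = quote_plus(env["MYSQL_ROOT_PASSWORD"])
    # Assumption: connect as root using the PyMySQL driver.
    return f"mysql+pymysql://root:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    example_env = {
        "MYSQL_HOST": "localhost",
        "MYSQL_ROOT_PASSWORD": "p@ss/word",
        "MYSQL_DATABASE": "g1",
        "MYSQL_PORT": "3306",
    }
    print(build_mysql_url(example_env))
    # mysql+pymysql://root:p%40ss%2Fword@localhost:3306/g1
```

The function only reads a mapping; something must first load `.env` into the process environment (python-dotenv's `load_dotenv()` is one common option).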

To inspect the stored data, install a database administration tool.

