Skip to content

bjagrelli/metal_archives_scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Metal Archives Scraper

Overview

The Metal Archives Scraper is a powerful web scraping tool designed to extract band, album, and song information from the Metal Archives website (https://www.metal-archives.com/). The scraper utilizes multithreading for efficient and parallel scraping, allowing you to quickly gather a comprehensive dataset of metal music-related details.

Features

  • Band, Album, and Song Scraping: Extracts essential information about bands, albums, and songs, including details like band name, country of origin, genre, album name, release year, song titles, and more.

  • Efficient Multithreading: Utilizes concurrent execution to speed up the scraping process, allowing for faster data extraction across multiple threads.

  • Data Cleaning and Refinement (Upcoming): Upcoming updates will include data cleaning and refinement processes to ensure accurate and high-quality scraped data.

  • User-Friendly Interface (Upcoming): Future versions may introduce a user-friendly graphical interface for easier configuration and monitoring of the scraping process.

Getting Started

  1. Clone this repository to your local machine.

  2. Install the required Python packages by running: pip install -r requirements.txt

  3. Update the headers and database variables in the __main__ section of scraper.py with your desired user agent and database name.

  4. Run the scraper: python scraper.py

  5. Monitor the terminal for progress updates and scraped data. You can adjust the alphabet list in the get_all_band_links method to scrape specific letters or ranges.

What's Coming Next

We are committed to continuously improving the Metal Archives Scraper to provide users with a more comprehensive and efficient experience. Here's a glimpse of what you can expect in the upcoming releases:

Data Cleaning and Refinement

As we move forward, our focus will be on enhancing the quality of the scraped data. We will implement data cleaning and refinement processes to ensure reliable and valuable information.

Enhanced User Control

We will introduce user-configurable options to customize the scraping process according to your needs. This will provide greater control over data extraction and error handling.

Extended Scraping Capabilities

Future updates will include scraping additional content such as artist biographies, reviews, and more detailed album information.

User-Friendly Interface

We are considering the development of a graphical interface to streamline the user experience and make the scraping process more intuitive.

Stay tuned for these updates as we evolve the Metal Archives Scraper to meet your needs and provide you with a powerful web scraping tool.

Contribution

We welcome contributions from the community! Feel free to submit issues, pull requests, or suggestions to help improve the Metal Archives Scraper.

License

This project is licensed under the MIT License.

About

Data scraping from Metal Archives website

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages