faraui/netcraft-web-scraper-pdf-report

netcraft-web-scraper-pdf-report

Scrapes a site report from sitereport.netcraft.com and saves it as a PDF (see the result sample for Wikipedia). Based on my cloudflare-bypass-headless-web-scraper.

Structure

[ 1.1M] netcraft-web-scraper-pdf-report
!! [  754] LICENSE.txt
~~ [  841] README.md
~~ [ 1.1M] extlib.tar.bz2
++ [ 5.0K] main.sh
~~ [ 1.1K] scraper.pl

5 files, 1 directory

Installation

git clone -q https://github.com/faraui/netcraft-web-scraper-pdf-report.git && \
cd netcraft-web-scraper-pdf-report && \
chmod ugo+x main.sh

Usage

./main.sh [http://|https://]example.org[*]

If no protocol is specified, https will be used.
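
The default-protocol rule above can be illustrated with a minimal sketch. This is not taken from main.sh; the function name `normalize_url` is hypothetical, and the only assumption is that the argument either starts with `http://` or `https://`, or has no scheme at all.

```shell
#!/bin/sh
# Hypothetical helper (not from main.sh): keep an explicit http:// or
# https:// prefix, otherwise prepend https:// as the default protocol.
normalize_url() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *)                  printf 'https://%s\n' "$1" ;;
  esac
}

normalize_url "example.org"         # prints https://example.org
normalize_url "http://example.org"  # prints http://example.org
```

With this rule, `./main.sh example.org` and `./main.sh https://example.org` would refer to the same target, while an explicit `http://` is left unchanged.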