decodez/cosmic_crawler

 
 

Node Crawler

Features

  • Crawl an entire website
  • Store crawled URLs
  • Store page information
    • page title
    • meta description
    • last updated date
  • Detect pages that load resources from external domains
    • iframe
    • image
    • video
  • Detect pages that contain
    • iframe
    • image
    • video
  • Detect all non-HTML documents
  • Scan for WCAG issues and generate a report using Pa11y
  • Validate HTML using the W3C validator
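
The external-resource check listed above can be illustrated with a small sketch. It is not taken from this repository: the function names `isExternalResource` and `findExternalResources` and the shape of the `resources` array are assumptions made for illustration. The idea is to resolve each iframe, image, or video URL against the page URL and compare its host with the crawled domain.

```javascript
// Hypothetical helper (not part of this repository): returns true when
// resourceUrl is served from a host outside siteDomain.
function isExternalResource(resourceUrl, pageUrl, siteDomain) {
  let parsed;
  try {
    // Resolve relative URLs such as "/img/a.png" against the page URL.
    parsed = new URL(resourceUrl, pageUrl);
  } catch (err) {
    return false; // unparsable URLs are treated as not external
  }
  // Ignore inline or non-network resources such as data: URLs.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return false;
  }
  const host = parsed.hostname;
  // Subdomains of siteDomain count as internal in this sketch.
  return host !== siteDomain && !host.endsWith('.' + siteDomain);
}

// Assumed input shape: [{ tag: 'img', src: '...' }, ...]
function findExternalResources(resources, pageUrl, siteDomain) {
  return resources.filter(r => isExternalResource(r.src, pageUrl, siteDomain));
}
```

In a Puppeteer-based crawler, the `src` values could be collected with `page.$$eval` before being passed to a function like this. Treating subdomains as internal is a design choice of the sketch and may not match what the crawler does.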

System Requirements

  • Microsoft Windows / macOS
  • Node.js v10

Installation

  1. Create a new project folder with the following structure:
    • reports/
      • html-validate/
      • wcag/
  2. Run npm install
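
The report folders from step 1 can also be created from a script rather than by hand. The following is a minimal sketch, assuming Node.js 10.12 or later (where `fs.mkdirSync` accepts the `recursive` option); the function name `createReportFolders` is hypothetical and not part of this repository.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: creates reports/html-validate and reports/wcag
// under projectDir and returns the created paths.
function createReportFolders(projectDir) {
  const folders = ['html-validate', 'wcag'].map(name =>
    path.join(projectDir, 'reports', name)
  );
  // recursive: true also creates the parent "reports" folder and does
  // not fail when a folder already exists.
  folders.forEach(dir => fs.mkdirSync(dir, { recursive: true }));
  return folders;
}
```

Calling `createReportFolders(process.cwd())` from the project root would produce the structure listed above.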

Crawling

  1. In index.js, update the configuration:
const configuration = {
	entryUrl: 'https://domain.com/xxx', // URL where crawling starts
	domain: 'domain.com' // domain of the site being crawled
};
  2. Run node index.js
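
Because `entryUrl` and `domain` have to agree with each other, a quick sanity check before starting a crawl can catch typos. This is only a sketch under that assumption: `validateConfiguration` is a hypothetical function, not something index.js is known to contain.

```javascript
// Hypothetical check: throws if entryUrl is not a valid URL or if its
// host is not domain (or a subdomain of it).
function validateConfiguration(configuration) {
  const { entryUrl, domain } = configuration;
  // new URL() throws a TypeError for strings that are not valid URLs.
  const host = new URL(entryUrl).hostname;
  if (host !== domain && !host.endsWith('.' + domain)) {
    throw new Error(
      'entryUrl host "' + host + '" is not within domain "' + domain + '"'
    );
  }
  return configuration;
}
```

For the example configuration above, `validateConfiguration({ entryUrl: 'https://domain.com/xxx', domain: 'domain.com' })` returns the object unchanged.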

Generate CSV

  1. Run node moduletester.js
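
How moduletester.js builds the CSV is not described here. As a rough illustration of the general technique, the sketch below turns stored page records into CSV text with basic quoting. The column names (`url`, `title`, `description`, `lastUpdated`) and both function names are assumptions based on the page information listed under Features, not the repository's actual schema.

```javascript
// Quote a field when it contains a comma, double quote, or line break;
// embedded double quotes are doubled.
function escapeCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical converter: pages is an array of plain objects.
function pagesToCsv(pages) {
  const columns = ['url', 'title', 'description', 'lastUpdated'];
  const lines = [columns.join(',')];
  for (const page of pages) {
    lines.push(columns.map(c => escapeCsvField(page[c])).join(','));
  }
  return lines.join('\r\n');
}
```

Missing fields are written as empty cells, so a page without a last updated date still yields a row with the full set of columns.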

Written with StackEdit.

About

Crawler on Node.js, using Puppeteer.

Languages

  • JavaScript 100.0%