
PageSimpleScraper

A simple web page scraper built with Jsoup.

URLs are validated with the Apache Commons UrlValidator class.
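The validate-then-fetch flow can be sketched roughly like this. It is a minimal illustration, not the project's actual code. The class and method names (ScraperSketch, isAcceptable, fetch, printAssets) are hypothetical. Restricting the accepted schemes to http and https is also an assumption. The Jsoup and UrlValidator calls are the libraries' public APIs, and both libraries must be on the classpath.

```java
import org.apache.commons.validator.routines.UrlValidator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;

// Hypothetical sketch class; not taken from the PageSimpleScraper sources.
class ScraperSketch {

    // Assumption: only http and https are accepted. UrlValidator's no-arg
    // constructor would also allow ftp.
    private static final UrlValidator VALIDATOR = new UrlValidator(new String[] {"http", "https"});

    static boolean isAcceptable(String url) {
        return VALIDATOR.isValid(url);
    }

    // Validate the URL first, then download and parse the page with Jsoup.
    static Document fetch(String url) throws IOException {
        if (!isAcceptable(url)) {
            throw new IllegalArgumentException("Invalid URL: " + url);
        }
        return Jsoup.connect(url).get();
    }

    // Print every image and link. absUrl resolves relative src/href values
    // against the page's base URI.
    static void printAssets(Document doc) {
        for (Element img : doc.select("img[src]")) {
            System.out.println("IMG\t" + img.absUrl("src") + "\t" + img.attr("width")
                    + "\t" + img.attr("height") + "\t" + img.attr("alt"));
        }
        for (Element a : doc.select("a[href]")) {
            System.out.println("LINK\t" + a.absUrl("href") + "\t" + a.text());
        }
    }

    public static void main(String[] args) throws IOException {
        printAssets(fetch(args[0]));
    }
}
```

Validating before calling Jsoup.connect means a malformed address fails fast with a clear message. Without that check, the failure would surface later as a network or parsing exception.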

For the unique images and links found on the page, it creates tab-delimited files with the following details:

images:

  • image number
  • image url
  • image width
  • image height
  • image alt

links:

  • link url
  • link text
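The tab-delimited output described above could be produced along these lines. This is a sketch using only the Java standard library (Java 16+ for records), not the project's implementation. The record types and method names are hypothetical. So is the choice to treat items with the same URL as duplicates, keeping the first occurrence in page order. The README does not state whether the files have a header row, so none is written here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; the project's real data model is not shown in this README.
class TsvSketch {

    record Image(String url, String width, String height, String alt) {}
    record Link(String url, String text) {}

    // Tabs or line breaks inside a field would break the row/column layout,
    // so collapse them to a single space.
    static String clean(String field) {
        return field == null ? "" : field.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // One row per unique image URL: number, url, width, height, alt.
    static List<String> imageRows(List<Image> images) {
        Map<String, Image> unique = new LinkedHashMap<>();
        for (Image img : images) {
            unique.putIfAbsent(img.url(), img); // keep the first occurrence only
        }
        List<String> rows = new ArrayList<>();
        int number = 1;
        for (Image img : unique.values()) {
            rows.add(String.join("\t", String.valueOf(number++), clean(img.url()),
                    clean(img.width()), clean(img.height()), clean(img.alt())));
        }
        return rows;
    }

    // One row per unique link URL: url, text.
    static List<String> linkRows(List<Link> links) {
        Map<String, Link> unique = new LinkedHashMap<>();
        for (Link link : links) {
            unique.putIfAbsent(link.url(), link);
        }
        List<String> rows = new ArrayList<>();
        for (Link link : unique.values()) {
            rows.add(clean(link.url()) + "\t" + clean(link.text()));
        }
        return rows;
    }

    // Writes each row followed by the platform line separator, in UTF-8.
    static void write(Path file, List<String> rows) throws IOException {
        Files.write(file, rows, StandardCharsets.UTF_8);
    }
}
```

For example, write(Path.of("images.tsv"), imageRows(images)) would save the image table. The file name is only illustrative. A LinkedHashMap is used so that de-duplication keeps the order in which items appear on the page.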

You can view example files at this link.