WebCrawler

A web crawler developed for the Special Topics in Computer and Algorithms: Information Retrieval class at the Federal Center for Technological Education of Minas Gerais (CEFET-MG).

Authors: Josué Rocha Lima, Túlio Coqueiro, Caio Silva Gonçalves

Advisor: Daniel Hasan Dalip

To do

  • Configure the project for use in NetBeans. - Josué
  • Store the last access time for each server. - Caio
  • Handle malformed HTML. - Caio
  • Insert collected pages into the collected-pages queue. - Túlio
  • Extract links from collected pages. - Túlio
  • Check page existence (404). - Túlio
  • Verify page encoding. - Josué
  • Implement the robots exclusion protocol. - Josué
  • Apply noindex and nofollow criteria.
  • Write code comments.
  • Create the crawler webpage.
  • Write the report.
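
As an illustration of the "store last access time for server" item, here is a minimal Java sketch of one way a crawler might track when each host was last contacted, so that it can wait before contacting the same host again. It is not the project's implementation: the class name `ServerAccessLog`, its methods, and the delay value are assumptions made for this example only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: names and behavior are illustrative assumptions,
// not taken from the project's code.
class ServerAccessLog {
    // Maps a lower-cased host name to the time (ms) it was last accessed.
    private final Map<String, Long> lastAccessMillis = new HashMap<>();
    private final long minDelayMillis;

    ServerAccessLog(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Host names are case-insensitive, so normalize before using them as keys.
    private static String normalize(String host) {
        return host.toLowerCase(Locale.ROOT);
    }

    // True if the host was never accessed or the minimum delay has elapsed.
    boolean canFetch(String host, long nowMillis) {
        Long last = lastAccessMillis.get(normalize(host));
        return last == null || nowMillis - last >= minDelayMillis;
    }

    // Records that the host was accessed at the given time.
    void recordAccess(String host, long nowMillis) {
        lastAccessMillis.put(normalize(host), nowMillis);
    }
}
```

A caller would typically pass `System.currentTimeMillis()` as `nowMillis`, call `canFetch` before requesting a page, and call `recordAccess` right after the request is sent. Passing the time in as a parameter keeps the class easy to test.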

Usage instructions