Web crawler developed for the Information Retrieval class at the Federal Center of Technological Education of Minas Gerais.

josuerocha/WebCrawler

WebCrawler

Web crawler developed for the Special Topics in Computing and Algorithms: Information Retrieval class at the Federal Center of Technological Education of Minas Gerais.

Authors: Josué Rocha Lima, Túlio Coqueiro, Caio Silva Gonçalves

Advisor: Daniel Hasan Dalip

To do

  • Configure the project for use in NetBeans. - Josué
  • Store the last access time for each server. - Caio
  • Handle malformed HTML. - Caio
  • Insert collected pages into the collected pages queue. - Túlio
  • Extract links from collected pages. - Túlio
  • Check page existence (404 responses). - Túlio
  • Verify page encoding. - Josué
  • Implement the robot exclusion protocol. - Josué
  • Apply noindex and nofollow criteria.
  • Write code comments.
  • Build the crawler webpage.
  • Write the report.
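
Two of the items above, storing the last access time per server and honoring the robot exclusion protocol, can be illustrated with a minimal sketch. This is not the project's implementation: it is written in Python only for brevity, the `PolitenessGate` class, its method names, and the 30-second default delay are all hypothetical, and it assumes the robots.txt content has already been downloaded by some other part of the crawler.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PolitenessGate:
    """Hypothetical helper: per-server last access times plus robots.txt rules."""

    def __init__(self, user_agent, min_delay_seconds=30.0, clock=time.monotonic):
        self.user_agent = user_agent
        # Assumed default; the README does not specify a delay.
        self.min_delay_seconds = min_delay_seconds
        self.clock = clock
        self.last_access = {}  # host -> clock reading at the last request
        self.robots = {}       # host -> parsed robots.txt rules

    def load_robots(self, host, robots_txt):
        """Parse robots.txt text that was already fetched for this host."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def is_allowed(self, url):
        """Return True if the robots.txt rules permit fetching this URL."""
        parser = self.robots.get(urlparse(url).netloc)
        # Assumption: hosts with no loaded robots.txt are treated as unrestricted.
        return parser is None or parser.can_fetch(self.user_agent, url)

    def seconds_until_ready(self, url):
        """Return how long to wait before contacting this URL's server again."""
        last = self.last_access.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay_seconds - (self.clock() - last))

    def record_access(self, url):
        """Store the current time as the last access to this URL's server."""
        self.last_access[urlparse(url).netloc] = self.clock()
```

A crawler loop would call `is_allowed` before scheduling a URL, wait for `seconds_until_ready` to reach zero, fetch the page, and then call `record_access`. Passing the clock as a parameter keeps the delay logic testable without real waiting.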

Usage instructions