hdq - HTML DOM Query Language for Go+

Summary about hdq

hdq is a Go+ package for processing HTML documents.

Tutorials

Collect links of an HTML page

How do you collect all the links on an HTML page? With hdq, it takes only a few lines.

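import "github.com/goplus/hdq"

// links returns every non-empty href value found on the a elements of the page.
func links(url any) []string {
	// doc is a node set that holds only the root node.
	doc := hdq.Source(url)
	// Visit every a element; ?:"" supplies "" when href cannot be read.
	return [link for a <- doc.any.a if link := a.href?:""; link != ""]
}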

First, we call hdq.Source(url) to create a node set named doc. It contains a single node: the root node.

Next, we select all a elements with doc.any.a. Here doc.any means all nodes in the HTML document.

Then we visit each of these a elements, read its href attribute value (defaulting to an empty string via ?:""), and assign it to the variable link. If link is not empty, we collect it.

Finally, we return all the collected links. See tutorial/01-Links for the full source code.
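
For readers more familiar with plain Go, the steps above can be sketched without hdq. This is only an illustration of the walk, select, and filter logic, not hdq's implementation: the Node type and the collectLinks function are hypothetical names invented here, and the tree is built by hand instead of being parsed from a URL.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for an HTML element.
// It is NOT hdq's node type.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Children []*Node
}

// collectLinks visits every node under root (roughly the role doc.any plays),
// keeps the a elements, and collects each non-empty href value.
func collectLinks(root *Node) []string {
	var links []string
	var walk func(n *Node)
	walk = func(n *Node) {
		if n == nil {
			return
		}
		if n.Tag == "a" {
			// A missing key yields "", similar in spirit to the ?:"" default.
			if link := n.Attrs["href"]; link != "" {
				links = append(links, link)
			}
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return links
}

func main() {
	// A hand-built document tree, assumed for illustration only.
	root := &Node{Tag: "html", Children: []*Node{
		{Tag: "body", Children: []*Node{
			{Tag: "a", Attrs: map[string]string{"href": "/docs"}},
			{Tag: "a"}, // no href, so it is skipped
			{Tag: "p", Children: []*Node{
				{Tag: "a", Attrs: map[string]string{"href": "https://example.com"}},
			}},
		}},
	}}
	fmt.Println(collectLinks(root))
}
```

The single Go+ list comprehension replaces the explicit recursion, the tag check, and the empty-string filter shown here.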