urlresolver

A Go package that "resolves" a given URL by issuing a GET request, following any redirects, canonicalizing the final URL, and attempting to extract the title from the final response body.

Methodology

Resolving

A URL is resolved by issuing a GET request and following any redirects until a non-30x response is received.

Canonicalizing

The final URL is aggressively canonicalized using a combination of PuerkitoBio/purell and some manual heuristics for removing unnecessary query params (e.g. utm_* tracking params) and normalizing case (e.g. twitter.com/Thresholderbot and twitter.com/thresholderbot are the same).

Canonicalization is optimized for URLs that are shared on social media.
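
To give a feel for the manual heuristics described above, here is a rough standard-library sketch. It is an assumption-laden illustration, not the package's code: `canonicalize` is a hypothetical function, it does not use purell, and it covers only the two heuristics named above (dropping `utm_*` params and lowercasing paths on twitter.com), plus lowercasing the host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalize (hypothetical helper) applies a few illustrative heuristics:
// lowercase the host, drop utm_* tracking params, and lowercase the path on
// twitter.com, where profile names are treated as case-insensitive.
func canonicalize(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	// Host names are case-insensitive, so lowercasing is always safe.
	u.Host = strings.ToLower(u.Host)

	// Remove tracking params. Deleting map entries while ranging is safe in Go.
	q := u.Query()
	for key := range q {
		if strings.HasPrefix(strings.ToLower(key), "utm_") {
			q.Del(key)
		}
	}
	// Encode also sorts the remaining params by key, giving a stable order.
	u.RawQuery = q.Encode()

	// Site-specific case normalization (assumed rule for this sketch).
	if u.Host == "twitter.com" {
		u.Path = strings.ToLower(u.Path)
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"https://Twitter.com/Thresholderbot?utm_source=feed&utm_medium=social",
		"https://Example.com/Post?id=7&utm_campaign=x",
	} {
		canonical, err := canonicalize(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(canonical)
	}
}
```

Path lowercasing is limited to a known host because paths are case-sensitive on most sites.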

Security

TL;DR: Use safedialer.Control in the transport's dialer to block attempts to resolve URLs pointing at internal, private IP addresses.

Exposing functionality like this on the internet can be dangerous, because a malicious client could discover information about your internal network by asking your service to resolve URLs whose hostnames point at private IP addresses.

The dangers, along with a Go-specific mitigation, are outlined in Andrew Ayer's excellent "Preventing Server Side Request Forgery in Golang" blog post.

To mitigate that danger, users are strongly encouraged to use safedialer.Control as the Control function in the dialer used by the transport given to urlresolver.New.

See github.com/mccutchen/urlresolverapi for a productionized example, deployed at https://urlresolver.com.