URL Stemmer

Class Library
C#
GitHub Repository

What It Is

Code to "normalize" URLs so they can be compared for equality

Status

Last Reviewed:

I’ve used this quite a bit in the past, but not recently. At some point it was used in a URL redirector add-on for Opti CMS (I’m not sure that add-on still exists).

Details

When working with URL redirection, I got interested in the concept of URL equality. How do you determine if two URLs are equal?

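The real “value” of a URL is the resource it responds with. And two URLs can respond with the same resource while differing superficially, for example in scheme, letter casing, a trailing slash, the order of querystring arguments, or a bookmark.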

This library normalizes URLs in several configurable ways, so you can compare them for equality or store them in a more standard form.

Basic usage:

var stemmedUrl = UrlStemmer.Stem("https://domain.com/my/path");

This will “normalize” the URL according to a bunch of settings: what domain should be used, whether subdomains should be removed, how querystrings should be handled, etc.

The idea is that you can compare two superficially different URLs to determine if they’re actually equal:

// Basic point: you can't compare URLs for equality by a simple string compare

var url1 = UrlStemmer.Stem("HTTP://DOMAIN.COM/my/path?a=b&c=d#bookmark");
var url2 = UrlStemmer.Stem("https://domain.com/my/path/?c=d&a=b");

// These two URLs might actually be equal when "normalized" by the stemmer
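To make the idea concrete, here is a minimal sketch of what this kind of normalization can look like using only the standard System.Uri class. It is not the UrlStemmer implementation, and its rules are not configurable the way the library's are. The class name UrlNormalizerSketch, the method name Normalize, and the specific rules (force https, lowercase the host, trim the trailing slash, sort querystring arguments, drop the bookmark) are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only. This is NOT how UrlStemmer
// is implemented; the rules below are assumptions chosen for the example.
public static class UrlNormalizerSketch
{
    public static string Normalize(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);

        // Assumption: http and https point at the same resource, so force one scheme.
        const string scheme = "https";

        // Host names are case-insensitive, so lowercase them.
        var host = uri.Host.ToLowerInvariant();

        // Keep a port only if it was explicitly non-default for the original scheme.
        var port = uri.IsDefaultPort ? "" : ":" + uri.Port;

        // Assumption: a trailing slash does not change the resource.
        // The path's casing is left alone, since paths can be case-sensitive.
        var path = uri.AbsolutePath.TrimEnd('/');
        if (path.Length == 0)
        {
            path = "/";
        }

        // Assumption: querystring argument order does not matter, so sort the pairs.
        var pairs = uri.Query
            .TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .OrderBy(p => p, StringComparer.Ordinal);
        var query = string.Join("&", pairs);

        // The bookmark (uri.Fragment) is never sent to the server, so it is dropped.
        return scheme + "://" + host + port + path
            + (query.Length > 0 ? "?" + query : "");
    }
}
```

Under these assumed rules, calling UrlNormalizerSketch.Normalize on each of the two URLs in the example above should give https://domain.com/my/path?a=b&c=d for both, so a plain string comparison of the results would succeed.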

Full documentation, including all the configuration options, can be found at the repo link.