Absolutize URLs
Command: absolutize
Description: Convert relative URLs to absolute based on source URL. By default this operates on the URL the original pipeline input was retrieved from.
Arguments
-
url: The base URL with which to calculate the new links. If not provided, the source URL will be used.
Notes
This command will iterate through all the A and IMG tags, find those with relative URLs in the HREF or SRC, and make them absolute. This is helpful when embedding a chunk of HTML with links, and you want to user to navigate to the original location of those links.
The domain name that is used to make the links absolute can be provided:
absolutize -url:https://whatever.com
If you don't provide any arguments, it will use the URL source property of the working data.
Code Notes
I need to account for when there isn't a valid URL in the working data. In those cases, the URL argument will be required.
Also, are there any other tag/attribute combinations I should consider?
The actual URL formation is native JS:
new URL(relative, baseUrl).href
You can provide any path, and it will form the URL relative to that path:
new URL('baz.html','http://foo.com/bar/').href // http://foo.com/bar/baz.html
However, be careful about your trailing slashes. No slash, means the relative URL will replace the last segment.
new URL('baz.html','http://foo.com/bar').href // http://foo.com/baz.html
Here's a little tester for you to see how your URLs would resolve.
Source Code
import { parseHtml } from "../helpers.js";
async function absolutize(working, command, p) {
const url = command.getArg("url") ?? working.source;
const doc = await parseHtml(working.text);
// Resolve links
doc.querySelectorAll("a[href]").forEach((a) => {
a.setAttribute("href", resolveUrl(a.getAttribute("href"), url));
});
// Resolve images
doc.querySelectorAll("img[src]").forEach((img) => {
img.setAttribute("src", resolveUrl(img.getAttribute("src"), url));
});
// Serialize DOM back to string
return doc.body.innerHTML;
}
// Meta
absolutize.title = "Absolutize URLs";
absolutize.description = "Convert relative URLs to absolute based on source URL. By default this operates on the URL the original pipeline input was retrieved from.";
absolutize.args = [
{
name: "url",
type: "string",
description: "The base URL with which to calculate the new links. If not provided, the source URL will be used.",
},
];
absolutize.allowedContentTypes = ["html"];
absolutize.parseValidators = [
{
test: (command) => {
if (command.getArg("url") && !isAbsoluteUrl(command.getArg("url"))) {
return false;
}
return true;
},
message: "If you provide a URL, it must be an absolute URL.",
},
];
// Helpers
function resolveUrl(relative, baseUrl) {
try {
return new URL(relative, baseUrl).href;
} catch {
return relative; // leave it unchanged if it's not a valid URL
}
}
function isAbsoluteUrl(str) {
try {
new URL(str.trim()); // no base ⇒ throws on relative inputs
return true;
} catch {
return false;
}
}
export default absolutize;