Absolutize URLs

Command: absolutize

Description: Convert relative URLs to absolute based on source URL. By default this operates on the URL the original pipeline input was retrieved from.

Arguments

Notes

This command will iterate through all the A and IMG tags, find those with relative URLs in the HREF or SRC, and make them absolute. This is helpful when embedding a chunk of HTML with links, and you want to user to navigate to the original location of those links.

The domain name that is used to make the links absolute can be provided:

absolutize -url:https://whatever.com

If you don't provide any arguments, it will use the URL source property of the working data.

Code Notes

I need to account for when there isn't a valid URL in the working data. In those cases, the URL argument will be required.

Also, are there any other tag/attribute combinations I should consider?

The actual URL formation is native JS:

new URL(relative, baseUrl).href

You can provide any path, and it will form the URL relative to that path:

new URL('baz.html','http://foo.com/bar/').href  // http://foo.com/bar/baz.html

However, be careful about your trailing slashes. No slash, means the relative URL will replace the last segment.

new URL('baz.html','http://foo.com/bar').href  // http://foo.com/baz.html

Here's a little tester for you to see how your URLs would resolve.

Source Code

import { parseHtml } from "../helpers.js";

async function absolutize(working, command, p) {

  const url = command.getArg("url") ?? working.source;

  const doc = await parseHtml(working.text);

  // Resolve links
  doc.querySelectorAll("a[href]").forEach((a) => {
    a.setAttribute("href", resolveUrl(a.getAttribute("href"), url));
  });

  // Resolve images
  doc.querySelectorAll("img[src]").forEach((img) => {
    img.setAttribute("src", resolveUrl(img.getAttribute("src"), url));
  });

  // Serialize DOM back to string
  return doc.body.innerHTML;
}

// Meta

absolutize.title = "Absolutize URLs";
absolutize.description = "Convert relative URLs to absolute based on source URL. By default this operates on the URL the original pipeline input was retrieved from.";
absolutize.args = [
  {
    name: "url",
    type: "string",
    description: "The base URL with which to calculate the new links. If not provided, the source URL will be used.",
  },
];
absolutize.allowedContentTypes = ["html"];
absolutize.parseValidators = [
  {
    test: (command) => {
      if (command.getArg("url") && !isAbsoluteUrl(command.getArg("url"))) {
        return false;
      }
      return true;
    },
    message: "If you provide a URL, it must be an absolute URL.",
  },
];

// Helpers

function resolveUrl(relative, baseUrl) {
  try {
    return new URL(relative, baseUrl).href;
  } catch {
    return relative; // leave it unchanged if it's not a valid URL
  }
}

function isAbsoluteUrl(str) {
  try {
    new URL(str.trim()); // no base ⇒ throws on relative inputs
    return true;
  } catch {
    return false;
  }
}

export default absolutize;