Configuration Language: Specification

I write a weird amount of code that needs dynamic configuration, meaning configuration which is supplied at runtime, not at parse or startup time. To configure, the code needs to read in a string of text (the “source”), and turn it into an object.

I use these when configuring pipeline processes. The concept is of multiple “filters” or “gateways” through which data passes. Each one has to be identified, configured, and ordered in relation to the others.

Over the years, I’ve come up with a DSL that has worked well for me. I originally wrote something like this for C# in the Denina library, but then I needed a version in JavaScript. Instead of writing it manually, I decided to specify it and have AI write the implementation for me.

The code samples below are in JavaScript, but the basic idea of CL is language-agnostic. This is a generic specification.

Simple Specification

Each line of text represents a Command. The leading word – from the start of the line to the first instance of whitespace – is the Command name, and it’s followed by zero or more Arguments, separated by whitespace.

An Argument name is preceded by a dash, followed by a colon, and the value follows the colon:

command1 -arg1:value1 -arg2:value2
command2 -arg1:value1
command3 -arg1
command4

Argument values that contain whitespace must be quoted:

command1 -arg1:"value1 with some spaces"

The above information gets you through 90% of use cases.

Detailed Specification

The goal of CL is to parse text into a data structure.

Commands

The simplest practical source for a Commandset would look like this:

foo

The above would be a Commandset with one command named foo which has zero arguments.

(Note that an empty string or a string of nothing but whitespace is also parsable as a Commandset. It would simply result in a Commandset with an empty collection of commands.)

Each Command is placed on its own line:

foo
bar

The above would result in a Commandset with two Commands – named foo and bar. Neither Command will have any Arguments.

In the list of Commands, blank lines – or lines consisting of only whitespace – are ignored. Lines starting with a pound sign (#) are comments and are also ignored.

The order of Commands matters. In the above example, foo comes before bar.
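As a rough sketch of the line rules above, the following lists Command names in source order. The function name commandNames is hypothetical and not part of the specification, and the sketch assumes the source has no indented continuation lines and that a comment is a line whose very first character is #.

```javascript
// Hypothetical helper (not part of the spec): returns Command names in source order.
// Assumes no indented continuation lines are present.
function commandNames(source) {
  return source
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")     // skip blank and whitespace-only lines
    .filter((line) => !line.startsWith("#"))  // skip comment lines
    .map((line) => line.trim().split(/\s+/)[0]); // leading word is the Command name
}

const names = commandNames("foo\n\n# a comment\nbar\n");
console.log(names.join(", ")); // foo, bar
```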

Arguments

An Argument is specified on the same line as the Command to which it belongs. The name of the Argument is preceded with a dash (-) and followed by a colon (:), followed immediately by the value of the argument.

foo -baz:qux
bar

The above is a Commandset with two Commands named foo and bar. The first Command has one Argument named baz with a value of qux. The name of the Argument is what comes after the dash and before the colon. The value is what comes after the first colon and before the next whitespace.

Argument values which contain a space need to be enclosed in quotes – double or single, just be consistent.

foo -bar:"qux corge"

If a quoted value itself contains a quote, it will need to be escaped by preceding it with a backslash (\):

foo -bar:"He said, \"My name is Deane.\""

Quotes have to be escaped even if they differ from the surrounding quotes. In the above example, the double-quotes would have to be escaped even if the surrounding quotes were single.
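One way the quoting and escaping rules might be read is sketched below. The name readValue is hypothetical. The sketch also assumes that a backslash escapes whatever single character follows it (so a literal backslash would be written as two), which the text above does not state.

```javascript
// Hypothetical helper (not part of the spec): reads one Argument value from the
// start of `text`. Returns the decoded value and how many characters were consumed.
function readValue(text) {
  const quote = text[0];
  if (quote !== '"' && quote !== "'") {
    // Unquoted: the value runs up to the next whitespace (or the end of the text).
    const match = /^\S*/.exec(text);
    return { value: match[0], length: match[0].length };
  }
  let value = "";
  for (let i = 1; i < text.length; i++) {
    const ch = text[i];
    if (ch === "\\" && i + 1 < text.length) {
      value += text[++i]; // assumption: keep the escaped character, drop the backslash
    } else if (ch === quote) {
      return { value, length: i + 1 }; // closing quote found
    } else if (ch === '"' || ch === "'") {
      // Either kind of quote must be escaped inside a quoted value.
      throw new Error("Unescaped quote inside quoted value");
    } else {
      value += ch;
    }
  }
  throw new Error("Unterminated quoted value");
}

const sample = String.raw`"He said, \"My name is Deane.\""`;
console.log(readValue(sample).value); // He said, "My name is Deane."
```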

Multiple Arguments are separated with any form or combination of whitespace on that same line (see exception below for indented lines):

foo -bar:qux -grault:garply

The order of Arguments matters. In the above example, bar comes before grault.

An Argument does not have to specify a value, in which case the value is assumed to be true.

foo -bar:qux -grault -waldo:fred

The above is a Commandset with one Command named foo. That command has three arguments: bar with a value of qux, grault with a value of true, and waldo with a value of fred.

Note that -grault isn’t followed by a colon. If it were, its value would be an empty string.

# grault is "true"
foo -grault

# grault is an empty string
foo -grault:
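The difference between a missing colon and an empty value can be sketched as follows. The name parseSimpleArgument is hypothetical, and the sketch only covers unquoted values. It stores the implied value as the string "true", on the assumption that all values are strings.

```javascript
// Hypothetical helper (not part of the spec): parses one unquoted Argument.
function parseSimpleArgument(text) {
  if (!text.startsWith("-")) {
    throw new Error("An Argument must start with a dash");
  }
  const body = text.slice(1);        // drop the leading dash
  const colon = body.indexOf(":");   // only the first colon separates name and value
  if (colon === -1) {
    return { name: body, value: "true" }; // no colon: value is assumed to be true
  }
  return { name: body.slice(0, colon), value: body.slice(colon + 1) };
}

console.log(parseSimpleArgument("-grault").value === "true"); // true
console.log(parseSimpleArgument("-grault:").value === "");    // true
```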

Commands and their Arguments can be placed on multiple lines as long as the second and subsequent lines are indented using any combination of whitespace. A series of indented lines are simply joined to form a single line (including the leading whitespace on the second and subsequent lines).

foo
  -bar:qux
  -grault
  -waldo:fred
baz

The above is functionally the same as:

foo  -bar:qux  -grault  -waldo:fred
baz

This means the order of the Arguments is simply vertical, rather than horizontal. In the example above, the order is bar, grault, and waldo, whether the Arguments are on one line or multiple lines.

The level of indentation does not matter. It can vary between lines. It only matters that consecutive lines are indented with some form of whitespace. While not recommended, a blank line between two indented lines will not break the “indentation series.”

# This is legal, as the blank line will simply be ignored
# But it's visually confusing, so it's not recommended
foo
  -bar:baz

  -qux:grault
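The joining rule might be implemented along these lines. The name joinContinuations is hypothetical, and the sketch assumes comment lines have already been removed. What should happen when the very first line is indented is not specified, so here it is simply kept as its own line.

```javascript
// Hypothetical helper (not part of the spec): folds indented continuation lines
// into the line above. Assumes comment lines were removed beforehand.
function joinContinuations(source) {
  const joined = [];
  for (const line of source.split(/\r?\n/)) {
    if (line.trim() === "") continue; // blank lines never break an indentation series
    if (/^\s/.test(line) && joined.length > 0) {
      joined[joined.length - 1] += line; // keep the leading whitespace when joining
    } else {
      joined.push(line); // unindented line (or an indented first line) starts a new entry
    }
  }
  return joined;
}

const lines = joinContinuations("foo\n  -bar:qux\n  -grault\n  -waldo:fred\nbaz");
console.log(lines.length); // 2
```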

Tokens

Tokens can be defined at the bottom of the source, under the last Command specification. The purpose of Tokens is to allow longer values – which might include line breaks – that would be awkward or impossible to type on a single line.

(Additionally, Tokens allow you to use the same value for multiple Arguments, but the practical value of this is probably slight.)

A Token is a line that begins with a dollar sign $ immediately followed by the Token name. The lines following – until the next Token declaration or the end of the file – constitute the value of the Token. Token values can be multi-line.

foo -bar:qux
baz

$garply
Mary had a little lamb

$plugh
Its fleece was white as snow

This is a Commandset with two Commands named foo and baz and two Tokens named garply and plugh.

(In this example, the tokens are defined but not used, which is rather pointless.)

No Commands can be defined after the first Token declaration.
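A minimal sketch of separating the Command lines from the Token definitions is below. The name splitTokens is hypothetical. Trimming the whitespace around each Token value is an assumption on my part, and a value line that itself begins with $ would be misread as a new declaration.

```javascript
// Hypothetical helper (not part of the spec): splits a source into its Command
// lines and a Map of Token names to values.
function splitTokens(source) {
  const lines = source.split(/\r?\n/);
  const first = lines.findIndex((line) => line.startsWith("$"));
  const commandLines = first === -1 ? lines : lines.slice(0, first);
  const tokenLines = first === -1 ? [] : lines.slice(first);

  const tokens = new Map();
  let name = null;
  let body = [];
  const flush = () => {
    // Assumption: surrounding whitespace and blank lines are trimmed from the value.
    if (name !== null) tokens.set(name, body.join("\n").trim());
  };
  for (const line of tokenLines) {
    if (line.startsWith("$")) {
      flush();                     // finish the previous Token, if any
      name = line.slice(1).trim(); // the name follows the dollar sign
      body = [];
    } else {
      body.push(line);             // everything else belongs to the current Token
    }
  }
  flush();
  return { commandLines, tokens };
}

const parts = splitTokens("foo -bar:qux\nbaz\n\n$garply\nMary had a little lamb\n");
console.log(parts.tokens.get("garply")); // Mary had a little lamb
```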

If defined, Tokens can be referred to as Argument values. The value of the Token is substituted for the Argument value.

foo -bar:$qux

$qux
This is the value of qux. It can be very long…

…and constitute multiple lines.

This substitution occurs at parse time. The resulting Argument object has no record of whether the value was provided inline or via a Token reference.

The same Token can be used for multiple Arguments spread throughout multiple Commands.

If the referenced Token name is not defined, this constitutes a parse failure. Conversely, if you define a Token but don’t reference it, this will not cause the parse to fail, but there’s no practical reason to do this.

Token values are simple, unprocessed strings. They cannot refer to other Tokens.

Quoted values are not evaluated for the presence of Tokens, so an Argument value that could be confused with a Token reference should be quoted:

# This is a Token reference
foo -bar:$baz

# This is not a Token reference
# This is the literal string $baz
foo -bar:"$baz"

$baz
This is the Token value.
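The substitution rules could be sketched like this, given a Map of Token names to values. The name resolveValue is hypothetical, and the sketch leaves out backslash-escape handling inside quoted values to keep the Token logic visible.

```javascript
// Hypothetical helper (not part of the spec). `raw` is the text after the colon,
// exactly as written in the source; `tokens` is a Map of Token names to values.
function resolveValue(raw, tokens) {
  // A value wrapped in matching quotes is taken literally, never as a Token reference.
  const quoted = /^(["']).*\1$/s.test(raw);
  if (quoted) return raw.slice(1, -1); // escape handling omitted in this sketch

  if (raw.startsWith("$")) {
    const name = raw.slice(1);
    if (!tokens.has(name)) {
      throw new Error(`Undefined Token: ${name}`); // an undefined Token fails the parse
    }
    return tokens.get(name);
  }
  return raw; // plain unquoted value
}

const tokens = new Map([["baz", "This is the Token value."]]);
console.log(resolveValue("$baz", tokens));   // This is the Token value.
console.log(resolveValue('"$baz"', tokens)); // $baz
```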

Naming and Charactersets

Command names, Argument names, and Token names may consist of:

The first character must be a letter.

Argument and Token values (not names) can contain any character, including Unicode, emojis, or other non-ASCII characters.

API

A Commandset object is obtained by passing the text source into the constructor:

let commands = new CommandSet(source);

Any violation of the rules detailed above should result in a failed parse and a thrown exception.

The Commandset object should provide an iterable collection of Command objects in the source order, meaning the order the commands appear in the text source. This collection should be publicly writable.

Each Command object should have a property for name, and an iterable collection of Argument objects, in the source order. Each Argument object has a unique key and a string value.

A Command object should be able to be instantiated independent of the Commandset object, and be able to be parsed from source.

let command = new Command("foo -bar:baz");
commandset.commands.push(command);

Token substitution occurs at parse time. Tokens will need to be parsed first, as they will be needed to complete substitution as the Commands are parsed. Once the source is parsed and the Commandset object is created, the Tokens are no longer needed.

The Argument object should be able to be instantiated independent of the Command object, and be able to be parsed from source:

let argument = new Argument("-bar:baz");
command.arguments.push(argument);
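To make the shape of the three objects concrete, here is a deliberately small sketch. It handles unquoted values only, with no continuation lines and no Tokens. The property names key, value, name, arguments, and commands follow the prose and samples above where they appear; everything else (the regular expression, the error messages) is my own assumption and not a reference implementation.

```javascript
// Sketch only: unquoted values, no continuation lines, no Tokens.
class Argument {
  constructor(source) {
    // -name or -name:value, where the value is everything after the first colon
    const match = /^-([^:\s]+)(?::(.*))?$/.exec(source.trim());
    if (!match) throw new Error(`Invalid Argument: ${source}`);
    this.key = match[1];
    this.value = match[2] === undefined ? "true" : match[2];
  }
}

class Command {
  constructor(source) {
    const [name, ...rest] = source.trim().split(/\s+/);
    if (!name) throw new Error("A Command needs a name");
    this.name = name;
    this.arguments = rest.map((text) => new Argument(text)); // source order
  }
}

class CommandSet {
  constructor(source) {
    this.commands = source
      .split(/\r?\n/)
      .filter((line) => line.trim() !== "" && !line.startsWith("#"))
      .map((line) => new Command(line)); // source order, publicly writable
  }
}

const commandset = new CommandSet("foo -bar:baz\nbar -qux");
commandset.commands.push(new Command("baz -corge:"));
console.log(commandset.commands.map((c) => c.name).join(", ")); // foo, bar, baz
```

Because commands and arguments are plain arrays, they are iterable and writable without any extra API surface.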

All Argument and Token values are parsed as strings. Casing is always preserved.

Whitespace is preserved when:

Tests

In order for a CL parser to be valid, it must do the following.

Examples

Example 1

Source:

foo
bar

Parsed:

[
  {
    name: "foo",
    args: []
  },
  {
    name: "bar",
    args: []
  }
]

Example 2

Source:

foo -bar:baz
bar -qux

Parsed:

[
  {
    name: "foo",
    args: [
      {
        bar: "baz"
      }
    ]
  },
  {
    name: "bar",
    args: [
      {
        qux: "true"
      }
    ]
  }
]

Example 3

Source:

foo -bar:$baz

$baz
This is the value of baz

Parsed:

[
  {
    name: "foo",
    args: [
      {
        bar: "This is the value of baz"
      }
    ]
  }
]

Example 4

Source:

foo -bar: -baz

Parsed:

[
  {
    name: "foo",
    args: [
      {
        bar: ""
      },
      {
        baz: "true"
      }
    ]
  }
]