Configuration Language: Specification
I write a weird amount of code that needs dynamic configuration, meaning configuration which is supplied at runtime, not at parse or startup time. To configure, the code needs to read in a string of text (the “source”), and turn it into an object.
I use these when configuring pipeline processes. The concept is of multiple “filters” or “gateways” through which data passes. Each one has to be identified, configured, and ordered in relation to the others.
Over the years, I’ve come up with a DSL that has worked well for me. I originally wrote something like this for C# in the Denina library, but then I needed a version in JavaScript. Instead of writing it manually, I decided to specify it and have AI write the implementation for me.
The code samples below are in JavaScript, but the basic idea of CL is language-agnostic. This is a generic specification.
Simple Specification
Each line of text represents a Command. The leading word (from the start of the line to the first whitespace) is the Command name, and it is followed by zero or more Arguments, separated by whitespace.
An Argument name is preceded by a dash, followed by a colon, and the value follows the colon:
command1 -arg1:value1 -arg2:value2
command2 -arg1:value1
command3 -arg1
command4
Argument values that contain whitespace must be quoted:
command1 -arg1:"value1 with some spaces"
The above information gets you through 90% of use cases.
Detailed Specification
The goal of CL is to parse text into a data structure.
The resulting object is called a Commandset. A Commandset is an iterable, ordered collection of Command objects, each of which holds an iterable, ordered collection of Argument objects.
Each Command has a name. Multiple Commands with the same name can exist in a Commandset. Each Command can have zero or more Arguments. A Command’s Arguments “belong” to that Command and are not related to any other Command.
Each Argument is a key/value object. Within a single Command, an Argument key may not appear more than once.
Commands
The simplest practical source for a Commandset would look like this:
foo
The above would be a Commandset with one Command named foo, which has zero Arguments.
(Note that an empty string or a string of nothing but whitespace is also parsable as a Commandset. It would simply result in a Commandset with an empty collection of commands.)
Each Command is placed on its own line:
foo
bar
The above would result in a Commandset with two Commands, named foo and bar. Neither Command will have any Arguments.
In the list of Commands, blank lines (or lines consisting of only whitespace) are ignored. Lines starting with a pound sign (#) are comments and are also ignored.
The order of Commands matters. In the above example, foo comes before bar.
Arguments
An Argument is specified on the same line as the Command to which it belongs. The name of the Argument is preceded by a dash (-) and followed by a colon (:), followed immediately by the value of the Argument.
foo -baz:qux
bar
The above is a Commandset with two Commands named foo and bar. The first Command has one Argument named baz with a value of qux.
The name of the Argument is what comes after the dash and before the colon. The value is what comes after the first colon and before the next whitespace.
Argument values which contain a space need to be enclosed in quotes. Double or single quotes both work; just be consistent.
foo -bar:"qux corge"
If a quoted value itself contains a quote, it will need to be escaped by preceding it with a backslash (\):
foo -bar:"He said, \"My name is Deane.\""
Quotes have to be escaped even if they differ from the surrounding quotes. In the above example, the double-quotes would have to be escaped even if the surrounding quotes were single.
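The quoting and escaping rules above can be sketched as a small character scanner. This is only an illustration, not the reference implementation: the function name splitWords is hypothetical, and treating an unescaped quote of the other kind as an error is an assumption drawn from the rule that such quotes "have to be escaped."

```javascript
// Hypothetical helper: splits one Command line into whitespace-separated words,
// keeping quoted Argument values whole and resolving backslash-escaped quotes.
function splitWords(line) {
  const words = [];
  let current = "";
  let quote = null;    // the quote character we are currently inside, or null
  let started = false; // true once the current word has begun (even with an empty quoted value)

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (quote !== null) {
      if (ch === "\\" && (line[i + 1] === '"' || line[i + 1] === "'")) {
        current += line[i + 1]; // escaped quote: keep the quote, drop the backslash
        i++;
      } else if (ch === quote) {
        quote = null; // closing quote
      } else if (ch === '"' || ch === "'") {
        // Assumption: an unescaped quote of the other kind is a parse failure.
        throw new Error(`Unescaped quote at position ${i}`);
      } else {
        current += ch; // whitespace inside quotes is preserved
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      started = true;
    } else if (/\s/.test(ch)) {
      if (started) words.push(current);
      current = "";
      started = false;
    } else {
      current += ch;
      started = true;
    }
  }
  if (quote !== null) throw new Error("Unterminated quoted value");
  if (started) words.push(current);
  return words;
}
```

Note that this sketch discards the quotes themselves, so a fuller parser would need to remember whether a value was quoted before deciding whether it is a Token reference.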
Multiple Arguments are separated with any form or combination of whitespace on that same line (see exception below for indented lines):
foo -bar:qux -grault:garply
The order of Arguments matters. In the above example, bar comes before grault.
An Argument does not have to specify a value, in which case the value is assumed to be true.
foo -bar:qux -grault -waldo:fred
The above is a Commandset with one Command named foo. That Command has three Arguments:
- bar, with a value of qux
- grault, with an assumed/default value of true
- waldo, with a value of fred
Note that -grault isn't followed by a colon. If it were, its value would be an empty string.
# grault is "true"
foo -grault
# grault is an empty string
foo -grault:
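The difference between a missing colon and a colon with nothing after it could be handled along these lines. This is a minimal sketch that assumes the word has already been isolated and unquoted; parseArgument is a hypothetical name, not part of the specification.

```javascript
// Hypothetical helper: turns one word such as "-bar:qux" into a key/value pair.
function parseArgument(word) {
  if (!word.startsWith("-")) {
    throw new Error(`Argument must start with a dash: ${word}`);
  }
  const body = word.slice(1);      // drop the leading dash
  const colon = body.indexOf(":"); // only the first colon separates key from value
  if (colon === -1) {
    return { key: body, value: "true" }; // no colon at all: default value
  }
  // A colon is present, so the value is whatever follows it, possibly "".
  return { key: body.slice(0, colon), value: body.slice(colon + 1) };
}
```

The default is the string "true" rather than a boolean, since all values are parsed as strings.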
Commands and their Arguments can be placed on multiple lines as long as the second and subsequent lines are indented using any combination of whitespace. A series of indented lines are simply joined to form a single line (including the leading whitespace on the second and subsequent lines).
foo
  -bar:qux
  -grault
  -waldo:fred
baz
The above is functionally the same as:
foo -bar:qux -grault -waldo:fred
baz
This means the order of the Arguments is simply vertical, rather than horizontal. In the example above, the order is bar, grault, and waldo, whether the Arguments are on one line or multiple lines.
The level of indentation does not matter. It can vary between lines. It only matters that consecutive lines are indented with some form of whitespace. While not recommended, a blank line between two indented lines will not break the “indentation series.”
# This is legal, as the blank line will simply be ignored
# But it's visually confusing, so it's not recommended
foo
  -bar:baz

  -qux:grault
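One way to implement the line-joining rule is a single pass that appends every indented line to the most recent non-indented one. The sketch below covers only the Commands portion of a source (no Tokens). The name joinLogicalLines is hypothetical, and rejecting an indented line that has no preceding Command is an assumption the text does not spell out.

```javascript
// Hypothetical helper: collapses the Commands portion of a source
// into one string per Command.
function joinLogicalLines(source) {
  const logical = [];
  for (const line of source.split(/\r?\n/)) {
    if (line.trim() === "") continue;   // blank lines are ignored and never break a series
    if (line.startsWith("#")) continue; // comment lines are ignored
    if (/^\s/.test(line)) {
      if (logical.length === 0) {
        // Assumption: an indented line with no preceding Command is an error.
        throw new Error("Indented line has no Command to attach to");
      }
      // The leading whitespace is kept, so it acts as the Argument separator.
      logical[logical.length - 1] += line;
    } else {
      logical.push(line);
    }
  }
  return logical;
}
```

Each returned string can then be parsed exactly as if the Command had been written on one line.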
Tokens
Tokens can be defined at the bottom of the source, under the last Command specification. The purpose of Tokens is to allow longer values – which might include line breaks – that would be awkward or impossible to type on a single line.
(Additionally, Tokens allow you to use the same value for multiple Arguments, but the practical value of this is probably slight.)
A Token is a line that begins with a dollar sign ($) immediately followed by the Token name. The lines following, until the next Token declaration or the end of the file, constitute the value of the Token. Token values can be multi-line.
foo -bar:qux
baz
$garply
Mary had a little lamb
$plugh
Its fleece was white as snow
This is a Commandset with two Commands named foo and baz and two Tokens named garply and plugh.
- garply has a value of Mary had a little lamb (Token values are trimmed automatically: leading and trailing whitespace is removed)
- plugh has a value of Its fleece was white as snow
(In this example, the tokens are defined but not used, which is rather pointless.)
No Commands can be defined after the first Token declaration.
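Separating the Commands from the Token definitions might look something like the sketch below. The name splitTokens is hypothetical, and two behaviors are assumptions the text does not settle: a declaration line is taken to contain nothing but $name, and a repeated Token name silently overwrites the earlier value.

```javascript
// Hypothetical helper: separates the Command lines from the Token definitions.
function splitTokens(source) {
  const commandLines = [];
  const tokens = new Map();
  let currentName = null; // name of the Token being collected, or null before the first one
  let buffer = [];

  // Store the Token collected so far, trimming leading and trailing whitespace.
  const flush = () => {
    if (currentName !== null) tokens.set(currentName, buffer.join("\n").trim());
  };

  for (const line of source.split(/\r?\n/)) {
    // Assumption: a declaration is a line holding only "$name".
    const declaration = /^\$(\S+)\s*$/.exec(line);
    if (declaration) {
      flush();
      currentName = declaration[1];
      buffer = [];
    } else if (currentName === null) {
      commandLines.push(line); // still in the Commands portion
    } else {
      buffer.push(line);       // everything after a declaration is Token value
    }
  }
  flush();
  return { commandSource: commandLines.join("\n"), tokens };
}
```

Because every line after the first declaration is treated as Token content, a Command written there would simply become part of a Token value, which matches the rule that no Commands can follow the first Token.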
If defined, Tokens can be referred to as Argument values. The value of the Token is substituted for the Argument value.
foo -bar:$qux
$qux
This is the value of qux. It can be very long…
…and constitute multiple lines.
This substitution occurs at parse time. The resulting Argument object has no record of whether the value was provided inline or via a Token reference.
The same Token can be used for multiple Arguments spread throughout multiple Commands.
If the referenced Token name is not defined, this constitutes a parse failure. Conversely, if you define a Token but don’t reference it, this will not cause the parse to fail, but there’s no practical reason to do this.
Token values are simple, unprocessed strings. They cannot refer to other Tokens.
Quoted values are not evaluated for the presence of Tokens, so an Argument value that could be confused with a Token reference should be quoted:
# This is a Token reference
foo -bar:$baz
# This is not a Token reference
# This is the literal string $baz
foo -bar:"$baz"
$baz
This is the Token value.
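The substitution rule, including the quoting exemption, can be sketched as follows. The function resolveValue is a hypothetical name. It assumes it receives the raw value exactly as written after the colon (quotes still attached) along with a Map of already-parsed Tokens, and it leaves escape handling inside quoted values to an earlier step.

```javascript
// Hypothetical helper: resolves one raw Argument value against a Map of Tokens.
function resolveValue(rawValue, tokens) {
  const first = rawValue[0]; // undefined for an empty value
  if (first === '"' || first === "'") {
    // Quoted values are never Token references; just remove the surrounding quotes.
    return rawValue.slice(1, -1);
  }
  if (first === "$") {
    const name = rawValue.slice(1);
    if (!tokens.has(name)) {
      throw new Error(`Undefined Token: ${name}`); // an undefined reference is a parse failure
    }
    return tokens.get(name);
  }
  return rawValue; // plain inline value
}
```

The returned string carries no trace of where it came from, consistent with the Argument having no record of whether its value was inline or a Token reference.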
Naming and Charactersets
Command names, Argument names, and Token names may consist of:
- ASCII letters
- Numeric digits
- Underscore (_)
- Dash (-)
- Period/dot (.)
The first character must be a letter.
Argument and Token values (not names) can contain any character, including Unicode, emojis, or other non-ASCII characters.
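The naming rules fit in a single regular expression. A minimal sketch, with isValidName as a hypothetical helper name:

```javascript
// Hypothetical helper: checks a Command, Argument, or Token name.
// First character: an ASCII letter. Remaining characters: ASCII letters,
// digits, underscore, dash, or period.
const NAME_PATTERN = /^[A-Za-z][A-Za-z0-9_.-]*$/;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}
```

Values are deliberately not checked this way, since they may contain any character.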
API
A Commandset object is obtained by passing the text source into the constructor:
let commandset = new CommandSet(source);
Any violation of the rules detailed above should result in a failed parse and a thrown exception.
The Commandset object should provide an iterable collection of Command objects in the source order, meaning the order the commands appear in the text source. This collection should be publicly writable.
Each Command object should have a property for name, and an iterable collection of Argument objects, in the source order. Each Argument object has a unique key and a string value.
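As an illustration of the shape described above, the sketch below assembles a plain Command object from already-parsed key/value pairs and enforces key uniqueness. It uses plain objects rather than real classes, and buildCommand is a hypothetical name.

```javascript
// Hypothetical helper: builds a plain Command object and rejects duplicate keys.
function buildCommand(name, pairs) {
  const seen = new Set();
  for (const { key } of pairs) {
    if (seen.has(key)) {
      throw new Error(`Duplicate Argument "${key}" in Command "${name}"`);
    }
    seen.add(key);
  }
  // Copy the array so the Command owns its Arguments, in source order.
  return { name, arguments: [...pairs] };
}
```

Uniqueness is checked per Command only, so two different Commands may each have an Argument with the same key.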
A Command object should be able to be instantiated independent of the Commandset object, and be able to be parsed from source.
let command = new Command("foo -bar:baz");
commandset.commands.push(command);
Token substitution occurs at parse time. Tokens will need to be parsed first, as they will be needed to complete substitution as the Commands are parsed. Once the source is parsed and the Commandset object is created, the Tokens are no longer needed.
The Argument object should be able to be instantiated independent of the Command object, and be able to be parsed from source:
let argument = new Argument("-bar:baz");
command.arguments.push(argument);
All Argument and Token values are parsed as strings. Casing is always preserved.
Whitespace is preserved when:
- Quoted in an Argument value (remember, Argument values with spaces need to be quoted)
- It appears in a Token definition, between characters (leading and trailing whitespace is trimmed)
Tests
In order for a CL parser to be valid, it must do the following.
- Correctly parse Commands into a Commandset in the source order
- Allow more than one Command in a Commandset with the same name
- Correctly concatenate consecutively indented lines to the end of the last non-indented line
- Correctly parse Arguments in the source order
- Disallow Arguments in the same Command with the same key
- Provide a default value of “true” to an Argument with no value
- Capture quoted (double or single quotes) Argument values correctly, including all spaces
- Disallow Command, Argument, or Token names that have any character which is not an ASCII letter, digit, underscore, dash, or period.
- Allow escaping of quotes inside a quoted Argument value
- Correctly parse Token blocks
- Correctly return Tokens as an Argument value, when that Argument value is requested
- Correctly parse when the special characters have been changed
- Allow new Command objects to be added to the Commandset collection
- Allow new Token objects to be added to the Commandset collection
Examples
Example 1
Source:
foo
bar
Parsed:
[
  {
    name: "foo",
    args: []
  },
  {
    name: "bar",
    args: []
  }
]
Example 2
Source:
foo -bar:baz
bar -qux
Parsed:
[
  {
    name: "foo",
    args: [
      {
        bar: "baz"
      }
    ]
  },
  {
    name: "bar",
    args: [
      {
        qux: "true"
      }
    ]
  }
]
Example 3
Source:
foo -bar:$baz
$baz
This is the value of baz
Parsed:
[
  {
    name: "foo",
    args: [
      {
        bar: "This is the value of baz"
      }
    ]
  }
]
Example 4
Source:
foo -bar: -baz
Parsed:
[
  {
    name: "foo",
    args: [
      {
        bar: ""
      },
      {
        baz: "true"
      }
    ]
  }
]