API Reference

This page documents the API of Ohm/JS, a JavaScript library for working with grammars written in the Ohm language. For documentation on the Ohm language, see the syntax reference.

Instantiating Grammars

NOTE: For grammars defined in a JavaScript string literal (i.e., not in a separate .ohm file), it's recommended to use a template literal with the String.raw tag.

ohm.grammar(source: string, optNamespace?: object) → Grammar

Instantiate the Grammar defined by source. If specified, optNamespace is the Namespace to use when resolving external references in the grammar. For more information, see the documentation on Namespace objects below.

ohm.grammars(source: string, optNamespace?: object) → Namespace

Create a new Namespace containing Grammar instances for all of the grammars defined in source. If optNamespace is specified, it will be the prototype of the new Namespace.

Namespace objects

When instantiating a grammar that refers to another grammar -- e.g. MyJava <: Java { keyword += "async" } -- the supergrammar name ('Java') is resolved to a grammar by looking up the name in a Namespace. In Ohm/JS, Namespaces are plain old JavaScript objects, and an object literal like {Java: myJavaGrammar} can be passed to any API that expects a Namespace. For convenience, Ohm also has the following methods for working with namespaces:

ohm.namespace(optProps?: object)

Create a new namespace. If optProps is specified, all of its properties will be copied to the new namespace.

ohm.extendNamespace(namespace: object, optProps?: object)

Create a new namespace which inherits from namespace. If optProps is specified, all of its properties will be copied to the new namespace.

Grammar objects

A Grammar instance g has the following methods:

g.match(str: string, optStartRule?: string) → MatchResult

Try to match str against g, returning a MatchResult. If optStartRule is given, it specifies the rule on which to start matching. By default, the start rule is inherited from the supergrammar, or if there is no supergrammar specified, it is the first rule in g's definition.

g.matcher()

Create a new Matcher object which supports incrementally matching g against a changing input string.

g.trace(str: string, optStartRule?: string) → Trace

Try to match str against g, returning a Trace object. optStartRule has the same meaning as in g.match. Trace objects have a toString() method, which returns a string which summarizes each parsing step (useful for debugging).

g.createSemantics() → Semantics

Create a new Semantics object for g.

g.extendSemantics(superSemantics: Semantics) → Semantics

Create a new Semantics object for g that inherits all of the operations and attributes in superSemantics. g must be a descendant of the grammar associated with superSemantics.

Matcher objects

Matcher objects can be used to incrementally match a changing input against the Matcher's grammar, e.g. in an editor or IDE. When a Matcher's input is modified via replaceInputRange, further calls to match will reuse the partial results of previous calls wherever possible. Generally, this means that small changes to the input will result in very short match times.

A Matcher instance m has the following methods:

m.getInput() → string

Return the current input string.

m.setInput(str: string)

Set the input string to str.

m.replaceInputRange(startIdx: number, endIdx: number, str: string)

Edit the current input string, replacing the characters between startIdx and endIdx with str.

m.match(optStartRule?: string) → MatchResult

Like Grammar's match method, but operates incrementally.

m.trace(optStartRule?: string) → Trace

Like Grammar's trace method, but operates incrementally.

MatchResult objects

Internally, a successful MatchResult contains a parse tree, which is made up of parse nodes. Parse trees are not directly exposed -- instead, they are inspected indirectly through operations and attributes, which are described in the next section.

A MatchResult instance r has the following methods:

r.succeeded() → boolean

Return true if the match succeeded, otherwise false.

r.failed() → boolean

Return true if the match failed, otherwise false.

MatchFailure objects

When r.failed() is true, r has the following additional properties and methods:

r.message: string

Contains a message indicating where and why the match failed. This message is suitable for end users of a language (i.e., people who do not have access to the grammar source).

r.shortMessage: string

Contains an abbreviated version of r.message that does not include an excerpt from the invalid input.

r.getRightmostFailurePosition() → number

Return the index in the input stream at which the match failed.

r.getRightmostFailures() → Array

Return an array of Failure objects describing the failures that occurred at the rightmost failure position.

Semantics, Operations, and Attributes

An Operation represents a function that can be applied to a successful match result. Like a Visitor, an operation is evaluated by recursively walking the parse tree, and at each node, invoking the matching semantic action from its action dictionary.

An Attribute is an Operation whose result is memoized, i.e., it is evaluated at most once for any given node.

A Semantics is a family of operations and/or attributes for a given grammar. A grammar may have any number of Semantics instances associated with it — this means that the clients of a grammar (even in the same program) never have to worry about operation/attribute name clashes.

Semantics objects

Operations and attributes are accessed by applying a semantics instance to a MatchResult. This returns a parse node, whose properties correspond to the operations and attributes of the semantics. For example, to invoke an operation named 'prettyPrint': mySemantics(matchResult).prettyPrint(). Attributes are accessed using property syntax — e.g., for an attribute named 'value': mySemantics(matchResult).value.

A Semantics instance mySemantics has the following methods, which all return this so they can be chained:

mySemantics.addOperation(nameOrSignature: string, actionDict: object) → Semantics

Add a new Operation to this Semantics, using the semantic actions contained in actionDict. The first argument is either a name (e.g. 'prettyPrint') or a signature which specifies the operation name and zero or more named parameters (e.g., 'prettyPrint()', 'prettyPrint(depth, strict)'). It is an error if there is already an operation or attribute called name in this semantics.

If the operation has arguments, they are accessible via this.args within a semantic action. For example, this.args.depth would hold the value of the depth argument for the current action.

mySemantics.addAttribute(name: string, actionDict: object) → Semantics

Exactly like mySemantics.addOperation, except it will add an Attribute to the semantics rather than an Operation.

mySemantics.extendOperation(name: string, actionDict: object) → Semantics

Extend the Operation named name with the semantic actions contained in actionDict. name must be the name of an operation in the super semantics — i.e., you must first extend the Semantics via extendSemantics before you can extend any of its operations.

mySemantics.extendAttribute(name: string, actionDict: object) → Semantics

Exactly like mySemantics.extendOperation, except it will extend an Attribute of the super semantics rather than an Operation.

Semantic Actions

A semantic action is a function that computes the value of an operation or attribute for a specific type of node in the parse tree. There are three different types of parse nodes:

  • Rule application, or non-terminal nodes, which correspond to rule application expressions
  • Terminal nodes, for string and number literals, and keyword expressions
  • Iteration nodes, which are associated with expressions inside a repetition operator (*, +, and ?)

Generally, you write a semantic action for each rule in your grammar, and store them together in an action dictionary. For example, given the following grammar:

Name {
  FullName = name name
  name = (letter | "-" | ".")+
}

A set of semantic actions for this grammar might look like this:

const actions = {
  FullName(firstName, lastName) { ... },
  name(parts) { ... }
};

The value of an operation or attribute for a node is the result of invoking the node's matching semantic action. In the grammar above, the body of the FullName rule produces two values — one for each application of the name rule. The values are represented as parse nodes, which are passed as arguments when the semantic action is invoked. An error is thrown if the function arity does not match the number of values produced by the expression.

The matching semantic action for a particular node is chosen as follows:

  • On a rule application (non-terminal) node, first look for a semantic action with the same name as the rule (e.g., 'FullName'). If the action dictionary does not have a property with that name, use the action named _nonterminal, if it exists. If there is no _nonterminal action, and the node has exactly one child, then return the result of invoking the operation/attribute on the child node.
  • On a terminal node (e.g., a node produced by the parsing expression "hello"), use the semantic action named _terminal.
  • On an iteration node (e.g., a node produced by the parsing expression letter+), use the semantic action named _iter.

The _iter, _nonterminal, and _terminal actions are sometimes called special actions. _iter and _nonterminal take a variable number of arguments, which are typically captured into an array using rest parameter syntax (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/rest_parameters), e.g. _iter(...children) { ... }. The _terminal action takes no arguments.

NOTE: Versions of Ohm prior to v16.0 had slightly different behaviour with regard to default semantic actions. See here for more details.

Note that you can also write semantic actions for built-in rules like letter or digit. For ListOf, please see the documentation on asIteration below.

Parse Nodes

Each parse node is associated with a particular parsing expression (a fragment of an Ohm grammar), and the node captures any input that was successfully parsed by that expression. Unlike many parsing frameworks, Ohm does not have a syntax for binding/capturing -- every parsing expression captures all the input it consumes, and produces a fixed number of values.

A node n has the following methods and properties:

n.child(idx: number) → Node

Get the child at index idx.

n.isTerminal() → boolean

true if the node is a terminal node, otherwise false.

n.isIteration() → boolean

true if the node is an iteration node (i.e., if it is associated with a repetition operator in the grammar), otherwise false.

n.children: Array

An array containing the node's children.

n.ctorName: string

The name of the grammar rule that created the node.

n.source: Interval

Captures the portion of the input that was consumed by the node.

n.sourceString: string

The substring of the input that was consumed by the node. Equivalent to n.source.contents.

n.numChildren: number

The number of child nodes that the node has.

n.isOptional() → boolean

true if the node is an iteration node with at most one child (i.e., it is associated with the ? operator), otherwise false.

Operations and Attributes

In addition to the properties listed above, within a given semantics, every node also has a method/property corresponding to each operation/attribute in the semantics. For example, in a semantics that has an operation named 'prettyPrint' and an attribute named 'freeVars', every node has a prettyPrint() method and a freeVars property.

Built-in Operations

asIteration

The built-in asIteration operation offers a convenient way of handling ListOf expressions, by adapting them to have the same interface as built-in iteration nodes. As an example, take the following grammar:

G {
  Start = ListOf<letter, ",">
}

...and an operation defined as follows:

s.addOperation('upper()', {
  Start(list) {
    return list.asIteration().children.map(c => c.upper());
  },
  letter(l) {
    return this.sourceString.toUpperCase();
  }
});

Then s(g.match('a, b, c')).upper() will return ['A', 'B', 'C']. Note that the Start action explicitly maps upper() over the children of the iteration node returned by asIteration.

You can also extend the asIteration operation to handle other list-like rules in your own language.