
peberminta

Simple, transparent parser combinators toolkit that supports tokens of any type.

For when you wanna do weird things with parsers.

Features

  • Well typed - written in TypeScript, with a lot of attention paid to keeping types well defined.

  • Highly generic - no constraints on tokens, options (additional state data) and output types. The core module doesn't mention strings at all as part of the normal flow. Some string-specific building blocks can be loaded from a separate module in case you need them.

  • Transparent. Built on a very simple base idea - just a few type aliases. The whole parser state is accessible at any time.

  • Lightweight. Zero dependencies. Just type aliases and functions.

  • Batteries included - comes with a pretty big set of building blocks.

  • Easy to extend - just follow the convention defined by type aliases when making your own building blocks. (And maybe let me know what you think can be universally useful to be included in the package itself.)

  • Easy to make configurable parsers. Rather than dynamically composing parsers based on options or manually weaving options into a dynamic parser state, this package offers a standard way to treat options as a part of static data and access them at any moment for course correction.

  • Well tested - comes with tests for everything including examples.

  • Practicality over "purity". To be understandable and self-consistent is more important than to follow an established encoding of abstract ideas. More on this below.

  • No streaming - accepts a fixed array of tokens. This keeps things simple, and the whole input can be accessed at any time if needed. More on this below.

  • Bring your own lexer/tokenizer - if you need one. It doesn't matter how tokens are made - this package can consume anything you can type. I also have a lexer called leac, and it is used in some examples, but there is nothing special about it that makes it the best match (apart, perhaps, from being written in TypeScript, maintained at the same level, and likewise designed with arrays rather than iterators in mind).
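To illustrate the point about configurable parsers: options live in the static Data object, so any parser can consult them at any moment. The following is a standalone sketch that follows the package's type convention - the `radix` option and the `digit` parser are made up for this example:

```typescript
// Standalone sketch: options are part of the static Data object,
// so a parser can read them at any time for course correction.
// `DigitOptions` and `digit` are made up for this illustration.

type Data<TToken, TOptions> = { tokens: TToken[], options: TOptions };
type Result<TValue> =
  | { matched: true, position: number, value: TValue }
  | { matched: false };

type DigitOptions = { radix: number };

// Match one character that is a valid digit in the configured radix.
function digit(data: Data<string, DigitOptions>, i: number): Result<number> {
  if (i >= data.tokens.length) { return { matched: false }; }
  const value = parseInt(data.tokens[i], data.options.radix);
  return Number.isNaN(value)
    ? { matched: false }
    : { matched: true, position: i + 1, value };
}

const hex = digit({ tokens: ['f'], options: { radix: 16 } }, 0);
const dec = digit({ tokens: ['f'], options: { radix: 10 } }, 0);
// Same parser, different options: hex matches (value 15), dec does not.
```

No parsers are recomposed and no mutable state is threaded through - the options object is just static data that happens to be visible everywhere.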

Changelog

Available here: CHANGELOG.md

Install

Node

```shell
npm i peberminta
```

```typescript
import * as p from 'peberminta';
import * as pc from 'peberminta/char';
```

Deno

```typescript
import * as p from 'https://deno.land/x/peberminta@.../core.ts';
import * as pc from 'https://deno.land/x/peberminta@.../char.ts';
```

Examples

Published packages using peberminta

API

Detailed API documentation with navigation and search:

Convention

The whole package is built around these type aliases:

```typescript
export type Data<TToken,TOptions> = {
  tokens: TToken[],
  options: TOptions
};

export type Parser<TToken,TOptions,TValue> =
  (data: Data<TToken,TOptions>, i: number) => Result<TValue>;

export type Matcher<TToken,TOptions,TValue> =
  (data: Data<TToken,TOptions>, i: number) => Match<TValue>;

export type Result<TValue> = Match<TValue> | NonMatch;

export type Match<TValue> = {
  matched: true,
  position: number,
  value: TValue
};

export type NonMatch = {
  matched: false
};
```
  • A Data object holds the tokens array and possibly an options object - it's just a container for all static data used by a parser. The parser position, on the other hand, has its own life cycle and is passed around separately.

  • A Parser is a function that accepts a Data object and a parser position, looks into the tokens array at the given position, and returns either a Match with a parsed value (use null if there is no value) and a new position, or a NonMatch.

  • A Matcher is a special case of a Parser that never fails and always returns a Match.

  • Result object from a Parser can be either a Match or a NonMatch.

  • Match is a result of successful parsing - it contains a parsed value and a new parser position.

  • NonMatch is a result of unsuccessful parsing. It doesn't have any data attached to it.

  • TToken can be any type.

  • TOptions can be any type. Use it to make your parser customizable. Or set it to undefined and type it as unknown if not needed.
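A convention-following building block can be written from scratch. Here is a minimal standalone sketch - the type aliases are copied from above, and this `token` is a local illustration mirroring the package's block of the same name:

```typescript
// The type aliases from the convention above.
type Data<TToken, TOptions> = { tokens: TToken[], options: TOptions };
type Parser<TToken, TOptions, TValue> =
  (data: Data<TToken, TOptions>, i: number) => Result<TValue>;
type Result<TValue> = Match<TValue> | NonMatch;
type Match<TValue> = { matched: true, position: number, value: TValue };
type NonMatch = { matched: false };

// A parser that matches a single token accepted by the predicate
// and advances the position by one.
function token<TToken, TOptions>(
  predicate: (t: TToken) => boolean
): Parser<TToken, TOptions, TToken> {
  return (data, i) => (i < data.tokens.length && predicate(data.tokens[i]))
    ? { matched: true, position: i + 1, value: data.tokens[i] }
    : { matched: false };
}

// Tokens can be of any type - numbers here.
const even = token<number, unknown>(n => n % 2 === 0);
const hit = even({ tokens: [4, 7], options: undefined }, 0);
// hit: { matched: true, position: 1, value: 4 }
const miss = even({ tokens: [4, 7], options: undefined }, 1);
// miss: { matched: false }
```

Note that nothing here is hidden: the state a combinator needs is just the Data object and the position number, both plainly visible at every call.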

Building blocks

Core blocks

| | | | |
| --- | --- | --- | --- |
| ab | abc | action | ahead |
| all | and | any | between |
| chain | chainReduce | choice | condition |
| decide | discard | eitherOr | emit |
| end | eof | error | fail |
| filter | first | flatten | flatten1 |
| guard | last | left | leftAssoc1 |
| leftAssoc2 | longest | lookAhead | make |
| many | many1 | map | mapR |
| middle | not | of | option |
| or | otherwise | peek | recursive |
| reduceLeft | reduceRight | refine | right |
| rightAssoc1 | rightAssoc2 | satisfy | sepBy |
| sepBy1 | skip | some | start |
| takeMinMax | takeN | takeUntil | takeUntilP |
| takeWhile | takeWhileP | token | |

Core utilities

| | | | |
| --- | --- | --- | --- |
| match | parse | parserPosition | remainingTokensNumber |
| tryParse | | | |

Char blocks

| | | | |
| --- | --- | --- | --- |
| anyOf | char | charTest | concat |
| noneOf | oneOf | str | |

Char utilities

| | | | |
| --- | --- | --- | --- |
| match | parse | parserPosition | tryParse |

Turning grammars into parsers

Extended Backus–Naur form is a common notation for defining language grammars.

Parsing Expression Grammar (PEG) is another one.

There are many different dialects of those notations; it is impractical to capture them all here. See this *BNF comparison table for example.

ANTLR (ANother Tool for Language Recognition) is a parser generator whose meta language is also commonly used to describe grammars (more documentation).

Here is a quick cross-reference to give a general idea how to turn production rules into peberminta parsers:

| Usage | ISO EBNF | PEG | ANTLR | peberminta |
| ----- | -------- | --- | ----- | ---------- |
| terminal (string, character) | `"foo"` or `'bar'` | `"foo"` or `'bar'` | `"foo"` or `'b'` | `char.str`, `char.char`, `core.token` |
| any listed character | | `[abc]` | `[abc]` | `char.oneOf` |
| character range | | `[a-z]` | `"a".."z"` | `char.charTest` |
| non-terminal | `...` | `...` | `...` | a `Parser` instance |
| concatenation, sequence | `,` | | | `core.all`, `core.ab`, `core.abc` |
| alternation, choice | `\|` | `/` | `\|` | `core.first`, `core.eitherOr` |
| optional (0 or 1) | `[ ... ]` | `...?` | `...?` | `core.option` |
| repetition (0 or more) | `{ ... }` | `...*` | `...*` | `core.many` |
| repetition (1 or more) | `{ ... }-` | `...+` | `...+` | `core.many1` |
| grouping | `( ... )` | `( ... )` | `( ... )` | a `Parser` instance |
| any token (wildcard) | | `.` | `.` | `core.any` |
| not (inversion) | | `!` | `~` | `core.not` |
| and-predicate (positive lookahead) | | `&...` | `( ... ) =>` | `core.ahead` |
| not-predicate (negative lookahead) | | `!...` | `( ~... ) =>` | combination of `core.not` and `core.ahead` |
| end of input | | `!.` | `EOF` | `core.end` |

The same grammar can be described and implemented in multiple ways. There are more peberminta blocks; some of them may fit the original idea better than its expression in a limited formal grammar.

See the following examples that illustrate rule-by-rule implementation of a grammar using peberminta blocks:

What about ...?

  • performance - The code is very simple but I won't put any unverified assumptions here. I'd be grateful to anyone who can set up a good benchmark project to compare different parser combinators.

  • stable release - The current release is well thought out and tested. I leave open the possibility that some supplied functions may need an incompatible change. Before version 1.0.0 such changes will be made without a deprecation cycle.

  • streams/iterators - Maybe some day, if the need to parse a stream of non-string tokens arises. For now I don't have a task that would force me to think through how to design it. It would require a significant trade-off and may end up being a separate module (like char) at best, or even a separate package.

  • Fantasy Land - You can find some familiar ideas here, especially when compared to Static Land. But I'm not concerned about compatibility with that spec - see the "Practicality over "purity"" entry above. What I think might make sense is to add separate tests for the laws applicable in the context of this package. Low priority though.

Some other parser combinator packages

Changelog

Version 0.10.0

General

  • Targeting Node 20 and ES2020;
  • Adjustment of package.json exports;
  • Reorganized tests for easier maintenance;
  • Improved documentation.

core module

  • New functions takeN and takeMinMax;
  • New function filter and aliases guard and refine
    • can reject matches and narrow types based on a predicate/type guard;
  • takeWhile now can narrow the value type if provided a type guard;
  • satisfy now can narrow the token type if provided a type guard;
  • Renamed choice to first - to better reflect the order of evaluation
    • choice and or are still available as aliases for it;
  • New function last
    • tries provided parsers in reverse order
    • for convenience when converting a grammar that puts more specific alternatives after general ones;
  • Renamed map1 to mapR
    • old name provided as a deprecated alias and will be removed in the next minor/major version;
  • eitherOr is now a function and otherwise is an alias for it;
  • New alias between added for function middle;
  • New core types: TupleOf - for better type inference in takeN.

char module

  • Now char.charTest can have two arguments: positive and negative regular expressions
    • for convenience when defining character exclusions;
  • Better type inference in char.oneOf;
  • New char types: CharUnion, GraphemeUnion - for better type inference in oneOf.

Version 0.9.0

  • many functions got overloads for Matcher type propagation in less common scenarios;
  • condition function now accepts Parsers/Matchers with different value types, result value type is the union of the two;
  • added type tests for overloads using expect-type.

Version 0.8.0

  • Targeting Node.js version 14 and ES2020;
  • Now should be discoverable with denoify.

Version 0.7.0

  • otherwise function now has two overloads - Parser * Matcher -> Matcher and Parser * Parser -> Parser;
  • otherwise function now accepts Parsers/Matchers with different value types, result value type is the union of the two;
  • otherwise function now has an alias called eitherOr which might be more natural for combining parsers.

Version 0.6.0

  • ensure local imports have file extensions - fix "./core module cannot be found" issue.

Version 0.5.4

  • remove terser, source-map files;
  • use only rollup-plugin-cleanup to condition published files.

Version 0.5.3

  • source-map files;
  • minor documentation update.

Version 0.5.2

  • peek function keeps Parser/Matcher distinction.

Version 0.5.1

  • documentation updates;
  • package marked as free of side effects for tree shaking.

Version 0.5.0

  • Initial release;
  • Aiming at Node.js version 12 and up.