Crate combine
This crate contains parser combinators, roughly based on the Haskell libraries parsec and attoparsec.
A parser in this library can be described as a function which takes some input and, if it
is successful, returns a value together with the remaining input.
A parser combinator is a function which takes one or more parsers and returns a new parser.
For instance, the `many` parser can be used to convert a parser for single digits into one that
parses multiple digits. By modeling parsers in this way it becomes easy to compose complex
parsers in an almost declarative way.
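The description above can be made concrete with a small hand-rolled sketch in plain Rust. It does not use combine's API: the `digit` and `many` functions below are hypothetical stand-ins written only for illustration, and they model a parser as a function from `&str` to an optional pair of value and remaining input.

```rust
/// A parser: takes input and, on success, returns a value and the remaining input.
/// Hypothetical helper for illustration; this is not combine's `digit`.
fn digit(input: &str) -> Option<(char, &str)> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Some((c, chars.as_str())),
        _ => None,
    }
}

/// A parser combinator: takes a parser and returns a new parser which applies
/// it zero or more times and collects the results.
/// Assumes `p` consumes input whenever it succeeds, otherwise the loop never ends.
fn many<'a, T>(
    p: impl Fn(&'a str) -> Option<(T, &'a str)>,
) -> impl Fn(&'a str) -> Option<(Vec<T>, &'a str)> {
    move |mut input: &'a str| {
        let mut values = Vec::new();
        while let Some((value, rest)) = p(input) {
            values.push(value);
            input = rest;
        }
        Some((values, input))
    }
}

fn main() {
    // A parser for single digits becomes a parser for multiple digits.
    let digits = many(digit);
    assert_eq!(digits("123abc"), Some((vec!['1', '2', '3'], "abc")));
    // Zero matches is still a success; nothing is consumed.
    assert_eq!(digits("abc"), Some((Vec::new(), "abc")));
}
```

combine's real parsers are trait implementations rather than closures, but the shape of the idea is the same: combinators wrap parsers and return parsers.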
Overview
`combine` limits itself to creating LL(1) parsers (it is possible to opt in to LL(k) parsing
using the `attempt` combinator), which makes the parsers easy to reason about in both function
and performance while sacrificing some generality. In addition to letting you reason better
about the parsers you construct, `combine` also uses the knowledge that it is an LL parser to
automatically construct good error messages.
    extern crate combine;
    use combine::Parser;
    use combine::stream::state::State;
    use combine::parser::char::{digit, letter};

    const MSG: &'static str = r#"Parse error at line: 1, column: 1
    Unexpected `|`
    Expected `digit` or `letter`
    "#;

    fn main() {
        // Wrapping a `&str` with `State` provides automatic line and column tracking. If `State`
        // was not used the positions would instead only be pointers into the `&str`
        if let Err(err) = digit().or(letter()).easy_parse(State::new("|")) {
            assert_eq!(MSG, format!("{}", err));
        }
    }
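To illustrate what the LL(1) restriction means in practice, the following is a hand-rolled sketch in plain Rust rather than combine's implementation. The `Outcome` type and the `literal`, `or` and `attempt` functions are hypothetical names chosen for this example. The sketch assumes that a choice only tries its second alternative when the first one failed without consuming input, and that an attempt-style wrapper hides the consumption so the choice can backtrack. It also shows how the expected items of both alternatives can be merged into one error.

```rust
/// Result of a hand-rolled parser. `consumed` records whether input was used
/// before the failure, which is what the choice below inspects.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Ok(&'a str), // remaining input
    Err { consumed: bool, expected: Vec<&'static str> },
}

/// Matches a literal; a failure after the first character matched counts as consumed.
fn literal<'a>(lit: &'static str) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input: &'a str| {
        if input.starts_with(lit) {
            return Outcome::Ok(&input[lit.len()..]);
        }
        let first_matches = match (lit.chars().next(), input.chars().next()) {
            (Some(a), Some(b)) => a == b,
            _ => false,
        };
        Outcome::Err { consumed: first_matches, expected: vec![lit] }
    }
}

/// LL(1) choice: `q` is only tried if `p` failed without consuming input.
fn or<'a>(
    p: impl Fn(&'a str) -> Outcome<'a>,
    q: impl Fn(&'a str) -> Outcome<'a>,
) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input: &'a str| match p(input) {
        Outcome::Err { consumed: false, expected: mut e1 } => match q(input) {
            // Both failed at the same position: report everything that was expected.
            Outcome::Err { consumed: false, expected: e2 } => {
                e1.extend(e2);
                Outcome::Err { consumed: false, expected: e1 }
            }
            other => other,
        },
        other => other,
    }
}

/// Makes a failing parser look as if it consumed nothing, enabling backtracking.
fn attempt<'a>(p: impl Fn(&'a str) -> Outcome<'a>) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input: &'a str| match p(input) {
        Outcome::Err { expected, .. } => Outcome::Err { consumed: false, expected },
        ok => ok,
    }
}

fn main() {
    // "null" matches the leading 'n' of "new" before failing, so the plain choice commits.
    let plain = or(literal("null"), literal("new"));
    assert_eq!(plain("new"), Outcome::Err { consumed: true, expected: vec!["null"] });
    // Neither alternative consumed anything, so the error lists both.
    assert_eq!(plain("|"), Outcome::Err { consumed: false, expected: vec!["null", "new"] });

    // Wrapping the first alternative lets the choice fall through to the second.
    let backtracking = or(attempt(literal("null")), literal("new"));
    assert_eq!(backtracking("new"), Outcome::Ok(""));
}
```

The trade-off is the one the text describes: committing after one token keeps performance predictable and pins errors to one position, while backtracking has to be requested explicitly where the grammar needs it.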
This library is currently split into a few core modules:
- `parser` is where you will find all the parsers that combine provides. It contains the core `Parser` trait as well as several submodules such as `sequence` or `choice` which each contain several parsers aimed at a specific niche.
- `stream` contains the second most important trait next to `Parser`. Streams represent the data source which is being parsed such as `&[u8]`, `&str` or iterators.
- `easy` contains combine's default "easy" error and stream handling. If you use the `easy_parse` method to start your parsing these are the types that are used.
- `error` contains the types and traits that make up combine's error handling. Unless you need to customize the errors your parsers return you should not need to use this module much.
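The idea of a stream as an abstraction over the data source can be sketched with a small trait in plain Rust. The `TokenSource` trait and `count_tokens` function below are hypothetical and are not combine's `Stream` or `StreamOnce` traits, whose methods and signatures differ. The sketch only shows how one generic function can work over both `&str` and `&[u8]`.

```rust
/// Hypothetical stand-in for a stream trait: yields one token at a time.
trait TokenSource: Sized {
    type Item;
    /// Splits off the first token, or returns `None` at end of input.
    fn uncons(self) -> Option<(Self::Item, Self)>;
}

impl<'a> TokenSource for &'a str {
    type Item = char;
    fn uncons(self) -> Option<(char, Self)> {
        let mut chars = self.chars();
        chars.next().map(|c| (c, chars.as_str()))
    }
}

impl<'a> TokenSource for &'a [u8] {
    type Item = u8;
    fn uncons(self) -> Option<(u8, Self)> {
        self.split_first().map(|(first, rest)| (*first, rest))
    }
}

/// Counts tokens without knowing the concrete input type.
fn count_tokens<S: TokenSource>(mut source: S) -> usize {
    let mut n = 0;
    while let Some((_, rest)) = source.uncons() {
        source = rest;
        n += 1;
    }
    n
}

fn main() {
    // A `&str` yields `char`s, so the two-byte 'é' counts as one token.
    assert_eq!(count_tokens("héllo"), 5);
    // A byte slice yields `u8`s.
    assert_eq!(count_tokens(&b"abc"[..]), 3);
}
```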
Examples
    extern crate combine;
    use combine::parser::char::{spaces, digit, char};
    use combine::{many1, sep_by, Parser};
    use combine::stream::easy;

    fn main() {
        //Parse spaces first and use the with method to only keep the result of the next parser
        let integer = spaces()
            //parse a string of digits into an i32
            .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));

        //Parse integers separated by commas, skipping whitespace
        let mut integer_list = sep_by(integer, spaces().skip(char(',')));

        //Call parse with the input to execute the parser
        let input = "1234, 45,78";
        let result: Result<(Vec<i32>, &str), easy::ParseError<&str>> =
            integer_list.easy_parse(input);
        match result {
            Ok((value, _remaining_input)) => println!("{:?}", value),
            Err(err) => println!("{}", err)
        }
    }
If we need a parser that is mutually recursive or if we want to export a reusable parser, the
`parser!` macro can be used. In effect it makes it possible to return a parser without naming
the type of the parser (which can be very large due to combine's trait based approach). While
it is possible to avoid naming the type without the macro, those solutions require either
allocation (`Box<Parser<Input = I, Output = O, PartialState = P>>`) or `impl Trait`, which is
only available on Rust 1.26 and later and cannot be used on its own for recursive parsers. The
macro thus threads the needle and makes it possible to have non-allocating, anonymous parsers
on stable Rust.
    #[macro_use]
    extern crate combine;
    use combine::parser::char::{char, letter, spaces};
    use combine::{between, choice, many1, parser, sep_by, Parser};
    use combine::error::{ParseError, ParseResult};
    use combine::stream::{Stream, Positioned};
    use combine::stream::state::State;

    #[derive(Debug, PartialEq)]
    pub enum Expr {
        Id(String),
        Array(Vec<Expr>),
        Pair(Box<Expr>, Box<Expr>)
    }

    // `impl Parser` can be used to create reusable parsers with zero overhead
    fn expr_<I>() -> impl Parser<Input = I, Output = Expr>
        where I: Stream<Item = char>,
              // Necessary due to rust-lang/rust#24159
              I::Error: ParseError<I::Item, I::Range, I::Position>,
    {
        let word = many1(letter());

        // A parser which skips past whitespace.
        // Since we aren't interested in knowing that our expression parser
        // could have accepted additional whitespace between the tokens we also silence the error.
        let skip_spaces = || spaces().silent();

        //Creates a parser which parses a char and skips any trailing whitespace
        let lex_char = |c| char(c).skip(skip_spaces());

        let comma_list = sep_by(expr(), lex_char(','));
        let array = between(lex_char('['), lex_char(']'), comma_list);

        //We can use tuples to run several parsers in sequence
        //The resulting type is a tuple containing each parsers output
        let pair = (lex_char('('),
                    expr(),
                    lex_char(','),
                    expr(),
                    lex_char(')'))
            .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));

        choice((
            word.map(Expr::Id),
            array.map(Expr::Array),
            pair,
        ))
            .skip(skip_spaces())
    }

    // As this expression parser needs to be able to call itself recursively `impl Parser` can't
    // be used on its own as that would cause an infinitely large type. We can avoid this by using
    // the `parser!` macro which erases the inner type and the size of that type entirely which
    // lets it be used recursively.
    //
    // (This macro does not use `impl Trait` which means it can be used in rust < 1.26 as well to
    // emulate `impl Parser`)
    parser!{
        fn expr[I]()(I) -> Expr
        where [I: Stream<Item = char>]
        {
            expr_()
        }
    }

    fn main() {
        let result = expr()
            .parse("[[], (hello, world), [rust]]");
        let expr = Expr::Array(vec![
              Expr::Array(Vec::new())
            , Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                         Box::new(Expr::Id("world".to_string())))
            , Expr::Array(vec![Expr::Id("rust".to_string())])
        ]);
        assert_eq!(result, Ok((expr, "")));
    }
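For comparison, here is the same grammar written as a hand-rolled recursive descent parser in plain Rust, without combine. It is only a sketch of why recursion is unproblematic once a parser has a fixed, nameable type: the function `expr` can call itself because a function's type does not grow with its body, whereas nested combinator types would have to contain themselves. The helper `lex_char` mirrors the closure of the same name in the example, but its signature here is an assumption made for this sketch, and the local `Expr` enum is a copy of the one above.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>),
}

/// Consumes `c` and any trailing whitespace; returns the remaining input.
fn lex_char(input: &str, c: char) -> Option<&str> {
    input.strip_prefix(c).map(|rest| rest.trim_start())
}

/// Parses one expression and returns it together with the remaining input.
fn expr(input: &str) -> Option<(Expr, &str)> {
    if let Some(mut rest) = lex_char(input, '[') {
        // Array: zero or more comma separated expressions, closed by ']'.
        let mut items = Vec::new();
        if let Some(after) = lex_char(rest, ']') {
            return Some((Expr::Array(items), after));
        }
        loop {
            let (item, after) = expr(rest)?; // recursive call
            items.push(item);
            if let Some(after) = lex_char(after, ',') {
                rest = after;
            } else {
                return lex_char(after, ']').map(|after| (Expr::Array(items), after));
            }
        }
    }
    if let Some(rest) = lex_char(input, '(') {
        // Pair: '(' expr ',' expr ')'
        let (left, rest) = expr(rest)?;
        let rest = lex_char(rest, ',')?;
        let (right, rest) = expr(rest)?;
        let rest = lex_char(rest, ')')?;
        return Some((Expr::Pair(Box::new(left), Box::new(right)), rest));
    }
    // Identifier: one or more letters, followed by optional whitespace.
    let end = input.find(|c: char| !c.is_alphabetic()).unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((Expr::Id(input[..end].to_string()), input[end..].trim_start()))
}

fn main() {
    let expected = Expr::Array(vec![
        Expr::Array(Vec::new()),
        Expr::Pair(
            Box::new(Expr::Id("hello".to_string())),
            Box::new(Expr::Id("world".to_string())),
        ),
        Expr::Array(vec![Expr::Id("rust".to_string())]),
    ]);
    assert_eq!(expr("[[], (hello, world), [rust]]"), Some((expected, "")));
}
```

Unlike the combine version, this sketch reports failure only as `None`, with no position or expected tokens.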
Re-exports
pub extern crate byteorder; |
pub extern crate either; |
Modules
easy | Stream wrapper which provides an informative and easy to use error type. |
error | Error types and traits which define what kind of errors combine parsers may emit |
parser | A collection of both concrete parsers as well as parser combinators. |
stream | Traits and implementations of arbitrary data streams. |
Macros
choice | Takes a number of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser consumes input before failing. |
opaque | Convenience macro over |
parser | Declares a named parser which can easily be reused. |
struct_parser | Sequences multiple parsers and builds a struct out of them. |
Traits
ParseError | Trait which defines a combine parse error. |
Parser | By implementing the |
Positioned | A type which has a position. |
RangeStream | A |
RangeStreamOnce | A |
Stream | A stream of tokens which can be duplicated |
StreamOnce | |
Functions
any | Parses any token. |
attempt | |
between | Parses |
chainl1 | Parses |
chainr1 | Parses |
choice | Takes a tuple, a slice or an array of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser consumes input before failing. |
count | Parses |
count_min_max | Parses |
env_parser | Constructs a parser out of an environment and a function which needs the given environment to do the parsing. This is commonly useful to allow multiple parsers to share some environment while still allowing the parsers to be written in separate functions. |
eof | Succeeds only if the stream is at end of input, fails otherwise. |
from_str | Takes a parser that outputs a string like value ( |
look_ahead | |
many | Parses |
many1 | Parses |
none_of | Extract one token and succeeds if it is not part of |
not_followed_by | Succeeds only if |
one_of | Extract one token and succeeds if it is part of |
optional | Parses |
parser | Wraps a function, turning it into a parser. |
position | Parser which just returns the current position in the stream. |
satisfy | Parses a token and succeeds depending on the result of |
satisfy_map | Parses a token and passes it to |
sep_by | Parses |
sep_by1 | Parses |
sep_end_by | Parses |
sep_end_by1 | Parses |
skip_count | Parses |
skip_count_min_max | Parses |
skip_many | Parses |
skip_many1 | Parses |
token | Parses a character and succeeds if the character is equal to |
tokens | Parses multiple tokens. |
tokens2 | Parses multiple tokens. |
try | Deprecated |
unexpected | Always fails with |
unexpected_any | Always fails with |
value | Always returns the value |
Type Definitions
ConsumedResult | A |
ParseResult | A type alias over the specific |