<p align="center">
  <img src="https://raw.github.com/pest-parser/pest/master/pest-logo.svg?sanitize=true" width="80%"/>
</p>

# pest. The Elegant Parser

[Gitter](https://gitter.im/pest-parser/pest?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[Book](https://pest.rs/book)
[Docs](https://docs.rs/pest)

[CI](https://github.com/pest-parser/pest/actions/workflows/ci.yml)
[codecov](https://codecov.io/gh/pest-parser/pest)
<a href="https://blog.rust-lang.org/2021/11/01/Rust-1.56.1.html"><img alt="Rustc Version 1.56.1+" src="https://img.shields.io/badge/rustc-1.56.1%2B-lightgrey.svg"/></a>

[crates.io](https://crates.io/crates/pest)

pest is a general-purpose parser written in Rust with a focus on accessibility,
correctness, and performance. It uses parsing expression grammars
(or [PEG]) as input, which are similar in spirit to regular expressions but
offer the enhanced expressivity needed to parse complex languages.

[PEG]: https://en.wikipedia.org/wiki/Parsing_expression_grammar

## Getting started

The recommended way to start parsing with pest is to read the official [book].

Other helpful resources:

* API reference on [docs.rs]
* play with grammars and share them on our [fiddle]
* find answers to common questions, or ask your own, on [GitHub Discussions]
* leave feedback, ask questions, or greet us on [Gitter] or [Discord]

[book]: https://pest.rs/book
[docs.rs]: https://docs.rs/pest
[fiddle]: https://pest.rs/#editor
[Gitter]: https://gitter.im/pest-parser/pest
[Discord]: https://discord.gg/XEGACtWpT2
[GitHub Discussions]: https://github.com/pest-parser/pest/discussions

## Example

The following is an example of a grammar for a list of alphanumeric identifiers,
none of which may start with a digit:

```rust
alpha = { 'a'..'z' | 'A'..'Z' }
digit = { '0'..'9' }

ident = { !digit ~ (alpha | digit)+ }

ident_list = _{ ident ~ (" " ~ ident)* }
          // ^
          // ident_list rule is silent which means it produces no tokens
```

Grammars are saved in separate `.pest` files, which are never mixed with procedural
code. This results in an always up-to-date formalization of a language that is
easy to read and maintain.

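
To make the semantics of the example grammar concrete, here is a hand-written, std-only sketch of a recognizer for the same four rules. It is not how pest implements or generates parsers. The function names simply mirror the rule names, and returning the matched identifiers as a `Vec<&str>` is an assumption made for illustration. It covers matching only, with no error reporting or end-of-input check.

```rust
/// alpha = { 'a'..'z' | 'A'..'Z' }
/// Returns the position after the match, or `None` if the rule fails.
fn alpha(input: &[u8], pos: usize) -> Option<usize> {
    match input.get(pos) {
        Some(b) if b.is_ascii_alphabetic() => Some(pos + 1),
        _ => None,
    }
}

/// digit = { '0'..'9' }
fn digit(input: &[u8], pos: usize) -> Option<usize> {
    match input.get(pos) {
        Some(b) if b.is_ascii_digit() => Some(pos + 1),
        _ => None,
    }
}

/// ident = { !digit ~ (alpha | digit)+ }
fn ident(input: &[u8], pos: usize) -> Option<usize> {
    // `!digit` is a negative lookahead: it fails when a digit comes next
    // and never consumes input.
    if digit(input, pos).is_some() {
        return None;
    }
    // `(alpha | digit)+` is an ordered choice repeated one or more times.
    let mut end = alpha(input, pos).or_else(|| digit(input, pos))?;
    while let Some(next) = alpha(input, end).or_else(|| digit(input, end)) {
        end = next;
    }
    Some(end)
}

/// ident_list = _{ ident ~ (" " ~ ident)* }
/// Returns the text of each matched identifier (hypothetical return shape).
fn ident_list(input: &str) -> Option<Vec<&str>> {
    let bytes = input.as_bytes();
    let mut end = ident(bytes, 0)?;
    let mut idents = vec![&input[..end]];
    loop {
        // `" " ~ ident` must match as a unit; otherwise the repetition stops.
        if bytes.get(end) != Some(&b' ') {
            break;
        }
        match ident(bytes, end + 1) {
            Some(next) => {
                idents.push(&input[end + 1..next]);
                end = next;
            }
            None => break,
        }
    }
    Some(idents)
}
```

With these definitions, `ident_list("a1 b2")` yields the two identifiers `a1` and `b2`, while `ident_list("123")` yields `None` because the lookahead rejects a leading digit.
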
## Meaningful error reporting

Based on the grammar definition, the parser also includes automatic error
reporting. For the example above, the input `"123"` will result in:

```
thread 'main' panicked at ' --> 1:1
  |
1 | 123
  | ^---
  |
  = unexpected digit', src/main.rs:12
```
while `"ab *"` will result in:
```
thread 'main' panicked at ' --> 1:4
  |
1 | ab *
  |    ^---
  |
  = expected ident', src/main.rs:12
```

These error messages can be obtained from their default `Display` implementation,
e.g. `panic!("{}", parser_result.unwrap_err())` or `println!("{}", e)`.

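
As a rough illustration of how a `Display` implementation can produce a message with this layout, the std-only sketch below defines a stand-in error type. `SketchError` and `error_at` are hypothetical names invented here; this is not pest's own error type or its formatting code. It only assumes the six-line layout shown above.

```rust
use std::fmt;

/// Hypothetical stand-in for a parse error (not pest's error type).
struct SketchError {
    line: usize,       // 1-based line number
    col: usize,        // 1-based column number
    line_text: String, // the offending line of input
    message: String,   // e.g. "unexpected digit"
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Blank gutter as wide as the printed line number.
        let gutter = " ".repeat(self.line.to_string().len());
        // Spaces that push the caret under the offending column.
        let pad = " ".repeat(self.col.saturating_sub(1));
        writeln!(f, "{}--> {}:{}", gutter, self.line, self.col)?;
        writeln!(f, "{} |", gutter)?;
        writeln!(f, "{} | {}", self.line, self.line_text)?;
        writeln!(f, "{} | {}^---", gutter, pad)?;
        writeln!(f, "{} |", gutter)?;
        write!(f, "{} = {}", gutter, self.message)
    }
}

/// Builds an error for a byte `offset` into `input`.
/// `offset` must lie on a character boundary within `input`.
fn error_at(input: &str, offset: usize, message: &str) -> SketchError {
    let before = &input[..offset];
    let line = before.matches('\n').count() + 1;
    // Byte index where the offending line starts.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = input[line_start..offset].chars().count() + 1;
    let line_text = input[line_start..].lines().next().unwrap_or("").to_string();
    SketchError {
        line,
        col,
        line_text,
        message: message.to_string(),
    }
}
```

Because the type implements `Display`, a caller can print it with `println!("{}", error_at("123", 0, "unexpected digit"))` or turn it into a `String` with `to_string()` instead of panicking.
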
## Pairs API

The grammar can be used to derive a `Parser` implementation automatically.
Parsing returns an iterator of nested token pairs:

```rust
extern crate pest;
#[macro_use]
extern crate pest_derive;

use pest::Parser;

#[derive(Parser)]
#[grammar = "ident.pest"]
struct IdentParser;

fn main() {
    let pairs = IdentParser::parse(Rule::ident_list, "a1 b2").unwrap_or_else(|e| panic!("{}", e));

    // Because ident_list is silent, the iterator will contain idents
    for pair in pairs {
        // A pair is a combination of the rule which matched and a span of input
        println!("Rule:    {:?}", pair.as_rule());
        println!("Span:    {:?}", pair.as_span());
        println!("Text:    {}", pair.as_str());

        // A pair can be converted to an iterator of the tokens which make it up:
        for inner_pair in pair.into_inner() {
            match inner_pair.as_rule() {
                Rule::alpha => println!("Letter:  {}", inner_pair.as_str()),
                Rule::digit => println!("Digit:   {}", inner_pair.as_str()),
                _ => unreachable!(),
            };
        }
    }
}
```

This produces the following output:
```
Rule:    ident
Span:    Span { start: 0, end: 2 }
Text:    a1
Letter:  a
Digit:   1
Rule:    ident
Span:    Span { start: 3, end: 5 }
Text:    b2
Letter:  b
Digit:   2
```

### Defining multiple parsers in a single file

The automatic `Parser` derivation currently produces a `Rule` enum for each
struct that derives `Parser`, so defining several such structs in the same
module leads to a name conflict. One way around this is to put each
parser struct in a separate namespace:

```rust
mod a {
    #[derive(Parser)]
    #[grammar = "a.pest"]
    pub struct ParserA;
}
mod b {
    #[derive(Parser)]
    #[grammar = "b.pest"]
    pub struct ParserB;
}
```

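
As a rough illustration of why separate modules avoid the clash, the std-only sketch below hand-writes two `Rule` enums in place of the ones the derive would generate. The variant names and the `describe_a` / `describe_b` helpers are hypothetical and exist only for this example.

```rust
// Hand-written stand-ins for the enums `#[derive(Parser)]` would generate.
// The variants are hypothetical; real ones come from the rules in each grammar.
mod a {
    #[allow(non_camel_case_types, dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        ident,
        ident_list,
    }
}

mod b {
    #[allow(non_camel_case_types, dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        number,
    }
}

// Both enums are called `Rule`, so they are imported under distinct aliases.
use a::Rule as RuleA;
use b::Rule as RuleB;

/// Labels a rule from grammar `a` with its module path.
fn describe_a(rule: RuleA) -> String {
    format!("a::Rule::{:?}", rule)
}

/// Labels a rule from grammar `b` with its module path.
fn describe_b(rule: RuleB) -> String {
    format!("b::Rule::{:?}", rule)
}
```

Code that works with both parsers can then refer to `a::Rule` and `b::Rule` by path, or alias them as done here, without either enum shadowing the other.
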
## Other features

* Precedence climbing
* Input handling
* Custom errors
* Runs on stable Rust

## Projects using pest

* [pest_meta](https://github.com/pest-parser/pest/blob/master/meta/src/grammar.pest) (bootstrapped)
* [AshPaper](https://github.com/shnewto/ashpaper)
* [brain](https://github.com/brain-lang/brain)
* [cicada](https://github.com/mitnk/cicada)
* [comrak](https://github.com/kivikakk/comrak)
* [elastic-rs](https://github.com/cch123/elastic-rs)
* [graphql-parser](https://github.com/Keats/graphql-parser)
* [handlebars-rust](https://github.com/sunng87/handlebars-rust)
* [hexdino](https://github.com/Luz/hexdino)
* [Huia](https://gitlab.com/jimsy/huia/)
* [insta](https://github.com/mitsuhiko/insta)
* [jql](https://github.com/yamafaktory/jql)
* [json5-rs](https://github.com/callum-oakley/json5-rs)
* [mt940](https://github.com/svenstaro/mt940-rs)
* [Myoxine](https://github.com/d3bate/myoxine)
* [py_literal](https://github.com/jturner314/py_literal)
* [rouler](https://github.com/jarcane/rouler)
* [RuSh](https://github.com/lwandrebeck/RuSh)
* [rs_pbrt](https://github.com/wahn/rs_pbrt)
* [stache](https://github.com/dgraham/stache)
* [tera](https://github.com/Keats/tera)
* [ui_gen](https://github.com/emoon/ui_gen)
* [ukhasnet-parser](https://github.com/adamgreig/ukhasnet-parser)
* [ZoKrates](https://github.com/ZoKrates/ZoKrates)
* [Vector](https://github.com/timberio/vector)
* [AutoCorrect](https://github.com/huacnlee/autocorrect)
* [yaml-peg](https://github.com/aofdev/yaml-peg)
* [qubit](https://github.com/abhimanyu003/qubit)
* [caith](https://github.com/Geobert/caith) (a dice roller crate)
* [Melody](https://github.com/yoav-lavi/melody)

## Minimum Supported Rust Version (MSRV)

This library should always compile with default features on **Rust 1.56.1**,
or on **Rust 1.61** with the `const_prec_climber` feature enabled.

## Special thanks

A special round of applause goes to Prof. Marius Minea for his guidance and to all
pest contributors, some of whom are none other than my friends.