神刀安全网

Parser Combinators in Swift

About the Speaker: Yasuhiro Inami

Yasuhiro is an iOS developer at LINE Corporation. While creating iPhone apps such as messenger, camera, news app in his work, he also spends time on making open source projects like ReactKit and SwiftTask. He is a big fan of Apple, Swift, and Hearthstone. You can find him at Battle.net or GitHub .

@inamiy

My name is Yasihiro Inami; I work at the LINE Corporation, a messenger app company. It is an honor to be a speaker at the try! Swift conference, with great developers from all over the world. In this talk I present one of the most fantastic techniques in functional programming: parser combinator.

Parser is an analyzer that takes a string as an input and creates a syntax tree. It consists of two parts: the lexer (takes a long string and converts it into small chunks, called tokens), and the syntactic analyzer (takes the tokens and creates a Syntax Tree).

When we do parsing, there are mainly two approaches: bottom-up and top-down parsing.

For example, there is a string in the bottom A = B + C * 2 ; D = 1 . How do we parse this, and in what order?

In bottom-up parsing, we take a closer look at the bottom tree first. We keep reading the text, we can expand the tree and, lastly, we reach the top. The parsing starts from the bottom to the top.

In top-down parsing, you read the partial text and, at some point (a bit earlier than bottom-up parsing), the top-down parser starts to predict what tree should come at the top. And it keeps guessing down to the bottom.

The parser combinator is a recourse in this parser, and it is a top-down parser. If you have ever read the Apple Swift Code before, the code is in leaf parse, and it also uses the same technique in C++. This is known as DSL imperative way.

This is called Combinator logic:

I = λx.x = { x in x } K = λxy.x = { x, y in x } S = λxyz.xz(yz)   = { x, y, z in x(z)(y(z))} Y = S(K(SII))(S(S(KS)K)(K(SII))) ι = λx.xSK

For now, think of it as a function ( closures , in Swift). They can be combined together to form more interesting characteristics.

Parser combinator is a higher-order function that combines multiple parsers to form one big parser. Here ( see slide ) there are two parsers, the inside parsers, collaborating with each other.

Simple Parser in Swift

How do we implement this in our first language, Swift?

struct Parser<A> {     let parse: String -> (A, String)? }

We prepare a simple struct called Parser (this is a generic Parser<A> ). It is a container of the closure ( let parse ). It takes String as input, and, in this case, the output is the tuple. The tuple consists of “output” and “remaining input”. Because the parsing sometimes fails, I use the Optional.

This structure is called state monad, and it may combine the monad or optional monad together. From here is monad’s showtime. It is not difficult; relax.

Applicative pure, alternative empty

// Applicative pure func pure<A>(a: A) -> Parser<A> {     return Parser { (a, $0) } }  // Alternative empty func empty<A>() -> Parser<A> {     return Parser { _ in nil } }

The pure is instantiating the parser. It parses the closure later. Inside that closure, you regain the tuple, which means this parser is always successful returning the variable <A> , and it does not consume the input. For the empty , the nil is returned. That means this parser always fails.

Monad bind, functor fmap

// Monad bind (>>=) func >>- <A, B>(p: Parser<A>, f: A -> Parser<B>) -> Parser<B> {     return Parser { input in         if let (result, input2) = p.parse(input) {             return f(result).parse(input2)         } else {             return nil         }     } }  // Functor fmap (<$>) func <^> <A, B>(f: A -> B, p: Parser<A>) -> Parser<B> {     return p >>- { a in pure(f(a)) } }

First, you receive the input, you see the closure, and the input is passed as an argument. You parse that with the given parser p . If that parsing succeeds: you get a tuple of it, you extract the result part, and apply f . And that f is the result of the second parser. You use that parser for the main input, in this case, input2 . Otherwise, if the first parser fails in the first place, you would just return nil . That means the whole parser fails (because the first parsing fails).

Once you have this fmap operator, you can immediately get the functor fmap . In Swift, we do not use $ , instead we use a ^ sign (we cannot use $ for the custom operator).

Alternative choice operator

// Alternative choice (associative operation) func <|> <A>(p: Parser<A>, q: Parser<A>) -> Parser<A> {     return Parser { input in         if let (result, input2) = p.parse(input) {             return (result, input2)         } else {             return q.parse(input)         }     } }

You have a parser p and q , you try parser p first. If that parsing succeeds, then finish, and you return that tuple. If it fails, you try q instead.

Applicative sequential application

I would like to mention three important sequential operations:

// Applicative sequential application func <*> <A, B>(p: Parser<A -> B>, q: Parser<A>) -> Parser<B> {     return p >>- { f in f <^> q } }  // Sequence actions, discarding the value of the second argument func <* <A, B>(p: Parser<A>, q: Parser<B>) -> Parser<A> {     return const <^> p <*> q }  // Sequence actions, discarding the value of the first argument func *> <A, B>(p: Parser<A>, q: Parser<B>) -> Parser<B> {     return const(id) <^> p <*> q }

  • The first one is a function application for monadic structure.

  • The second one is a sequence action ( it looks like an arrow pointing to the left ). It tries parser p then q ; it uses both. But only uses the left side of the output of p ; it discards the output of q .

  • The last of our operators is the opposite. We try parser p then q . It is the same as before, but it only uses the output of q (not p ).

func satisfy(predicate: Character -> Bool) -> Parser<Character> {     return Parser { input in         if let (head, tail) = uncons(input) where predicate(head) {             return (tail, head)         } else {             return nil         }     } }

Here is the satisfy parser I implemented (it is fundamental). It examines the input. You decompose it using the uncons function, it is the head and tail . Examining the head part with a predicate: if that predicate succeeds, you get the decomposed tuple; otherwise, the poor parser fails.

You can also create any() parser (can parse any character), or digit() (can only parse the numeric number, and numeric character, or some specific character), and char(c: Character) parser. It is readable, simple:

func any() -> Parser<Character> {     return satisfy(const(true)) }  func digit() -> Parser<Character> {     return satisfy { "0"..."9" ~= $0 } }  func char(c: Character) -> Parser<Character> {     return satisfy { $0 == c } }

In short, this is parsing the specific string. Otherwise it fails.

func string(str: String) -> Parser<String> {     if let (head, tail) = uncons(str) {         return char(head) *> string(tail) *> pure(str)     } else {         return pure("")     } }

Here are some important parsers called many and many1 . The parser p is given, and they use that parser p many times. The first many parser uses the parser p from zero occurrence to many occurrences, and it never fails. The many1 parser must be able to parse at least one occurrence of p , otherwise it fails. They are mutually recursive.

func many<A>(p: Parser<A>) -> Parser<[A]> {     return many1(p) <|> pure([]) }  func many1<A>(p: Parser<A>) -> Parser<[A]> {     return cons <^> p <*> many(p) }

Here are skipping parsers: skipMany and skipMany1 . It looks similarly to the previous many and many1 , but notice that the result part is now an empty void, an empty tuple (meaningless output).

It seems similar to using this parser for the grounds, but it is more efficient, it has better performance than many and many1 . And we can implement important particles, skipSpaces . As you can see, the implementation skipSpaces() is common when we want to parse things. We want to skip spaces, but we do not care whether it is top or line with feet.

func skipMany<A>(p: Parser<A>) -> Parser<()> {     return skipMany1(p) <|> pure(()) }  func skipMany1<A>(p: Parser<A>) -> Parser<()> {     return p *> skipMany(p) }  func skipSpaces() -> Parser<()> {     return skipMany(space) }

Let’s add symbol(str: String) and natural() parsers. You might notice that there is a middle parser in the middle that is sandwiched :bread: by skipSpaces() parser. With applicative, I do not like operators pointing this direction. It first skips the head white space, then uprising middle parser, and lastly, skips the tail space. But it only uses the output of the middle parser. It does not care about the spaces, it only cares about the string ( like a hamburger! :hamburger: ). For the simple parser, the relevant ( delicious ) part ( meat ) is a string parser. For natural parser, the relevant part is many1(digit())) (again, like a hamburger! ), which converts the result party from string to integer. Only three lines, sometimes one line of code.

func symbol(str: String) -> Parser<String> {     return skipSpaces() *> string(str) <* skipSpaces() } func natural() -> Parser<Int> {     return skipSpaces() *>     ({ Int(String($0))! } <^> many1(digit()))     <* skipSpaces() }

Let’s use the + and * signs, parentheses, and natural numbers to make a simple calculator.

The mathematical expression below is called Backus–Naur Form (BNF):

<expr> ::= <expr> + <term> | <term> <term> ::= <factor> * <term> | <factor> <factor> ::= ( <expr> ) | <number> <number> ::= '0' | '1' | '2' | ...

It is common when you do parsing. In Swift, our first language, we implement the following:

func expr() -> Parser<Int> {     return term() >>- { t in // uses right recursion         (symbol("+") *> expr() >>- { e in pure(t + e) }) <|> pure(t)     } }  func term() -> Parser<Int> {     return factor() >>- { f in         (symbol("*") *> term() >>- { t in pure(f * t) }) <|> pure(f)     } }  func factor() -> Parser<Int> {     return (symbol("(") *> expr() <* symbol(")")) <|> natural() }

I used the right recursion, but you can write it in almost one line. Less than three or maybe some more lines. A bit weird, but it is still simple. Here goes the calculation:

let (ans, _) = expr().parse(" ( 12 + 3 )    * 4+5")! expect(ans) == 65

12 + 3, save on space plus 4+5 is expected to be equal to 65… and it works! :muscle:

For those who want more on parsing, I developed TryParsec . It is a monadic parser combinator for this try! Swift conference (nowhere else!). It is inspired by Haskell, and especially the Attoparsec and Aeson libraries. It supports CSV, XML, and JSON. Technically, it can parse LL(*) , with backtracking by default. This library is not using the do track catch in Swift. This Tryparsec… does not actually try anything, it does not use any try keywords, but please try it ( because this is the try! Swift conference ). This library is not open source yet (I will, right after this conference).

Here are some functions of TryParsec:

  • Basic Operators: >>- , <^> , <*> , *> , <* , <|> , <?>

  • Combinators: many , many1 , manyTill , zeroOrOne , skipMany , skipMany1 , sepBy , sepBy1 , sepEndBy , sepEndBy1 , count , chainl , chainl1 , chainr , chainr1

  • Text ( UnicodeScalarView ): peek , endOfInput , satisfy , skip , skipWhile , take , takeWhile , any , char , not , string , asciiCI , oneOf , noneOf , space , skipSpaces , number … (etc)

Let’s see how this TryParsec works with JSON.

enum JSON

enum JSON {     case String(Swift.String)     case Number(Double)     case Bool(Swift.Bool)     case Null     case Array([JSON])     case Object([Swift.String : JSON]) }

In TryParsec, the enum JSON is already built in (common, basic type). Let’s say we have a JSON String, how do we parse? All we need to do is call parseJSON , and you immediately get the JSON enum tree. Simple, easy.

{     "string": "hello",     "num": 123.45,     "bool": true,     "null": null,     "array": ["hello", 9.99, false],     "dict": { "key": "value" },     "object": { "enabled": true } }

let json = parseJSON(jsonString).value print(json)

In this process, I do not use any NSJSONSerialization , or any object that is not involved in this process. It is all written in pure Swift. But we are not interested in this structure. This is an intermediate representation. We want our custom model to be mapped to it. We need JSON decoding and encoding features ( which we all love :heart: ), and Swift has about 20-30 libraries for doing so. Fortunately, TryParsec, is the 31st JSON decoding and encoding library. :100:

struct Model

We saw the JSON string example, and we have model mapped this one. In the bottom I added a let dummy . The JSON string does not have this field, but I added it.

struct Model {     let string: String     let num: Double     let bool: Bool     let null: Any?     let array: [Any]     let dict: [String : Any]     let subModel: SubModel     let dummy: Bool?  // doesn't exist in raw JSON }

FromJSON protocol

You have to implement the conformance to the FromJSON protocol. You implement the static func fromJSON , and you use the helper message code fromJSONObject . This is common writing code applicative style. If you have ever tried the JSON library called Argo, TryParsec uses the same technique. Call function decode, and you get the mapped result.

extension Model: FromJSON {     static func fromJSON(json: JSON) -> Result<Model, JSON.ParseError> {         return fromJSONObject(json) {             curry(self.init)                 <^> $0 !! "string"                 <*> $0 !! "num"                 <*> $0 !! "bool"                 <*> $0 !! "null"                 <*> $0 !! "array"                 <*> $0 !! "dict"                 <*> $0 !! "object" // mapping to SubModel                 <*> $0 !? "dummy"  // doesn't exist in raw JSON         }     } }

… simple.

ToJSON protocol

For JSON encoding, conform to the ToJSON protocol, use the function toJSONObject , and pass the array of the key-value pairs.

extension Model: ToJSON {     static func toJSON(model: Model) -> JSON {         return toJSONObject([             "string" ~ model.string,             "num" ~ model.num,             "bool" ~ model.bool,             "null" ~ model.null,             "array" ~ model.array,             "dict" ~ model.dict,             "subModel" ~ model.subModel         ])     } }

Encode to JSON string

Call the function encode , and you immediately get back the jsonString .

let jsonString = encode(model) print(jsonString)

TryParsec supports JSON, CSV and XML. It is simple, readable, and easy to create your own parsers ( please try it ). I also have the Xcode playground ( you can also play around with it ).

But let me point out some caveats:

  1. It does not have good performance yet (compared to NSJSONSerialization ).
  2. FromJSON / ToJSON protocols (the mapping) are limited. Any library (Argo and other great libraries) has limitations… because Swift is currently limited. FromJSON / ToJSON does not work in some nested structures.

Hopefully, Swift 3 will solve these problems. Swift is now open source, it is evolving every day, and we have a great community.

Please check out my library, and help me improve the code base.

Q: Cool stuff. I have two questions, you can pick which one you want to answer. The first one is, is there any support for error? If the parsing fails can you annotate your parser combinators to say, “if this part fails, maybe I am going to throw this error here”. The second question is, how did you do the ToJSON , do you have a printer combinator library?

Yasuhiro: I will answer the second question first. There is no printed JSON yet. I think I need to implement that. Now it is just one line JSON string. And the second one, is an error porting is important. But I did not implement it, because the performance is already bad at this point. I need to improve that first, and then hit our importing. Sorry about that.

See the discussion on Hacker News .

Get new videos & tutorials — we won’t email you for any other reason, ever.

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Parser Combinators in Swift

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址