Posted on :: Tags: , :: Source Code

Okay, so by now, I've gotten an implementation of the fixed-point parser that we are researching that I feel good about. The implementation, found here, includes a Parser_base module that provides the very basics of the parser framework, a Combinators module that adds on a lot of useful combinators, and a Strings module that helps when dealing with strings.

In this post, I want to go over the essentials of the fixed-point parser framework. This includes one parser combinator (choice) and two other essential functions: memo for memoizing parsers (this solves left recursion problems); no_results for indicating that a parser returns no results; and run for running a parser. Together, these ensure that the user of the library don't need to know anything about the state.

I will also explain two other combinators that make this base more user-friendly, sequence and unit, and one function that allows for better debugging, run_raw.

NOTE: though the types have changed slightly since I wrote about them, [my post about fixed-point parser types] will give you a good idea about the types that are being used here. If you want to understand the exact implementation, reference the code itself for now

The choice combinator

let choice (parser1 : 'a parser) (parser2 : 'a parser) : 'a parser =
  fun (idx, callback) ->
    fun state ->
      state
      |> parser1 (idx, callback)
      |> parser2 (idx, callback)

The choice combinator takes two parsers, parser1 and parser2, and it returns a parser that tries both parsers. Note that this is unordered choice. In other words, the parser will try parser2 even if parser1 succeeds.

This combinator works by simply returning a state_transformer that runs the initial state through the state_transformer produced by running parser1, then running the resulting state through the state_transformer produced by running parser2.

The memo function

let memo ~(tag : tag) (parser : cached_value parser) : cached_value parser =
   fun (idx, callback) ->
    let loc : parser_position = (tag, idx) in
    fun state ->
      let (State state) = state in
      match Hashtbl.find state loc with
      | Some (parser_outputs, callbacks) ->
          let new_callbacks = callback :: callbacks in
          Hashtbl.set state ~key:loc ~data:(parser_outputs, new_callbacks) ;
          Set.fold parser_outputs ~init:(State state)
            ~f:(fun state parser_output ->
              let state_transformer = callback parser_output in
              state_transformer state )
      | None ->
          let new_callback : cached_value parser_callback =
           fun parser_output ->
            fun state ->
             let (State state) = state in
             match Hashtbl.find state loc with
             | Some (parser_outputs, callbacks) ->
                 if Set.mem parser_outputs parser_output then State state
                 else
                   let new_parser_outputs =
                     Set.add parser_outputs parser_output
                   in
                   Hashtbl.set state ~key:loc
                     ~data:(new_parser_outputs, callbacks) ;
                   List.fold_left callbacks ~init:(State state)
                     ~f:(fun state callback ->
                       let state_transformer = callback parser_output in
                       state_transformer state )
             | None ->
                 failwith "unreachable"
          in
          Hashtbl.set state ~key:loc
            ~data:(CachedParserOutputSet.empty, [callback]) ;
          parser (idx, new_callback) (State state)