Parser combinators
-
Upload
lifecoder -
Category
Technology
-
view
3.418 -
download
2
description
Transcript of Parser combinators
Parser CombinatorsWriting external DSLs
Wednesday, November 9, 2011
DSLs
• DSL is an ultimate abstraction
• You create a language that precisely describes the problem you are trying to solve.
Wednesday, November 9, 2011
Internal & External
• DSL can be internal - made using the host programming language.
• Or external - you invent a little language for solving your problem.
Wednesday, November 9, 2011
Internal DSL: Good
• No external tool dependencies
• Nice IDE support
• Type checking for free
• Easy to implement
• Gives full power of your language
• Parsing is free
Wednesday, November 9, 2011
Internal DSL: Bad
• It requires expressive language
• If the language is not expressive enough DSL will be verbose and not pleasant to use
• It can produce obscure error messages
Wednesday, November 9, 2011
External DSL: Good
• Platform independent
• Nicer syntax
• External tool can perform static analysis
• Good error messages
Wednesday, November 9, 2011
External DSLs: Bad
• No IDE support
• Hard to implement
Wednesday, November 9, 2011
External DSLs
• External DSLs can be implemented using interpretation or code generation.
• Parse
• Analyze
• Interpret | Generate Code
Wednesday, November 9, 2011
parseJSON :: String -> Either Error JSONObject
Wednesday, November 9, 2011
JSON Grammar-like stuff
object := '{' fields? '}'fields := field (',' field)*field := id ':' valuevalue := literal | object | array
Wednesday, November 9, 2011
Bison (C)ANTLR (Java and many others)
You can encode grammar by hand
Wednesday, November 9, 2011
function parseObject() { match('{') if (lookahead(ID)) parseFields() match('}')}
Wednesday, November 9, 2011
function parseFields() { parseField() while (lookahead(',')) { consume() parseField() }}
Wednesday, November 9, 2011
• Parser can either consume some input or not.
• Parser can fail or succeed.
• Parser can return some value.
Wednesday, November 9, 2011
• Usually sequences are just sequential calls.
• Optional parts of grammar are just IFs
• Zero or more elements is WHILE
• Grammar rule is a function
Wednesday, November 9, 2011
function parseFields() { separatedBy(',', parseField)}
function separatedBy(ch, f) { f() while (lookahead(ch)) { consume() f() }}
Wednesday, November 9, 2011
function sepBy(ch, p) { return seq(p, many(seq(ch, p)))}
Wednesday, November 9, 2011
Combinator pattern
Wednesday, November 9, 2011
Haskell in 60 seconds
Wednesday, November 9, 2011
main = putStrLn "Hello World!"
times x n = x * ntimes = \x n -> x * ntimes = \x -> \n -> x * ntimes = (*)times 2 32 `times` 3data Point = Point Int Intdata Either a b = Left a | Right bPoint 10 20Left “Hello”
Wednesday, November 9, 2011
import Text.Parsec
parse :: Parser -> SourceName -> String -> Either ParseError a
Wednesday, November 9, 2011
p = satisfy (\x -> x == 'a')main = print (parse p "<test>" "abc")
> Right 'a'
p = satisfy (\x -> x == 'a')main = print (parse p "<test>" "bcd")
> Left "<test>" (line 1, column 1):> unexpected "b"
Wednesday, November 9, 2011
-- There is something like this in Parsecchar c = satisfy (\x -> x == c)
-- Examplep = char 'a'main = print (parse p "<test>" "bcd")
> Left "<test>" (line 1, column 1):> unexpected "b"> expecting "a"
Wednesday, November 9, 2011
char c = satisfy (==c) <?> show [c]
Wednesday, November 9, 2011
p = char 'a' >>= \x -> char 'b' >>= \y -> return [x, y]main = print (parse p "<test>" "abc")
> Right "ab"
Wednesday, November 9, 2011
p = do char 'a' char 'b' return "ab"main = print (parse p "<test>" "abc")
> Right "ab"
Wednesday, November 9, 2011
p = do char 'a' char 'b' <|> char 'q' char 'c' <|> char 'd'
main = print (parse p "<test>" "abc")
Wednesday, November 9, 2011
letter = satisfy isAlpha <?> "letter"digit = satisfy isDigit <?> "digit"spaces = skipMany space <?> "white space"
many p = ... censored ...
Wednesday, November 9, 2011
p = do letter many (letter <|> digit)main = print (parse p "<test>" "hello123")
> Right "ello123"
Wednesday, November 9, 2011
p = do x <- letter xs <- many (letter <|> digit) return (x:xs)main = print (parse p "<test>" "hello123")
> Right "hello123"
Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs)p = identmain = print (parse p "<test>" "123hello")
> Left "<test>" (line 1, column 1):> unexpected "1"> expecting letter
Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs)p = ident <?> "variable name"main = print (parse p "<test>" "123hello")
> Left "<test>" (line 1, column 1):> unexpected "1"> expecting variable name
Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name"p = ident main = print (parse p "<test>" "123hello")
Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name"letKeyword = string "let"p = ident <|> letKeywordmain = print (parse p "<test>" "letter")
> Right "letter"
Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name"letKeyword = do s <- string "let" notFollowedBy (letter <|> digit) spaces return sp = try(letKeyword) <|> identmain = print (parse p "<test>" "letter")-----Right "letter"
Wednesday, November 9, 2011
Lexer-like example
Wednesday, November 9, 2011
type Name = Stringdata Expr = Number Integer | Var Name | Let Name Expr | Seq [Expr] deriving Show
Wednesday, November 9, 2011
myDef = emptyDef { identStart = letter <|> char '_', identLetter = alphaNum <|> char '_', opStart = opLetter myDef, opLetter = oneOf "=,;", reservedOpNames = [ ",", ";", "=" ], reservedNames = ["let"], caseSensitive = True }
Wednesday, November 9, 2011
tokenParser = PT.makeTokenParser myDef
parens = PT.parens tokenParsernaturalNumber = PT.natural tokenParserkeyword = PT.reserved tokenParseridentifier = PT.identifier tokenParserop = PT.reservedOp tokenParsercommaSep = PT.commaSep tokenParsercommaSep1 = PT.commaSep1 tokenParser
Wednesday, November 9, 2011
simple = numberLiteral <|> var
numberLiteral = do n <- naturalNumber return $ Number n
var = do s <- identifier return $ Var s
Wednesday, November 9, 2011
letStmt = do keyword "let" defs <- commaSep1 def op ";" return $ Seq defs def = do name <- identifier op "=" value <- simple return $ Let name value
Wednesday, November 9, 2011
main = print (parse letStmt "<test>" "let one = 1, two = 2;"
)
----
Right (Seq [Let "one" (Number 1),Let "two" (Number 2)
])
Wednesday, November 9, 2011
CSV Parser Example
Wednesday, November 9, 2011
import Text.Parsec
csvFile = line `endBy1` eolline = cell `sepBy` (char ',')cell = wsopt >> many (noneOf ",\n")wsopt = many (oneOf " \t")eol = char '\n'main = print (parse csvFile "<test>" input) where input = "aa, b,c\nd,e,f\n"
> Right [["aa","b","c"],["d","e","f"]]
Wednesday, November 9, 2011
import scala.util.parsing.combinator._
object CSVParser extends RegexParsers { override def skipWhitespace = false def whitespace = """[ \t]*"""r def csv = rep1sep(row, "\n") <~ "\n" def row = rep1sep(field, ",") def field = opt(whitespace) ~> """[^\n,]""".r def parse(s: String) = parseAll(csv, s) }println(CSVParser.parse("a,b,c\nd,e, f\n"))
> [3.1] parsed: List(List(a, b, c), List(d, e, f))
Wednesday, November 9, 2011
Pysec for PythonSpirit for C++In standard library of ScalaA lot of others...
Wednesday, November 9, 2011
Stuff to read
• Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages by Terence Parr
Wednesday, November 9, 2011
Stuff to read
• Domain Specific Languages by Martin Fowler
Wednesday, November 9, 2011
Stuff to read
• http://learnyouahaskell.com/Learn You a Haskell for a Great Good
Wednesday, November 9, 2011
The End
Wednesday, November 9, 2011