About Tokens and Lexemes
Transcript of About Tokens and Lexemes
1. About tokens and lexemes
- Ben Scholzen, Game Developer, Gameforge Productions GmbH
2. What we'll cover
- Definition of a compiler, tokenizer and parser
3. Basic structure of a tokenizer and a parser
4. Where to optimize things for PHP
5. What about parser generators?
6. They are evil!
- PHP_LexerGenerator, PHP_ParserGenerator, lemon-PHP
7. They create lots of function calls, like lemon parsers in C
8. They perform poorly
9. They will eat up all your memory
10. Conclusion
- Don't use them!
11. Let's get started
12. What a compiler is and how it works
- Acts as frontend for the application
13. Converts human-readable data into machine-readable data
14. Consists of two components:
- The lexer:
- Is a finite-state-machine
15. Reads the input stream
16. Cleans up the input data
17. Creates a list of tokens
- The parser:
- Gets tokens from the tokenizer
18. Converts them into a data structure
19. What a compiler is and how it works
- Diagram: Stream -> Lexer -> Tokens -> Parser -> Document structure
20. Sounds great, but where do I need it?
- Formatting languages
- BB-Code
21. Wiki-Codes
- Description languages
- iCalendar / vCalendar
22. XML
- Even programming languages
- JavaScript
23. PHP
- Anything else you want your program to understand
24. The lexer (or tokenizer)
25. What are tokens?
- Categorized block of text
- Token type
26. Corresponding block of text (lexeme)
- A list of tokens represents an entire document
27. Example in PHP: $value = 5 * 7 ;
28. How the tokenizer works
- Define possible states of the lexer
29. Tokenize the input in a loop
- Scan with preg_match()
- strtok() is usually too simple
30. Reading char-by-char is too slow
31. Use the offset parameter
32. Use the \G assertion (^ won't work, because it still anchors to the start of the subject string)
- Always store the current position
33. Use either a switch statement or a structured array
- Return the tokens
34. What we can optimize
- Use little memory
- Always just read a partial part of the document into memory
- Via fopen() and fgets()
35. Requires previous knowledge about when tokens end
- Offer a method for the parser to get a partial bunch of tokens
- Speed up execution time
- Avoid internal function calls where possible
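The memory-related points above can be sketched roughly as follows. This is only an illustration, not the code from the talk: the function name `streamTokens()`, the token patterns, and the assumption that no token spans a line break are all made up for the example. It also relies on generators, which need PHP 5.5 or later.

```php
<?php
// Sketch only: streamTokens() and its patterns are hypothetical.
// Assumption: a token never spans a line break, so each line read
// by fgets() can be tokenized on its own.
function streamTokens($path)
{
    $patterns = array(
        'variable' => '/\G\$[A-Za-z_][A-Za-z0-9_]*/',
        'number'   => '/\G[0-9]+/',
        'operator' => '/\G[=*;+\/-]/',
    );

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Only one line of the document is held in memory at a time.
        while (($line = fgets($handle)) !== false) {
            $offset = 0;
            $length = strlen($line);

            while ($offset < $length) {
                // Skip whitespace; \G anchors the match at $offset.
                if (preg_match('/\G\s+/', $line, $m, 0, $offset)) {
                    $offset += strlen($m[0]);
                    continue;
                }

                foreach ($patterns as $type => $pattern) {
                    if (preg_match($pattern, $line, $m, 0, $offset)) {
                        $offset += strlen($m[0]);
                        // Hand one token at a time to the consumer.
                        yield array($type, $m[0]);
                        continue 2; // back to the inner while loop
                    }
                }

                throw new UnexpectedValueException(
                    "Unexpected input at offset $offset"
                );
            }
        }
    } finally {
        fclose($handle);
    }
}
```

A parser could then pull tokens with `foreach (streamTokens($file) as $token) { ... }` and never hold the whole token list in memory. Whether this is faster than tokenizing a fully loaded string depends on the input and was not measured here.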
36. Going into practice
37. The beginning
- Use little memory
- Via fopen() and fread()
- Requires previous knowledge about when tokens end
38. Offer a method for the parser to get a partial bunch of tokens
- Speed up execution time
- Avoid internal function calls where possible
39. Throwing in a file
40. Preparing stuff
41. Base state
42. Operator state
43. Value state
44. Rounding it up
45. Some actual testing
46. And what we get
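The code shown on the slides above is not preserved in this transcript. As a stand-in, here is a minimal sketch of a state-based lexer for the earlier example `$value = 5 * 7 ;`. Only the three state names (base, operator, value) come from the slide titles; the function name `tokenize()`, the patterns, and the state transitions are assumptions made for illustration.

```php
<?php
// Hypothetical sketch: state names follow the slide titles, everything
// else (function name, patterns, transitions) is assumed.
function tokenize($input)
{
    $tokens = array();
    $state  = 'base';
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        // Skip whitespace. \G pins the match to $offset; ^ would only
        // match at the very start of $input.
        if (preg_match('/\G\s+/', $input, $m, 0, $offset)) {
            $offset += strlen($m[0]);
            continue;
        }

        switch ($state) {
            case 'base':
                // Assumed: a statement starts with a variable.
                $pattern = '/\G\$[A-Za-z_][A-Za-z0-9_]*/';
                $type    = 'variable';
                $next    = 'operator';
                break;

            case 'operator':
                $pattern = '/\G[=*;]/';
                $type    = 'operator';
                $next    = null; // depends on the lexeme, decided below
                break;

            case 'value':
                $pattern = '/\G[0-9]+/';
                $type    = 'number';
                $next    = 'operator';
                break;
        }

        if (!preg_match($pattern, $input, $m, 0, $offset)) {
            throw new UnexpectedValueException(
                "Unexpected input at offset $offset in state $state"
            );
        }

        $tokens[] = array($type, $m[0]);
        $offset  += strlen($m[0]); // always store the current position

        if ($next === null) {
            // ';' ends the statement, any other operator expects a value.
            $next = ($m[0] === ';') ? 'base' : 'value';
        }
        $state = $next;
    }

    return $tokens;
}

var_dump(tokenize('$value = 5 * 7 ;'));
```

For this input the sketch returns six `[type, lexeme]` pairs, the same token list as in the dump from the talk.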
array(6) {
  [0]=>
  array(2) {
    [0]=>
    string(8) "variable"
    [1]=>
    string(6) "$value"
  }
  [1]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) "="
  }
  [2]=>
  array(2) {
    [0]=>
    string(6) "number"
    [1]=>
    string(1) "5"
  }
  [3]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) "*"
  }
  [4]=>
  array(2) {
    [0]=>
    string(6) "number"
    [1]=>
    string(1) "7"
  }
  [5]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) ";"
  }
}
89. The parser
90. So we have a bunch of tokens, what now?
- Loop through the tokens and analyze them
91. Create an object-oriented tree structure or interpret
92. Avoid non-tail recursion
- Use tail-recursion (trampoline) instead
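A trampoline can be sketched as below. This is one common way to write it, not necessarily the one used in the talk; `trampoline()` and `sumNumbers()` are hypothetical helpers, and the token format is the `[type, lexeme]` pair used elsewhere in this deck. Each step returns a closure for the next step instead of calling itself, and a plain loop runs the closures.

```php
<?php
// Sketch only: trampoline() and sumNumbers() are hypothetical helpers.

// Keep invoking returned closures in a loop until a non-closure result
// appears. The call stack stays shallow no matter how many steps run.
function trampoline($result)
{
    while ($result instanceof Closure) {
        $result = $result();
    }
    return $result;
}

// Tail-recursive in shape: instead of calling itself directly, each
// step returns a closure that describes the next step.
function sumNumbers(array $tokens, $index = 0, $sum = 0)
{
    if ($index >= count($tokens)) {
        return $sum; // done: a plain value ends the trampoline loop
    }

    list($type, $lexeme) = $tokens[$index];
    if ($type === 'number') {
        $sum += (int) $lexeme;
    }

    return function () use ($tokens, $index, $sum) {
        return sumNumbers($tokens, $index + 1, $sum);
    };
}

// Walk a long token list without nesting one call per token.
$tokens = array_fill(0, 100000, array('number', '1'));
echo trampoline(sumNumbers($tokens)), "\n";
```

The trade-off is one closure allocation per step, so this exchanges some speed for bounded stack depth.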
93. Saves you from hitting the stack limit
- That's it!
94. Summary
- Questions?
95. Where to go from here
- Wikipedia: http://en.wikipedia.org/wiki/Compiler http://en.wikipedia.org/wiki/Parsing
96. About tail-recursion in PHP: http://www.alternateinterior.com/2006/09/tail-recursion-in-php.html
97. My blog: http://www.dasprids.de
98. Rate this talk: http://joind.in/635
99. Follow me on twitter:
100. http://www.twitter.com/dasprid
101. Thank you!