- Introduction
- Chapter 1 Values, Types, and Operators
- Chapter 2 Program Structure
- Expressions and statements
- Variables
- Keywords and reserved words
- The environment
- Functions
- The console.log function
- Return values
- prompt and confirm
- Control flow
- Conditional execution
- while and do loops
- Indenting Code
- for loops
- Breaking Out of a Loop
- Updating variables succinctly
- Dispatching on a value with switch
- Capitalization
- Comments
- Summary
- Exercises
- Chapter 3 Functions
- Chapter 4 Data Structures: Objects and Arrays
- Chapter 5 Higher-Order Functions
- Chapter 6 The Secret Life of Objects
- Chapter 7 Project: Electronic Life
- Chapter 8 Bugs and Error Handling
- Chapter 9 Regular Expressions
- Creating a regular expression
- Testing for matches
- Matching a set of characters
- Repeating parts of a pattern
- Grouping subexpressions
- Matches and groups
- The date type
- Word and string boundaries
- Choice patterns
- The mechanics of matching
- Backtracking
- The replace method
- Greed
- Dynamically creating RegExp objects
- The search method
- The lastIndex property
- Parsing an INI file
- International characters
- Summary
- Exercises
- Chapter 10 Modules
- Chapter 11 Project: A Programming Language
- Chapter 12 JavaScript and the Browser
- Chapter 13 The Document Object Model
- Chapter 14 Handling Events
- Chapter 15 Project: A Platform Game
- Chapter 16 Drawing on Canvas
- Chapter 17 HTTP
- Chapter 18 Forms and Form Fields
- Chapter 19 Project: A Paint Program
- Chapter 20 Node.js
- Chapter 21 Project: Skill-Sharing Website
- Eloquent JavaScript
- Exercise Hints
- Program Structure
- Functions
- Data Structures: Objects and Arrays
- Higher-Order Functions
- The Secret Life of Objects
- Project: Electronic Life
- Bugs and Error Handling
- Regular Expressions
- Modules
- Project: A Programming Language
- The Document Object Model
- Handling Events
- Project: A Platform Game
- Drawing on Canvas
- HTTP
- Forms and Form Fields
- Project: A Paint Program
- Node.js
- Project: Skill-Sharing Website
Parsing
The most immediately visible part of a programming language is its syntax, or notation. A parser is a program that reads a piece of text and produces a data structure that reflects the structure of the program contained in that text. If the text does not form a valid program, the parser should complain and point out the error.
Our language will have a simple and uniform syntax. Everything in Egg is an expression. An expression can be a variable, a number, a string, or an application. Applications are used for function calls but also for constructs such as if
or while
.
To keep the parser simple, strings in Egg do not support anything like backslash escapes. A string is simply a sequence of characters that are not double quotes, wrapped in double quotes. A number is a sequence of digits. Variable names can consist of any character that is not whitespace and does not have a special meaning in the syntax.
Applications are written the way they are in JavaScript, by putting parentheses after an expression and having any number of arguments between those parentheses, separated by commas.
do(define(x, 10), if(>(x, 5), print("large"), print("small")))
The uniformity of the Egg language means that things that are operators in JavaScript (such as >
) are normal variables in this language, applied just like other functions. And since the syntax has no concept of a block, we need a do
construct to represent doing multiple things in sequence.
The data structure that the parser will use to describe a program will consist of expression objects, each of which has a type
property indicating the kind of expression it is and other properties to describe its content.
Expressions of type "value"
represent literal strings or numbers. Their value
property contains the string or number value that they represent. Expressions of type "word"
are used for identifiers (names). Such objects have a name
property that holds the identifier’s name as a string. Finally, "apply"
expressions represent applications. They have an operator
property that refers to the expression that is being applied, and they have an args
property that refers to an array of argument expressions.
The >(x, 5)
part of the previous program would be represented like this:
{ type: "apply", operator: {type: "word", name: ">"}, args: [ {type: "word", name: "x"}, {type: "value", value: 5} ] }
Such a data structure is called a syntax tree. If you imagine the objects as dots and the links between them as lines between those dots, it has a treelike shape. The fact that expressions contain other expressions, which in turn might contain more expressions, is similar to the way branches split and split again.
Contrast this to the parser we wrote for the configuration file format in Chapter 9 , which had a simple structure: it split the input into lines and handled those lines one at a time. There were only a few simple forms that a line was allowed to have.
Here we must find a different approach. Expressions are not separated into lines, and they have a recursive structure. Application expressions contain other expressions.
Fortunately, this problem can be solved elegantly by writing a parser function that is recursive in a way that reflects the recursive nature of the language.
We define a function parseExpression
, which takes a string as input and returns an object containing the data structure for the expression at the start of the string, along with the part of the string left after parsing this expression. When parsing subexpressions (the argument to an application, for example), this function can be called again, yielding the argument expression as well as the text that remains. This text may in turn contain more arguments or may be the closing parenthesis that ends the list of arguments.
This is the first part of the parser:
function parseExpression(program) { program = skipSpace(program); var match, expr; if (match = /^"([^"]*)"/.exec(program)) expr = {type: "value", value: match[1]}; else if (match = /^\d+\b/.exec(program)) expr = {type: "value", value: Number(match[0])}; else if (match = /^[^\s(),"]+/.exec(program)) expr = {type: "word", name: match[0]}; else throw new SyntaxError("Unexpected syntax: " + program); return parseApply(expr, program.slice(match[0].length)); } function skipSpace(string) { var first = string.search(/\S/); if (first == -1) return ""; return string.slice(first); }
Because Egg allows any amount of whitespace between its elements, we have to repeatedly cut the whitespace off the start of the program string. This is what the skipSpace
function helps with.
After skipping any leading space, parseExpression
uses three regular expressions to spot the three simple (atomic) elements that Egg supports: strings, numbers, and words. The parser constructs a different kind of data structure depending on which one matches. If the input does not match one of these three forms, it is not a valid expression, and the parser throws an error. SyntaxError
is a standard error object type, which is raised when an attempt is made to run an invalid JavaScript program.
We can then cut off the part that we matched from the program string and pass that, along with the object for the expression, to parseApply
, which checks whether the expression is an application. If so, it parses a parenthesized list of arguments.
function parseApply(expr, program) { program = skipSpace(program); if (program[0] != "(") return {expr: expr, rest: program}; program = skipSpace(program.slice(1)); expr = {type: "apply", operator: expr, args: []}; while (program[0] != ")") { var arg = parseExpression(program); expr.args.push(arg.expr); program = skipSpace(arg.rest); if (program[0] == ",") program = skipSpace(program.slice(1)); else if (program[0] != ")") throw new SyntaxError("Expected ',' or ')'"); } return parseApply(expr, program.slice(1)); }
If the next character in the program is not an opening parenthesis, this is not an application, and parseApply
simply returns the expression it was given.
Otherwise, it skips the opening parenthesis and creates the syntax tree object for this application expression. It then recursively calls parseExpression
to parse each argument until a closing parenthesis is found. The recursion is indirect, through parseApply
and parseExpression
calling each other.
Because an application expression can itself be applied (such as in multiplier(2)(1)
), parseApply
must, after it has parsed an application, call itself again to check whether another pair of parentheses follows.
This is all we need to parse Egg. We wrap it in a convenient parse
function that verifies that it has reached the end of the input string after parsing the expression (an Egg program is a single expression), and that gives us the program’s data structure.
function parse(program) { var result = parseExpression(program); if (skipSpace(result.rest).length > 0) throw new SyntaxError("Unexpected text after program"); return result.expr; } console.log(parse("+(a, 10)")); // → {type: "apply", // operator: {type: "word", name: "+"}, // args: [{type: "word", name: "a"}, // {type: "value", value: 10}]}
It works! It doesn’t give us very helpful information when it fails and doesn’t store the line and column on which each expression starts, which might be helpful when reporting errors later, but it’s good enough for our purposes.
This is a book about getting computers to do what you want them to do. Computers are about as common as screwdrivers today, but they contain a lot more hidden complexity and thus are harder to operate and understand. To many, they remain alien, slightly threatening things.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论