For an iOS application, I want to parse an HTML file that may contain UNIX-style variables to be replaced. For example, the HTML may look like this:
<html>
<head></head>
<body>
<h1>${title}</h1>
<p>${paragraph1}</p>
<img src="${image}" />
</body>
</html>
I'm trying to create a simple ParseKit grammar that gives me two callbacks: one for pass-through HTML, and another for the variables it detects. For that, I created the following grammar:
@start = Empty | content*;
content = variable | passThrough;
passThrough = /[^$]+/;
variable = '$' '{' Word closeChar;
openChar = '${';
closeChar = '}';
I'm facing at least two issues with this. First, I had originally declared variable as openChar Word closeChar, but that did not work (I still don't know why). Second, and more important, the parser stops when it reaches <img src="${image}" /> (i.e. a variable inside a quoted string).
My questions are:
- How can I modify the grammar to make it work as expected?
- Is it better to use a tokenizer? If that's the case, how should I configure it?
Developer of ParseKit here. I'll answer both of your questions:
1) You are taking the correct approach, but this is a tricky case. There are several small gotchas, and your grammar needs to be changed a bit.
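To make the kind of gotcha involved concrete, here is a toy tokenizer in Python. It is not ParseKit and uses none of its API; `DEFAULT_RULES`, `TUNED_RULES` and `tokens` are hypothetical names, and the rules are assumptions meant to mimic a tokenizer that reads a quoted string as a single token and does not treat `${` as one symbol unless told to. Under those assumptions, both symptoms from the question show up: `${` arrives as two separate tokens, and a variable inside quotes never reaches the grammar at all.

```python
import re

# Assumed "default" behaviour: a quoted string is one token, and '$' and '{'
# are separate single-character symbols.
DEFAULT_RULES = re.compile(r'"[^"]*"|\w+|\s+|.', re.S)

# Assumed "tuned" behaviour: quoting is switched off and '${' is declared as
# a multi-character symbol, so it is tried before the single-character rule.
TUNED_RULES = re.compile(r'\$\{|\w+|\s+|.', re.S)

def tokens(rules, text):
    """Split text into tokens using the given toy rule set."""
    return rules.findall(text)

print(tokens(DEFAULT_RULES, '${title}'))        # ['$', '{', 'title', '}']
print(tokens(DEFAULT_RULES, 'src="${image}"'))  # ['src', '=', '"${image}"']
print(tokens(TUNED_RULES, 'src="${image}"'))    # ['src', '=', '"', '${', 'image', '}', '"']
```

With the default rules, a grammar rule that expects a single '${' token can never match, and the quoted attribute value hides the variable entirely. With the tuned rules, the variable's three tokens are visible in both places.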
I've developed a grammar which is working for me:
Then implement these two callbacks in your Assembler:
And then your client/driver code will look something like this:
This will be printed:
2) Yes, I think it would definitely be much better to use the Tokenizer directly for this relatively simple case. Performance will be massively better. Here's how you might approach the task with the Tokenizer:
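The direct-scan idea can be sketched independently of ParseKit. The Python below is only an illustration of the technique, not the Tokenizer API: `scan`, `substitute` and the `\w+` variable-name pattern are assumptions made for the example. It walks the text once and fires one callback for pass-through text and another for each variable name, which is the two-callback shape asked for in the question.

```python
import re

# Matches a ${name} placeholder. Restricting names to word characters is an
# assumption; adjust the pattern if names may contain other characters.
VARIABLE = re.compile(r"\$\{(\w+)\}")

def scan(text, on_pass_through, on_variable):
    """Walk text once, reporting literal runs and variable names in order."""
    pos = 0
    for match in VARIABLE.finditer(text):
        if match.start() > pos:
            on_pass_through(text[pos:match.start()])  # text before the variable
        on_variable(match.group(1))                   # the bare variable name
        pos = match.end()
    if pos < len(text):
        on_pass_through(text[pos:])                   # trailing literal text

def substitute(text, values):
    """Replace known variables; leave unknown ones in place unchanged."""
    out = []
    scan(text, out.append,
         lambda name: out.append(values.get(name, "${" + name + "}")))
    return "".join(out)

print(substitute('<img src="${image}" />', {"image": "logo.png"}))
# <img src="logo.png" />
```

Because the scan never interprets quotes, a variable inside an attribute value is handled the same way as one in element text.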