Simple ParseKit grammar for HTML with variables for replacement

Posted on 2025-01-05 22:31:09

For an iOS application, I want to parse HTML files that may contain UNIX-style variables for replacement. For example, the HTML may look like this:

<html>
  <head></head>
  <body>
    <h1>${title}</h1>
    <p>${paragraph1}</p>
    <img src="${image}" />
  </body>
</html>

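I'm trying to create a simple ParseKit grammar that gives me two callbacks: one for passthrough HTML, and another for the variables it detects. For that, I created the following grammar: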

@start        = Empty | content*;

content       = variable | passThrough;
passThrough   = /[^$]+/;
variable      = '$' '{' Word closeChar;

openChar      = '${';
closeChar     = '}';

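I'm facing at least two issues with this. First, I originally declared variable as openChar Word closeChar, but that did not work (I still don't know why). Second, and more important, the parser stops when it reaches <img src="${image}" /> (i.e. a variable inside a quoted string).

My questions are:

  1. How can I modify the grammar to make it work as expected?
  2. Is it better to use a tokenizer? If so, how should I configure it?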

Comments (1)

栩栩如生 2025-01-12 22:31:09

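Developer of ParseKit here. I'll answer both of your questions:

1) You are taking the correct approach, but this is a tricky case. There are several small gotchas, and your grammar needs to change a bit.

I've developed a grammar which is working for me: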

// Tokenizer Directives
@symbolState = '"' "'"; // effectively tells the tokenizer to turn off QuoteState. 
                      // Otherwise, variables enclosed in quotes would not be found (they'd be embedded in quoted strings). 
                      // now single- & double-quotes will be recognized as individual symbols, not start- & end-markers for quoted strings

@symbols = '${'; // declare '${' as a multi-char symbol

@reportsWhitespaceTokens = YES; // tell the tokenizer to preserve/report whitespace

// Grammar
@start = content*;
content = passthru | variable;
passthru = /[^$].*/;
variable = start name end;
start = '${';
end = '}';
name = Word;

Then implement these two callbacks in your Assembler:

- (void)parser:(PKParser *)p didMatchName:(PKAssembly *)a {
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a);
    PKToken *tok = [a pop];

    NSString *name = tok.stringValue;
    // do something with name
}

- (void)parser:(PKParser *)p didMatchPassthru:(PKAssembly *)a {
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a);
    PKToken *tok = [a pop];

    NSMutableString *s = a.target;
    if (!s) {
        s = [NSMutableString string];
    }

    [s appendString:tok.stringValue];

    a.target = s;
}

And then your client/driver code will look something like this:

NSString *g = ...; // fetch the grammar text shown above
PKParser *p = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
NSString *s = @"<img src=\"${image}\" />";
NSString *result = [p parse:s]; // parse once and keep the returned result
NSLog(@"result: %@", result);

This will be printed:

result: <img src="" />

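2) Yes, I think it would definitely be much better to use the Tokenizer directly for this relatively simple case. Performance will be much better. Here's how you might approach the task with the Tokenizer: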

PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
[t setTokenizerState:t.symbolState from:'"' to:'"'];
[t setTokenizerState:t.symbolState from:'\'' to:'\''];
[t.symbolState add:@"${"];
t.whitespaceState.reportsWhitespaceTokens = YES;

NSMutableString *result = [NSMutableString string];

PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;
while (eof != (tok = [t nextToken])) {
    if ([@"${" isEqualToString:tok.stringValue]) {
        tok = [t nextToken];
        NSString *varName = tok.stringValue;

        // do something with variable
    } else if ([@"}" isEqualToString:tok.stringValue]) {
        // do nothing
    } else {
        [result appendString:tok.stringValue];
    }
}
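
For comparison, the same scan-and-substitute idea can be sketched in plain C, which also compiles as Objective-C. This is only an illustration of the loop above, not ParseKit code: `substitute`, `lookup_variable`, and the table of values are hypothetical names made up for the example, and the scan treats everything between `${` and the next `}` as the variable name rather than checking that it is a single Word token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup (not part of ParseKit): returns the replacement text
   for a variable name of length len, or NULL if the name is unknown. */
static const char *lookup_variable(const char *name, size_t len) {
    static const struct { const char *name; const char *value; } table[] = {
        { "title", "Hello" },
        { "image", "logo.png" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strlen(table[i].name) == len &&
            memcmp(table[i].name, name, len) == 0) {
            return table[i].value;
        }
    }
    return NULL;
}

/* Copies src into dst (capacity cap, terminator included), replacing every
   ${name} with its looked-up value. Unknown names become empty strings and
   an unterminated "${" is copied through unchanged.
   Returns 0 on success, -1 if dst is too small. */
static int substitute(const char *src, char *dst, size_t cap) {
    size_t out = 0;
    if (cap == 0) return -1;
    while (*src) {
        const char *end = NULL;
        if (src[0] == '$' && src[1] == '{') {
            end = strchr(src + 2, '}');   /* closing brace of this variable */
        }
        if (end) {
            const char *value = lookup_variable(src + 2, (size_t)(end - (src + 2)));
            if (!value) value = "";
            size_t n = strlen(value);
            if (out + n >= cap) return -1;   /* keep room for the terminator */
            memcpy(dst + out, value, n);
            out += n;
            src = end + 1;                   /* continue after the '}' */
        } else {
            if (out + 1 >= cap) return -1;
            dst[out++] = *src++;             /* passthrough character */
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Because the scan never treats quotes specially, a variable inside an attribute value such as src="${image}" is found the same way as one in element text, which is the effect the symbolState configuration achieves in the tokenizer version.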
