How can I recursively extract the elements of my expression with a Chevrotain parser?

Posted 2025-02-11 05:04:42, word count 5061, views 1, comments 0


I am writing a parser with Chevrotain in TypeScript.

I have an expression of this form:

TERM:MATCH_TERM AND TERM:MATCH_TERM OR TERM:MATCH_TERM AND TERM:MATCH_TERM OR TERM:MATCH_TERM

with OR taking precedence over AND, and with as many AND and OR operators as the user wants.

I managed to write the parser that allows me to obtain a result of this form:

Array_of_expressions: [expression1, expression2, ...]

Array_of_operators: [AND, OR, AND, AND, OR, ...]

These are the rules that give me this result:

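import { createToken, CstParser, Lexer } from "chevrotain";

// String literals in double or single quotes, with backslash escapes allowed.
const StringDoubleQuote = createToken({ name: "StringDoubleQuote", pattern: /"[^"\\]*(?:\\.[^"\\]*)*"/ });
const StringSimpleQuote = createToken({ name: "StringSimpleQuote", pattern: /'[^'\\]*(?:\\.[^'\\]*)*'/ });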

const And = createToken({ name: "And", pattern: /(AND|and)/ });
const Or = createToken({ name: "Or", pattern: /(OR|or)/ });
const Not = createToken({ name: "Not", pattern: /(NOT|not)/ });
const Colon = createToken({ name: "Colon", pattern: /:/ });

const WhiteSpace = createToken({
    name: "WhiteSpace",
    pattern: /[ \t\n\r]+/,
    group: Lexer.SKIPPED
});
const allTokens = [
    WhiteSpace,
    Colon,
    And,
    Or,
    Not,
    StringDoubleQuote,
    StringSimpleQuote
];

class CustomParser extends CstParser {
    private static INSTANCE: CustomParser | undefined;

    public static get(): CustomParser {
        if (CustomParser.INSTANCE === undefined) {
            CustomParser.INSTANCE = new CustomParser();
        }
        return CustomParser.INSTANCE;
    }

    public readonly jsonLexer = new Lexer(allTokens);

    constructor() {
        super(allTokens, { nodeLocationTracking: "onlyOffset" })
        this.performSelfAnalysis()
    }

    // In TypeScript the parsing rules are explicitly defined as class instance properties
    // This allows for using access control (public/private/protected) and more importantly "informs" the TypeScript compiler
    // about the API of our Parser, so referencing an invalid rule name (this.SUBRULE(this.oopsType);)
    // is now a TypeScript compilation error.

    public extractFirstExpression = this.RULE(RULES.extractFirstExpression, () => {
        this.SUBRULE(this.extractNextExpression);
    });

    public extractNextExpression = this.RULE(RULES.extractNextExpression, () => {
        this.MANY(() => this.OR([
            { ALT: () => this.SUBRULE(this.extractParentOperator) },
            { ALT: () => this.SUBRULE(this.extractParentExpression) }
        ]));
    });

    public extractParentOperator = this.RULE(RULES.extractParentOperator, () => {
        this.OR([
            {
                ALT: () => {
                    this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
                }
            },
            {
                ALT: () => {
                    this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
                }
            }
        ])
    });

    public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
        this.SUBRULE(this.extractGenericTerm);
        this.SUBRULE(this.extractMatchTerm);
    });

    ....
}

Instead of this result, I would like a list of ORs, where each OR contains a list of ANDs and each AND contains the two expressions around it.

arrayOR = [OR, OR, ...]

where each OR would follow an interface like this:

interface OrNode {
  type: "OrNode",
  operator: "OR",
  orOperatorChild: AndNode[]
}

interface AndNode {
  type: "AndNode",
  operator: "AND",
  leftExpression: Expression,
  rightExpression: Expression
}

interface Expression {
  type: "Expression",
  fullPart?: string,
  mainTerm?: string,
  valueMatch?: string,
  ...
}
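To make the target shape concrete, here is a minimal sketch in plain TypeScript (no Chevrotain involved) that folds the two flat arrays from the current result into a list of OR groups. The names `groupByOr`, `toOrNode`, and `Operator` are hypothetical, the `Expression` interface is trimmed to one field, and `loneExpression` is an assumed extra field for a group that contains a single expression and therefore no AND.

```typescript
type Operator = "AND" | "OR";

interface Expression {
  type: "Expression";
  fullPart?: string;
}

interface AndNode {
  type: "AndNode";
  operator: "AND";
  leftExpression: Expression;
  rightExpression: Expression;
}

interface OrNode {
  type: "OrNode";
  operator: "OR";
  orOperatorChild: AndNode[];
  // Assumption: not in the original interface. Holds the operand of a group without any AND.
  loneExpression?: Expression;
}

// Turn one run of AND-connected operands into an OrNode.
function toOrNode(run: Expression[]): OrNode {
  const ands: AndNode[] = [];
  // One AndNode per AND operator, holding the two expressions around it.
  for (let i = 0; i + 1 < run.length; i++) {
    ands.push({
      type: "AndNode",
      operator: "AND",
      leftExpression: run[i],
      rightExpression: run[i + 1],
    });
  }
  const node: OrNode = { type: "OrNode", operator: "OR", orOperatorChild: ands };
  if (run.length === 1) {
    node.loneExpression = run[0];
  }
  return node;
}

// Split the operand list at every OR, then convert each run.
function groupByOr(expressions: Expression[], operators: Operator[]): OrNode[] {
  if (expressions.length === 0) {
    return [];
  }
  if (operators.length !== expressions.length - 1) {
    throw new Error("expected exactly one operator between each pair of expressions");
  }
  const runs: Expression[][] = [[expressions[0]]];
  operators.forEach((operator, i) => {
    const next = expressions[i + 1];
    if (operator === "OR") {
      runs.push([next]); // OR starts a new group
    } else {
      runs[runs.length - 1].push(next); // AND extends the current group
    }
  });
  return runs.map(toOrNode);
}
```

With this shape an input that has no operator at all still produces one `OrNode`, so the OR level is always the parent even when no OR appears in the input.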

As I understand it, to get this result I need a rule that looks for the OR operators; for each OR operator it enters a SUBRULE that looks for the AND operators, and for each AND operator it enters a SUBRULE that looks for the left and right expressions.

But my string starts with an expression, so how can I read the OR operator before the first expression and then enter the SUBRULE that reads the rest? Also, in some cases there is only one expression and no operators. What should I do if I want OR to be the parent but there is no OR?

I've been told to use recursive rules and to look at this example, but I don't see how to apply it to my case.
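One common reading of the "recursive rules" advice is to write one rule per precedence level, with each level looping over the level below it. The sketch below shows that layering as a hand-written recursive-descent parser in plain TypeScript, not with Chevrotain. The names `tokenize`, `parse`, `OrGroup`, and `AndGroup` are hypothetical, and terms are simplified to whitespace-separated words, so quoted strings containing spaces are not handled here.

```typescript
type Token = { kind: "AND" | "OR" | "TERM"; text: string };

interface AndGroup {
  type: "And";
  operands: string[];
}

interface OrGroup {
  type: "Or";
  alternatives: AndGroup[];
}

// Simplified lexer: split on whitespace and classify each word.
function tokenize(input: string): Token[] {
  return input
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((text): Token => {
      const upper = text.toUpperCase();
      if (upper === "AND") return { kind: "AND", text };
      if (upper === "OR") return { kind: "OR", text };
      return { kind: "TERM", text };
    });
}

function parse(input: string): OrGroup {
  const tokens = tokenize(input);
  let pos = 0;

  // term := TERM
  function term(): string {
    const token = tokens[pos];
    if (token === undefined || token.kind !== "TERM") {
      throw new Error(`expected a term at token ${pos}`);
    }
    pos++;
    return token.text;
  }

  // andExpression := term (AND term)*
  function andExpression(): AndGroup {
    const operands = [term()];
    while (tokens[pos]?.kind === "AND") {
      pos++; // consume AND
      operands.push(term());
    }
    return { type: "And", operands };
  }

  // orExpression := andExpression (OR andExpression)*
  function orExpression(): OrGroup {
    const alternatives = [andExpression()];
    while (tokens[pos]?.kind === "OR") {
      pos++; // consume OR
      alternatives.push(andExpression());
    }
    return { type: "Or", alternatives };
  }

  const result = orExpression();
  if (pos !== tokens.length) {
    throw new Error(`unexpected token at position ${pos}`);
  }
  return result;
}
```

Because `orExpression` is the entry point and its loop may run zero times, the input always starts with an expression and the root is always an OR group, even for a single term. In Chevrotain the same layering would presumably be one `RULE` per level, each with a `MANY` loop, but that translation is not shown or tested here.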

EDIT:

I changed the previous grammar rules to the following (I tried to add recursion):

    public extractExpressions = this.RULE(RULES.extractExpressions, () => {
        this.SUBRULE(this.extractParentExpression);
        this.OPTION(() => this.OR([
            { ALT: () => this.SUBRULE(this.extractAndExpression) },
            { ALT: () => this.SUBRULE(this.extractOrExpression) }
        ]));
    });

    public extractAndExpression = this.RULE(RULES.extractAndExpression, () => {
        this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
        this.SUBRULE(this.extractExpressions);
    });

    public extractOrExpression = this.RULE(RULES.extractOrExpression, () => {
        this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
        this.SUBRULE(this.extractExpressions);
    });

    public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
        this.SUBRULE(this.extractGenericTerm);
        this.SUBRULE(this.extractMatchTerm);
    });
    .....

Do these rules seem coherent and well optimized to you? I get an object that lets me do what I want, but I wonder whether there is a more efficient or elegant way to do it.

Thanks in advance for any help.


小…红帽 2025-02-18 05:04:43

As I noted in the edit to the question, I got a recursive structure in my parser's result by writing the rules this way:

    public extractExpressions = this.RULE(RULES.extractExpressions, () => {
        this.SUBRULE(this.extractParentExpression);
        this.OPTION(() => this.OR([
            { ALT: () => this.SUBRULE(this.extractAndExpression) },
            { ALT: () => this.SUBRULE(this.extractOrExpression) }
        ]));
    });

    public extractAndExpression = this.RULE(RULES.extractAndExpression, () => {
        this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
        this.SUBRULE(this.extractExpressions);
    });

    public extractOrExpression = this.RULE(RULES.extractOrExpression, () => {
        this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
        this.SUBRULE(this.extractExpressions);
    });

    public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
        this.SUBRULE(this.extractGenericTerm);
        this.SUBRULE(this.extractMatchTerm);
    });
    .....
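These rules recurse the same way for AND and OR, so the result is a right-nested chain that does not by itself group ANDs under ORs. The sketch below walks such a chain back into an operand list and an operator list, which can then be grouped as needed. It uses a hypothetical `Node` type as a minimal stand-in for Chevrotain's CST node (a rule name plus children keyed by name), and it assumes the values in `RULES` equal the property names used above; neither assumption is confirmed by the post.

```typescript
type Operator = "AND" | "OR";

// Minimal stand-in for a CST node: only sub-rule children are modelled, tokens are ignored.
interface Node {
  name: string;
  children: { [key: string]: Node[] | undefined };
}

interface FlatResult {
  operands: Node[];
  operators: Operator[];
}

// Follow the chain extractExpressions -> extractAndExpression / extractOrExpression -> extractExpressions ...
function flatten(root: Node): FlatResult {
  const operands: Node[] = [];
  const operators: Operator[] = [];
  let current: Node | undefined = root;

  while (current !== undefined) {
    const operand = current.children["extractParentExpression"]?.[0];
    if (operand === undefined) {
      throw new Error("node has no extractParentExpression child");
    }
    operands.push(operand);

    const andTail = current.children["extractAndExpression"]?.[0];
    const orTail = current.children["extractOrExpression"]?.[0];
    if (andTail !== undefined) {
      operators.push("AND");
      current = andTail.children["extractExpressions"]?.[0];
    } else if (orTail !== undefined) {
      operators.push("OR");
      current = orTail.children["extractExpressions"]?.[0];
    } else {
      current = undefined; // end of the chain
    }
  }
  return { operands, operators };
}
```

The walk is iterative rather than recursive, so a long chain of operators does not deepen the call stack.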
