当前位置：文江博客话题详情

编写迷你语言

发布于 2024-07-14 12:12:09 字数 200 浏览 8 评论 0原文

我有一个应用程序需要允许用户编写类似于 excel 的表达式：

(H1 + (D1 / C3)) * I8

以及更复杂的东西，例如

If(H1 = 'True', D3 * .2, D3 * .5)

我只能用正则表达式做这么多。任何有关正确方法的建议以及我可以学习的任何资源将不胜感激。

谢谢！

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

情丝乱 2024-07-21 12:12:09

其他一些问题，您可以在以下位置找到提示：

祝你好运！

回复收藏 0 原文

記柔刀 2024-07-21 12:12:09

当面临类似的情况时——需要处理短的单行表达式——我编写了一个解析器。这些表达式是布尔逻辑、形式

n1 = y and n2 > z
n2 != x or (n3 > y and n4 = z)

等。在英语中，您可以说存在由 AND 和 OR 连接的原子，每个原子具有三个元素 - 左侧属性、运算符和值。因为它是如此简洁，我认为解析更容易。可能的属性集是已知的且有限的（例如：名称、大小、时间）。运算符因属性而异：不同的属性采用不同的运算符集。可能值的范围和格式也会根据属性的不同而有所不同。

为了进行解析，我使用 String.Split() 在空格上分割字符串。
后来我意识到，在 Split() 之前，我需要规范化输入字符串 - 在括号前后插入空格。我用 regex.Replace() 做到了这一点。

分割的输出是一个标记数组。然后，解析发生在一个大的 for 循环中，并在左侧属性值上切换。每次循环时，我都会吃掉一组代币。如果第一个标记是开放括号，则该组的长度只有一个标记：括号本身。对于众所周知的名称（我的属性值）的标记，解析器必须吸收一组 3 个标记，每个标记代表名称、运算符和值。如果在任何时候没有足够的标记，解析器就会抛出异常。根据令牌流，解析器状态将会改变。连词（AND、OR、XOR）意味着将前一个原子推入堆栈，当下一个原子完成时，我会弹出前一个原子并将这两个原子连接成一个复合原子。等等。状态管理发生在解析器每个循环的末尾。

Atom current;
for (int i=0; i < tokens.Length; i++) 
{
  switch (tokens[i].ToLower())
  {
    case "name":
        if (tokens.Length <= i + 2)
            throw new ArgumentException();
        Comparison o = (Comparison) EnumUtil.Parse(typeof(Comparison), tokens[i+1]);
        current = new NameAtom { Operator = o, Value = tokens[i+2] };
        i+=2;
        stateStack.Push(ParseState.AtomDone);
        break;
    case "and": 
    case "or":
        if (tokens.Length <= i + 3) 
          throw new ArgumentException();
        pendingConjunction = (LogicalConjunction)Enum.Parse(typeof(LogicalConjunction), tokens[i].ToUpper());
        current = new CompoundAtom { Left = current, Right = null, Conjunction = pendingConjunction };
        atomStack.Push(current);
        break;

    case "(":
        state = stateStack.Peek();
        if (state != ParseState.Start && state != ParseState.ConjunctionPending && state != ParseState.OpenParen)
          throw new ArgumentException();
        if (tokens.Length <= i + 4)
          throw new ArgumentException();
        stateStack.Push(ParseState.OpenParen);
        break;

    case ")":
        state = stateStack.Pop();
        if (stateStack.Peek() != ParseState.OpenParen)
            throw new ArgumentException();
        stateStack.Pop();
        stateStack.Push(ParseState.AtomDone);
        break;

    // more like that...
    case "":
       // do nothing in the case of whitespace
       break;
    default:
        throw new ArgumentException(tokens[i]);
  }

  // insert housekeeping for parse states here

}

这很简单，只是一点点。但我们的想法是每个案例陈述都相当简单。在表达式的原子单元中很容易解析。棘手的部分是将它们适当地连接在一起。

这个技巧是在每个 slurp 循环结束时的内务管理部分中使用状态堆栈和原子堆栈完成的。根据解析器状态可能会发生不同的事情。正如我所说，在每个 case 语句中，解析器状态可能会发生变化，之前的状态会被推入堆栈。然后，在 switch 语句的末尾，如果状态表明我刚刚完成了对原子的解析，并且有一个未决的连词，我会将刚刚解析的原子移到CompoundAtom 中。代码如下所示：

            state = stateStack.Peek();
            if (state == ParseState.AtomDone)
            {
                stateStack.Pop();
                if (stateStack.Peek() == ParseState.ConjunctionPending)
                {
                    while (stateStack.Peek() == ParseState.ConjunctionPending)
                    {
                        var cc = critStack.Pop() as CompoundAtom;
                        cc.Right = current;
                        current = cc; // mark the parent as current (walk up the tree)
                        stateStack.Pop();   // the conjunction is no longer pending 

                        state = stateStack.Pop();
                        if (state != ParseState.AtomDone)
                            throw new ArgumentException();
                    }
                }
                else stateStack.Push(ParseState.AtomDone); 
            }

另一个神奇之处是 EnumUtil.Parse。这使我能够解析诸如“<”之类的东西转换为枚举值。假设您像这样定义枚举：

internal enum Operator
{
    [Description(">")]   GreaterThan,
    [Description(">=")]  GreaterThanOrEqualTo,
    [Description("<")]   LesserThan,
    [Description("<=")]  LesserThanOrEqualTo,
    [Description("=")]   EqualTo,
    [Description("!=")]  NotEqualTo
}

通常 Enum.Parse 查找枚举值的符号名称，并且 < 不是有效的符号名称。 EnumUtil.Parse() 查找描述中的内容。代码如下所示：

internal sealed class EnumUtil
{
    /// <summary>
    /// Returns the value of the DescriptionAttribute if the specified Enum value has one.
    /// If not, returns the ToString() representation of the Enum value.
    /// </summary>
    /// <param name="value">The Enum to get the description for</param>
    /// <returns></returns>
    internal static string GetDescription(System.Enum value)
    {
        FieldInfo fi = value.GetType().GetField(value.ToString());
        var attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
        if (attributes.Length > 0)
            return attributes[0].Description;
        else
            return value.ToString();
    }

    /// <summary>
    /// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
    /// Note: Utilised the DescriptionAttribute for values that use it.
    /// </summary>
    /// <param name="enumType">The System.Type of the enumeration.</param>
    /// <param name="value">A string containing the name or value to convert.</param>
    /// <returns></returns>
    internal static object Parse(Type enumType, string value)
    {
        return Parse(enumType, value, false);
    }

    /// <summary>
    /// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
    /// A parameter specified whether the operation is case-sensitive.
    /// Note: Utilised the DescriptionAttribute for values that use it.
    /// </summary>
    /// <param name="enumType">The System.Type of the enumeration.</param>
    /// <param name="value">A string containing the name or value to convert.</param>
    /// <param name="ignoreCase">Whether the operation is case-sensitive or not.</param>
    /// <returns></returns>
    internal static object Parse(Type enumType, string stringValue, bool ignoreCase)
    {
        if (ignoreCase)
            stringValue = stringValue.ToLower();

        foreach (System.Enum enumVal in System.Enum.GetValues(enumType))
        {
            string description = GetDescription(enumVal);
            if (ignoreCase)
                description = description.ToLower();
            if (description == stringValue)
                return enumVal;
        }

        return System.Enum.Parse(enumType, stringValue, ignoreCase);
    }

}

我从其他地方得到了 EnumUtil.Parse() 东西。也许在这里？

When faced with a similar situation - the need to handle short one-line expressions - I wrote a parser. The expressions were boolean logic, of the form

n1 = y and n2 > z
n2 != x or (n3 > y and n4 = z)

and so on. In english you could say that there are atoms joined by AND and OR, and each atom has three elements - a left-hand-side attribute, an operator, and a value. Because it was so succint I think the parsing was easier. The set of possible attributes is known and limited (eg: name, size, time). The operators vary by attribute: different attributes take different sets of operators. And the range and format of possible values vary according to attribute as well.

To parse, I split the string on whitespace using String.Split().
I later realized that prior to Split(), I needed to normalize the input string - inserting whitespace before and after parens. I did that with a regex.Replace().

The output of the split is an array of tokens. Then parsing occurs in a big for loop with a switch on the left-hand-side attribute value. With each go-round of the loop, I was set to slurp in a group of tokens. If the first token was an open-paren, then the group was just one token in length: the paren itself. For tokens that were well-known names - my attribute values - the parser had to slurp in a group of 3 tokens, one each for the name, the operator, and the value. If at any point there are not enough tokens, the parser throws an exception. Based on the stream of tokens, the parser state would change. A conjunction (AND,OR,XOR) meant to push the prior atom onto a stack, and when the next atom was finished, I'd pop the prior atom and join those two atoms into a compound atom. And so on. The state management happened at the end of each loop of the parser.

Atom current;
for (int i=0; i < tokens.Length; i++) 
{
  switch (tokens[i].ToLower())
  {
    case "name":
        if (tokens.Length <= i + 2)
            throw new ArgumentException();
        Comparison o = (Comparison) EnumUtil.Parse(typeof(Comparison), tokens[i+1]);
        current = new NameAtom { Operator = o, Value = tokens[i+2] };
        i+=2;
        stateStack.Push(ParseState.AtomDone);
        break;
    case "and": 
    case "or":
        if (tokens.Length <= i + 3) 
          throw new ArgumentException();
        pendingConjunction = (LogicalConjunction)Enum.Parse(typeof(LogicalConjunction), tokens[i].ToUpper());
        current = new CompoundAtom { Left = current, Right = null, Conjunction = pendingConjunction };
        atomStack.Push(current);
        break;

    case "(":
        state = stateStack.Peek();
        if (state != ParseState.Start && state != ParseState.ConjunctionPending && state != ParseState.OpenParen)
          throw new ArgumentException();
        if (tokens.Length <= i + 4)
          throw new ArgumentException();
        stateStack.Push(ParseState.OpenParen);
        break;

    case ")":
        state = stateStack.Pop();
        if (stateStack.Peek() != ParseState.OpenParen)
            throw new ArgumentException();
        stateStack.Pop();
        stateStack.Push(ParseState.AtomDone);
        break;

    // more like that...
    case "":
       // do nothing in the case of whitespace
       break;
    default:
        throw new ArgumentException(tokens[i]);
  }

  // insert housekeeping for parse states here

}

That's simplified, just a little. But the idea is that each case statement is fairly simple. It's easy to parse in an atomic unit of the expression. The tricky part was joining them all together appropriately.

That trick was accomplished in the housekeeping section, at the end of each slurp-loop, using the state stack and the atom stack. Different stuff can happen according to the parser state. As I said, in each case statement, the parser state might change, with the prior state getting pushed onto a stack. Then at the end of the switch statement, if the state said I had just finished parsing an atom, and there was a pending conjunction, I'd move the just-parsed atom into the CompoundAtom. The code looks like this:

            state = stateStack.Peek();
            if (state == ParseState.AtomDone)
            {
                stateStack.Pop();
                if (stateStack.Peek() == ParseState.ConjunctionPending)
                {
                    while (stateStack.Peek() == ParseState.ConjunctionPending)
                    {
                        var cc = critStack.Pop() as CompoundAtom;
                        cc.Right = current;
                        current = cc; // mark the parent as current (walk up the tree)
                        stateStack.Pop();   // the conjunction is no longer pending 

                        state = stateStack.Pop();
                        if (state != ParseState.AtomDone)
                            throw new ArgumentException();
                    }
                }
                else stateStack.Push(ParseState.AtomDone); 
            }

The one other bit of magic was the EnumUtil.Parse. That allows me to parse things like "<" into an enum value. Suppose you define your enums like this:

internal enum Operator
{
    [Description(">")]   GreaterThan,
    [Description(">=")]  GreaterThanOrEqualTo,
    [Description("<")]   LesserThan,
    [Description("<=")]  LesserThanOrEqualTo,
    [Description("=")]   EqualTo,
    [Description("!=")]  NotEqualTo
}

Normally Enum.Parse looks for the symbolic name of the enum value, and < is not a valid symbolic name. EnumUtil.Parse() looks for the thing in the description. The code looks like this:

internal sealed class EnumUtil
{
    /// <summary>
    /// Returns the value of the DescriptionAttribute if the specified Enum value has one.
    /// If not, returns the ToString() representation of the Enum value.
    /// </summary>
    /// <param name="value">The Enum to get the description for</param>
    /// <returns></returns>
    internal static string GetDescription(System.Enum value)
    {
        FieldInfo fi = value.GetType().GetField(value.ToString());
        var attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
        if (attributes.Length > 0)
            return attributes[0].Description;
        else
            return value.ToString();
    }

    /// <summary>
    /// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
    /// Note: Utilised the DescriptionAttribute for values that use it.
    /// </summary>
    /// <param name="enumType">The System.Type of the enumeration.</param>
    /// <param name="value">A string containing the name or value to convert.</param>
    /// <returns></returns>
    internal static object Parse(Type enumType, string value)
    {
        return Parse(enumType, value, false);
    }

    /// <summary>
    /// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
    /// A parameter specified whether the operation is case-sensitive.
    /// Note: Utilised the DescriptionAttribute for values that use it.
    /// </summary>
    /// <param name="enumType">The System.Type of the enumeration.</param>
    /// <param name="value">A string containing the name or value to convert.</param>
    /// <param name="ignoreCase">Whether the operation is case-sensitive or not.</param>
    /// <returns></returns>
    internal static object Parse(Type enumType, string stringValue, bool ignoreCase)
    {
        if (ignoreCase)
            stringValue = stringValue.ToLower();

        foreach (System.Enum enumVal in System.Enum.GetValues(enumType))
        {
            string description = GetDescription(enumVal);
            if (ignoreCase)
                description = description.ToLower();
            if (description == stringValue)
                return enumVal;
        }

        return System.Enum.Parse(enumType, stringValue, ignoreCase);
    }

}

I got that EnumUtil.Parse() thing from somewhere else. Maybe here?

回复收藏 0 原文

一人独醉 2024-07-21 12:12:09

一个小的递归下降解析器非常适合这个。您可能甚至不必构建解析树 - 您可以在解析时进行评估。

 /* here's a teeny one in C++ */
void ScanWhite(const char* &p){
  while (*p==' ') p++;
}

bool ParseNum(const char* &p, double &v){
  ScanWhite(p);
  if (!DIGIT(*p)) return false;
  const char* p0 = p;
  while(DIGIT(*p)) p++;
  if (*p == '.'){
    p++;
    while(DIGIT(*p)) p++;
  }
  v = /* value of characters p0 up to p */;
  return true;
}

bool ParseId(const char* &p, double &v){
  ScanWhite(p);
  if (ALPHA(p[0]) && DIGIT(p[1])){
    v = /* value of cell whose name is p[0], p[1] */;
    p += 2;
    return true;
  }
  return false;
}

bool ParseChar(const char* &p, char c){
  ScanWhite(p);
  if (*p != c) return false;
  p++;
  return true;
}

void ParseExpr(const char* &p, double &v); /* forward declaration */

void ParsePrimitive(const char* &p, double &v){
  if (ParseNum(p, v));
  else if (ParseId(p, v));
  else if (ParseChar(p, '(')){
    ParseExpr(p, v);
    if (!ParseChar(p, ')'){/* throw syntax error */}
  }
  else {/* throw syntax error */}
}
#define PARSE_HIGHER ParsePrimitive

void ParseUnary(const char* &p, double &v){
  if (ParseChar(p, '-')){
    ParseUnary(p, v);
    v = -v;
  }
  else {
    PARSE_HIGHER(p, v);
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseUnary

void ParseProduct(const char* &p, double &v){
  double v2;
  PARSE_HIGHER(p, v);
  while(true){
    if (ParseChar(p, '*')){
      PARSE_HIGHER(p, v2);
      v *= v2;
    }
    else if (ParseChar(p, '/')){
      PARSE_HIGHER(p, v2);
      v /= v2;
    }
    else break;
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseProduct

void ParseSum(const char* &p, double &v){
  double v2;
  PARSE_HIGHER(p, v);
  while(true){
    if (ParseChar(p, '+')){
      PARSE_HIGHER(p, v2);
      v += v2;
    }
    else if (ParseChar(p, '-')){
      PARSE_HIGHER(p, v2);
      v -= v2;
    }
    else break;
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseSum

void ParseExpr(const char* &p, double &v){
  PARSE_HIGHER(p, v);
}

double ParseTopLevel(const char* buf){
  const char* p = buf;
  double v;
  ParseExpr(p, v);
  return v;
}

现在，如果您只需调用 ParseTop，它就会为您计算表达式的值。

使用 PARSE_HIGHER 宏的原因是为了更容易在中间优先级添加运算符。

执行“if”语句有点复杂。每个解析例程都需要一个额外的“启用”参数，因此除非启用，否则它不会进行计算。然后解析“if”一词，解析测试表达式，然后解析两个结果表达式，禁用非活动表达式。

A little recursive-descent parser is perfect for this. You probably don't even have to build a parse tree - you can do the evaluation as you parse.

 /* here's a teeny one in C++ */
void ScanWhite(const char* &p){
  while (*p==' ') p++;
}

bool ParseNum(const char* &p, double &v){
  ScanWhite(p);
  if (!DIGIT(*p)) return false;
  const char* p0 = p;
  while(DIGIT(*p)) p++;
  if (*p == '.'){
    p++;
    while(DIGIT(*p)) p++;
  }
  v = /* value of characters p0 up to p */;
  return true;
}

bool ParseId(const char* &p, double &v){
  ScanWhite(p);
  if (ALPHA(p[0]) && DIGIT(p[1])){
    v = /* value of cell whose name is p[0], p[1] */;
    p += 2;
    return true;
  }
  return false;
}

bool ParseChar(const char* &p, char c){
  ScanWhite(p);
  if (*p != c) return false;
  p++;
  return true;
}

void ParseExpr(const char* &p, double &v); /* forward declaration */

void ParsePrimitive(const char* &p, double &v){
  if (ParseNum(p, v));
  else if (ParseId(p, v));
  else if (ParseChar(p, '(')){
    ParseExpr(p, v);
    if (!ParseChar(p, ')'){/* throw syntax error */}
  }
  else {/* throw syntax error */}
}
#define PARSE_HIGHER ParsePrimitive

void ParseUnary(const char* &p, double &v){
  if (ParseChar(p, '-')){
    ParseUnary(p, v);
    v = -v;
  }
  else {
    PARSE_HIGHER(p, v);
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseUnary

void ParseProduct(const char* &p, double &v){
  double v2;
  PARSE_HIGHER(p, v);
  while(true){
    if (ParseChar(p, '*')){
      PARSE_HIGHER(p, v2);
      v *= v2;
    }
    else if (ParseChar(p, '/')){
      PARSE_HIGHER(p, v2);
      v /= v2;
    }
    else break;
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseProduct

void ParseSum(const char* &p, double &v){
  double v2;
  PARSE_HIGHER(p, v);
  while(true){
    if (ParseChar(p, '+')){
      PARSE_HIGHER(p, v2);
      v += v2;
    }
    else if (ParseChar(p, '-')){
      PARSE_HIGHER(p, v2);
      v -= v2;
    }
    else break;
  }
}
#undef  PARSE_HIGHER
#define PARSE_HIGHER ParseSum

void ParseExpr(const char* &p, double &v){
  PARSE_HIGHER(p, v);
}

double ParseTopLevel(const char* buf){
  const char* p = buf;
  double v;
  ParseExpr(p, v);
  return v;
}

Now if you just call ParseTop, it will calculate the value of an expression for you.

The reason for the PARSE_HIGHER macro is to make it easier to add operators at intermediate levels of precedence.

To do the "if" statement is a little more involved. Each parse routine needs an additional "enable" argument, so it does no calculation unless it's enabled. Then you parse the word "if", parse the test expression, and then parse the two result expressions, with the inactive one disabled.

回复收藏 0 原文