String literal recognition problem

Posted on 2024-10-12 01:17:30

I'm trying to recognize a string literal by reading the input one character at a time.
Here is a skeleton of my scanner:

using System;
using System.Linq;   // ElementAtOrDefault
using System.Text;   // StringBuilder

public sealed class Scanner
{        
    // some class inner implementations        
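    /// <summary>
    /// Scans one source line and enqueues the tokens found in it.
    /// </summary>
    /// <param name="Line">Text of the line to scan.</param>
    /// <param name="LineNumber">Line number, used in error messages.</param>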
    public void Run(String Line, Int32 LineNumber)
    {
        var ChPosition = default(Int32);
        var ChCurrent  = default(Char);
        var Value      = new StringBuilder();

        while (default(Char) != Line.ElementAtOrDefault<Char>(ChPosition))
        {
            ChCurrent = Line.ElementAtOrDefault<Char>(ChPosition);

            #region [Whitespace]
            if (Char.IsWhiteSpace(ChCurrent))
            {
                ChPosition++;
            }
            #endregion
            else
            {
                switch (ChCurrent)
                {                        
                    #region [String Literal (")]
                    case '"':
                        {
                            // skipping " sign, include only string inner value
                            ChCurrent = Line.ElementAtOrDefault<Char>(++ChPosition);

                            // ...? Problematic place!!!

                            this.Tokens.Enqueue(new SharedEntities.Token
                            {
                                Class = SharedEntities.Token.TokenClass.StringLiteral,
                                Value = Value.ToString()
                            }
                            );
                            Value.Clear();
                            ChPosition++;
                            break;
                        }
                    #endregion                        
                    default:
                        {
                            throw new ScanningException(
                            "<syntax_error#" + ChCurrent.ToString() + ">\n"
                            + "Unsupported character appeared at: {ln: "
                            + LineNumber.ToString()
                            + "; pos: "
                            + (ChPosition + 1).ToString()
                            + "}"
                            );
                        }
                } // [switch(ChCurrent)]                   
            } // [if(Char.IsWhiteSpace(ChCurrent))...else]
        } // [while(default(Char) != Line.ElementAtOrDefault<Char>(ChPosition))]
    } // [public void Run(String Line, Int32 LineNumber)]
} // [public sealed class Scanner]

My goal is to parse Pascal-like string literals: anything may appear between the enclosing double quotes except a lone ", which is only allowed as the pair "".

Comments (1)

恬淡成诗 2024-10-19 01:17:30

First, you are obviously using some kind of parsing library. You would have a better chance of getting answers if you reduced your code to something self-contained, as I did here, so that anybody can copy, paste, and run it.

The answer is simple: your string-literal region does not parse all of its input. Here is your code, modified to run without any additional library:

using System;
using System.Text;

public class Test
{
    static char ElementAtOrDefault(string value, int position)
    {
        return position >= value.Length ? default(char) : value[position];
    }
    static string parseStringLiteral(string value, ref int ChPosition)
    {
        StringBuilder Value = new StringBuilder();
        char ChCurrent = ElementAtOrDefault(value, ++ChPosition);
        while (ChCurrent != '"')
        {
            Value.Append(ChCurrent);
            ChCurrent = ElementAtOrDefault(value, ++ChPosition);
            if (ChCurrent == '"')
            {
                // "" sequence only acceptable
                if (ElementAtOrDefault(value, ChPosition + 1) == '"')
                {
                    Value.Append(ChCurrent);
                    // skip 2nd double quote
                    ChPosition++;
                    // move position next
                    ChCurrent = ElementAtOrDefault(value, ++ChPosition);
                }
            }
            else if (default(Char) == ChCurrent)
            {
                // message: unterminated string
                throw new Exception("ScanningException");
            }
        }
        ChPosition++;
        return Value.ToString();
    }

    public static void test(string literal)
    {
        Console.WriteLine("testing literal with " + literal.Length + 
            " chars:\n" + literal);
        try
        {
            int pos = 0;
            string res = parseStringLiteral(literal, ref pos);
            Console.WriteLine("Parsed " + res.Length + " chars:\n" + res);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
        Console.WriteLine();
    }

    public static int Main(string[] args)
    {
        test(@"""Hello Language Design""");
        test(@"""Is there any problems with the """"strings""""?""");
        test(@"""v#:';?325;.<>,|+_)""(*&^%$#@![]{}\|-_=""");
        return 0;
    }
}

Running this program produces the following output:

testing literal with 23 chars:
"Hello Language Design"
Parsed 21 chars:
Hello Language Design

testing literal with 45 chars:
"Is there any problems with the ""strings""?"
Parsed 41 chars:
Is there any problems with the "strings"?

testing literal with 39 chars:
"v#:';?325;.<>,|+_)"(*&^%$#@![]{}\|-_="
Parsed 18 chars:
v#:';?325;.<>,|+_)

So it works for your test cases, but the algorithm is not correct. Try running:

// the literal """" (one escaped pair) should produce ", but it does not
test(@"""""""""");

And you will incorrectly get:

testing literal with 4 chars:
""""
Parsed 0 chars:
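
The ten quote characters in that test call can be hard to read, because a C# verbatim string uses the same "" escape as the literal being parsed. The short sketch below (the class and field names are only illustrative) shows that the verbatim form and the backslash-escaped form denote the same four-character input:

```csharp
using System;

public static class VerbatimQuotes
{
    // Verbatim string: opening quote, four "" pairs, closing quote.
    public static readonly string Verbatim = @"""""""""";

    // The same four quote characters written with backslash escapes.
    public static readonly string Regular = "\"\"\"\"";
}
```

Both fields hold the text """", which is why the test reports a literal of 4 chars.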

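The problem is that when the while condition sees a " character, it does not check whether the next character is also a ". When the literal begins with an escaped pair, for example, the first quote of the pair is taken for the closing quote: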

while (ChCurrent != '"') //bug
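
The missing check is a one-character lookahead. As a minimal sketch, assuming a hypothetical helper (`QuoteLookahead`, `Classify`, and `QuoteKind` are illustrative names, not part of the original code), the decision that has to be made at every quote looks like this:

```csharp
using System;

public static class QuoteLookahead
{
    public enum QuoteKind { EscapedPair, Terminator }

    // Classifies the quote at text[index] by peeking one character ahead.
    public static QuoteKind Classify(string text, int index)
    {
        if (text[index] != '"')
            throw new ArgumentException("text[index] must be a double quote", nameof(index));

        // A quote followed directly by another quote is an escaped pair;
        // a quote followed by anything else (or by the end of input) closes the literal.
        bool nextIsQuote = index + 1 < text.Length && text[index + 1] == '"';
        return nextIsQuote ? QuoteKind.EscapedPair : QuoteKind.Terminator;
    }
}
```

For the input """" the quote at index 1 is classified as an escaped pair and the quote at index 3 as the terminator. The buggy loop condition treats the quote at index 1 as the terminator because it never looks ahead.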

Of course, I wrote a corrected version for you :-)
Here it is (it keeps your style; it is just an edited version of your code):

static string parseStringLiteral(string value, ref int ChPosition)
{
    StringBuilder Value = new StringBuilder();
    char ChCurrent = ElementAtOrDefault(value, ++ChPosition);
    bool goon = true;
    while (goon)
    {
        if (ChCurrent == '"')
        {
            // "" sequence only acceptable
            if (ElementAtOrDefault(value, ChPosition + 1) == '"')
            {
                Value.Append(ChCurrent);
                // skip 2nd double quote
                ChPosition++;
                // move position next
                ChCurrent = ElementAtOrDefault(value, ++ChPosition);
            }
            else goon = false; //break;
        }
        else if (default(Char) == ChCurrent)
        {
            // message: unterminated string
            throw new Exception("ScanningException");
        }
        else
        {
            Value.Append(ChCurrent);
            ChCurrent = ElementAtOrDefault(value, ++ChPosition);
        }
    }
    ChPosition++;
    return Value.ToString();
}
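
For comparison, the same rule can be written more compactly. The following is only a sketch under a few assumptions: `PascalString` and `ReadLiteral` are hypothetical names, the position passed in points at the opening quote, and the end of input is detected with a bounds check instead of the `default(char)` sentinel used above.

```csharp
using System;
using System.Text;

public static class PascalString
{
    // Reads a literal whose opening quote is at text[pos].
    // On return, pos is the index just past the closing quote.
    public static string ReadLiteral(string text, ref int pos)
    {
        if (pos >= text.Length || text[pos] != '"')
            throw new ArgumentException("expected an opening quote at pos", nameof(pos));

        var value = new StringBuilder();
        int i = pos + 1;                      // first character after the opening quote
        while (i < text.Length)
        {
            char c = text[i];
            if (c != '"')
            {
                value.Append(c);              // ordinary character
                i++;
            }
            else if (i + 1 < text.Length && text[i + 1] == '"')
            {
                value.Append('"');            // "" stands for one literal quote
                i += 2;
            }
            else
            {
                pos = i + 1;                  // a lone quote closes the literal
                return value.ToString();
            }
        }
        throw new FormatException("unterminated string literal");
    }
}
```

With this sketch the input """" yields a single ", the input "" yields an empty string, and an input with no closing quote throws. In the scanner from the question, the `case '"':` branch could presumably call `PascalString.ReadLiteral(Line, ref ChPosition)` and enqueue the result; because the position is already left just past the closing quote, the trailing `ChPosition++` in that branch would then have to be dropped.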

Happy coding :-)
