C# fileparse

Best way to parse a text file in C#?

Posted 2024-07-04 06:16:13

I want to parse a config file sorta thing, like so:

[KEY:Value]     
    [SUBKEY:SubValue]

I started with a StreamReader, converting each line into a character array, when I figured there had to be a better way. So I ask you, humble reader, to help me.

One restriction is that it has to work in a Linux/Mono environment (Mono 1.2.6, to be exact). I don't have the latest Mono 2.0 release, so please restrict language features to C# 2.0 or C# 1.0.

梦里°也失望 2024-07-11 06:16:13

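It seems to me that you would be better off using an XML-based config file, since there are already .NET classes that can read and store the information for you relatively easily. Is there a reason why that isn't possible?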

@Bernard: ~~True, hand-editing XML is tedious, but the structure you present already looks very similar to XML.~~

Then yes, there's a good method there.
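For reference, a minimal sketch of reading such a file with the built-in System.Xml.XmlDocument class, using only C# 2.0 features. The element and attribute names (config, entry, key, value) are a hypothetical mapping of the [KEY:Value] structure, not something from the question, and XmlConfigReader/Describe are illustrative names:

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical file layout assumed by this sketch:
// <config>
//   <entry key="KEY" value="Value">
//     <entry key="SUBKEY" value="SubValue" />
//   </entry>
// </config>
public static class XmlConfigReader
{
    // Loads the XML and returns one "key=value" line per <entry>,
    // indented by two spaces per nesting level.
    public static string Describe(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException (with line/position) on malformed input
        StringBuilder sb = new StringBuilder();
        Walk(doc.DocumentElement, 0, sb);
        return sb.ToString();
    }

    static void Walk(XmlElement parent, int depth, StringBuilder sb)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            // Skip whitespace, comments and any element that is not an <entry>.
            XmlElement entry = child as XmlElement;
            if (entry == null || entry.Name != "entry") continue;

            sb.Append(new string(' ', depth * 2));
            sb.Append(entry.GetAttribute("key")).Append('=');
            sb.Append(entry.GetAttribute("value")).Append('\n');

            // Nested <entry> elements are the sub-items.
            Walk(entry, depth + 1, sb);
        }
    }
}
```

XmlDocument also has a Save method, so the same class covers writing the file back out; nesting comes for free from the element tree.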

回心转意 2024-07-11 06:16:13

You can also use a stack with a push/pop algorithm. This one matches opening and closing tags:

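// Needs: using System.Collections;  (ArrayList, Stack)
// getTags() is not shown; it is assumed to return the tags in
// document order, e.g. "<a>", "<b>", "</b>", "</a>".
public string check()
{
    ArrayList tags = getTags();
    Stack stack = new Stack(tags.Count);

    foreach (string tag in tags)
    {
        // .NET 2.0 has no String.Contains(char), so use IndexOf.
        if (tag.IndexOf('/') < 0)
        {
            // Opening tag: keep it until its closing tag shows up.
            stack.Push(tag);
        }
        else
        {
            if (stack.Count == 0)
            {
                return "Error: no matching opening tag";
            }
            // Drop "<" from "<a>" and "</" from "</a>"; both leave "a>".
            string startTag = ((string)stack.Pop()).Substring(1);
            string endTag = tag.Substring(2);
            if (!startTag.Equals(endTag))
            {
                return "Error: no matching end tag";
            }
        }
    }

    // Anything left on the stack was never closed.
    if (stack.Count > 0)
    {
        return "Error: no matching end tag";
    }
    return "Xml is valid";
}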

You can probably adapt it to read the contents of your file. Regular expressions are also a good idea.
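One way the stack idea might be adapted to the format in the question: read one [KEY:Value] entry per line and keep a stack of open ancestors, popping until the top entry is less indented than the current one, which then becomes its parent. This is a sketch, not the answerer's code; ConfigNode and IndentTreeBuilder are hypothetical names, and it assumes indentation is counted in characters, so a tab counts the same as a space:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class ConfigNode
{
    public string Key;
    public string Value;
    public List<ConfigNode> Children = new List<ConfigNode>();
}

public static class IndentTreeBuilder
{
    // Builds a tree from lines like "[KEY:Value]"; deeper indentation means "child of".
    public static List<ConfigNode> Parse(TextReader reader)
    {
        List<ConfigNode> roots = new List<ConfigNode>();
        // Parallel stacks: the currently open ancestors and their indentation widths.
        Stack<ConfigNode> nodes = new Stack<ConfigNode>();
        Stack<int> indents = new Stack<int>();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0) continue; // skip blank lines

            int indent = line.Length - line.TrimStart().Length;
            if (trimmed.Length < 2 || trimmed[0] != '[' || trimmed[trimmed.Length - 1] != ']')
                throw new FormatException("Expected [KEY:Value] but got: " + trimmed);

            string body = trimmed.Substring(1, trimmed.Length - 2);
            int colon = body.IndexOf(':');
            if (colon < 0)
                throw new FormatException("Missing ':' in: " + trimmed);

            ConfigNode node = new ConfigNode();
            node.Key = body.Substring(0, colon);
            node.Value = body.Substring(colon + 1);

            // Pop siblings and deeper entries; what remains on top is the parent.
            while (indents.Count > 0 && indents.Peek() >= indent)
            {
                indents.Pop();
                nodes.Pop();
            }
            if (nodes.Count == 0) roots.Add(node);
            else nodes.Peek().Children.Add(node);

            nodes.Push(node);
            indents.Push(indent);
        }
        return roots;
    }
}
```

Mixing tabs and spaces would break the "more indented means child" rule here, so a real format would need to pick one.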

﹏雨一样淡蓝的深情 2024-07-11 06:16:13

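Using a library is almost always better than rolling your own. Here's a quick list of "Oh, I'll never need that / I didn't think of that" points that will end up biting you later: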

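- Escape characters. What if you want a : in a key or a ] in a value?
- Escaping the escape character.
- Unicode.
- Mixed tabs and spaces (see the problems with Python's whitespace-sensitive syntax).
- Handling different line-ending formats.
- Reporting syntax errors.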
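To make the first two points concrete, here is a minimal sketch assuming a backslash escape convention (the format in the question defines none): a backslash makes the next character literal, so \: and \] can appear inside keys and values, and \\ stands for a single backslash. EscapedField and ReadUntil are hypothetical names:

```csharp
using System;
using System.Text;

public static class EscapedField
{
    // Reads from text[pos] up to the first unescaped terminator, consumes the
    // terminator, and returns the unescaped field. pos is advanced past it.
    public static string ReadUntil(string text, ref int pos, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        while (pos < text.Length)
        {
            char c = text[pos++];
            if (c == '\\')
            {
                // Escape: whatever follows is taken literally, including '\\' itself.
                if (pos == text.Length)
                    throw new FormatException("Dangling escape at end of input");
                sb.Append(text[pos++]);
            }
            else if (c == terminator)
            {
                return sb.ToString();
            }
            else
            {
                sb.Append(c);
            }
        }
        throw new FormatException("Missing '" + terminator + "'");
    }
}
```

For example, after skipping the opening [ of [a\:b:x\]y\\], reading up to : yields a:b and reading up to ] yields x]y\.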

Like others have suggested, YAML looks like your best bet.

池木 2024-07-11 06:16:13

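Another YAML library for .NET is under development. It currently supports reading YAML streams and has been tested on Windows and Mono. Write support is currently being implemented.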

残月升风 2024-07-11 06:16:13

I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:

@"(?<level>\s) | " +
@"(?<term>[^:\s]) | " +
@"(?<separator>:)"

The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.

Protip: For an LL(1) parser (read: easy), tokens cannot share a prefix: if you have abc as a token, you cannot also have ace as a token.

Note: The article is missing the | characters in its examples; just add them in.
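A sketch of that approach in C# 2.0, with a few assumptions layered on the pattern above: term gets a + so a whole word becomes one token (the single-character [^:\s] would split Value into five tokens), [ and ] get their own token kinds, line breaks are a separate newline kind so level covers only runs of spaces and tabs, and RegexOptions.IgnorePatternWhitespace is set so the spaces around each | are ignored. Token and ConfigTokenizer are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Token
{
    public string Kind; // name of the group that matched
    public string Text; // the matched text
    public Token(string kind, string text) { Kind = kind; Text = text; }
}

public static class ConfigTokenizer
{
    // Alternatives are tried left to right at each position.
    static readonly Regex Pattern = new Regex(
        @"(?<level>[^\S\r\n]+) | " +
        @"(?<newline>\r\n|\r|\n) | " +
        @"(?<open>\[) | " +
        @"(?<close>\]) | " +
        @"(?<separator>:) | " +
        @"(?<term>[^:\[\]\s]+)",
        RegexOptions.IgnorePatternWhitespace);

    static readonly string[] Kinds = { "level", "newline", "open", "close", "separator", "term" };

    public static List<Token> Tokenize(string input)
    {
        List<Token> tokens = new List<Token>();
        int expected = 0;
        foreach (Match m in Pattern.Matches(input))
        {
            // Matches() silently skips text no alternative can match; treat a gap as an error.
            if (m.Index != expected)
                throw new FormatException("Unexpected character at offset " + expected);
            foreach (string kind in Kinds)
            {
                if (m.Groups[kind].Success)
                {
                    tokens.Add(new Token(kind, m.Value));
                    break;
                }
            }
            expected = m.Index + m.Length;
        }
        if (expected != input.Length)
            throw new FormatException("Unexpected character at offset " + expected);
        return tokens;
    }
}
```

For the sample in the question, the parser then eats tokens in a fixed order per line: an optional level (the depth), then open, term, separator, term, close.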

赴月观长安 2024-07-11 06:16:13

Regardless of the persisted format, using a Regex would be the fastest way of parsing.
In Ruby it'd probably be a few lines of code.

\[KEY:(.*)\] 
\[SUBKEY:(.*)\]

These two would get you the Value and SubValue in the first group. Check out MSDN on how to match a regex against a string.
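The two patterns above match the literal words KEY and SUBKEY. Assuming those are placeholders for arbitrary names, a sketch using System.Text.RegularExpressions that captures the indentation, key and value of any entry line might look like this (LineMatcher and TryParse are hypothetical names). It uses [^\]]* rather than .* so the value stops at the first ] instead of the last one on the line:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineMatcher
{
    // Group 1: leading whitespace (indentation), group 2: key, group 3: value.
    // Trailing whitespace after the closing ] is allowed.
    static readonly Regex Entry = new Regex(@"^(\s*)\[([^:\]]+):([^\]]*)\]\s*$");

    // Returns true and fills the out parameters when the line is a [KEY:Value] entry.
    public static bool TryParse(string line, out int indent, out string key, out string value)
    {
        Match m = Entry.Match(line);
        if (!m.Success)
        {
            indent = 0;
            key = null;
            value = null;
            return false;
        }
        indent = m.Groups[1].Length;
        key = m.Groups[2].Value;
        value = m.Groups[3].Value;
        return true;
    }
}
```

The caller still has to turn the indentation into parent/child links and produce useful error messages for lines that don't match, which is where the next answer says the regex approach gets longer.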

This is something everyone should have in their kitty. Pre-Regex days would seem like the Ice Age.

公布 2024-07-11 06:16:13

@Gishu

Actually, once I'd accounted for escaped characters, my regex ran slightly slower than my hand-written top-down recursive parser, and that's without the nesting (linking sub-items to their parents) and error reporting that the hand-written parser had.

The regex was slightly faster to write (though I do have a bit of experience with hand-written parsers), but that's without good error reporting. Once you add that, it becomes harder and longer to do.

I also find the intention of the hand-written parser easier to understand. For instance, here is a snippet of the code:

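private static Node ParseNode(TextReader reader)
{
    Node node = new Node();
    // Leading whitespace gives the nesting depth; the rest of the parser
    // (not shown) presumably uses it to link sub-items to their parents.
    int indentation = ParseWhitespace(reader);
    Expect(reader, '[');
    node.Key = ParseTerminatedString(reader, ':');
    node.Value = ParseTerminatedString(reader, ']');
    return node;
}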
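The snippet relies on helpers it doesn't show. Purely as an illustration, hypothetical versions of ParseWhitespace, Expect and ParseTerminatedString over a TextReader might look like this. The Indentation field on Node is also an assumption, added so a caller could link sub-items to their parents, and escape handling is left out to keep it short:

```csharp
using System;
using System.IO;
using System.Text;

public class Node
{
    public string Key;
    public string Value;
    public int Indentation; // hypothetical: not in the original snippet
}

public static class HandParser
{
    // Consumes spaces/tabs at the current position and returns how many there were.
    static int ParseWhitespace(TextReader reader)
    {
        int count = 0;
        while (reader.Peek() == ' ' || reader.Peek() == '\t')
        {
            reader.Read();
            count++;
        }
        return count;
    }

    // Consumes one character and reports exactly what was expected if it doesn't match.
    static void Expect(TextReader reader, char expected)
    {
        int c = reader.Read();
        if (c != expected)
            throw new FormatException("Expected '" + expected + "' but found " +
                (c == -1 ? "end of input" : "'" + (char)c + "'"));
    }

    // Reads up to and including the terminator; returns the text before it.
    static string ParseTerminatedString(TextReader reader, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != terminator)
        {
            if (c == -1 || c == '\n' || c == '\r')
                throw new FormatException("Missing '" + terminator + "' before end of line");
            sb.Append((char)c);
        }
        return sb.ToString();
    }

    public static Node ParseNode(TextReader reader)
    {
        Node node = new Node();
        node.Indentation = ParseWhitespace(reader);
        Expect(reader, '[');
        node.Key = ParseTerminatedString(reader, ':');
        node.Value = ParseTerminatedString(reader, ']');
        return node;
    }
}
```

Because each helper knows what it was looking for, a bad line such as [KEY Value] fails with a specific message ("Missing ':' before end of line") rather than a generic no-match.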

抽个烟儿 2024-07-11 06:16:13

> I considered it, but I'm not going to use XML. I am going to be writing this stuff by hand, and hand-editing XML makes my brain hurt. :')

Have you looked at YAML?

You get the benefits of XML without all the pain and suffering. It's used extensively in the Ruby community for things like config files, pre-prepared database data, etc.

Here's an example:

customer:
  name: Orion
  age: 26
  addresses:
    - type: Work
      number: 12
      street: Bob Street
    - type: Home
      number: 15
      street: Secret Road

There appears to be a C# library here, which I haven't used personally, but YAML is pretty simple, so "how hard can it be?" :-)

I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs).
