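Best method of text file parsing in C#?
I want to parse a config file of sorts, like so:
[KEY:Value]
[SUBKEY:SubValue]
I started with a StreamReader, converting lines into character arrays, when I figured there has to be a better way. So I ask you, humble reader, to help me.
One restriction is that it has to work in a Linux/Mono environment (1.2.6 to be exact). I don't have the latest 2.0 release (of Mono), so please restrict language features to C# 2.0 or C# 1.0.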
It looks to me like you would be better off using an XML-based config file, as there are already .NET classes which can read and store the information for you relatively easily. Is there a reason that this is not possible?
@Bernard:
It is true that hand-editing XML is tedious, but the structure that you are presenting already looks very similar to XML. Then yes, has a good method there.
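As a rough illustration of the XML route, here is a minimal sketch using XmlDocument from System.Xml, which has been part of .NET since 1.0. The XML layout, the value attribute, and the names XmlConfigSketch, Sample and ReadValue are my own assumptions for illustration, not anything from the question.

```csharp
using System;
using System.Xml;

public static class XmlConfigSketch
{
    // Hypothetical XML equivalent of the [KEY:Value] / [SUBKEY:SubValue] layout.
    public const string Sample =
        "<config>" +
        "<key value=\"Value\">" +
        "<subkey value=\"SubValue\" />" +
        "</key>" +
        "</config>";

    // Returns the "value" attribute of the first node matching the XPath,
    // or null when the node or the attribute is missing.
    public static string ReadValue(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null || node.Attributes == null) return null;
        XmlAttribute attr = node.Attributes["value"];
        return attr == null ? null : attr.Value;
    }
}
```

For a file on disk, doc.Load(path) would replace doc.LoadXml(xml); nesting comes for free because sub-keys are simply child elements.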
You can also use a stack with a push/pop algorithm. This one matches open/closing tags.
You can probably adapt it to read the contents of your file. Regular expressions are also a good idea.
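To make the push/pop idea concrete, here is a small sketch that checks whether square brackets are balanced. It assumes entries may nest inside each other, as in [KEY:Value [SUBKEY:SubValue]], which is my guess at the layout rather than something the question states; BracketMatcher and FindMismatch are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class BracketMatcher
{
    // Returns -1 when every '[' has a matching ']'. Otherwise returns the
    // index of the first unmatched ']' or, if the text ends with brackets
    // still open, the index of the innermost '[' left open.
    public static int FindMismatch(string text)
    {
        Stack<int> open = new Stack<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '[')
            {
                open.Push(i);               // remember where this bracket opened
            }
            else if (text[i] == ']')
            {
                if (open.Count == 0) return i;  // closing bracket with nothing open
                open.Pop();
            }
        }
        return open.Count == 0 ? -1 : open.Peek();
    }
}
```

The same loop can be extended to build a tree: push a node on '[', attach it to the node below it on the stack when the matching ']' is popped.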
Using a library is almost always preferable to rolling your own. Here's a quick list of "Oh I'll never need that/I didn't think about that" points which will end up coming to bite you later down the line:
Like others have suggested, YAML looks like your best bet.
There is another YAML library for .NET which is under development. Right now it supports reading YAML streams and has been tested on Windows and Mono. Write support is currently being implemented.
I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:
The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.
Protip: For an LL(1) parser (read: easy), tokens cannot share a prefix. If you have abc as a token, you cannot have ace as a token.
Note: The article's missing the | characters in its examples, just throw them in.
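For what it's worth, here is a minimal tokenizer sketch for the [KEY:Value] format. The token set (open bracket, close bracket, colon, free text) is my own assumption and not the definitions from the linked article; the punctuation tokens are single characters, so none of them shares a prefix with another. TokenKind, Token and Tokenizer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Open, Close, Colon, Text }

public struct Token
{
    public TokenKind Kind;
    public string Text;
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class Tokenizer
{
    // Splits the input into '[' , ']' , ':' and text tokens.
    // Whitespace between tokens is skipped; text runs until the next
    // punctuation character and has trailing whitespace trimmed.
    public static List<Token> Tokenize(string input)
    {
        List<Token> tokens = new List<Token>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '[') { tokens.Add(new Token(TokenKind.Open, "[")); i++; }
            else if (c == ']') { tokens.Add(new Token(TokenKind.Close, "]")); i++; }
            else if (c == ':') { tokens.Add(new Token(TokenKind.Colon, ":")); i++; }
            else if (char.IsWhiteSpace(c)) { i++; }
            else
            {
                int start = i;
                while (i < input.Length && "[]:".IndexOf(input[i]) < 0) i++;
                tokens.Add(new Token(TokenKind.Text, input.Substring(start, i - start).Trim()));
            }
        }
        return tokens;
    }
}
```

From the resulting list you can consume tokens one at a time, deciding what to do next by looking only at the current token's Kind.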
Regardless of the persisted format, using a Regex would be the fastest way of parsing.
In Ruby it'd probably be a few lines of code.
These two would get you the Value and SubValue in the first group. Check out MSDN on how to match a regex against a string.
This is something everyone should have in their kitty. Pre-Regex days would seem like the Ice Age.
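A sketch of how the regex approach might look in C#, using System.Text.RegularExpressions (available in C# 1.0 and 2.0). The two patterns below are my own guesses for the [KEY:Value] and [SUBKEY:SubValue] lines, not the patterns from the answer; RegexConfigSketch, ReadKey and ReadSubKey are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexConfigSketch
{
    // Hypothetical patterns: group 1 captures everything between ':' and ']'.
    static readonly Regex KeyPattern = new Regex(@"\[KEY:([^\]]*)\]");
    static readonly Regex SubKeyPattern = new Regex(@"\[SUBKEY:([^\]]*)\]");

    // Returns the captured value, or null when the line does not match.
    public static string ReadKey(string line)
    {
        Match m = KeyPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string ReadSubKey(string line)
    {
        Match m = SubKeyPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Note that these patterns do not handle escaped ']' characters inside a value; that would need a more involved expression.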
@Gishu
Actually, once I'd accommodated escaped characters, my regex ran slightly slower than my hand-written top-down recursive parser, and that's without the nesting (linking sub-items to their parents) and error reporting the hand-written parser had.
The regex was slightly faster to write (though I do have a bit of experience with hand parsers), but that's without good error reporting. Once you add that, it becomes slightly harder and longer to do.
I also find the intention of the hand-written parser easier to understand. For instance, here is a snippet of the code:
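To give a flavour of the hand-written style (this is not the poster's code, just a sketch under my own assumptions), the following parses a single [KEY:Value] entry, treats a backslash as escaping the next character, and reports the column where parsing failed. Nesting is left out. EntryParser, Expect and ReadUntil are hypothetical names.

```csharp
using System;
using System.Text;

public static class EntryParser
{
    // Parses one "[KEY:Value]" entry. Throws FormatException naming the
    // 1-based column at which the expected character was not found.
    public static void Parse(string line, out string key, out string value)
    {
        int pos = 0;
        Expect(line, ref pos, '[');
        key = ReadUntil(line, ref pos, ':');
        Expect(line, ref pos, ':');
        value = ReadUntil(line, ref pos, ']');
        Expect(line, ref pos, ']');
    }

    static void Expect(string line, ref int pos, char expected)
    {
        if (pos >= line.Length || line[pos] != expected)
            throw new FormatException("Expected '" + expected + "' at column " + (pos + 1));
        pos++;
    }

    // Collects characters up to (not including) the stop character.
    // A backslash is dropped and the character after it is taken literally.
    static string ReadUntil(string line, ref int pos, char stop)
    {
        StringBuilder text = new StringBuilder();
        while (pos < line.Length && line[pos] != stop)
        {
            if (line[pos] == '\\' && pos + 1 < line.Length) pos++;
            text.Append(line[pos]);
            pos++;
        }
        return text.ToString();
    }
}
```

Each step of Parse reads like the grammar it implements, which is the readability point made above; the error message costs one line in Expect.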
Have you looked at YAML?
You get the benefits of XML without all the pain and suffering. It's used extensively in the Ruby community for things like config files, pre-prepared database data, etc.
Here's an example
There appears to be a C# library here, which I haven't used personally, but YAML is pretty simple, so "how hard can it be?" :-)
I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs).