Streaming structured text input

Posted 2024-08-21 11:01:34

I'd like to parse formatted basic values and a few custom strings from a TextReader - essentially like scanf allows.

  • My input might not have line breaks, so ReadLine+Regex isn't an option. I could chunk the text input some other way, but the problem is that I don't know the delimiter at compile time (which makes this tricky), and the delimiter might be localization-dependent. For instance, a float followed by a comma might be "1.5," or "1,5," but in both cases the attempt to parse the float should be "greedy".
  • To be safe, I'd like to assume my input is actively hostile (say, streaming in from a network stream): i.e. intentionally missing chunking delimiters.
  • I'd like to avoid custom regexes: int.Parse and double.Parse work well and are localization-aware. Don't get me started on DateTime values - I might need a few custom patterns anyhow, but writing regexes to cover that scenario doesn't sound like fun.

For a concrete example, let's say I have a TextReader and I know the next value should be a double. How can I extract that double, using at most a limited amount of lookahead, without reading the entire stream and without manually writing a localizable double parser?

Similar Questions

There's a previous question, "Looking for C# equivalent of scanf", which sounds similar, but its answers focus on ReadLine+Regex (which I'd like to avoid). "How can I use Regex against a TextReader?" didn't get an answer (beyond chunking), and in any case I'd like to avoid writing my own regexes.

Comments (1)

多情出卖 2024-08-28 11:01:34

Given the lack of answers, and still not having found anything myself, it seems that:

  • There is no built-in way to use localized parsing directly on Streams (or TextReaders) in .NET, nor is there a systematic way to know how much of the stream corresponds to a parseable prefix.
  • There is no way to apply regular expressions to Streams (or TextReaders) in .NET, so there's no easy way to implement something like this yourself.
  • If you really need something like this, the easiest option is a full-fledged parser generator. ANTLR works well for this: it has a lot of existing grammars you can copy-paste for the basics, it comes with a GUI to help you understand your grammar, and it generates parsers for .NET, Java, C and a host of other languages. It's developer-friendly and fast, but way too powerful and flexible for what I need; like shooting a bug with a shotgun - I'm not thrilled with this solution.
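
For the narrow "next value is a double" case from the question, one hand-rolled workaround is sketched below. It is only a rough sketch under stated assumptions, not a built-in .NET facility: `BoundedScanner`, `TryReadDouble`, `maxChars` and `leftover` are hypothetical names made up for illustration. The idea is to consume at most `maxChars` characters that could belong to a number in the given culture (checking each with `TextReader.Peek` before reading it), then let `double.TryParse` find the longest prefix it accepts. Since a TextReader cannot push characters back, anything consumed past that prefix is handed back to the caller. NaN/Infinity symbols, group separators and leading whitespace are deliberately ignored.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class BoundedScanner
{
    // Reads at most maxChars number-like characters, then parses the longest
    // prefix that double.TryParse accepts for the given culture.
    // 'leftover' holds characters that were consumed but are not part of the number.
    public static bool TryReadDouble(TextReader reader, CultureInfo culture, int maxChars,
                                     out double value, out string leftover)
    {
        NumberFormatInfo nfi = culture.NumberFormat;
        var buffer = new StringBuilder();

        // The maxChars bound keeps hostile input from forcing an unbounded read.
        // Caveat: some readers may return -1 from Peek() before the real end of input.
        while (buffer.Length < maxChars)
        {
            int next = reader.Peek();
            if (next < 0 || !IsCandidate((char)next, nfi)) break;
            buffer.Append((char)reader.Read());
        }

        // Greedy match: try the longest prefix first and shrink until one parses.
        for (int len = buffer.Length; len > 0; len--)
        {
            if (double.TryParse(buffer.ToString(0, len), NumberStyles.Float, culture, out value))
            {
                leftover = buffer.ToString(len, buffer.Length - len);
                return true;
            }
        }

        value = 0;
        leftover = buffer.ToString();
        return false;
    }

    // A character is a candidate if it could appear in a culture-specific float
    // under NumberStyles.Float: ASCII digit, exponent marker, sign, decimal separator.
    private static bool IsCandidate(char c, NumberFormatInfo nfi) =>
        (c >= '0' && c <= '9') || c == 'e' || c == 'E'
        || nfi.NumberDecimalSeparator.IndexOf(c) >= 0
        || nfi.PositiveSign.IndexOf(c) >= 0
        || nfi.NegativeSign.IndexOf(c) >= 0;
}
```

With a culture whose decimal separator is a comma, the input "1,5,rest" should yield 1.5 with a leftover of ",", and the reader is left positioned at "rest". The obvious downside is that the caller has to deal with the leftover characters itself, which is exactly the kind of bookkeeping a real parser generator would take care of.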