Are unescaped user names incompatible with BNF?

Posted 2024-08-18 21:31:56

I have some (proprietary) output from a piece of software that I need to parse. Sadly, it contains unescaped user names, and I'm tearing my hair out trying to figure out whether or not I can describe the files I need to parse using BNF (or EBNF, or ABNF).

The problem, oversimplified (it's really just an example), may look like this:

    (data) ::= <username>
    <username> ::= (other type of data)

And in some cases, instead of appearing at the left or at the right, the username can also appear in the middle of a line.

The problem is that the username is unescaped and there are not enough restrictions on user names (they're printable ASCII, at most 20 characters, and they can't contain line breaks). So "=" would be a perfectly valid username, for example. And so would "= 1 = john = 2" (because at sign-on, users were allowed to choose any user name they wanted, and those names appear unescaped in the output I've got).

I'm asking because my parser choked on some very creative usernames (once again, they're not in my control; they're "weird" and I need to deal with them), and I cannot find an easy way to handle this. Also note that I do not know the user names in advance (for example, I don't have access to a database containing all the user names that users created).

So, are unrestricted and unescaped user names incompatible with BNF?

P.S.: Go easy on me if I made mistakes; it's my first post on Stack Overflow :)

Answers (2)

泛泛之交 2024-08-25 21:31:56

BNF doesn't "care" about user names per se; it works at the token level. If you define a username token, you can describe a grammar in BNF based on it.

Your problem should be solved at the lexer level. The lexer should be smart enough to recognize user names, even when they're not escaped, and pass username tokens to the parser.
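
Here is a minimal sketch of that idea. It assumes a hypothetical line format `<username> = <number>`, because the real format isn't given. The lexer uses what it knows about the line's shape: it anchors on the fixed-shape tail (` = ` followed by digits at the end of the line) and emits everything before it as one opaque `USERNAME` token. The parser then only ever sees `line ::= USERNAME "=" NUMBER`. The format and the names `lex` and `parse` are illustrative assumptions.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]

# Hypothetical format: "<username> = <number>".
# Usernames are 1-20 printable ASCII characters (space through "~").
LINE = re.compile(r"(?P<user>[ -~]{1,20}) = (?P<num>\d+)")


def lex(line: str) -> List[Token]:
    """Context-aware lexer: the whole username becomes a single USERNAME token."""
    m = LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    # The greedy user group backtracks until " = <digits>" reaches the end of
    # the line, so any "=" inside the username stays part of that token.
    return [("USERNAME", m.group("user")), ("EQ", "="), ("NUMBER", m.group("num"))]


def parse(tokens: List[Token]) -> Tuple[str, int]:
    """Parser for the BNF rule: line ::= USERNAME "=" NUMBER"""
    if [kind for kind, _ in tokens] != ["USERNAME", "EQ", "NUMBER"]:
        raise SyntaxError(f"unexpected tokens: {tokens!r}")
    return tokens[0][1], int(tokens[2][1])
```

For example, `parse(lex("= 1 = john = 2 = 7"))` returns `("= 1 = john = 2", 7)`. This works only because a NUMBER can never contain ` = `, so the last ` = ` on the line must be the separator.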

In theory you could describe any kind of user name with a grammar, but that depends heavily on what else is in your language. Is = a valid token in its own right? If it is, how can you tell apart a username that contains =? I think you'll have to describe the rest of the rules and the valid tokens of your language to get a fuller answer here.
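
To make the question concrete, consider a naive, context-free tokenizer in which `=` is always a token of its own. This is a hypothetical sketch, not anything from the original post. With it, the username `= 1 = john = 2` dissolves into ordinary tokens, and nothing in the token stream marks where the username ends.

```python
import re
from typing import List, Tuple

# Naive tokenizer (hypothetical): "=" is always its own token, runs of digits
# are NUMBERs, and any other run of non-space, non-"=" characters is a WORD.
TOKEN = re.compile(r"\s*(?:(?P<EQ>=)|(?P<NUMBER>\d+)|(?P<WORD>[^\s=]+))")


def naive_tokens(line: str) -> List[Tuple[str, str]]:
    tokens = []
    line = line.rstrip()  # trailing spaces would otherwise leave nothing to match
    pos = 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:  # defensive: any non-space character matches one alternative
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens
```

Take the line `= 1 = john = 2 = 7` (username `= 1 = john = 2`, value `7`). It becomes `EQ NUMBER EQ WORD EQ NUMBER EQ NUMBER`. A grammar written over these tokens would have to reassemble the username from pieces, which is exactly the difficulty raised above.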

柒夜笙歌凉 2024-08-25 21:31:56

It might be possible to make this work by recognising the things that are not usernames and then declaring everything else a username, even if this means parsing from right to left instead of left to right, or doing something equally eccentric.
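
Below is a minimal sketch of that approach for the middle-of-line case from the question. It assumes a hypothetical format `<HH:MM:SS> <username> <ACTION>`, where `ACTION` is one of a known set of keywords. The parser peels off the fields it can recognise (the timestamp from the left, the action from the right) and declares whatever remains to be the username. The format, the `ACTIONS` set and `parse_event` are illustrative assumptions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical fixed-shape fields surrounding the free-form username.
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}) ")
ACTIONS = {"LOGIN", "LOGOUT"}
MAX_USERNAME = 20


class Event(NamedTuple):
    time: str
    user: str
    action: str


def is_username(s: str) -> bool:
    # 1-20 printable ASCII characters (which also rules out line breaks).
    return 1 <= len(s) <= MAX_USERNAME and all(" " <= c <= "~" for c in s)


def parse_event(line: str) -> Optional[Event]:
    """Recognise the non-username fields; everything left over is the username."""
    m = TIMESTAMP.match(line)  # timestamp: consumed from the left
    if m is None:
        return None
    rest = line[m.end():]
    # Action: consumed from the right, so spaces or keywords inside the
    # username cannot be mistaken for the action.
    user, sep, action = rest.rpartition(" ")
    if not sep or action not in ACTIONS or not is_username(user):
        return None
    return Event(m.group(1), user, action)
```

The action can never contain a space, so the last space on the line must be the separator. That is why working from the right is unambiguous here, even though the username itself may contain spaces or keywords. For example, `12:00:00 bob LOGIN LOGOUT` parses as user `bob LOGIN` with action `LOGOUT`.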

It may be worth looking to see if your input is actually ambiguous: can you find two different situations that lead to identical output being generated? If so, you need to go back and get requirements for which of them to favour, or what sort of error to produce, or whatever. If not, the reason why not might help you work out what your parser or lexer or whatever needs to do.
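
One way to run that check mechanically is to enumerate every way a line could be split. The sketch below assumes a hypothetical format with two usernames on one line, `<username> = <username>`. The function `splits` (an illustrative name) returns every split at a ` = ` where both sides are valid usernames. More than one result means the output really is ambiguous.

```python
from typing import List, Tuple

MAX_USERNAME = 20


def is_username(s: str) -> bool:
    # 1-20 printable ASCII characters (space through "~").
    return 1 <= len(s) <= MAX_USERNAME and all(" " <= c <= "~" for c in s)


def splits(line: str, sep: str = " = ") -> List[Tuple[str, str]]:
    """Every (left, right) reading of the hypothetical line '<username> = <username>'."""
    results = []
    i = line.find(sep)
    while i != -1:
        left, right = line[:i], line[i + len(sep):]
        if is_username(left) and is_username(right):
            results.append((left, right))
        i = line.find(sep, i + 1)  # i + 1 also catches overlapping separators
    return results
```

For instance, `a = b = c` yields two readings, `("a", "b = c")` and `("a = b", "c")`. For that format you would need a rule for which reading to favour, or an error to report.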
