Parsing a custom string-generation pattern syntax
Background: I'm developing a custom regex-like syntax for URL filenames. It will work like this:
- User writes a pattern, something like "[a-z][0-9]{0,2}", and passes it as input.
- It is parsed by the program and translated into the set of permutations it represents, i.e. 'a', 'a0', 'a00' ... 'z99'.
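One way to model the second step (a sketch, not the only design): once parsed, "[a-z][0-9]{0,2}" can be held as a list of atoms, each a character set plus a minimum and maximum repeat count, and generating the strings is a depth-first walk over those atoms. `Atom` and `PatternExpander` are hypothetical names, and the atoms are built by hand here rather than by a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each atom is a set of allowed characters repeated min..max times.
public class PatternExpander {

    record Atom(String chars, int min, int max) {}

    // Expands the atoms into every string they represent, depth-first.
    static List<String> expand(List<Atom> atoms) {
        List<String> out = new ArrayList<>();
        expandAtom(atoms, 0, 0, new StringBuilder(), out);
        return out;
    }

    // index: current atom; used: how many times that atom has been repeated so far.
    private static void expandAtom(List<Atom> atoms, int index, int used,
                                   StringBuilder prefix, List<String> out) {
        if (index == atoms.size()) {
            out.add(prefix.toString());
            return;
        }
        Atom atom = atoms.get(index);
        // Enough repetitions of this atom: move on to the next one.
        if (used >= atom.min()) {
            expandAtom(atoms, index + 1, 0, prefix, out);
        }
        // Room for one more repetition: try each allowed character, then undo it.
        if (used < atom.max()) {
            for (char c : atom.chars().toCharArray()) {
                prefix.append(c);
                expandAtom(atoms, index, used + 1, prefix, out);
                prefix.setLength(prefix.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        List<Atom> atoms = List.of(
                new Atom("abcdefghijklmnopqrstuvwxyz", 1, 1), // [a-z]
                new Atom("0123456789", 0, 2));                // [0-9]{0,2}
        List<String> all = expand(atoms);
        System.out.println(all.size());               // 26 * (1 + 10 + 100) = 2886
        System.out.println(all.subList(0, 3));        // [a, a0, a00]
        System.out.println(all.get(all.size() - 1));  // z99
    }
}
```

Collecting into a list is only sensible for small patterns; for larger ones the `out.add` call would be replaced by a callback or an iterator so strings can be consumed, and generation stopped, as they are produced.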
These patterns will vary in complexity; basically anything that could appear in a URL filename must be accommodated. The language is either Java or PHP, but examples in any language, or abstract/conceptual help, are more than welcome.
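To put a concrete boundary on "anything that could appear in a URL filename", one possible reading (an assumption, not something the question specifies) is the set RFC 3986 calls unreserved characters: ASCII letters, digits, and `-`, `.`, `_`, `~`, which never need percent-encoding. Other characters can still appear in a URL if percent-encoded. `UrlAlphabet` and `isPlainFilename` are hypothetical names.

```java
// Hypothetical helper: the RFC 3986 "unreserved" characters as a pattern alphabet.
public class UrlAlphabet {

    static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          + "abcdefghijklmnopqrstuvwxyz"
          + "0123456789"
          + "-._~";

    // True if the name is non-empty and every character is unreserved.
    static boolean isPlainFilename(String name) {
        return !name.isEmpty()
                && name.chars().allMatch(c -> UNRESERVED.indexOf(c) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(UNRESERVED.length());            // 66
        System.out.println(isPlainFilename("img_01.png"));  // true
        System.out.println(isPlainFilename("my file.png")); // false: a space must be percent-encoded
    }
}
```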
My questions are:
- Where to start with implementing a "parser" for the above.
- Less importantly, how to translate parsed complex patterns into strings programmatically.
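A common place to start is a small hand-written recursive-descent parser that turns the pattern text into a list of atoms (character set plus min/max repeat count). The sketch below assumes a deliberately small, hypothetical grammar: literal characters, `\` escapes, `[...]` classes with ranges such as `a-z`, and `{n}` / `{m,n}` quantifiers. `*`, `+` and `?` are rejected, because this sketch only handles bounded repeats and `*` / `+` would describe infinitely many strings. `PatternParser` and `Atom` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grammar:
//   pattern    := (atom quantifier?)*
//   atom       := '[' (char | char '-' char)+ ']' | '\' anyChar | literalChar
//   quantifier := '{' n '}' | '{' m ',' n '}'
public class PatternParser {

    record Atom(String chars, int min, int max) {}

    private final String src;
    private int pos = 0;

    private PatternParser(String src) { this.src = src; }

    static List<Atom> parse(String pattern) {
        return new PatternParser(pattern).parsePattern();
    }

    private List<Atom> parsePattern() {
        List<Atom> atoms = new ArrayList<>();
        while (pos < src.length()) {
            String chars = parseAtom();
            int[] rep = parseQuantifier();
            atoms.add(new Atom(chars, rep[0], rep[1]));
        }
        return atoms;
    }

    private String parseAtom() {
        char c = next();
        if (c == '\\') return String.valueOf(next());            // escaped literal
        if ("]{}*+?".indexOf(c) >= 0) throw error("unexpected '" + c + "'");
        if (c != '[') return String.valueOf(c);                   // plain literal
        StringBuilder set = new StringBuilder();
        while (peek() != ']') {
            char lo = next();
            if (lo == '\\') lo = next();
            boolean isRange = peek() == '-'
                    && pos + 1 < src.length() && src.charAt(pos + 1) != ']';
            if (isRange) {
                pos++;                                            // skip '-'
                char hi = next();
                if (hi == '\\') hi = next();
                if (hi < lo) throw error("bad range " + lo + "-" + hi);
                for (int x = lo; x <= hi; x++) set.append((char) x);
            } else {
                set.append(lo);
            }
        }
        pos++;                                                    // skip ']'
        if (set.length() == 0) throw error("empty character class");
        return set.toString();
    }

    // No quantifier means "exactly once".
    private int[] parseQuantifier() {
        if (pos >= src.length() || src.charAt(pos) != '{') return new int[] {1, 1};
        pos++;                                                    // skip '{'
        int min = parseNumber();
        int max = min;
        if (peek() == ',') { pos++; max = parseNumber(); }
        if (next() != '}') throw error("expected '}'");
        if (max < min) throw error("max < min in quantifier");
        return new int[] {min, max};
    }

    private int parseNumber() {
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) throw error("expected a number");
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() {
        if (pos >= src.length()) throw error("unexpected end of pattern");
        return src.charAt(pos);
    }

    private char next() { char c = peek(); pos++; return c; }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos + " of \"" + src + "\"");
    }

    public static void main(String[] args) {
        System.out.println(parse("[a-z][0-9]{0,2}"));
        System.out.println(parse("img-[0-9]{3}.png"));
    }
}
```

Keeping the parser separate from generation means the atom list can also be validated or sized before any strings are produced.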
Answer:
There is a good answer for this here:
SO: /generate-all-permutations-of-text-from-a-regex-pattern-in-c
The crux of the thing is this: define precisely what you really need, figure out a way to halt once you have it, and narrow your search range as much as possible, because you are flirting with a quickly exploding number of permutations. "Anything that could appear in a URL filename must be accommodated" is not going to cut it. For example, if you limit yourself to English letters (one case) and digits, a string 6 characters long already has 36^6, or over 2 billion, combinations, and each additional character multiplies that by 36.
If you go with ISO 8859, the same 6-character length gives over 274 trillion combinations, and Unicode over 745 trillion trillion.
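One way to act on this advice is to count the strings a parsed pattern represents before generating any of them, and refuse patterns over a chosen budget. A minimal sketch, assuming the same hypothetical (character set, min, max) atom representation; for ISO 8859 it assumes 255 usable symbols per position (code points 1 to 255), which is the alphabet size that reproduces the "over 274 trillion" figure, since the answer does not say which count it used.

```java
import java.math.BigInteger;
import java.util.List;

// Hypothetical sizing helper: counts strings without enumerating them.
public class PatternSize {

    record Atom(String chars, int min, int max) {}

    // Each atom contributes sum_{k=min..max} |chars|^k choices; atoms multiply.
    static BigInteger count(List<Atom> atoms) {
        BigInteger total = BigInteger.ONE;
        for (Atom a : atoms) {
            BigInteger base = BigInteger.valueOf(a.chars().length());
            BigInteger choices = BigInteger.ZERO;
            for (int k = a.min(); k <= a.max(); k++) {
                choices = choices.add(base.pow(k));
            }
            total = total.multiply(choices);
        }
        return total;
    }

    // Guard: only enumerate if the pattern stays within a caller-chosen cap.
    static boolean withinBudget(List<Atom> atoms, long maxStrings) {
        return count(atoms).compareTo(BigInteger.valueOf(maxStrings)) <= 0;
    }

    public static void main(String[] args) {
        String az = "abcdefghijklmnopqrstuvwxyz";
        String alnum = az + "0123456789";                 // 36 symbols
        StringBuilder latin = new StringBuilder();        // 255 symbols: code points 1..255
        for (char c = 1; c <= 255; c++) latin.append(c);

        System.out.println(count(List.of(new Atom(alnum, 6, 6))));            // 2176782336
        System.out.println(count(List.of(new Atom(latin.toString(), 6, 6)))); // 274941996890625
        System.out.println(count(List.of(new Atom(az, 1, 1), new Atom("0123456789", 0, 2)))); // 2886
        System.out.println(withinBudget(List.of(new Atom(alnum, 6, 6)), 1_000_000)); // false
    }
}
```

Because `BigInteger` is used, the count stays exact even for alphabets and lengths where a `long` would overflow.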