Split a string by colons that are not inside parentheses ()
I'm working on an HTML5/JavaScript game engine, and I have run into a scenario I have never encountered before and can't figure out how to pull off.
Simply put, I want to split a string into an array by a character, so long as that character is not within parentheses.
Basically, in XML files for things like items/tiles, I store "triggers", which are statements giving rules for operations the code will perform. The different parameters of a single trigger are split up with a colon (:), and more than one trigger can be in place for an item, whereby each trigger is split by a comma. Here's an example:
<response trigger="npc:self:dialog:1:3">No, thank you.</response>
(This is basically saying: if this response is selected, make the NPC who asked the initial question cycle to a specific message of a specific conversation)
Moving along: I now need the ability to wrap callback triggers in parentheses as parameters of certain triggers. Here's an example:
<response trigger="shop:open:1:(npc:self:dialog:1:4)">Yes, please.</response>
(This is basically saying: open up a specific store, and when the store is closed, jump to a specific conversation/message for the speaking NPC)
The idea is that when a store is closed, I can invoke the 4th parameter of that trigger (which is a trigger itself). As I am sure you have guessed, the problem is that if I split the initial trigger string on ":", the callback trigger gets broken up into extra (messy) parameters of the main trigger. I don't want that. Nor do I want to do anything like splitting secondary triggers by another character (for generation reasons later on, and because I imagine there will be times when I will want to nest lots of triggers at deeper levels, and I don't want to use different characters). I know of workarounds, but I'd like to learn the proper way to split by a character that is not contained within other specific characters.
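To make the problem concrete, here is a minimal illustration of what a plain split does to the example trigger above. The variable names are only for illustration.

```javascript
// The example trigger from the XML above.
const trigger = "shop:open:1:(npc:self:dialog:1:4)";

// A plain split ignores the parentheses entirely.
const naiveParts = trigger.split(":");

// The callback trigger is torn apart: the 4th element is "(npc"
// and the last is "4)", instead of one "(npc:self:dialog:1:4)" parameter.
console.log(naiveParts.length); // 8 pieces instead of the intended 4
console.log(naiveParts[3]);     // "(npc"
```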
Since I am encapsulating the callback parameter in parentheses, I figure there must be a clean regular expression I can use to split the main trigger string by all colons NOT INSIDE parentheses.
Sadly, I haven't been able to come up with the right expression to get this done.
Any ideas?
I greatly appreciate any assistance any of you may have. :)
Comments (3)
I suspect you can't, at least if there's any chance of nested parentheses, since recognizing correct parenthesis-nesting is not regular.
In any case, instead of constructing some baroque regular expression, consider a very simple parser: scan to the next occurrence of either ":" or "(", and do something with the next token. Repeat. It would be easy to do with recursive descent, and would look something like
(Obviously this is pseudocode. Take string[n:] as "the substring of string from index n to the end".) Probably, thinking about it, you'd simply start with parseColonToken, but I'm not sure if that matches your expected grammar.
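The recursive-descent idea described above could be sketched in JavaScript roughly as follows. This is a minimal sketch, not the pseudocode this answer refers to: the names parseParams and parseTrigger are hypothetical, and it assumes a parenthesized group always stands for one whole parameter (any plain text directly next to a group is ignored).

```javascript
// Hypothetical recursive-descent parser for trigger strings.
// Returns nested arrays: a parenthesized callback becomes a sub-array.
function parseParams(input, pos, depth) {
  const params = [];
  let current = "";   // text of the parameter currently being read
  let nested = null;  // set when the parameter is a parenthesized trigger
  while (pos < input.length) {
    const ch = input[pos];
    if (ch === "(") {
      // Descend: parse the inner trigger up to its matching ")".
      const result = parseParams(input, pos + 1, depth + 1);
      nested = result.params;
      pos = result.pos; // index just after the matching ")"
    } else if (ch === ")") {
      if (depth === 0) throw new Error("Unexpected ')' at index " + pos);
      params.push(nested !== null ? nested : current);
      return { params, pos: pos + 1 };
    } else if (ch === ":") {
      // A colon ends the current parameter at this nesting level.
      params.push(nested !== null ? nested : current);
      current = "";
      nested = null;
      pos += 1;
    } else {
      current += ch;
      pos += 1;
    }
  }
  if (depth > 0) throw new Error("Missing ')' in: " + input);
  params.push(nested !== null ? nested : current);
  return { params, pos };
}

function parseTrigger(trigger) {
  return parseParams(trigger, 0, 0).params;
}

console.log(JSON.stringify(parseTrigger("shop:open:1:(npc:self:dialog:1:4)")));
```

Because each "(" starts a fresh recursive call, deeper nesting such as "a:(b:(c:d)):e" needs no extra code.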
There is a good reason why you were unable to find a regular expression for your problem:
The language you describe is not regular, i.e. it cannot be parsed with a regular expression.
Basically, you have to parse the parenthesis structure in order to determine the colons which are outside of all parentheses. This is not possible with a regular expression.
The language of nested parentheses is context-free [1], so it is straightforward to write a recursive parser.
[1] http://en.wikipedia.org/wiki/Context-free_language
ADDITION: You don't need a recursive parser, a simple counter for the parenthesis nesting level is enough:
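A minimal JavaScript sketch of the counter approach, under the assumption that only "(" and ")" affect nesting. The function name splitOutsideParens and its separator parameter are hypothetical, chosen for illustration.

```javascript
// Split `input` on `separator`, but only where the parenthesis
// nesting level is zero. Parenthesized parts are kept verbatim.
function splitOutsideParens(input, separator = ":") {
  const parts = [];
  let depth = 0;    // current parenthesis nesting level
  let current = ""; // text collected since the last top-level separator
  for (const ch of input) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth = Math.max(0, depth - 1); // tolerate a stray ")"
    }
    if (ch === separator && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

console.log(splitOutsideParens("shop:open:1:(npc:self:dialog:1:4)"));
```

The callback parameter comes back as the single string "(npc:self:dialog:1:4)"; stripping its outer parentheses and calling the function again would split the inner trigger when it is finally invoked.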
I think the easiest approach would be to break the string into the "function" part and the "argument" part and then deal with the two parts separately. If you want to keep the parentheses on the argument part, then:
And then:
You might be able to cram more of that into a single regex (and possibly all of it, depending on what non-regular features your target regex engine supports), but there's not much point: clarity is a better goal for your code than "short". Anyone looking at the above should be able to figure out what it is doing if they have a decent JavaScript regex reference in hand.
If you end up dealing with more complicated expressions with quoting and escaping and such, then you could try modifying a CSV parser to do what you need.
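The two-step approach described in this answer might look roughly like the sketch below. It is not necessarily the exact code the answer had in mind: the function name splitTrigger is hypothetical, and it assumes there is at most one parenthesized callback and that it is the last parameter of the trigger.

```javascript
// Step 1: separate the "function" part from the parenthesized "argument" part.
// Step 2: split only the function part on ":".
function splitTrigger(trigger) {
  // Group 1: everything before the first "(".
  // Group 2 (optional): from that "(" through the final ")", parentheses kept.
  const match = /^([^(]*)(\(.*\))?$/.exec(trigger);
  if (match === null) {
    throw new Error("Trigger does not fit the expected shape: " + trigger);
  }
  // Drop the colon that separated the function part from the argument.
  const head = match[1].replace(/:$/, "");
  const parts = head === "" ? [] : head.split(":");
  if (match[2] !== undefined) {
    parts.push(match[2]); // the callback stays in one piece
  }
  return parts;
}

console.log(splitTrigger("shop:open:1:(npc:self:dialog:1:4)"));
console.log(splitTrigger("npc:self:dialog:1:3"));
```

This keeps each regex small and readable, at the cost of not handling a callback that appears in the middle of the parameter list.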