Partial regular expression match
I have a regular expression that I'm testing against an input stream of characters. Is there a way to match the regular expression against the input and determine whether it is a partial match that consumes the entire input buffer, i.e. the end of the input buffer is reached before the regexp completes? I would like the implementation to decide whether to wait for more input characters or abort the operation.
In other words, I need to determine which of these is true:

1. The end of the input buffer was reached before the regexp was matched, e.g. "foo" =~ /^foobar/
2. The regular expression matches completely, e.g. "foobar" =~ /^foobar/
3. The regular expression failed to match, e.g. "fuubar" =~ /^foobar/
The input is not packetized.
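To make the three outcomes concrete, here is a minimal sketch in Python. It does not use Python's re module, which has no partial-match mode; instead it assumes a tiny, hypothetical pattern language (literal characters, "." for any character, and a postfix "*"), implicitly anchored at the start like ^foobar, and simulates the pattern as an NFA. If every thread dies, the match failed; if a thread reaches the end of the pattern, the match is complete; if the input runs out while threads are still alive, it is a partial match and waiting for more characters makes sense.

```python
def parse(pattern):
    """Split a (hypothetical) mini-pattern into (atom, starred) tokens.

    Supported syntax: literal characters, '.' for any character,
    and a postfix '*' meaning "zero or more of the previous atom".
    """
    tokens = []
    for ch in pattern:
        if ch == "*":
            if not tokens or tokens[-1][1]:
                raise ValueError("'*' must follow a single atom")
            tokens[-1] = (tokens[-1][0], True)
        else:
            tokens.append((ch, False))
    return tokens


def closure(states, tokens):
    """Add the states reachable by skipping starred atoms (epsilon moves)."""
    result = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def classify(pattern, buffer):
    """Return 'match', 'partial' or 'fail' for pattern anchored at buffer[0]."""
    tokens = parse(pattern)
    accept = len(tokens)                 # state index meaning "whole pattern consumed"
    states = closure({0}, tokens)
    for ch in buffer:
        if accept in states:
            return "match"               # the pattern completed before the input ran out
        # Advance every live thread whose atom accepts this character.
        states = closure(
            {i if tokens[i][1] else i + 1
             for i in states if tokens[i][0] in (".", ch)},
            tokens,
        )
        if not states:
            return "fail"                # no thread survived: abort
    return "match" if accept in states else "partial"   # partial: wait for more input
```

With this sketch, classify("foobar", "foo") gives "partial", classify("foobar", "foobar") gives "match" and classify("foobar", "fuubar") gives "fail". It reports "match" as soon as the shortest match completes, even if a longer (greedy) match could follow; that is enough to decide between waiting and aborting.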
Comments (2)
Is this the scenario you are solving? You are waiting for a literal string, e.g. 'foobar'. If the user types a partial match, e.g. 'foo', you want to keep waiting. If the input is a non-match you want to exit.
If you are working with literal strings, I would just write a loop that tests the characters in sequence. If you are trying to match more complex regular expressions, I don't know how to do this with regular expressions, but I would start by reading more about how your platform implements them.
tom
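The character-by-character loop suggested above for literal strings might look like this in Python. The function name and its three return values are illustrative choices, not part of the answer.

```python
def check_literal(expected, buffer):
    """Compare the input received so far against an expected literal string.

    Returns 'partial' (keep waiting), 'match' (expected string fully seen)
    or 'fail' (a character differed, so abort).
    """
    for i, want in enumerate(expected):
        if i >= len(buffer):
            return "partial"     # input ran out before the literal was complete
        if buffer[i] != want:
            return "fail"        # mismatch: no amount of extra input can fix this
    return "match"               # every expected character was seen in order
```

Calling it again each time new characters arrive gives the wait-or-abort decision: "foo" against "foobar" is "partial", "fuubar" is "fail".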
I'm not sure if this is what you're asking, but:
A regular expression either matches or it doesn't, and it can match a variable amount of input, so a partial match can't be determined directly.
However, if you think a match may overlap the boundary between the data you already have and the data still to come, you can use a smart buffering scheme to accomplish the same thing.
There are many ways to do this.
One way is to match, via assertions, everything that is not part of a match, up until you reach the start of a match (but not the full match you seek). You simply throw these pieces away and clear them from your buffer. When you get the match you seek, clear that data and everything before it from the buffer.
Example:
/(<function.*?>)|([^<]*)/
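As a rough sketch, the scheme can be written in Python with the same example pattern. The drain function and the PREFIX constant are hypothetical helpers. re.DOTALL is an added assumption so that a tag may span lines, and the extra check for a "<" that can never begin "<function" is needed because at such a position the pattern only matches an empty string, which would otherwise stall the loop.

```python
import re

# The answer's example pattern: group 1 is the match we want, group 2 is a run
# of characters that cannot start a match. DOTALL is an assumption (see above).
PATTERN = re.compile(r"(<function.*?>)|([^<]*)", re.DOTALL)
PREFIX = "<function"  # hypothetical: the literal start of a wanted match


def drain(buffer):
    """Consume what we can from buffer; return (found_matches, leftover)."""
    found = []
    pos = 0
    while pos < len(buffer):
        m = PATTERN.match(buffer, pos)   # never None: group 2 can match ""
        if m.group(1):                   # full match: keep it, drop it and all before it
            found.append(m.group(1))
            pos = m.end()
        elif m.group(2):                 # junk before the next '<': throw it away
            pos = m.end()
        else:                            # at a '<' that did not complete a match
            rest = buffer[pos:]
            if PREFIX.startswith(rest) or rest.startswith(PREFIX):
                break                    # may still become a match: wait for more input
            pos += 1                     # this '<' can never start a match: discard it
    return found, buffer[pos:]
```

After each read, prepend the returned leftover to the next chunk: drain("abc <function f> xyz <fun") yields the match "<function f>" and keeps only "<fun" for the next round.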
The part you throw away/clear from the buffer is in capture group 2.

Another way, if you are matching finite-length strings: if nothing in the buffer matches, you can safely throw away everything from the beginning of the buffer up to the end of the buffer minus the length of the string you are searching for.
Example: your buffer is 64k in size and you are searching for a string of length 10 that was not found in the buffer. You can safely clear (64k - 10) bytes, retaining the last 10 bytes, and then append (64k - 10) new bytes to the end of the buffer. Of course, a 10-byte buffer that constantly removes/adds one character would be enough, but a larger buffer is more efficient, and you could use thresholds to decide when to reload more data.
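A minimal Python sketch of this fixed-length scheme, reading from a binary file-like object. find_in_stream and its chunk_size parameter are hypothetical names, and the 64 KiB default mirrors the example above. As in the answer, it keeps the last len(needle) bytes after a miss; keeping len(needle) - 1 bytes would also be enough, because a complete occurrence inside the tail would already have been found.

```python
import io


def find_in_stream(stream, needle, chunk_size=64 * 1024):
    """Return the absolute offset of the first occurrence of needle, or -1."""
    if not needle:
        return 0
    buf = b""
    base = 0                              # absolute stream offset of buf[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1                     # end of stream, never found
        buf += chunk
        idx = buf.find(needle)
        if idx != -1:
            return base + idx
        # Not found: only the tail can hold the start of a match that
        # continues into data not yet read, so drop everything before it.
        drop = max(0, len(buf) - len(needle))
        buf = buf[drop:]
        base += drop
```

For example, find_in_stream(io.BytesIO(b"x" * 100 + b"needle"), b"needle", chunk_size=7) finds the match at offset 100 even though it straddles two chunks.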
If you can create a buffer that easily contracts/expands, more buffering options are available.