MySQL REGEXP to detect long lines
I have some records in my database that look like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.......
<PRE>
one short line
an other short line
a very long line I want to detect with more than 80 caracterssssssssssssssssss
again some short lines
</PRE>
Nullam tristique nisl eu lacus fringilla porta. ........
I would like to detect long lines (more than 80 characters) inside the PRE tags so that I can edit them manually.
I tried something like this:
SELECT * FROM table WHERE column
REGEXP "<PRE>.*[\n\r]+[^\n\r]{80,}[\n\r]+.*</PRE>"
but it returns records that contain no long lines.
Can someone point me in the right direction?
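To reproduce the false positive, the sketch below runs the question's pattern through Python's re module as a stand-in for MySQL's REGEXP. It assumes that . can match newlines (re.DOTALL); whether MySQL's REGEXP behaves that way may depend on the server version. The sample record is made up: two PRE blocks that hold only short lines, with a 90-character line of ordinary text between them.

```python
import re

# The pattern from the question, written as a Python raw string.
QUESTION_PATTERN = r"<PRE>.*[\n\r]+[^\n\r]{80,}[\n\r]+.*</PRE>"

# Hypothetical record: both PRE blocks hold only short lines,
# but the ordinary text between them has a 90-character line.
record = (
    "<PRE>\none short line\n</PRE>\n"
    + "x" * 90 + "\n"
    + "<PRE>\nanother short line\n</PRE>"
)

# re.DOTALL lets . match newlines, so the first .* can run past the first </PRE>.
match = re.search(QUESTION_PATTERN, record, re.DOTALL)
print(match is not None)  # prints: True, although no PRE block has a long line
```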
The [^\n\r]{80,} isn't necessarily matching a line in the PRE element where the search starts. The .* could be matching the closing </PRE> tag and beyond, so the long line could be in another PRE element, if there is one, or even in the text between PRE elements.

I don't think there's a bullet-proof way to do what you want in MySQL, but you could try constraining the pattern so that it never matches past a < character.
You've said there won't be any other markup inside the PRE element, so any angle bracket in its content should be written as an escape sequence like &lt;, and the first < the regex encounters should be the one in the </PRE> tag.

It's a hack, but without lookaheads this is the only way I can think of to constrain the match to a single PRE element. To do this job right, you should do it outside MySQL altogether.
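Here is a minimal Python sketch of doing the check outside MySQL. It extracts each PRE block with a non-greedy pattern and then measures that block's lines one at a time. Fetching the rows is left out. The record, the 80-character limit and the function names are illustrative assumptions. For comparison, it also tests a single pattern built on the answer's idea of never matching past a < character. That pattern is a reconstruction and not necessarily the one the answerer had in mind.

```python
import re

MAX_LEN = 80  # the question's limit: flag lines longer than 80 characters

# Non-greedy and DOTALL: each match covers exactly one <PRE>...</PRE> block.
PRE_BLOCK = re.compile(r"<PRE>(.*?)</PRE>", re.DOTALL)

# A reconstruction of the single-pattern hack: [^<]* cannot step past a '<',
# so the long line ({81,} = more than 80 characters) must lie before the next tag.
HACK_PATTERN = re.compile(r"<PRE>[^<]*[\n\r][^\n\r<]{81,}")


def long_pre_lines(record, max_len=MAX_LEN):
    """Return (block_index, line) pairs for PRE lines longer than max_len."""
    hits = []
    for i, block in enumerate(PRE_BLOCK.findall(record)):
        for line in block.splitlines():
            if len(line) > max_len:
                hits.append((i, line))
    return hits


def has_long_pre_line_hack(record):
    """Yes/no answer from the single pattern, like a REGEXP condition."""
    return HACK_PATTERN.search(record) is not None


# Hypothetical record: one long line inside the first PRE block,
# plus a long line of ordinary text between the two blocks.
record = (
    "Lorem ipsum dolor sit amet\n"
    + "<PRE>\none short line\n"
    + "a" * 85 + "\n"
    + "again some short lines\n</PRE>\n"
    + "x" * 90 + "\n"
    + "<PRE>\nanother short line\n</PRE>"
)

for block_index, line in long_pre_lines(record):
    print(block_index, len(line))  # prints: 0 85
print(has_long_pre_line_hack(record))  # prints: True

# Only the text between the blocks is long: neither check flags it.
between_only = "<PRE>\nshort\n</PRE>\n" + "x" * 90 + "\n<PRE>\nshort\n</PRE>"
print(long_pre_lines(between_only), has_long_pre_line_hack(between_only))  # prints: [] False
```

Because the script sees each PRE block on its own, it can also print the offending lines for manual editing. A REGEXP condition cannot do that.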
Use .*? instead of .* so the regex engine doesn't match greedily.
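One caveat is worth checking before relying on this. For a yes/no test such as REGEXP, greediness only changes which match the engine reports, not whether a match exists, because a lazy quantifier still expands as far as it has to. The Python sketch below uses re.DOTALL as a stand-in for . matching newlines, with a made-up record. It shows that the lazy variant of the question's pattern still flags a record whose only long line sits between two PRE blocks.

```python
import re

GREEDY = r"<PRE>.*[\n\r]+[^\n\r]{80,}[\n\r]+.*</PRE>"
LAZY = r"<PRE>.*?[\n\r]+[^\n\r]{80,}[\n\r]+.*?</PRE>"  # each .* replaced by .*?

# Hypothetical record: the only long line is plain text between two PRE blocks.
record = (
    "<PRE>\none short line\n</PRE>\n"
    + "x" * 90 + "\n"
    + "<PRE>\nanother short line\n</PRE>"
)

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY)):
    m = re.search(pattern, record, re.DOTALL)
    # Both patterns find a match; laziness only changes how the .*? parts expand.
    print(name, m is not None)
```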
If there could be more than one <PRE> block, your expression can swallow the content in between them. Change [^\n\r]{80,} to [^\n\r]{80,}?.

Note that this assumes the </PRE> tag never comes on the same line as the content. (If it did, you could consume 74 characters of 'long line' followed by the closing tag, and then you would consume a lot of content up until the next closing tag.)
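The caveat about the closing tag is easy to reproduce. This Python sketch uses re.DOTALL as a stand-in for . matching newlines, and the record built by the hypothetical make_record helper is made up. The last line of the first PRE block carries only 74 characters of content. Because </PRE> sits on the same line, [^\n\r]{80,}? counts the tag's 6 characters as well, and the record is flagged.

```python
import re

# The pattern with both suggested lazy changes applied (.*? and {80,}?).
PATTERN = r"<PRE>.*?[\n\r]+[^\n\r]{80,}?[\n\r]+.*?</PRE>"


def make_record(content_len):
    """Hypothetical record whose first PRE block ends with the tag on the content line."""
    return (
        "<PRE>\nshort\n"
        + "a" * content_len + "</PRE>\n"  # closing tag on the same line as content
        + "text\n"
        + "<PRE>\nshort\n</PRE>"
    )


for n in (73, 74):
    found = re.search(PATTERN, make_record(n), re.DOTALL) is not None
    # 74 + len("</PRE>") == 80, so only the 74-character case matches.
    print(n, found)  # prints "73 False", then "74 True"
```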