Regex: find a paragraph around a word using regex
I want to find the paragraph around a word using a regular expression. The start and end of each paragraph are identified by the delimiter '@@'. I am using the Alteryx RegEx tool with the Tokenize method, which is Perl 5 compatible.
e.g.
Text:
@@Consumers can also monitor their accounts regularly by allowing them
to keep their accounts safe. Around-the-clock access to banking
information provides early detection of fraudulent activity, thereby
acting as a guardrail against financial damage or loss.@@ Online Bill
Payment one of the great advantages of online banking is online bill
pay. Rather than having to write checks or fill out forms to pay
bills, once you set up your accounts at your online bank, all it takes
is a simple click or even less, as you can usually automate your bill
payments. With online bill pay, it’s easy to manage your accounts from
one central source and to track payments into and out of your
account.@@ In spite of their many advantages, there are some drawbacks
to using online banks as well. Here are some of the downsides/drawback
of working with an online bank @@
Case:
If I specify the phrase "one central source", it should extract the paragraph that contains it, from the opening delimiter '@@' to the closing delimiter '@@'.
output:
Online Bill Payment one of the great advantages of online banking is
online bill pay. Rather than having to write checks or fill out forms
to pay bills, once you set up your accounts at your online bank, all
it takes is a simple click or even less, as you can usually automate
your bill payments. With online bill pay, it’s easy to manage your
accounts from one central source and to track payments into and out of
your account.
\bone central source\b(.*?)@@
If the tool is Perl 5 compatible, you can use:
(?s)@@\s*+\K(?:.(?!@@))*\bone central source\b.*?(?=@@)
Explanation
(?s)
    Inline modifier, have the dot match a newline
@@\s*+\K
    Match @@, match optional whitespace chars and then clear the match buffer
(?:.(?!@@))*
    Match any char when not directly followed by @@
\bone central source\b
    Match literally between word boundaries to prevent partial word matches
.*?
    Match any char, as few as possible
(?=@@)
    Positive lookahead, assert @@ to the right
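For readers who want to try the idea outside Alteryx, here is a minimal sketch in Python. It is an adaptation, not the pattern above verbatim: Python's built-in re module does not support \K, so a capturing group stands in for it, and the possessive \s*+ is written as a plain \s*. The function name paragraph_around and the shortened TEXT sample are hypothetical and only serve the illustration.

```python
import re

# Shortened version of the sample text from the question (assumption: the
# exact wording does not matter, only the position of the @@ delimiters).
TEXT = (
    "@@Consumers can also monitor their accounts regularly, which acts as a "
    "guardrail against financial damage or loss.@@ Online Bill\n"
    "Payment one of the great advantages of online banking is online bill\n"
    "pay. With online bill pay, it is easy to manage your accounts from\n"
    "one central source and to track payments into and out of your\n"
    "account.@@ In spite of their many advantages, there are some drawbacks "
    "to using online banks as well. @@"
)


def paragraph_around(text, phrase):
    """Return the @@-delimited paragraph that contains `phrase`, or None."""
    pattern = re.compile(
        r"@@\s*"                       # opening delimiter and optional whitespace
        r"((?:.(?!@@))*"               # chars that are not directly followed by @@
        r"\b" + re.escape(phrase) + r"\b"  # the phrase, between word boundaries
        r".*?)"                        # as few chars as possible ...
        r"(?=@@)",                     # ... up to the closing delimiter
        re.DOTALL,                     # same effect as the inline (?s) modifier
    )
    match = pattern.search(text)
    return match.group(1) if match else None


result = paragraph_around(TEXT, "one central source")
print(result)
```

The capturing group returns the same text that \K plus the lookahead would leave in the overall match, so in a tool that only reports the whole match the original \K form is still the one to use.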
Something like this should work:
/@@\s*((?:.(?!@@))*?\bone central source\b.*?)\s*@@/gs
Test it here: https://regex101.com/r/s9e1ej/1
The idea is to search for @@, possibly followed by spaces, and then for any char which isn't followed by @@. This can be done with a negative lookahead: .(?!@@) means any char not followed by @@. (?:.(?!@@))*? is this same pattern inside a non-capturing group, which can be repeated but with the ungreedy option. This is to avoid eating your sentence.
As you can see in the example, the text can contain the @ symbol, as I showed by adding an e-mail address to the text.
Then, as you did, search for the sentence you are looking for with the word boundary \b. I removed the case-insensitive flag, so you might need to re-enable it if your sentence can be written in another case.
If you don't want to get the delimiting separator, you could put the middle part in a capturing group. And if you can't use a group with your tool, then look at The fourth bird's nice solution, which uses the \K reset and a positive lookahead at the end.
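To make the role of the negative lookahead concrete, the sketch below compares the pattern from this answer with a naive variant that uses a plain .*? in the same place. It uses Python's re module as a stand-in for a Perl-compatible engine. The names TEXT, NAIVE and TEMPERED are hypothetical, and the sample text is a shortened version of the one in the question.

```python
import re

# Shortened sample text (assumption: only the @@ positions matter here).
TEXT = (
    "@@Consumers can also monitor their accounts regularly, which acts as a "
    "guardrail against financial damage or loss.@@ Online Bill\n"
    "Payment one of the great advantages of online banking is online bill\n"
    "pay. With online bill pay, it is easy to manage your accounts from\n"
    "one central source and to track payments into and out of your\n"
    "account.@@ In spite of their many advantages, there are some drawbacks "
    "to using online banks as well. @@"
)

# Naive variant: a lazy dot is free to run across a @@ delimiter.
NAIVE = re.compile(r"@@\s*(.*?\bone central source\b.*?)\s*@@", re.DOTALL)

# Pattern from the answer: each char before the phrase must not be
# directly followed by @@, so the match cannot start in an earlier paragraph.
TEMPERED = re.compile(
    r"@@\s*((?:.(?!@@))*?\bone central source\b.*?)\s*@@", re.DOTALL
)

# findall mirrors the /g flag and returns the content of the capturing group.
naive_hits = NAIVE.findall(TEXT)
tempered_hits = TEMPERED.findall(TEXT)

print("naive starts with:   ", naive_hits[0][:30])
print("tempered starts with:", tempered_hits[0][:30])
```

With the naive variant the match begins at the very first @@, so the capture also contains the previous paragraph and a delimiter. With the lookahead in place, the capture starts at the delimiter that opens the paragraph containing the phrase.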