REGEX - Extract between a range while ignoring certain words (Python)
I need to extract the value of Vehicle Class in a way that works for several scenarios, so I tried extracting everything between "Class" and "Date". However, a few of the samples contain unwanted values such as HOLDER and TOLDER that need to be ignored.
I have also tried an OR condition, but I was unable to exclude those words.
Regexes tried:
- (?<=Class\s)[a-z A-Z(-|\s|\)]*(?=Date|TOLDER)
- (?<=Class\s)[a-z A-Z(-|\s|\)]*(?=Date)
sample data 1 :
Vehicle Class
LMV
MCWG
Date of Issue
sample data 2 :
Vehicle Class MCWG
Date of issue
sample data 3 :
Vehicle Class LMV MCWG
Date of issue
sample data 4 :
Vehicle Class LMV MCWOG
TOLDER SIGNATURE
Date of Issue
sample data 5 :
Vehicle Class MCWG LMV LMV-GV PSVBUS
Date of issue
sample data 6 :
Vehicle Class LMY MCWG
HOLDER SIGNATURE
Date of Issue
Expected output: the value between Class and Date (e.g. LMV MCWG for sample data 1, and LMY MCWG for sample data 6, where HOLDER SIGNATURE should be ignored).
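To see why the second attempt keeps HOLDER SIGNATURE, here is a small Python check on sample data 6 (the variable names are illustrative). Because \s also matches newlines, the greedy * runs past the line break and only stops where the lookahead finds Date. The check also shows that "(-|" inside the character class is a range from "(" to "|", so the class accepts much more than letters, spaces and parentheses.

```python
import re

# Second attempt from the question, unchanged.
attempted = r"(?<=Class\s)[a-z A-Z(-|\s|\)]*(?=Date)"
sample6 = "Vehicle Class LMY MCWG\nHOLDER SIGNATURE\nDate of Issue"

m = re.search(attempted, sample6)
# \s also matches "\n", so the greedy * crosses the line break and only
# stops where the lookahead (?=Date) succeeds: HOLDER SIGNATURE is kept.
print(repr(m.group()))  # 'LMY MCWG\nHOLDER SIGNATURE\n'

# Inside [...], "(-|" is a range from "(" (0x28) to "|" (0x7C), so the class
# also accepts digits and symbols such as "_" and "[".
char_class = r"[a-z A-Z(-|\s|\)]"
print(bool(re.fullmatch(char_class, "5")), bool(re.fullmatch(char_class, "_")))  # True True
```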
Comments (2)
You can use the pattern
(MC[A-Z]+).*(LM[A-Z]+)|(LM[A-Z]+).*(MC[A-Z]+)
See https://regex101.com/r/08lN88/1
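A minimal Python sketch of using this pattern with re.search. Only one of the two alternatives takes part in a match, so the groups of the other alternative are None; the helper name extract_classes is hypothetical. When the values sit on separate lines, as in sample data 1, "." does not match a newline unless re.DOTALL is passed.

```python
import re

# Pattern from the answer: MC... followed by LM..., or LM... followed by MC...
PATTERN = r"(MC[A-Z]+).*(LM[A-Z]+)|(LM[A-Z]+).*(MC[A-Z]+)"

def extract_classes(text, flags=0):
    """Hypothetical helper: join the groups of whichever alternative matched."""
    m = re.search(PATTERN, text, flags)
    if m is None:
        return None
    # Groups of the alternative that did not participate are None.
    return " ".join(g for g in m.groups() if g is not None)

print(extract_classes("Vehicle Class LMY MCWG\nHOLDER SIGNATURE\nDate of Issue"))  # LMY MCWG

# Sample data 1 has the values on separate lines.
sample1 = "Vehicle Class\nLMV\nMCWG\nDate of Issue"
print(extract_classes(sample1))             # None
print(extract_classes(sample1, re.DOTALL))  # LMV MCWG
```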
You can match either HOLDER or TOLDER using a character class ([HT]OLDER). Instead of lookarounds, you can capture the data that you want in a capture group.
In the character class you are using \s, which already matches a space, so the separate literal space is not needed. If you want to match a pipe char, a single | is enough (note that | does not mean OR inside a character class).
To prevent a partial word match, you can add word boundaries \b.
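The character-class and word-boundary points above can be checked directly in Python. The input strings here, including the made-up word "SHOLDERS", are only illustrations.

```python
import re

# Inside [...], "|" is an ordinary character, not alternation.
print(re.findall(r"[A|B]", "A|B"))  # ['A', '|', 'B']

# \s already covers the space character.
print(bool(re.fullmatch(r"[\s]", " ")))  # True

# Without word boundaries, [HT]OLDER also matches inside a longer word.
print(re.search(r"[HT]OLDER", "SHOLDERS") is not None)      # True
print(re.search(r"\b[HT]OLDER\b", "SHOLDERS") is not None)  # False
```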
See a regex demo.
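A minimal Python sketch of the capture-group approach described in this answer, run on the six samples from the question. The pattern is an assumption written for illustration, not necessarily the one in the linked demo: it captures lazily after Class, skips an optional HOLDER/TOLDER SIGNATURE line via the [HT]OLDER character class, uses \b word boundaries, and passes re.DOTALL so the capture can span the line breaks in sample data 1. The helper name vehicle_class is hypothetical.

```python
import re

# Hypothetical pattern for this sketch:
#   \bClass\s+                            the label, then whitespace (may be a newline)
#   (.*?)                                 the vehicle classes, captured lazily
#   \s*(?:\b[HT]OLDER\s+SIGNATURE\s*)?    an optional HOLDER/TOLDER SIGNATURE line
#   \bDate\b                              the "Date of Issue" label ending the range
PATTERN = re.compile(
    r"\bClass\s+(.*?)\s*(?:\b[HT]OLDER\s+SIGNATURE\s*)?\bDate\b",
    re.DOTALL,
)

def vehicle_class(text):
    """Return the captured classes with whitespace collapsed, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join(m.group(1).split())

samples = [
    "Vehicle Class\nLMV\nMCWG\nDate of Issue",
    "Vehicle Class MCWG\nDate of issue",
    "Vehicle Class LMV MCWG\nDate of issue",
    "Vehicle Class LMV MCWOG\nTOLDER SIGNATURE\nDate of Issue",
    "Vehicle Class MCWG LMV LMV-GV PSVBUS\nDate of issue",
    "Vehicle Class LMY MCWG\nHOLDER SIGNATURE\nDate of Issue",
]

for s in samples:
    print(vehicle_class(s))
# LMV MCWG
# MCWG
# LMV MCWG
# LMV MCWOG
# MCWG LMV LMV-GV PSVBUS
# LMY MCWG
```

The lazy (.*?) stops at the first position where the rest of the pattern can match, which is why the optional SIGNATURE line is consumed by the non-capturing group instead of ending up in group 1.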