Finding a substring pattern that is followed by one substring but not another
I have hospital data (OPCS, NHS) consisting of procedure codes followed by a code that indicates laterality.
Using regex and R, I would like to identify a procedure code in a string that is followed by other procedure codes and then by the laterality code.
However, the match must not include procedure codes of interest that are followed by a different laterality code. Example:
string <- ("W100 Z923 W200 A456 W200 B234 A234 Z921")
What I am trying to match: "W100|W200"
What it must be followed by: "Z921"
e.g. it should match this: W200 B234 A234 Z921
But it must not be followed by: "Z922|Z923"
e.g. it should not match this: W100 Z923 W200 A456 W200 B234 A234 Z921
What I have tried:
# Match the procedure followed by Z921:
(W100|W200).{1,}?Z941
# I do not know how to add a negative lookbehind to exclude matches without breaking the pattern above. I have tried this, but it fails:
((W100|W200).{1,}Z941) (?<!Z943|Z942)
Edit: improved the clarity of the question and the reprex.
Comments (1)
You can use
\bW[12]00\b(?!\s+Z92[23]\b).*?Z941
See the regex demo. Details:
\b - a word boundary
W[12]00 - W100 or W200
\b - a word boundary
(?!\s+Z92[23]\b) - a negative lookahead that fails the match if the code is immediately followed by one or more whitespace characters and then Z923 or Z922 as a whole word
.*? - any zero or more characters other than line break characters, as few as possible
Z941 - a Z941 string.
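To try the pattern described above in R, something like the following sketch could be used. It relies on base R's `regexpr()` and `regmatches()` with `perl = TRUE`, since lookaheads need the PCRE engine. One assumption to note: the example string in the question ends in Z921 while the patterns end in Z941, so the sketch uses Z921 as the wanted laterality code (the variable name `wanted` is made up for illustration).

```r
# Example string from the question
string <- "W100 Z923 W200 A456 W200 B234 A234 Z921"

# Assumption: the wanted laterality code is Z921, as in the example string.
# Swap in "Z941" if that is the code used in the real data.
wanted <- "Z921"

# Backslashes must be doubled inside an R string literal.
pattern <- paste0("\\bW[12]00\\b(?!\\s+Z92[23]\\b).*?", wanted)

# perl = TRUE is required: lookaheads are not supported by the default engine.
m <- regmatches(string, regexpr(pattern, string, perl = TRUE))
print(m)
# [1] "W200 A456 W200 B234 A234 Z921"
```

The match starts at the first W200, because W100 is directly followed by Z923 and is rejected by the lookahead. The lookahead only inspects the code immediately after the procedure code, so a Z922 or Z923 appearing later in the string would not prevent a match.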