Find a substring pattern that is followed by one substring but not by another

Posted 2025-01-16 21:06:47, 769 characters, 3 views, 0 comments

I have hospital data (OPCS, NHS) which comprises procedure codes followed by a code to indicate laterality.

Using regex and R, I would like to identify a procedure code in a string that is followed by other procedure codes and then the laterality code.

However, the match must not include procedure codes of interest that are followed by a different laterality code. Example:

string <- ("W100 Z923 W200 A456 W200 B234 A234 Z921")

What I am trying to match:"W100|W200"

What it must be followed by: "Z921"
e.g. Should match this W200 B234 A234 Z921

But must not be followed by: "Z922|Z923"
e.g. Should not match this W100 Z923 W200 A456 W200 B234 A234 Z921

What I have tried:

# match the procedure followed by Z921:
(W100|W200).{1,}?Z921

# I do not know how to add a negative lookbehind to exclude matches without stopping this working. I have tried this, but it fails:
((W100|W200).{1,}Z921) (?<!Z923|Z922)

edit: Improved the clarity of question and reprex

溺ぐ爱和你が 2025-01-23 21:06:47

You can use

library(stringr)
# `string` is the example vector defined in the question
str_extract_all(string, "\\bW[12]00\\b(?!\\s+Z92[23]\\b).*?Z921")

Details:

\b - a word boundary
W[12]00 - W100 or W200
\b - a word boundary
(?!\s+Z92[23]\b) - a negative lookahead that fails the match if the code is immediately followed by one or more whitespace characters and then Z922 or Z923 as a whole word
.*? - any zero or more chars other than line break chars, as few as possible
Z921 - the literal string Z921.
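
One thing to watch: the lookahead above only inspects the code that comes directly after W100/W200. If a Z922 or Z923 can appear further along, before the wanted laterality code, a stricter variant may be needed. Below is a minimal sketch, not part of the original answer, that assumes the wanted laterality code is Z921 (as in the example string) and uses base R with `perl = TRUE`. The helper `extract_all`, the variable names and the `tricky` test string are hypothetical and only there for illustration.

```r
# Example string from the question.
string <- "W100 Z923 W200 A456 W200 B234 A234 Z921"

# Pattern from the answer: the lookahead only inspects the code that
# immediately follows W100/W200.
adjacent_only <- "\\bW[12]00\\b(?!\\s+Z92[23]\\b).*?Z921"

# Assumed stricter variant (not from the original answer): a "tempered"
# dot that refuses to step over Z922 or Z923 anywhere before Z921.
no_other_side <- "\\bW[12]00\\b(?:(?!\\bZ92[23]\\b).)*?\\bZ921\\b"

# Hypothetical helper: return all matches of `pattern` in `x` using
# base R with PCRE (perl = TRUE is required for lookaheads).
extract_all <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

extract_all(string, adjacent_only)
extract_all(string, no_other_side)

# A made-up string where Z923 is not adjacent to W100.
tricky <- "W100 A111 Z923 W200 B234 Z921"
extract_all(tricky, adjacent_only)
extract_all(tricky, no_other_side)
```

On the question's example both patterns should return the same single match, starting at the first W200. On the made-up `tricky` string they should differ: the adjacent-only pattern runs from W100 straight across Z923, while the tempered version starts at W200. The same pattern strings should also work with `stringr::str_extract_all`, since its regex engine supports lookaheads.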