Find a substring pattern that is followed by one substring but not by another

Posted on 2025-01-16 21:06:47

I have hospital data (OPCS, NHS) consisting of procedure codes followed by a code that indicates laterality.

Using regex in R, I would like to identify a procedure code in a string that is followed by other procedure codes and then by the laterality code.

However, the match must not include procedure codes of interest that are followed by a different laterality code. Example:

string <- ("W100 Z923 W200 A456 W200 B234 A234 Z921")

What I am trying to match: "W100|W200"

What it must be followed by: "Z921"
e.g. Should match this W200 B234 A234 Z921

But must not be followed by: "Z922|Z923"
e.g. Should not match this W100 Z923 W200 A456 W200 B234 A234 Z921

What I have tried:

# Match the procedure followed by Z921:
(W100|W200).{1,}?Z921

# I do not know how to add a negative lookbehind to exclude matches without breaking this. I have tried the following, but it fails:
((W100|W200).{1,}Z921) (?<!Z923|Z922)

Edit: improved the clarity of the question and reprex.

Comments (1)

溺ぐ爱和你が 2025-01-23 21:06:47

You can use

library(stringr)
str_extract_all(string, "\\bW[12]00\\b(?!\\s+Z92[23]\\b).*?Z921")

Details:

  • \b - a word boundary
  • W[12]00 - W100 or W200
  • \b - a word boundary
  • (?!\s+Z92[23]\b) - a negative lookahead that fails the match if, immediately to the right, there are one or more whitespace characters followed by Z922 or Z923 as a whole word
  • .*? - any zero or more characters other than line break characters, as few as possible
  • Z921 - the literal string Z921.
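To see how the lookahead behaves, here is a minimal base R sketch run against the example string from the question. It assumes the required laterality code is Z921, as in that string. The helper `extract_all`, the pattern names, and the second test string `y` are hypothetical and introduced only for illustration. The two stricter patterns are not part of the answer above; they use a tempered lazy repetition as one possible way to reject Z922/Z923 anywhere before Z921, and optionally to restart at a later W100/W200.

```r
# Example string from the question.
x <- "W100 Z923 W200 A456 W200 B234 A234 Z921"
# Hypothetical extra string: Z923 is NOT directly after the procedure code.
y <- "W100 A123 Z923 W200 Z921"

# Hypothetical helper: all matches of a PCRE pattern, using base R only.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

# 1. Lookahead pattern: only the code immediately after W100/W200 is checked.
p_next <- "\\bW[12]00\\b(?!\\s+Z92[23]\\b).*?Z921"

# 2. Assumed stricter variant: no Z922/Z923 anywhere between the code and Z921.
p_between <- "\\bW[12]00\\b(?:(?!\\bZ92[23]\\b).)*?\\bZ921\\b"

# 3. Assumed variant that also refuses to cross a later W100/W200,
#    so the match starts at the procedure code closest to Z921.
p_closest <- "\\bW[12]00\\b(?:(?!\\b(?:Z92[23]|W[12]00)\\b).)*?\\bZ921\\b"

res_next_x    <- extract_all(p_next, x)     # "W200 A456 W200 B234 A234 Z921"
res_next_y    <- extract_all(p_next, y)     # "W100 A123 Z923 W200 Z921"
res_between_y <- extract_all(p_between, y)  # "W200 Z921"
res_closest_x <- extract_all(p_closest, x)  # "W200 B234 A234 Z921"

print(res_next_x)
print(res_next_y)
print(res_between_y)
print(res_closest_x)
```

The second result shows the limitation of checking only the next code: the match can still run across a Z923 that appears later. Which variant is appropriate depends on whether a laterality code applies only to the procedure code directly before it, which the question does not state.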