String pattern matching problem

Posted on 2024-09-29 04:43:57

Imagine we have a long string containing the substrings 'cat' and 'dog' as well as other random characters, for example:

cat x dog cat x cat x dog x dog x cat x dog x cat

Here 'x' represents any random sequence of characters (but not 'cat' or 'dog').

What I want to do is find every 'cat' that is followed by another 'cat' with no 'dog' anywhere in between. In each such case I want to remove the first 'cat' of the pair.

In this case, I would want to remove the bracketed [cat] because there is no 'dog' after it before the next 'cat':

cat x dog [cat] x cat x dog x dog x cat x dog x cat

To end up with:

cat x dog x cat x dog x dog x cat x dog x cat

How can this be done?

I thought of somehow using a regular expression like (n)(?=(n)), as VonC recommended here

(cat)(?=(.*cat))

to match all of the pairs of 'cat' in the string. But I am still not sure how I could use this to remove each cat that is not followed by 'dog' before 'cat'.
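
To make the limitation concrete, here is a small Java sketch that lists where the pattern above matches in the sample string. The class name LookaheadDemo and the method matchStarts are made up for illustration, and the literal 'x' stands in for the random filler text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the start offset of every match of (cat)(?=(.*cat)).
    static List<Integer> matchStarts(String input) {
        Matcher m = Pattern.compile("(cat)(?=(.*cat))").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String sample = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        System.out.println(matchStarts(sample));
    }
}
```

The lookahead only asks whether some 'cat' appears later, so it matches every 'cat' except the last one, whether or not a 'dog' sits in between.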


The real problem I am tackling is in Java. But I am really just looking for a general pseudocode/regex solution.

Comments (1)

¢蛋碎的人ぎ生 2024-10-06 04:43:57

Is there any particular reason you want to do this with just one RE call? I'm not sure if that's actually possible in one RE.
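
For what it's worth, a single expression does seem able to express the condition if the regex flavor supports lookaheads, as Java's does. The sketch below is one possible pattern, not a tested production solution: the lookahead steps forward one character at a time, refuses to step onto the start of a 'dog', and succeeds once it reaches another 'cat'. The class name CatRegex and the method removeCats are hypothetical.

```java
import java.util.regex.Pattern;

class CatRegex {
    // "cat" followed, with no "dog" starting in between, by another "cat".
    // DOTALL lets '.' cross line breaks in case the filler text has any.
    private static final Pattern CAT_BEFORE_CAT =
        Pattern.compile("cat(?=(?:(?!dog).)*?cat)", Pattern.DOTALL);

    // Deletes every 'cat' that satisfies the condition above.
    static String removeCats(String input) {
        return CAT_BEFORE_CAT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        System.out.println(removeCats(sample));
    }
}
```

Only the three characters of 'cat' are deleted, so the whitespace that surrounded the removed word stays in the output. Because the lookahead consumes nothing, a run of several cats with no dog between them is handled in the same call.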

If I had to do this, I'd probably go in two passes. First mark each instance of 'cat' and 'dog' in the string, then write some code to identify which cats need to be removed, and do that in another pass.

Pseudocode follows:

// Find the start index of every 'cat' and every 'dog'
int[] catLocations = string.findIndex(/cat/);
int[] dogLocations = string.findIndex(/dog/);

// Pick the cats that reach the next 'cat' with no 'dog' in between
int[] idsToRemove = doLogic(catLocations, dogLocations);

// Remove each identified cat, from the end to the front,
// so that the indices of the earlier cats stay valid
for (int id : idsToRemove.reverse())
  string.removeSubstring(id, "cat".length());
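
Since the real problem is in Java, the pseudocode above could be fleshed out roughly as follows. This is a minimal sketch under the assumption that a 'cat' should go whenever no 'dog' starts between it and the next 'cat'. The names CatDogTwoPass, findAll, catsToRemove and removeCats are invented here and stand in for findIndex, doLogic and removeSubstring.

```java
import java.util.ArrayList;
import java.util.List;

class CatDogTwoPass {
    // Pass 1: collect the start index of every occurrence of word.
    static List<Integer> findAll(String text, String word) {
        List<Integer> hits = new ArrayList<>();
        for (int i = text.indexOf(word); i >= 0; i = text.indexOf(word, i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    // A cat is marked for removal when no dog starts between it and the next cat.
    static List<Integer> catsToRemove(List<Integer> cats, List<Integer> dogs) {
        List<Integer> remove = new ArrayList<>();
        for (int i = 0; i + 1 < cats.size(); i++) {
            int from = cats.get(i);
            int to = cats.get(i + 1);
            boolean dogBetween = false;
            for (int d : dogs) {
                if (d > from && d < to) {
                    dogBetween = true;
                    break;
                }
            }
            if (!dogBetween) {
                remove.add(from);
            }
        }
        return remove;
    }

    // Pass 2: delete the marked cats from back to front so indices stay valid.
    static String removeCats(String text) {
        List<Integer> remove = catsToRemove(findAll(text, "cat"), findAll(text, "dog"));
        StringBuilder sb = new StringBuilder(text);
        for (int i = remove.size() - 1; i >= 0; i--) {
            int start = remove.get(i);
            sb.delete(start, start + "cat".length());
        }
        return sb.toString();
    }
}
```

Calling CatDogTwoPass.removeCats on the sample string removes only the bracketed cat from the question. As written it deletes just the word itself, so the spaces around it are left in place.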