Python - Regular expression - Split string before a word

Posted on 2024-11-24 03:04:23


I am trying to split a string in Python before a specific word. For example, I would like to split the following string before "path:".

  • split string before "path:"
  • input: "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
  • output: ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I have tried:

rx = re.compile("(:?[^:]+)")
rx.findall(line)

This does not split the string anywhere. The trouble is that the text following "path:" is never known in advance, so I can't spell out the whole word to split on. Does anyone know how to do this?

Comments (4)

等风来 2024-12-01 03:04:23


Using a regular expression to split your string seems a bit overkill: the string split() method may be just what you need.

Anyway, if you really need to match a regular expression in order to split your string, you should use the re.split() function, which splits a string wherever a regular expression matches.

Also, use a correct regular expression for splitting:

>>> import re
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the start of the pattern) only when it is followed by the string 'path:'. It does not consume 'path:' itself, so 'path:' stays at the start of the next piece.
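A minimal sketch generalizing the pattern above, assuming the entries might be separated by more than one whitespace character (tabs, double spaces). Here \s+ replaces the literal space, and re.escape() lets the marker word be passed in safely even if it contains regex metacharacters. The function name split_before is hypothetical.

```python
import re
from typing import List


def split_before(text: str, word: str) -> List[str]:
    """Split `text` at every run of whitespace that is immediately followed by `word`."""
    # \s+ consumes the separating whitespace, so it is dropped from the pieces;
    # the lookahead (?=...) requires `word` to follow but leaves it in the next piece.
    # re.escape() guards against regex metacharacters in `word`.
    pattern = r"\s+(?=" + re.escape(word) + r")"
    return re.split(pattern, text)


line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
print(split_before(line, "path:"))
# ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
```

Because \s+ can never match an empty string, no empty piece is produced at the start of the string. A "path:" that is not preceded by whitespace is left unsplit.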

夏雨凉 2024-12-01 03:04:23


You could do ["path:" + s for s in line.split("path:")[1:]] instead of using a regex. (Note that we skip the first element, which is the text before the first "path:" and therefore has no "path:" prefix; here it is an empty string.)
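As written, the one-liner leaves the separating space at the end of every element except the last. It also silently drops any text that comes before the first "path:". A minimal sketch that addresses both, assuming it is acceptable to strip trailing whitespace from each piece (split_keep_word is a hypothetical name):

```python
from typing import List


def split_keep_word(line: str, word: str = "path:") -> List[str]:
    """Split `line` before every occurrence of `word`, keeping `word` on each piece."""
    head, *rest = line.split(word)
    # str.split() leaves the separating space at the end of every piece but the last;
    # rstrip() removes it so the output matches the question exactly.
    parts = [word + chunk.rstrip() for chunk in rest]
    # Unlike slicing with [1:], keep any non-blank text that appears before the first `word`.
    if head.strip():
        parts.insert(0, head.rstrip())
    return parts


line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
print(split_keep_word(line))
# ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
```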

不必了 2024-12-01 03:04:23


This can be done without regular expressions. Given a string:

s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."

We can temporarily replace the desired word with a placeholder. The placeholder is a single character that must not otherwise appear in the string, and we then split on it:

word, placeholder = "path:", "|"
s = s.replace(word, placeholder).split(placeholder)
s
# ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']

Now that the string is split, we can rejoin the original word to each sub-string using a list comprehension:

["".join([word, i]) for i in s if i]
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
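The placeholder round trip is not strictly required, because str.split() already accepts a multi-character separator. s.split(word) therefore yields the same list. It also avoids a failure mode: the placeholder version breaks as soon as "|" occurs in the text. A short check of both points, reusing the answer's sample string (the second string t is made up for illustration):

```python
s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."
word, placeholder = "path:", "|"

# Round trip through a one-character placeholder, as in the answer above.
via_placeholder = s.replace(word, placeholder).split(placeholder)

# str.split() accepts a multi-character separator, so this gives the same list.
direct = s.split(word)
print(via_placeholder == direct)  # True

result = [word + i for i in direct if i]
print(result)
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']

# The two approaches diverge once the placeholder character occurs in the text.
t = "path:a|b path:c"
print(t.replace(word, placeholder).split(placeholder) == t.split(word))  # False
```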

葬花如无物 2024-12-01 03:04:23

in_str = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
in_list = in_str.split('path:')
# Python 3: print() is a function. This prints one comma-separated string, not a list.
print(",path:".join(in_list)[1:])
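That snippet prints one comma-separated string rather than the list the question asks for. The descriptions contain commas of their own ("Alanine, aspartate ..."), so the string cannot be split back on "," without breaking entries apart. A short sketch illustrating this with the same input:

```python
in_str = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
          "path:bte00330 Arginine and proline metabolism")

joined = ",path:".join(in_str.split("path:"))[1:]
print(joined)
# path:bte00250 Alanine, aspartate and glutamate metabolism ,path:bte00330 Arginine and proline metabolism

# Splitting back on "," also cuts inside "Alanine, aspartate ...".
print(len(joined.split(",")))  # 3, not the 2 entries we started with

# Building a list directly avoids the ambiguity.
entries = ["path:" + chunk.rstrip() for chunk in in_str.split("path:")[1:]]
print(len(entries))  # 2
```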