Python - Regular expressions - Split a string before a word

Posted 2024-11-24 03:04:23

I am trying to split a string in Python before a specific word. For example, I would like to split the following string before "path:".

split string before "path:"
input: "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
output: ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I have tried

import re
rx = re.compile("(:?[^:]+)")
rx.findall(line)

This does not split the string where I need it to. The trouble is that the text following "path:" varies, so I can never specify the whole word in advance. Does anyone know how to do this?
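
For reference, a quick check of what the attempted pattern actually returns. Note that "(:?" is an optional colon, not the non-capturing group "(?:", and findall() returns the matched pieces rather than splitting; so the pieces break at every colon instead of before "path:". The sample line is the input from the question.

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# "(:?[^:]+)" captures an optional colon followed by a run of non-colon
# characters, so each piece ends just before the next colon.
rx = re.compile("(:?[^:]+)")
print(rx.findall(line))
# ['path', ':bte00250 Alanine, aspartate and glutamate metabolism path',
#  ':bte00330 Arginine and proline metabolism']
```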

等风来 2024-12-01 03:04:23

Using a regular expression to split your string seems like a bit of overkill: the string split() method may be all you need.

Anyway, if you really do need a regular expression to split your string, use the re.split() function, which splits a string at each match of a regular expression.

Also, use a suitable regular expression for splitting:

>>> import re
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the start of the pattern) only when it is followed by the string 'path:', without consuming the 'path:' itself, so that text stays at the start of the next piece.
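
The pattern above assumes exactly one space before each "path:". As a sketch of a slightly more general variant (the helper name split_before is hypothetical, not from the answer), the lookahead can be used on its own and the surrounding whitespace stripped afterwards. This relies on re.split() accepting a pattern that can match the empty string, which Python supports from 3.7 onwards; re.escape() keeps characters such as "." in the word from being read as regex syntax.

```python
import re
from typing import List


def split_before(text: str, word: str) -> List[str]:
    """Split `text` immediately before each occurrence of `word` (hypothetical helper)."""
    # Zero-width lookahead: matches the position before `word` without consuming it,
    # so `word` stays at the start of each piece. Needs Python 3.7+.
    pattern = "(?=" + re.escape(word) + ")"
    # Strip the separating whitespace and drop empty pieces, such as the ''
    # produced when `text` starts with `word`.
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
print(split_before(line, "path:"))
# ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
```

Splitting on the bare lookahead yields an empty first element when the string starts with the word, which is why empty pieces are filtered out.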

夏雨凉 2024-12-01 03:04:23

Instead of using a regular expression, you can do ["path:"+s for s in line.split("path:")[1:]]. (Note that we skip the first element, which does not have the "path:" prefix.)
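
A runnable version of that one-liner, using the input from the question. It shows that the space separating the records stays at the end of each piece except the last; rstrip() is one way to remove it. Also note that [1:] assumes the string starts with "path:": any text before the first occurrence is silently dropped.

```python
line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# The first element of line.split("path:") is the empty string that precedes
# the leading "path:", so [1:] skips it.
pieces = ["path:" + s for s in line.split("path:")[1:]]
print(pieces)
# ['path:bte00250 Alanine, aspartate and glutamate metabolism ', 'path:bte00330 Arginine and proline metabolism']

# rstrip() removes the trailing separator space so the result matches the desired output.
cleaned = ["path:" + s.rstrip() for s in line.split("path:")[1:]]
print(cleaned)
# ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
```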

不必了 2024-12-01 03:04:23

This can be done without regular expressions. Given a string:

s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."

We can temporarily replace the desired word with a placeholder, a single character that we then split on:

word, placeholder = "path:", "|"
s = s.replace(word, placeholder).split(placeholder)
s
# ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']

Now that the string is split, we can rejoin the original word to each sub-string using a list comprehension:

["".join([word, i]) for i in s if i]
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
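
A possible simplification, offered as a sketch: str.split() already accepts a multi-character separator, so the placeholder step is not strictly needed. The second half uses a made-up string t (illustrative only) to show the risk of the placeholder approach: if the placeholder character already occurs in the text, it causes extra, wrong splits.

```python
word, placeholder = "path:", "|"
s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."

# str.split() takes the multi-character word directly; since s contains no "|",
# this gives exactly the same pieces as replace-then-split.
parts = s.split(word)
assert parts == s.replace(word, placeholder).split(placeholder)
print([word + p for p in parts if p])

# Hypothetical input containing the placeholder character inside a record:
t = "path:a|b path:c"
via_placeholder = [word + p for p in t.replace(word, placeholder).split(placeholder) if p]
direct = [word + p for p in t.split(word) if p]
print(via_placeholder)  # ['path:a', 'path:b ', 'path:c']  (extra split at "|")
print(direct)           # ['path:a|b ', 'path:c']
```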

葬花如无物 2024-12-01 03:04:23

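in_str = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
in_list = in_str.split('path:')
# Rejoin the pieces with ",path:" and drop the leading comma; note that the
# result is a single comma-separated string, not a list.
print(",path:".join(in_list)[1:])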
