How to extract several marked strings from a line (Python)

Posted 2024-08-27 14:11:26 · 410 characters · 6 views · 0 comments

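Hi all,

I have spent quite a lot of time on this problem but still can't find a good way to do it. I am coding in Python, by the way.

Here is a line of text from the file I am working with, for example:

">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."

How can I extract the two strings "ZP_01631227.1" and "Nodularia spumigena CCY9414" from the line?

The paired "|" characters and the brackets act as markers, so we know we want the strings between them...

I suppose I could loop over every character in the line and do it the hard way, but that would take a lot of time. Is there a Python library or some other clever way to do this nicely?

Thanks, everyone!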

呢古 2024-09-03 14:11:26

A concise alternative is a regular expression (for some reason they have a bad reputation in the Python community, but they do provide conciseness and power for simple text handling):

import re
s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
# \|(.*?)\| captures the text between the first pair of pipes;
# .* skips ahead, and \[(.*?)\] captures the text inside the brackets.
mo = re.search(r'\|(.*?)\|.*\[(.*?)\]', s)
if mo:
    thefirst, thesecond = mo.groups()
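
If many lines need the same treatment, the regular-expression approach can be wrapped in a small helper. This is only a sketch: the names `HEADER_RE` and `parse_header` are made up for illustration, and it assumes each line has a "|...|" pair somewhere before a "[...]" pair. Lines that don't fit return None instead of raising.

```python
import re

# Hypothetical names for illustration. The pattern assumes an "|id|" pair
# that appears before a "[organism]" pair on the same line.
# Because ".*" is greedy, the last "[...]" pair on the line is captured.
HEADER_RE = re.compile(r'\|([^|]*)\|.*\[([^\]]*)\]')

def parse_header(line):
    """Return (id, organism) if the line matches the assumed layout, else None."""
    mo = HEADER_RE.search(line)
    return mo.groups() if mo else None

s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(parse_header(s))
```

Compiling the pattern once avoids re-parsing it for every line of the file.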

夏花。依旧 2024-09-03 14:11:26

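>>> with open("file") as f:
...     for line in f:
...         if "|" in line:
...             # text between the first and second "|"
...             whatiwant_1 = line.split("|")[1]
...             if "[" in line:
...                 # text between the first "[" and the following "]"
...                 whatiwant_2 = line.split("[")[1].split("]")[0]
...
>>> print(whatiwant_1, whatiwant_2)
ZP_01631227.1 Nodularia spumigena CCY9414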
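
The same split-based idea can be written as a function that works on one line at a time and does not fail when a marker is missing. This is a rough sketch, not part of the session above: `extract_fields` is a hypothetical name, and returning None for a missing field is an assumption about what the caller wants.

```python
def extract_fields(line):
    """Return (accession, organism); a field is None if its markers are missing."""
    accession = organism = None

    # The accession sits between the first and second "|".
    parts = line.split("|")
    if len(parts) >= 3:  # at least two "|" characters are present
        accession = parts[1]

    # The organism sits between the first "[" and the next "]".
    _, opening, rest = line.partition("[")
    if opening:
        inner, closing, _ = rest.partition("]")
        if closing:
            organism = inner

    return accession, organism

sample = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_fields(sample))
```

`str.partition` splits on the first occurrence only and always returns three parts, so there is no need to check list lengths for the bracket case.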
