How to extract several marked strings from a line (Python)

Posted 2024-08-27 14:11:26

My Friends,

I spent quite some time on this one... but cannot yet figure out a better way to do it. I am coding in python, by the way.

So, here is a line of text in a file I am working with, for example:

">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."

How can I extract the two strings "ZP_01631227.1" and "Nodularia spumigena CCY9414" from the line?

The pairs of "| |" and brackets are like markers so we know we want to get the strings in between the two...

I guess I can probably loop over all the characters in the line and do it the hard way. It just takes so much time... Wondering if there is a python library or other smart ways to do it nicely?

Thanks to all!


Comments (2)

呢古 2024-09-03 14:11:26

One concise alternative is a regular expression (for some reason they have a bad rep in the Python community, but they do provide conciseness and power for simple text handling):

import re
s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
# Group 1: text between the first pair of "|"; group 2: text inside "[...]".
mo = re.search(r'\|(.*?)\|.*\[(.*?)\]', s)
if mo:
    thefirst, thesecond = mo.groups()
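
If the same extraction is needed for many lines, one possible approach is to compile the pattern once and wrap it in a small function. This is only a sketch: the names `HEADER_RE` and `extract_fields` are hypothetical, and the pattern uses negated character classes instead of `.*?`, so it stops at the first `[` after the second `|` (an assumption about what the data looks like).

```python
import re

# Hypothetical helper names; the pattern is an illustrative variant.
# [^|]* matches up to the next "|", [^\[]* skips ahead to the first "[".
HEADER_RE = re.compile(r'\|([^|]*)\|[^\[]*\[([^\]]*)\]')

def extract_fields(line):
    """Return (between_pipes, between_brackets), or None if the markers are missing."""
    mo = HEADER_RE.search(line)
    return mo.groups() if mo else None

s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_fields(s))
```

Returning `None` for lines without both markers lets the caller skip them instead of catching an exception.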
夏花。依旧 2024-09-03 14:11:26
>>> for line in open("file"):
...     if "|" in line:
...         whatiwant_1 = line.split("|")[1]
...         if "[" in line:
...             whatiwant_2 = line.split("[")[1].split("]")[0]
...
>>> print(whatiwant_1, whatiwant_2)
ZP_01631227.1 Nodularia spumigena CCY9414
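
Note that the loop above prints only after it finishes, so it shows the values from the last matching line. A per-line function might be easier to reuse. The following is a sketch using `str.partition`; the function name `extract_with_partition` is hypothetical, and returning `None` for a missing field is a design choice for illustration, not part of the answer above.

```python
def extract_with_partition(line):
    """Return (between_pipes, between_brackets); a missing field is None."""
    # Text between the first and second "|".
    _, sep1, rest = line.partition("|")
    first, sep2, _ = rest.partition("|")
    between_pipes = first if sep1 and sep2 else None

    # Text between the first "[" and the next "]".
    _, lb, rest = line.partition("[")
    inner, rb, _ = rest.partition("]")
    between_brackets = inner if lb and rb else None

    return between_pipes, between_brackets

s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_with_partition(s))
```

Unlike indexing the result of `split`, `partition` always returns three parts, so a line with an unmatched marker yields `None` rather than raising `IndexError`.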