How to select only certain substrings

Posted on 2024-12-19 21:50:06

From a string, say dna = 'ATAGGGATAGGGAGAGAGCGATCGAGCTAG',
I get substrings, say dna.format = 'ATAGGGATAG','GGGAGAGAG'.
I only want to print the substrings whose length is divisible by 3.
How can I do that? I'm using modulo, but it's not working!

import re
if mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
print re.findall("ATA"(.*?)"AGA" , mydna)
if len(mydna)%3 == 0
   print mydna

Corrected code:

import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
re.findall("ATA"(.*?)"AGA" , mydna.format)
if len(mydna.format)%3 == 0:
   print mydna.format

This still doesn't give me the substrings whose length is divisible by three. Any idea what's wrong?

I'm expecting only the substrings whose length is divisible by three to be printed.

Comments (4)

残龙傲雪 2024-12-26 21:50:06

To include overlapping substrings, I have the following longer version. The idea is to find every start and end marker and compute the distance between them.

import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start() and (end.start()-start.start())%3 == 0]
['ATAGGGATAGGGAGA', 'ATAGGGAGA']

To show all substrings, including overlapping ones:

[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start()]
['ATAGGGATAGGGAGA', 'ATAGGGATAGGGAGAGA', 'ATAGGGATAGGGAGAGAGCAGA', 'ATAGGGAGA', 'ATAGGGAGAGA', 'ATAGGGAGAGAGCAGA']
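
The same idea may be easier to read as a small function. This is only a sketch, assuming Python 3; the function name `marked_substrings` and its parameters are made up for illustration and are not part of the original answer.

```python
import re

def marked_substrings(seq, start_mark="ATA", end_mark="AGA", step=3):
    """Return every substring running from a start marker through an end
    marker whose marker-to-marker distance is a multiple of `step`."""
    # Zero-width lookaheads report every occurrence, overlapping ones included.
    starts = [m.start() for m in re.finditer("(?=%s)" % re.escape(start_mark), seq)]
    ends = [m.start() for m in re.finditer("(?=%s)" % re.escape(end_mark), seq)]
    return [
        seq[s:e + len(end_mark)]          # slice includes the end marker
        for s in starts
        for e in ends
        if e > s and (e - s) % step == 0  # keep distances divisible by step
    ]

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
result = marked_substrings(mydna)
print(result)
```

Because both markers here are three letters long, a marker-to-marker distance divisible by 3 also means the whole returned substring has a length divisible by 3.
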
心头的小情儿 2024-12-26 21:50:06

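You can also do this with a regular expression:

re.findall('ATA((?:...)*?)AGA', mydna)

The inner group matches three letters at a time. It is written as non-capturing, (?:...), so that findall returns only the text between the markers rather than a tuple of both groups.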
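
To see what findall actually hands back for this kind of pattern, here is a small comparison, assuming Python 3 and the sample string from the question. The variable names are only illustrative.

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# Two capturing groups: findall returns (outer, inner) tuples, where the
# inner group holds only the last three-letter chunk it matched.
with_inner_capture = re.findall(r'ATA((...)*?)AGA', mydna)

# Non-capturing inner group: findall returns just the text between markers.
between = re.findall(r'ATA((?:...)*?)AGA', mydna)

print(with_inner_capture)
print(between)
```

Note that findall reports non-overlapping matches only, so the second ATA, which sits inside the first match, does not produce a result of its own here.
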

烟火散人牵绊 2024-12-26 21:50:06

Using modulo is the correct procedure. If it's not working, you're doing it wrong. Please provide an example of your code in order to debug it.

卷耳 2024-12-26 21:50:06

re.findall() returns a list of the matching strings. You need to iterate over them and apply the modulo test to the length of each one to get what you want.
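
A minimal sketch of that approach, assuming Python 3 and the pattern from the question; the helper name `divisible_matches` is hypothetical.

```python
import re

def divisible_matches(seq, step=3):
    """Return the text between ATA and AGA for each match whose length
    is a multiple of `step`."""
    # With one capturing group, findall returns a list of that group's text.
    candidates = re.findall(r'ATA(.*?)AGA', seq)
    # The modulo test is applied to each candidate, not to the whole sequence.
    return [s for s in candidates if len(s) % step == 0]

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
for s in divisible_matches(mydna):
    print(s)
```

The code in the question tests len(mydna), the length of the full sequence, which is why it never filters the individual substrings.
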
