Does a regular expression exist for enzymatic cleavage?

Posted on 2024-08-13 17:18:50

Does a regular expression exist for (theoretical) tryptic cleavage of protein sequences? The cleavage rule for trypsin is: after R or K, but not before P.

Example:

Cleavage of the sequence VGTKCCTKPESERMPCTEDYLSLILNR should result in these 3 sequences (peptides):

 VGTK
 CCTKPESER
 MPCTEDYLSLILNR

Note that there is no cleavage after K in the second peptide (because P comes after K).

In Perl (it could just as well have been in C#, Python or Ruby):

  my $seq = 'VGTKCCTKPESERMPCTEDYLSLILNR';
  my @peptides = split /someRegularExpression/, $seq;

I have used this workaround (where a cut marker, =, is first inserted into the sequence after every R or K, and removed again if P immediately follows the cut marker):

  my $seq      = 'VGTKCCTKPESERMPCTEDYLSLILNR';
  $seq         =~ s/([RK])/$1=/g; #Main cut rule.
  $seq         =~ s/=P/P/g;       #The exception.
  my @peptides = split( /=/, $seq);

But this requires modifying the string, which can be very long, and there can be millions of sequences. Is there a way to use a regular expression directly with split? If so, what would the regular expression be?

Test platform: Windows XP 64 bit. ActivePerl 64 bit. From perl -v: v5.10.0 built for MSWin32-x64-multi-thread.

神爱温柔 2024-08-20 17:18:50

You do indeed need to combine a positive lookbehind with a negative lookahead. The correct Perl syntax is as follows:

my @peptides = split(/(?!P)(?<=[RK])/, $seq);


不交电费瞎发啥光 2024-08-20 17:18:50

You could use look-around assertions to exclude those cases. Something like this should work:

split(/(?<=[RK](?!P))/, $seq)


清引 2024-08-20 17:18:50

You can use lookaheads and lookbehinds to match this while still getting the correct position, because both assertions are zero-width and consume no characters.

/(?<=[RK])(?!P)/

This should split at every point that comes after an R or K and is not followed by a P.
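
Since the question notes that the language could just as well be Python, here is a minimal Python sketch of the same zero-width split. The function name `cleave` is hypothetical, and the sketch assumes Python 3.7 or later, where `re.split` accepts patterns that can match an empty string. Unlike Perl's `split`, Python keeps the trailing empty field produced by the match at the end of the sequence, so empty strings are filtered out.

```python
import re

# Cut after R or K (lookbehind), unless the next residue is P (lookahead).
# Both assertions are zero-width, so no residues are consumed by the split.
CUT_SITE = re.compile(r"(?<=[RK])(?!P)")


def cleave(seq):
    """Return the theoretical tryptic peptides of seq (hypothetical helper)."""
    # The pattern also matches at the very end when seq ends in R or K,
    # which yields a trailing empty string; drop empty pieces.
    return [piece for piece in CUT_SITE.split(seq) if piece]


if __name__ == "__main__":
    for peptide in cleave("VGTKCCTKPESERMPCTEDYLSLILNR"):
        print(peptide)
```

For the example sequence from the question this gives VGTK, CCTKPESER and MPCTEDYLSLILNR, with no cut after the K that precedes P.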


寄离 2024-08-20 17:18:50

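In Python, you can use the finditer method to return non-overlapping pattern matches, including their start and span information. You can then store string offsets instead of rebuilding the string.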
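
The offset idea could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the function name `peptide_spans` is hypothetical, and the cut-site pattern is borrowed from the lookaround answers above. Each zero-width match from `finditer` marks a cut position, and the peptides are recorded as (start, end) index pairs into the original sequence, so the sequence itself is never modified or copied.

```python
import re

# Zero-width cut site: after R or K, but not before P.
CUT_SITE = re.compile(r"(?<=[RK])(?!P)")


def peptide_spans(seq):
    """Return (start, end) offsets of tryptic peptides (hypothetical helper)."""
    spans = []
    start = 0
    for match in CUT_SITE.finditer(seq):
        end = match.start()  # position of the cut; the match itself is empty
        if end > start:
            spans.append((start, end))
        start = end
    # Keep whatever follows the last cut site, if anything.
    if start < len(seq):
        spans.append((start, len(seq)))
    return spans


if __name__ == "__main__":
    seq = "VGTKCCTKPESERMPCTEDYLSLILNR"
    for begin, stop in peptide_spans(seq):
        # Slicing happens only here, when a peptide is actually needed.
        print(begin, stop, seq[begin:stop])
```

Storing offsets this way keeps memory use proportional to the number of peptides rather than to their total length, which may matter when there are millions of sequences.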
