How do I properly deobfuscate a Perl script?

Posted on 2025-01-07 10:04:09

I'm trying to deobfuscate the following Perl code (source):

#!/usr/bin/perl
(my$d=q[AA                GTCAGTTCCT
  CGCTATGTA                 ACACACACCA
    TTTGTGAGT                ATGTAACATA
      CTCGCTGGC              TATGTCAGAC
        AGATTGATC          GATCGATAGA
          ATGATAGATC     GAACGAGTGA
            TAGATAGAGT GATAGATAGA
              GAGAGA GATAGAACGA
                TC GATAGAGAGA
                 TAGATAGACA G
               ATCGAGAGAC AGATA
             GAACGACAGA TAGATAGAT
           TGAGTGATAG    ACTGAGAGAT
         AGATAGATTG        ATAGATAGAT
       AGATAGATAG           ACTGATAGAT
     AGAGTGATAG             ATAGAATGAG
   AGATAGACAG               ACAGACAGAT
  AGATAGACAG               AGAGACAGAT
  TGATAGATAG             ATAGATAGAT
  TGATAGATAG           AATGATAGAT
   AGATTGAGTG        ACAGATCGAT
     AGAACCTTTCT   CAGTAACAGT
       CTTTCTCGC TGGCTTGCTT
         TCTAA CAACCTTACT
           G ACTGCCTTTC
           TGAGATAGAT CGA
         TAGATAGATA GACAGAC
       AGATAGATAG  ATAGAATGAC
     AGACAGAGAG      ACAGAATGAT
   CGAGAGACAG          ATAGATAGAT
  AGAATGATAG             ACAGATAGAC
  AGATAGATAG               ACAGACAGAT
  AGACAGACTG                 ATAGATAGAT
   AGATAGATAG                 AATGACAGAT
     CGATTGAATG               ACAGATAGAT
       CGACAGATAG             ATAGACAGAT
         AGAGTGATAG          ATTGATCGAC
           TGATTGATAG      ACTGATTGAT
             AGACAGATAG  AGTGACAGAT
               CGACAGA TAGATAGATA
                 GATA GATAGATAG
                    ATAGACAGA G
                  AGATAGATAG ACA
                GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
             eval $perl;

When run, it prints out Just another genome hacker.

After running the code through Deparse and perltidy (perl -MO=Deparse jagh.pl | perltidy), the code looks like this:

( my $d =
"AA...GCTCACA\n" # snipped double helix part
) =~ s/\s+//g;
(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );
$p = join( $;, keys %a );
while ( $d =~ /([$p]{4})/g ) {
    next if $j++ % 96 >= 16;
    $c = 0;
    foreach $d ( 0 .. 3 ) {
        $c += $a{ substr $1, $d, 1 } * 4**$d;
    }
    $perl .= chr $c;
}

Here's what I've been able to decipher on my own.

( my $d =
"AA...GCTCACA\n" # snipped double helix part
) =~ s/\s+//g;

removes all whitespace in $d (the double helix).

(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );

makes a hash with the keys A, T, C and G and the values 0, 1, 2 and 3.
I normally code in Python, so this translates to a dictionary {'A': 0, 'T': 1, 'C': 2, 'G': 3} in Python.
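
For comparison, here is a minimal sketch of the same mapping written out step by step. The name %digit is my own placeholder for the script's %a; nothing else is assumed beyond the four character codes in the original.

```perl
use strict;
use warnings;

# Same mapping as %a in the obfuscated script, built without map.
my %digit;
my $i = 0;
for my $code (65, 84, 67, 71) {    # ASCII codes of A, T, C, G
    $digit{ chr $code } = $i++;    # chr 65 is 'A', chr 84 is 'T', ...
}

# Sort the keys before printing, because Perl hash order is unspecified.
for my $letter (sort keys %digit) {
    print "$letter => $digit{$letter}\n";
}
```

The values follow the order of the code list (A, T, C, G), not alphabetical order, which is why T ends up as 1 and C as 2.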

$p = join( $;, keys %a );

joins the keys of the hash with $;, the subscript separator for multidimensional array emulation. The documentation says that the default is "\034", the same as SUBSEP in awk, but when I do:

my @ascii = unpack("C*", $p);
print $ascii[1];

I get the value 28? Also, it is not clear to me how this emulates a multidimensional array. Is $p now something like [['A'], ['T'], ['C'], ['G']] in Python?
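
A small sketch of what $; normally does, assuming it still has its default value. The hash %grid and the variable $flat_key are made-up names for illustration only.

```perl
use strict;
use warnings;

# $; defaults to the single character "\034" (octal), which is decimal 28.
printf "ord of \$; is %d (octal %o)\n", ord($;), ord($;);

# Multidimensional emulation: a comma-separated subscript on a plain hash
# is joined with $; into one flat string key.
my %grid;
$grid{ 1, 2 } = 'x';
my $flat_key = join $;, 1, 2;
print "same element\n" if exists $grid{$flat_key};

# In the obfuscated script, join only builds a string; no hash lookup happens.
my $p = join $;, qw(A T C G);
print length($p), "\n";    # 7: four letters plus three separators
```

So $p is not a nested structure at all. It is one flat string, and the separator is only along for the ride.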

    while ( $d =~ /([$p]{4})/g ) {

As long as $d matches ([$p]{4}), execute the code in the while block. But since I don't completely understand what structure $p is, I also have a hard time understanding what happens here.

next if $j++ % 96 >= 16;

Skip to the next iteration if $j modulo 96 is greater than or equal to 16. $j increments with each pass of the while loop (?).

$c = 0;
foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}

For each $d in the range from 0 to 3, some substring is extracted, but at this point I'm completely lost. The last few lines concatenate everything and evaluate the result.

Comments (1)

Anonymous user, 2025-01-14 10:04:09

Caution: don't blindly run obfuscated Perl, especially if there's an eval, backticks, system, open, etc. call somewhere in it, which might not be all that obvious*. De-obfuscating it with Deparse and carefully replacing the evals with print statements is a must until you understand what's going on. Running it in a sandbox, as an unprivileged user, or in a VM should be considered too.

*s&&$_&ee evaluates $_, for instance.


First observation: 034 is octal. It's equal to 28 (dec) or 0x1c (hex), so nothing fishy there.

The $; thing is pure obfuscation; I can't find a reason to use it in particular. $p is just a flat string such as A.T.C.G, with each . standing for $; (whatever its value is) and the letters in whatever order keys returns them.
So in the regex, [$p] matches any one of {'A', 'T', 'C', 'G', $;}. Since $; never appears in $d, it's useless there.
In turn, [$p]{4} matches any sequence of four letters from the above set, as if this had been used (ignoring the useless $;):

while ( $d =~ /([ATCG]{4})/g ) { ... }

If you had to write this yourself, after having removed whitespace, you'd just grab each successive substring of $d of length four (assuming there are no other chars in $d).
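
One way to grab those fixed-width chunks without a regex is unpack. This is only a sketch: the sample string is made up rather than taken from the helix, and @groups is my own name.

```perl
use strict;
use warnings;

my $d = "TAAT CAAT\nTCCT";   # small made-up sample, not the real helix
$d =~ s/\s+//g;              # same whitespace removal as the original

# The template "(A4)*" yields successive 4-character chunks.
my @groups = unpack '(A4)*', $d;
print "@groups\n";           # TAAT CAAT TCCT
```

Unlike the regex, unpack would also return a shorter final chunk if the length were not a multiple of four, so the two are only equivalent on clean input.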

Now this part is fun:

foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}
  • $1 holds the current four-letter codepoint. substr $1, $d, 1 returns each successive letter from that codepoint.
  • %a maps A to 00b (binary), T to 01b, C to 10b, and G to 11b.

    A   00
    T   01
    C   10
    G   11
    
  • multiplying by 4**$d will be equivalent to a bitwise left shift of 0, 2, 4 and 6.

So this funny construct allows you to build any 8-bit value in the base-four system with ATCG as digits!

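That is, it performs the following conversions. The letters above each bit string are listed most significant digit first, so they appear in the reverse of their order in the string:

         A A A A
AAAA -> 00000000

         T A A T
TAAT -> 01000001 -> capital A in ASCII

         T A A C
CAAT -> 01000010 -> capital B in ASCII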

CAATTCCTGGCTGTATTTCTTTCTGCCT -> BioGeek
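
The conversions above can be reproduced with a short sketch that uses left shifts instead of 4**$d. The names %digit, decode_group and decode_dna are hypothetical helpers of mine and do not appear in the original script.

```perl
use strict;
use warnings;

my %digit = (A => 0, T => 1, C => 2, G => 3);

# Decode one four-letter group into a character. The first letter is the
# least significant base-4 digit, matching $c += $a{...} * 4**$d.
sub decode_group {
    my ($group) = @_;
    my $code = 0;
    for my $pos (0 .. 3) {
        $code |= $digit{ substr($group, $pos, 1) } << (2 * $pos);
    }
    return chr $code;
}

# Decode a whole string of letters, four at a time.
sub decode_dna {
    my ($dna) = @_;
    return join '', map { decode_group($_) } $dna =~ /([ATCG]{4})/g;
}

print decode_group('TAAT'), "\n";                         # A
print decode_group('CAAT'), "\n";                         # B
print decode_dna('CAATTCCTGGCTGTATTTCTTTCTGCCT'), "\n";   # BioGeek
```

Because each digit occupies its own two bits, OR-ing the shifted values gives the same result as adding them.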

This part:

next if $j++ % 96 >= 16;

makes the above conversion run only for the first 16 "codepoints", skip the next 80, convert the next 16, skip the next 80, and so on. It essentially just skips parts of the helix drawing (a junk DNA removal system).
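
To see the skip rule in isolation, here is a small sketch on made-up input: 16 groups that should be kept, 80 junk groups, and one more group that starts the next kept run. The list @groups and its contents are invented for the demonstration.

```perl
use strict;
use warnings;

# 16 kept groups, 80 junk groups, then the first group of the next run.
my @groups = (('TAAT') x 16, ('GGGG') x 80, 'CAAT');

# Same test as "next if $j++ % 96 >= 16", written as a filter.
my $j = 0;
my @kept = grep { $j++ % 96 < 16 } @groups;

print scalar(@kept), " kept, last is $kept[-1]\n";   # 17 kept, last is CAAT
```

None of the GGGG groups survive, since their indices 16 to 95 all fail the test, while index 96 wraps back to 0 and is kept.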


Here's an ugly text to DNA converter that you could use to produce anything to replace the helix (doesn't handle the 80 skip thing):

use strict;
use warnings;
my $in = shift;    # text to encode, taken from the first command-line argument

# Base-4 digit => DNA letter (the inverse of %a in the decoder)
my %conv = ( 0 => 'A', 1 => 'T', 2 => 'C', 3 => 'G');

for (my $i=0; $i<length($in); $i++) {
    my $chr = substr($in, $i, 1);
    my $chv = ord($chr);
    my $encoded ="";
    # Emit the four 2-bit digits of the character code, least significant first
    $encoded .= $conv{($chv >> 0) & 0x3};
    $encoded .= $conv{($chv >> 2) & 0x3};
    $encoded .= $conv{($chv >> 4) & 0x3};
    $encoded .= $conv{($chv >> 6) & 0x3};
    print $encoded;
}
print "\n";

$ perl q.pl 'print "BioGeek\n";'
AAGTCAGTTCCTCGCTATGTAACACACACAATTCCTGGCTGTATTTCTTTCTGCCTAGTTCGCTCACAGCGA

Put that string in $d instead of the helix (and remove the skipping part in the decoder).
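
As a sketch of what that could look like, here is the decoder with the skip removed and with print in place of eval, so the payload is shown rather than run. The hash name %digit is mine, and $d holds the encoder output from the example run above.

```perl
use strict;
use warnings;

my %digit = (A => 0, T => 1, C => 2, G => 3);

# Encoder output from the example run, pasted in place of the helix.
my $d = 'AAGTCAGTTCCTCGCTATGTAACACACACAATTCCTGGCTGTATTTCTTTCTGCCTAGTTCGCTCACAGCGA';

my $perl = '';
while ($d =~ /([ATCG]{4})/g) {
    my $c = 0;
    $c += $digit{ substr($1, $_, 1) } * 4**$_ for 0 .. 3;
    $perl .= chr $c;
}

# Inspect the decoded code instead of running it.
print "$perl\n";    # print "BioGeek\n";
```

Only once the printed code looks harmless would it make sense to swap the print back to an eval.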
