Is it possible to write a regular expression that checks whether a string (JavaScript and PHP code) is minified?
Is it possible to write a regular expression that checks whether a string (some code) is minified?
Many PHP/JS obfuscators remove whitespace characters (among other things).
So the final minified code sometimes looks like this:
PHP:
$a=array();if(is_array($a)){echo'ok';}
JS:
a=[];if(typeof(a)=='object'&&(a instanceof Array)){alert('ok')}
In both cases there are no space characters before or after "{", "}", ";", etc. There are also some other patterns that can help. I am not expecting a highly accurate regex; I just need one that checks whether at least 100 characters of a string look like minified code.
Thanks in advance.
Purpose: web malware scanner
Answers (5)
I think a minifier will strip all newline characters, although there might possibly be one at the end of the file still if the minified code was pasted back in a text editor. Something like this will probably be fairly accurate:
That just tests that there are no newline characters in the whole thing except for possibly one at the end. So no guarantees, but I think it will work well on any longish block of code.
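The snippet this answer refers to did not survive on this page. As a rough sketch of what the description implies (no newline anywhere except possibly one at the very end), something like the following JavaScript might do. The name hasNoInnerNewlines and the choice to accept "\r\n" as the trailing newline are assumptions, not the answerer's original code.

```javascript
// Hypothetical helper: true when the text is a single line,
// optionally terminated by one trailing newline (\n or \r\n).
// Without the "m" flag, $ only matches at the very end of the input.
const SINGLE_LINE = /^[^\r\n]*(?:\r?\n)?$/;

function hasNoInnerNewlines(code) {
  return SINGLE_LINE.test(code);
}
```

As the answer says, this guarantees nothing: a short hand-written one-liner passes, and minified code that keeps some line breaks fails.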
The short answer is "no", regex cannot do this.
Your best bet will probably be to do a statistical analysis of the source files, and compare against some known heuristics. For instance, by comparing the variable names against those often found in minimized code. A minimized file probably has a lot of one-character variable names, for instance... and won't have two-character variable names until all the one-character variable names are exhausted... etc.
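One way to sketch the variable-name heuristic, assuming a crude regex tokenizer is good enough for a first pass. The function name shortIdentifierRatio and the small keyword list are illustrative assumptions; string literals and comments are not excluded, so the result is only a rough signal.

```javascript
// Words that are not variable names; deliberately incomplete (assumption).
const KEYWORDS = new Set([
  "if", "else", "for", "while", "do", "in", "of", "var", "let", "const",
  "function", "return", "new", "typeof", "instanceof", "echo", "array",
  "true", "false", "null",
]);

// Fraction of identifier-like tokens that are at most one character long
// (a leading "$" on PHP variables is ignored when measuring the length).
function shortIdentifierRatio(code) {
  const words = code.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) || [];
  const names = words.filter((word) => !KEYWORDS.has(word));
  if (names.length === 0) return 0;
  const short = names.filter((name) => name.replace(/^\$/, "").length <= 1);
  return short.length / names.length;
}
```

A ratio close to 1 on a long file would point toward minified or obfuscated code, but any cutoff would have to be tuned against known samples.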
Another option would be simply to run the source file through a minimizer, and see if the output is sufficiently different from the input. If not, it was probably already minimized.
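A minimal sketch of that comparison, assuming the caller supplies some minify function (string in, string out). The names shrinkRatio and probablyAlreadyMinified, the 0.9 threshold, and the toy stand-in minifier are all hypothetical; a real minifier would be plugged in instead.

```javascript
// Size of the minified output relative to the input (1 means no change).
function shrinkRatio(source, minify) {
  if (source.length === 0) return 1;
  return minify(source).length / source.length;
}

// If minifying barely shrinks the source, it was probably minified already.
// The default threshold is an arbitrary starting point, not a tuned value.
function probablyAlreadyMinified(source, minify, threshold = 0.9) {
  return shrinkRatio(source, minify) >= threshold;
}

// Toy stand-in, NOT a real minifier: it only collapses whitespace runs and
// trims the ends, and it would corrupt whitespace inside string literals.
const toyMinify = (text) => text.replace(/\s+/g, " ").trim();
```

With a real minifier the ratio also reflects renamed variables and removed comments, which makes the signal stronger than whitespace alone.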
But I have to agree with sg3s's final sentence: If you can explain why you need this, we can probably provide more useful answers to your actual needs.
No. The syntax and intent of the code don't change when it is minified, and some people who are very familiar with PHP and/or JS write simple functions on one line without any whitespace at all (me :s).
What you could do is count all the whitespace characters in a string, though this would also be unreliable, since some constructs simply need whitespace, like x instanceof y. Also, not all minified code is crammed into a single line (see jQuery UI), so you can't really count on that either.
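For illustration, counting whitespace might look like the sketch below; whitespaceRatio is a hypothetical name, and no particular cutoff is implied.

```javascript
// Fraction of characters in the string that are whitespace.
function whitespaceRatio(code) {
  if (code.length === 0) return 0;
  const whitespace = code.match(/\s/g) || [];
  return whitespace.length / code.length;
}
```

The instanceof case shows the weakness: "a instanceof Array" cannot lose its two spaces, so even fully minified code never reaches a ratio of exactly zero once such constructs appear.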
Maybe you can explain why you need to know this and we can try and find an alternative?
You can't tell whether it was minified or just written like that by hand (which probably only applies to smaller scripts). But you can check whether it contains any unnecessary whitespace.
Take a look at an open source obfuscator/minifier and see what rules it uses to remove whitespace. Checking whether those rules have been applied should work; if the regex gets too complex, a simple parser might be needed.
Just make sure that string literals like
a="if ( b )"
are excluded.
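A rough sketch of that idea: blank out quoted string literals first, then look for whitespace next to punctuation where it is normally not needed. The names stripStringLiterals and hasUnnecessaryWhitespace and the punctuation set are assumptions; comments, regex literals, template literals and PHP heredocs are not handled.

```javascript
// Replace '...' and "..." literals (with backslash escapes) by an empty
// placeholder so that whitespace inside them is ignored.
function stripStringLiterals(code) {
  return code.replace(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g, '""');
}

// Whitespace directly before or after one of these characters is normally
// not required, so finding it suggests the code was not minified.
const UNNEEDED_WS = /\s[{}()\[\];,=]|[{}()\[\];,=]\s/;

function hasUnnecessaryWhitespace(code) {
  return UNNEEDED_WS.test(stripStringLiterals(code));
}
```

Minified code that keeps line breaks after semicolons would still be flagged here, so the result is a hint rather than a verdict.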
Run it through a parser for that particular language (even a prettifier might work fine) and modify it to count the number of unused characters. Use the percentage of unused chars vs. number of chars in documents as a test for minification. I don't think you can do this accurately with regex, although counting whitespace vs. document content might be okay.