How can I strip comments from JavaScript code with this preg_replace?

Posted on 2024-10-19 17:39:11

I'm trying to strip the // comments from my JavaScript with PHP's preg_replace(), and I wrote a pattern that should do the following:

1. When a comment starts on a new line, delete that entire line:
// COMMENTS .....

2. When a comment follows code on the same line (after one TAB or one space, for example), remove only the comment part:
exampleScript(); // (1space) comments

3. Don't match the // in http://

This preg_replace does the above job. However, it currently also truncates three kinds of code that contain // and that it should skip (see the false matches listed below).

$buffer = preg_replace('/(?<!http:)\/\/\s*[^\r\n]*/', '', $buffer);

good matches

//something

// something *!&~@#^hjksdhaf

function();// comment

false matches

(/\/\.\//)
"//"  
"://"  

So, how can I filter out these three false matches, and how should I change the regex below?

(?<!http:)\/\/\s*[^\r\n]*

PS: I don't want to use someone else's code minifier or framework, with the overhead that comes with it. Just my own code for now.
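
The behavior described above can be reproduced with a short sketch. The pattern is the one from the question; the surrounding statements (var url, var re, var s, var t) are hypothetical wrappers added here only so that each false match sits in a realistic line.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/(?<!http:)\/\/\s*[^\r\n]*/';

// The first three lines should lose only their comment (or stay intact).
// The last three contain the false matches, wrapped in hypothetical statements.
$samples = [
    '//something',
    'function();// comment',
    'var url = "http://example.com";',
    'var re = (/\/\.\//);',
    'var s = "//";',
    'var t = "://";',
];

$results = [];
foreach ($samples as $line) {
    $results[$line] = preg_replace($pattern, '', $line);
}

// Print each input next to what is left after the replacement.
foreach ($results as $before => $after) {
    printf("%-34s => %s\n", $before, var_export($after, true));
}
```

The first three lines come out as intended, while each of the last three is cut off at its //, because the pattern has no way of knowing that it is inside a regex literal or a string.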

Comments (3)

仅此而已 2024-10-26 17:39:11

Why not use a preexisting JavaScript minifier, like the YUI Compressor (PHP bindings here)?


If you are really set on writing your own, have a look through the source code to see how it's done.
Short version: The Right Way is to use a proper parser/tokenizer approach.

剑心龙吟 2024-10-26 17:39:11

The grammar of JavaScript is essentially a context-free grammar, not a regular one, so the language as a whole cannot be parsed with regular expressions.

In formal language theory, the pumping lemma for regular languages is the standard tool for showing this: it proves that some context-free languages, such as arbitrarily deep nested brackets, cannot be matched by any regular expression.

The gist of the problem is this: you can't just look for the string //, because it could be contained inside otherwise valid code, for example, a string. You can't just look for a // between two quotation marks either, because then you'd get false positives like alert('no!') // can't do it, where the text ) // can is technically contained between two ' marks. Instead, you have to detect where strings begin and end. Worse, one type of quote can appear inside a string delimited by the other type, and strings (even half-open strings) can appear inside comments!

There is no simple general solution -- JavaScript syntactic elements like strings, brackets, parentheses, etc., can be nested arbitrarily many levels deep. The only way to accurately detect where any syntactic element begins and ends is to correctly parse all the syntactic elements that you might encounter along the way.

The correct answer is to use an actual parser.
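
A minimal sketch of that idea, written in PHP since the question uses it. This is a hand-written scanner that tracks whether it is inside a string, a block comment, or a regex literal before treating // as a comment. It is not a full JavaScript parser. The function names stripLineComments and skipDelimited are hypothetical, and the sketch assumes a simple heuristic for regex literals (a / that follows an operator or opening bracket). It does not handle ${...} nesting in template literals, an unescaped / inside a regex character class, or a regex that follows a keyword such as return.

```php
<?php
// Hypothetical helper: returns the index just past the closing delimiter,
// honoring backslash escapes. $start points at the opening delimiter.
function skipDelimited(string $src, int $start, string $delim): int
{
    $len = strlen($src);
    for ($i = $start + 1; $i < $len; $i++) {
        if ($src[$i] === '\\') {
            $i++; // skip the escaped character
        } elseif ($src[$i] === $delim) {
            return $i + 1;
        }
    }
    return $len; // unterminated: treat the rest of the input as the literal
}

// Hypothetical helper: removes // comments that occur in code position only.
function stripLineComments(string $src): string
{
    $out = '';
    $len = strlen($src);
    $i = 0;
    $prev = ''; // last non-whitespace character copied to the output

    while ($i < $len) {
        $c = $src[$i];
        $next = $i + 1 < $len ? $src[$i + 1] : '';

        // String and template literals are copied through untouched.
        if ($c === '"' || $c === "'" || $c === '`') {
            $end = skipDelimited($src, $i, $c);
            $out .= substr($src, $i, $end - $i);
            $i = $end;
            $prev = $c;
            continue;
        }
        // Block comments are kept as they are; only // comments are removed.
        if ($c === '/' && $next === '*') {
            $close = strpos($src, '*/', $i + 2);
            $end = $close === false ? $len : $close + 2;
            $out .= substr($src, $i, $end - $i);
            $i = $end;
            continue;
        }
        // Line comment: drop it and the spaces or tabs in front of it,
        // but keep the line break that ends it.
        if ($c === '/' && $next === '/') {
            $out = rtrim($out, " \t");
            $i += strcspn($src, "\r\n", $i);
            continue;
        }
        // Heuristic (an assumption): '/' after an operator or opening bracket
        // starts a regex literal; otherwise it is treated as division.
        if ($c === '/' && ($prev === '' || strpos('(,=:[!&|?{};+-*%<>~^', $prev) !== false)) {
            $end = skipDelimited($src, $i, '/');
            $out .= substr($src, $i, $end - $i);
            $i = $end;
            $prev = '/';
            continue;
        }
        $out .= $c;
        if (strpos(" \t\r\n", $c) === false) {
            $prev = $c;
        }
        $i++;
    }
    return $out;
}
```

Called as $buffer = stripLineComments($buffer);, this leaves "//", "://" and (/\/\.\//) alone while still removing the comment in function();// comment. A comment that fills a whole line leaves an empty line behind, which would need a separate cleanup pass.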

你如我软肋 2024-10-26 17:39:11
$buffer = preg_replace('/(?<!\S)\/\/\s*[^\r\n]*/', '', $buffer);

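This works on the instances mentioned in the question whenever the comment starts a line or is preceded by whitespace: it keeps the positive matches and skips the false matches. A comment with no whitespace in front of it, such as function();// comment, is left in place.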
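
A quick sketch for checking what this pattern does and does not catch. The wrapper stripWithLookbehind, the $tweaked variant and the sample lines are hypothetical and were added for illustration; only $pattern comes from the answer.

```php
<?php
// The pattern from the answer, unchanged.
$pattern = '/(?<!\S)\/\/\s*[^\r\n]*/';

// Hypothetical wrapper so each case below is a one-liner.
function stripWithLookbehind(string $js, string $pattern): string
{
    return preg_replace($pattern, '', $js);
}

// Comments at the start of a line or after whitespace are removed.
$a = stripWithLookbehind('//something', $pattern);
$b = stripWithLookbehind("exampleScript();\t// comment", $pattern);

// The three false matches from the question are left alone.
$c = stripWithLookbehind('var re = (/\/\.\//);', $pattern);
$d = stripWithLookbehind('var s = "//"; var t = "://";', $pattern);

// Limitation 1: no whitespace before //, so the comment stays.
$e = stripWithLookbehind('function();// comment', $pattern);

// Limitation 2: a // after a space inside a string is still treated as a comment.
$f = stripWithLookbehind('var msg = "see // here";', $pattern);

// Limitation 3: \s* also matches line breaks, so an empty comment
// swallows the following line of code.
$g = stripWithLookbehind("a(); //\nb();\n", $pattern);

// Assumed tweak (not from the answer): dropping \s* avoids limitation 3,
// since [^\r\n]* already covers spaces and tabs.
$tweaked = '/(?<!\S)\/\/[^\r\n]*/';
$h = stripWithLookbehind("a(); //\nb();\n", $tweaked);

foreach (compact('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h') as $name => $value) {
    echo $name, ' => ', var_export($value, true), "\n";
}
```

The lookbehind only looks at the single character before //, so it cannot tell a comment from a // that follows a space inside a string. Telling those apart takes a scanner that tracks string boundaries.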

Three awesome websites on the net that help with finding the correct regex:

http://gskinner.com/RegExr/

http://lumadis.be/regex/test_regex.php

http://cs.union.edu/~hannayd/csc350/simulators/RegExp/reg.htm
