Getting Google Website Optimizer JavaScript code to validate with HTMLParser

Posted on 2024-08-06 22:13:32

I'm trying to include the Google Website Optimizer JavaScript code, below, in a Zope3 page template. It's used for A/B testing.

However, the template HTML parser, which I believe is the standard Python HTMLParser module, throws the following error:

raise PTRuntimeError(str(self._v_errors))
- Warning: Compilation failed
- Warning: <class 'HTMLParser.HTMLParseError'>: bad end tag: u"</sc'+'ript>", at line 45, column 44
PTRuntimeError: ['Compilation failed', '<class \'HTMLParser.HTMLParseError\'>: bad end tag: u"</sc\'+\'ript>", at line 45, column 44']

As I see it, I have two options:

  • Rewrite the code so it passes (my JS-fu is weak, so I have no idea where to start).

  • Make HTMLParser ignore the code. I've tried CDATA tags with no success. I've also tried putting the JS in an external file and linking to it, but this seems to break the optimizer functionality.

The suspect code:

<!-- Google Website Optimizer Control Script -->
<script>
<![CDATA[
function utmx_section(){}function utmx(){}
(function(){var k='1010538027',d=document,l=d.location,c=d.cookie;function f(n){
if(c){var i=c.indexOf(n+'=');if(i>-1){var j=c.indexOf(';',i);return c.substring(i+n.
length+1,j<0?c.length:j)}}}var x=f('__utmx'),xx=f('__utmxx'),h=l.hash;
d.write('<sc'+'ript src="'+
'http'+(l.protocol=='https:'?'s://ssl':'://www')+'.google-analytics.com'
+'/siteopt.js?v=1&utmxkey='+k+'&utmx='+(x?x:'')+'&utmxx='+(xx?xx:'')+'&utmxtime='
+new Date().valueOf()+(h?'&utmxhash='+escape(h.substr(1)):'')+
'" type="text/javascript" charset="utf-8"></sc'+'ript>')})();
]]>
</script><script>utmx("url",'A/B');</script>
<!-- End of Google Website Optimizer Control Script -->

Comments (4)

木落 2024-08-13 22:13:32

Given the parser's weakness, you could try breaking up the parts of the CDATA that it's trying to interpret as tags: e.g., where you now have </sc'+'ript>', try <'+'/sc'+'ript>', and so on. (+ does string concatenation in JS, just like in Python, so it will put the tags you break up this way back together again, just like the tags that are already broken up in the original.)

If that keeps giving parse errors, lose the CDATA and change every < into &lt; and every > into &gt;. I'm not sure whether JS will handle that, but it's worth a try... good luck!
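
The tag-splitting suggestion can be sketched as follows. This is only an illustration, assuming the parser trips on the literal "</" sequence inside the script body; buildScriptTag is a hypothetical helper, not part of Google's snippet, and the URL is shortened for the example.

```javascript
// Hypothetical helper (not part of Google's control script): builds the same
// tag string the control script hands to document.write, but keeps '<' apart
// from '/' and from the tag name in the source text.
function buildScriptTag(src) {
  var open = '<' + 'sc' + 'ript src="' + src +
             '" type="text/javascript" charset="utf-8">';
  var close = '<' + '/sc' + 'ript>';
  return open + close;
}

// Shortened example URL; the real script appends the utmx query parameters.
var tag = buildScriptTag('http://www.google-analytics.com/siteopt.js?v=1');

// In the page this string would be written out with: document.write(tag);
```

Because + joins the pieces at runtime, the browser should receive exactly the same markup as before; only the text seen by the template parser changes.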

日暮斜阳 2024-08-13 22:13:32

Actually, this problem is trivial to solve by avoiding the HTML parser altogether. Simply put the Google Control Script into a separate file or object as JavaScript and pull it into the page with a TAL include.

The HTML parser is not used when the file is brought in as JavaScript.

━╋う一瞬間旳綻放 2024-08-13 22:13:32

My guess is the parser doesn't like the fact that

</sc'+'ript> 

is split in two, which is perfectly valid JavaScript but may confuse HTMLParser.

You might want to try

<'+'/sc'+'ript>'

甲如呢乙后呢 2024-08-13 22:13:32

One other option you have is to place the code in an external file and reference it instead of embedding it directly in the page. I've done this and it works well. That's always an easier way if you don't want the validator to crawl any JavaScript or CSS.
