Handling GZip-encoded content transparently with WWW::Mechanize

Posted 2024-07-19 08:03:43

I am using WWW::Mechanize and currently handling HTTP responses with the 'Content-Encoding: gzip' header in my code by first checking the response headers and then using IO::Uncompress::Gunzip to get the uncompressed content.
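For reference, the manual approach described above might look roughly like the sketch below. It is not the asker's actual code: the helper name `gunzipped_body` is hypothetical, and only a plain `gzip`/`x-gzip` Content-Encoding is handled. It returns the uncompressed body but leaves the response object unchanged, so Mechanize's own methods never see the result.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return the uncompressed body of an HTTP::Response
# if it is gzip-encoded, otherwise return the body as-is.
# The response object itself is NOT modified.
sub gunzipped_body {
    my ($res) = @_;
    my $body = $res->content;
    my $enc  = lc( $res->header('Content-Encoding') // '' );
    return $body unless $enc eq 'gzip' || $enc eq 'x-gzip';

    my $plain;
    gunzip( \$body => \$plain )
        or die "gunzip failed: $GunzipError";
    return $plain;
}
```

Typical use would be `my $html = gunzipped_body( $mech->response );`, after which the string has to be parsed by hand.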

However, I would like to do this transparently, so that WWW::Mechanize methods like form(), links() etc. work on and parse the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use the LWP::UserAgent handlers to do this.

While I have been partly successful (I can print the uncompressed content, for example), I am unable to do this transparently in a way that lets me call

    $mech->forms();

In summary: How do I "replace" the content inside the $mech object so that from that point onwards, all WWW::Mechanize methods work as if the Content-Encoding never happened?

I would appreciate your attention and help.
Thanks

萧瑟寒风 2024-07-26 08:03:43

It looks to me like you can replace it using the $res->content( $bytes ) method.

By the way, I found this stuff by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.
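Building on that, here is a minimal sketch of the transparent version the question asks for: register a `response_done` handler (LWP::UserAgent's `add_handler`, inherited by WWW::Mechanize) that swaps the compressed body for the uncompressed bytes via `$res->content( $bytes )` and drops the `Content-Encoding` header. The names `gunzip_in_place` and `install_gunzip_handler` are hypothetical, only plain gzip is handled, and it assumes Mechanize parses the response after LWP's `response_done` handlers have run.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: if $res is gzip-encoded, replace its body with the
# uncompressed bytes (via $res->content($bytes)) and drop the encoding
# header, so anything that parses $res afterwards sees plain content.
sub gunzip_in_place {
    my ($res) = @_;
    my $enc = lc( $res->header('Content-Encoding') // '' );
    return unless $enc eq 'gzip' || $enc eq 'x-gzip';

    my $compressed = $res->content;
    my $plain;
    gunzip( \$compressed => \$plain )
        or die "gunzip failed: $GunzipError";

    $res->content($plain);                  # swap in the uncompressed body
    $res->remove_header('Content-Encoding');
    $res->content_length( length $plain );  # keep Content-Length consistent
    return;
}

# Hypothetical installer: response_done callbacks receive
# ($response, $ua, $handler); only the response is used here.
sub install_gunzip_handler {
    my ($mech) = @_;
    $mech->add_handler( response_done => sub { gunzip_in_place( $_[0] ) } );
}
```

Usage would then be `install_gunzip_handler($mech); $mech->get($url); my @forms = $mech->forms;`. If your HTTP::Message version provides a `decode` method, it performs this kind of in-place replacement for every encoding it supports, so the handler body could simply call `$_[0]->decode`.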

幸福不弃 2024-07-26 08:03:43

It is built into LWP::UserAgent, and thus into WWW::Mechanize. One MAJOR caveat to save you some hair:

- To debug, make sure you check $@ for an error after the call to decoded_content:

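    my $r    = $mech->response;      # the HTTP::Response Mechanize just got
    my $html = $r->decoded_content;  # undef on failure, reason left in $@
    die $@ if $@;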
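To see this behaviour without touching the network, the sketch below builds a gzip-encoded HTTP::Response by hand and decodes it with `decoded_content`. It uses the `raise_error` option, which makes `decoded_content` die with the reason instead of returning undef and leaving that reason in `$@`. The sample HTML and the helper name `decode_or_die` are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: decode the body (Content-Encoding + charset) and die
# with the real reason if decoding fails, instead of silently getting undef.
sub decode_or_die {
    my ($res) = @_;
    # raise_error => 1 makes decoded_content die rather than return undef
    return $res->decoded_content( raise_error => 1 );
}

# Build a gzip-encoded response by hand (no network needed).
my $html = '<html><body><form name="login"></form></body></html>';
my $gz;
gzip( \$html => \$gz ) or die "gzip failed: $GzipError";
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Encoding' => 'gzip' ],
    $gz,
);

# decoded_content returns a decoded copy; $res->content stays compressed.
my $decoded = decode_or_die($res);
```

With Mechanize, the same helper would be called as `decode_or_die( $mech->response )`.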

Better yet, look through the source of HTTP::Message and make sure all the support packages are installed.
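One way to follow that advice is a small diagnostic that tries to load the modules involved and reports the same `$@` text that `decoded_content` would otherwise swallow. The module list is an assumption based on this thread and on what gzip decoding needs: `IO::HTML` (the module missing in the case below, which HTTP::Message appears to use for charset sniffing), `IO::Uncompress::Gunzip`, and `Encode`; check HTTP::Message's source for the full set your version uses. `HTTP::Message::decodable()` reports which Content-Encodings the installed modules can actually decode. The helper name `check_support_modules` is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Message;

# Modules assumed relevant to decoded_content (not an exhaustive list).
my @support = qw(IO::HTML IO::Uncompress::Gunzip Encode);

# Hypothetical helper: try to load each module and return a hash of
# module name => error text ('' when it loaded fine).
sub check_support_modules {
    my %status;
    for my $mod (@_) {
        ( my $file = "$mod.pm" ) =~ s{::}{/}g;
        $status{$mod} = eval { require $file; 1 } ? '' : $@;
    }
    return \%status;
}

my $status = check_support_modules(@support);
for my $mod ( sort keys %$status ) {
    printf "%-25s %s\n", $mod,
        $status->{$mod} eq '' ? 'ok' : "MISSING: $status->{$mod}";
}

# Content-Encodings decoded_content can handle with the modules installed.
print "decodable: ", scalar HTTP::Message::decodable(), "\n";
```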

In my case, decoded_content returned undef while content was still the raw binary, and I went on a wild goose chase. LWP (through HTTP::Message) sets the error in $@ when it fails to decode, but Mechanize just ignores it (it doesn't check or log the incident as its own error or warning).

In my case, $@ said: "Can't find IO/HTML.pm ..". The require happens inside an eval, so nothing was raised.

After having to dive into the source, I found that the built-in decoding process is long, meticulous, and arduous, covering just about every scenario and making tons of guesses (thank you, Gisle!).

If you are paranoid, explicitly set the default Accept-Encoding header to be used with every request in new():

    use WWW::Mechanize;
    my $browser = WWW::Mechanize->new(
        default_headers => HTTP::Headers->new(
            'Accept-Encoding' => scalar HTTP::Message::decodable() ) );