How can I make Mechanize automatically convert the body to UTF-8?
I found some solutions using post_connect_hooks and pre_connect_hooks, but they don't seem to work. I'm using the latest Mechanize version (2.1). There are no [:response] fields in the new version, and I don't know where to get them now.
- https://gist.github.com/search?q=pre_connect_hooks
- https://gist.github.com/search?q=post_connect_hooks
Is it possible to make Mechanize return a UTF-8 encoded version, instead of having to convert it manually using iconv?
Comments (4)
Since Mechanize 2.0, the arguments of pre_connect_hooks() and post_connect_hooks() have changed. See the Mechanize documentation:
You can no longer change the internal response-body value, because the argument is not an array. So the next best approach is to replace the internal parser with your own:
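As a rough illustration of the parser-replacement idea (a sketch, not necessarily the code this answer originally showed): the class name Utf8Parser and its constructor arguments are made up for this example. It assumes Mechanize calls parse(body, url, encoding) on whatever object is assigned to agent.html_parser, and that Nokogiri::HTML is used as the underlying parser.

```ruby
# Hypothetical wrapper (the name Utf8Parser is an assumption, not a Mechanize class).
# It transcodes the raw response body to UTF-8 and then delegates to a real parser.
class Utf8Parser
  # inner:           any object responding to parse(body, url, encoding, ...)
  # source_encoding: the encoding the site actually serves, e.g. 'Windows-1252'
  def initialize(inner, source_encoding)
    @inner = inner
    @source_encoding = source_encoding
  end

  def parse(body, url = nil, encoding = nil, *rest, &block)
    # Reinterpret the bytes in the source encoding, then transcode to UTF-8.
    # Undefined or invalid bytes are replaced instead of raising.
    utf8 = body.dup
               .force_encoding(@source_encoding)
               .encode('UTF-8', invalid: :replace, undef: :replace)
    # The encoding passed in by the caller is ignored on purpose:
    # after transcoding, the body is always UTF-8.
    @inner.parse(utf8, url, 'UTF-8', *rest, &block)
  end
end

# Assumed wiring with Mechanize and Nokogiri (both gems required, not run here):
#   agent = Mechanize.new
#   agent.html_parser = Utf8Parser.new(Nokogiri::HTML, 'Windows-1252')
#   page = agent.get('http://example.com/')
```

Wrapping the parser rather than hard-coding Nokogiri keeps the conversion step easy to check on its own, and the source encoding still has to be known or detected per site.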
I found a solution that works pretty well:
I haven't run into any issues with it so far.
In your script, just enter:
page.encoding = 'utf-8'
However, depending on your scenario, you may instead need to enter the encoding of the website Mechanize is working with. To find it, open the website in Firefox, select Tools in the menu bar, and open Page Info. The page's encoding is listed there.
Using that information, you would enter the encoding the page is actually served in (for example, page.encoding = 'windows-1252').
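As a possible alternative to looking up the encoding by hand in Firefox, the charset can often be read from the Content-Type response header. The sketch below uses only the Ruby standard library; the helper names charset_from_content_type and to_utf8 are hypothetical, and the header value is made-up sample data. It also shows that String#encode (Ruby 1.9+) can do the conversion without iconv.

```ruby
# Hypothetical helper: extract the charset from a Content-Type header value,
# falling back to a default when none is declared.
def charset_from_content_type(header, default = 'UTF-8')
  match = header.to_s.match(/charset\s*=\s*["']?([^\s;"']+)/i)
  match ? match[1] : default
end

# Hypothetical helper: reinterpret the raw bytes in the declared encoding,
# then transcode them to UTF-8, replacing bytes that cannot be converted.
def to_utf8(raw_body, source_encoding)
  raw_body.dup
          .force_encoding(source_encoding)
          .encode('UTF-8', invalid: :replace, undef: :replace)
end

header  = 'text/html; charset=windows-1252' # sample header, assumed for illustration
charset = charset_from_content_type(header)
puts charset
puts to_utf8("caf\xE9".b, charset)

# With Mechanize, the detected value could then be assigned as described above:
#   page.encoding = charset
```

A server can declare the wrong charset or none at all, so this is only a starting point; a meta tag in the HTML may still override it.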
How about something like this: