How can I download Yahoo Groups?

Posted 2024-07-15 10:44:18

I want to download some Yahoo Groups (files, photos, messages, memberlist) and I've found these scripts:

I've downloaded ActivePerl and the needed modules from CPAN (nothing fancy; they're very easy to find). I've managed to install them, but when I run the script I get an error after it tells me that I've successfully logged in:
"Use of uninitialized value $cells in pattern match (m//) at yahoogroups_files.pl line 244, line 2."

I'm guessing that Yahoo changed the page layout or something similar, but I'm not able to update the script myself. I'm a newbie when it comes to Perl and to understanding how Yahoo generates its pages; I only know some basic C++. I want to mention that I'm not lazy and will try to fix it myself, but I need your help: hints, advice, anything.

PS: I've contacted the author, but he isn't willing to update the scripts.

Comments (7)

巷子口的你 2024-07-22 10:44:18

You would need knowledge in the following areas:

  • use of an HTML parser

  • HTTP knowledge (GET/POST/HEAD)

  • web scraping

I suggest you focus on WWW::Mechanize, since it is capable of all these things (and more).

EDIT: Another solution (that doesn't need programming) is this: log in to Yahoo Groups with your browser, store the cookie, and then run wget, passing the stored cookie file as a parameter. This way you'll get the task accomplished very fast.

Find your browser's cookies.txt file on your hard drive, and then call wget like this (if I remember the options correctly):

wget --load-cookies path_to_cookie_file -r -w 60 website

The full man page can be found here
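
For anyone who would rather script the cookie approach than use wget, here is a minimal sketch in Python (not Perl, and not part of the original scripts). It assumes the browser exported its cookies in the Netscape cookies.txt format, which is the format wget's --load-cookies expects; the function name opener_from_cookie_file is made up for illustration.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(cookie_path):
    """Build a URL opener that sends the cookies stored in a cookies.txt file.

    Assumption: the file is in Netscape format (the format wget reads).
    """
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session and expired cookies too, so the saved login is not dropped.
    jar.load(cookie_path, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Calling opener.open(url) on the returned opener then sends the stored login cookies with each request, much like wget does after --load-cookies. Remember to pause between requests, as the -w 60 option does.
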

EDIT2: Another option is to use WebDriver to automate Firefox. You can use this article as a guide on how to accomplish this.

萌梦深 2024-07-22 10:44:18

Judging by the filename, I'm assuming you're using the Yahoo Group archiver found here: http://sourceforge.net/projects/grabyahoogroup/

I ran the files script against the SubEthaEdit group and it works great. All of the files downloaded without incident.

Looking at the code, it seems to barf while processing an HTML table in a while loop if $cells is empty.

Considering the code did work when I tested it, it's possible there's something going on with the listing of that group's files. You'll want to try outputting $content and figure out where and why the regular expression on line 243 isn't able to process that HTML.
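
To illustrate the kind of check I mean, here is a rough Python analogue (the real script is Perl, and its actual regular expressions are not reproduced here). The two patterns and the function name extract_links are hypothetical stand-ins. The idea is simply to test whether a match succeeded before using the result, and to keep the rows that failed so you can inspect them.

```python
import re

# Hypothetical patterns, standing in for whatever the script uses.
ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.S | re.I)
LINK_RE = re.compile(r'<a\s+href="([^"]+)"', re.I)


def extract_links(content, dump_path=None):
    """Return (links, skipped_rows) for the table rows found in content."""
    links, skipped = [], []
    for row in ROW_RE.findall(content):
        match = LINK_RE.search(row)
        if match is None:
            # This is the "uninitialized value" case: nothing matched,
            # so record the row instead of using an empty result.
            skipped.append(row)
            continue
        links.append(match.group(1))
    if skipped and dump_path:
        # Save the whole page so the unmatched HTML can be examined.
        with open(dump_path, "w", encoding="utf-8") as out:
            out.write(content)
    return links, skipped
```

Looking at the skipped rows (or the dumped page) should show what the group's listing contains that the pattern does not expect.
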

EDIT: If you don't mind posting the group this is happening with, I'm sure I or someone else here can try it out and troubleshoot it. It's tough to pinpoint what's up when the issue can't be reproduced. Also, try the same group I did and see if it works for you. If it does, something is certainly up with the group you're trying.

能怎样 2024-07-22 10:44:18

Dunno if it will help you, but here's what I did to get the message-download working:

http://sourceforge.net/forum/forum.php?thread_id=3283915&forum_id=209170

(I only used message-download, I didn't look at file-download)

晨光如昨 2024-07-22 10:44:18

I was tinkering with this a while ago to back up my girlfriend's group messages and files from uni. While debugging the latest scripts, I found what seems to be a bug in the $group_domain declaration (there's also a group declaration bug that I found in yahoo2maildir.pl of the same project; see $request):

($group_domain) = $url =~ /\/\/(.*?groups.yahoo.com)\//;

In this case, I overwrote the $request variable in sub download_folder(), changing it from

$request = GET "http://$group_domain/group/$group/files$sub_folder/";

to

$request = GET "http://groups.yahoo.com/group/$user_group/files$sub_folder/";

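
To show what that pattern does when it fails to match, here is a small Python sketch (not the script's Perl). It assumes the problem is that the match finds nothing for some URLs, which leaves the domain undefined; the fallback host and the function names are my own illustration, and the dots are escaped here so they match literally.

```python
import re

# Same idea as the script's pattern, with the dots escaped.
GROUP_DOMAIN_RE = re.compile(r"//(.*?groups\.yahoo\.com)/")
DEFAULT_DOMAIN = "groups.yahoo.com"  # assumed fallback, as in the fix above


def group_domain(url):
    """Return the groups host found in url, or the default if none matches."""
    match = GROUP_DOMAIN_RE.search(url)
    return match.group(1) if match else DEFAULT_DOMAIN


def files_url(url, group, sub_folder=""):
    """Build the files listing URL in the same shape as the script's request."""
    return f"http://{group_domain(url)}/group/{group}/files{sub_folder}/"
```

A URL without a slash after the host, for example, does not match the pattern, so without the fallback the request would be built with an empty host.
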
明月夜 2024-07-22 10:44:18

grabyahoogroup works well in its latest version, which can be found in the SVN repo:

http://grabyahoogroup.svn.sourceforge.net/viewvc/grabyahoogroup/trunk/yahoo_group/

The version at sourceforge.net/projects/grabyahoogroup/files/ has bugs and did not work for me.

罪#恶を代价 2024-07-22 10:44:18

I've been looking for a tool that collects messages/conversations from Yahoo! Groups. After struggling to make my own and searching everywhere on the internet, I finally found this tool, which converts your Yahoo! Groups messages into MBOX format.

Download tools

Both of the following are Google Chrome extensions.

Plain string to Base64 binary data

Sometime after September 16, 2010 (at least for me), the retrieved messages were no longer plain text but Base64-encoded data (ASCII). Using this Swiss converter tool lets you read the data in its original form.

Sample content from the MBOX format

VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=

Sample result after conversion

The quick brown fox jumps over the lazy dog.
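
If you'd rather not install an extension, the same conversion can be done with a few lines of Python. This is only a sketch: it assumes the message body is Base64-encoded UTF-8 text, and the function name decode_body is made up for illustration.

```python
import base64


def decode_body(encoded, charset="utf-8"):
    """Decode a Base64-encoded message body back into readable text."""
    raw = base64.b64decode(encoded)
    # errors="replace" keeps going if the charset guess is wrong.
    return raw.decode(charset, errors="replace")


sample = "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4="
print(decode_body(sample))
```

Run on the sample above, it prints the same sentence shown in the converted result.
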
