Using PHP to scrape and use data from a website that requires login (Reddit)?

Posted 2024-08-21 19:10:58 · 261 characters · 6 views · 0 comments

I would like to create a webpage that, given two reddit usernames and their passwords, subscribes user2 to all of the subreddits that user1 is subscribed to. So I need to:

  1. Get the subreddits that user1 is subscribed to.
  2. Subscribe user2 to those subreddits.

I have experience using PHP, but none with scraping (especially when the user must be logged in) or with submitting the kind of information needed to "subscribe" a user to a subreddit. Does anyone have any ideas on how this can be done?

Regards,

Tim

Comments (2)

硬不硬你别怂 2024-08-28 19:10:58

Assuming this isn't against reddit's terms of service, you could use cURL to log in and then probably extract the necessary information with a regex fairly easily. From there it's a matter of checking how reddit handles subscribing, and then either navigating to the proper URLs or posting the form data.

I'd call it a medium-level task, as long as it's not against the reddit terms of service.
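
As a rough illustration of the regex step, here is a minimal sketch. It is written in Python rather than PHP for brevity; the same pattern should carry over to PHP's preg_match_all. The markup it expects (links whose href ends in /r/<name>/) is an assumption for illustration, and `extract_subreddits` is a hypothetical helper name; the HTML reddit actually serves to a logged-in user would need to be inspected first.

```python
import re

# Assumption: each subscribed subreddit shows up as a link whose href is
# "/r/<name>/" (optionally with a scheme and host in front). Reddit's real
# markup may differ, so treat this pattern as a placeholder.
SUBREDDIT_LINK = re.compile(r'href="(?:https?://[^"/]+)?/r/([A-Za-z0-9_]+)/?"')


def extract_subreddits(html):
    """Return unique subreddit names in order of first appearance."""
    seen = set()
    names = []
    for name in SUBREDDIT_LINK.findall(html):
        key = name.lower()  # compare names case-insensitively
        if key not in seen:
            seen.add(key)
            names.append(name)
    return names


if __name__ == "__main__":
    sample = '''
    <a href="http://www.reddit.com/r/programming/">programming</a>
    <a href="/r/PHP/">PHP</a>
    <a href="/r/programming/">programming again</a>
    '''
    print(extract_subreddits(sample))  # ['programming', 'PHP']
```

Links that point deeper than the subreddit itself (for example a comments page under /r/<name>/) are deliberately not matched, so only the subreddit listing entries are collected.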

别理我 2024-08-28 19:10:58

The open source product TestPlan is very good at such things. Using a simple language you can log in to the site as one user, grab the names of the subreddits, then log in as the other user to subscribe to them.

For example, if you just wanted the titles of the top entries you could use this code:

GotoURL http://www.reddit.com/top/

set %Topics% as response //p[@class='title']
foreach %Topic% in %Topics%
    set %Title% as selectIn %Topic% string(.)
    Notice %Title%
end

This produces output like the following:

00000000-00 GOTOURL http://www.reddit.com/top/
00000001-00 NOTICE LEGAL DVD vs. PIRATED COPY (i.imgur.com)
00000002-00 NOTICE Don't just shorten your URL, make it suspicious and frightening. - ShadyURL (shadyurl.com)
00000003-00 NOTICE HOLY CRAP! IS THAT A ROOM FOR RENT ON MY CRAIGSLIST??!?!? (houston.craigslist.org)
00000004-00 NOTICE Years from now when our children ask us, "What did we do after 9/11?" we shall explain it to them using this... (4gifs.com)
00000005-00 NOTICE TSA forces disabled boy to remove leg braces and walk through the metal detector. "I told him, 'This is overkill. He's 4 years old. I don't think he's a terrorist.' " (philly.com)
00000006-00 NOTICE This picture scares the shit out of me. (imgur.com)
00000007-00 NOTICE Civilization V Announced, in Development at Firaxis Games (hellforge.gameriot.com)
00000008-00 NOTICE I don't know, the price seems a little steep... [pic] (i.imgur.com)
00000009-00 NOTICE Reddit, last week we saw the depth of the ocean scaled relative to human size. I made a figure of the depth of the ocean accurately scaled to the width. It's really very shallow from this perspective. (i.imgur.com)
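
Whichever tool does the fetching, the overall flow from the question (read user1's list, then subscribe user2) might be sketched as below. This is Python and only an outline: `copy_subscriptions` and the `subscribe` callback are hypothetical names, and the callback stands in for whatever logged-in request actually performs the subscription, which this thread does not specify.

```python
def copy_subscriptions(source_subs, target_subs, subscribe):
    """Subscribe the target user to every subreddit only the source user has.

    source_subs, target_subs: iterables of subreddit names.
    subscribe: callable taking one subreddit name. It is a hypothetical
    stand-in for the logged-in request made on behalf of user2.
    Returns the list of names that were passed to subscribe().
    """
    already = {name.lower() for name in target_subs}
    added = []
    for name in source_subs:
        if name.lower() in already:
            continue  # user2 already has this one, or it was a duplicate
        subscribe(name)
        already.add(name.lower())
        added.append(name)
    return added


if __name__ == "__main__":
    requested = []  # records what would have been sent for user2
    copy_subscriptions(["programming", "PHP", "pics"], ["pics"], requested.append)
    print(requested)  # ['programming', 'PHP']
```

Skipping subreddits that user2 already has keeps the number of requests down, which matters if the site rate-limits automated clients.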