Scraping and modifying a form

Posted 2024-08-31 11:00:59

I've gotten some great help here and I am so close to solving my problem that I can taste it. But I seem to be stuck.

I need to scrape a simple form from a local web server and return only the lines that match a user's local email address (e.g. onemyndseye@localhost). simplehtmldom makes easy work of extracting the correct form element:

foreach($html->find('form[action*="delete"]') as $form) echo $form;

Returns:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

However, I am having trouble with the next step: selecting the lines that contain 'onemyndseye@localhost' and removing the address from them, so that only the following is returned:

<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">http://www.linux.com/rss/feeds.php</a> <br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">http://www.ubuntu.com/rss.xml</a> <br />

Thanks to the wonderful users of this site I've gotten this far and can even return just the links, but I am having trouble getting the rest. It's important that the complete <input> tags are returned EXACTLY as shown above, because the id and name values will need to be passed back to the original form as POST data later on.

Thanks in advance!

***** EDIT ******

The issue is close to solved now, thanks to Yacoby. The last small hurdle is that str_ireplace leaves some debris behind. Perhaps it would be easier to remove all text between </a> and <br />?

After Yacoby's additions the output is as follows:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
        http://mythbuntu.org/rss.xml
    </a> [email: 
        
    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

Notice that [email: (Default)] and [email: ] have been left behind. The form action and submit lines would also need to be removed in the end, but I think I can work that part out from the previous suggestion.

Comments (2)

哀由 2024-09-07 11:00:59

Maybe something like this:

if ( stripos($form->innertext, 'onemyndseye@localhost') !== false ){
    $form->innertext = str_ireplace('onemyndseye@localhost', '', $form->innertext);
    echo $form;
}

This won't work with HTML like

<b>onemyndseye</b>@localhost

because it is easy to check, using plaintext, whether the text with the tags removed matches a string, but it is far harder to replace that text in the markup.
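
The limitation can be reproduced without the library. The following is only a minimal sketch: it uses PHP's built-in strip_tags as a stand-in for simplehtmldom's plaintext, and the $markup string is a made-up example, not output from the real form.

```php
<?php
// Hypothetical markup in which a <b> tag splits the address.
$markup = '</a> [email: <b>onemyndseye</b>@localhost (Default)]<br />';
$email  = 'onemyndseye@localhost';

// Detection on the tag-stripped text succeeds...
$found = stripos(strip_tags($markup), $email) !== false;

// ...but a literal replacement on the raw markup has nothing to match,
// because the address never appears contiguously in it.
$replaced = str_ireplace($email, '', $markup, $count);

var_dump($found); // bool(true)
var_dump($count); // int(0)
```

So a check on the plain text can report a match while the replacement on the inner HTML silently does nothing.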

心房敞 2024-09-07 11:00:59

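I've solved the issue with the following snippet:

// Assumes the library file is named simple_html_dom.php.
include 'simple_html_dom.php';

$html = file_get_html('http://localhost:9000/');
foreach ($html->find('form[action*="delete"]') as $form) {
    // Only process forms that mention the user's address.
    if (stripos($form->innertext, 'onemyndseye@localhost') !== false) {
        // Drop everything between </a> and <br />, i.e. the "[email: ...]" text.
        echo preg_replace('!</a>.*?<br />!s', '</a><br />', $form);
    }
}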
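
The snippet above strips the [email: ...] text from every row and still prints the <form> wrapper and the submit button. If only the rows for one address are wanted, a per-row filter along the following lines might work. This is a rough sketch under stated assumptions, not a tested solution for the real page: filter_rows is a hypothetical helper, the D9 row with someoneelse@localhost is invented for illustration, and the regular expression assumes the markup looks exactly like the sample in the question.

```php
<?php
// Hypothetical helper: return one cleaned "<input ...><a ...>...</a> <br />"
// line per checkbox row whose [email: ...] note contains $email.
function filter_rows(string $formHtml, string $email): array
{
    // 1: checkbox tag, 2: opening <a> tag, 3: link text, 4: text before <br />
    $pattern = '!(<input type="checkbox"[^>]*>)(<a\b[^>]*>)\s*(.*?)\s*</a>(.*?)<br />!s';
    preg_match_all($pattern, $formHtml, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        if (stripos($m[4], $email) !== false) {
            // Rebuild the row without the [email: ...] note.
            $rows[] = $m[1] . $m[2] . $m[3] . '</a> <br />';
        }
    }
    return $rows;
}

// Sample markup based on the question; the D9 row is made up.
$form = <<<HTML
<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email:
        onemyndseye@localhost (Default)
    ]<br />
    <input type="checkbox" id="D9" name="D9" /><a href="http://example.com/rss.xml">
        http://example.com/rss.xml
    </a> [email:
        someoneelse@localhost
    ]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
HTML;

$rows = filter_rows($form, 'onemyndseye@localhost');
echo implode("\n", $rows), "\n";
```

Because the pattern only matches checkbox rows, the <form> tag and the submit button drop out without extra code, and the id and name attributes are passed through untouched. Regular expressions over HTML are fragile, so any change in the page's markup would likely require adjusting the pattern.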