Scraping and modifying a form
I've gotten some great help here, and I am so close to solving my problem that I can taste it. But I seem to be stuck.
I need to scrape a simple form from a local webserver and return only the lines that match a user's local email (i.e. onemyndseye@localhost). simplehtmldom makes easy work of extracting the correct form element:
foreach($html->find('form[action*="delete"]') as $form) echo $form;
Returns:
<form action="/delete" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
However, I am having trouble with the next step: keeping only the lines that contain 'onemyndseye@localhost' and stripping that text out, so that only the following is returned:
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">http://www.linux.com/rss/feeds.php</a> <br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">http://www.ubuntu.com/rss.xml</a> <br />
Thanks to the wonderful users of this site I've gotten this far and can even return just the links, but I am having trouble getting the rest. It's important that the complete <input> tags are returned EXACTLY as shown above, as the id and name values will need to be passed back to the original form in POST data later on.
Thanks in advance!
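For what it's worth, the filtering step described above can be sketched with plain preg_match_all instead of simplehtmldom. This is only a rough sketch that assumes the markup looks exactly like the form shown above; the function name rowsForEmail and the second address someoneelse@localhost are made up for illustration, and regexes on HTML will break as soon as the markup changes.

```php
<?php
// Sample input modelled on the form above. The second address is a
// hypothetical one, added only so the filter has something to reject.
$formHtml = <<<'HTML'
<form action="/delete" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
someoneelse@localhost
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
HTML;

// Hypothetical helper: returns one cleaned-up row per checkbox whose
// annotation mentions $email.
function rowsForEmail(string $formHtml, string $email): array
{
    // Group 1: the checkbox <input> tag, kept verbatim.
    // Group 2: the link. Group 3: the text between </a> and <br />.
    $pattern = '~(<input\s+type="checkbox"[^>]*/>)\s*(<a\s[^>]*>.*?</a>)(.*?)<br\s*/?>~is';
    preg_match_all($pattern, $formHtml, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        // Skip rows whose annotation does not mention the address.
        if (stripos($m[3], $email) === false) {
            continue;
        }
        // Remove the line breaks just inside the link element.
        $link = preg_replace('~>\s+~', '>', $m[2]);
        $link = preg_replace('~\s+<~', '<', $link);
        $rows[] = $m[1] . $link . ' <br />';
    }
    return $rows;
}

$rows = rowsForEmail($formHtml, 'onemyndseye@localhost');
echo implode("\n", $rows), "\n";
```

Because the input tag is captured and echoed back untouched, the id and name values stay available for the later POST.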
***** EDIT ******
The issue is close to solved now, thanks to Yacoby. The last small hurdle is that some trash is left behind by the str_ireplace. Perhaps it would be easier to remove all text between </a> and <br />?
After Yacoby's additions the output is as follows:
<form action="/delete" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
(Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
(Default)
]<br />
<input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
http://mythbuntu.org/rss.xml
</a> [email:
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
Notice that [email: (Default)] and [email: ] have been left behind. I would also need to remove the form action and submit lines in the end, but I think I can work that part out from the previous suggestion.
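The idea of removing everything between </a> and <br /> could look roughly like the sketch below. It is not Yacoby's code, just one possible way to do it with preg_replace, and it assumes every row ends with </a>, some leftover text, then <br /> as in the output above. The function name stripRowAnnotations is made up for illustration.

```php
<?php
// The output shown above, after the str_ireplace step.
$afterReplace = <<<'HTML'
<form action="/delete" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
(Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
(Default)
]<br />
<input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
http://mythbuntu.org/rss.xml
</a> [email:
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
HTML;

// Hypothetical helper: removes the leftover annotations, the <form>
// wrapper tags and the submit button, leaving only the checkbox rows.
function stripRowAnnotations(string $formHtml): string
{
    // Drop whatever sits between each </a> and the next <br />.
    $html = preg_replace('~</a>.*?<br\s*/?>~is', '</a> <br />', $formHtml);
    // Drop the opening and closing <form> tags.
    $html = preg_replace('~</?form\b[^>]*>~i', '', $html);
    // Drop the submit button.
    $html = preg_replace('~<input\s+type="submit"[^>]*/?>~i', '', $html);
    return trim($html);
}

$cleaned = stripRowAnnotations($afterReplace);
echo $cleaned, "\n";
```

The lazy .*? together with the s modifier lets the match span the line breaks inside each annotation while stopping at the first <br /> it meets, so neighbouring rows are not swallowed.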
Comments (2)
Maybe something like
This won't work with HTML like
since it is easy to check whether the text with tags removed matches a string using
plaintext
but it is far harder to replace.

I've solved the issue with the following snippet: