WYSIWYG web scraping/crawling setup using JavaScript/HTML5?
My goal is to let less experienced people set up the parameters needed to scrape some information from a website.
The idea is that a user enters a URL, which is then loaded in a frame. The user should then be able to select text within this frame, which should give me enough information to scrape that text again whenever it changes dynamically.
The question is whether it is even possible to detect which part of an external site's source corresponds to the user's selection in the frame.
If not, are there any alternatives?
Thanks in advance.
Regards,
Tom
Comments (2)
The short answer is no. If you don't control the content in the iframe, the browser's same-origin policy prevents your script from reading or interacting with it.
However, you could make a bookmarklet or a browser plugin that does something like what you're describing, since both run inside the target page itself.
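As a rough illustration of the bookmarklet route, here is a minimal sketch that turns the current text selection into a CSS selector you could store and reuse for later scraping. The function names `cssPathFor` and `selectorForCurrentSelection` are hypothetical, and the sketch assumes element ids are stable and contain no characters that need escaping in a selector.

```javascript
// Build a CSS selector for an element by walking up the tree until an
// ancestor with an id is found (or the top of the document is reached).
function cssPathFor(element) {
  const parts = [];
  let node = element;
  while (node && node.tagName) {
    if (node.id) {
      // An id is assumed to be unique, so the path can start here.
      parts.unshift('#' + node.id);
      break;
    }
    // Count preceding siblings with the same tag for :nth-of-type().
    let index = 1;
    let sibling = node.previousElementSibling;
    while (sibling) {
      if (sibling.tagName === node.tagName) index += 1;
      sibling = sibling.previousElementSibling;
    }
    parts.unshift(node.tagName.toLowerCase() + ':nth-of-type(' + index + ')');
    node = node.parentElement;
  }
  return parts.join(' > ');
}

// Intended to run inside the target page (for example from a bookmarklet),
// where window.getSelection() refers to that page's own selection.
function selectorForCurrentSelection() {
  const selection = window.getSelection();
  if (!selection || selection.rangeCount === 0) return null;
  let node = selection.getRangeAt(0).commonAncestorContainer;
  // A selection inside plain text yields a text node; step up to its element.
  if (node.nodeType !== 1) node = node.parentElement;
  return node ? cssPathFor(node) : null;
}
```

A selector built this way is only as stable as the page's markup: if the site reorders its elements, the stored path will point at the wrong node, so some form of validation on re-scrape is probably worthwhile.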
There have been attempts at visual scrapers before, but they rapidly become more cumbersome and complex to learn than writing code. With a few abstractions (a function to scrape, a function to select a table by ID and convert it to an array, etc.) you can make something that is still suitable for beginners.
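To give an idea of what such abstractions might look like, here is a small sketch with two helpers. The names `scrape` and `tableToArray` are made up for illustration; `scrape` assumes an environment that provides `fetch` and `DOMParser` (a browser page or extension), and in a normal page it will only work for URLs the browser is allowed to read cross-origin.

```javascript
// Fetch a page and parse it into a document that can be queried.
// Assumes fetch and DOMParser are available in the environment.
async function scrape(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  const html = await response.text();
  return new DOMParser().parseFromString(html, 'text/html');
}

// Select a table by its id and convert it to an array of rows,
// where each row is an array of trimmed cell texts.
function tableToArray(doc, tableId) {
  const table = doc.getElementById(tableId);
  if (!table) return [];
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => cell.textContent.trim())
  );
}
```

A beginner would then only need something like `tableToArray(await scrape(url), 'prices')`, where `'prices'` stands in for whatever id the target table happens to have.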