Is there a name for this browser technique?
Is there a name for the technique of scanning a page open in the browser to find specific content and modify it?
Some examples:
- Skype finds phone numbers on a page, and attaches a call menu
- a script finds percentages in a page and replaces them with a small pie chart
- an advertising engine finds keywords in the page and converts them into hyperlinks
- a script adds an icon next to every hyperlink on the page that points to another domain
- etc.
I understand that this is a kind of progressive enhancement, but I am specifically interested in the first step: the content discovery process. I'd be interested in articles that offer best practices or explain the shortcomings of this technique.
Edit: I added an example to show that this technique is not just for text nodes, but can apply to any kind of HTML content.
Comments (5)
For example, run this code on this web page (from the console), and all numbers on the page will be replaced with "X":
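The snippet that followed this sentence is not preserved in this copy. As a rough sketch of what such console code might look like (not the original poster's code), the following walks the tree and rewrites only text nodes. The names `maskNumbers`, `replaceNumbersInText`, and `SKIPPED_TAGS` are made up for illustration, and skipping `SCRIPT`, `STYLE`, and `TEXTAREA` is an assumption added so that styles and form content are not altered.

```javascript
// Sketch only: these names are illustrative, not the code originally posted.
const TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "TEXTAREA"]); // assumed exclusions

// Pure helper: swap every run of digits for a single "X".
function maskNumbers(text) {
  return text.replace(/\d+/g, "X");
}

// Depth-first walk over a DOM-like tree. Only text nodes are rewritten,
// so tag names and attributes are left alone.
function replaceNumbersInText(node) {
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = maskNumbers(node.nodeValue);
    return;
  }
  if (SKIPPED_TAGS.has(node.nodeName)) {
    return; // do not touch stylesheet, script, or textarea content
  }
  // Copy the child list first so the walk is not affected if it changes.
  for (const child of Array.from(node.childNodes || [])) {
    replaceNumbersInText(child);
  }
}

// In a browser console: replaceNumbersInText(document.body);
```

Keeping the matching rule (`maskNumbers`) separate from the traversal means the same walk can be reused for phone numbers or percentages by swapping in a different helper.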
This functionality is provided by add-ons, and the technique they use is DOM traversal.
The cases you describe are not specific to one site but appear on every site you visit, so some extra functionality must have been added to your browser. This often happens when an option such as "install toolbar" is left checked while installing new software like Skype.
The technique can be called recognition (as in PNR, Skype Phone Number Recognition), and what these add-ons do is traverse the DOM of your site.
An add-on like the ones described above probably runs only on page load, so content added later with Ajax will not be affected.
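One way a script could also cover content inserted after page load is the standard `MutationObserver` API. The sketch below is only an illustration of that idea, not how Skype or any particular add-on works. `watchAddedNodes` and `enhance` are hypothetical names, where `enhance` stands for whatever function scans a subtree and modifies it.

```javascript
// Sketch only: watchAddedNodes and enhance are hypothetical names.
function watchAddedNodes(root, enhance) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        enhance(node); // handle only the newly inserted subtree
      }
    }
  });
  // childList + subtree: report insertions anywhere below root.
  observer.observe(root, { childList: true, subtree: true });
  return observer; // call observer.disconnect() to stop watching
}

// In a browser:
//   enhance(document.body);                   // initial pass on page load
//   watchAddedNodes(document.body, enhance);  // later insertions
```

If `enhance` itself inserts elements (for example an icon), the observer will report those insertions too, so `enhance` should recognize and skip nodes it has already processed.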
If it is your own add-on, there is a way to access it with JavaScript, as described here: how to call a function in Firefox extension from a html button.
Also take a look at GreaseMonkey and jQuery traversing.
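The same kind of traversal applies to elements as well as text. A minimal sketch for the question's external-link example might look like the following, assuming that "another domain" means a different hostname (so subdomains count as external). The function names and the `external-link` class are invented for illustration, with the icon itself left to CSS.

```javascript
// Sketch only: isExternalLink, markExternalLinks, and the "external-link"
// class are hypothetical; the class is an assumed hook for a CSS icon.

// True when href, resolved against the page URL, points to another host.
function isExternalLink(href, pageUrl) {
  try {
    const target = new URL(href, pageUrl);
    if (target.protocol !== "http:" && target.protocol !== "https:") {
      return false; // mailto:, javascript:, etc. are not treated as external
    }
    return target.hostname !== new URL(pageUrl).hostname;
  } catch (e) {
    return false; // unparsable href: leave the link untouched
  }
}

// Discovery step: select candidate elements, then filter and tag them.
function markExternalLinks(doc, pageUrl) {
  const marked = [];
  for (const link of doc.querySelectorAll("a[href]")) {
    if (isExternalLink(link.getAttribute("href"), pageUrl)) {
      link.classList.add("external-link");
      marked.push(link);
    }
  }
  return marked;
}

// In a browser: markExternalLinks(document, location.href);
```

Here the discovery step is a CSS selector plus a predicate, so a user script or jQuery version would differ mainly in how the candidates are selected.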
So the conclusion for now is that there doesn't seem to be a name or any established practice for this technique.
Thanks to those who mentioned search engines; it makes sense to see this as a local search, with an effort to interpret the content and structure.
As already said, it is called summarization, but you can find out more about it by searching for the terms "web crawling bot/technique/robot". Here is a starting document you might find useful:
Crawling the Web
Summarization
It is the technique used in all web crawlers. Have a look at Yioop!, an open-source, well-documented web crawler and search engine.