FAST For SharePoint web crawler meta tag extraction
I am using FAST For SharePoint to crawl a non-SharePoint website. The crawl completes without errors, and keyword searches return results.
I want to build refiners on the results page from the HTML pages' meta tags. There must be a two-level refiner: category and subcategory. When the user clicks a category, the refiner panel must show all related subcategories.
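To make the intended two-level behaviour concrete, here is a minimal Python sketch of the grouping logic only. It is not FAST For SharePoint code: the `pages` list, `build_refiners`, and `subcategories_for` are hypothetical names, and the sketch assumes each crawled page already carries its `Category` and `SubCategory` values.

```python
from collections import defaultdict


def build_refiners(pages):
    """Group subcategories under their category and count matching pages."""
    tree = defaultdict(lambda: defaultdict(int))
    for page in pages:
        tree[page["Category"]][page["SubCategory"]] += 1
    return tree


def subcategories_for(tree, category):
    """Return the subcategories to show once a category has been clicked."""
    return sorted(tree.get(category, {}))


# Hypothetical sample data standing in for crawled pages.
pages = [
    {"url": "/a", "Category": "Products", "SubCategory": "Electronic"},
    {"url": "/b", "Category": "Products", "SubCategory": "Furniture"},
    {"url": "/c", "Category": "Services", "SubCategory": "Repair"},
]
tree = build_refiners(pages)
print(subcategories_for(tree, "Products"))
```

Clicking "Products" would then narrow the second refiner to the subcategories found under that category.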
The meta tags look like this:
<meta name="Category" content="Products"/>
<meta name="SubCategory" content="Electronic"/>
How can I extract meta tags from the HTML pages crawled by the FAST For SharePoint web crawler?
I tried adding the meta tag names under FAST Search Administration > Managed Properties and configured the refiner panel for those meta tags, but I get no results.
Thank you!
Comments (1)
If you want to use custom managed properties, you first need to map them to crawled properties. Crawled properties are created automatically during the crawl, or you can create them in PowerShell; see the following link: http://msdn.microsoft.com/en-us/subscriptions/ff393776(v=office.14).aspx
If I understand correctly, you are trying to get information that is in the HTML of your pages. In that case, you cannot use the out-of-the-box web crawler to get this information. If you want to create a custom crawler to get the information you need, I suggest looking at a custom BDC connector: http://msdn.microsoft.com/en-us/library/ee557349(v=office.14).aspx
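Whatever connector is used, the extraction step itself is small. The following is a rough Python sketch of it using only the standard library; it is not part of FAST For SharePoint or the BDC framework, and `MetaTagExtractor`, `extract_meta`, and `SAMPLE` are hypothetical names chosen for illustration. Tag names are matched case-insensitively and returned in lower case.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect the content of selected <meta name="..."> tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {name.lower() for name in wanted}
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name in self.wanted and attr_map.get("content") is not None:
            self.found[name] = attr_map["content"]


def extract_meta(html, wanted=("Category", "SubCategory")):
    """Return a dict mapping lower-cased meta names to their content."""
    parser = MetaTagExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


# Sample page built from the meta tags shown in the question.
SAMPLE = (
    '<html><head>'
    '<meta name="Category" content="Products"/>'
    '<meta name="SubCategory" content="Electronic"/>'
    '</head><body></body></html>'
)
print(extract_meta(SAMPLE))
```

A custom connector would then pass these values on as properties of the indexed item, so that they can be mapped to managed properties and used as refiners.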