Unpublished items showing in Drupal search results (Google Search Appliance)
I recently inherited a Drupal 5 site and have a series of enhancements to make. Several of them revolve around search results.
Unpublished pages are showing up in search engine results. Some of these are old pages, others are recently unpublished. All are correctly marked as unpublished in the CMS and are still showing up.
Outdated pages are showing up from the search engine. The URL path structure changed and those items are old results in the DB.
From what I can tell, the site uses Google Search Appliance (GSA) for the search rather than the default Drupal search. Is there a way I can be certain that it's using GSA other than seeing the module enabled?
If it is GSA, it seems that I could get someone with access to the GSA to rebuild the search results on the site. Is this correct?
If rebuilding the search results is the right way to go about it, it seems whenever a fair amount of content is removed from the site I'll need to get someone to rebuild the search. Is there a better/automatic way?
Comments (4)
Sounds like it's Drupal that is handling the search. Google would need DB access to show unpublished nodes. It could be that you are using Views to do the search but forgot to restrict it to published nodes.
If Drupal is handling the search, you just need to flush and rebuild the search index. This can be done without too much trouble if you don't have too much content.
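To illustrate the "only take published nodes" point, here is a minimal sketch in Python using an in-memory SQLite table as a stand-in for Drupal's node table. Drupal stores the published flag in the node table's status column (1 means published); the table contents, the `search_titles` helper, and the keyword matching are hypothetical and only show the effect of the filter, not how Views or the search module actually build their queries.

```python
import sqlite3

# In-memory stand-in for Drupal's node table (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO node (nid, title, status) VALUES (?, ?, ?)",
    [
        (1, "About us", 1),
        (2, "Old press release", 0),  # status 0 = unpublished
        (3, "About the team", 1),
    ],
)


def search_titles(conn, keyword, published_only=True):
    """Return titles containing keyword, optionally limited to published nodes."""
    sql = "SELECT title FROM node WHERE title LIKE ?"
    if published_only:
        # This is the condition that is easy to forget in a custom listing.
        sql += " AND status = 1"
    sql += " ORDER BY nid"
    return [row[0] for row in conn.execute(sql, (f"%{keyword}%",))]


# Without the filter the unpublished node leaks into the results.
leaky = search_titles(conn, "release", published_only=False)
# With the filter it does not.
safe = search_titles(conn, "release")
```

If a custom search listing behaves like the unfiltered call, adding the published condition is the fix; rebuilding the index alone would not help in that case.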
The GSA could still be showing deleted content, depending on what your data source is.
If the content comes from a database feed and is then dropped from the query, it is dropped from the index as well. If the content came from a natural crawl or through a custom connector feed, it is not removed from the index on delete. Instead it has to cycle out of the index naturally, which can take a while.
One way to block deleted URLs from being displayed is to do it through the front end. In the GSA Admin interface go to Serving > Front Ends, then choose your front end and click the Remove URL tab. You can either list your URLs or block a group of URLs through regular expressions.
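Before pasting patterns into the Remove URL tab, it can help to preview which URLs a pattern would catch. The following is a rough sketch in Python; the URLs, the pattern, and the `blocked_urls` helper are made up for illustration, and the appliance's own pattern syntax may differ from Python's `re` module, so treat this only as a sanity check of the idea.

```python
import re


def blocked_urls(urls, patterns):
    """Return the URLs that match at least one of the given regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in urls if any(c.search(u) for c in compiled)]


# Hypothetical URLs: two live under a path that no longer exists on the site.
urls = [
    "http://www.example.com/node/12",
    "http://www.example.com/old-section/page-a",
    "http://www.example.com/old-section/page-b",
    "http://www.example.com/news/2009/launch",
]

# One pattern covering the whole retired path prefix.
patterns = [r"^http://www\.example\.com/old-section/"]

to_block = blocked_urls(urls, patterns)
for url in to_block:
    print(url)
```

A single prefix pattern like this suits the case where the URL path structure changed, since all outdated results share the old prefix.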
I have posted an answer to your more general question concerning node access. The problem with your search results might well be related to that.
In order to keep the Google Appliance more up to date, you might try out XmlSiteMap, a module that publishes a proper XML sitemap for all your content.
For an online website, publishing a sitemap is a good way to keep the search engines up to date, as they can use it to learn about new pages and to purge old pages. I'm assuming that the Google Appliance would use this too.
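For a sense of what such a sitemap contains, here is a small Python sketch that builds one from a list of pages and leaves out the unpublished ones. It is not how the XmlSiteMap module works internally; the `build_sitemap` helper and the example pages are hypothetical, and only the `urlset`, `url`, and `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, published) pairs, skipping unpublished pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, published in pages:
        if not published:
            continue  # unpublished content should not be advertised to crawlers
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site content: one page has been unpublished.
pages = [
    ("http://www.example.com/about", True),
    ("http://www.example.com/old-press-release", False),
    ("http://www.example.com/contact", True),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Whether the appliance actually reads a sitemap is the assumption stated above, so it is worth confirming against the GSA configuration before relying on it.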