Configuring a Perl script to automatically generate an XML sitemap for a very large website
I am an SEO working for a flight booking company. We are trying to get an XML sitemap set up for our site. I asked my company's development team to install a Perl script that would help generate an XML sitemap for our huge site (more than 150k pages).
We used the Google Perl Sitemap Generator for this, since for various reasons we can only use Perl. The output file had a lot of junk in it, because the script mainly crawled static pages and other content in the server folders (it basically did not follow the URLs from the homepage down through the site, but instead crawled every file on the server). I am not sure the terminology is correct, but I think you get my point.
The configuration options are mentioned in the link above, but we have not been able to figure out which parameters to use to get a clean XML sitemap without the unnecessary URLs.
Could anyone please help with the Perl script or with how to configure it?
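To illustrate what "without the unnecessary URLs" might look like in practice, here is a rough, generic Perl sketch of the kind of exclusion filtering being asked for. This is not the Google generator's own configuration format; the domain and the exclusion patterns are made-up examples only.

```perl
#!/usr/bin/perl
# Illustrative sketch only: a generic URL filter, NOT the Google generator's
# configuration syntax. Reads crawled URLs on STDIN and keeps only the ones
# that look like real pages on the site (placeholder domain).
use strict;
use warnings;

my @exclude = (
    qr{\.(?:css|js|jpe?g|png|gif|ico|pdf|zip|gz)$}i,   # static assets
    qr{/(?:cgi-bin|tmp|logs|backup|includes)/}i,       # server-only folders
    qr{[?&](?:sessionid|sid|PHPSESSID)=}i,             # session-id URLs
);

URL: while (my $url = <STDIN>) {
    chomp $url;
    next unless $url =~ m{^https?://www\.example\.com/}i;   # stay on one host
    for my $re (@exclude) {
        next URL if $url =~ $re;
    }
    print "$url\n";
}
```

Used as, say, `perl filter_urls.pl < raw_urls.txt > clean_urls.txt`, it would strip the server clutter from a raw crawl before the URLs are turned into sitemap entries.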
3 Comments
Make a copy of the site with 'wget' (mirror option) and build a sitemap from that.
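A possible Perl sketch of this suggestion, assuming wget is available on the server: mirror the site (so only pages actually reachable by links from the homepage are fetched), then walk the mirrored tree and write the sitemap. The domain, output file, and the restriction to .html files are placeholder assumptions.

```perl
#!/usr/bin/perl
# Sketch of the wget-mirror approach: mirror the site, then turn the
# mirrored HTML files back into sitemap URLs. Domain and paths are placeholders.
use strict;
use warnings;
use File::Find;
use POSIX qw(strftime);

my $host   = 'www.example.com';            # placeholder domain
my $mirror = "./$host";                    # wget writes the mirror here

# --mirror follows links starting from the homepage, so server-only files
# that are never linked will not show up in the mirror.
system('wget', '--mirror', '--no-parent', '--quiet', "http://$host/") == 0
    or die "wget failed: $?";

my @urls;
find(sub {
    return unless -f && /\.html?$/i;       # only HTML pages
    my $path = $File::Find::name;
    $path =~ s{^\Q$mirror\E}{};            # strip the local mirror prefix
    $path =~ s{/index\.html?$}{/};         # index pages -> directory URL
    push @urls, {
        loc     => "http://$host$path",
        lastmod => strftime('%Y-%m-%d', localtime((stat($_))[9])),
    };
}, $mirror);

open my $out, '>', 'sitemap.xml' or die $!;
print {$out} qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print {$out} qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};
for my $u (@urls) {
    (my $loc = $u->{loc}) =~ s/&/&amp;/g;  # minimal XML escaping
    print {$out} "  <url><loc>$loc</loc><lastmod>$u->{lastmod}</lastmod></url>\n";
}
print {$out} "</urlset>\n";
close $out;
```

The appeal of this route is that wget only sees what a visitor following links would see, which sidesteps the "every file on the server" problem; the downside is mirroring 150k+ pages takes time and disk space.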
Look here, it has the code:
http://www.isrcomputing.com/knowledge-base/linux-tips/240-how-to-create-google-sitemap-using-perl.html
Perhaps I'm naive, but couldn't you do a BFS with 'http::get' of all links starting from the root, parsing out each 'a href'? Perl supports that pretty well.
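A minimal sketch of that BFS idea, assuming the common CPAN modules LWP::UserAgent, HTML::LinkExtor, and URI are available ('http::get' in the comment presumably just means an HTTP GET, done here via LWP). The start URL and the page cap are placeholders, and a real run over 150k pages would also want politeness delays and robots.txt handling.

```perl
#!/usr/bin/perl
# Minimal BFS sketch: start at the homepage, fetch pages with LWP::UserAgent,
# pull out <a href> links with HTML::LinkExtor, stay on the same host, and
# print sitemap <url> entries as pages are discovered.
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $start = URI->new('http://www.example.com/');   # placeholder homepage
my $max   = 200_000;                                # safety cap on pages

my $ua = LWP::UserAgent->new(timeout => 15, agent => 'sitemap-bfs/0.1');
my %seen  = ($start->as_string => 1);
my @queue = ($start->as_string);

print qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};

while (@queue and keys %seen <= $max) {
    my $url = shift @queue;
    my $res = $ua->get($url);
    next unless $res->is_success and $res->content_type eq 'text/html';

    (my $loc = $url) =~ s/&/&amp;/g;                # minimal XML escaping
    print "  <url><loc>$loc</loc></url>\n";

    # Collect <a href> links, absolutized against the page's base URL.
    my @links;
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' and defined $attr{href};
    }, $res->base);
    $extor->parse($res->decoded_content);

    for my $link (@links) {
        my $u = URI->new($link)->canonical;
        next unless $u->scheme and $u->scheme =~ /^https?$/;
        next unless $u->host eq $start->host;       # same host only
        $u->fragment(undef);                        # drop #anchors
        my $key = $u->as_string;
        push @queue, $key unless $seen{$key}++;
    }
}
print "</urlset>\n";
```

Because it follows only links reachable from the homepage, this naturally avoids the stray server files the original generator was picking up, at the cost of having to crawl the whole site over HTTP.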