Searching for keywords in XML source
All,
I'm building a site which will gather news stories from about 35 different RSS feeds, storing in an array. I'm using a foreach() loop to search the title and description to see if it contains one of about 40 keywords, using substr() for each article. If the search is successful, that article is stored in a DB, and ultimately will appear on the site.
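The loop described might look like the sketch below. One note: substr() only extracts part of a string, it doesn't search, so the matching call is presumably strpos()/stripos(). The keywords and articles here are made-up stand-ins.

```php
<?php
// Sketch of the described loop, assuming stripos() for the search
// (substr() extracts a substring; it cannot test for containment).
// $keywords and $articles are invented stand-in data.
$keywords = ['economy', 'election'];
$articles = [
    ['title' => 'Election night results', 'description' => 'Polls close at 8pm'],
    ['title' => 'Sports roundup',         'description' => 'Weekend scores'],
];

$matched = [];
foreach ($articles as $article) {
    $haystack = $article['title'] . ' ' . $article['description'];
    foreach ($keywords as $kw) {
        if (stripos($haystack, $kw) !== false) {   // case-insensitive search
            $matched[] = $article;  // in the real script: INSERT into the DB
            break;                  // one keyword hit is enough per article
        }
    }
}
```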
The script runs every 30 mins. Trouble is, it takes 1-3 mins depending on how many stories are returned. Not 'terrible', but on a shared hosting env I can see this causing plenty of issues, especially as the site grows and more feeds/keywords are added.
Are there any ways that I can optimize the 'searching' of keywords, so that I can speed up the 'indexing'?
Thanks!!
Comments (2)
35-40 RSS feeds are a lot of requests for one script to handle and parse all at once. Your bottleneck is most likely the requests, not the parsing. You should separate the concerns. Have one script that requests an RSS feed one at a time every minute or so, and store the results locally. Then another script should parse and save/remove the temporary results every 15-30 minutes.
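The split described above could be sketched roughly like this; the helper names, feed URLs, and cache layout are all assumptions for illustration. The fetcher runs from cron every minute or so and only ever touches the network for one feed; the parser script later reads the cached XML without waiting on any requests.

```php
<?php
// Sketch of the "fetcher" half of the split. Helper names and the
// cache layout are assumptions, not an existing API.

/** Pick the feed whose cached copy is oldest (or missing). */
function stalest_feed(array $feedUrls, string $cacheDir): string
{
    $pick = $feedUrls[0];
    $oldest = PHP_INT_MAX;
    foreach ($feedUrls as $url) {
        $file = $cacheDir . '/' . md5($url) . '.xml';
        $mtime = file_exists($file) ? filemtime($file) : 0;
        if ($mtime < $oldest) {
            $oldest = $mtime;
            $pick = $url;
        }
    }
    return $pick;
}

/** Fetch exactly one feed per run and cache the raw XML locally. */
function fetch_one_feed(array $feedUrls, string $cacheDir): bool
{
    $url = stalest_feed($feedUrls, $cacheDir);
    $xml = @file_get_contents($url);   // the slow, network-bound step
    if ($xml === false) {
        return false;
    }
    return (bool) file_put_contents($cacheDir . '/' . md5($url) . '.xml', $xml);
}
```

The parser script then only ever reads files from the cache directory, so its runtime no longer depends on 35-40 remote servers answering promptly.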
You could use XPath to search the XML directly... Something like:
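The original code block was lost from this page; a minimal sketch of the query being described might look like the following. The RSS snippet and the keyword "Economy" are invented for illustration, and note that XPath 1.0's contains() is case-sensitive.

```php
<?php
// Sketch of the XPath approach, assuming a made-up RSS snippet and
// the made-up keyword "Economy".
$rss = <<<'XML'
<rss><channel>
  <item><title>Economy slows</title><description>Markets react</description></item>
  <item><title>Sports roundup</title><description>Weekend scores</description></item>
</channel></rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);

// Select every <item> whose title or description contains the keyword.
// contains() is case-sensitive in XPath 1.0; wrap both sides in
// translate() if case-insensitive matching is needed.
$matchingNodes = $xpath->query(
    '//item[contains(title, "Economy") or contains(description, "Economy")]'
);

foreach ($matchingNodes as $item) {
    echo $item->getElementsByTagName('title')->item(0)->nodeValue, "\n";
}
```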
Then, $matchingNodes will be a DOMNodeList of all the matching item nodes. You can then save those in the database... So to adjust this to your real-world example, you could either build the query to do all the searching for you in one shot:
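That one-shot query (also missing from this page) could be built roughly like this, OR-ing a contains() test per keyword into a single expression. The keyword list and RSS snippet are stand-ins.

```php
<?php
// Sketch of building one combined XPath query for all keywords.
// $keywords is a stand-in; the real script would use its ~40 keywords.
$keywords = ['economy', 'election', 'storm'];

$tests = [];
foreach ($keywords as $kw) {
    // Assumes keywords contain no quote characters, for simplicity.
    $tests[] = sprintf('contains(title, "%1$s") or contains(description, "%1$s")', $kw);
}
$query = '//item[' . implode(' or ', $tests) . ']';

$doc = new DOMDocument();
$doc->loadXML('<rss><channel>
  <item><title>Weather alert</title><description>A storm nears the coast</description></item>
  <item><title>Local news</title><description>Nothing much</description></item>
</channel></rss>');
$xpath = new DOMXPath($doc);
$matchingNodes = $xpath->query($query);  // all matching done in one pass
```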
Or just re-query for each keyword... Personally, I'd build one giant query, since then all the matching is done in compiled C code (and hence should be more efficient than looping in PHP land and aggregating the results there)...