Searching for keywords in an XML feed

Posted 2024-09-15 14:50:57


All,

I'm building a site which will gather news stories from about 35 different RSS feeds, storing them in an array. I'm using a foreach() loop to search the title and description of each article to see if it contains one of about 40 keywords, using substr(). If the search is successful, that article is stored in a DB, and ultimately will appear on the site.
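For concreteness, a minimal sketch of the kind of loop being described, assuming stripos() does the actual keyword test (substr() only extracts characters; it cannot search), and with $stories and $keywords as stand-ins for the parsed feed items and the keyword list:

$matches = array();
foreach ($stories as $story) {
    $haystack = $story['title'] . ' ' . $story['description'];
    foreach ($keywords as $keyword) {
        if (stripos($haystack, $keyword) !== false) {
            $matches[] = $story;  // this is where the article would be written to the DB
            break;                // stop checking further keywords for this article
        }
    }
}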

The script runs every 30 mins. Trouble is, it takes 1-3 mins depending on how many stories are returned. Not 'terrible', but in a shared hosting environment I can see this causing plenty of issues, especially as the site grows and more feeds/keywords are added.

Are there any ways that I can optimize the 'searching' of keywords, so that I can speed up the 'indexing'?

Thanks!!


Comments (2)

人心善变 2024-09-22 14:50:57


35-40 RSS feeds are a lot of requests for one script to handle and parse all at once. Your bottleneck is most likely the requests, not the parsing. You should separate the concerns. Have one script that requests an RSS feed one at a time every minute or so, and store the results locally. Then another script should parse and save/remove the temporary results every 15-30 minutes.
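For illustration, a rough sketch of the fetch-and-cache script this answer suggests, run from cron roughly once a minute; the $feedUrls list, the state file, and the cache directory are hypothetical names:

$feedUrls = array(
    'http://example.com/feed1.rss',   // hypothetical feed URLs
    'http://example.com/feed2.rss',
    // ... the remaining ~35 feeds
);

// Track which feed was fetched last in a small state file, so each cron run
// requests only one feed instead of all of them at once.
$stateFile = __DIR__ . '/feed_index.txt';
$index     = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
$index     = $index % count($feedUrls);

$xml = @file_get_contents($feedUrls[$index]);
if ($xml !== false) {
    // Cache the raw XML locally; the separate parsing script reads these files
    // every 15-30 minutes. Assumes a writable ./cache directory exists.
    file_put_contents(__DIR__ . '/cache/feed_' . $index . '.xml', $xml);
}

// Advance to the next feed for the next run.
file_put_contents($stateFile, ($index + 1) % count($feedUrls));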

如日中天 2024-09-22 14:50:57


You could use XPath to search the XML directly... Something like:

$dom = new DomDocument();
$dom->loadXml($feedXml);
$xpath = new DomXpath($dom);

$query = '//item[contains(title, "foo")] | //item[contains(description, "foo")]';
$matchingNodes = $xpath->query($query);

Then $matchingNodes will be a DOMNodeList of all the matching item nodes, which you can save to the database...

So to adjust this to your real world example, you could either build the query to do all the searching for you in one shot:

$query = array();
foreach($keywords as $keyword) {
    $query[] = '//item[contains(title, "'.$keyword.'")]';
    $query[] = '//item[contains(description, "'.$keyword.'")]';
}
$query = implode('|', $query);

Or just re-query for each keyword... Personally, I'd build one giant query, since then all the matching is done in compiled C code (and hence should be more efficient than looping in PHP land and aggregating the results there)...
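Putting the two snippets above together, a hedged sketch of the combined query plus the loop over the resulting DOMNodeList; $feedXml and $keywords are assumed to exist already, and keywords containing double quotes would need extra escaping before going into the XPath expression:

$dom = new DOMDocument();
$dom->loadXML($feedXml);
$xpath = new DOMXPath($dom);

$parts = array();
foreach ($keywords as $keyword) {
    // Note: contains() is case-sensitive, and a keyword containing a double
    // quote would break this expression as written.
    $parts[] = '//item[contains(title, "' . $keyword . '")]';
    $parts[] = '//item[contains(description, "' . $keyword . '")]';
}
$matchingNodes = $xpath->query(implode(' | ', $parts));

foreach ($matchingNodes as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    $link  = $item->getElementsByTagName('link')->item(0)->nodeValue;
    // Insert $title / $link (and whatever else you need) into the DB here.
}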
