Speeding up PHP reading of multiple XML files

Published 2024-12-21 05:32:56

I currently have a PHP file that must read hundreds of XML files. I have no say in how these XML files are structured; they are created by a third party.

The first XML file is essentially an index of titles for the rest of the XML files, so I search it to get the file names of the others.

I then read each of those XML files, searching their values for a specific phrase.

This process is really slow. I'm talking 5 1/2 minute runtimes, which is not acceptable for a website; customers won't stay around that long.

Does anyone know a way to speed my code up to a maximum runtime of approximately 30 seconds?

Here is a pastebin of my code: http://pastebin.com/HXSSj0Jt

Thanks, sorry for the incomprehensible English...

Comments (2)

谁许谁一生繁华 2024-12-28 05:32:56

Your main problem is that you're trying to make hundreds of HTTP downloads to perform the search. Unless you get rid of that restriction, it's only going to go so fast.

If for some reason the files aren't cacheable at all (unlikely), not even some of the time, you can pick up some speed by downloading them in parallel. See the curl_multi_*() functions. Alternatively, use wget from the command line with xargs to download in parallel.
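
A minimal sketch of the curl_multi approach, assuming the file names have already been extracted from the index file (the URLs below are placeholders):

```php
<?php
// Fetch several XML files in parallel with curl_multi.
// The URLs are placeholders; substitute the names parsed from the index XML.
$urls = [
    'http://example.com/data/file1.xml',
    'http://example.com/data/file2.xml',
    'http://example.com/data/file3.xml',
];

$mh = curl_multi_init();
$handles = [];

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until none are still active.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // wait for network activity instead of busy-looping
    }
} while ($active && $status === CURLM_OK);

$bodies = [];
foreach ($handles as $url => $ch) {
    $bodies[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```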

The above sounds crazy if you have any kind of traffic, though.

Most likely, the files can be cached for at least a short time. Look at the HTTP headers and see what kind of freshness information their server sends. It might say how long until the file expires, in which case you can save it locally until then. Or it might give a Last-Modified date or an ETag, in which case you can make conditional GET requests, which should still speed things up.
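
A sketch of a conditional GET with cURL, assuming an ETag was saved from a previous fetch (the URL and cache paths are illustrative):

```php
<?php
// Conditional GET: send the ETag saved from the last successful fetch.
// The URL and cache paths are illustrative placeholders.
$url       = 'http://example.com/data/file1.xml';
$cacheFile = '/tmp/file1.xml';
$etagFile  = '/tmp/file1.etag';

$headers = [];
if (is_file($etagFile)) {
    $headers[] = 'If-None-Match: ' . trim(file_get_contents($etagFile));
}

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// Capture a fresh ETag from the response headers, if the server sends one.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use ($etagFile) {
    if (stripos($line, 'ETag:') === 0) {
        file_put_contents($etagFile, trim(substr($line, 5)));
    }
    return strlen($line);
});

$body = curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($code === 304) {
    $body = file_get_contents($cacheFile); // not modified: reuse the cached copy
} elseif ($code === 200) {
    file_put_contents($cacheFile, $body);  // new version: refresh the cache
}
```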

I would probably set up a local Squid cache and have PHP make these requests through Squid. It will take care of all the "use the local copy if it's fresh, or conditionally retrieve a new version" logic for you.
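
If Squid is running locally, pointing cURL at it is a one-line change (127.0.0.1:3128 assumes Squid's conventional default port; adjust to your setup):

```php
<?php
// Route the request through a local Squid cache so Squid handles freshness.
// 127.0.0.1:3128 is Squid's conventional default port; adjust to your setup.
$ch = curl_init('http://example.com/data/file1.xml');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:3128');
$xml = curl_exec($ch);
curl_close($ch);
```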

If you still want more performance, you can transform the cached files into a more suitable format (e.g., stick the relevant data in a database). Or, if you must stick with the XML format, you can do a string search on each file first, to test whether you should bother parsing that file as XML at all.
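
A sketch of that pre-filter idea (the phrase and the cache directory are placeholders):

```php
<?php
// Cheap string test before paying for a full XML parse.
// The phrase and the cache directory are placeholders.
$phrase = 'specific phrase';

foreach (glob('/var/cache/xml/*.xml') as $path) {
    $raw = file_get_contents($path);

    // Skip the expensive parse when the phrase cannot possibly be in this file.
    if (strpos($raw, $phrase) === false) {
        continue;
    }

    $doc = simplexml_load_string($raw);
    // ... search $doc's values properly here ...
}
```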

八巷 2024-12-28 05:32:56

First of all, if you have to deal with large XML files for each request to your service, it is wise to download the XML files once, then preprocess and cache them locally.
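
One way to sketch that, assuming a simple TTL-based file cache (the cache directory and the 600-second TTL are arbitrary illustrative choices):

```php
<?php
// Download an XML file at most once per TTL; otherwise serve the local copy.
// The cache directory and the 600-second TTL are arbitrary illustrative choices.
function fetch_xml_cached(string $url, string $cacheDir, int $ttl = 600): string
{
    $cacheFile = $cacheDir . '/' . md5($url) . '.xml';

    // Fresh enough: reuse the local copy.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Stale or missing: refresh from the origin server.
    $xml = file_get_contents($url);
    file_put_contents($cacheFile, $xml);
    return $xml;
}
```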

If you cannot preprocess and cache the XML files and have to download them for each request (which I don't really believe is the case), you can try to optimize by using XMLReader or some SAX event-based XML parser. The problem with SimpleXML is that it uses DOM underneath. DOM (as the letters stand for) creates a document object model in your PHP process's memory, which takes a lot of time and eats tons of memory. I would go so far as to say that DOM is useless for parsing large XML files.

XMLReader, on the other hand, will let you traverse the large XML node by node while barely using any memory, with the tradeoff that you cannot issue XPath queries or use any other non-sequential node access patterns.

For how to use XMLReader, consult the PHP manual for the XMLReader extension.
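
A minimal XMLReader sketch that streams through one file looking for a phrase in its text nodes (the file path and phrase are placeholders):

```php
<?php
// Stream a large XML file node by node with XMLReader,
// checking text content for a phrase without building a DOM.
// The file path and $phrase are illustrative.
$phrase = 'specific phrase';

$reader = new XMLReader();
$reader->open('/var/cache/xml/file1.xml');

$found = false;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::TEXT
        && strpos($reader->value, $phrase) !== false) {
        $found = true;
        break; // stop as soon as the phrase turns up
    }
}
$reader->close();

var_dump($found);
```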
