PHP: processing large strings
I have to replace xmlns with ns in my incoming XML in order to fix SimpleXMLElement's xpath() function. Most functions do not have a performance problem, but there always seems to be an overhead as the string grows.
E.g. preg_replace on a 2 MB string takes 50 ms to process, even if I limit the replacements to 1 and the replacement is done at the very beginning.
If I substr the first few characters and replace only in that part, it is slightly faster, but that is not really what I want.
Is there any PHP method that would perform better for my problem? And if there is no such option, could a simple PHP extension help, one that just does replace => SimpleXMLElement in C?
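
For reference, the approach described in the question can be sketched roughly as follows. The sample document, its namespace URI, and the element names are hypothetical; only the preg_replace call with a limit of 1 and the use of SimpleXMLElement::xpath() come from the question.

```php
<?php
// Hypothetical sample document; the default namespace is what gets in the way of xpath().
$xml = '<feed xmlns="http://example.com/ns"><item>a</item><item>b</item></feed>';

// Replace only the first occurrence of "xmlns" (limit = 1), as in the question.
$patched = preg_replace('/xmlns/', 'ns', $xml, 1);

$doc = new SimpleXMLElement($patched);

// With the default namespace declaration gone, an unprefixed path matches.
$items = $doc->xpath('/feed/item');
echo count($items), "\n"; // prints 2
```

After the replacement the root element simply carries an ordinary attribute named ns, so the document still parses.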
Comments (4)
50 ms sounds pretty reasonable to me for something like this. The requirement itself smells of something being wrong.
Is there any particular reason that you're using regular expressions? Why do people keep jumping to the overkill regex solution?
There is a bog-standard string replace function called str_replace that may do what you want in a fraction of the time (though whether this is right for you depends on how complex your search/replace is).
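
A minimal sketch of the non-regex route, assuming a literal search string is enough. Note that str_replace has no limit argument and replaces every occurrence, so the sketch also shows a first-occurrence-only variant built from strpos and substr_replace. The helper name replace_first and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: replace only the first occurrence of $search in $subject.
function replace_first(string $subject, string $search, string $replace): string
{
    $pos = strpos($subject, $search);
    if ($pos === false) {
        return $subject; // nothing to replace
    }
    return substr_replace($subject, $replace, $pos, strlen($search));
}

// Hypothetical sample with two namespace declarations.
$xml = '<feed xmlns="http://example.com/ns"><item xmlns:x="urn:x">a</item></feed>';

echo replace_first($xml, 'xmlns', 'ns'), "\n"; // only the first "xmlns" changes
echo str_replace('xmlns', 'ns', $xml), "\n";   // every "xmlns" changes
```

Which variant fits depends on whether later xmlns declarations in the document should be left alone.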
From the PHP source, for example here:
http://svn.php.net/repository/php/php-src/branches/PHP_5_2/ext/standard/string.c
I don't see any copies, but I'm no expert in C. On the other hand, there are many convert-to-string calls, which at first sight could copy values. If they do copy values, then we are in trouble here.
Only if we are in trouble:
Try to reinvent the str_replace wheel with the help of character-by-character processing. For example, take the string $somestring = "somevalue". In PHP we can access its characters by index: echo $somestring[0] gives us "s", and echo $somestring[2] gives us "m". I'm not sure about this approach, but it is an option if the official implementations don't use references, as they should.
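
As an illustration of the character-by-character idea, here is a naive scan written in PHP. The function name find_first is hypothetical. The curly-brace offset syntax ($s{0}) was deprecated in PHP 7.4 and removed in PHP 8.0, so the sketch uses square brackets. A loop like this runs in userland and will most likely be slower than the built-in strpos, so treat it as a demonstration of index access rather than an optimization.

```php
<?php
// Hypothetical helper: naive search for $needle, comparing one character at a time.
// Returns the offset of the first match, or -1 if there is none.
function find_first(string $haystack, string $needle): int
{
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m === 0) {
        return -1; // treat an empty needle as "not found" in this sketch
    }
    for ($i = 0; $i + $m <= $n; $i++) {
        $j = 0;
        while ($j < $m && $haystack[$i + $j] === $needle[$j]) {
            $j++;
        }
        if ($j === $m) {
            return $i;
        }
    }
    return -1;
}

$somestring = "somevalue";
echo $somestring[0], "\n";                    // s
echo $somestring[2], "\n";                    // m
echo find_first($somestring, "value"), "\n";  // 4
```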
If you know exactly where the offending "x", "m" and "l" are, you can just use something like
$xml[$x_pos] = ' '; $xml[$m_pos] = ' '; $xml[$l_pos] = ' ';
to transform them into spaces. Or transform them into ns___ (where _ = space).
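
A small sketch of the in-place overwrite. The answer assumes the positions are already known; here strpos is used to find them, which is an assumption of this example, as are the sample document and its element names. The string keeps its length, and the extra spaces before the attribute name are ordinary whitespace inside the start tag.

```php
<?php
// Hypothetical sample document.
$xml = '<feed xmlns="http://example.com/ns"><item>a</item></feed>';
$lengthBefore = strlen($xml);

$pos = strpos($xml, 'xmlns'); // offset of the "x" (assumed unknown in this sketch)
if ($pos !== false) {
    // Overwrite "x", "m" and "l" with spaces; "ns" stays where it is.
    $xml[$pos]     = ' ';
    $xml[$pos + 1] = ' ';
    $xml[$pos + 2] = ' ';
}

$doc = new SimpleXMLElement($xml);
echo count($doc->xpath('/feed/item')), "\n"; // prints 1
```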
You're always going to get an overhead when trying to do this: you're dealing with a char array and trying to replace multiple matching elements of the array (i.e. words).
50 ms is not much of an overhead, unless (as I suspect) you're trying to do this in a loop?
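
Whether the overhead matters is easiest to judge by measuring it. The following is a rough timing sketch on a synthetic document of about 2 MB. The payload, the helper name timed, and the choice of microtime are assumptions of this example, and the numbers will vary by machine and PHP version.

```php
<?php
// Build a roughly 2 MB hypothetical document with the namespace at the very start.
$xml = '<feed xmlns="http://example.com/ns">'
     . str_repeat('<item>payload</item>', 100000)
     . '</feed>';

// Hypothetical helper: run $fn once and return [result, elapsed milliseconds].
function timed(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, (microtime(true) - $start) * 1000];
}

// Regex replacement limited to the first match, as in the question.
[$a, $msRegex] = timed(fn() => preg_replace('/xmlns/', 'ns', $xml, 1));

// Plain string search plus a single splice.
[$b, $msSplice] = timed(function () use ($xml) {
    $pos = strpos($xml, 'xmlns');
    return $pos === false ? $xml : substr_replace($xml, 'ns', $pos, strlen('xmlns'));
});

printf("preg_replace:            %.3f ms\n", $msRegex);
printf("strpos + substr_replace: %.3f ms\n", $msSplice);
```

Both variants produce the same string, so the comparison is only about elapsed time. Running the measurement inside the same loop as the real workload gives a better picture than a single call.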