PHP preg_match_all question
I have a question about a regular expression that is giving me grief. I have a list of items separated by tags, and I am trying to extract everything between two particular tags (which occur multiple times). Here is a sample of the list I am parsing:
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
I'm trying to get everything in between "ResumeResultItem_V3" as a blob of text, but I can't seem to get the expression right.
Here is the code I have so far:
$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";
preg_match_all("/" . $test . "(\w+)" . $test2 . "/", $xml, $matches);
foreach ($matches[0] as $match) {
echo $match;
echo "<br /><br />";
}
How can I fix this?
Comments (4)
I'm making assumptions about your XML structure, but I really think you need an example using an XML parser, like SimpleXML.
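As a rough sketch of the parser approach, assuming the items sit under a single root element: the <Results> wrapper below is hypothetical, since the question shows only the items themselves, and the array keys are illustrative names rather than anything from the question.

```php
<?php
// Sample data from the question. The <Results> root is an assumption:
// XML needs exactly one root element, and the question does not show it.
$xml = <<<XML
<Results>
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
</Results>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    exit("Could not parse the XML\n");
}

// Collect one associative array per <ResumeResultItem_V3> element.
$items = [];
foreach ($doc->ResumeResultItem_V3 as $item) {
    $items[] = [
        'title'    => (string) $item->ResumeTitle,
        'employer' => (string) $item->RecentEmployer,
        'jobTitle' => (string) $item->RecentJobTitle,
        'pay'      => (int) $item->RecentPay,
    ];
}

foreach ($items as $row) {
    echo $row['title'], ' | ', $row['employer'], ' | ', $row['jobTitle'], ' | ', $row['pay'], "\n";
}
```

With a parser, each field is available directly, so there is no need to pull out a text blob and parse it a second time.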
You are probably better off with simplexml for extracting the data here.
But to also answer the regex question: \w+ only matches word characters. In this case you want it to match pretty much everything in between the delimiters, which .*? can be used for. That only works across multiple lines with the /s modifier, though, because . does not match newlines by default.
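To make the difference concrete, here is a small comparison of the three patterns on a shortened version of the sample data. The variable names are illustrative only; the counts in the comments follow from the fact that each opening tag is immediately followed by a newline.

```php
<?php
// Shortened sample: each item spans several lines, as in the question.
$xml = "<ResumeResultItem_V3>\n<ResumeTitle>Johnson</ResumeTitle>\n<RecentPay>40000</RecentPay>\n</ResumeResultItem_V3>\n"
     . "<ResumeResultItem_V3>\n<ResumeTitle>ResumeforJake</ResumeTitle>\n<RecentPay>66000</RecentPay>\n</ResumeResultItem_V3>\n";

$open  = '<ResumeResultItem_V3>';
$close = '<\/ResumeResultItem_V3>';

// \w+ cannot match the newline, "<", ">" or "/" between the tags.
$wordOnly = preg_match_all('/' . $open . '(\w+)' . $close . '/', $xml, $m1);

// .*? without /s still fails, because "." does not match "\n" by default.
$noS = preg_match_all('/' . $open . '(.*?)' . $close . '/', $xml, $m2);

// .*? with /s matches across lines, and laziness keeps the two items separate.
$withS = preg_match_all('/' . $open . '(.*?)' . $close . '/s', $xml, $m3);

echo "\\w+: $wordOnly, .*? without s: $noS, .*? with s: $withS\n";
// prints: \w+: 0, .*? without s: 0, .*? with s: 2
```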
Ignoring that you probably ought to use an XML parser, and that PHP has one you can use...
The issue is that \w+ matches word characters, not any character. A space and most punctuation aren't word characters, so your match fails. You need instead to match "any" character (.), for as many as there are (+), but because that could swallow several items in one match, you also need a modifier to make it non-greedy (?). Your expression should work if you change \w+ to .+? and add the s modifier, which the any-character match requires in order to match newlines.
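Applied to the code from the question, that change might look like the sketch below. The use of preg_quote, the switch to capture group 1, and the htmlspecialchars call are additions of this sketch rather than part of the answer; they are there so the delimiters are escaped safely and the inner tags stay visible in HTML output.

```php
<?php
// Sample data from the question.
$xml = <<<XML
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
XML;

// Escape the tags so "/" and other special characters are taken literally.
$open  = preg_quote('<ResumeResultItem_V3>', '/');
$close = preg_quote('</ResumeResultItem_V3>', '/');

// .+? is lazy, and the s modifier lets "." match newlines.
preg_match_all('/' . $open . '(.+?)' . $close . '/s', $xml, $matches);

// $matches[1] holds only the text between the tags;
// $matches[0] would include the surrounding tags as well.
foreach ($matches[1] as $match) {
    echo htmlspecialchars(trim($match));
    echo "<br /><br />";
}
```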
If you can use the output as an array with 1 item for each of the "text blob" matches, try this:
Results in:
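One way such an array-based approach could be written is sketched here; this is not the answerer's original code. extractBlobs is a hypothetical helper name, and the sample is shortened to two fields per item.

```php
<?php
/**
 * Hypothetical helper: returns one trimmed string per <$tag>...</$tag> pair.
 */
function extractBlobs(string $xml, string $tag): array
{
    $open  = preg_quote("<$tag>", '/');
    $close = preg_quote("</$tag>", '/');

    // Lazy match with the s modifier so each item is captured separately.
    if (preg_match_all('/' . $open . '(.*?)' . $close . '/s', $xml, $matches) === false) {
        return [];
    }

    // $matches[1] is the list of captured inner texts.
    return array_map('trim', $matches[1]);
}

$xml = "<ResumeResultItem_V3>\n<ResumeTitle>Johnson</ResumeTitle>\n<RecentPay>40000</RecentPay>\n</ResumeResultItem_V3>\n"
     . "<ResumeResultItem_V3>\n<ResumeTitle>ResumeforJake</ResumeTitle>\n<RecentPay>66000</RecentPay>\n</ResumeResultItem_V3>\n";

$blobs = extractBlobs($xml, 'ResumeResultItem_V3');
print_r($blobs);
```

Each array element is the text of one item, which can then be processed further or echoed as needed.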