PHP preg_match_all problem

Posted on 2024-11-08 00:40:39


I have a question about a regular expression that is giving me grief. I have a list of items delimited by tags, and I am trying to extract everything between two particular tags (which occur multiple times). Here is a sample of the list I am parsing:


<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>

<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>

I'm trying to get everything between the opening and closing "ResumeResultItem_V3" tags as a blob of text, but I can't seem to get the expression right.

Here is the code I have so far:




$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";

preg_match_all("/" . $test . "(\w+)" . $test2 . "/", $xml, $matches);

foreach ($matches[0] as $match) {
       echo $match;
       echo "<br /><br />";
}

How can I fix this?


妥活 2024-11-15 00:40:39

I'm making assumptions about your XML structure, but I really think you need an example using an XML parser, like SimpleXML.

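// $file must hold the XML as a string with a single root element around the items.
$xml = new SimpleXMLElement( $file );
foreach( $xml->ResumeResultItem_V3 as $ResumeResultItem_V3 )
    // asXML() returns the item's markup; a (string) cast would give only its direct text.
    echo $ResumeResultItem_V3->asXML();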
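
A slightly fuller sketch of the same idea, assuming the items arrive as a bare fragment with no single root element, as in the question's sample. The wrapper element name `root`, the shortened sample data, and the variable names are assumptions made for illustration. `asXML()` is used to get each item as a blob because casting an element to a string yields only its direct text, not its child markup.

```php
<?php
// Shortened sample from the question: two sibling items, no root element.
$fragment = <<<XML
<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
XML;

// SimpleXML needs exactly one root element, so wrap the fragment.
// The name "root" is arbitrary (an assumption, not part of the original data).
$xml = new SimpleXMLElement('<root>' . $fragment . '</root>');

$blobs  = [];
$titles = [];
foreach ($xml->ResumeResultItem_V3 as $item) {
    $blobs[]  = $item->asXML();               // full markup of one item
    $titles[] = (string) $item->ResumeTitle;  // a single field as plain text
}

print_r($titles);
```

Once the data is parsed this way, individual fields such as `ResumeTitle` are available directly, which is usually what the blob was wanted for in the first place.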

娇女薄笑 2024-11-15 00:40:39

You are probably better off using SimpleXML to extract the data here.

But to answer the regex question as well: \w+ only matches word characters. In this case you want to match pretty much everything between the delimiters, which is what .*? is for.

preg_match_all("/$test(.*?)$test2/s", $xml, $matches);

This only works with the /s modifier, though, because without it . does not match newlines.
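
One detail worth spelling out: because $test and $test2 in the question carry their own parentheses, the combined pattern has three capture groups, so the text between the tags lands in $matches[2] rather than $matches[1]. A minimal sketch, using a shortened version of the question's sample as assumed input:

```php
<?php
// Shortened sample data (an assumption for illustration).
$xml = "<ResumeResultItem_V3>\n    <ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n\n"
     . "<ResumeResultItem_V3>\n    <ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

// Same delimiter strings as in the question, parentheses included.
$test  = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";

preg_match_all("/$test(.*?)$test2/s", $xml, $matches);

// $matches[0]: whole matches including the tags
// $matches[1]: opening tags, $matches[2]: text in between, $matches[3]: closing tags
foreach ($matches[2] as $inner) {
    echo trim($inner), "\n";
}
```

Dropping the parentheses from $test and $test2 would leave a single capture group, and the inner text would then be in $matches[1].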

在梵高的星空下 2024-11-15 00:40:39

Ignoring that you probably ought to use an XML parser, and that PHP has one you can use...

The issue is that \w+ matches word characters, not any character. A space and most punctuation aren't word characters, so your match fails. Instead, match any character (.) one or more times (+). Because a greedy match can run past the first closing tag and swallow several items at once, add ? to make the quantifier non-greedy. Your expression should work if you change \w+ to .+? and add the s modifier, which . needs in order to match newlines as well:

preg_match_all('/' . $test . '(.+?)' . $test2 . '/s', $xml, $matches);
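
To see what the non-greedy quantifier and the s modifier each contribute, here is a small comparison sketch. The sample data is shortened, and the names $open, $close and the counter variables are assumptions introduced for illustration.

```php
<?php
// Shortened sample data (an assumption for illustration).
$xml = "<ResumeResultItem_V3>\n  <ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n"
     . "<ResumeResultItem_V3>\n  <ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

$open  = '<ResumeResultItem_V3>';
$close = '<\/ResumeResultItem_V3>';

// Greedy: .+ runs to the LAST closing tag, so both items collapse into one match.
$greedyCount = preg_match_all('/' . $open . '(.+)' . $close . '/s', $xml, $greedy);

// Non-greedy: .+? stops at the FIRST closing tag, giving one match per item.
$lazyCount = preg_match_all('/' . $open . '(.+?)' . $close . '/s', $xml, $lazy);

// Without the s modifier, . does not match "\n", so nothing matches at all.
$noDotAllCount = preg_match_all('/' . $open . '(.+?)' . $close . '/', $xml, $none);

echo "greedy: $greedyCount, non-greedy: $lazyCount, without s: $noDotAllCount\n";
// prints "greedy: 1, non-greedy: 2, without s: 0"
```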

你列表最软的妹 2024-11-15 00:40:39

If you can work with the output as an array containing one item per "text blob" match, try this:

<?php
$text =
"<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>

<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";

$matches = preg_split("/<\/ResumeResultItem_V3>/",preg_replace("/<ResumeResultItem_V3>/","",$text));
print_r($matches);
?>

Results in:

Array
(
    [0] => 
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>

    [1] => 


    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>

    [2] => 
)
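
The result above still carries leading blank lines and an empty trailing element. A possible cleanup, sketched with shortened sample data; the `$parts` and `$items` names and the choice of `str_replace` for the fixed opening tag are assumptions, not part of the original answer:

```php
<?php
// Shortened sample data (an assumption for illustration).
$text = "<ResumeResultItem_V3>\n    <ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n\n"
      . "<ResumeResultItem_V3>\n    <ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

// Remove the opening tags with a plain string replace, then split on the closing tag.
$parts = preg_split(
    "/<\/ResumeResultItem_V3>/",
    str_replace("<ResumeResultItem_V3>", "", $text)
);

// Trim whitespace from each piece and drop the pieces that end up empty.
$items = array_values(array_filter(
    array_map('trim', $parts),
    function ($piece) { return $piece !== ''; }
));

print_r($items);
```

Trimming after the split keeps the pattern itself simple, at the cost of one extra pass over the array.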
