PHP: extracting a value from a string

Posted on 2024-12-25 19:08:47

I'm processing records in PHP and was wondering if there is an efficient method to pull out the genre: values from each of the following records. genre: can be anywhere in the string.

In the following string I need to pull out the word "alternative" (last word)

[media:keywords] => upc:00602527365589,Records,mercury,artist:Neon Trees,Alternative,trees,neon,genre:alternative

In the following string I need to pull out "Latin / Pop,latino,Pop"

[media:keywords] => genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin

In the following record I need to pull out "other"

[media:keywords] => upc:793018101530,andy,razor,Other,tie,genre:other,artist:Andy McKee,McKee,&

In the following record I need to pull out "rock,flotsam,jetsam"

[media:keywords] => and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen

I'm pulling my hair out on this (what is left anyway).

Comments (5)

少女的英雄梦 2025-01-01 19:08:47

Use the following regular expression coupled with preg_match():

~\bgenre:(.+?)(?=(,[^:,]+:|$))~

Your desired result will be in element 1 of the matches array (parameter 3); element 0 holds the full match, including the genre: prefix.
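
As a minimal sketch of how that pattern could be wired up, the snippet below wraps it in a helper and runs it over the four records from the question. The function name extractGenre is made up for illustration, the records are written on one line each, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns the genre value, or null when no genre: key is present.
function extractGenre(string $keywords): ?string
{
    // Lazily capture everything after "genre:" up to the next ",key:" or the end of the string.
    if (preg_match('~\bgenre:(.+?)(?=(,[^:,]+:|$))~', $keywords, $matches)) {
        return $matches[1]; // index 1 is the captured value; index 0 would include "genre:"
    }
    return null;
}

$records = [
    'upc:00602527365589,Records,mercury,artist:Neon Trees,Alternative,trees,neon,genre:alternative',
    'genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin',
    'upc:793018101530,andy,razor,Other,tie,genre:other,artist:Andy McKee,McKee,&',
    'and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen',
];

foreach ($records as $record) {
    echo extractGenre($record), "\n";
}
```

Checking for null before using the result avoids a notice on records that carry no genre: key at all.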

任性一次 2025-01-01 19:08:47

I would use strpos to find where the genre starts. The only problem is where to end it, because you do not have a delimiter. I would use the other known keywords, like "upc", "artist", etc., to check whether the string needs to be cut off at the end.
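
A rough sketch of that idea, assuming the only other keys are upc and artist (the helper name genreByStrpos and the default key list are made up for illustration, and the list would need extending if the feed has more keys):

```php
<?php
// Hypothetical helper: find "genre:" with strpos, then cut the value
// at the nearest following ",<known key>:" marker.
function genreByStrpos(string $keywords, array $otherKeys = ['upc', 'artist']): ?string
{
    $start = strpos($keywords, 'genre:');
    if ($start === false) {
        return null; // no genre in this record
    }
    $start += strlen('genre:');

    // Default to the end of the string, then pull the end back
    // to the closest known key that follows the genre value.
    $end = strlen($keywords);
    foreach ($otherKeys as $key) {
        $pos = strpos($keywords, ',' . $key . ':', $start);
        if ($pos !== false && $pos < $end) {
            $end = $pos;
        }
    }

    return substr($keywords, $start, $end - $start);
}

echo genreByStrpos('and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen'), "\n";
```

Note that a plain strpos for "genre:" would also hit a longer key that merely ends in genre:, so this is only safe if no such key exists in the data.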

二手情话 2025-01-01 19:08:47

You can indeed use a bit of pattern detection. You are always looking for the fixed genre: followed by one or more words or phrases, none of which may itself contain a colon.

So this might suffice:

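// The outer group captures the whole list; a repeated group on its own would keep only its last iteration.
preg_match('~\bgenre:((?:,?[^:,]+(?=,|$))+)~', $media_keywords, $match);
print $match[1];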

地狱即天堂 2025-01-01 19:08:47

$mystring = 'abc';
$findme   = 'a';
$pos = strpos($mystring, $findme);

// Note our use of ===.  Simply == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($pos === false) {
    echo "The string '$findme' was not found in the string '$mystring'";
} else {
    echo "The string '$findme' was found in the string '$mystring'";
    echo " and exists at position $pos";
}

From the PHP documentation for strpos.

So you can just use $findme = "alternative".

瞄了个咪的 2025-01-01 19:08:47

Your problem with parsing this string is that you don't have a normal delimiter and/or quotes (i.e. a comma separates fields, but may also appear inside a field; it's the same problem that exists with CSV files without quotes).

If performance doesn't matter much to you, I would suggest parsing it in a more bulletproof way: make some assumptions about what a key is (like artist, genre, upc, etc.) and introduce a normal delimiter. Proof-of-concept code (I have left the echoes in so you can see what's happening):

$string = "genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin";
// Introduce a delimiter in front of every "key:" token
$delimiter = '|';
$withDelimiter = preg_replace('/([a-z]+):/', $delimiter . '$0', $string);
echo $withDelimiter . "\n";

$genre = null; // stays null if the record has no genre: key
$fields = explode($delimiter, $withDelimiter);
foreach ($fields as $field) {
    // Skip empty pieces and leading text that has no key (e.g. "and,")
    if (strpos($field, ':') === false) {
        continue;
    }
    echo $field . "\n";

    // Limit to 2 parts so a colon inside the value is preserved
    list($key, $valueWithPossiblyTrailingComma) = explode(':', $field, 2);

    if ($key === 'genre') {
        $genre = rtrim($valueWithPossiblyTrailingComma, ',');
        break;
    }
}
echo $genre;

You can make it work in nearly all cases, and it lets you find any key, not only genre, but its performance will be low.

I have made the following assumptions about your string:

  • it is a list of key => value pairs, with a colon between key and value and the pairs joined by commas
  • a key may contain only [a-z] characters
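
Under those same two assumptions, the whole record could also be turned into an associative array in one pass. This is only a sketch: the helper name parseKeywords is made up, and any free keywords that follow a value (such as "latino,Pop" after the genre) end up attached to that value, which happens to be what the question asks for.

```php
<?php
// Hypothetical helper: parse a keyword string into key => value pairs.
// Assumes keys consist only of [a-z] and that every ",word:" starts a new pair.
function parseKeywords(string $keywords): array
{
    $pairs = [];
    // Group 1 is the key; group 2 is everything up to the next ",key:" or the end of the string.
    preg_match_all('~\b([a-z]+):(.*?)(?=,[a-z]+:|$)~', $keywords, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $pairs[$match[1]] = $match[2];
    }
    return $pairs;
}

$pairs = parseKeywords('and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen');
echo $pairs['genre'] ?? '(no genre)', "\n";
```

Text before the first key (the leading "and," in this record) is simply ignored, and a record without a genre: key yields an array with no genre entry.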