Finding a value from an array and its corresponding key in a text file in PHP

Posted on 2024-12-25 07:23:02

I have a sizeable txt file (3.5 MB) structured like so:

sweep#1 expanse#1   0.375
loftiness#1 highness#2  0.375
lockstep#1  0.25
laziness#2  0.25
treponema#1 0.25
rhizopodan#1 rhizopod#1 0.25
plumy#3 feathery#3 feathered#1  -0.125
ruffled#2 frilly#1 frilled#1    -0.125
fringed#2   -0.125
inflamed#3  -0.125
inlaid#1    -0.125

Each word is followed by a #, an integer and then its "score." There are tab breaks in between the word and score. As of right now, the textfile is loaded as a string using file_get_contents().

From an array of strings made up of individual, lower-case, character-stripped words, I need to look up each value, find its corresponding score and add it to a running total.

I imagine I would need some form of regex to first find the word, continue to the next \t and then add the score that follows to a running total. What's the best way of going about this?

Comments (2)

秋心╮凉 2025-01-01 07:23:02

Yes, there are probably better ways of doing this, but this one is oh-so-simple:

<?php

$wordlist = file_get_contents("wordlist.txt");

// strip invalid chars from the string and make it lowercase
$string = "This is the best sentence ever! Winning!";
$string = strtolower($string);
$string = preg_replace('/[^\w -]/', '', $string);
$words = explode(" ", $string);

$lines = explode("\n", $wordlist);
$scores = array();
foreach ($lines as $line) {
    // split on whitespace: every field but the last is word#n, the last is the score
    $split = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if (count($split) < 2) continue; // skip blank or malformed lines
    $score = floatval(array_pop($split));
    foreach ($split as $entry) {
        // keep the part before '#'; a later sense of the same word overwrites an earlier one
        $scores[explode('#', $entry)[0]] = $score;
    }
}

$total = 0;
foreach ($words as $word) {
    if (isset($scores[$word])) $total += $scores[$word];
}

echo $total;
?>
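
The lookup-table idea can be tried without the external file. The following is a minimal sketch rather than the answerer's exact code: the function names build_score_table and total_score are hypothetical, the sample rows are copied from the question, and it assumes that every word#n entry on a row shares that row's score and that a later sense of a word may overwrite an earlier one.

```php
<?php
// Hypothetical helper names; sample rows are taken from the question.

/** Build a word => score lookup table from the raw file contents. */
function build_score_table(string $text): array
{
    $table = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Fields are separated by tabs or spaces; the last field is the score.
        $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if (count($fields) < 2) {
            continue; // blank or malformed line
        }
        $score = (float) array_pop($fields);
        foreach ($fields as $entry) {
            // "feathery#3" -> "feathery"; assumption: a later sense overwrites an earlier one.
            $table[explode('#', $entry, 2)[0]] = $score;
        }
    }
    return $table;
}

/** Sum the scores of every word that appears in the table; unknown words add nothing. */
function total_score(array $words, array $table): float
{
    $total = 0.0;
    foreach ($words as $word) {
        $total += $table[$word] ?? 0.0;
    }
    return $total;
}

$sample = "sweep#1 expanse#1\t0.375\nlockstep#1\t0.25\nplumy#3 feathery#3 feathered#1\t-0.125\n";
$table  = build_score_table($sample);
echo total_score(['expanse', 'feathery', 'unknown'], $table), "\n"; // prints 0.25
```

Building the table once costs a single pass over the file, after which each word lookup is a plain array access, which is likely to matter when many sentences are scored against the same 3.5 MB list.
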
风吹雨成花 2025-01-01 07:23:02

If you just need to find a word, then it's as simple as:

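if (preg_match("/^" . preg_quote($word, "/") . "#\d+\t+(-?\d+\.\d+)/m", $textfile, $match)) {
    $sum += floatval($match[1]);
}

^ looks for the start of a line in /m mode, # and \t are literal separators, and \d+ matches one or more digits; the optional -? lets negative scores such as -0.125 match. The result group [1] will be your float number. As written, the pattern only finds a word that is the first entry on its line and is followed directly by the tab.

The $word needs escaping with preg_quote (passing "/" as the delimiter) in case it contains a / forward slash or other regex metacharacters. To search for multiple words in one go, implode them into an alternatives list $word1|$word2|$word3, wrap the list in a capture group, and use preg_match_all instead.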
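
The multi-word variant described above might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the function name sum_scores_regex is hypothetical, each row is assumed to look like "word#N[ word#N ...]<TAB>score", and a lookahead is used (a choice made here, not taken from the answer) so that two searched words on the same row can both match. Arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical helper; assumes each row looks like "word#N[ word#N ...]<TAB>score".
function sum_scores_regex(array $words, string $textfile): float
{
    if ($words === []) {
        return 0.0;
    }

    // Escape each distinct word (including the "/" delimiter) and join with "|".
    $quoted = array_map(
        fn (string $w): string => preg_quote($w, '/'),
        array_unique($words)
    );
    $alternatives = implode('|', $quoted);

    // Group 1: a listed word at the start of an entry, followed by "#<digits>".
    // Group 2 (inside the lookahead): the signed number that ends the same line.
    $pattern = '/(?:^|(?<=[ \t]))(' . $alternatives . ')#\d+'
             . '(?=[^\n]*?[ \t](-?\d+(?:\.\d+)?)[ \t\r]*$)/m';

    preg_match_all($pattern, $textfile, $matches, PREG_SET_ORDER);

    // Map each matched word to its score; a later sense overwrites an earlier one.
    $scores = [];
    foreach ($matches as $m) {
        $scores[$m[1]] = (float) $m[2];
    }

    // Sum over the input words so that a repeated word counts once per occurrence.
    $sum = 0.0;
    foreach ($words as $word) {
        $sum += $scores[$word] ?? 0.0;
    }
    return $sum;
}

$textfile = "sweep#1 expanse#1\t0.375\nlockstep#1\t0.25\nplumy#3 feathery#3 feathered#1\t-0.125\n";
echo sum_scores_regex(['expanse', 'lockstep', 'feathery', 'feathery'], $textfile), "\n"; // prints 0.375
```

With a long word list the alternation can grow large, so for big inputs the lookup-table approach from the other answer may scale better; the regex route is mainly convenient for a handful of words.
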
