Parse lines of a text file where values are separated by a varying number of whitespace characters

Posted on 2024-07-28 22:16:49

I need to get the company name and its ticker symbol in different arrays. Here is my data which is stored in a txt file:

3M Company      MMM
99 Cents Only Stores    NDN
AO Smith Corporation    AOS
Aaron's, Inc.   AAN

and so on

How would I do this using regex or some other techniques?

Comments (4)

从来不烧饼 2024-08-04 22:16:49

Iterate over each line, and collect the data with a regular expression:

^(.+?)\s+([A-Z]+)$

Capture group 1 will contain the company name, and capture group 2 will contain the ticker symbol.
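As a rough illustration of that loop in PHP (the language the other answers in this thread use), here is a minimal sketch. The sample array stands in for the file contents, and the file name in the comment is hypothetical; lines that do not match the pattern are simply skipped, which is an assumption about the desired behavior.

```php
<?php
// Hypothetical sample lines; in practice they might come from
// file('companies.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    "Aaron's, Inc.   AAN",
];

$names = [];
$symbols = [];

foreach ($lines as $line) {
    // Lazy company name, then whitespace, then a trailing run of uppercase letters.
    if (preg_match('/^(.+?)\s+([A-Z]+)$/', $line, $m)) {
        $names[] = $m[1];   // capture group 1: company name
        $symbols[] = $m[2]; // capture group 2: ticker symbol
    }
    // Lines that do not match are skipped (an assumption, not part of the answer).
}

print_r($names);
print_r($symbols);
```

Because the name group is lazy and the whitespace run is greedy, the captured name carries no trailing spaces, so no trimming is needed here.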

You can also split the string in two on a delimiter of two or three spaces and trim the resulting two strings. This only works if you are sure the company name and ticker symbol are always separated by at least that many spaces, and that the company name itself never contains that many consecutive spaces.
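The split-based alternative could look something like the sketch below. It uses preg_split() with a "two or more whitespace characters" pattern rather than a fixed two- or three-space string, which is a small variation on the idea and not necessarily what the answer had in mind; the sample data is the set of lines from the question.

```php
<?php
// Sample lines from the question; a real script would read them from the file.
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    "Aaron's, Inc.   AAN",
];

$names = [];
$symbols = [];

foreach ($lines as $line) {
    // Split at most once, on a run of two or more whitespace characters.
    $parts = preg_split('/\s{2,}/', trim($line), 2);

    // Keep the line only if it really produced two fields.
    if (count($parts) === 2) {
        $names[] = trim($parts[0]);
        $symbols[] = trim($parts[1]);
    }
}

print_r($names);
print_r($symbols);
```

Single spaces inside a name such as "99 Cents Only Stores" are left alone, but a name containing a double space would be cut in the wrong place, which is the limitation described above.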

眸中客 2024-08-04 22:16:49

Is the format of the text file imposed on you? If you have the choice, I'd suggest you don't use spaces to separate the fields in the text file. Instead, use | or $$ or something you can be assured won't appear in the content, then just split it to an array.
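If the file format can be changed, the parsing step might shrink to something like this sketch. The pipe-delimited lines are a hypothetical rewrite of the question's data, assuming that | never appears in a company name.

```php
<?php
// Hypothetical pipe-delimited version of the data from the question.
$lines = [
    '3M Company|MMM',
    '99 Cents Only Stores|NDN',
    'AO Smith Corporation|AOS',
    "Aaron's, Inc.|AAN",
];

$names = [];
$symbols = [];

foreach ($lines as $line) {
    // A limit of 2 guarantees at most two parts per line.
    [$name, $symbol] = explode('|', $line, 2);
    $names[] = $name;
    $symbols[] = $symbol;
}

print_r($names);
print_r($symbols);
```

With an unambiguous delimiter no regular expression is needed, and spaces inside company names stop being a concern.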

╭⌒浅淡时光〆 2024-08-04 22:16:49

Try this regular expression:

(.+)\s*([A-Z]{3})$

Perhaps someone with more PHP experience could flesh out a code example using preg_split or something similar.
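One possible way to flesh this out is sketched below with preg_match() rather than preg_split(). Two caveats about the pattern as written: the greedy (.+) swallows the padding, so the captured name keeps its trailing whitespace and has to be trimmed, and {3} assumes every ticker is exactly three letters, which holds for the sample data but may not hold for the full file.

```php
<?php
// Sample lines from the question.
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    "Aaron's, Inc.   AAN",
];

$names = [];
$symbols = [];

foreach ($lines as $line) {
    if (preg_match('/(.+)\s*([A-Z]{3})$/', $line, $m)) {
        // Group 1 still ends with the padding spaces, so strip them.
        $names[] = rtrim($m[1]);
        $symbols[] = $m[2];
    }
}

print_r($names);
print_r($symbols);
```

Making the first group lazy and requiring at least one whitespace character, as in `(.+?)\s+([A-Z]+)$`, would avoid both the trimming step and the fixed ticker length.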

指尖凝香 2024-08-04 22:16:49

With variable whitespace as the delimiter between your two columns of text, there are several ways to do this.

You could process the text file line by line with file() and use preg_split() to split each line on the run of whitespace that is followed by a sequence of uppercase letters and then the end of the string. Alternatively, you could use file_get_contents() with preg_match_all() and then extract the two captured columns with array_column(). While the latter may be a little faster since it makes only one preg_ function call, the decision is likely to come down to the developer's coding tastes and the complexity of the input text.

Code:

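// $lines = file('your_text_file.txt', FILE_IGNORE_NEW_LINES);
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    'Aaron\'s, Inc.   AAN',
];

// Initialise the result arrays so they exist even when $lines is empty.
$names = [];
$symbols = [];

foreach ($lines as $line) {
    // Split on the whitespace run that precedes the trailing uppercase ticker;
    // the two pieces are appended directly to $names and $symbols.
    [$names[], $symbols[]] = preg_split('~\s+(?=[A-Z]+$)~m', $line);
}
var_export($names);
echo "\n---\n";
var_export($symbols);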

Or:

//$text = file_get_contents('your_text_file.txt');
$text = <<<TEXT
3M Company      MMM
99 Cents Only Stores    NDN
AO Smith Corporation    AOS
Aaron's, Inc.   AAN
TEXT;

// The m flag lets $ match at the end of every line; PREG_SET_ORDER groups the captures per line.
preg_match_all('~(.+?)\s+([A-Z]+)$~m', $text, $matches, PREG_SET_ORDER);
var_export(array_column($matches, 1));
echo "\n---\n";
var_export(array_column($matches, 2));