Splitting varying strings with Perl

Posted on 2024-09-11 07:18:38

I have a bunch of strings in Perl that all look like this:

10 NE HARRISBURG
4 E HASWELL
2 SE OAKLEY
6 SE REDBIRD
PROVO
6 W EADS
21 N HARRISON

What I need to do is remove the numbers and letters that precede the city names. The problem is that the prefix varies a lot from city to city; the data is almost never the same. Is it possible to remove this prefix and keep it in a separate string?

Comments (5)

半边脸i 2024-09-18 07:18:38

Try this:

for my $s (@strings) {
    my @fields = split /\s+/, $s, 3;
    my $city = $fields[-1];
}

You can test the array size to determine the number of fields:

my $n = @fields;
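
A minimal sketch of the same split idea that also keeps the leading fields as a separate prefix string. It assumes the city is a single word at the end of the line; the helper name `split_location` is hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prefix, city) for one input line.
# Assumption: the city is the last whitespace-separated field.
sub split_location {
    my ($s) = @_;
    my @fields = split ' ', $s;        # split on whitespace, skipping leading blanks
    return ('', '') unless @fields;    # blank line: nothing to extract
    my $city   = pop @fields;          # last field is the city
    my $prefix = join ' ', @fields;    # remaining fields (if any) form the prefix
    return ($prefix, $city);
}

for my $s ('10 NE HARRISBURG', 'PROVO') {
    my ($prefix, $city) = split_location($s);
    print "prefix='$prefix' city='$city'\n";
}
```

A multi-word city such as SAN FRANCISCO would be cut to its last word here, so this only fits data like the sample in the question.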
樱娆 2024-09-18 07:18:38
my @l = (
'10 NE HARRISBURG',
'4 E HASWELL',
'2 SE OAKLEY',
'6 SE REDBIRD',
'PROVO',
'6 W EADS',
'21 N HARRISON',
);

foreach (@l) {
    # Regex changed following hoobs's suggestion.
    my ($beg, $rest) = ($_ =~ /^(\d*\s(?:[NS]|[NS]?[EW])*)?(.*)$/);
    print "beg=$beg \trest=$rest\n";
}

Output:

beg=10 NE   rest=HARRISBURG
beg=4 E     rest=HASWELL
beg=2 SE    rest=OAKLEY
beg=6 SE    rest=REDBIRD
beg=    rest=PROVO
beg=6 W     rest=EADS
beg=21 N    rest=HARRISON

For shinjuo: if you want to run it on only one string, you can do:

  my ($beg, $rest) = ($l[3] =~ /^(\d*\s(?:[NS]|[NS]?[EW])*)?(.*)$/);
  print "beg=$beg \trest=$rest\n";

To avoid a warning about an uninitialized value, test whether $beg is defined:

print defined $beg ? "beg=$beg\t" : "", "rest=$rest\n";
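
If the distance, direction and city are wanted in separate variables, one possible variant uses named captures. This is only a sketch and not the regex from the answer above: the pattern, the helper name `parse_location` and the key names are assumptions made for illustration. It assumes the prefix is a number, a one- to three-letter compass point, or both, each followed by whitespace, so that a city like NEW YORK is not mistaken for a direction.

```perl
use strict;
use warnings;

# Assumed line layout: [distance] [compass point] city
my $line_re = qr{
    ^ \s*
    (?: (?<dist> \d+ ) \s+ )?                                    # optional distance
    (?: (?<dir> [NS]{1,2}[EW] | [EW][NS][EW] | [NSEW] ) \s+ )?   # optional compass point
    (?<city> \S .*? )                                            # city name, may contain spaces
    \s* $
}x;

# Hypothetical helper: returns a hash ref, or undef when the line is blank.
sub parse_location {
    my ($line) = @_;
    return unless $line =~ $line_re;
    return { dist => $+{dist}, dir => $+{dir}, city => $+{city} };
}

for my $s ('10 NE HARRISBURG', 'PROVO', '15 NEW YORK') {
    my $loc = parse_location($s) or next;
    printf "dist=%s dir=%s city=%s\n",
        $loc->{dist} // '-', $loc->{dir} // '-', $loc->{city};
}
```

Named captures and the `//` operator need Perl 5.10 or later. A single-letter city name such as "N" directly after a number would still be ambiguous with a direction.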
时光瘦了 2024-09-18 07:18:38

Looks like you always want the very last element in the result of split(). Or you can go with m/(\S+)$/.
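
A short sketch of the regex variant mentioned above. The helper name `last_word` is hypothetical; the pattern is the one from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last whitespace-delimited word,
# or undef if the string does not end in a non-space character.
sub last_word {
    my ($s) = @_;
    my ($word) = $s =~ m/(\S+)$/;   # a list-context match returns the capture
    return $word;
}

print last_word($_), "\n" for '10 NE HARRISBURG', 'PROVO', '6 W EADS';
```

As with split(), only the final word is returned, so a multi-word city name loses everything before its last word.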

腻橙味 2024-09-18 07:18:38

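Can't we assume there is always a city name and that it appears last on the line? If that's the case, split the line and keep the last field. Here's a command-line one-liner (-a autosplits each line into @F; newer Perl versions no longer split into @_ implicitly, so a bare split followed by $_[-1] prints nothing useful):

perl -lane 'print $F[-1]' input.txt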

Output:

HARRISBURG
HASWELL
OAKLEY
REDBIRD
PROVO
EADS
HARRISON

Update 1

This solution won't work if you have multi-word city names like SAN FRANCISCO (a case spotted in a comment below).

Where is your input data coming from? If you have generated it yourself, you should add delimiters. If someone generated it for you, ask them to regenerate it with delimiters. Parsing it will then become child's play.

# replace ";" with your delimiter
perl -F';' -lane 'print $F[-1]' input.txt
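
To illustrate why delimiters make parsing easy, here is a minimal sketch. It assumes a hypothetical record layout of `distance;direction;city` in which empty fields are allowed; both the layout and the helper name `parse_delimited` are assumptions, not something given in the question.

```perl
use strict;
use warnings;

# Hypothetical record layout: distance;direction;city
sub parse_delimited {
    my ($line) = @_;
    # A limit of 3 keeps everything after the second ";" in the city field.
    my ($dist, $dir, $city) = split /;/, $line, 3;
    return ($dist, $dir, $city);
}

for my $line ('10;NE;HARRISBURG', ';;PROVO', '32;;SAN FRANCISCO') {
    my ($dist, $dir, $city) = parse_delimited($line);
    print "dist='$dist' dir='$dir' city='$city'\n";
}
```

With explicit fields, a multi-word city no longer needs any guessing about where the prefix ends.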
月牙弯弯 2024-09-18 07:18:38

Regex Solution

Solution 1: Keep everything (vol7ron's emailed solution)


#!/usr/bin/perl -w    

use strict; 
use Data::Dumper;   

   sub main{    
      my @strings = (    
                      '10 NE HARRISBURG'    
                    , '4 E HASWELL'    
                    , '2 SE OAKLEY'    
                    , '6 SE REDBIRD'    
                    , 'PROVO'    
                    , '6 W EADS'    
                    , '21 N HARRISON'    
                    , '32 SAN FRANCISCO' 
                    , ''   
                    , '15 NEW YORK'    
                    , '15 NNW NEW YORK'    
                    , '15 NW NEW YORK'     
                    , 'NW NEW YORK'    
                    );       

      my %hash;
      my $count=0;
      for (@strings){
         # anchored at the start so letters inside a city name are never taken as a direction
         if (/^\d*\s*[NS]{0,2}[EW]{0,1}\s+/){
            # if there was a speed / direction
            $hash{$count}{wind} = $&;
            $hash{$count}{city} = $';
         } else {
            # if there was only a city
            $hash{$count}{city} = $_;
         }
         $count++;
      }

      print Dumper(\%hash);
   }

   main();

Solution 2: Strip off what you don't need


#!/usr/bin/perl -w

use strict;

   sub main{
      my @strings = (
                      '10 NE HARRISBURG'
                    , '4 E HASWELL'
                    , '2 SE OAKLEY'
                    , '6 SE REDBIRD'
                    , 'PROVO'
                    , '6 W EADS'
                    , '21 N HARRISON'
                    , '32 SAN FRANCISCO'
                    , '15 NEW YORK'
                    , '15 NNW NEW YORK'
                    , '15 NW NEW YORK'
                    , 'NW NEW YORK'
                    );

      for my $elem (@strings){
         # anchored at the start so only a leading prefix is stripped
         $elem =~ s/^\d*\s*[NS]{0,2}[EW]{0,1}\s+(\w*)/$1/;
      }

      $"="\n";    # list separator: print one city per line
      print "@strings\n";
   }

   main();

Update:

After applying vol7ron's suggestion and example, using the repetition quantifiers worked. This strips off the leading digits and the direction, and it won't break if the digits or the direction (or both) are missing.
