如何将多行处理/存储到从 perl 文件中读取的单个字段中？

发布于 2024-11-08 13:30:54 字数 847 浏览 8 评论 0原文

我正在尝试用 perl 处理文本文件。我需要将文件中的数据存储到数据库中。我遇到的问题是某些字段包含换行符，这让我有点困惑。包含这些字段的最佳方式是什么？

示例 data.txt 文件：

ID|Title|Description|Date
1|Example 1|Example Description|10/11/2011
2|Example 2|A long example description
Which contains
a bunch of newlines|10/12/2011
3|Example 3|Short description|10/13/2011

当前（损坏的）Perl 脚本（示例）：

#!/usr/bin/perl -w
use strict;

open (MYFILE, 'data.txt');
while (<MYFILE>) {
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

正如您从示例中看到的，在 ID 2 的情况下，它被分成几行，从而在尝试引用那些未定义的变量时导致错误。您如何将它们分组到正确的领域？

提前致谢！（我希望问题足够清楚，很难定义标题）

原文

I am trying to process a text file in perl. I need to store the data from the file into a database.
The problem that I'm having is that some fields contain a newline, which throws me off a bit.
What would be the best way to contain these fields?

Example data.txt file:

ID|Title|Description|Date
1|Example 1|Example Description|10/11/2011
2|Example 2|A long example description
Which contains
a bunch of newlines|10/12/2011
3|Example 3|Short description|10/13/2011

The current (broken) Perl script (example):

#!/usr/bin/perl -w
use strict;

open (MYFILE, 'data.txt');
while (<MYFILE>) {
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

As you can see from the example, in the case of ID 2, it's broken into several lines causing errors when attempting to reference those undefined variables. How would you group them into the correct field?

Thanks in advance! (I hope the question was clear enough, difficult to define the title)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

只是我以为 2024-11-15 13:30:54

我只是在分割线之前计算分隔符的数量。如果还不够，请阅读下一行并添加它。 tr 运算符是一种高效的计算字符数的方法。

#!/usr/bin/perl -w
use strict;
use warnings;

open (MYFILE, '<', 'data.txt');
while (<MYFILE>) {
    # Continue reading while line incomplete:
    while (tr/|// < 3) {
        my $next = <MYFILE>;
        die "Incomplete line at end" unless defined $next;
        $_ .= $next;
    }

    # Remaining code unchanged:
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

I would just count the number of separators before splitting the line. If you don't have enough, read the next line and append it. The tr operator is an efficient way to count characters.

#!/usr/bin/perl -w
use strict;
use warnings;

open (MYFILE, '<', 'data.txt');
while (<MYFILE>) {
    # Continue reading while line incomplete:
    while (tr/|// < 3) {
        my $next = <MYFILE>;
        die "Incomplete line at end" unless defined $next;
        $_ .= $next;
    }

    # Remaining code unchanged:
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

回复收藏 0 原文

自在安然 2024-11-15 13:30:54

阅读下一行，直到字段数量满足您的需要。类似的东西（我还没有测试过该代码）：

my @fields = split(/\|/);
unless ($#fields == 3) { # Repeat untill we get 4 fields in array

  <MYFILE>; # Read next line      
  chomp;

  # Split line
  my @add_fields = split(/\|/); 

  # Concatenate last element of first line with first element of the current line
  $fields[$#fields] = $fields[$#fields] . $add_fields[0]; 

  # Concatenate remaining array part
  push(@fields, @add_fields[1,$#add_fields]);

}

Read next line until number of fields is what you need. Something like that (I haven't tested that code):

my @fields = split(/\|/);
unless ($#fields == 3) { # Repeat untill we get 4 fields in array

  <MYFILE>; # Read next line      
  chomp;

  # Split line
  my @add_fields = split(/\|/); 

  # Concatenate last element of first line with first element of the current line
  $fields[$#fields] = $fields[$#fields] . $add_fields[0]; 

  # Concatenate remaining array part
  push(@fields, @add_fields[1,$#add_fields]);

}

回复收藏 0 原文

蓝色星空 2024-11-15 13:30:54

如果您可以更改 data.txt 文件以将管道分隔符作为每行/记录中的最后一个字符，那么您可以吞入整个文件，直接拆分到原始字段中。然后这段代码将执行您想要的操作：

#!/usr/bin/perl
use strict;
use warnings;

my @fields;
{
  $/ = "|";
  open (MYFILE, 'C:/data.txt') or die "$!";
  @fields = <MYFILE>;
  close (MYFILE);

  for(my $i = 0; $i < scalar(@fields); $i = $i + 4) {
    my $id = $fields[$i];
    my $title = $fields[$i+1];
    my $description = $fields[$i+2];
    my $date = $fields[$i+3];
    if ($id =~ m/^\d+$/) {
        # processing certain fields (...)

        # insert into the database (example)
    }
  }
}

If you could change your data.txt file to include the pipe separator as the last character in every line/record, you could slurp in the whole file, splitting directly into the raw fields. This code would then do what you want:

#!/usr/bin/perl
use strict;
use warnings;

my @fields;
{
  $/ = "|";
  open (MYFILE, 'C:/data.txt') or die "$!";
  @fields = <MYFILE>;
  close (MYFILE);

  for(my $i = 0; $i < scalar(@fields); $i = $i + 4) {
    my $id = $fields[$i];
    my $title = $fields[$i+1];
    my $description = $fields[$i+2];
    my $date = $fields[$i+3];
    if ($id =~ m/^\d+$/) {
        # processing certain fields (...)

        # insert into the database (example)
    }
  }
}

回复收藏 0 原文

~没有更多了~