How to read parts of a delimited file

Posted on 2024-11-29 14:30:42

I have a file that looks like this:

1028806~HDR~20110815~15-AUG-2011~C~23:10~~~~~~~
1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12~0~0~475.75~658.75~0~3~Y~2~~2~475.75~5~~~009088336~~3179~10.60~N~8.25~8.50~20.50~~088698601976~44103109~6A~20030627~NNY~~A~S~~~~~~N~~~~~~20.50~8.50~8.25~~~~~~~~~~~~~~~~
1028806~DTL~70023301~OKI-70023301~1002121~A~OKILAN 6020E+ 10/100BASE-TX ETHERNET EXT~OKI PRINTING SOLUTIONS~2703~0~0~0~55.17~80.00~0~0~Y~0~~0~55.17~0~~~009117000~~2160~2.79~N~8.00~8.75~14.00~~000000180016~44101700~ACC-IMPACT~19950723~NNY~~A~S~~~~~~N~~~~~~14.00~8.75~8.00~~~~~~~~~~~~~~~~
1028806~DTL~PRO7T~APC-PRO7T~1003150~A~Professional-grade Protection for Computers and Electronics~AMERICAN POWER CONVERSION~20664~7~0~0~21.60~36.00~0~0~Y~0~~0~21.60~7~~~008112000~~4400~2.00~N~1.90~6.90~12.40~~731304000181~39121610~SURG~19950723~NNY~~A~S~~~~~~N~~~~~~12.40~6.90~1.90~~~~~~~~~~~~~~~~
1028806~DTL~PER7~APC-PER7~1003418~A~Surge suppressor ( external ) / 7 output connector(s)~AMERICAN POWER CONVERSION~20664~496~50~0~9.30~15.25~0~3~Y~86~~363~9.30~44~~~008118000~~4400~1.85~N~2.10~6.90~11.50~~731304000112~39121610~SURG~20011025~NNY~~A~S~~~~~~N~~~~~~11.50~6.90~2.10~~~~~~~~~~~~~~~~
1028806~DTL~PRO7~APC-PRO7~1003761~A~APC SurgeArrest Professional - Surge suppressor ( external ) - AC 120 V - 7 outp~AMERICAN POWER CONVERSION~20664~88~0~0~17.59~30.00~0~0~Y~12~~52~17.59~24~~~008112000~~4400~1.95~N~2.25~7.50~12.25~~731304000174~39121610~SURG~19950723~NNY~~A~S~~~~~~N~~~~~~12.25~7.50~2.25~~~~~~~~~~~~~~~~

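I need to use a script to read certain parts of each line, namely the part number and the description (bold in the original post; C3914A and LASERJET MAINT KIT 8100/N/DN in the line below):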

1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12~0~0~475.75~658.75~0~3~Y~2~~2~475.75~5~~~009088336~~3179~10.60~N~8.25~8.50~20.50~~088698601976~44103109~6A~20030627~NNY~~A~S~~~~~~N~~~~~~20.50~8.50~8.25~~~~~~~~~~~~~~~~

The file has over 300k items, so going through it manually is not an option. How can I get a script to read only these parts when I don't know how long the part # and description are, while ignoring all the other ~ characters?

Thanks

蓝天白云 2024-12-06 14:30:42

fgetcsv() can help here. It reads one line at a time, so it is more memory-conservative than loading the whole file at once and explode()'ing all the lines into a giant array.

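if (($handle = fopen("/path/to/file", "r")) !== FALSE) {
    // length 0 = no line-length limit; "~" is the field delimiter
    while (($data = fgetcsv($handle, 0, "~")) !== FALSE) {
        // only DTL rows carry a part number ($data[2]) and a
        // description ($data[6]); HDR rows and blank lines are skipped
        if (($data[1] ?? '') === 'DTL' && isset($data[6])) {
            echo $data[2] . " " . $data[6] . PHP_EOL;
        }
    }
    fclose($handle); // close only if the file was actually opened
}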
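
One thing to keep in mind with fgetcsv() is that it treats a double quote as a field enclosure by default, so a description that happens to contain a `"` may be split differently from a plain split on `~`. If that is a concern, a line-by-line reader built on fgets() and explode() keeps the same low memory use. The following is only a sketch: the helper name `readDetailFields` is made up for illustration, and the `php://memory` stream stands in for the real file.

```php
<?php
// Hypothetical helper: yields [part number, description] for each DTL record.
function readDetailFields($handle): Generator
{
    while (($line = fgets($handle)) !== false) {
        $fields = explode('~', rtrim($line, "\r\n"));
        // Skip header (HDR) rows and anything too short to hold field 7.
        if (count($fields) < 7 || $fields[1] !== 'DTL') {
            continue;
        }
        yield [$fields[2], $fields[6]];
    }
}

// Demo on an in-memory stream standing in for the real file (shortened sample rows).
$handle = fopen('php://memory', 'w+');
fwrite($handle, "1028806~HDR~20110815~15-AUG-2011~C~23:10~~~~~~~\n");
fwrite($handle, "1028806~DTL~PER7~APC-PER7~1003418~A~Surge suppressor ( external ) / 7 output connector(s)~AMERICAN POWER CONVERSION~20664\n");
rewind($handle);

$rows = iterator_to_array(readDetailFields($handle), false);
fclose($handle);

foreach ($rows as [$partNum, $desc]) {
    echo $partNum, ' | ', $desc, PHP_EOL;
}
```

For the real file, the handle would come from fopen() on the file's path, and the foreach could consume the generator directly instead of collecting the rows into an array first.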

攒一口袋星星 2024-12-06 14:30:42

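It looks like you can explode() each line on the tilde. Fields are zero-indexed, so the part number is $fields[2] and the description is $fields[6]: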

$fields = explode('~', $line);
$part_num = $fields[2];
$desc = $fields[6];

樱娆 2024-12-06 14:30:42

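// read the file into an array of lines, without trailing newlines
$lines = file('file.txt', FILE_IGNORE_NEW_LINES);

// loop through each line
foreach ($lines as $line) {
    // split on the ~ delimiter; a limit of 8 isolates fields 0-6 and
    // leaves the rest of the line, unsplit, in $parts[7]
    $parts = explode('~', $line, 8);
    if (($parts[1] ?? '') !== 'DTL') {
        continue; // skip the HDR row and blank lines
    }
    echo $parts[2], ' ', $parts[6], PHP_EOL; // part number and description
}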
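
The third argument to explode() caps the number of elements returned, and the last element holds whatever remains of the string, unsplit. A small sketch on a shortened copy of one sample line (the variable names are only for illustration) shows why the limit has to be one more than the number of fields you want isolated:

```php
<?php
// Shortened copy of a DTL line from the question.
$line = '1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12';

$seven = explode('~', $line, 7); // 7 elements: index 6 is the unsplit remainder
$eight = explode('~', $line, 8); // 8 elements: index 6 is isolated, index 7 is the remainder

echo $seven[6], PHP_EOL; // LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12
echo $eight[6], PHP_EOL; // LASERJET MAINT KIT 8100/N/DN
```

With a limit of 7 the description would still have the rest of the line attached, so a limit of 8 (or no limit at all) is what isolates field 7.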
