Extract blocks from a text file when the first line of each block starts with the same string

Posted on 2025-02-09 20:58:39

I have a dynamic text file whose content can contain fixed lines (each appearing once) and X repeated blocks.
Each block starts with the same code line, "S21.G00.30.001", but the blocks do not necessarily have the same contents. This is an extract from the file:

S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' 
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca' 
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...
S21.G00.30.001,'employee 2' //block 2, S21.G00.30.001 is the divider
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...
S21.G00.30.001,'employee 3' //block 3, S21.G00.30.001 is the divider
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...

To get the values of the fixed lines, which appear only once, I use this method:

$file = fopen($this->getParameter('dsn_txt_folder') . 'dsn.txt', 'r');

if ($file) {
    while (($line = fgets($file)) !== false) {
        if (str_starts_with($line, 'S10.G00.00.001')) {
            $website = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.002')) {
            $companyName = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.003')) {
            $version = $this->getStringBetween($line, "'", "'");
        }
        .......
    }

    fclose($file);
}

But for the X repeated blocks, how can I extract each block, which starts with the divider line "S21.G00.30.001" but whose end is unknown, and then put each block inside an array so that I can easily read the values of each line?

The divider or the separator between each block is the line with "S21.G00.30.001".

Finally, for those 3 blocks, I'd like to get an array like this:

array:1 [▼
  0 => array:3 [▼
    1 => array:7 [▼
      0 => "S21.G00.30.001,'employee one'"
      1 => "S21.G00.30.002,'AAAA'"
      2 => "S21.G00.30.004,'BBBB'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.006,'16021993'"
      5 => "S21.G00.30.007,'4'"
      6 => "S21.G00.30.008,'A Renasca'"
      7 => "S21.G00.40.008,'some text'"
      8 => "whatever code here,'some text'"
    ]
    2 => array:5 [▼
      0 => "S21.G00.30.001,'employee 2'"
      1 => "S21.G00.30.002,'CCCC'"
      2 => "S21.G00.30.004,'DDDD'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.006,'16021993'"
      5 => "whatever code here,'some text'"
    ]
    3 => array:6 [▼
      0 => "S21.G00.30.001,'employee 3'"
      1 => "S21.G00.30.002,'EEEE'"
      2 => "S21.G00.30.004,'FFFF'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.007,'4'"
      5 => "S21.G00.30.008,'some text 3'"
      6 => "whatever code here,'some text'"
    ]
  ]
]

尐偏执 2025-02-16 20:58:39

Personally, I would first get all the data with file_get_contents(). Then use preg_match_all() to extract what I need. You can adapt this solution to use fopen(), fgets(), and preg_match() on your own.

A good regex will capture exactly what you need, then it's up to you to organize the data according to your logic. Here is an example that can handle multiple "id" strings:

<?php

//$data = file_get_contents($this->getParameter('dsn_txt_folder') . 'dsn.txt');
$data = "
S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' sx
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca'
S21.G00.30.001,'employee 2' //block 2
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00.30.001,'employee 3' //block 3
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'
";
$extracted = [];
$ids = [
    'S21.G00.30.',
    //'S10.G00.00.',
];
foreach($ids as $id){
  $regex = "/^".implode('\\.', explode('.', $id))."(\d{3}),'(.*)'/m"; // "/^S21\.G00\.30\.(\d{3}),'(.*)'/m"
  $matches = [];
  $block = 0;
  preg_match_all($regex, $data, $matches);
  foreach($matches[0] as $i => $full){
    if('001' === $matches[1][$i]) 
      ++$block;
    $extracted[$id][$block][$matches[1][$i]] = $matches[2][$i];
  }
}

var_export($extracted);

This will yield the following:

array (
  'S21.G00.30.' => 
  array (
    1 => 
    array (
      '001' => 'employee one',
      '002' => 'AAAA',
      '004' => 'BBBB',
      '005' => '02',
      '006' => '16021993',
      '007' => '4',
      '008' => 'A Renasca',
    ),
    2 => 
    array (
      '001' => 'employee 2',
      '002' => 'CCCC',
      '004' => 'DDDD',
      '005' => '02',
    ),
    3 => 
    array (
      '001' => 'employee 3',
      '002' => 'EEEE',
      '004' => 'FFFF',
      '005' => '02',
      '007' => '4',
      '008' => 'some text 3',
    ),
  ),
)

See it in action here: https://onlinephp.io/c/fc256
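The answer mentions that the same idea can be adapted to fopen(), fgets(), and preg_match(). A minimal sketch of that adaptation follows, assuming PHP 7.1 or later. The function names extractBlocks() and readLines() are hypothetical and not part of the original answer, and preg_quote() is swapped in for the implode()/explode() escaping. Lines that appear before the first "001" divider are skipped here, which is a choice made for this sketch.

```php
<?php
// Hypothetical helper (not from the original answer): groups
// "S21.G00.30.NNN,'value'" lines into blocks, opening a new block
// every time the sub-code 001 is seen.
function extractBlocks(iterable $lines, string $prefix = 'S21.G00.30.'): array
{
    // preg_quote() escapes the dots so they are matched literally.
    $regex = '/^' . preg_quote($prefix, '/') . "(\d{3}),'(.*)'/";
    $blocks = [];
    $block = 0;
    foreach ($lines as $line) {
        if (preg_match($regex, $line, $m) !== 1) {
            continue; // the line does not belong to this group of codes
        }
        if ($m[1] === '001') {
            ++$block; // divider line: start the next block
        }
        if ($block === 0) {
            continue; // ignore matching lines before the first divider
        }
        $blocks[$block][$m[1]] = $m[2];
    }
    return $blocks;
}

// Hypothetical generator that yields a file's lines one at a time,
// so the whole file never has to be held in memory.
function readLines(string $path): iterable
{
    $file = fopen($path, 'r');
    if ($file === false) {
        return;
    }
    try {
        while (($line = fgets($file)) !== false) {
            yield $line;
        }
    } finally {
        fclose($file);
    }
}

// Sample input shaped like the lines fgets() would return.
$sample = [
    "S10.G00.00.001,'www.mywebsite.com' //fixed line\n",
    "S21.G00.30.001,'employee one' //block 1\n",
    "S21.G00.30.002,'AAAA'\n",
    "S21.G00.40.008,'some text'\n",
    "S21.G00.30.001,'employee 2' //block 2\n",
    "S21.G00.30.002,'CCCC'\n",
];
$blocks = extractBlocks($sample);
var_export($blocks);
```

With a real file, the call would be extractBlocks(readLines($path)). Like the original regex, this only keeps lines whose code starts with the given prefix, so a line such as "S21.G00.40.008" inside a block is dropped.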

辞别 2025-02-16 20:58:39

You can parse the file line by line. Keep the current block in an array variable, populate it as rows are parsed and, when a new block starts, add the previous block to the final result array.

The following code uses basic functions (and not $this-> calls, as you have in the question). You can update the code as you wish.

<?php
// the file was placed on my server for testing
$file = fopen('test.txt','r');
// this will contain the final result
$result = [];
// currentBlock is null at first
$currentBlock = null;
while (($line = fgets($file)) !== false) {
    // extracting the line code
    $lineCode = substr($line, 0, 14);
    // checking if the row contains a value, between two '
    $rowComponents = explode("'", $line);
    if (count($rowComponents) < 2) {
        // the row is not formatted ok
        continue;
    }
    $value = $rowComponents[1];
    switch ($lineCode) {
        case 'S10.G00.00.001':
            $website = $value;
            break;
        case 'S10.G00.00.002':
            $companyName = $value;
            break;
        case 'S10.G00.00.003':
            $version = $value;
            break;
        case 'S21.G00.30.001':
            // starting a new entry
            if ($currentBlock !== null) {
                // we already have a block being parsed
                // so we added it to the final result
                $result[] = $currentBlock;
            }
            // starting the current block as an empty array
            $currentBlock = [];
            $currentBlock['property1'] = $value;
            break;
        case 'S21.G00.30.002':
            $currentBlock['property2'] = $value;
            break;
        case 'S21.G00.30.004':
            $currentBlock['property4'] = $value;
            break;
    }
}
// adding the last entry into the final result
// only if the block exists
if ($currentBlock !== null) {
    $result[] = $currentBlock;
}
fclose($file);
// output the result for debugging
// you also have the $website, $companyName, $version parameters populated
var_dump($result);

?>

After the script runs, I get the following output from the var_dump call:

array(3) {
  [0]=>
  array(3) {
    ["property1"]=>
    string(12) "employee one"
    ["property2"]=>
    string(4) "AAAA"
    ["property4"]=>
    string(4) "BBBB"
  }
  [1]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 2"
    ["property2"]=>
    string(4) "CCCC"
    ["property4"]=>
    string(4) "DDDD"
  }
  [2]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 3"
    ["property2"]=>
    string(4) "EEEE"
    ["property4"]=>
    string(4) "FFFF"
  }
}
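The switch above stores only the codes it lists explicitly. If every line of a block should be kept as a raw string, which is the shape requested in the question, the same "current block" idea can be generalized. The following is a minimal sketch under that assumption. splitIntoBlocks() is a hypothetical name, and str_starts_with() requires PHP 8, as in the question's own code.

```php
<?php
// Hypothetical helper (not from the original answer): splits lines into
// blocks of raw strings. A new block starts at every line that begins
// with $divider; lines before the first divider are skipped.
function splitIntoBlocks(iterable $lines, string $divider = 'S21.G00.30.001'): array
{
    $blocks = [];
    $current = null; // null until the first divider is seen
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (str_starts_with($line, $divider)) {
            if ($current !== null) {
                $blocks[] = $current; // close the previous block
            }
            $current = [];
        }
        if ($current !== null) {
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $blocks[] = $current; // close the last block
    }
    return $blocks;
}

// Sample input; with a real file this could be
// file($path, FILE_IGNORE_NEW_LINES) instead.
$sample = [
    "S10.G00.00.001,'www.mywebsite.com'",
    "S21.G00.30.001,'employee one'",
    "S21.G00.30.002,'AAAA'",
    "S21.G00.40.008,'some text'",
    "S21.G00.30.001,'employee 2'",
    "S21.G00.30.002,'CCCC'",
];
$result = splitIntoBlocks($sample);
var_export($result);
```

One caveat of this sketch: the last block absorbs every line up to the end of the input, so if fixed lines can follow the repeated blocks, an extra stop condition would be needed.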
