How to detect a delimiter in a string in PHP?

Posted on 2024-09-29 14:25:50

I am curious: if you have a string, how would you detect its delimiter?

We know PHP can split a string with explode(), which requires a delimiter parameter.

But what about a way to detect the delimiter before passing it to explode()?

Right now I just output the string to the user and they enter the delimiter. That's fine, but I would like the application to recognize the pattern for me.

Should I look to regular expressions for this type of pattern recognition in a string?

EDIT: I initially failed to specify that there is a likely set of expected delimiters: anything plausibly used in a CSV file. Technically anyone could use any character to delimit a CSV file, but the most probable ones are the comma, semicolon, vertical bar, and space.

EDIT 2: Here is the workable solution I came up with for a "determined delimiter".

$get_images = "86236058.jpg 86236134.jpg 86236134.jpg";

// Detect the delimiter used between the image filenames.
$probable_delimiters = array(",", " ", "|", ";");

$delimiter_count_array = array();

foreach ($probable_delimiters as $probable_delimiter) {
    // Count how often each candidate occurs in the string.
    $delimiter_count_array[$probable_delimiter] = substr_count($get_images, $probable_delimiter);
}

// All candidates that share the highest count.
$determined_delimiter_array = array_keys($delimiter_count_array, max($delimiter_count_array));

// each() was removed in PHP 8, so take the last matching candidate directly.
$determined_delimiter = end($determined_delimiter_array);

$images = explode($determined_delimiter, $get_images);

Comments (5)

谷夏 2024-10-06 14:25:51

I would say this works in 99.99% of cases :)
The basic idea is that the number of valid delimiters should be the same on every line.
The script calculates the discrepancies in delimiter count between all lines.
A smaller discrepancy means a more likely delimiter.

Putting it all together, this function reads the rows and returns them as an array:

function readCSV($fileName)
{
    // Candidate delimiters to test.
    $delA = array(";", ",", "|", "\t");
    $linesA = array();
    $resultA = array();

    $maxLines = 20; // maximum number of lines to sample for detection; raise it for more precision

    // Parse the first lines once per candidate and record the column count of each line.
    foreach ($delA as $key => $del) {
        $rowNum = 0;
        $linesA[$key] = array();
        if (($handle = fopen($fileName, "r")) !== false) {
            // Length 0 means no line-length limit.
            while (($rowNum < $maxLines) && (($data = fgetcsv($handle, 0, $del)) !== false)) {
                $linesA[$key][] = count($data);
                $rowNum++;
            }

            fclose($handle);
        }
    }

    // Sum the differences in column count between the sampled lines.
    foreach ($delA as $key => $del) {
        $discr = 0;
        $resultA[$key] = 65535; // default for candidates with no usable lines
        foreach ($linesA[$key] as $actNum) {
            if ($actNum == 1) {
                // Only one column on this line, so this is not our delimiter: force a high discrepancy.
                $resultA[$key] = 65535;
                break;
            }

            foreach ($linesA[$key] as $actNum2) {
                $discr += abs($actNum - $actNum2);
            }

            // For the real delimiter this result should be the nearest to 0,
            // because in the ideal (error-free) case every line has the same column count.
            $resultA[$key] = $discr;
        }
    }

    // Select the discrepancy nearest to 0; this is taken as the delimiter.
    $delRes = 65535;
    $delKey = 0; // fall back to the first candidate if none qualifies
    foreach ($resultA as $key => $res) {
        if ($res < $delRes) {
            $delRes = $res;
            $delKey = $key;
        }
    }

    $delimiter = $delA[$delKey];

    // Read all rows with the detected delimiter and trim every field.
    $rowsA = array();
    if (($handle = fopen($fileName, "r")) !== false) {
        while (($data = fgetcsv($handle, 0, $delimiter)) !== false) {
            $rowsA[] = array_map('trim', $data);
        }
        fclose($handle);
    }

    return $rowsA;
}
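
Since the question is about a string rather than a file, the same idea can be sketched for in-memory text. This is a stricter variant, not the function above: it accepts a candidate only when every non-empty line splits into the same number of columns, instead of summing discrepancies. The function name guessDelimiterByConsistency and the default candidate list are assumptions made for illustration.

```php
<?php
// Hypothetical helper: picks the candidate that splits every non-empty line
// of $text into the same number of columns (more than one).
function guessDelimiterByConsistency(string $text, array $candidates = [";", ",", "|", "\t"]): ?string
{
    // Split on any common line ending and drop empty lines.
    $lines = preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $best = null;
    $bestColumns = 1;

    foreach ($candidates as $candidate) {
        $counts = [];
        foreach ($lines as $line) {
            // str_getcsv() respects double-quoted fields, unlike explode().
            $counts[] = count(str_getcsv($line, $candidate, '"', '\\'));
        }

        // Accept only candidates that give one consistent column count above 1;
        // among those, prefer the one that yields the most columns.
        if (count(array_unique($counts)) === 1 && $counts[0] > $bestColumns) {
            $best = $candidate;
            $bestColumns = $counts[0];
        }
    }

    return $best; // null when no candidate splits the lines consistently
}

$sample = "name,city\n\"Doe, Jane\",Paris";
var_dump(guessDelimiterByConsistency($sample));
```

Parsing each line with str_getcsv() instead of counting raw characters means a comma inside a quoted field does not count as a separator. Returning null leaves the decision to the caller, who can fall back to a default or ask the user.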
↙厌世 2024-10-06 14:25:51

I have the same problem: I deal with a lot of CSV files from various databases, which different people export in different ways, sometimes differently each time for the same dataset. I simply implemented a function like this in my conversion base class:

protected function detectDelimiter() {
    $handle = @fopen($this->CSVFile, "r");
    if ($handle) {
        $line=fgets($handle, 4096);
        fclose($handle);            

        $test=explode(',', $line);
        if (count($test)>1) return ',';

        $test=explode(';', $line);
        if (count($test)>1) return ';';

        //.. and so on
    }
    //return default delimiter
    return $this->delimiter;
}
假情假意假温柔 2024-10-06 14:25:51

I made something like this:

$line = fgetcsv($handle, 1000, "|");
if (isset($line[1])) {
    echo "delimiter is: |";
    $delimiter = "|";
} else {
    // Go back to the start of the file so the same first line is tested again.
    rewind($handle);
    $line1 = fgetcsv($handle, 1000, ";");
    if (isset($line1[1])) {
        echo "delimiter is: ;";
        $delimiter = ";";
    } else {
        echo "delimiter is: ,";
        $delimiter = ",";
    }
}

This simply checks whether there is a second column after a line is read.

魂归处 2024-10-06 14:25:51

I am having the same issue. My system receives CSV files from clients, which may use ";", "," or " " as the delimiter, and I want to improve the system so that the client doesn't have to know which one it is (they never do).

I searched and found this library:
https://github.com/parsecsv/parsecsv-for-php

It is very good and easy to use.

桃扇骨 2024-10-06 14:25:50

Determine which delimiters you consider probable (such as ",", ";" and "|") and, for each one, count how often it occurs in the string (substr_count). Then choose the one with the most occurrences as the delimiter and explode.

Even though that might not be fail-safe, it should work in most cases ;)
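
The counting approach can be sketched as a small function. This is a minimal illustration, not a definitive implementation: the name guessDelimiterByCount, the default candidate list, and the choice to return null when no candidate occurs are all assumptions.

```php
<?php
// Hypothetical helper: returns the candidate that occurs most often in $text,
// or null when none of the candidates occurs at all.
function guessDelimiterByCount(string $text, array $candidates = [",", ";", "|", " "]): ?string
{
    $best = null;
    $bestCount = 0;

    foreach ($candidates as $candidate) {
        $count = substr_count($text, $candidate);
        // Strictly greater, so on a tie the earlier candidate is kept.
        if ($count > $bestCount) {
            $best = $candidate;
            $bestCount = $count;
        }
    }

    return $best;
}

$text = "86236058.jpg 86236134.jpg 86236134.jpg";
$delimiter = guessDelimiterByCount($text);

// Treat the whole string as a single item when no delimiter was found.
$images = $delimiter === null ? [$text] : explode($delimiter, $text);
print_r($images);
```

Returning null instead of an arbitrary candidate makes the "nothing found" case explicit. The approach can still pick the wrong character when field values themselves contain a candidate more often than the real delimiter appears.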
