How to detect a delimiter in a string in PHP?
I am curious: if you have a string, how would you detect its delimiter?
We know PHP can split a string with explode(), which requires a delimiter parameter.
But what about a way to detect the delimiter before passing it to explode()?
Right now I just output the string to the user and they enter the delimiter. That's fine, but I would like the application to recognize the pattern for me.
Should I look to regular expressions for this type of pattern recognition in a string?
EDIT: I initially failed to specify that there is a likely set of expected delimiters: any delimiter that is commonly used in a CSV. Technically anyone could use any character to delimit a CSV file, but it is more probable that one of the following characters is used: comma, semicolon, vertical bar, or space.
EDIT 2: Here is the workable solution I came up with for a "determined delimiter".
$get_images = "86236058.jpg 86236134.jpg 86236134.jpg";
// Detect the delimiter used between the image filenames.
$probable_delimiters = array(",", " ", "|", ";");
$delimiter_count_array = array();
foreach ($probable_delimiters as $probable_delimiter) {
    // Count how often each candidate occurs in the string.
    $delimiter_count_array[$probable_delimiter] = substr_count($get_images, $probable_delimiter);
}
$max_value = max($delimiter_count_array);
// Collect every candidate that reaches the highest count.
$determined_delimiter_array = array_keys($delimiter_count_array, $max_value);
// each() was removed in PHP 8; end() keeps the old behavior of using the last match on a tie.
$determined_delimiter = end($determined_delimiter_array);
$images = explode($determined_delimiter, $get_images);
Comments (5)
I would say this works in 99.99% of cases :)
The basic idea is that the number of valid delimiters should be the same on every line.
This script calculates the discrepancies in delimiter count between all lines.
Less discrepancy means the delimiter is more likely to be valid.
Putting it all together, this function reads the rows and returns them as an array:
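The answer's own code is not preserved here, so the following is only a minimal sketch of the idea it describes, not the author's implementation. The function name detectDelimiterByConsistency, the candidate list, and the choice to measure discrepancy between consecutive lines are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: picks the candidate whose per-line count is most consistent.
function detectDelimiterByConsistency(array $lines, array $candidates = [',', ';', '|', ' ']): ?string
{
    $best = null;
    $bestScore = null;

    foreach ($candidates as $candidate) {
        // Count occurrences of this candidate on every line.
        $counts = [];
        foreach ($lines as $line) {
            $counts[] = substr_count($line, $candidate);
        }

        // A candidate that never appears cannot be the delimiter.
        if ($counts === [] || max($counts) === 0) {
            continue;
        }

        // Discrepancy (assumed metric): sum of absolute differences
        // between the counts of consecutive lines.
        $discrepancy = 0;
        for ($i = 1, $n = count($counts); $i < $n; $i++) {
            $discrepancy += abs($counts[$i] - $counts[$i - 1]);
        }

        // Lower discrepancy wins; the first candidate listed wins ties.
        if ($bestScore === null || $discrepancy < $bestScore) {
            $bestScore = $discrepancy;
            $best = $candidate;
        }
    }

    return $best;
}

// Commas and spaces appear inside the data, but only ";" has the same count on every line.
$lines = [
    'id;name;note',
    '1;Smith, John;ok',
    '2;Doe, Jane, Jr.;late',
];
$delimiter = detectDelimiterByConsistency($lines);
$rows = array_map(fn (string $line): array => explode($delimiter, $line), $lines);
```

The sketch splits with explode() for brevity, so it ignores quoted fields. It also returns null when none of the candidates occurs, which a caller would need to handle before splitting.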
I have the same problem. I am dealing with a lot of CSVs from various databases, which various people extract to CSV in various ways, sometimes differently each time for the same dataset. I have simply implemented a function like this in my convert base class
I made something like this:
This simply checks whether there is a second column after a line is read.
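The code for this answer is missing from the page, so here is a rough sketch of what a "second column" check could look like. It is not the author's code: the function name detectDelimiterBySecondColumn and the candidate order are assumptions, and it parses a single string with str_getcsv() rather than reading from a file.

```php
<?php
// Hypothetical helper: returns the first candidate that splits the line
// into at least two columns, or null if none does.
function detectDelimiterBySecondColumn(string $line, array $candidates = [',', ';', '|', ' ']): ?string
{
    foreach ($candidates as $candidate) {
        // Parse the line as CSV using this candidate as the separator.
        $columns = str_getcsv($line, $candidate, '"', '\\');
        if (count($columns) > 1) {
            return $candidate;
        }
    }

    return null;
}

$delimiter = detectDelimiterBySecondColumn('86236058.jpg;86236134.jpg;86236134.jpg');
```

The order of the candidates matters with this approach: a candidate that happens to occur inside the data will be accepted if it is tried before the real delimiter, so the least likely characters (such as the space) are best tried last.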
I am having the same issue. My system will receive CSV files from the client, but they could use ";", "," or " " as the delimiter, and I want to improve the system so the client doesn't have to know which one it is (they never do).
I searched and found this library:
https://github.com/parsecsv/parsecsv-for-php
Very good and easy to use.
Determine which delimiters you consider probable (like ",", ";" and "|") and, for each one, count how often it occurs in the string (substr_count()). Then choose the one with the most occurrences as the delimiter and explode(). Even though that might not be fail-safe, it should work in most cases ;)
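As a small illustration of this suggestion, the counting can be wrapped in a reusable function. This is only a sketch: the name detectDelimiterByCount and the default candidate list are assumptions, and returning null when no candidate occurs is one possible way to handle the case that is not fail-safe.

```php
<?php
// Hypothetical helper: returns the candidate with the most occurrences,
// or null if none of the candidates occurs in the text.
function detectDelimiterByCount(string $text, array $candidates = [',', ';', '|', ' ']): ?string
{
    $best = null;
    $bestCount = 0;

    foreach ($candidates as $candidate) {
        $count = substr_count($text, $candidate);
        // Strictly greater, so the first candidate listed wins ties.
        if ($count > $bestCount) {
            $best = $candidate;
            $bestCount = $count;
        }
    }

    return $best;
}

$text = '86236058.jpg 86236134.jpg 86236134.jpg';
$delimiter = detectDelimiterByCount($text);
// Fall back to a single-element array when no delimiter was found.
$images = $delimiter === null ? [$text] : explode($delimiter, $text);
```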