How to determine whether CSV file fields are tab delimited or comma delimited?
I'm trying to determine whether the fields of a CSV file are tab delimited or comma delimited. I need to validate this in PHP.
How can I determine this?
Comments (15)
It's late to answer this question, but I hope it helps someone.
Here's a simple function that returns the delimiter of a file.
Use this function as shown below:
P.S. I used preg_split() instead of explode() because explode('\t', $value) won't give proper results: inside single quotes, '\t' is a literal backslash followed by the letter t, not a tab character.
UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have updated it now.
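A minimal sketch of the kind of function this answer describes, not the author's original code. The name detect_delimiter, the candidate list and the $checkLines parameter are assumptions made for illustration. For each of the first few lines it uses preg_split() to test whether a candidate splits the line into more than one field, then returns the candidate that did so on the most lines.

```php
<?php
// Hypothetical helper: guess a file's delimiter from its first $checkLines lines.
function detect_delimiter(string $path, int $checkLines = 2): ?string
{
    $candidates = [',', "\t", ';', '|', ':'];
    $hits = array_fill_keys($candidates, 0);

    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return null; // file could not be read
    }

    foreach (array_slice($lines, 0, $checkLines) as $line) {
        foreach ($candidates as $delimiter) {
            // preg_quote() escapes characters such as "|" that are special in a regex.
            $fields = preg_split('/' . preg_quote($delimiter, '/') . '/', $line);
            if (count($fields) > 1) {
                $hits[$delimiter]++;
            }
        }
    }

    arsort($hits); // candidate that split the most lines comes first
    $best = array_key_first($hits);
    return $hits[$best] > 0 ? $best : null;
}
```

It could be called as $delimiter = detect_delimiter('data.csv'); where data.csv is a placeholder path. A null result means none of the candidates appeared in the sampled lines.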
Here's what I do.
This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).
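One way to narrow down the possible delimiters, sketched under assumptions (the function name plausible_delimiters and the candidate list are hypothetical, and this is not the answer's original code): keep only the candidates that split every sample line into the same number of columns, and into more than one.

```php
<?php
// Hypothetical helper: return every candidate delimiter that splits all
// sample lines into the same number of columns (and more than one column).
function plausible_delimiters(array $lines, array $candidates = [',', "\t", ';', '|']): array
{
    $plausible = [];
    foreach ($candidates as $delimiter) {
        $widths = [];
        foreach ($lines as $line) {
            // str_getcsv() respects quoted fields, unlike a plain explode().
            $widths[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        $distinct = array_unique($widths);
        if (count($distinct) === 1 && reset($distinct) > 1) {
            $plausible[] = $delimiter;
        }
    }
    return $plausible;
}
```

If the result contains exactly one delimiter it can be used directly; if it contains several, those are the ones to offer the user as choices.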
There is no 100% reliable way to determine this. What you can do is
I'm just counting the occurrences of the different delimiters in the CSV file; the one that occurs most often is probably the correct delimiter:
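A minimal sketch of that counting idea, assuming the file contents are already in a string. The function name most_frequent_delimiter and the candidate list are hypothetical.

```php
<?php
// Hypothetical helper: pick the candidate that occurs most often in $contents.
function most_frequent_delimiter(string $contents, array $candidates = [',', "\t", ';', '|']): ?string
{
    $counts = [];
    foreach ($candidates as $delimiter) {
        $counts[$delimiter] = substr_count($contents, $delimiter);
    }
    arsort($counts); // highest count first
    $best = array_key_first($counts);
    return ($best !== null && $counts[$best] > 0) ? $best : null;
}
```

For a file on disk this could be called as most_frequent_delimiter(file_get_contents('data.csv')), with data.csv as a placeholder path. Free text containing many commas inside tab-separated fields can still fool it.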
In my situation, users supply CSV files which are then entered into an SQL database. They may save an Excel spreadsheet as a comma-delimited or tab-delimited file. A program converting the spreadsheet to SQL needs to identify automatically whether the fields are tab separated or comma separated.
Many Excel CSV exports have field headings as the first line. The heading text is unlikely to contain commas except as a delimiter. For my situation I counted the commas and tabs in the first line and used whichever count was greater to decide whether the file is comma or tab delimited.
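The header-line comparison could look roughly like this. It is a sketch under assumptions: the function name delimiter_from_header is hypothetical, and a tie or a header without separators is reported as null rather than guessed.

```php
<?php
// Hypothetical helper: compare commas and tabs in the first (header) line only.
function delimiter_from_header(string $path): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null; // file could not be opened
    }
    $header = fgets($handle);
    fclose($handle);
    if ($header === false) {
        return null; // empty file
    }

    $commas = substr_count($header, ',');
    $tabs   = substr_count($header, "\t");
    if ($commas === $tabs) {
        return null; // no separators, or a tie: cannot decide
    }
    return $commas > $tabs ? ',' : "\t";
}
```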
Thanks for all your input. I built mine using your tricks: preg_split, fgetcsv, a loop, etc.
But I added something that, surprisingly, was not here yet: using fgets instead of reading the whole file, which is much better if the file is large!
Here's the code:
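As a rough illustration of the fgets approach (a sketch, not this answer's original code; the function name detect_delimiter_streaming, the candidate list and the $maxLines cap are assumptions), the file is read line by line and only a limited number of lines is ever held in memory:

```php
<?php
// Hypothetical helper: stream at most $maxLines lines with fgets() and tally
// how often each candidate appears, without loading the whole file.
function detect_delimiter_streaming(string $path, int $maxLines = 100): ?string
{
    $candidates = [',', "\t", ';', '|'];
    $counts = array_fill_keys($candidates, 0);

    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null; // file could not be opened
    }
    $read = 0;
    while ($read < $maxLines && ($line = fgets($handle)) !== false) {
        foreach ($candidates as $delimiter) {
            $counts[$delimiter] += substr_count($line, $delimiter);
        }
        $read++;
    }
    fclose($handle);

    arsort($counts); // most frequent candidate first
    $best = array_key_first($counts);
    return $counts[$best] > 0 ? $best : null;
}
```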
I used @Jay Bhatt's solution for finding out a CSV file's delimiter, but it didn't work for me, so I applied a few fixes and added comments to make the process easier to understand.
See my version of @Jay Bhatt's function:
I personally use this function to help parse files automatically with PHPExcel, and it works well and fast.
I recommend parsing at least 10 lines for the results to be more accurate. I personally use it with 100 lines, and it runs fast, with no delays or lag. The more lines you parse, the more accurate the result gets.
NOTE: This is just a modified version of @Jay Bhatt's solution to the question. All credit goes to @Jay Bhatt.
When I output a TSV file, I write the tabs using \t, the same way one would write a line break as \n. That being said, I guess a method could be as follows:
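A minimal sketch of such a check, assuming it is applied to a single line of the file (the function name guess_by_presence is hypothetical): look for a tab first, then for a comma.

```php
<?php
// Hypothetical helper: treat a line as tab-delimited if it contains a tab,
// otherwise as comma-delimited if it contains a comma.
function guess_by_presence(string $line): ?string
{
    // Double quotes matter: "\t" is a tab character, '\t' is a backslash and a "t".
    if (strpos($line, "\t") !== false) {
        return "\t";
    }
    if (strpos($line, ',') !== false) {
        return ',';
    }
    return null; // neither character present
}
```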
I guess this may not be the right approach, because you could have tabs and commas in the actual content as well. It's just an idea. Using regular expressions may be better, although I am not too clued up on that.
You can simply use PHP's native fgetcsv() function in this way:
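One possible way to use fgetcsv() for this, as a sketch under assumptions (the function name delimiter_via_fgetcsv and the candidate list are hypothetical, and only the first row is inspected): parse the first row once per candidate and keep the delimiter that yields the most columns.

```php
<?php
// Hypothetical helper: read the first row with fgetcsv() once per candidate
// and keep the delimiter that produces the most columns.
function delimiter_via_fgetcsv(string $path, array $candidates = [',', "\t", ';']): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null; // file could not be opened
    }
    $best = null;
    $bestColumns = 1; // a single column means the candidate did not split the row
    foreach ($candidates as $delimiter) {
        rewind($handle); // start from the first row again for each candidate
        $row = fgetcsv($handle, 0, $delimiter, '"', '\\');
        if (is_array($row) && count($row) > $bestColumns) {
            $best = $delimiter;
            $bestColumns = count($row);
        }
    }
    fclose($handle);
    return $best;
}
```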
Aside from the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV files are sufficiently loosely specified that you can come up with files that would be acceptable as either.
(Assuming \t == TAB)
How would you decide whether this is TSV or CSV?
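To make the ambiguity concrete, here is a hypothetical line (an illustration only, not the example this answer refers to) that parses cleanly under either reading, just with different results:

```php
<?php
// Hypothetical sample line: two commas and one tab, valid under either reading.
$line = "Smith, John\tParis, France";

$asCsv = str_getcsv($line, ',', '"', '\\');  // split on commas
$asTsv = str_getcsv($line, "\t", '"', '\\'); // split on tabs

echo count($asCsv), " fields as CSV, ", count($asTsv), " fields as TSV\n";
```

Read as CSV the line has three fields, read as TSV it has two, and nothing in the line itself says which reading was intended.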
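You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php), passing it a delimiter parameter. Note that a wrong delimiter does not make the function fail: it returns false only on error or at end of file, and with the wrong delimiter it returns the whole line as a single field. A row with only one element therefore suggests that the $delimiter parameter wasn't the right one.
Sample to check if the delimiter is ';':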
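A minimal sketch of such a check, with a hypothetical function name (first_row_splits) and ';' as the default delimiter being tested:

```php
<?php
// Hypothetical helper: does the first row split into more than one column
// when parsed with $delimiter?
function first_row_splits(string $path, string $delimiter = ';'): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false; // file could not be opened
    }
    $row = fgetcsv($handle, 0, $delimiter, '"', '\\');
    fclose($handle);
    // A wrong delimiter does not make fgetcsv() return false; it yields a
    // single-element array, so the column count is what gets tested.
    return is_array($row) && count($row) > 1;
}
```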
How about something simple?
This is my solution.
It works if you know how many columns you expect.
Finally, the separator character is in $actual_separation_character
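A sketch of how a known column count can be used (an assumption about the approach, not this answer's original code; the function name separator_for_column_count and the candidate list are hypothetical): try each candidate and accept the first one that splits a line into exactly the expected number of fields.

```php
<?php
// Hypothetical helper: return the first candidate that splits $line into
// exactly $expectedColumns fields, or null if none does.
function separator_for_column_count(string $line, int $expectedColumns, array $candidates = [',', "\t", ';', '|']): ?string
{
    foreach ($candidates as $candidate) {
        $fields = str_getcsv(rtrim($line, "\r\n"), $candidate, '"', '\\');
        if (count($fields) === $expectedColumns) {
            return $candidate;
        }
    }
    return null;
}
```

With the first line of the file in a variable such as $firstLine (a placeholder), the result could be stored as $actual_separation_character = separator_for_column_count($firstLine, 3);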
If you have a very large file (for example, several GB), take the first few lines with head, put them in a temporary file, and open the temporary file in vi.
The easiest way to answer this is to open the file in a plain text editor, or in TextMate.