Character set conversion of uploaded files with Ruby

Posted on 2024-08-12 11:15:09

I have an application where we're having our clients upload a csv file to our server. We then process and put the data from the csv into our database. We're running into some issues with char-sets especially when we're dealing with JSON, in particular some non-converted UTF-8 characters are breaking IE on JSON responses.

Is there a way to convert the uploaded csv file to UTF-8 before we start processing it? Is there a way to determine the character encoding of an uploaded file? I've played with iconv a bit but we're not always sure what encoding the uploaded file will have. Thanks.

Comments (1)

仅冇旳回忆 2024-08-19 11:15:09

This solution might not be ideal, but it should do the job.

First, the ingredients:

  • chardet (sudo gem install chardet)
  • fastercsv (sudo gem install fastercsv)

Now the actual code (not tested):

require 'rubygems'
require 'UniversalDetector' # provided by the chardet gem
require 'fastercsv'
require 'iconv'

file_to_import = File.open("path/to/your.csv")
# determine the encoding based on the first 100 bytes
chardet = UniversalDetector::chardet(file_to_import.read(100))
# reading moved the file position, so go back to the start before iterating
file_to_import.rewind
if chardet['confidence'] > 0.7
  charset = chardet['encoding']
else
  raise 'You better check this file manually.'
end
# note: converting line by line assumes no quoted CSV field spans several lines
file_to_import.each_line do |l|
  converted_line = Iconv.conv('utf-8', charset, l)
  row = FasterCSV.parse(converted_line)[0]
  # do the business here
end
file_to_import.close
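
A note for newer Rubies: Iconv was removed from the standard library in Ruby 2.0, and FasterCSV became the standard CSV library in Ruby 1.9, so the code above targets older setups. The conversion step itself can be sketched with the built-in String#encode instead. This is only a minimal sketch, not part of the original answer: the helper name to_utf8 and the ISO-8859-1 fallback are assumptions for illustration, and in practice the fallback could be whatever a detector such as chardet reports.

```ruby
# Hypothetical helper (not from the original answer): returns a UTF-8 copy
# of +raw+, the bytes of an uploaded file.
# +fallback+ is an assumed source encoding, used only when the bytes are
# not already valid UTF-8.
def to_utf8(raw, fallback = 'ISO-8859-1')
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the bytes as the fallback encoding and transcode them,
  # replacing anything that cannot be converted instead of raising.
  raw.dup.force_encoding(fallback)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
end

latin1 = "caf\xE9".b   # "café" as raw ISO-8859-1 bytes
puts to_utf8(latin1)
```

Checking valid_encoding? first avoids double-converting files that are already UTF-8. It is not real detection, though: bytes in many legacy encodings will also fail that check, so the fallback still has to be a sensible guess or a detector result.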