How do I pre-process CSV data for FasterCSV?
We're having a significant number of problems creating a bulk upload function for our little app. We're using the FasterCSV gem to upload data to a MySQL database, but FasterCSV is so twitchy and precise in its requirements that it constantly breaks with malformed CSV errors and timeout errors.
The CSV files are generally created by users pasting text from their web sites or from Microsoft Word docs, so it is not reasonable to expect that there will never be odd characters like smart quotes or accents in the data. Also, users aren't going to be readily able to identify whether their data is perfect enough for FasterCSV or not. We need to find a way to fix it for them automatically.
Is there a good way or a reliable tool for pre-processing CSV data to fix any nits in the data before having the FasterCSV gem process it?
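For context, one common shape such a pre-processing pass takes (a minimal sketch, not from the original post; it assumes Ruby 2.1+ for String#scrub, and the character list is illustrative) is to scrub invalid bytes and normalize Word-style punctuation before the text ever reaches the CSV parser:

```ruby
# Illustrative pre-processing pass (a sketch, not the app's actual code):
# replace invalid UTF-8 bytes, then normalize the characters Word
# typically injects, before handing the string to the CSV parser.
def scrub_csv(raw)
  raw.scrub("?")                                 # invalid bytes -> "?"
     .tr("\u201C\u201D\u2018\u2019", %q{""''})   # smart quotes -> ASCII quotes
     .gsub(/[\u2013\u2014]/, "-")                # en/em dashes -> hyphen
end

scrub_csv("\u201CR\u00E9sum\u00E9\u201D \u2014 draft")
# => "\"Résumé\" - draft"  (accents are left alone)
```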
3 Answers
Try the CSV library in the standard lib. It is more forgiving about malformed CSV:
http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
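For example (a sketch assuming Ruby 2.4+, where the stdlib CSV gained a liberal_parsing option; on those Rubies the stdlib CSV is the descendant of FasterCSV, but this flag lets it accept input that strict parsing rejects):

```ruby
require "csv"

# :liberal_parsing tells the stdlib CSV to tolerate stray double quotes
# inside unquoted fields instead of raising CSV::MalformedCSVError.
data = %Q{name,comment\nAlice,said "hi" to Bob\n}
rows = CSV.parse(data, headers: true, liberal_parsing: true)
rows.first["comment"]  # the stray quotes are kept as literal characters
```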
You can pass the file's encoding type into the FasterCSV options when creating a new instance of the FasterCSV parser. (See the docs here: http://fastercsv.rubyforge.org/classes/FasterCSV.html#M000018)
Setting it to utf-8 or the Microsoft encoding should get it past most dodgy extra characters, allowing it to actually parse into your required strings... then you can clean the strings to your heart's content.
There's also something in the docs about "converters" that you can pass in; though this is aimed more at converting, say, numeric or date types, you might be able to use it to gsub the dodgy chars.
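A sketch of both ideas, written against the stdlib CSV (FasterCSV was merged into Ruby's standard library in 1.9, so the :converters and :encoding options carry over):

```ruby
require "csv"

# Custom converter (illustrative): normalize Word-style smart quotes in
# every string field as it is parsed.
smart_quotes = lambda do |field|
  field.is_a?(String) ? field.tr("\u201C\u201D\u2018\u2019", %q{""''}) : field
end

raw  = %Q{name,quote\nAlice,\u201Chello\u201D\n}
rows = CSV.parse(raw, headers: true, converters: [smart_quotes])
rows.first["quote"]  # => "hello" wrapped in plain ASCII quotes

# For files whose bytes came from Word/Windows, name the encoding when
# reading, e.g.: CSV.read(path, encoding: "Windows-1252:UTF-8")
```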
Try the smarter_csv gem - you can pass a block to its process method and clean up the data before it is used
https://github.com/tilo/smarter_csv