mysqlimport 和双引号
我们有一个很大的制表符分隔文本文件(大约 120,000 条记录,50MB),我们正在尝试使用 mysqlimport 将其推入 MySQL。 有些字段用双引号括起来,有些则没有。 我们使用的是 fields-optionally-enheld-by='\"'
开关,但问题是某些字段值本身包含双引号(表示英寸),因此分隔的字段值可能会是“ABCDEF19”。 合理?
我们无法控制文件的来源,因此无法更改其中的格式。 我尝试删除 fields-optically-end-by
开关,但随后导入了值周围的双引号。
值中带有引号的记录变得严重混乱。 有没有一种方法可以告诉 mysqlimport 有些字段可以选择用引号引起来,但仍可能包含引号? 我们认为也许可以进行全局搜索和替换来转义字段值中的双引号? 或者还有其他建议吗?
We have a large tab-delimited text file (approximately 120,000 records, 50MB) that we're trying to shove into MySQL using mysqlimport. Some fields are enclosed in double-quotes, some not. We're using the fields-optionally-enclosed-by='\"'
switch, but the problem is some of the field values themselves contain double-quotes (indicating inches) so the delimited field value might be something "ABCDEF19"". Make sense?
We have no control over the source of the file, so we can't change the formatting there. I tried removing the fields-optionally-enclosed-by
switch, but then the double-quotes that surround the values are imported.
he records with quotes in the values are getting seriously messed up. Is there a way we can tell mysqlimport that some fields are optionally enclosed by quotes, but may still contain quotes? We've thought maybe a global search and replace to escape the double-quotes in field values? Or any other suggestions?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您的数据在字段引用正文中包含引号,而没有以某种方式对其进行分隔,那么您就会遇到问题。 您不能保证 mysqlimport 会正确执行此操作。
在尝试以这种方式插入数据之前,请先对数据进行处理。
幸运的是,它是制表符分隔的,因此您可以运行正则表达式将引号替换为分隔版本,然后告诉 mysqlimport 分隔符。
If your data is including quotes inside of the body of the field quote without delimiting that somehow, you have a problem. You can't guarantee that mysqlimport will do this properly.
Massage the data first before trying to insert it in this way.
Luckily, it is tab-delimited, so you can run a regex to replace the quotes with a delimited version and then tell mysqlimport the delimiter.
您可以使用引号导入它(删除字段可选封闭开关),然后运行检查该值是否在开头和结尾有双引号(假设没有一个值在开头有英寸),然后截断从开头和结尾各 1 个字符,以删除导入时获得的额外引号。
编辑:在阅读 kekoav 的回复后,我必须同意,如果您能够在导入之前操作文件,那将是一个更明智的选择,但如果您被迫之后删除引号,您可以使用类似这样的内容:
对于每个 'column ' 在'表'中
You could import it with the quotes (fields-optionally-enclosed-by switch removed) and then run a check where if the value has double quotes at the beginning and end (assuming none of the values have inches at the beginning) then truncate by 1 character from the beginning and end to remove the extra quotes you got from importing.
EDIT: after reading kekoav's response I have to agree that if you are able to manipulate the file before importing that would be a much wiser option, but if you are forced to remove quotes afterwards, you could use something like this:
for every 'column' in 'table'