RegEx - Java Split Command Parsing CSV File
I have a CSV in the format below:
11000,Christopher,Nolan,MR.,Inception,25993,France,"Lefoullon,Paris",920,Director,*461-7755,33-461-7755,12175,"O'Horner, James",12300,"Glebova, Nathalie",,[email protected],Capital,NEW
In this link @Mark Byers and @R. Bemrose suggested String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
But if you look carefully at the CSV above, you will find that the name "O'Horner, James" causes problems: it throws an ORA-00917: missing comma error. Is there a way to avoid this, or does the regex have to be corrected?
Kinda confused :-o
Comments (1)
Caveat: all of the following is idle speculation and guesswork, as you haven't supplied any code for verification, and my palantir is in the workshop for preventative maintenance.
Train of thought: You don't get a problem with the earlier "Lefoullon,Paris" but you do get a problem with "O'Horner, James" ... this suggests that the apostrophe is probably the (innocent) cause of the problem.
Hypothesis: The field is successfully extracted from the CSV as
O'Horner, James
... note that the apostrophe is NOT special to CSV (and doesn't occur in that magnificent [see note] regex). However, the apostrophe is significant to SQL; apostrophes quote string literals in SQL, and apostrophes in the data must be doubled.
Like this:
INSERT INTO ..... VALUES(...,'O''Horner, James', ...);
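To make the hypothesis concrete, here is a minimal Java sketch. It assumes the regex from the question is applied to a shortened excerpt of the sample line, and it uses hypothetical helper names (stripQuotes, toSqlLiteral) that are not from the original post. It shows that the split itself copes with the apostrophe, and that the apostrophe only matters once the value is pasted into a SQL literal.

```java
final class ApostropheDemo {
    // Regex from the question: split on commas followed by an even number of double quotes.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: remove one pair of enclosing double quotes, if present.
    static String stripQuotes(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    // Hypothetical helper: double every apostrophe, then wrap the result in apostrophes.
    static String toSqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Excerpt of the sample line from the question (shortened for readability).
        String line = "12175,\"O'Horner, James\",12300,\"Glebova, Nathalie\",,NEW";
        String[] tokens = line.split(SPLIT_REGEX, -1);

        System.out.println(tokens.length);         // 6
        String name = stripQuotes(tokens[1]);
        System.out.println(name);                  // O'Horner, James
        System.out.println(toSqlLiteral(name));    // 'O''Horner, James'
    }
}
```

Note that the regex split leaves the enclosing double quotes on the token, so they have to be stripped separately. Manual escaping like toSqlLiteral is shown only to illustrate the doubling rule; it is not a substitute for bound parameters.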
If you are using parameter substitution in your SQL interface (as you should be), converting your data fields into valid SQL constants will be done for you. Otherwise:
1. write code to fix each string field (replace every occurrence of ' by '', then wrap the result in ' front and back)
2. google("SQL injection"), read, repent, and rewrite your code using parameter substitution
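Parameter substitution in JDBC could look roughly like the sketch below. The class and method names (ParameterizedInsert, buildInsertSql, insertRow) and the table name are hypothetical, and binding every column as a string is an assumption; a real table with numeric or date columns may need setInt, setDate and so on.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

final class ParameterizedInsert {
    // Build "INSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    // The table name is concatenated, so it must come from trusted code, never from input data.
    static String buildInsertSql(String table, int columnCount) {
        String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + placeholders + ")";
    }

    // Bind each CSV field as a parameter; the driver handles apostrophes in the data.
    // Assumption: all columns accept string values.
    static int insertRow(Connection conn, String table, String[] fields) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, fields.length))) {
            for (int i = 0; i < fields.length; i++) {
                ps.setString(i + 1, fields[i]); // JDBC parameter indexes are 1-based
            }
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        // "people" is a made-up table name used only for illustration.
        System.out.println(buildInsertSql("people", 3));
    }
}
```

Because the values never become part of the SQL text, a name such as O'Horner, James needs no escaping at all.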
Note: "magnificent" as in "C'est magnifique, mais ce n'est pas la guerre". Use a CSV parser, for sanity's sake.
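For illustration only, this is roughly what a CSV parser does that the regex does not: it walks the line once and tracks whether it is inside a quoted field. The sketch below is a hypothetical single-line parser (SimpleCsvLine is not a library class) that assumes comma separators and double-quote quoting with doubled quotes as escapes; it does not handle quoted fields that span several lines, which is one more reason to prefer an established CSV library in real code.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleCsvLine {
    // Parse one CSV line into fields, removing enclosing quotes and
    // turning a doubled quote ("") inside a quoted field into a single quote.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and apostrophes are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        // Excerpt of the sample line from the question.
        String line = "12175,\"O'Horner, James\",12300,\"Glebova, Nathalie\",,NEW";
        for (String field : parse(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```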