I have a CSV file with the data below:
id,name,comp_name
1,raj,"rajeswari,motors"
2,shiva,amber kings
My requirement is to read this file into a Spark RDD and then, in a map, split each line on the comma delimiter.
But the code below splits on every comma:
val splitdata = data.map(_.split(","))
I do not want to split on commas that are inside double quotes.
However, I do NOT want to use a regular expression. Is there a simple, efficient way to achieve this?
My second requirement is to read the same CSV file into a Spark DataFrame and show it, but I need to see the double quotes in the result.
The output should look like this:
id name comp_name
1 raj "rajeswari,motors"
2 shiva amber kings
Double quotes are not shown by default, but is there any way to show them?
I am using Spark 2.4 / Scala 2.11 / Eclipse IDE.
Comments (1)
I would suggest trying a DataFrame instead of an RDD.
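As a rough sketch of the DataFrame route, assuming Spark 2.4 in local mode: the CSV reader already treats double quotes as the quote character, so the quoted comma stays inside one field, but the quotes themselves are stripped. One possible way to see them again is to add them back for display on values that contain a comma. The app name and the inline sample lines are illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Assumption: a local session for experimenting; in a real job the session already exists.
val spark = SparkSession.builder()
  .appName("csv-quotes-sketch") // hypothetical name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// The sample data from the question, held in memory so the sketch needs no file.
val lines = Seq(
  "id,name,comp_name",
  "1,raj,\"rajeswari,motors\"",
  "2,shiva,amber kings"
).toDS()

// The reader keeps "rajeswari,motors" as one field and drops the enclosing quotes.
val df = spark.read.option("header", "true").csv(lines)

// Put the quotes back, for display only, on values that contain a comma.
val shown = df.withColumn(
  "comp_name",
  when(col("comp_name").contains(","),
    concat(lit("\""), col("comp_name"), lit("\"")))
    .otherwise(col("comp_name"))
)

shown.show(false) // false = do not truncate long values
```

For a file on disk, the same options should work with spark.read.option("header", "true").csv(path), where path is wherever the CSV lives.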
There is no direct way to do the split; you have to use a regex that ignores commas enclosed in double quotes.
You'd get output like this:
"rajeswari,motors"
amber kings
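One commonly used pattern of this kind is a lookahead that only accepts a comma when an even number of double quotes follows it. It is an assumption that this is the kind of expression meant here; the names commaOutsideQuotes and splitLine are made up for illustration.

```scala
// A comma counts as a separator only if the rest of the line contains
// an even number of double quotes, i.e. the comma is outside any quoted field.
val commaOutsideQuotes = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

// Hypothetical helper: split one CSV line, keeping the quotes in the fields.
def splitLine(line: String): Array[String] =
  line.split(commaOutsideQuotes, -1) // -1 keeps trailing empty fields

// The two data rows from the question.
val rows = Seq(
  "1,raj,\"rajeswari,motors\"",
  "2,shiva,amber kings"
).map(splitLine)

rows.foreach(r => println(r.mkString(" | ")))
```

On the RDD from the question this would be used as data.map(line => splitLine(line)). Note that the pattern assumes balanced quotes and does not handle escaped quotes inside a field.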
Refer to this post to understand the expression: Splitting on comma outside quotes
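If a regex really has to be avoided, a small hand-written scanner is one possible alternative: walk the line once, toggle a flag on every double quote, and split only on commas seen while the flag is off. This is a minimal sketch rather than a full CSV parser; it assumes balanced quotes and no escaped quotes inside a field, and splitOutsideQuotes is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer

// Split one CSV line on commas that are outside double quotes, without a regex.
// Assumption: quotes are balanced and never escaped ("" inside a field).
def splitOutsideQuotes(line: String): Array[String] = {
  val fields = ArrayBuffer.empty[String]
  val current = new StringBuilder
  var inQuotes = false
  for (ch <- line) {
    if (ch == '"') {
      inQuotes = !inQuotes // entering or leaving a quoted field
      current.append(ch)   // keep the quote so it shows in the result
    } else if (ch == ',' && !inQuotes) {
      fields += current.toString // separator: close the current field
      current.clear()
    } else {
      current.append(ch)
    }
  }
  fields += current.toString // last field has no trailing comma
  fields.toArray
}

splitOutsideQuotes("1,raj,\"rajeswari,motors\"").foreach(println)
```

With the RDD from the question this could be applied as data.map(line => splitOutsideQuotes(line)). The scan is a single pass over each line, so it should be at least as cheap as the regex split.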