According to the documentation:

"inferSchema: automatically infers column types. It requires one extra pass over the data and is false by default."

All right, I understand that Spark reads the CSV to determine the data types and assigns them accordingly. I am curious to know what happens in the background:
- Does Spark scan the whole CSV?
- If it scans only a sample of the data, how many rows does it scan?
- How does Spark conclude that a given column is of a particular data type and assign it when inferSchema = true?
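The inference step asked about in the last question can be approximated in a few lines. The following is a simplified Python sketch, not Spark's implementation. It assumes a reduced type set (null, integer, double, boolean, string) and uses Python's own `int`/`float` parsing in place of Spark's parsers. It is loosely modeled on the widening idea: each value gets the narrowest type it parses as, the per-column types are merged row by row (integer and double widen to double, incompatible types fall back to string), and a column that never holds a non-empty value ends up as string. The function names `infer_value`, `merge` and `infer_schema` are hypothetical. Like the default behaviour the documentation describes, it reads every data row once.

```python
import csv
import io


def infer_value(value):
    """Hypothetical helper: narrowest type a single CSV field parses as."""
    if value == "":
        return "null"  # empty field: no evidence about the column's type
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return "boolean"
    return "string"


def merge(a, b):
    """Hypothetical helper: combine the type seen so far with a new one."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "double"}:
        return "double"  # numeric widening
    return "string"  # anything incompatible falls back to string


def infer_schema(csv_text):
    """Hypothetical sketch: one full pass over all rows after the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types = ["null"] * len(header)
    for row in reader:
        for i, value in enumerate(row[: len(header)]):
            types[i] = merge(types[i], infer_value(value))
    # Columns that only ever held empty values default to string.
    return {name: ("string" if t == "null" else t) for name, t in zip(header, types)}


data = (
    "id,price,active,name,note\n"
    "1,9.5,true,apple,\n"
    "2,10,false,banana,\n"
    "3,7.25,TRUE,cherry,\n"
)
print(infer_schema(data))
```

For example, the `price` column sees `9.5` (double), `10` (integer) and `7.25` (double), so the merge widens it to double, while the all-empty `note` column falls back to string.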
Can someone help me understand it better or share some links?
Thank you.
Answering some of your questions