Fetching a large Oracle table into Databricks takes too long
I have an Oracle table containing 50 million records, with about 13-15 columns and a composite primary key. I am trying to fetch this table into Databricks using oracle.jdbc.driver.OracleDriver. I have tried the two approaches below:
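For completeness, the connection setup that both approaches assume looks roughly like this; the URL, credentials and fetch size are placeholders rather than my real values:

import java.util.Properties

// Hypothetical connection setup -- host, service name, credentials and fetch size are placeholders.
val url = "jdbc:oracle:thin:@//dbhost:1521/SERVICE_NAME"
val connectionProperties = new Properties()
connectionProperties.put("driver", "oracle.jdbc.driver.OracleDriver")
connectionProperties.put("user", "db_user")
connectionProperties.put("password", "db_password")
// fetchsize controls how many rows each JDBC round trip returns (the driver default is small).
connectionProperties.put("fetchsize", "10000")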
Approach 1
// Partitioned read: Spark splits the table into numPartitions range queries on columnName.
val myDF = spark.read.jdbc(
  url = url,
  table = "TableName",
  columnName = "PartionColumn",
  lowerBound = lowerBound,
  upperBound = upperBound,
  numPartitions = 10,
  connectionProperties = connectionProperties)
myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")
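To make the parameters explicit, my understanding is that the call above is equivalent to the option-style read below (the bounds are placeholders):

// Option-style equivalent of the partitioned read above.
// Spark issues numPartitions parallel queries, each covering one slice of
// [lowerBound, upperBound] on the partition column.
val partitionedDF = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "TableName")
  .option("partitionColumn", "PartionColumn")
  .option("lowerBound", lowerBound.toString)
  .option("upperBound", upperBound.toString)
  .option("numPartitions", "10")
  .load()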
Approach 2
val myDF = spark.read.jdbc(url, query, connectionProperties)
myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")
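Here query is passed where the table argument goes; I have not included the full statement, but it is shaped like the hypothetical example below (the real column list is longer):

// Hypothetical shape of the query string: spark.read.jdbc substitutes it into a FROM clause,
// so it has to be a parenthesized subquery with an alias rather than a bare SELECT statement.
val query = "(SELECT col1, col2, col3 FROM TableName) t"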
And this takes more than 25 hours.
Also, when I try to load the data into a DataFrame and display it, it doesn't show any result.
Can someone tell me what I am doing wrong here? Any help on the best approach to achieve this would be appreciated.