UCanAccess keeps repeating column names as data

Published 2025-02-10 09:29:48


I was using UCanAccess 5.0.1 on Databricks 9.1 LTS (Spark 3.1.2, Scala 2.12), and for whatever reason, when I use the following code to read a single-record Access DB table, it keeps returning the column names as the records themselves (I've tried adding more records and got the same result).

The Access DB table looks like this (2 records):

| ID | Field1 | Field2    |
|:---|:-------|:----------|
| 2  | key1   | column2   |
| 3  | key2   | column2-1 |

The Python code looks like this:

connectionProperties = {
  "driver" : "net.ucanaccess.jdbc.UcanaccessDriver"
}
 
url = "jdbc:ucanaccess:///dbfs/mnt/internal/Temporal/Database1.accdb"
df = spark.read.jdbc(url=url, table="Table1", properties=connectionProperties)

And the result looks like this:

df.printSchema()
df.count()

root
 |-- Field1: string (nullable = true)
 |-- Field2: string (nullable = true)
 |-- ID: string (nullable = true)
Out[21]: 2

df.show()
+------+------+---+
|Field1|Field2| ID|
+------+------+---+
|Field1|Field2| ID|
|Field1|Field2| ID|
+------+------+---+

Any ideas/suggestions?


Comments (2)

高速公鹿 2025-02-17 09:29:48


If your data has column names in the first row, you can try header=True to set the first row as column headers.

Sample code:

df = spark.read.jdbc(url=url, table="Table1", header=True, properties=connectionProperties)

But if your data does not have column headers, you need to explicitly define column names and then assign them as column headers.

Sample code:

# read the table, then assign the column names in order
columns = ["column_name_1", "column_name_2", "column_name_3"]
df = spark.read.jdbc(url=url, table="Table1", properties=connectionProperties).toDF(*columns)

You can also refer to this answer by Alberto Bonsanto.

Reference - https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases#read-data-from-jdbc

堇色安年 2025-02-17 09:29:48


Turns out that there was a bug in the JDBC code (https://stackoverflow.com/questions/63177736/spark-read-as-jdbc-return-all-rows-as-columns-name).

I added the following code and now the UCanAccess driver works fine:

%scala
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.jdbc.JdbcDialects
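// Spark's default JDBC dialect wraps column names in double quotes, which
// Access SQL reads as string literals (hence every row echoing the column
// names); quoting with backticks instead lets UCanAccess resolve them.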
private case object HiveDialect extends JdbcDialect {
  override def canHandle(url : String): Boolean = url.startsWith("jdbc:ucanaccess")
  override def quoteIdentifier(colName: String): String = {
    colName.split('.').map(part => s"`$part`").mkString(".")
  }
}

JdbcDialects.registerDialect(HiveDialect)
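
Because the Scala cell registers the dialect in the notebook's shared JVM, the original Python read can then be re-run unchanged; a minimal sketch, assuming the same url and connectionProperties from the question:

# Re-read the table after registering the dialect; column names are now
# quoted with backticks, so UCanAccess returns the actual row values.
df = spark.read.jdbc(url=url, table="Table1", properties=connectionProperties)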

Then display(df) would show

| Field1 | Field2    | ID |
|:-------|:----------|:---|
| key1   | column2   | 2  |
| key2   | column2-1 | 3  |