Databricks SQL错误:无法评估在线表定义中的表达式外部表达式

发布于 2025-02-10 05:27:29 字数 908 浏览 2 评论 0原文

我加入了两个表,并试图获得两个时间戳字段的最大时间戳值。这是我的SQL语句:

SELECT 
   rts.DataLakeModified_DateTime, 
   rtt.DataLakeModified_DateTime,
   (
     SELECT MAX(DataLakeModifiedDateTime)
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
   ) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID
        AND rtt.CHANNEL = rts.CHANNEL
        AND rtt.STORE = rts.STORE
        AND rtt.TERMINAL = rts.TERMINALID

但是它有一个错误:

Error in SQL statement: AnalysisException: cannot evaluate expression outer(rts.DataLakeModified_DateTime) in inline table definition; line 6 pos 17

我正在使用Spark3.2.1和MSSQL数据库。而且我使用104 LTS群集在Azure Databricks笔记本上运行它(包括Apache Spark3.2.1,Scala 2.12)。

您是否有解决此问题的解决方案?

I joined two tables and tried to get a max timestamp value of two timestamp fields. This is my SQL statement:

SELECT 
   rts.DataLakeModified_DateTime, 
   rtt.DataLakeModified_DateTime,
   (
     SELECT MAX(DataLakeModifiedDateTime)
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
   ) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID
        AND rtt.CHANNEL = rts.CHANNEL
        AND rtt.STORE = rts.STORE
        AND rtt.TERMINAL = rts.TERMINALID

But it has an error:

Error in SQL statement: AnalysisException: cannot evaluate expression outer(rts.DataLakeModified_DateTime) in inline table definition; line 6 pos 17

I'm using Spark3.2.1 and MSSQL database. And I'm running it on Azure Databricks notebook using 104 LTS cluster (include Apache Spark3.2.1, Scala 2.12).

Do you have any solution to solve this problem?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

回眸一遍 2025-02-17 05:27:29

错误似乎是因为以下SQL语句的使用以获取最大时间戳值。

SELECT MAX(DataLakeModifiedDateTime) 
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime) 
  • 我尝试了2个示例表tb1tb2具有时间戳字段t1t2。桌子看。

  • 当我使用与使用相同的语法获得最大的时间邮票值时,它给出了相同的错误。

  • 相反,当我使用case语句尝试时,它给出了所需的输出。

因此,请修改您的SQL查询,如下所示:

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   ( 
     CASE WHEN rts.DataLakeModified_DateTime > rtt.DataLakeModified_DateTime THEN rts.DataLakeModified_DateTime ELSE rtt.DataLakeModified_DateTime END 
) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

The error seems to be because of the usage of the following SQL statement to get the maximum timestamp value.

SELECT MAX(DataLakeModifiedDateTime) 
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime) 
  • I tried with 2 sample tables tb1 and tb2 which have timestamp fields t1 and t2.The following is how the tables look.

enter image description here

  • When I used to get maximum time stamp values using the same syntax that you used, it gave the same error.

enter image description here

  • Instead, when I tried it using the CASE statement, it gave the desired output.

enter image description here

So, modify your SQL query as shown below:

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   ( 
     CASE WHEN rts.DataLakeModified_DateTime > rtt.DataLakeModified_DateTime THEN rts.DataLakeModified_DateTime ELSE rtt.DataLakeModified_DateTime END 
) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 
深居我梦 2025-02-17 05:27:29

我简单地找到了另一个具有最大功能的解决方案。

GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 

看起来比...然后...子句。

最后,SQL语句是:

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

它正常工作。
谢谢你!

I found another solution with GREATEST function, simply.

GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 

It looks like it's simply than WHEN ... THEN ... clause.

Finally, the SQL statement is :

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

It worked correctly.
Thank you!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文