How do I count, in SQL, the orders among a customer's last two orders whose value is less than the current order's value?

Posted on 2025-02-13 01:54:45


Consider that I have a table

Customer  Day  Amount
A         4    96
A         22   63
A         32   19
A         50   27
A         57   99
A         72   93
B         69   97
B         82   22
B         87   64
C         22   60
C         30   22
C         48   74
C         49   68
C         55   11
C         85   79

I need to calculate, for every order of a customer, how many of his past two orders have an amount less than the value of the present order. For example, for customer A on day 50 (amount 27), the two preceding amounts are 63 and 19; only 19 is smaller, so the count is 1. I.e.

Customer  Day  Amount  Count
A         4    96      0
A         22   63      0
A         32   19      0
A         50   27      1
A         57   99      2
A         72   93      1
B         69   97      0
B         82   22      0
B         87   64      1
C         22   60      0
C         30   22      0
C         48   74      2
C         49   68      1
C         55   11      0
C         85   79      2
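
To make the rule concrete, here is a minimal pandas sketch of the Count I expect (the inline DataFrame and the count_smaller_in_last_two helper are only for illustration, not part of my actual setup):

import pandas as pd

df = pd.DataFrame({
    'Customer': ['A']*6 + ['B']*3 + ['C']*6,
    'Day':      [4, 22, 32, 50, 57, 72, 69, 82, 87, 22, 30, 48, 49, 55, 85],
    'Amount':   [96, 63, 19, 27, 99, 93, 97, 22, 64, 60, 22, 74, 68, 11, 79],
})

def count_smaller_in_last_two(amounts):
    # For each order, count how many of the previous two amounts are smaller than it.
    return pd.Series(
        [sum(v < cur for v in amounts.iloc[max(0, i - 2):i]) for i, cur in enumerate(amounts)],
        index=amounts.index,
    )

df = df.sort_values(['Customer', 'Day'])
df['Count'] = df.groupby('Customer')['Amount'].transform(count_smaller_in_last_two)
print(df)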

3 Answers

香橙ぽ 2025-02-20 01:54:46


Since you only need the 2 PRECEDING rows, you can pre-compute the LAG values and utilise a CASE statement to generate the required logic. (For the first rows of each customer the LAG values are NULL, so both comparisons evaluate to false and the count falls through to 0.)

Otherwise, if you want it to be more dynamic and flexible in nature, you can utilise Shubham Sharma's answer.

Data Preparation

from io import StringIO

import pandas as pd

s = StringIO("""
Customer    Day Amount
A   4   96
A   22  63
A   32  19
A   50  27
A   57  99
A   72  93
B   69  97
B   82  22
B   87  64
C   22  60
C   30  22
C   48  74
C   49  68
C   55  11
C   85  79
"""
)

# whitespace-separated sample data; `sql` is the active SparkSession (SQLContext on older Spark)
df = pd.read_csv(s, sep=r'\s+')

sparkDF = sql.createDataFrame(df)

sparkDF.show()

+--------+---+------+
|Customer|Day|Amount|
+--------+---+------+
|       A|  4|    96|
|       A| 22|    63|
|       A| 32|    19|
|       A| 50|    27|
|       A| 57|    99|
|       A| 72|    93|
|       B| 69|    97|
|       B| 82|    22|
|       B| 87|    64|
|       C| 22|    60|
|       C| 30|    22|
|       C| 48|    74|
|       C| 49|    68|
|       C| 55|    11|
|       C| 85|    79|
+--------+---+------+

SparkSQL

# registerTempTable is deprecated on Spark 2.0+; createOrReplaceTempView("TB1") is the current equivalent
sparkDF.registerTempTable("TB1")

sql.sql("""
SELECT
    CUSTOMER,
    DAY,
    AMOUNT,
    CASE 
        WHEN AMOUNT > LAG_1 AND AMOUNT > LAG_2 THEN 2
        WHEN AMOUNT > LAG_1 OR AMOUNT > LAG_2 THEN 1
        ELSE 0
    END as Count
FROM (
    SELECT
        CUSTOMER,
        DAY,
        AMOUNT,
        LAG(AMOUNT,1) OVER(PARTITION BY CUSTOMER ORDER BY DAY) as lag_1,
        LAG(AMOUNT,2) OVER(PARTITION BY CUSTOMER ORDER BY DAY) as lag_2
    FROM TB1
)
ORDER BY 1,2
;
""").show()

+--------+---+------+-----+
|CUSTOMER|DAY|AMOUNT|Count|
+--------+---+------+-----+
|       A|  4|    96|    0|
|       A| 22|    63|    0|
|       A| 32|    19|    0|
|       A| 50|    27|    1|
|       A| 57|    99|    2|
|       A| 72|    93|    1|
|       B| 69|    97|    0|
|       B| 82|    22|    0|
|       B| 87|    64|    1|
|       C| 22|    60|    0|
|       C| 30|    22|    0|
|       C| 48|    74|    2|
|       C| 49|    68|    1|
|       C| 55|    11|    0|
|       C| 85|    79|    2|
+--------+---+------+-----+

宁静的妩媚 2025-02-20 01:54:46


We can do this with when/otherwise and lag:

from pyspark.sql import functions as F
from pyspark.sql import Window
from pyspark.sql.functions import col, when
schema="Customer string,day int,Amount int"

data=[('A',4,96),('A',22,63),('A',32,19),('A',50,27),('A',57,99),('A',72,93),('B',69,97),('B',82,22),('B',87,64),('C',22,60),('C',30,22),('C',48,74),('C',49,68),('C',55,11),('C',85,79)]
dql=spark.createDataFrame(data,schema)
# amounts of the previous two orders per customer, ordered by day
w=Window.partitionBy("Customer").orderBy("day")
prev1=F.lag("Amount",1).over(w)
prev2=F.lag("Amount",2).over(w)
dql.withColumn("count",when((col('Amount')>prev2) & (col('Amount')>prev1),2).when((col('Amount')>prev2) | (col('Amount')>prev1),1).otherwise(0)).show()

#output
+--------+---+------+-----+
|Customer|day|Amount|count|
+--------+---+------+-----+
|       A|  4|    96|    0|
|       A| 22|    63|    0|
|       A| 32|    19|    0|
|       A| 50|    27|    1|
|       A| 57|    99|    2|
|       A| 72|    93|    1|
|       B| 69|    97|    0|
|       B| 82|    22|    0|
|       B| 87|    64|    1|
|       C| 22|    60|    0|
|       C| 30|    22|    0|
|       C| 48|    74|    2|
|       C| 49|    68|    1|
|       C| 55|    11|    0|
|       C| 85|    79|    2|
+--------+---+------+-----+

暖伴 2025-02-20 01:54:46


Let us use collect_list over a Window to collect the amounts of the past two orders, then aggregate to count the values in the list which are less than the Amount of the current row:

from pyspark.sql import functions as F
from pyspark.sql import Window

# frame covering the previous two orders of the same customer, ordered by Day
w = Window.partitionBy('Customer').orderBy('Day').rowsBetween(-2, -1)

# F.aggregate requires Spark >= 3.1
df.withColumn(
    'Count',
    F.aggregate(
        F.collect_list('Amount').over(w), F.lit(0),
        lambda acc, x: acc + (x < F.col('Amount')).cast('int')
    )
).show()

+--------+---+------+-----+
|Customer|Day|Amount|Count|
+--------+---+------+-----+
|       A|  4|    96|    0|
|       A| 22|    63|    0|
|       A| 32|    19|    0|
|       A| 50|    27|    1|
|       A| 57|    99|    2|
|       A| 72|    93|    1|
|       B| 69|    97|    0|
|       B| 82|    22|    0|
|       B| 87|    64|    1|
|       C| 22|    60|    0|
|       C| 30|    22|    0|
|       C| 48|    74|    2|
|       C| 49|    68|    1|
|       C| 55|    11|    0|
|       C| 85|    79|    2|
+--------+---+------+-----+
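
As a variant of the same idea (a sketch assuming the same df, the same window w, F imported as above, and Spark >= 3.1), you could also filter the collected list down to the amounts below the current one and take its size:

# Alternative sketch: keep only the collected amounts smaller than the current row's Amount,
# then count them with size(). Reuses `df` and `w` from the snippet above.
df.withColumn(
    'Count',
    F.size(
        F.filter(
            F.collect_list('Amount').over(w),
            lambda x: x < F.col('Amount')
        )
    )
).show()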