How to use unboundedPreceding, unboundedFollowing and currentRow in PySpark
I am a little confused about the method pyspark.sql.Window.rowsBetween, which accepts Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow objects as start and end arguments. Could you please explain how the function works and how to use Window objects correctly, with some examples? Thank you!
Comments (1)
Rows between / range between, as the names suggest, help limit the number of rows considered inside a window.
Let us take a simple example.
Starting with data:
Now, over this data, let's try to find the running max, i.e. the max seen so far at each row:
So, as expected, it looked at each price from top to bottom, one by one, and populated the maximum value found so far. This behaviour corresponds to start = Window.unboundedPreceding and end = Window.currentRow.
Now change the rowsBetween bounds to start = Window.unboundedPreceding and end = Window.unboundedFollowing. As you can see, within the same window it now looks through all values, above and below, for the max instead of stopping at the current row.
The third option is start = Window.currentRow and end = Window.unboundedFollowing. Now it looks only downwards for the max, starting from the current row.
Also, you are not limited to these three constants. You can even use offsets relative to the current row, such as start = Window.currentRow - 1 and end = Window.currentRow + 1, so that instead of looking at all values above or below, it only looks at 1 row above and 1 row below.
So you can think of it as a window inside the window, one that moves along with the current row being processed.