How to write this pandas logic for a pyspark.sql.dataframe.DataFrame without using the Pandas on Spark API?
I'm totally new to PySpark. Since PySpark doesn't have a loc feature, how can we write this logic? I tried specifying the conditions but couldn't get the desired result; any help would be greatly appreciated!
df['Total'] = (df['level1']+df['level2']+df['level3']+df['level4'])/df['Number']
df.loc[df['level4'] > 0, 'Total'] += 4
df.loc[((df['level3'] > 0) & (df['Total'] < 1)), 'Total'] += 3
df.loc[((df['level2'] > 0) & (df['Total'] < 1)), 'Total'] += 2
df.loc[((df['level1'] > 0) & (df['Total'] < 1)), 'Total'] += 1
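For reference, here is the logic above run end-to-end on made-up sample rows (the column values are my assumptions, chosen so one row takes the level4 branch and one takes the level1 branch):

```python
import pandas as pd

# Hypothetical sample data; the level columns are assumed numeric and non-negative
df = pd.DataFrame({
    "level1": [1, 0],
    "level2": [0, 0],
    "level3": [0, 0],
    "level4": [0, 2],
    "Number": [2, 4],
})

# Base value, then the sequential conditional bumps from the question
df["Total"] = (df["level1"] + df["level2"] + df["level3"] + df["level4"]) / df["Number"]
df.loc[df["level4"] > 0, "Total"] += 4
df.loc[(df["level3"] > 0) & (df["Total"] < 1), "Total"] += 3
df.loc[(df["level2"] > 0) & (df["Total"] < 1), "Total"] += 2
df.loc[(df["level1"] > 0) & (df["Total"] < 1), "Total"] += 1

print(df["Total"].tolist())  # row 1 gets the level1 bump, row 2 the level4 bump
```

Note that each `df.loc[...] += n` statement sees the Total values already modified by the previous statements; that ordering matters when translating to PySpark.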
语句。For a data like the following
You're actually updating
total
column in each statement, not in an if-then-else way. Your code can be replicated (as is) in pyspark using multiplewithColumn()
withwhen()
like the following.We can merge all the
withColumn()
withwhen()
into a singlewithColumn()
with multiplewhen()
statements.It's like
numpy.where
and SQL'scase
statements.