如何用极点替换Pandas DF.Rank(Axis = 1)
Alpha因素有时需要排名,因此:
import pandas as pd
df = pd.Dataframe(some_data)
df.rank(axis=1, pct=True)
如何有效地使用Polars实施此部分?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
PORARS数据框的属性为:
因此,Polars不需要Pandas的
Axis = 1
API。计算数字,字符串,布尔值甚至更复杂的嵌套类型(例如结构和列表)之间的等级是没有多大意义的。Pandas通过给您
numeric_only
关键字参数来解决此问题。Polars'更加有用,并希望您使用 expression api 。
表达式
杆表达式在具有保证它们由均匀数据的列上工作。列具有此保证,排在
dataFrame
中的行不多。幸运的是,我们有一个数据类型,可以保证行是同质的:pl.list
数据类型。假设我们有以下数据:
如果我们要计算所有列的等级,则除 student 外,我们可以将它们收集到
list
数据类型:这将给出:
在列表元素上运行PORRARS表达式
我们可以运行 Polars表达式在列表元素上使用
Polars不能提供关键字参数计算排名的百分比。但是,由于表达方式如此通用,我们可以创建自己的百分比等级表达式。
请注意,我们使用 element() 是指在
list.eval
表达式中评估的元素。输出:
请注意,此解决方案适用于您要进行行明智的任何表达式/操作。
A polars DataFrame has properties being:
For this reason polars does not want the
axis=1
API from pandas. It does not make much sense to compute the rank between numeric, string, boolean or even complexer nested types like structs and lists.Pandas solves this by giving you
numeric_only
keyword argument.Polars' is more opinionated and wants to nudge you in using the expression API.
Expression
Polars expressions work on columns that have the guarantee that they consist of homogeneous data. Columns have this guarantee, rows in a
DataFrame
not so much. Luckily we have a data type that has the guarantee that the rows are homogeneous:pl.List
data type.Let's say we have the following data:
If we want to compute the rank of all the columns except for student, we can collect those into a
list
data type:This would give:
Running polars expression on list elements
We can run any polars expression on the elements of a list with the
list.eval
expression! These expressions entirely run on polars' query engine and can run in parallel so will be super fast.Polars doesn't provide a keyword argument the compute the percentages of the ranks. But because expressions are so versatile we can create our own percentage rank expression.
Note that we use
element()
which refers to the element being evaluated in alist.eval
expression.This outputs:
Note that this solution works for any expressions/operation you want to do row wise.