Writing a custom pandas aggregation without making all dtypes object
I (think I) need to write a custom aggregation function for the geopandas.GeoDataFrame.dissolve() operation. When merging multiple polygons, I want to keep the attributes of the polygon with the largest area that also fulfils other criteria. The operation works fine, but afterwards all attributes of my GeoDataFrame are of dtype object.
The same issue happens with a regular pandas groupby(), so I have simplified the example below. Can someone tell me if I should write my custom_sort() differently to keep the dtypes intact?
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'ints': [1, 2, 3, 4],
    'floats': [1.0, 2.0, 2.2, 3.2],
    'strings': ['foo', 'bar', 'baz', 'qux'],
    'bools': [True, True, True, False],
    'test': ['drop this', 'keep this', 'keep this', 'drop this'],
})

def custom_sort(df):
    """Define custom aggregation function with special sorting."""
    df = df.sort_values(by=['bools', 'floats'], ascending=False)
    return df.iloc[0]

print(df)
print(df.dtypes)
print()
grouped = df.groupby(by='group').agg(custom_sort)
print(grouped)
print(grouped.dtypes)  # Issue: All dtypes are object
print()
print(grouped.convert_dtypes().dtypes)  # Possible solution, but not for me
# Please note that I cannot use convert_dtypes(). I actually need this for
# geopandas.GeoDataFrame.dissolve() and I think convert_dtypes() messes up
# the geometry information
Output:
  group  ints  floats strings  bools       test
0     A     1     1.0     foo   True  drop this
1     A     2     2.0     bar   True  keep this
2     B     3     2.2     baz   True  keep this
3     B     4     3.2     qux  False  drop this
group       object
ints         int64
floats     float64
strings     object
bools         bool
test        object
dtype: object

      ints floats strings bools       test
group
A        2    2.0     bar  True  keep this
B        3    2.2     baz  True  keep this
ints       object
floats     object
strings    object
bools      object
test       object
dtype: object

ints         Int64
floats     Float64
strings     string
bools      boolean
test        string
dtype: object
Comments (1)
The source of the problem is that df.iloc[0] returns a pandas Series. This Series holds one value from each column, and those values have different dtypes, so pandas may fall back to dtype object for the whole Series. If I recall correctly, this depends on the version of the pandas library you're working with; this behavior has changed over time.
The solution to your problem depends heavily on the operations you're doing in your custom agg function. In your toy example, I would suggest manipulating your dataframe beforehand and using the simplest possible aggregating function.
For example, doing the complex logic up front leaves a simple head as the aggregation.
For what it's worth, I'd also suggest you use a more recent pandas version.
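A minimal sketch of that idea, reusing the toy frame from the question. The sort-then-head(1) chain is an assumption about what the "simple head" approach could look like, not code from the original answer, and the names row and picked are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'ints': [1, 2, 3, 4],
    'floats': [1.0, 2.0, 2.2, 3.2],
    'strings': ['foo', 'bar', 'baz', 'qux'],
    'bools': [True, True, True, False],
    'test': ['drop this', 'keep this', 'keep this', 'drop this'],
})

# Cause: one row cuts across columns of different dtypes, so the
# resulting Series has to use the generic object dtype.
row = df.iloc[0]
print(row.dtype)  # object

# Workaround sketch (assumption): apply the custom ordering first,
# then keep the first row of each group. Rows are selected, not
# rebuilt from Series, so the numeric and bool columns keep their dtypes.
picked = (
    df.sort_values(by=['bools', 'floats'], ascending=False)
      .groupby('group')
      .head(1)
      .set_index('group')
      .sort_index()
)
print(picked)
print(picked.dtypes)  # ints int64, floats float64, bools bool
```

This picks the same rows as custom_sort() in the question (the "keep this" rows). For dissolve(), the same pre-sorting might be combined with aggfunc='first', but that is untested here and should be checked against your geometry column.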