Adding values to a Pandas DataFrame to create a ranking system with a weighted average

Posted on 2025-01-10 05:50:09

I am not sure how best to describe this (I am sure there is a more proper way of describing it).

I have a large dataset full of house details (e.g. walls, bathrooms, bedrooms, etc.) that I need to analyze and rank based on their characteristics. I have created a ranking system with "4" being the best and "0" being the worst. For example, a house with 1 bedroom may get a "0" for its bedroom score, but a house with 3 bathrooms may get a "4" for its bathroom score.

Once I associate the ranks with all the characteristics, I plan on creating a weighted average to see which houses are the best.

What is the best way to do this? I need to do this about 20 times (for 20 characteristics), and so far this is the only way I know how to do it. It is quite tedious, especially if I ever need to go back and change anything.

Also, it would be good to better understand how the df.loc function works. I was able to make it work, but I don't quite understand it.

    #EXAMPLE ONE, GRADING LAND USE  
    ParcelsData.loc[ParcelsData["land_use"] == 'Flum/Swim Floodway (Restrected)', 'LandUseGrade'] = 0
    ParcelsData.loc[ParcelsData["land_use"] == 'Single Family Residential', 'LandUseGrade'] = 4
    ParcelsData.loc[ParcelsData["land_use"] == 'Wasteland, Slivers, Gullies, Rock Outcrop', 'LandUseGrade'] = 0
    ParcelsData.loc[ParcelsData["land_use"] == 'Single Family Residential - Common', 'LandUseGrade'] = 4
    ParcelsData.loc[ParcelsData["land_use"] == 'Multi Family', 'LandUseGrade'] = 2
    
    #EXAMPLE TWO, STORY 
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '1 STORY', 'StoryGrade'] = 4
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '1.5 STORY', 'StoryGrade'] = 2
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '2.0 STORY', 'StoryGrade'] = 3
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '2.5 STORY', 'StoryGrade'] = 2
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == '3.0 STORY', 'StoryGrade'] = 2
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'RANCH W/BSMT', 'StoryGrade'] = 4
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'BI-LEVEL', 'StoryGrade'] = 1
    ParcelsData.loc[ParcelsData["HomeDetails_storyheight"] == 'SPLIT LEVEL', 'StoryGrade'] = 1
    
    #EXAMPLE THREE, ACRES
    ParcelsData.loc[ParcelsData["Acres"] <= .1, 'AcresGrade'] = 1
    ParcelsData.loc[ParcelsData["Acres"] <= .2, 'AcresGrade'] = 2
    ParcelsData.loc[ParcelsData["Acres"] <= .3, 'AcresGrade'] = 3
    ParcelsData.loc[ParcelsData["Acres"] <= .4, 'AcresGrade'] = 7
    ParcelsData.loc[ParcelsData["Acres"] <= .5, 'AcresGrade'] = 8
    ParcelsData.loc[ParcelsData["Acres"] > .5, 'AcresGrade'] = 9
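
On the `df.loc` part of the question, a minimal sketch may help. `.loc[rows, column] = value` takes a row selector (here a boolean Series produced by the comparison) and a column label, and writes the value only into the selected cells. One consequence is that a later assignment overwrites an earlier one wherever both conditions match, which affects the chain of `<=` tests in EXAMPLE THREE. The sample rows below are hypothetical; `pd.cut` is shown as one possible alternative for numeric ranges, not as the only way to do it, and the grade values are copied from the question unchanged.

```python
import pandas as pd

# Hypothetical sample data; only the column name "Acres" comes from the question.
ParcelsData = pd.DataFrame({"Acres": [0.05, 0.15, 0.25, 0.35, 0.45, 0.80]})

# The comparison produces a boolean Series, one True/False per row.
mask = ParcelsData["Acres"] <= 0.1

# .loc[row_selector, column_label] = value writes only where mask is True;
# the column is created (NaN elsewhere) if it does not exist yet.
sequential = ParcelsData.copy()
sequential.loc[mask, "AcresGrade"] = 1

# Each later assignment overwrites earlier ones for rows matching both masks,
# so the chain of "<=" tests from EXAMPLE THREE leaves every row <= .5 at 8.
for upper, grade in [(0.2, 2), (0.3, 3), (0.4, 7), (0.5, 8)]:
    sequential.loc[sequential["Acres"] <= upper, "AcresGrade"] = grade
sequential.loc[sequential["Acres"] > 0.5, "AcresGrade"] = 9

# pd.cut places each row in exactly one right-closed interval:
# (-inf, .1], (.1, .2], (.2, .3], (.3, .4], (.4, .5], (.5, inf)
bins = [float("-inf"), 0.1, 0.2, 0.3, 0.4, 0.5, float("inf")]
grades = [1, 2, 3, 7, 8, 9]  # grade values copied from the question as-is
ParcelsData["AcresGrade"] = pd.cut(
    ParcelsData["Acres"], bins=bins, labels=grades
).astype(int)  # assumes no missing values in "Acres"
```

With this sample, `sequential["AcresGrade"]` ends up as 8 for the first five rows and 9 for the last, while the `pd.cut` version gives 1, 2, 3, 7, 8, 9. Reversing the order of the `.loc` lines (largest threshold first) would be another way to get the intended result.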

Comments (1)

爱你不解释 2025-01-17 05:50:10

I'll do this for land_use; hope you get the idea.

See https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html for more details

    land_use_map = {
        'Flum/Swim Floodway (Restrected)': 0,
        'Single Family Residential': 4,
        'Wasteland, Slivers, Gullies, Rock Outcrop': 0,
        'Single Family Residential - Common': 4,
        'Multi Family': 2,
    }

    # Look up each land_use value and store its grade in LandUseGrade.
    # Values that are not keys of the dict become NaN.
    ParcelsData['LandUseGrade'] = ParcelsData['land_use'].map(land_use_map)
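
To cover the "about 20 characteristics" and the weighted average from the question, the same `.map` idea can be driven from one table of lookups. The following is only a sketch under stated assumptions: the sample rows, the shortened lookup tables, the `grade_maps` structure, the weights, and the `WeightedScore` column name are all hypothetical; only the source column names, grade column names, and category labels are taken from the question.

```python
import pandas as pd

# Hypothetical sample rows; column names and labels come from the question.
ParcelsData = pd.DataFrame({
    "land_use": ["Single Family Residential", "Multi Family", "Unknown Use"],
    "HomeDetails_storyheight": ["1 STORY", "BI-LEVEL", "2.0 STORY"],
})

# One entry per characteristic: source column -> (grade column, lookup table).
# Lookup tables are shortened here; changing a grade means editing one dict.
grade_maps = {
    "land_use": ("LandUseGrade", {
        "Single Family Residential": 4,
        "Multi Family": 2,
    }),
    "HomeDetails_storyheight": ("StoryGrade", {
        "1 STORY": 4,
        "2.0 STORY": 3,
        "BI-LEVEL": 1,
    }),
}

for source_col, (grade_col, lookup) in grade_maps.items():
    # Values missing from the lookup (e.g. "Unknown Use") become NaN.
    ParcelsData[grade_col] = ParcelsData[source_col].map(lookup)

# Hypothetical weights, one per grade column.
weights = pd.Series({"LandUseGrade": 0.7, "StoryGrade": 0.3})

# Multiply each grade column by its weight, sum across columns, and
# normalise by the total weight. skipna=False keeps the score NaN when
# any grade is missing instead of silently treating it as 0.
grade_cols = ParcelsData[weights.index]
ParcelsData["WeightedScore"] = (
    grade_cols.mul(weights, axis=1).sum(axis=1, skipna=False) / weights.sum()
)

# Best houses first; rows with a missing score sort to the end.
ranked = ParcelsData.sort_values("WeightedScore", ascending=False)
```

With these sample rows the first house scores about 4.0, the second about 1.7, and the third gets NaN because its land use has no grade. Whether a missing grade should instead count as 0 or be ignored is a design choice this sketch does not settle.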