ValueError when trying to split a range of values in a DataFrame

Posted 2025-02-07 18:25:53 · 1667 characters · 2 views · 0 comments

I have a DataFrame, parcels, full of addresses that I imported from an Excel spreadsheet. I imported four columns: parcel_id, prop_locat, ZipCode, and HouseNum. I scraped together (copied) some code from this question: "Split (explode) range in dataframe into multiple rows", but I am receiving the error ValueError: invalid literal for int() with base 10: ' '.

My code is below, along with a sample data set and what I am trying to achieve:

import pandas as pd

def split(s):
    ranges = (x.split("-") for x in s.split("-"))
    return [i for r in ranges for i in range(int(r[0]), int(r[-1]) + 1)]

xlsx = 'drive/MyDrive/Parcels.xlsx'
parcels = pd.read_excel(xlsx,usecols=['parcel_id','prop_locat','ZipCode','HouseNum'])
parcels['HouseNum'] = parcels['HouseNum'].astype(str)
parcels['Length'] = parcels['HouseNum'].apply(len)

parcels['full_length'] = parcels['prop_locat'].apply(len)
parcels['address'] = parcels.apply(lambda x: x['prop_locat'][x['Length']:x['full_length']], 1)

parcels.HouseNum = parcels.HouseNum.apply(split)
parcels.set_index(['parcel_id','ZipCode','address',]).Options.apply(pd.Series).stack().reset_index().drop('level_2',1)

print(parcels)

Sample data set:

        parcel_id       prop_locat HouseNum ZipCode  Length  full_length  address
0  xxxxxxxxxxxxxx     1234 W 500 S     1234   xxxxx       4           12  W 500 S
1  xxxxxxxxxxxxxx  123-130 W 700 S  123-130   xxxxx       7           15  W 700 S

The goal is to take a range of house numbers such as 123-130 and add one row per number to the DataFrame, each with its full address, i.e. 123 W 700 S, 124 W 700 S, 125 W 700 S, and so on up to 130 W 700 S.

Any pointers in the right direction would be greatly appreciated. Thank you!


晨与橙与城 2025-02-14 18:25:53

The error occurs because one of your HouseNum values is a blank space rather than what your function expects. With a few slight changes, invalid cases can be handled:

Given:

  house_num
0      1234
1       103
2   123-130
3           # a blank space ' ' as given in your error.

Doing:

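import pandas as pd

# Sample data matching the "Given" table; the last entry is a single blank space.
df = pd.DataFrame({'house_num': ['1234', '103', '123-130', ' ']})

def stuff(val):
    # Keep only the purely numeric pieces; a blank entry leaves an empty list.
    val = [int(x) for x in val.split('-') if x.isnumeric()]
    return list(range(val[0], val[-1] + 1)) if val else None

df.house_num = df.house_num.apply(stuff)
df = df.explode('house_num')
print(df)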

Output:

  house_num
0      1234
1       103
2       123
2       124
2       125
2       126
2       127
2       128
2       129
2       130
3      None
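
To connect this back to the question's goal of one full address per house number, here is a minimal sketch that applies the same explode approach to a frame shaped like the sample data. The sample rows, the helper name expand_house_numbers, the street column, and the assumption that prop_locat always begins with the HouseNum text are illustrative only and are not taken from the real spreadsheet.

```python
import pandas as pd

# Hypothetical rows modelled on the question's sample data (not the real file).
parcels = pd.DataFrame({
    "parcel_id": ["p1", "p2", "p3"],
    "prop_locat": ["1234 W 500 S", "123-130 W 700 S", " "],
    "HouseNum": ["1234", "123-130", " "],
    "ZipCode": ["00000", "00000", "00000"],
})

def expand_house_numbers(value):
    """Return every number in 'start-end' (inclusive), or [] if nothing parses."""
    parts = [int(p) for p in str(value).split("-") if p.strip().isdecimal()]
    return list(range(parts[0], parts[-1] + 1)) if parts else []

# Street part: drop the leading house number text.
# Assumption: prop_locat starts with exactly the HouseNum string.
parcels["street"] = [
    loc[len(num):].strip()
    for loc, num in zip(parcels["prop_locat"], parcels["HouseNum"])
]
parcels["HouseNum"] = parcels["HouseNum"].apply(expand_house_numbers)

# One row per house number; empty lists become NaN and are dropped here.
expanded = (
    parcels.explode("HouseNum")
    .dropna(subset=["HouseNum"])
    .reset_index(drop=True)
)
expanded["address"] = expanded["HouseNum"].astype(str) + " " + expanded["street"]

print(expanded[["parcel_id", "ZipCode", "address"]])
```

Dropping the blank rows is a choice made for this sketch; keeping them with a missing house number, as in the output above, may be preferable if every parcel has to stay in the result.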


命比纸薄 2025-02-14 18:25:53

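My guess, based on your error, is that you are trying to convert an empty string to an integer, which obviously won't work. I would use the or operator to check for an empty string and simply parse it as zero: int(arg or "0")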
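
A small sketch of that suggestion, with one caveat: `or` only substitutes the fallback when the string is empty, whereas the value shown in the reported error is a single space, which is truthy and would still fail. Stripping whitespace first covers both cases. The helper name to_int and the default of 0 are assumptions made for illustration.

```python
def to_int(arg, default=0):
    """Parse arg as an int, falling back to default for empty or blank strings."""
    text = str(arg).strip()  # ' ' becomes '', which is falsy
    return int(text or default)

print(to_int("130"))
print(to_int(""))
print(to_int(" "))

# Without strip(), the `or` fallback does not help for a blank space:
try:
    int(" " or "0")
except ValueError as exc:
    print("still fails:", exc)
```

Whether zero is a sensible stand-in for a missing house number depends on the data; returning None and handling those rows separately may be safer.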


About the author

本宫微胖
