ValueError when trying to split a range of values in a dataframe
I have a dataframe parcels full of addresses which I have imported from an Excel spreadsheet. I have imported four columns: parcel_id, prop_locat, ZipCode, and HouseNum. I have scraped together (copied) some code from this question: Split (explode) range in dataframe into multiple rows, but am receiving a ValueError: invalid literal for int() with base 10: ' ' error.
My code is below, followed by a sample data set and what I am trying to achieve:
import pandas as pd

def split(s):
    ranges = (x.split("-") for x in s.split("-"))
    return [i for r in ranges for i in range(int(r[0]), int(r[-1]) + 1)]

xlsx = 'drive/MyDrive/Parcels.xlsx'
parcels = pd.read_excel(xlsx, usecols=['parcel_id', 'prop_locat', 'ZipCode', 'HouseNum'])
parcels['HouseNum'] = parcels['HouseNum'].astype(str)
parcels['Length'] = parcels['HouseNum'].apply(len)
parcels['full_length'] = parcels['prop_locat'].apply(len)
parcels['address'] = parcels.apply(lambda x: x['prop_locat'][x['Length']:x['full_length']], 1)
parcels.HouseNum = parcels.HouseNum.apply(split)
parcels.set_index(['parcel_id', 'ZipCode', 'address',]).Options.apply(pd.Series).stack().reset_index().drop('level_2', 1)
print(parcels)
Sample data set:
        parcel_id       prop_locat  HouseNum  ZipCode  Length  full_length   address
0  xxxxxxxxxxxxxx     1234 W 500 S      1234    xxxxx       4           12   W 500 S
1  xxxxxxxxxxxxxx  123-130 W 700 S   123-130    xxxxx       7           15   W 700 S
The goal is to take the range of house numbers in an address such as 123-130 and append one row per house number to the dataframe, each with the rest of the address attached, i.e. 123 W 700 S, 124 W 700 S, 125 W 700 S, and so on up to 130 W 700 S.
Any pointers in the right direction would be greatly appreciated. Thank you!
Comments (2)
The error occurs because one of your HouseNum values is a blank space instead of what your function is expecting. With some slight changes, invalid cases can be accounted for:
Given:
Doing:
Output:
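One way the invalid cases might be handled is sketched below. This is only an illustration, not the original answer's code: the helper name split_house_num and the decision to return an empty list for blank or malformed values are assumptions.

```python
def split_house_num(value):
    """Expand '123-130' into [123, ..., 130].

    Hypothetical helper: returns [] for blank or malformed input
    instead of raising ValueError.
    """
    text = str(value).strip()
    if not text:
        # Covers '' and whitespace-only strings such as ' '
        return []
    parts = [p.strip() for p in text.split("-")]
    try:
        start, end = int(parts[0]), int(parts[-1])
    except ValueError:
        # Covers non-numeric values such as 'nan' or '123-'
        return []
    return list(range(start, end + 1))


print(split_house_num("1234"))     # [1234]
print(split_house_num("123-125"))  # [123, 124, 125]
print(split_house_num(" "))        # []
```

Applied with parcels['HouseNum'].apply(split_house_num), this yields a column of lists, which DataFrame.explode('HouseNum') can then turn into one row per house number. Rows whose list is empty come out of explode as NaN, so they may need to be dropped or filled afterwards.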
My guess, based on your error, is that you're trying to convert empty strings into integers, which won't work. I would use the or operator to check for empty strings and simply resolve to zero:
int(arg or "0")
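A short sketch of how that expression behaves may help. The function names to_int and to_int_stripped are made up for illustration. Note that the error message in the question shows ' ' (a single space), and a space is a truthy string, so the plain or fallback would not catch it unless the value is stripped first.

```python
def to_int(arg):
    # `arg or "0"` substitutes "0" only when arg is falsy (e.g. "" or None)
    return int(arg or "0")


def to_int_stripped(arg):
    # Strip whitespace first so that " " also falls back to "0"
    return int((arg or "").strip() or "0")


print(to_int(""))            # 0
print(to_int("42"))          # 42
print(to_int_stripped(" "))  # 0

try:
    to_int(" ")
except ValueError as exc:
    # A single space is truthy, so the fallback is skipped and int(" ") raises
    print("ValueError:", exc)
```

Resolving to zero keeps the code from raising, but in the question's split function it would produce a house number of 0 for blank rows, so filtering those rows out afterwards may be preferable.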