How do I look up data from one CSV in another CSV?
- In the crq_data file I have cities and states from a user-uploaded *.csv file.
- In the cityCoordinates.csv file I have a library of American cities and states along with their coordinates. I would like this to be a sort of "look-up tool" that compares the uploaded .csv file against the library to find each row's coordinates for mapping in Folium.

Right now it reads line by line, so it appends the coordinates one at a time (n seconds); I would like it to run much faster so that if there are 6000 lines the user doesn't have to wait 6000 seconds.

Here is part of my code:
from tkinter.filedialog import askopenfilename
import pandas as pd

crq_file = askopenfilename(filetypes=[('CSV Files', '*.csv')])
crq_data = pd.read_csv(crq_file, encoding="utf8")
coords = pd.read_csv("cityCoordinates.csv")

for crq in range(len(crq_data)):
    task_city = crq_data.iloc[crq]["TaskCity"]
    task_state = crq_data.iloc[crq]["TaskState"]
    # Scan the entire coordinate library for every task row (quadratic)
    for coordinates in range(len(coords)):
        cityCoord = coords.iloc[coordinates]["City"]
        stateCoord = coords.iloc[coordinates]["State"]
        latCoord = coords.iloc[coordinates]["Latitude"]
        lngCoord = coords.iloc[coordinates]["Longitude"]
        if task_city == cityCoord and task_state == stateCoord:
            # .loc writes to this row only; assigning crq_data["CRQ Latitude"]
            # directly would overwrite the whole column on every match
            crq_data.loc[crq, "CRQ Latitude"] = latCoord
            crq_data.loc[crq, "CRQ Longitude"] = lngCoord
            print(cityCoord, stateCoord, latCoord, lngCoord)
Comments (1)
I see this not as a problem of optimizing Pandas, but of finding a good data structure for fast lookups: and a good data structure for fast lookups is the dict. The dict takes memory, though; you'll need to evaluate that cost for yourself.
I mocked up what your cityCoordinates CSV could look like:
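Something like this (the exact rows, the "|" separator, and the make_key helper below are illustrative choices, not canonical):

import csv
import io

# A mocked-up cityCoordinates.csv, inlined so the example is self-contained
coords_csv = io.StringIO("""\
City,State,Latitude,Longitude
Houston,TX,29.7604,-95.3698
Chicago,IL,41.8781,-87.6298
Denver,CO,39.7392,-104.9903
""")

def make_key(city, state):
    # Normalize case and whitespace so "houston, tx " and "Houston,TX" match
    return f"{city.strip().upper()}|{state.strip().upper()}"

# Build the lookup dict once: normalized "CITY|STATE" -> (lat, lng)
coords_lookup = {}
for row in csv.DictReader(coords_csv):
    coords_lookup[make_key(row["City"], row["State"])] = (
        float(row["Latitude"]),
        float(row["Longitude"]),
    )

print(coords_lookup)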
When I run that, I get:
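(for the illustrative mock above)

{'HOUSTON|TX': (29.7604, -95.3698), 'CHICAGO|IL': (41.8781, -87.6298), 'DENVER|CO': (39.7392, -104.9903)}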
Now, iterating the task data looks pretty much the same: we take a pair of City and State, make a normalized key out of them, then try to look up that key for known coordinates.
I mocked up some task data:
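Continuing the same sketch (again illustrative values, including one deliberately unmatched city to show the miss case):

# Mocked-up task data, including one city that isn't in the library
task_csv = io.StringIO("""\
TaskCity,TaskState
Houston,TX
Fake City,XX
Denver,CO
""")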
and when I run this:
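(a sketch of the lookup loop, reusing make_key and coords_lookup from above)

for row in csv.DictReader(task_csv):
    key = make_key(row["TaskCity"], row["TaskState"])
    if key in coords_lookup:
        lat, lng = coords_lookup[key]  # O(1) dict lookup, no scan of coords
        print(row["TaskCity"], row["TaskState"], lat, lng)
    else:
        print(row["TaskCity"], row["TaskState"], "not found")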
I get:
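(again for the mock task data above)

Houston TX 29.7604 -95.3698
Fake City XX not found
Denver CO 39.7392 -104.9903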
This solution is going to be much faster in principle because you're not doing a cityCoordinates-ROWS x taskData-ROWS quadratic loop. And, in practice, Pandas suffers when doing row iteration^1; I'm not sure if the same holds for indexing (iloc), but in general Pandas is for manipulating columns of data, and I would say it is not for row-oriented problems/solutions.
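As an aside: if you do want to stay in Pandas, the column-oriented way to express this lookup is a single left merge rather than any Python-level loop. This is a sketch assuming the column names from the question; it matches on exact strings, so you'd still want to normalize case and whitespace first:

# One vectorized merge replaces both loops; unmatched rows get NaN coordinates
crq_data = crq_data.merge(
    coords,
    how="left",
    left_on=["TaskCity", "TaskState"],
    right_on=["City", "State"],
).rename(columns={"Latitude": "CRQ Latitude", "Longitude": "CRQ Longitude"})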