如何在特定纬度范围内的地理框架中选择线链
我正在使用以下数据集进行研究工作。
https://xfer.services.ncdot.gov/gisdot/DistDOTData/NCRoutes_SHP. zip
我正在尝试从上面数据集提取衬里。线条必须在纬度-80.86和-80.83之间。
首先,我使用地理器加载了数据集。
import geopandas as gpd
graph = gpd.read_file("NCRoutes.shp")
graph = graph.to_crs(4236)
print(len(graph['geometry'])) # number of rows in geodataframe
graph.head()
输出 -
373370
Division MaintCntyC RouteID RouteClass RouteNumbe RouteQuali RouteInven RouteName BeginMP EndMP RouteMaint Shape_Leng geometry
0 13 011 10000026011 1 26 0 0 I-26 0.000063 29.164804 System 56252.469013 MULTILINESTRING Z ((-82.53991 35.79167 0.00000...
1 14 045 10000026045 1 26 0 0 I-26 0.000000 17.459971 System 89044.485587 MULTILINESTRING Z ((-82.53591 35.44044 664.953...
2 13 057 10000026057 1 26 0 0 I-26 0.000000 12.606171 System 66456.002756 MULTILINESTRING Z ((-82.56037 35.95480 1145.01...
3 14 075 10000026075 1 26 0 0 I-26 0.000000 13.121029 System 69207.803704 LINESTRING Z (-82.34883 35.25453 565.22225, -8...
4 7 001 10000040001 1 40 0 0 I-40 0.000000 16.013000 System 84492.598738 LINESTRING Z (-79.53887 36.06290 179.46660, -7...
如上所述,Graph
geodataframe具有373370行。 几何
列的每一行中都有一个多线或统一性。因此,我编写了下面的代码以提取所需的线条,并将其保留在GeodataFrame 区域
中。
from IPython.display import clear_output
area = gpd.GeoDataFrame(columns=['geometry'], geometry='geometry') # geodataframe for saving roads of interest
itr = 0 # to keep track of which row I'm going through
for String in graph['geometry']: # iterating through each row in geometry column of geodataframe
print("iterating through",itr,"th row in geodataframe")
itr += 1
if String.geom_type=='MultiLineString': # if the current row contains a multilinestring
for ls in String: # iterating through linestrings in the multilinestring
for i in range(len(ls.xy[0])): # iterating through points in the linestring
#print("longitude:",ls.xy[0][i],"latitude:",ls.xy[1][i])
if(-80.86<=ls.xy[0][i]<=-80.83):
temp = gpd.GeoDataFrame({'geometry':[ls]},geometry='geometry',crs=4236) #creating a temporary geodataframe for storing the road of interest
area = area.append(temp)
break # no need to keep looking if linestring is already added to area geodataframe
else: # if the current row contains a linestring
for i in range(len(String.xy[0])):
if(-80.86<=String.xy[0][i]<=-80.83):
temp = gpd.GeoDataFrame({'geometry':[String]},geometry='geometry',crs=4236)
area = area.append(temp)
break
clear_output(wait=True)
但是,我的笔记本电脑中经过150行的迭代大约需要12分钟。数据框架中有373370行。因此,通过整个Graph
GeodataFrame在我的计算机中迭代大约需要500个小时。有更快的方法吗?
I'm using the following dataset for my research work.
https://xfer.services.ncdot.gov/gisdot/DistDOTData/NCRoutes_SHP.zip
I'm trying to extract linestrings from the above dataset. The linestrings have to be between the latitudes -80.86 and -80.83.
First I loaded the dataset using geopandas.
import geopandas as gpd
graph = gpd.read_file("NCRoutes.shp")
graph = graph.to_crs(4236)
print(len(graph['geometry'])) # number of rows in geodataframe
graph.head()
The output-
373370
Division MaintCntyC RouteID RouteClass RouteNumbe RouteQuali RouteInven RouteName BeginMP EndMP RouteMaint Shape_Leng geometry
0 13 011 10000026011 1 26 0 0 I-26 0.000063 29.164804 System 56252.469013 MULTILINESTRING Z ((-82.53991 35.79167 0.00000...
1 14 045 10000026045 1 26 0 0 I-26 0.000000 17.459971 System 89044.485587 MULTILINESTRING Z ((-82.53591 35.44044 664.953...
2 13 057 10000026057 1 26 0 0 I-26 0.000000 12.606171 System 66456.002756 MULTILINESTRING Z ((-82.56037 35.95480 1145.01...
3 14 075 10000026075 1 26 0 0 I-26 0.000000 13.121029 System 69207.803704 LINESTRING Z (-82.34883 35.25453 565.22225, -8...
4 7 001 10000040001 1 40 0 0 I-40 0.000000 16.013000 System 84492.598738 LINESTRING Z (-79.53887 36.06290 179.46660, -7...
As you can see above, The graph
geodataframe has 373370 rows. There is either a multilinestring or a linestring in each row of the geometry
column. So I wrote the code below to extract the linestrings I need and keep them in a geodataframe area
.
from IPython.display import clear_output
area = gpd.GeoDataFrame(columns=['geometry'], geometry='geometry') # geodataframe for saving roads of interest
itr = 0 # to keep track of which row I'm going through
for String in graph['geometry']: # iterating through each row in geometry column of geodataframe
print("iterating through",itr,"th row in geodataframe")
itr += 1
if String.geom_type=='MultiLineString': # if the current row contains a multilinestring
for ls in String: # iterating through linestrings in the multilinestring
for i in range(len(ls.xy[0])): # iterating through points in the linestring
#print("longitude:",ls.xy[0][i],"latitude:",ls.xy[1][i])
if(-80.86<=ls.xy[0][i]<=-80.83):
temp = gpd.GeoDataFrame({'geometry':[ls]},geometry='geometry',crs=4236) #creating a temporary geodataframe for storing the road of interest
area = area.append(temp)
break # no need to keep looking if linestring is already added to area geodataframe
else: # if the current row contains a linestring
for i in range(len(String.xy[0])):
if(-80.86<=String.xy[0][i]<=-80.83):
temp = gpd.GeoDataFrame({'geometry':[String]},geometry='geometry',crs=4236)
area = area.append(temp)
break
clear_output(wait=True)
But it took around 12 minutes to iterate through 150 rows in my laptop; whereas there are 373370 rows in the dataframe. So it'll take around 500 hours to iterate through the whole graph
geodataframe in my machine. Is there a faster way to do this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您要使用
cx
,坐标索引器。请参阅文档中的更多信息 htttps://geopandas.org/geopandas.org/en/stable/docs /user_guide/indexing.html
关于多线链vs linestring,您可以在索引之前使用
爆炸>爆炸
以确保使用各个零件,而不是整个几何形状。You want to use
cx
, the coordinate indexer.See more in the documentation https://geopandas.org/en/stable/docs/user_guide/indexing.html
Regarding MultiLinestring vs LineString, you can use
explode
before the indexing to ensure individual parts are used, not whole geometries.