选择具有大多边形的良好 SQL Server 2008 空间索引

发布于 2024-09-03 12:14:22 字数 5242 浏览 7 评论 0原文

我正在尝试为我正在处理的数据集选择一个合适的 SQL Server 2008 空间索引设置,这很有趣。

数据集是多边形,代表整个地球的轮廓。表中有 106,000 行,多边形存储在几何字段中。

我遇到的问题是许多多边形覆盖了地球的很大一部分。这似乎使得获得一个空间索引来消除主过滤器中的许多行变得非常困难。例如,查看以下查询:

SELECT "ID","CODE","geom".STAsBinary() as "geom" FROM "dbo"."ContA"
WHERE "geom".Filter(
  geometry::STGeomFromText('POLYGON ((-142.03193662573682 59.53396984952896,
    -142.03193662573682 59.88928136451884,
    -141.32743833481925 59.88928136451884,
    -141.32743833481925 59.53396984952896,
    -142.03193662573682 59.53396984952896))', 4326)
) = 1

这是查询仅与表中两个多边形相交的区域。无论我选择何种空间索引设置组合,Filter() 始终返回大约 60,000 行。

用 STIntersects() 替换 Filter() 当然只返回我想要的两个多边形,但当然需要更长的时间(Filter() 是 6 秒,STIntersects() 是 12 秒)。

任何人都可以给我一些提示,说明是否有一个空间索引设置可能会改进 60,000 行,或者我的数据集是否与 SQL Server 的空间索引不太匹配?

更多信息

按照建议,我使用全球范围内的 4x4 网格分割多边形。我找不到使用 QGIS 执行此操作的方法,因此我编写了自己的查询来执行此操作。首先,我定义了 16 个边界框,第一个看起来像这样:

declare  @box1 geometry = geometry::STGeomFromText('POLYGON ((
-180 90,
-90 90,
-90 45,
-180 45,
-180 90))', 4326)

然后我使用每个边界框来选择并截断与该框相交的多边形:

insert ContASplit
select CODE, geom.STIntersection(@box1), CODE_DESC from ContA
where geom.STIntersects(@box1) = 1

显然,我对 4x4 网格中的所有 16 个边界框都执行了此操作。最终结果是我有一个包含约 107,000 行的新表(这证实我实际上没有很多巨大的多边形)。

我添加了一个空间索引,每个对象有 1024 个单元格,每个级别的单元格为低、低、低、低。

然而,非常奇怪的是,这个带有分割多边形的新表的性能仍然与旧表相同。执行上面列出的 .Filter仍然 返回约 60,000 行。我真的完全不明白这一点,显然我不明白空间索引实际上是如何工作的。

矛盾的是,虽然 .Filter() 仍然返回约 60,000 行,但它提高了性能。 .Filter() 现在需要大约 2 秒而不是 6 秒,.STIntersects() 现在需要 6 秒而不是 12 秒。

按照要求,这里是索引的 SQL 示例:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
GRIDS =(LEVEL_1 = LOW,LEVEL_2 = LOW,LEVEL_3 = LOW,LEVEL_4 = LOW), 
CELLS_PER_OBJECT = 1024,
PAD_INDEX  = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS  = ON,
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

尽管请记住,我已经尝试了一系列每个对象的网格和单元格的设置不同,每次都有相同的结果。

以下是运行 sp_help_spatial_geometry_index 的结果,这是在我的分割数据集上,其中没有单个多边形占据地球的 1/16 以上:

Base_Table_Rows 215138 Bounding_Box_xmin -90 Bounding_Box_ymin -180 Bounding_Box_xmax 90 Bounding_Box_ymax 180 网格_大小_级别_1 64 网格_大小_级别_2 64 网格_大小_级别_3 64 网格_大小_级别_4 64 Cells_Per_Object 16 总计主索引行 378650 主索引总计页数 1129 每个基本行的平均索引行数 1 Total_Number_Of_ObjectCells_In_Level0_For_QuerySample 1 Total_Number_Of_ObjectCells_In_Level0_In_Index 60956 第 361 章 对象单元总数 In_Level2_In_Index 2935 Total_Number_Of_ObjectCells_In_Level3_In_Index 32420 Total_Number_Of_ObjectCells_In_Level4_In_Index 281978 Total_Number_Of_Interior_ObjectCells_In_Level2_In_Index 1 Total_Number_Of_Interior_ObjectCells_In_Level3_In_Index 49 内部对象单元总数 In_Level4_In_Index 4236 Total_Number_Of_Intersecting_ObjectCells_In_Level1_In_Index 29 相交对象单元总数 In_Level2_In_Index 1294 Total_Number_Of_Intersecting_ObjectCells_In_Level3_In_Index 29680 Total_Number_Of_Intersecting_ObjectCells_In_Level4_In_Index 251517 Total_Number_Of_Border_ObjectCells_In_Level0_For_QuerySample 1 Total_Number_Of_Border_ObjectCells_In_Level0_In_Index 60956 边框对象单元总数 In_Level1_In_Index 332 边界对象单元总数 In_Level2_In_Index 1640 边框对象单元总数 In_Level3_In_Index 2691 边界对象单元总数 In_Level4_In_Index 26225 Interior_To_To_To_Cells_Normalized_To_Leaf_Grid_Percentage 0.004852925 相交_To_Total_Cells_Normalized_To_Leaf_Grid_Percentage 0.288147586 边框到总单元格归一化到叶网格百分比 99.70699949 Average_Cells_Per_Object_Normalized_To_Leaf_Grid 405.7282349 Average_Objects_PerLeaf_GridCell 0.002464704 Number_Of_SRIDs_Found 1 单元格宽度 1 2.8125 单元格宽度2 0.043945313 单元格宽度3 0.000686646 单元格宽度 4 1.07E-05 1 级单元格高度 5.625 单元格高度2 0.087890625 单元格高度3 0.001373291 单元格高度4 2.15E-05 Level1 单元格面积 1012.5 Level2 单元格面积 15.8203125 Level3 单元格面积 0.247192383 Level4 单元格面积 0.003862381 CellArea_To_BoundingBoxArea_Percentage_In_Level1 1.5625 CellArea_To_BoundingBoxArea_Percentage_In_Level2 0.024414063 CellArea_To_BoundingBoxArea_Percentage_In_Level3 0.00038147 CellArea_To_BoundingBoxArea_Percentage_In_Level4 5.96E-06 按主过滤器选择的行数 60956 Number_Of_Rows_Selected_By_Internal_Filter 0 Number_Of_Time_Secondary_Filter_Is_Called 60956 输出行数 2 Percentage_Of_Rows_NotSelected_By_Primary_Filter 71.66655821 Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter 0 内部过滤器效率 0 Primary_Filter_Efficiency 0.003281055

"Base_Table_Rows 215138" 对我来说没有多大意义,表中有 107,000 行,而不是 215,000

渲染时,数据集如下所示: 替代文本
(来源:norman.cx

进一步研究:

根据这些数据,我仍然对初级过滤器的糟糕性能感到困惑。所以我做了一个测试来看看我的数据是如何分裂的。通过我最初的未分割功能,我在表格中添加了一个“单元格”列。然后,我运行 16 个查询来计算该特征跨越 4x4 网格中的单元格数量。因此,我对每个单元格运行了这样的查询:

declare  @box1 geometry = geometry::STGeomFromText('POLYGON ((
-180 90,
-90 90,
-90 45,
-180 45,
-180 90))', 4326)
update ContA set cells = cells + 1 where
geom.STIntersects(@box1) = 1

如果我查看表中的“单元格”列,则整个数据集中只有 672 个特征与 4x4 网格内的 1 个以上单元格相交。那么,从字面上看,主过滤器到底是如何为查看 200 英里宽的小矩形的查询返回 60,000 个特征的呢?

此时看来,我可以编写自己的索引方案,该方案比 SQL Server 对这些功能的执行方式更好。

I'm having some fun trying to pick a decent SQL Server 2008 spatial index setup for a data set I am dealing with.

The dataset is polygons, representing contours over the whole globe. There are 106,000 rows in the table, the polygons are stored in a geometry field.

The issue I have is that many of the polygons cover a large portion of the globe. This seems to make it very hard to get a spatial index that will eliminate many rows in the primary filter. For example, look at the following query:

SELECT "ID","CODE","geom".STAsBinary() as "geom" FROM "dbo"."ContA"
WHERE "geom".Filter(
  geometry::STGeomFromText('POLYGON ((-142.03193662573682 59.53396984952896,
    -142.03193662573682 59.88928136451884,
    -141.32743833481925 59.88928136451884,
    -141.32743833481925 59.53396984952896,
    -142.03193662573682 59.53396984952896))', 4326)
) = 1

This is querying an area which intersects with only two of the polygons in the table. No matter what combination of spatial index settings I chose, that Filter() always returns around 60,000 rows.

Replacing Filter() with STIntersects() of course returns just the two polygons I want, but of course takes much longer (Filter() is 6 seconds, STIntersects() is 12 seconds).

Can anyone give me any hints on whether there is a spatial index setup that is likely to improve on 60,000 rows or is my dataset just not a good match for SQL Server's spatial indexing ?

More info:

As suggested, I split the polygons up, using a 4x4 grid across the globe. I couldn't see a way to do it with QGIS, so I wrote my own query to do it. First I defined 16 bounding boxes, the first looked like this:

declare  @box1 geometry = geometry::STGeomFromText('POLYGON ((
-180 90,
-90 90,
-90 45,
-180 45,
-180 90))', 4326)

Then I used each bounding box to select and truncate the polygons that intersected that box:

insert ContASplit
select CODE, geom.STIntersection(@box1), CODE_DESC from ContA
where geom.STIntersects(@box1) = 1

I obviously did this for all 16 bounding boxes in the 4x4 grid. The end result is that I have a new table with ~107,000 rows (which confirms that I didn't actually have many huge polygons).

I added a spatial index with 1024 cells per object and low,low,low,low for the cells per level.

However, very oddly this new table with the split polygons still performs the same as the old one. Doing the .Filter listed above still returns ~60,000 rows. I really don't understand this at all, clearly I don't understand how the spatial index actually work.

Paradoxically, while .Filter() still returns ~60,000 rows, it has improved performance. The .Filter() now takes around 2 seconds rather than 6 and the .STIntersects() now takes 6 seconds rather than 12.

As requested here is an example of the SQL for the index:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
GRIDS =(LEVEL_1 = LOW,LEVEL_2 = LOW,LEVEL_3 = LOW,LEVEL_4 = LOW), 
CELLS_PER_OBJECT = 1024,
PAD_INDEX  = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS  = ON,
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Though remember, I have tried a whole range of different settings for the grids and cells per object, with the same results each time.

Here are the results of running sp_help_spatial_geometry_index, this is on my split dataset where no single polygon occupies more than 1/16th of the globe:

Base_Table_Rows 215138
Bounding_Box_xmin -90
Bounding_Box_ymin -180
Bounding_Box_xmax 90
Bounding_Box_ymax 180
Grid_Size_Level_1 64
Grid_Size_Level_2 64
Grid_Size_Level_3 64
Grid_Size_Level_4 64
Cells_Per_Object 16
Total_Primary_Index_Rows 378650
Total_Primary_Index_Pages 1129
Average_Number_Of_Index_Rows_Per_Base_Row 1
Total_Number_Of_ObjectCells_In_Level0_For_QuerySample 1
Total_Number_Of_ObjectCells_In_Level0_In_Index 60956
Total_Number_Of_ObjectCells_In_Level1_In_Index 361
Total_Number_Of_ObjectCells_In_Level2_In_Index 2935
Total_Number_Of_ObjectCells_In_Level3_In_Index 32420
Total_Number_Of_ObjectCells_In_Level4_In_Index 281978
Total_Number_Of_Interior_ObjectCells_In_Level2_In_Index 1
Total_Number_Of_Interior_ObjectCells_In_Level3_In_Index 49
Total_Number_Of_Interior_ObjectCells_In_Level4_In_Index 4236
Total_Number_Of_Intersecting_ObjectCells_In_Level1_In_Index 29
Total_Number_Of_Intersecting_ObjectCells_In_Level2_In_Index 1294
Total_Number_Of_Intersecting_ObjectCells_In_Level3_In_Index 29680
Total_Number_Of_Intersecting_ObjectCells_In_Level4_In_Index 251517
Total_Number_Of_Border_ObjectCells_In_Level0_For_QuerySample 1
Total_Number_Of_Border_ObjectCells_In_Level0_In_Index 60956
Total_Number_Of_Border_ObjectCells_In_Level1_In_Index 332
Total_Number_Of_Border_ObjectCells_In_Level2_In_Index 1640
Total_Number_Of_Border_ObjectCells_In_Level3_In_Index 2691
Total_Number_Of_Border_ObjectCells_In_Level4_In_Index 26225
Interior_To_Total_Cells_Normalized_To_Leaf_Grid_Percentage 0.004852925
Intersecting_To_Total_Cells_Normalized_To_Leaf_Grid_Percentage 0.288147586
Border_To_Total_Cells_Normalized_To_Leaf_Grid_Percentage 99.70699949
Average_Cells_Per_Object_Normalized_To_Leaf_Grid 405.7282349
Average_Objects_PerLeaf_GridCell 0.002464704
Number_Of_SRIDs_Found 1
Width_Of_Cell_In_Level1 2.8125
Width_Of_Cell_In_Level2 0.043945313
Width_Of_Cell_In_Level3 0.000686646
Width_Of_Cell_In_Level4 1.07E-05
Height_Of_Cell_In_Level1 5.625
Height_Of_Cell_In_Level2 0.087890625
Height_Of_Cell_In_Level3 0.001373291
Height_Of_Cell_In_Level4 2.15E-05
Area_Of_Cell_In_Level1 1012.5
Area_Of_Cell_In_Level2 15.8203125
Area_Of_Cell_In_Level3 0.247192383
Area_Of_Cell_In_Level4 0.003862381
CellArea_To_BoundingBoxArea_Percentage_In_Level1 1.5625
CellArea_To_BoundingBoxArea_Percentage_In_Level2 0.024414063
CellArea_To_BoundingBoxArea_Percentage_In_Level3 0.00038147
CellArea_To_BoundingBoxArea_Percentage_In_Level4 5.96E-06
Number_Of_Rows_Selected_By_Primary_Filter 60956
Number_Of_Rows_Selected_By_Internal_Filter 0
Number_Of_Times_Secondary_Filter_Is_Called 60956
Number_Of_Rows_Output 2
Percentage_Of_Rows_NotSelected_By_Primary_Filter 71.66655821
Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter 0
Internal_Filter_Efficiency 0
Primary_Filter_Efficiency 0.003281055

"Base_Table_Rows 215138" doesn't make much sense to me, there are 107,000 rows in the table, not 215,000

When rendered the data set looks like this:
alt text
(source: norman.cx)

Further research:

I continue to be puzzled by the poor performance of the primary filter with this data. So I did a test to see exactly how my data splits up. With my original unsplit features I added a "cells" column to the table. I then ran 16 queries to count how many cells in a 4x4 grid the feature spanned. So I ran a query like this for each cell:

declare  @box1 geometry = geometry::STGeomFromText('POLYGON ((
-180 90,
-90 90,
-90 45,
-180 45,
-180 90))', 4326)
update ContA set cells = cells + 1 where
geom.STIntersects(@box1) = 1

If I then look at the "cells" column in the table there are only 672 features in the whole of my data set that intersect with more than 1 cell within the 4x4 grid. So how on Earth, quite literally, can the primary filter be returning 60,000 features for a query looking at a small 200 mile wide rectangle ?

At this point it looks like I could write my own indexing scheme that would work better that how SQL Server's is performing for these features.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

故人如初 2024-09-10 12:14:22

在索引查询中,您使用:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
...

BOUNDING_BOX 因此映射到:

xmin = -90
ymin = -180
xmax = 90
ymax = 180
  • 经度(-180 到
    180 - 指定东/西
    子午线)应映射到 X
  • 纬度(-90 到 90 - 指定如何
    赤道以北或以南)
    应该映射到 Y

因此,要为您应该使用的世界创建 BOUNDING_BOX:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-180, -90, 180, 90),
...

这应该创建一个适合您的数据的索引,并且意味着该索引涵盖了您的所有功能。

In your index query you use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
...

The BOUNDING_BOX therefore maps to:

xmin = -90
ymin = -180
xmax = 90
ymax = 180
  • Longtitude (-180 to
    180 - designating East / West of the
    Meridian) should map to X
  • Latitude (-90 to 90 - designating how
    far North or South of the Equator)
    should map to Y

So to create the BOUNDING_BOX for the world you should use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-180, -90, 180, 90),
...

This should create an index that fits your data and means all your features are covered by the index.

筱武穆 2024-09-10 12:14:22

分割数据

如果查询用于显示数据,那么您可以使用网格分割大多边形。然后通过索引可以非常快速地检索这些内容。您可以删除轮廓,这样功能看起来仍然是连续的。

大多数商业 GIS 软件包都具有将一个多边形数据集分割为另一个多边形数据集的工具。搜索进行交叉的工具。

如果您使用 OpenSource,请查看 QGIS 和 http://www.ftools.ca,其中 “执行地理处理操作,包括交集、求差、并集、溶解和裁剪。”我自己没有使用过后者。

看一下: http://postgis.refractions.net/docs/ch04.html #id2790790 为什么大特征不好。

过滤和相交

这里有更多关于 Filter 子句的信息 - 链接

空间索引

另外需要检查的是空间索引是否确实在查询计划中使用。您可能必须强制查询使用带有WITH子句的索引:

链接

有关以下索引的更多详细信息:

链接

另请尝试对您的数据运行 sp_help_spatial_geometry_index 以查看空间索引使用哪些设置

http://msdn.microsoft.com/en-us/library/cc627426.aspx

运行此 SP测试几何会生成各种统计数据,以尝试根据数据定制索引。完整的属性列表位于 http://msdn.microsoft.com/en -us/library/cc627425.aspx

这些包括诸如以下的值:

  • CellArea_To_BoundingBoxArea_Percentage_In_Level1
  • Number_Of_Rows_Selected_By_Primary_Filter

混乱的几何图形

从 sp_help_spatial_geometry_index 的结果看来,您可能有几何图形本身的问题,而不是空间索引的问题。

Base_Table_Rows 计数看起来是一个错误 - http://connect.microsoft.com/SQLServer/feedback/details/475838/number-of-rows-in-base-table-in Correct-in -sp-help-spatial-geography-index-xml
可能值得重新创建表/数据库并从头开始尝试索引。

Total_Number_Of_ObjectCells_In_Level0_In_Index 60956 是在级别 0 处返回的大量要素。它们很可能位于空间索引范围之外或为空。然后,它对所有这些功能运行 Intersect (Number_Of_Times_Secondary_Filter_Is_Called 60956),这可以解释为什么它很慢。尽管文档声称空特征不会影响性能 - 我相信它仍然必须查找记录,即使没有执行相交。

对 NULL 和空实例进行计数
处于0级但不会影响
表现。 0级将有同样多的
单元格为 NULL 且为空实例
基表。

我相信 Primary_Filter_Efficiency 为 0.003281055 表明效率为 0.03%!

需要尝试的一些事情:

  1. SELECT * FROM sys.spatial_indexes 有什么奇怪的地方吗?

  2. MakeValid 语句:

    更新 MyTable SET GeomFieldName = GeomFieldName.MakeValid()

  3. 重置/仔细检查 SRID:

    更新 MyTable SET GeomFieldName.STSrid = 4326

  4. 添加一些字段以显示要素的范围。这可能会突出显示问题/NULL 几何图形。

    更改表我的表
    添加 MinX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STX,0)) 持久
    更改表我的表
    添加 MinY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STY,0)) PERSISTED
    更改表我的表
    添加 MaxX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STX,0)) 持久
    更改表我的表
    ADD MaxY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STY,0)) PERSISTED

Splitting Data

If the query is for displaying data then you could split up your large polygons using a grid. These would be then very quick to retrieve with an index. You could remove the outlines so the features would still look contiguous.

Most commercial GIS packages will have tools to split one polygon dataset by another. Search for tools that do intersections.

If you are using OpenSource then have a look at QGIS and http://www.ftools.ca which "perform geoprocessing operations including intersections, differencing, unions, dissolves, and clipping." I've not used the latter myself.

Have a look at: http://postgis.refractions.net/docs/ch04.html#id2790790 for why large features are bad.

Filter and Intersects

There is more on the Filter clause here - Link

Spatial Indexes

Something else to check is that the spatial index is actually being used in the query plan. You may have to force the query to use the index with the WITH clause:

Link

More details on indexes below:

Link

Also try running sp_help_spatial_geometry_index for your data to see what settings to use for your spatial index

http://msdn.microsoft.com/en-us/library/cc627426.aspx

Running this SP with some test geometry produces all sorts of statistics to try and tailor your index to your data. A full list of properties is at http://msdn.microsoft.com/en-us/library/cc627425.aspx

These include values such as:

  • CellArea_To_BoundingBoxArea_Percentage_In_Level1
  • Number_Of_Rows_Selected_By_Primary_Filter

Messed Up Geometry

From the results of sp_help_spatial_geometry_index it looks like you may have issues with the geometry itself rather than the spatial index.

The Base_Table_Rows count looks to be a bug - http://connect.microsoft.com/SQLServer/feedback/details/475838/number-of-rows-in-base-table-incorrect-in-sp-help-spatial-geography-index-xml
It may be worth recreating table / database and trying the index from scratch.

Total_Number_Of_ObjectCells_In_Level0_In_Index 60956 is a lot of features to return at level 0. It is likely they are either outside the spatial index extent or nulls. It then runs the Intersect (Number_Of_Times_Secondary_Filter_Is_Called 60956) on all these features which would explain why it is slow. Even though the docs claim no performance hit for null features - I believe it still has to look up the records, even if no intersect is performed.

NULL and empty instances are counted
at level 0 but will not impact
performance. Level 0 will have as many
cells as NULL and empty instances at
the base table.

The Primary_Filter_Efficiency of 0.003281055 I believe indicates 0.03% efficiency!

A few things to try:

  1. Anything strange from SELECT * FROM sys.spatial_indexes?

  2. The MakeValid statement:

    UPDATE MyTable SET GeomFieldName = GeomFieldName.MakeValid()

  3. Reset / double check SRID:

    UPDATE MyTable SET GeomFieldName.STSrid = 4326

  4. Add in some fields to show the extents of your features. This may highlight issues / NULL geometries.

    ALTER TABLE MyTable
    ADD MinX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STX,0)) PERSISTED
    ALTER TABLE MyTable
    ADD MinY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STY,0)) PERSISTED
    ALTER TABLE MyTable
    ADD MaxX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STX,0)) PERSISTED
    ALTER TABLE MyTable
    ADD MaxY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STY,0)) PERSISTED

别再吹冷风 2024-09-10 12:14:22

我也发现很难“猜测”特定几何表的适当空间索引是什么。我尝试使用 sp_help_spatial_geometry_index 存储过程进行更有根据的猜测。所有这些所做的只是告诉我在每次“GUESS”之后我的空间索引表现有多差。即使我仅考虑 2-8 个 CELLS_PER_OBJECT 来限制我的选项,仅此一项就可以提供 567 种排列(选择 3 种类型 4 次 = 81。然后乘以 7 个 CELLS_PER_OBJECT 选项)。我决定让 SQL Server 为我做实验并给我一些经验证据。我创建了一个存储过程,它将遍历排列并为每个排列在空间表上重建空间索引。然后,它将使用两个提供的几何实例来测试空间索引的每个排列的查询性能。我选择了一个包含整个数据集的几何实例,然后选择了另一个包含数据集较小部分的实例。该过程在每个实例上使用 STIntersect() 4 次,然后将结果记录在表中。然后,您可以查询结果表以找出哪个空间索引在您的特定数据集上表现最佳。尝试一下,如果您有任何改进建议或意见,请告诉我。

使用此 https://gist.github.com/anonymous/5322650 创建过程。然后使用以下示例设置执行语句:

/* set up some strings to be used to create geometry instances when our test spatial queries run */ 
DECLARE @ada VARCHAR(MAX) 
SET @ada = 'GEOMETRY::STGeomFromText(''POLYGON ((2422068 527322, 2422068 781170, 2565405 781170, 2565405 527322, 2422068 527322))'', 0)'
DECLARE @mer VARCHAR(MAX) 
SET @mer = 'GEOMETRY::STGeomFromText(''POLYGON ((2451235 696087, 2451235 721632, 2473697 721632, 2473697 696087, 2451235 696087))'', 0)'
DECLARE @mer1 VARCHAR(MAX) 
SET @mer1 = 'GEOMETRY::STGeomFromText(''POLYGON ((244386 712283, 2443866 717980, 2454872 717980, 2454872 712283, 244386 712283))'', 0)'
DECLARE @mer2 VARCHAR(MAX) 
SET @mer2 = 'GEOMETRY::STGeomFromText(''POLYGON ((2434259 687278, 2434259 701994, 2449657 701994, 2449657 687278, 2434259 687278))'', 0)'


EXEC gis.sp_tune_spatial_index 'PARCEL_ADA', 'S104_idx', 2, 8, @ada, @mer1 
GO

注意:显然,重建空间索引 567 次将需要很长时间。将其从命令行中删除,或者在您执行其他操作时让它运行。如果它是您要经常使用的数据集并且几何形状保持相似,那么运行该过程所花费的时间将是值得的。结果表显示性能(以毫秒为单位)。

I too have found it very difficult to "GUESS" what an appropriate spatial index will be for a particular table of geometries. I tried making more educated guesses using the sp_help_spatial_geometry_index stored procedure. All this did was tell me how poorly my spatial index was performing after each "GUESS". Even if I limited my options by only considering 2-8 CELLS_PER_OBJECT, that alone gives 567 permutations (3 types chosen 4 times = 81. Then multiply by 7 CELLS_PER_OBJECT options). I decided I was going to let SQL server do the experimenting for me and give me some empirical evidence. I created a stored procedure that would spin through the permutations and rebuild the spatial index on a spatial table for each one. Then it would test query performance of each permutation of the spatial index using two supplied geometry instances. I selected one geometry instance that included the entire data set and then another instance that included a smaller portion of the data set. The proc uses STIntersect() 4 times on each instance and then records the results in a table. You can then query the results table to find out which spatial index performed best on your particular data set. Give it a try and let me know if you have any suggested improvements or observations.

Create the proc using this https://gist.github.com/anonymous/5322650. Then set up an execution statement using this example:

/* set up some strings to be used to create geometry instances when our test spatial queries run */ 
DECLARE @ada VARCHAR(MAX) 
SET @ada = 'GEOMETRY::STGeomFromText(''POLYGON ((2422068 527322, 2422068 781170, 2565405 781170, 2565405 527322, 2422068 527322))'', 0)'
DECLARE @mer VARCHAR(MAX) 
SET @mer = 'GEOMETRY::STGeomFromText(''POLYGON ((2451235 696087, 2451235 721632, 2473697 721632, 2473697 696087, 2451235 696087))'', 0)'
DECLARE @mer1 VARCHAR(MAX) 
SET @mer1 = 'GEOMETRY::STGeomFromText(''POLYGON ((244386 712283, 2443866 717980, 2454872 717980, 2454872 712283, 244386 712283))'', 0)'
DECLARE @mer2 VARCHAR(MAX) 
SET @mer2 = 'GEOMETRY::STGeomFromText(''POLYGON ((2434259 687278, 2434259 701994, 2449657 701994, 2449657 687278, 2434259 687278))'', 0)'


EXEC gis.sp_tune_spatial_index 'PARCEL_ADA', 'S104_idx', 2, 8, @ada, @mer1 
GO

NOTE: Obviously, rebuilding a spatial index 567 times will take a long time. Kick it off command line or just let it run while you do other things. If it is a dataset you are going to use often and the geometries remain similar, it will be worth the time it takes to run the proc. Results table shows performance in milliseconds.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文