Which hierarchical model should I use? Adjacency, Nested, or Enumerated?

Posted 2024-10-15 05:01:54

I have a table that contains every geographical location in the world and the relationships between them.

Here is an example that shows the hierarchy. You will see that the data is actually stored in all three forms:

  • Enumerated Path
  • Adjacency list
  • Nested Set

The data obviously never changes either. Below is an example of direct ancestors of the location Brighton in England which has a woeid of 13911.

Table: geoplanet_places (has 5.6 million rows)
Ancestors
Large Image: http://chrisacky.com/ancestors.jpg

I then have another table called entities. This table stores the items that I would like to map to a geographical location. I store some basic information, but most importantly I store the woeid, which is a foreign key from geoplanet_places.

Eventually the entities table will contain several thousand entities. And I would like a way to be able to return a full tree of all of the nodes which contain entities.

I plan on creating something to facilitate the filtering and searching of entities based on their geographical location and be able to discover how many entities can be found on that particular node.

So if I only have one entity in my entities table, I might have something like this:

Earth (1)
  United Kingdom (1)
    England (1)
      East Sussex (1)
        Brighton and Hove City (1)
          Brighton (1)

Let's then say that I have another entity, located in Devon. It would then show something like:

Earth (2)
  United Kingdom (2)
    England (2)
      Devon (1)
      East Sussex (1)
        ... etc

The counts, which say how many entities are "inside" each geographical location, do not need to be live. I can live with regenerating the tree every hour and caching it.
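As a rough illustration of that hourly job, here is a minimal Python sketch that rolls entity counts up to every ancestor using only the parent links (the adjacency list). The `PARENT` mapping and every id except 13911 are made-up sample data, and `rolled_up_counts` is a hypothetical helper name, not something from the question.

```python
from collections import Counter

# Hypothetical sample data: place id -> parent id, None for the root.
# Only 13911 (Brighton) comes from the question; the other ids are invented.
PARENT = {
    1: None,    # Earth
    2: 1,       # United Kingdom
    3: 2,       # England
    4: 3,       # East Sussex
    5: 4,       # Brighton and Hove City
    13911: 5,   # Brighton
    6: 3,       # Devon
    7: 2,       # Scotland (holds no entities in this sample)
}


def rolled_up_counts(entity_woeids, parent):
    """Count, for each place, the entities located in it or in any descendant."""
    counts = Counter()
    for woeid in entity_woeids:
        node = woeid
        while node is not None:   # walk up to the root, crediting each ancestor
            counts[node] += 1
            node = parent[node]
    return counts


# One entity in Brighton, one in Devon.
counts = rolled_up_counts([13911, 6], PARENT)
print(counts[1], counts[3], counts[4], counts[6])
```

Places that never receive a count (Scotland here) are simply absent from the result, which matches the requirement to hide locations without entities. With a few thousand entities and a hierarchy a handful of levels deep, this is a small amount of work per refresh.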

The aim is to create an interface that starts out showing only the countries that have entities.

For example:

Argentina (1021), Chile (291), ..., United States (32,103), United Kingdom (12,338)

The user then clicks on a location, such as United Kingdom, and is given all of the immediate child nodes that are descendants of United Kingdom AND have an entity in them.

If there are 32 counties in the United Kingdom, but only 23 of them turn out to have entities stored in them when you drill down, then I don't want to display the other 9; only locations that contain entities should appear.
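The drill-down step described above can be sketched in Python as a filter over direct children, assuming the per-place counts have already been computed and cached. The names `PARENT`, `COUNTS` and `visible_children`, and all sample values, are hypothetical.

```python
# Hypothetical cached data: place -> parent, and place -> number of entities inside it.
PARENT = {
    "United Kingdom": None,
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
    "Wales": "United Kingdom",
    "Devon": "England",
}
COUNTS = {"United Kingdom": 3, "England": 2, "Scotland": 1, "Devon": 2}


def visible_children(node, parent, counts):
    """Return (child, count) pairs for the direct children that contain entities."""
    return sorted(
        (child, counts[child])
        for child, child_parent in parent.items()
        if child_parent == node and counts.get(child, 0) > 0
    )


print(visible_children("United Kingdom", PARENT, COUNTS))
```

In this sample Wales has no cached count, so it is left out of the result, while Devon is left out because it is not a direct child of United Kingdom.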

This site aptly demonstrates the functionality that I wish to achieve:
http://www.homeaway.com/vacation-rentals/europe/r5

How do you recommend that I manage such a data structure?

Things I am using.

  • PHP
  • MySQL
  • Solr

I want the drill-downs to be as fast as possible. I want to create an AJAX interface that makes searching seamless.

I would also be interested to know which columns you would recommend indexing on.

Comments (2)

清风不识月 2024-10-22 05:01:54

Typically, three kinds of queries against a hierarchy cause trouble:

  1. Return all ancestors
  2. Return all descendants
  3. Return all children (immediate descendants).

Here's a little table which shows the performance of different methods in MySQL:

                        Ancestors  Descendants  Children        Maintainability InnoDB
Adjacency list          Good       Decent       Excellent       Easy            Yes
Nested sets (classic)   Poor       Excellent    Poor/Excellent  Very hard       Yes
Nested sets (spatial)   Excellent  Very good    Poor/Excellent  Very hard       No
Materialized path       Excellent  Very good    Poor/Excellent  Hard            Yes

In the Children column, Poor/Excellent means that the answer depends on whether you combine the method with an adjacency list, i.e. store the parentID in each record.

For your task, you need all three queries:

  1. All ancestors to show the Earth / UK / Devon thing
  2. All children to show "Destinations in Europe" (the items)
  3. All descendants to show "Destinations in Europe" (the counts)

I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt etc).

Create a varchar column called path, index it, and fill it with values like this:

1:234:6345:45454:

where the numbers are primary keys of the appropriate parents, in correct order (1 for Europe, 234 for UK etc.)

You will also need a table called levels to keep numbers from 1 to 20 (or whatever maximum nesting level you want).

To select all ancestors:

SELECT   pa.*
FROM     places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
JOIN     places pa
ON       pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':') 
WHERE    p.id = @id_of_place_in_devon
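
To make the join logic easier to follow, here is a small Python sketch of what the query computes for a single path. `substring_index` is a hypothetical stand-in that mimics MySQL's `SUBSTRING_INDEX` for positive counts only, and the loop over `range` plays the role of the levels table.

```python
def substring_index(text, delim, count):
    """Mimic MySQL's SUBSTRING_INDEX for a positive count."""
    return delim.join(text.split(delim)[:count])


def ancestor_paths(path, max_level=20):
    """Return the path of every ancestor, root first, ending with the node itself."""
    return [
        substring_index(path, ":", level) + ":"       # CONCAT(SUBSTRING_INDEX(...), ':')
        for level in range(1, max_level + 1)           # the rows of the levels table
        if substring_index(path, ":", level) != path   # the first JOIN condition
    ]


print(ancestor_paths("1:234:6345:45454:"))
```

For the sample path this yields `1:`, `1:234:`, `1:234:6345:` and `1:234:6345:45454:`. The last entry is the node's own path, so the query returns the node together with its ancestors. Each of these strings is then matched by equality against the indexed path column.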

To select all children and counts of places within them:

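SELECT  pc.*, COUNT(pp.id)
FROM    places p
JOIN    places pc                       -- direct children of the chosen node
ON      pc.parentId = p.id
JOIN    places pp                       -- places whose path starts with the child's path
ON      pp.path BETWEEN pc.path AND CONCAT(pc.path, ':')
        AND pp.id NOT IN                -- keep only leaf places (nobody's parent)
        (
        SELECT  parentId
        FROM    places
        WHERE   parentId IS NOT NULL    -- a NULL parentId (the root) would make NOT IN match nothing
        )
WHERE   p.id = @id_of_europe
GROUP BY
        pc.id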
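
The `BETWEEN` condition in the query above works as a prefix test because, in byte-wise ordering, the delimiter `:` sorts after every digit. The Python sketch below reproduces that comparison; `in_subtree` is a hypothetical helper, and the ordering is an assumption that holds for Python strings and binary comparison, so it is worth checking against the collation actually used on the path column.

```python
def in_subtree(candidate_path, root_path):
    """Mirror `candidate BETWEEN root AND CONCAT(root, ':')` using code-point order."""
    return root_path <= candidate_path <= root_path + ":"


# Descendants of 1:234: (and the node itself) fall inside the range.
print(in_subtree("1:234:6345:", "1:234:"))   # descendant
print(in_subtree("1:234:", "1:234:"))        # the node itself
# Siblings and lookalike ids fall outside it.
print(in_subtree("1:235:", "1:234:"))        # sibling
print(in_subtree("1:2345:", "1:234:"))       # id that merely starts with 234
```

Because the condition is a plain range over path, an index on that column can serve it, which is the main attraction of writing the prefix test this way.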
梦回梦里 2024-10-22 05:01:54

This is the query that I came up with. It is an adaptation of what you suggested, Quassnoi.

SELECT   pa.*,  level, SUBSTRING_INDEX(p.ancestry, '/', l.level),  p.*
FROM     geoplanet_places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.ancestry, '/', l.level) <> p.ancestry 
JOIN     geoplanet_places  pa
ON       pa.woeid =  SUBSTRING_INDEX( SUBSTRING_INDEX(p.ancestry, '/', l.level),'/',-1)
WHERE    p.woeid = "13911"

This returns all of the parents of Brighton.

The problem with your query was that it didn't return the path to the parents, but instead any node that shared the same path.

SELECT   pa.*, GROUP_CONCAT(pa.name ORDER BY pa.lft ASC), GROUP_CONCAT(pa.lft), pa.ancestry
FROM     geo_places p
JOIN     levels l
ON       SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid, '/'), '/', l.level) <> p.ancestry
JOIN     geo_places pa
ON       pa.woeid = SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid, '/'), '/', l.level), '/', -1)
WHERE    p.woeid IN ("12767488","12832668","12844837","131390","131391","12846428","24534461")
GROUP BY p.woeid