Which hierarchical model should I use? Adjacency, nested, or enumerated?
I have a table that contains every geographical location in the world and the relationships between them.
Here is an example that shows the hierarchy. You will see that the data is actually stored in all three forms:
- Enumerated Path
- Adjacency list
- Nested Set
The data also never changes. Below is an example of the direct ancestors of Brighton, England, which has a woeid of 13911.
Table: `geoplanet_places` (5.6 million rows)
Large image: http://chrisacky.com/ancestors.jpg
I then have another table called `entities`. This table stores the items that I would like to map to a geographical location. I store some basic information, but most importantly I store the `woeid`, which is a foreign key to `geoplanet_places`.
Eventually the `entities` table will contain several thousand entities, and I would like a way to return a full tree of all the nodes that contain entities.
I plan on creating something to facilitate filtering and searching entities by geographical location, and to discover how many entities can be found at a particular node.
So if I only have one entity in my `entities` table, I might have something like this:
`Earth (1)
United Kingdom (1)
England (1)
East Sussex (1)
Brighton and Hove City (1)
Brighton (1)`
Let's then say that I have another entity which is located in Devon; then it would show something like:
Earth (2)
United Kingdom (2)
England (2)
Devon (1)
East Sussex (1)
... etc
The counts, which say how many entities are "inside" each geographical location, do not need to be live. I can live with generating my object every hour and caching it.
The aim is to be able to create an interface which might start out showing only the countries which have entities, like:
Argentina (1021), Chile (291), ..., United States (32,103), United Kingdom (12,338)
Then the user will click on a location, such as United Kingdom, and will then be given all of the immediate child nodes which are descendants of United Kingdom AND have an entity in them.
If there are 32 counties in the United Kingdom, but only 23 of them turn out to have entities stored in them when you drill down, then I don't want to display the other 9. It is only locations.
This site aptly demonstrates the functionality that I wish to achieve: http://www.homeaway.com/vacation-rentals/europe/r5
How do you recommend that I manage such a data structure?
Things I am using.
- PHP
- MySQL
- Solr
I plan on having the drill-downs be as rapid as possible. I want to create an AJAX interface that will be seamless for searching.
I would also be interested to know which columns you would recommend indexing on.
Comments (2)
Typically, there are three kinds of queries in hierarchies which cause trouble:
Here's a little table which shows the performance of different methods in MySQL:
In `children`, `poor/excellent` means that the answer depends on whether you are mixing the method with an adjacency list, i.e. storing the `parentID` in each record.
For your task, you need all three queries:
I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt etc.).
Create a varchar column called `path`, index it and fill it with the value like this:
where the numbers are primary keys of the appropriate parents, in correct order (`1` for Europe, `234` for UK etc.)
You will also need a table called `levels` to keep numbers from `1` to `20` (or whatever maximum nesting level you want).
To select all ancestors:
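A minimal sketch of an ancestor lookup on a materialized path might look like the following. It is an illustration under assumptions rather than the exact query meant here: instead of joining against `levels`, it builds the path prefixes in application code and fetches them with an equality match on the indexed `path` column. The `places` table, its columns, the `1:2:3:` path format (ids joined by `:` with a trailing `:`), the small ids, and the use of Python with in-memory SQLite as a stand-in for PHP and MySQL are all hypothetical.

```python
import sqlite3


def ancestor_paths(path):
    """Return every proper prefix of a materialized path, root first.

    Assumes paths look like '1:2:3:' (ids joined by ':' plus a trailing ':').
    """
    ids = path.rstrip(":").split(":")
    return [":".join(ids[:n]) + ":" for n in range(1, len(ids))]


def select_ancestors(conn, place_id):
    """Fetch (id, name) of all ancestors of place_id, ordered from the root down."""
    row = conn.execute(
        "SELECT path FROM places WHERE id = ?", (place_id,)
    ).fetchone()
    if row is None:
        return []
    prefixes = ancestor_paths(row[0])
    if not prefixes:
        return []  # the root has no ancestors
    marks = ",".join("?" * len(prefixes))
    # Equality lookup on path; shorter paths are closer to the root.
    sql = (
        f"SELECT id, name FROM places WHERE path IN ({marks}) "
        "ORDER BY LENGTH(path)"
    )
    return conn.execute(sql, prefixes).fetchall()


# Hypothetical sample data (real woeids would be much larger).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.execute("CREATE INDEX idx_places_path ON places (path)")
conn.executemany("INSERT INTO places VALUES (?, ?, ?)", [
    (1, "Earth", "1:"),
    (2, "United Kingdom", "1:2:"),
    (3, "England", "1:2:3:"),
    (4, "East Sussex", "1:2:3:4:"),
    (5, "Devon", "1:2:3:5:"),
    (6, "Brighton", "1:2:3:4:6:"),
])

ancestors = select_ancestors(conn, 6)
print(ancestors)
```

Because every prefix is matched by equality, the lookup can use the index on `path`, and the number of prefixes is bounded by the nesting depth.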
To select all children and counts of places within them:
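One way to sketch the drill-down is shown below: it lists the immediate children of a node together with the number of entities stored anywhere beneath each child, and leaves out children that contain no entities. This is a sketch under assumptions, not necessarily the query meant here. The `places` and `entities` layouts, the `parent_id` and `place_id` column names, the `1:2:3:` path format and the sample rows are hypothetical, and Python with in-memory SQLite stands in for PHP and MySQL (in MySQL the concatenation would be written `CONCAT(c.path, '%')`).

```python
import sqlite3

# c = immediate child of the chosen node, d = c itself or any descendant of c,
# e = an entity attached to d. The inner joins drop children without entities.
CHILDREN_WITH_COUNTS = """
SELECT c.id, c.name, COUNT(e.id) AS entity_count
FROM places AS c
JOIN places AS d ON d.path LIKE c.path || '%'
JOIN entities AS e ON e.place_id = d.id
WHERE c.parent_id = ?
GROUP BY c.id, c.name
ORDER BY c.name
"""


def children_with_counts(conn, parent_id):
    """Return (id, name, entity_count) for children that contain entities."""
    return conn.execute(CHILDREN_WITH_COUNTS, (parent_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE places ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, path TEXT)"
)
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, place_id INTEGER)")
conn.execute("CREATE INDEX idx_places_parent ON places (parent_id)")
conn.execute("CREATE INDEX idx_places_path ON places (path)")
conn.execute("CREATE INDEX idx_entities_place ON entities (place_id)")

# Hypothetical sample hierarchy; the trailing ':' keeps '1:2:' from matching '1:23:'.
conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", [
    (1, None, "Earth", "1:"),
    (2, 1, "United Kingdom", "1:2:"),
    (3, 2, "England", "1:2:3:"),
    (4, 3, "East Sussex", "1:2:3:4:"),
    (5, 3, "Devon", "1:2:3:5:"),
    (6, 4, "Brighton", "1:2:3:4:6:"),
    (7, 2, "Scotland", "1:2:7:"),
    (8, 3, "Cornwall", "1:2:3:8:"),
])
# Two entities in Brighton, one in Devon, none in Scotland or Cornwall.
conn.executemany("INSERT INTO entities VALUES (?, ?)", [(1, 6), (2, 6), (3, 5)])

under_england = children_with_counts(conn, 3)
under_uk = children_with_counts(conn, 2)
print(under_england)
print(under_uk)
```

Since the counts do not need to be live, the result for each node could be generated periodically and cached, as described in the question.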
This is the query that I came up with. It is an adaptation of what you suggested, Quassnoi.
This returns all of the parents of Brighton.
The problem with your query was that it wasn't returning the path to the parents, but instead any node which shared the same path.
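The difference between a node's parents and other nodes that merely share its path can be sketched as follows. This is only an illustration of how the direction of a prefix match changes the result, not the query referred to above; the `places` table, the `1:2:3:` path format and the ids are assumptions, and Python with in-memory SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?, ?)", [
    (1, "Earth", "1:"),
    (2, "United Kingdom", "1:2:"),
    (3, "England", "1:2:3:"),
    (4, "East Sussex", "1:2:3:4:"),
    (5, "Devon", "1:2:3:5:"),
    (6, "Brighton", "1:2:3:4:6:"),
])

# Parents: rows whose path is a proper prefix of the target's path.
PARENTS_SQL = """
SELECT a.name
FROM places AS t
JOIN places AS a ON t.path LIKE a.path || '%' AND a.id <> t.id
WHERE t.id = ?
ORDER BY LENGTH(a.path)
"""

# Operands reversed: rows whose path starts with the target's path,
# i.e. everything below the target rather than its parents.
SHARED_PREFIX_SQL = """
SELECT d.name
FROM places AS t
JOIN places AS d ON d.path LIKE t.path || '%' AND d.id <> t.id
WHERE t.id = ?
ORDER BY d.path
"""


def names(sql, place_id):
    """Run one of the queries above and return the matching place names."""
    return [row[0] for row in conn.execute(sql, (place_id,))]


parents_of_brighton = names(PARENTS_SQL, 6)
below_england = names(SHARED_PREFIX_SQL, 3)
print(parents_of_brighton)
print(below_england)
```

The two queries differ only in which side of `LIKE` carries the wildcard, which makes the mix-up easy to introduce.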