Creating a hierarchy tree from a dict of pages and their contents
The following key:value pairs are 'page' and 'page contents'.
{
'section-a.html':{'contents':'section-b.html section-c.html section-d.html'},
'section-b.html':{'contents':'section-d.html section-e.html'},
'section-c.html':{'contents':'product-a.html product-b.html product-c.html product-d.html'},
'section-d.html':{'contents':'product-a.html product-c.html'},
'section-e.html':{'contents':'product-b.html product-d.html'},
'product-a.html':{'contents':''},
'product-b.html':{'contents':''},
'product-c.html':{'contents':''},
'product-d.html':{'contents':''}
}
For any given 'item' how could I find the path(s) to said item? With my very limited knowledge of data structures in most cases, I'm assuming this would be a hierarchy tree. Please correct me if I'm wrong!
UPDATE: My apologies, I should have been more clear about the data and my expected outcome.
Assuming 'page-a' is an index, each 'page' is literally a page appearing on a website, whereas each 'item' is something like a product page that would appear on Amazon, Newegg, etc.
Thus, my expected output for 'item-d' would be a path (or paths) to that item.
For example (delimiter is arbitrary, for illustration here):
item-d has the following paths:
page-a > page-b > page-e > item-d
page-a > page-c > item-d
UPDATE2: Updated my original dict to provide more accurate and real data. '.html' added for clarification.
Comments (3)
Here's a simple approach -- it's O(N squared), so, not all that highly scalable, but will serve you well for a reasonable book size (if you have, say, millions of pages, you need to be thinking about a very different and less simple approach;-).
First, make a more usable dict, mapping page to set of contents: e.g., if the original dict is d, make another dict mud as:
Then, make the dict mapping each page to its parent pages:
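A minimal sketch of the two dicts described above, assuming the question's dict is bound to d. The names mud and parent follow the answer's wording; the comprehensions are one possible way to build them, not necessarily the original code. The parent lookup scans every page for every page, which is where the O(N squared) cost comes from.

```python
d = {
    'section-a.html': {'contents': 'section-b.html section-c.html section-d.html'},
    'section-b.html': {'contents': 'section-d.html section-e.html'},
    'section-c.html': {'contents': 'product-a.html product-b.html product-c.html product-d.html'},
    'section-d.html': {'contents': 'product-a.html product-c.html'},
    'section-e.html': {'contents': 'product-b.html product-d.html'},
    'product-a.html': {'contents': ''},
    'product-b.html': {'contents': ''},
    'product-c.html': {'contents': ''},
    'product-d.html': {'contents': ''},
}

# Map each page to the set of pages it contains ('' splits to an empty set).
mud = {page: set(info['contents'].split()) for page, info in d.items()}

# Map each page to the sorted list of pages that contain it (its parents).
parent = {
    page: sorted(p for p, children in mud.items() if page in children)
    for page in mud
}

print(parent['product-d.html'])  # ['section-c.html', 'section-e.html']
print(parent['section-a.html'])  # [] means "no parent", i.e. a root page
```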
Here, I'm using lists of parent pages (sets would be fine too), but that's OK for pages with 0 or 1 parents as in your example, too -- you'll just be using an empty list to mean "no parent", else a list with the parent as the one and only item. This should be an acyclic directed graph (if you're in doubt, you can check, of course, but I'm skipping that check).
Now, given a page, finding the paths up its parent(s) to a parentless-parent ("root page") simply requires "walking" the parent dict. E.g., in the 0/1 parent case:
If you can clarify your specs better (ranges of number of pages per book, number of parents per page, and so on), this code can no doubt be refined, but as a start I hope it can help.
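For the 0/1 parent case, the walk up the parent dict might look like the sketch below. The page names and the function name path_to_root are hypothetical, chosen only so that every page has at most one parent.

```python
# Hypothetical single-parent data: each list holds zero or one parent.
parent = {
    'index.html': [],
    'section-x.html': ['index.html'],
    'product-x.html': ['section-x.html'],
}


def path_to_root(page, parent):
    """Follow the single parent link upward; return the path root first."""
    path = [page]
    while parent[path[-1]]:                 # an empty list means we hit a root
        path.append(parent[path[-1]][0])    # 0/1 case: take the only parent
    path.reverse()
    return path


print(' > '.join(path_to_root('product-x.html', parent)))
# index.html > section-x.html > product-x.html
```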
Edit: as the OP clarified that cases with > 1 parent (and so, multiple paths) are indeed of interest, let me show how to deal with that:
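One way to handle several parents is to keep a stack of partial paths and extend each one by every parent of its last node. This is a sketch under that assumption, with a hypothetical function name all_paths; the parent dict is written out by hand from the question's data, and the sketch collects finished paths in a list so the caller decides what to do with them.

```python
# Parent lists derived by hand from the dict in the question.
parent = {
    'section-a.html': [],
    'section-b.html': ['section-a.html'],
    'section-c.html': ['section-a.html'],
    'section-d.html': ['section-a.html', 'section-b.html'],
    'section-e.html': ['section-b.html'],
    'product-a.html': ['section-c.html', 'section-d.html'],
    'product-b.html': ['section-c.html', 'section-e.html'],
    'product-c.html': ['section-c.html', 'section-d.html'],
    'product-d.html': ['section-c.html', 'section-e.html'],
}


def all_paths(page, parent):
    """Return every root-to-page path as a list of page names."""
    finished = []
    partial = [[page]]               # partial paths grow leaf first
    while partial:
        path = partial.pop()
        parents = parent[path[-1]]
        if not parents:              # reached a root: store it root first
            finished.append(path[::-1])
            continue
        for p in parents:            # one new partial path per parent
            partial.append(path + [p])
    return finished


for path in all_paths('product-d.html', parent):
    print(' > '.join(path))
```

For product-d.html this finds two paths, one through section-b.html and section-e.html and one through section-c.html, which matches the shape of the output the question asks for. Note that this version assumes the graph has no cycles.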
Of course, instead of printing, you can yield each path when it reaches a root (making the function whose body this is into a generator), or otherwise treat it in whatever way you require.
Edit again: a commenter is worried about cycles in the graph. If that worry's warranted, it's not hard to keep track of nodes already seen in a path and detect and warn about any cycles. Fastest is to keep a set alongside each list representing a partial path (we need the list for ordering, but checking for membership is O(1) in sets vs O(N) in lists):
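A sketch of the list-plus-set idea, assuming a parent dict shaped like the one above. The function name and the tiny cyclic graph are hypothetical, and this version returns the offending paths instead of issuing a warning, which is a design choice of the sketch rather than something the answer specifies.

```python
def all_paths_checked(page, parent):
    """Return (finished root-first paths, partial paths that closed a cycle)."""
    finished, cycles = [], []
    partial = [([page], {page})]     # (ordered path, set of nodes in that path)
    while partial:
        path, seen = partial.pop()
        parents = parent[path[-1]]
        if not parents:
            finished.append(path[::-1])
            continue
        for p in parents:
            if p in seen:            # O(1) membership test on the set
                cycles.append(path + [p])
            else:
                partial.append((path + [p], seen | {p}))
    return finished, cycles


# Hypothetical graph with a cycle: b and c are each other's parent.
parent = {'a': [], 'b': ['a', 'c'], 'c': ['b'], 'x': ['c']}

finished, cycles = all_paths_checked('x', parent)
print(finished)  # [['a', 'b', 'c', 'x']]
print(cycles)    # [['x', 'c', 'b', 'c']]
```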
It's probably worthwhile, for clarity, packing the list and set representing a partial path into a small utility class Path with suitable methods.
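Such a Path class could be as small as the sketch below. The method names (last, extended, root_first) are hypothetical, picked for illustration only.

```python
class Path:
    """A partial path: a list for ordering plus a set for fast membership."""

    def __init__(self, nodes=()):
        self.nodes = list(nodes)     # leaf first, in the order visited
        self.seen = set(self.nodes)  # same nodes, for O(1) 'in' checks

    def __contains__(self, node):
        return node in self.seen

    def last(self):
        """Return the most recently added node."""
        return self.nodes[-1]

    def extended(self, node):
        """Return a new Path with node appended; this Path is unchanged."""
        return Path(self.nodes + [node])

    def root_first(self):
        """Return the nodes as a list ordered from root to leaf."""
        return self.nodes[::-1]


p = Path(['product-d.html']).extended('section-c.html')
print('section-c.html' in p)   # True
print(p.root_first())          # ['section-c.html', 'product-d.html']
```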
Here's an illustration for your question. It is easier to reason about graphs when you have a picture.
First, abbreviate the data:
Result:
Convert to graphviz's format:
Result:
Plot the graph:
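The abbreviation and conversion steps above could be sketched as follows. The abbreviation scheme (first letter of the kind plus the page letter, so section-a.html becomes sa) and the helper name short are assumptions made for this example, since the exact abbreviations used for the picture are not shown. The output is plain text in Graphviz's DOT language.

```python
d = {
    'section-a.html': {'contents': 'section-b.html section-c.html section-d.html'},
    'section-b.html': {'contents': 'section-d.html section-e.html'},
    'section-c.html': {'contents': 'product-a.html product-b.html product-c.html product-d.html'},
    'section-d.html': {'contents': 'product-a.html product-c.html'},
    'section-e.html': {'contents': 'product-b.html product-d.html'},
    'product-a.html': {'contents': ''},
    'product-b.html': {'contents': ''},
    'product-c.html': {'contents': ''},
    'product-d.html': {'contents': ''},
}


def short(name):
    """Abbreviate 'section-a.html' to 'sa' and 'product-b.html' to 'pb'."""
    stem = name[:-len('.html')] if name.endswith('.html') else name
    kind, _, letter = stem.partition('-')
    return kind[0] + letter


# One (container, contained) edge per entry in each 'contents' string.
edges = sorted(
    (short(page), short(child))
    for page, info in d.items()
    for child in info['contents'].split()
)

# Emit the edges as a directed graph in DOT syntax.
lines = ['digraph pages {']
lines += ['    %s -> %s;' % edge for edge in edges]
lines.append('}')
dot_source = '\n'.join(lines)
print(dot_source)
```

Saving the printed text to a file such as graph.dot and running dot -Tpng graph.dot -o graph.png with Graphviz installed renders the picture.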
EDIT With the question explained a bit better I think the following might be what you need, or could at least provide something of a starting point.
OLD
I don't really know what you expect to see, but maybe something like
this will work.
It would be easier, and I think more correct, if you'd use a slightly different data structure:
Then you wouldn't need to split.
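As an illustration of why a list-based structure avoids the split, here is a sketch in which each page maps directly to a list of the pages it contains. This layout and the function name paths_to are assumptions for the example; the structure the answer actually proposed may differ.

```python
# Assumed layout: page -> list of contained pages (no space-separated strings).
pages = {
    'section-a.html': ['section-b.html', 'section-c.html', 'section-d.html'],
    'section-b.html': ['section-d.html', 'section-e.html'],
    'section-c.html': ['product-a.html', 'product-b.html',
                       'product-c.html', 'product-d.html'],
    'section-d.html': ['product-a.html', 'product-c.html'],
    'section-e.html': ['product-b.html', 'product-d.html'],
    'product-a.html': [],
    'product-b.html': [],
    'product-c.html': [],
    'product-d.html': [],
}


def paths_to(page, pages):
    """Yield every root-first path to page (assumes the graph has no cycles)."""
    # 'page in contents' is an exact list membership test, so no split is needed.
    parents = [p for p, contents in pages.items() if page in contents]
    if not parents:
        yield [page]
    for p in parents:
        for path in paths_to(p, pages):
            yield path + [page]


for path in paths_to('product-d.html', pages):
    print(' > '.join(path))
```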
Given that last case, it can even be expressed a bit shorter:
And even shorter, with the empty lists removed:
That should be a single line, but I used \ as line break indicator so it can be read without scrollbars.