指纹树生成

发布于 2024-11-15 14:59:41 字数 345 浏览 2 评论 0 原文

有一群人(假设有 1874 人),都代表世界上不同的公司(假设有 236 家)。我的任务是最好地确定每个人在哪家公司工作。诀窍在于,我不能简单地问一个人“你在哪里工作”并得到答案,但我拥有的是一份包含许多问题的调查问卷(假设有 290 个问题)以及我应该期望员工得到的确切答案每个公司的。有些公司可能会有相同的答案,所以最后,即使我不能准确确定一个人在哪家公司工作,我也应该能够缩小范围并说他/她一定在其中一家公司工作。

使用多值地图和一些其他数据结构,我已经通过 1 个问题 [查询] 确定了我可以识别的所有公司。使用这些查询来表示树数据结构的根,我需要使用其他查询/问题作为分支来构建树的其余部分来识别其余部分。

有什么建议/帮助/建议吗?

There is group of people [let's say 1874 of them], all representing different companies [lets say 236 of them] in the world. My task is to best identify what company each person works for. The trick is that I cannot simply ask a person "Where do you work for" and get the answer, but what I do have is a questionnaire with a number of question [lets say 290 questions] and the exact responses I should expect for employees of each company. Some companies might have identical answers, so at the end, even if I can not determine exactly what company a person works for, I should be able to narrow it down and say that he/she must work for one of these companies.

Using multi-value maps, and some other data structures, I've gone as far as determining all the companies that I can identify with 1 question [query]. Using these queries to represent the root of a tree data-structure, I need to build out the rest of the tree using other queries/questions as branches to identify the rest.

Any advice/help/suggestion ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

沙与沫 2024-11-22 14:59:41

根据您在评论中的回答,我觉得您也可以让树的每个级别代表一个问题,并让该级别上的节点的分支/子节点代表答案。正如 btilly 所提到的,从技术上讲,这将是一个特里树。

更有效(尽管不一定在空间上)的解决方案可能涉及使用哈希表和哈希函数,该函数作用于答案选择来创建其哈希,但我认为考虑到您的要求和要求,特里树是最好的方法不在乎。

哦,对了:根据答案选择的布局方式,您可能在特定分支上有一系列答案,其中某些级别没有任何子分支/树;在这种情况下,您可能会将这些单一分支部分折叠成单独的节点。 http://en.wikipedia.org/wiki/Trie#compressing_tries 也可能提供一些尖端。


根据您对我最初答案的回复,这是我的想法:

为问题及其答案选择保留一个节点数组,每个答案选择都与哈希表(或您希望使用的任何数据结构)相关联;我建议哈希表(由于大量使用 Python 并使用 Python 的 set 数据结构(它作为哈希表的一种类型实现))包含指向每个公司的指针,或者包含指向单个公司的指针(如果给定问题的给出答案将表明公司开始 和。

当您第一次检查特定问题的答案时,并且有多个公司与该答案选择相关联,请将第一个答案的哈希表中的数据临时复制为链接列表或其他内容。当回答更多问题时,根据每个新答案的哈希表检查列表的元素,并从列表中删除每个新答案的哈希表中不存在的公司。重复提问过程,直到 1) 列表中只剩下一家公司,2) 列表中没有公司留下,或 3) 您已问完所有问题。

如果是 1),那就是提问者的雇主。
如果是 2),则该员工不受任何要检查的公司雇用,和/或某处存在错误。
如果是3),则链表中剩余的公司就是问答者可能受雇的公司。

可能有一种更有效的方法来做到这一点,因为我的实现至少需要 580 个哈希表(每个答案一个,每个问题至少 2 个答案),但我现在真的想不出任何东西。

Based on your answer in the comments, I feel that you may as well just have each level of your tree represent a question, and the branches/subnodes of the nodes on that level representing the answers. This would technically be a trie, as mentioned by btilly.

A more efficient (though not necessarily space-wise) solution would possibly involve using a hashtable and a hash function that acts on the answer choices to create its hash, but I think a trie is the best way to go given your requirements and the don't-cares.

Oh, right: depending on how the answer choices are laid out, it's possible you may have a series of answers on particular branches where there aren't any sub-branches/trees for a few levels; in such a case, you could potentially collapse those singular branch sections into individual nodes. http://en.wikipedia.org/wiki/Trie#Compressing_tries might also provide some tips.


Based on your response to my initial answer, here's my idea:

Keep an array of nodes for the questions and their answer choices, with each answer choice being associated with a hash table (or whatever data structure you'd wish to use; I suggested a hash table due to using Python a lot and being used to Python's set data structure, which is implemented as a type of hash table) containing pointers to each company, or a pointer to a single company if a given answer for a given question will indicate the company to begin with.

The first time you check an answer to a specific question, and there are multiple companies associated with that answer choice, make a temporary copy of the data in that first answer's hash table as a linked list or something. As more questions are answered, check the elements of the list against the hash table of each new answer, and remove companies that are not present in each new answer's hash table from the list. Repeat the question-asking process until 1) only one company is left in the list, 2) no companies are left in the list, or 3) you've asked all the questions.

If 1), that is the question-answerer's employer.
If 2), the employee isn't employed by any of the companies to check for, and/or there's an error somewhere.
If 3), the companies remaining in the linked list are the possible companies that question-answerer is employed by.

There's probably a more efficient method of doing this, as my implementation would require a minimum of 580 hash tables (one for each answer, with a minimum of 2 answers per question), but I can't really think of anything right now.

树深时见影 2024-11-22 14:59:41

从根开始递归构建树。在每个步骤中,您都会有一组有效的调查问卷,最初将是所有调查问卷。从活跃的调查问卷中,选择一个问题,其中“是”的答案与“否”的答案一样多。为这个问题建立一个树节点。使用对您在此节点选择的问题回答“是”的子集调查问卷,创建“是”子树(递归地)。还可以使用对您选择的问题回答“否”的调查问卷子集创建“否”子树。

简单的例子:

假设我们试图猜测动物,并且我们有来自熊、斑马、鲑鱼和鳄鱼的调查问卷。

我们查看调查问卷,发现大约一半的人对“你是哺乳动物吗?”回答“是”,所以我们将其作为树的根。

现在我们只接受对这个问题回答“是”的调查问卷。在我们的例子中,它们是来自熊和斑马的。我们选择问题“你有条纹吗?”,因为大约一半人说有,一半人说没有。由于每个答案只有一份调查问卷,因此您创建了适当猜测斑马和熊的叶节点。

现在我们回溯到根节点并对“否”分支重复该过程。也就是说,我们查看鲑鱼和鳄鱼的调查问卷,并选择一个问题来区分不同的组。 “你喜欢微笑吗?”符合要求。

最终的树如下所示:

Ask: "Are you a mammal?"
 |
 +- yes -> Ask: "Do you have stripes?"
 |          |
 |          +- yes -> Guess: Zebra
 |          |
 |          +- no --> Guess: Bear
 |
 +- no --> Ask: "Do you like to smile?"
            |
            +- yes -> Guess: Crocodile
            |
            +- no --> Guess: Salmon

Build the tree recursively, starting at the root. At each step, you'll have an active set of questionnaires, which initially will be all of the questionnaires. From the active questionnaires, select a question that has about as many yes answers as no answers. Make a tree node for this question. Create a yes subtree (recursively), using the subset questionnaires that answered yes for the question you selected at this node. Also create a no subtree, using the subset of questionnaires that answered no for the question you selected.

Simple Example:

Suppose we're trying to guess the animal, and we have questionnaires from a bear, a zebra, a salmon, and a crocodile.

We look at the questionnaires and see that about half of them said "yes" to "Are you a mammal?", so we'll make that the root of the tree.

Now we take just the questionnaires that said yes to that question. In our example, they are the ones from the bear and the zebra. We select the question "Do you have stripes?", since about half of them say yes and half say no. Since there's only one questionnaire for each of those answers, you create leaf nodes that guess zebra and bear appropriately.

Now we backtrack to the root node and repeat the process for the "no" branch. That is, we look at the questionnaires for the salmon and the crocodile and select a question that distinguishes that set into separate groups. "Do you like to smile?" fits the bill.

The final tree looks like this:

Ask: "Are you a mammal?"
 |
 +- yes -> Ask: "Do you have stripes?"
 |          |
 |          +- yes -> Guess: Zebra
 |          |
 |          +- no --> Guess: Bear
 |
 +- no --> Ask: "Do you like to smile?"
            |
            +- yes -> Guess: Crocodile
            |
            +- no --> Guess: Salmon
以歌曲疗慰 2024-11-22 14:59:41

使用专家系统构建Prolog 是一种可能的解决方案。您考虑过这个选择吗?

通过这样做,您甚至可以添加一些自然语言处理功能,以简化与用户的交互。

Building an Expert system using Prolog is one possible solution. Have you considered this option ?

By doing so, you may even add some Natural Language Processing capabilities easing interaction with users.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文