Algorithm for finding a solution based on given questionnaire answers
Consider the following scenario:
- Potential customers are presented with a questionnaire where they can select none, one or multiple answers for every question.
- An automated algorithm should recommend the optimal solution based on the customers' answers.
Example:
- There are 3 possible solutions S1, S2, S3
- The questionnaire contains 10 questions Q1, Q2…Q10
- Each question contains a variable number of possible answers where:
  - A1.1 is the first answer for question 1.
  - A3.2 is the second answer for question 3.
- I want to be able to model the following solutions based on the answers provided by the customer:
  - A1.1, A1.3, A2.1, A3.2 => S1
  - A1.1, A1.3, A2.2 => S1
  - A1.2 => S2
  - A2.2 => S2
  - A1.1, A3.1, A3.2 => S2
  - Any other combination => S3
In summary:
- For a given set of answers, a solution must be recommended.
- Solutions defined by a smaller number of answers should be preferred over those defined by a larger number of answers.
I’m looking for an existing algorithm (and data model) for the problem presented above instead of attempting to write my own from scratch.
Comments (2)
The simplest thing to try that just might work is a nearest-neighbor algorithm: compute the similarity between a new set of answers and every set of answers with a known solution (weighting by the total number of answers, if that's what you want), and offer the most frequently chosen known solution among the equally close sets of answers.
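As a rough illustration of the nearest-neighbor idea, here is a minimal Python sketch. The answer sets are the ones from the question; the use of Jaccard similarity, the names `KNOWN`, `jaccard` and `recommend`, and the fallback to S3 when nothing overlaps are assumptions made for this example, not something the question prescribes.

```python
from collections import Counter

# Known answer sets and their solutions, copied from the question's example.
KNOWN = [
    (frozenset({"A1.1", "A1.3", "A2.1", "A3.2"}), "S1"),
    (frozenset({"A1.1", "A1.3", "A2.2"}), "S1"),
    (frozenset({"A1.2"}), "S2"),
    (frozenset({"A2.2"}), "S2"),
    (frozenset({"A1.1", "A3.1", "A3.2"}), "S2"),
]


def jaccard(a, b):
    """Similarity in [0, 1]: shared answers divided by all answers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def recommend(answers, known=KNOWN, default="S3"):
    """Return the solution of the closest known answer set (assumed metric: Jaccard)."""
    answers = frozenset(answers)
    scored = [(jaccard(answers, ref), solution) for ref, solution in known]
    best = max(score for score, _ in scored)
    if best == 0.0:
        return default  # nothing in common with any known answer set
    # Majority vote among the equally close neighbours.
    votes = Counter(solution for score, solution in scored if score == best)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(recommend({"A1.2"}))
    print(recommend({"A1.1", "A1.3"}))
```

Note that this matches approximately: a customer who picked only A1.1 and A1.3 is steered to S1 even though no rule in the question covers that exact combination. Whether that is desirable depends on how strictly the "any other combination => S3" rule should apply.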
If that doesn't perform well, then you want a more sophisticated classifier of some sort. You should look up decision trees (and their extensions, alternating decision trees and random forests) and Bayesian classifiers (http://en.wikipedia.org/wiki/Naive_Bayes_classifier), among others.
You can find code for some of these things in machine learning or neural network toolboxes. Since you didn't specify a language, I can't point to one, but the algorithms (not code) are described in various books like The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.
To me it seems more like a declarative logic program than a combinatorial or statistical problem. Just reverse your statements about which solution to choose from given answers, replace "=>" with ":-", and you get Prolog.
These statements are Horn clauses and, given that your rules are simple, can be solved using the SLD resolution algorithm. There are many off-the-shelf solvers with bindings to different languages, so you can pick one of them.
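For rules this simple, the same idea can be sketched without a logic solver. The Python below is only an illustration, not SLD resolution itself: it treats each rule as a set of required answers plus a solution. It assumes that a rule fires when all of its answers are among those the customer selected, and it resolves conflicts by taking the rule with the fewest answers, as the question asks. The names `RULES`, `DEFAULT` and `recommend_by_rules` are made up for this example.

```python
# Each rule: (required answers, recommended solution). Rule data is from the question.
RULES = [
    (frozenset({"A1.1", "A1.3", "A2.1", "A3.2"}), "S1"),
    (frozenset({"A1.1", "A1.3", "A2.2"}), "S1"),
    (frozenset({"A1.2"}), "S2"),
    (frozenset({"A2.2"}), "S2"),
    (frozenset({"A1.1", "A3.1", "A3.2"}), "S2"),
]
DEFAULT = "S3"  # "any other combination => S3"


def recommend_by_rules(answers, rules=RULES, default=DEFAULT):
    """Pick the solution of the smallest rule whose answers were all selected."""
    answers = frozenset(answers)
    # Assumption: a rule fires when its required answers are a subset of the selection.
    fired = [(body, head) for body, head in rules if body <= answers]
    if not fired:
        return default
    # Prefer the rule defined by the fewest answers; on a tie, the earlier rule wins.
    _, head = min(fired, key=lambda rule: len(rule[0]))
    return head


if __name__ == "__main__":
    print(recommend_by_rules({"A1.1", "A1.3", "A2.1", "A3.2"}))
    print(recommend_by_rules({"A1.1", "A1.3", "A2.2"}))
```

Under these assumptions the second rule (A1.1, A1.3, A2.2 => S1) can never win, because A2.2 => S2 fires on the same input and is defined by fewer answers. That may be a sign the rule set or the preference order needs another look.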