This is a kind of reverse engineering.
For each respondent, you have 20 answers and one label indicating whether that respondent gets the product trial.
You want to know which of the 20 questions are critical to the trial-or-no-trial decision. I'd suggest first building a decision tree model on the training data, then studying the tree carefully for insights; for example, the decision nodes closest to the root tend to involve the most discriminative questions.
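To make the decision-tree idea concrete, here is a rough sketch in plain Python of the first step a tree takes: ranking questions by information gain, the usual criterion for choosing the root split. The toy data and the names `rows`, `labels`, and `information_gain` are hypothetical, and only 3 questions are used instead of 20. A full tree learner (for instance scikit-learn's `DecisionTreeClassifier`) would repeat this step recursively on each branch.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, labels, question):
    """Entropy reduction obtained by splitting respondents on one question."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[question]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 3 questions, 6 respondents.
rows = [
    {"Q1": "A", "Q2": "B", "Q3": "C"},
    {"Q1": "B", "Q2": "B", "Q3": "C"},
    {"Q1": "A", "Q2": "A", "Q3": "C"},
    {"Q1": "B", "Q2": "A", "Q3": "A"},
    {"Q1": "A", "Q2": "B", "Q3": "A"},
    {"Q1": "B", "Q2": "A", "Q3": "B"},
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = got the trial, 0 = did not

# Gain per question, and the questions ordered from most to least informative.
gains = {q: information_gain(rows, labels, q) for q in rows[0]}
ranking = sorted(gains, key=gains.get, reverse=True)
```

In this toy set the answer to Q3 separates the two labels completely, so it comes first in `ranking`; on real data the gaps between questions will be much less clear-cut.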
The answers can be made numeric for analysis purposes.
Use association analysis to see if there are patterns in the answers, for example a rule such as:
Q3AnsC + Q8AnsB -> IsSelected
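A rule like the one above is normally judged by its support, confidence, and lift. The sketch below computes those three numbers in plain Python; the `transactions` data and the `rule_metrics` helper are made up for illustration, and a real analysis would mine candidate rules with an Apriori-style algorithm rather than test a single hand-picked rule.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent = set(antecedent)
    has_a = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_a if consequent in t]
    has_c = [t for t in transactions if consequent in t]
    support = len(has_both) / n
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    lift = confidence / (len(has_c) / n) if has_c else 0.0
    return support, confidence, lift

# Hypothetical data: each respondent is the set of answers given,
# plus "IsSelected" when that respondent got the trial.
transactions = [
    {"Q3AnsC", "Q8AnsB", "IsSelected"},
    {"Q3AnsC", "Q8AnsB", "IsSelected"},
    {"Q3AnsC", "Q8AnsA"},
    {"Q3AnsA", "Q8AnsB"},
    {"Q3AnsC", "Q8AnsB"},
    {"Q3AnsB", "Q8AnsA", "IsSelected"},
]

support, confidence, lift = rule_metrics(
    transactions, ["Q3AnsC", "Q8AnsB"], "IsSelected"
)
```

A lift above 1 means respondents who gave both answers were selected more often than respondents overall; a lift near 1 means the rule carries little information.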
Use classification (such as logistic regression or a decision tree) to model how users are selected.
Use clustering. Are there distinct groups of respondents? In what ways are they different? Use the "elbow" or scree method to determine the number of clusters.
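As a rough illustration of the elbow method, the sketch below runs a bare-bones k-means in plain Python for several values of k and records the within-cluster sum of squares (inertia). Everything here is an assumption for the example: the numeric encoding of two answers per respondent, the `points` data, and the deterministic farthest-point initialization (libraries typically use k-means++ instead).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_inertia(points, k, iterations=20):
    """Run plain k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialization (a simplification).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical respondents, each encoded as two numeric answers (e.g. a 1-5 scale).
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 4), (4, 5)]

# Inertia for k = 1..4; the "elbow" is where the decrease levels off.
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
```

With this toy data the inertia falls sharply from k=1 to k=2 and only slightly afterwards, which is the elbow pattern pointing to two groups of respondents.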
Do you have other info about the respondents, such as demographics? A pivot table would be useful in that case.
Is there missing data? Are there patterns in the way that people skipped questions?
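One way to look at the missing-data questions is to count how often each question is skipped, which combinations of questions are skipped together, and whether skipping differs between selected and non-selected respondents. The sketch below does this in plain Python; the `responses` data, the use of `None` for a skipped question, and the helper name `skip_rate_by_label` are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical responses; None marks a skipped question.
responses = [
    {"Q1": "A", "Q2": None, "Q3": "C", "IsSelected": 1},
    {"Q1": "B", "Q2": None, "Q3": None, "IsSelected": 0},
    {"Q1": "A", "Q2": "B", "Q3": "C", "IsSelected": 1},
    {"Q1": "B", "Q2": None, "Q3": None, "IsSelected": 0},
    {"Q1": "A", "Q2": "A", "Q3": "B", "IsSelected": 1},
]
questions = ["Q1", "Q2", "Q3"]

# Share of respondents who skipped each question.
skip_rate = {
    q: sum(r[q] is None for r in responses) / len(responses) for q in questions
}

# How often each combination of skipped questions occurs.
patterns = Counter(
    tuple(q for q in questions if r[q] is None) for r in responses
)

def skip_rate_by_label(question):
    """Skip rate of one question, split by whether the respondent was selected."""
    rates = {}
    for label in (0, 1):
        group = [r for r in responses if r["IsSelected"] == label]
        rates[label] = sum(r[question] is None for r in group) / len(group)
    return rates

q3_by_label = skip_rate_by_label("Q3")
```

If a question's skip rate differs a lot between the two labels, as Q3 does in this made-up data, the act of skipping may itself be a useful feature for the classifier.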