How to organize data for multilevel modeling - decision tree, classification, or regression
I have three tables - Sales Manager, Customer, and Order. Each sales manager has multiple customers, and each customer can have multiple orders.
I want to determine whether certain attributes of the sales manager and of the customer lead to sales of a particular product (say, Product A: yes/no).
Suppose I have 3 sales managers, 10 customers, and 20 orders.
Should I structure the data set to have 3 rows, 10 rows, or 20 rows? Please advise.
Also, will a decision tree or other classification algorithm automatically understand the hierarchical relationships among manager, customer, and order?
Thanks.
Comments (1)
I think you should combine the three tables into one big feature matrix. Suppose you have the tables
Sales Manager (id attr_1 ... attr_m)
Customer (id attr_1 ... attr_n sales_manager_id)
Order (id product_id_1 ... product_id_l customer_id)
Then it is probably reasonable to build the matrix in the following form:
Matrix:
product_id order_attr_1 ... order_attr_l customer_attr_1 ... customer_attr_n ... manager_attr_1 ... manager_attr_m
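You now have a matrix with 20*l rows (one per product on each of the 20 orders), where each row carries all the attributes associated with that order: its own, its customer's, and that customer's sales manager's.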
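The flattening described above can be sketched in plain Python. This is only an illustration under assumptions: the toy tables, the column names (`manager_region`, `manager_years`, `customer_segment`, `product_ids`), and the `flatten` helper are all hypothetical, and "one row per product on each order" is just one reading of the 20*l figure.

```python
# Hypothetical toy tables; every column name and value is invented for illustration.
managers = {
    1: {"manager_region": "north", "manager_years": 8},
    2: {"manager_region": "south", "manager_years": 2},
}
customers = {
    10: {"customer_segment": "retail", "sales_manager_id": 1},
    11: {"customer_segment": "wholesale", "sales_manager_id": 1},
    12: {"customer_segment": "retail", "sales_manager_id": 2},
}
orders = [
    {"order_id": 100, "customer_id": 10, "product_ids": ["A", "B"]},
    {"order_id": 101, "customer_id": 11, "product_ids": ["C"]},
    {"order_id": 102, "customer_id": 12, "product_ids": ["A"]},
    {"order_id": 103, "customer_id": 12, "product_ids": ["B", "C"]},
]


def flatten(orders, customers, managers):
    """Return one row per (order, product) with customer and manager attributes attached."""
    rows = []
    for order in orders:
        # Follow the foreign keys: order -> customer -> sales manager.
        customer = customers[order["customer_id"]]
        manager = managers[customer["sales_manager_id"]]
        for product_id in order["product_ids"]:
            row = {"order_id": order["order_id"], "product_id": product_id}
            # Copy customer attributes, skipping the foreign key itself.
            row.update({k: v for k, v in customer.items() if k != "sales_manager_id"})
            # Copy manager attributes; they repeat on every row under that manager.
            row.update(manager)
            rows.append(row)
    return rows


rows = flatten(orders, customers, managers)
for row in rows:
    print(row)
```

Note that the manager and customer values are simply repeated on every row that belongs to them; nothing in the resulting rows marks them as belonging to a higher level.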
In the simplest case you can use this matrix directly for classification. If there are too many attributes, it may be reasonable to reduce them with PCA first. You could try this in Weka and see what comes out.
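To give a rough idea of what a decision tree does with such a matrix, here is a minimal sketch that scores a single candidate split by Gini impurity for the label "is this product A". The rows, the feature names, and the helpers `gini` and `weighted_gini` are hypothetical; a real tool such as Weka or scikit-learn would grow a full tree rather than this one-level stump.

```python
from collections import Counter

# Hypothetical flattened rows (one per ordered product); all names and values are invented.
rows = [
    {"customer_segment": "retail", "manager_region": "north", "product_id": "A"},
    {"customer_segment": "retail", "manager_region": "north", "product_id": "B"},
    {"customer_segment": "wholesale", "manager_region": "north", "product_id": "A"},
    {"customer_segment": "retail", "manager_region": "south", "product_id": "B"},
    {"customer_segment": "wholesale", "manager_region": "south", "product_id": "C"},
    {"customer_segment": "wholesale", "manager_region": "south", "product_id": "B"},
]
features = ["customer_segment", "manager_region"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(rows, feature):
    """Size-weighted impurity of the 'is product A' label after splitting on one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["product_id"] == "A")
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


# Lower impurity means the split separates "A" from "not A" more cleanly.
scores = {f: weighted_gini(rows, f) for f in features}
best = min(scores, key=scores.get)
print(scores)
print("best first split:", best)
```

In this toy data the manager attribute wins the first split, but only because its column happens to separate the labels better. The stump treats `manager_region` exactly like any other column.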
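As for your question about the hierarchical relationships: classification algorithms will not understand them explicitly. They see only the flattened rows, so the hierarchy shows up indirectly as manager and customer attribute values repeated across rows.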
I would recommend the book Introduction to Data Mining, as it answers most of your questions.