In what order should cluster analysis be carried out?
First, find the minimum frequent patterns in the database.
Then divide them into data types such as interval-based, binary, and ordinal variables, and define a distance measure for each type of variable.
Finally, apply a cluster analysis method.
Is this sequence right, or am I missing something?
Comments (1)
Whether you're right or not depends on what you want to do. The general approach that you describe seems to go in the right direction, but you'll never know if you're on target until you answer the following questions:
From what you describe, it seems to me that you want to do 'preprocessing' steps like feature selection and vectorization. Unfortunately, this by itself can be quite challenging. For example, one of the biggest practical problems is the design of a distance function (there's a tremendous amount of research available).
So, please give us more information on your specific target application.
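To make the distance-function point a bit more concrete, here is a minimal sketch of one common way to compare records that mix interval, binary, and ordinal variables: a Gower-style dissimilarity, followed by a simple threshold-based single-linkage grouping. This is only an illustration under assumptions, not the method the question necessarily needs. The records, the column kinds, the helper names (`gower_distance`, `column_ranges`, `single_linkage`), and the threshold of 0.3 are all hypothetical, and ordinal values are assumed to be already encoded as integer ranks.

```python
from itertools import combinations


def column_ranges(rows, kinds):
    """Observed range of each column; binary columns get a nominal range of 1."""
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "binary":
            ranges.append(1.0)
        else:
            col = [r[j] for r in rows]
            ranges.append(float(max(col) - min(col)))
    return ranges


def gower_distance(a, b, kinds, ranges):
    """Mean of per-variable dissimilarities, each scaled to [0, 1]."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "binary":
            # Simple matching: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Interval values and ordinal ranks: absolute difference
            # divided by the observed range (0 if the column is constant).
            total += abs(x - y) / rng if rng else 0.0
    return total / len(kinds)


def single_linkage(rows, kinds, threshold):
    """Merge clusters while some pair of their members is within threshold."""
    ranges = column_ranges(rows, kinds)
    clusters = [{i} for i in range(len(rows))]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            d = min(
                gower_distance(rows[i], rows[j], kinds, ranges)
                for i in c1
                for j in c2
            )
            if d <= threshold:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan with the updated cluster list
    return sorted(sorted(c) for c in clusters)


# Hypothetical records: (age, owns_car, education_rank)
kinds = ("interval", "binary", "ordinal")
rows = [
    (25.0, 1, 1),
    (27.0, 1, 1),
    (60.0, 0, 3),
    (58.0, 0, 3),
]

groups = single_linkage(rows, kinds, threshold=0.3)
print(groups)
```

Averaging the per-variable terms gives every variable equal weight, which is itself a design decision; in a real application the weights, the treatment of ordinal ranks, and the clustering method would all depend on the target problem.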