Distance function for mixed variables (categorical and numerical)
I want to fuzzy cluster a set of jobs.
The job attributes are:
- Categorical: position, diploma, skills
- Numerical: salary, years of experience
My question is: how do I calculate the distance between different jobs?
E.g. job1(programmer, bs computer science, (java, .net, responsibility), 1500, 3)
and job2(tester, bs computer science, (black and white box testing), 1200, 1)
PS: I'm a beginner in data mining and clustering, so I highly appreciate your help.
Comments (2)
You may take this as your starting point:
http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. The distance between categorical data is nicely explained at the end.
Here is a good walk-through of several different clustering methods and how to use them in R: http://biocluster.ucr.edu/~tgirke/HTML_Presentations/Manuals/Clustering/clustering.pdf
In general, clustering discrete data relies either on counts (e.g. overlaps between vectors) or on some statistic derived from counts. As much as I'd like to address the statistical side, I suppose you're interested in the algorithm, so I'll leave it at that.
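To make the "counts and overlaps" idea concrete for the two jobs in the question, here is a minimal Python sketch of one common way to combine the attributes. It is a Gower-style average, which is my own choice for illustration and not necessarily what either linked document describes. It uses a simple 0/1 mismatch for position and diploma, the Jaccard distance (based on overlap counts) for the skill sets, and a range-normalised absolute difference for salary and years of experience. The `Job` class, the function names, and the two range values (2000 and 10) are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical container for the attributes listed in the question.
    position: str
    diploma: str
    skills: frozenset
    salary: float
    years: float


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mixed_distance(x, y, salary_range, years_range):
    """Gower-style distance: unweighted mean of per-attribute distances in [0, 1]."""
    parts = [
        0.0 if x.position == y.position else 1.0,  # simple mismatch
        0.0 if x.diploma == y.diploma else 1.0,    # simple mismatch
        jaccard_distance(x.skills, y.skills),      # overlap of skill sets
        abs(x.salary - y.salary) / salary_range,   # scaled to [0, 1]
        abs(x.years - y.years) / years_range,      # scaled to [0, 1]
    ]
    return sum(parts) / len(parts)


job1 = Job("programmer", "bs computer science",
           frozenset({"java", ".net", "responsibility"}), 1500, 3)
job2 = Job("tester", "bs computer science",
           frozenset({"black and white box testing"}), 1200, 1)

# The ranges are assumed values; in practice use max - min over the whole data set.
d = mixed_distance(job1, job2, salary_range=2000.0, years_range=10.0)
print(round(d, 3))
```

Computing this for every pair of jobs gives a dissimilarity matrix, which can then be passed to a fuzzy clustering method that works on dissimilarities instead of raw coordinates. Giving every attribute equal weight is the simplest choice; weighting some attributes more heavily is a reasonable variation if, say, skills matter more than salary.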