"Cluster analysis" with MySQL
This is a tough one. There is probably a name for this and I don't know it, so I'll describe the problem exactly.
I have a dataset including a number of user-submitted values. I need to be able to determine, based on some sort of average or, better, a "closeness of data", which value is the correct value. For example, if I received the three submissions 4, 10, 3 from three users, I would know that 3 or 4 would be the "correct" value in this case. If I were to average them, I'd get about 5.67, which is not the intended result.
I'm attempting to do this using MySQL and PHP.
tl;dr Need to find a value from a dataset based on "closeness" of relative values (using MySQL/PHP)
Thanks!
Comments (3)
Clustering using a database isn't going to be a single query type of procedure. It takes iterations to generate the clusters effectively.
You first need to decide how many clusters you want. If you wanted only one cluster, then obviously everything would go into it. If you want two, then you can write your program to separate the nodes into two groups using some sort of correlation metric.
In other words, I don't think this is a MySQL question so much as a clustering question.
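For one-dimensional data like the 4, 10, 3 example, the "decide how many clusters, then split" idea can be sketched without an iterative algorithm: sort the values and cut at the largest gaps. This is only a simple stand-in for a real clustering method, and the function names `split_into_clusters` and `largest_cluster` are made up for illustration.

```python
def split_into_clusters(values, k):
    """Split 1-D values into at most k groups by cutting at the k-1 largest gaps."""
    ordered = sorted(values)
    if k <= 1 or len(ordered) <= 1:
        return [ordered]
    # Index i stands for the gap between ordered[i] and ordered[i + 1].
    gap_indices = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
        reverse=True,
    )
    cuts = sorted(gap_indices[: k - 1])
    clusters, start = [], 0
    for i in cuts:
        clusters.append(ordered[start : i + 1])
        start = i + 1
    clusters.append(ordered[start:])
    return clusters


def largest_cluster(values, k=2):
    """Return the group with the most members (assumed here to hold the 'correct' value)."""
    return max(split_into_clusters(values, k), key=len)


print(split_into_clusters([4, 10, 3], 2))  # [[3, 4], [10]]
print(largest_cluster([4, 10, 3]))         # [3, 4]
```

The values could be fetched from MySQL with a plain SELECT and grouped like this in application code (PHP would work the same way). Treating the biggest group as the right one is an assumption, not something the data guarantees.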
I think that is the kind of thing you're looking for:
For example, if your data set contains the following IDs: 3, 4, 10, the average is 5.6667, and the closest value to 5.6667 is 4. If your data set is 3, 6, 10, 14, the average is 8.25, and the closest value is 10.
This is what this query returns. Hope it helps.
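A minimal sketch of a query in that spirit (not necessarily the query this answer refers to): order the rows by their distance from the average and keep the first one. The table name `submissions` and column `value` are assumptions, and the example runs against an in-memory SQLite database so it is self-contained; the same SELECT should carry over to MySQL.

```python
import sqlite3

# Hypothetical schema: one row per user submission.
CLOSEST_TO_AVG = """
    SELECT value
    FROM submissions
    ORDER BY ABS(value - (SELECT AVG(value) FROM submissions))
    LIMIT 1
"""


def closest_to_average(values):
    """Load values into a throwaway table and return the one nearest the mean."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE submissions (value INTEGER)")
        conn.executemany(
            "INSERT INTO submissions (value) VALUES (?)",
            [(v,) for v in values],
        )
        return conn.execute(CLOSEST_TO_AVG).fetchone()[0]
    finally:
        conn.close()


print(closest_to_average([3, 4, 10]))      # 4
print(closest_to_average([3, 6, 10, 14]))  # 10
```

Note that a single extreme submission still drags the average, so the value picked this way can drift toward the outlier.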
I have the impression you are looking for the median.
E.g. in the list 1 2 3 4 100, the median (central value) is 3.
You may want to search for [https://stackoverflow.com/search?q=sql+median finding the median in SQL].
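As a quick illustration of why the median fits the question, here is a small sketch that computes it in application code after fetching the values. It uses Python's standard `statistics.median_low`, which always returns one of the submitted values, even when the count is even; the wrapper name `median_submission` is made up for this example.

```python
import statistics


def median_submission(values):
    """Return the middle submitted value (the lower of the two middles if the count is even)."""
    return statistics.median_low(values)


print(median_submission([1, 2, 3, 4, 100]))  # 3
print(median_submission([4, 10, 3]))         # 4
```

Unlike the average, the result here is not pulled toward the outliers 100 and 10.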