Question about the K-nearest-neighbour algorithm

Posted 2024-07-15 18:54:59

I am new to Artificial Intelligence. I understand the K-nearest-neighbour algorithm and how to implement it. However, how do you calculate the distance or weight of things that aren't on a scale?

For example, the distance between two ages is easy to calculate, but how do you calculate how near red is to blue? Maybe colours are a bad example, because you could still use the frequency. How about burger, pizza, and fries, for example?

I got a feeling there's a clever way to do this.

Thank you in advance for your kind attention.

EDIT: Thank you all for the very nice answers. They really helped and I appreciate it. But I am thinking there must be a way out.

Can I do it this way? Let's say I am using my KNN algorithm to predict whether a person will eat at my restaurant, which serves all three of the above foods. Of course, there are other factors, but to keep it simple: for the favourite-food field, out of 300 people, 150 love burgers, 100 love pizza, and 50 love fries. Common sense tells me that favourite food affects people's decision on whether to eat there or not.

So now a person enters burger as his/her favourite food, and I am going to predict whether he/she will eat at my restaurant. Ignoring other factors, and based on my previous (training) knowledge base, common sense tells me that the k nearest neighbours are more likely to be near in this particular field, favourite food, than if he/she had entered pizza or fries.

The only problem is that I used probability, and I might be wrong, because I don't know and probably can't calculate the actual distance. I also worry about this field putting too much or too little weight on my prediction, because the distance probably isn't on the same scale as the other factors (price, time of day, whether the restaurant is full, etc., which I can easily quantify), but I guess I might be able to get around that with some parameter tuning.

Oh, everyone put up a great answer, but I can only accept one. In that case, I'll just accept the one with the most votes tomorrow. Thank you all once again.

笑饮青盏花 2024-07-22 18:54:59

Represent every food for which you collect data as a "dimension" (or a column in a table).

Record "likes" for every person on whom you can collect data, and place the results in a table:

          | Burger | Pizza | Fries | Burritos | Likes my food
person1   |   1    |   0   |   1   |    1     |       1
person2   |   0    |   0   |   1   |    0     |       0
person3   |   1    |   1   |   0   |    1     |       1
person4   |   0    |   1   |   1   |    1     |       0

Now, given a new person, with information about some of the foods he likes, you can measure similarity to other people using a simple measure such as the Pearson Correlation Coefficient, or the Cosine Similarity, etc.
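
To make the two measures concrete, here is a minimal sketch in plain Python. The four rows come from the table above; the `new_person` vector and the function names are assumptions made up for illustration, and returning 0.0 for an all-zero or constant vector is just one possible convention.

```python
import math

# Rows of the table above: likes for Burger, Pizza, Fries, Burritos.
people = {
    "person1": [1, 0, 1, 1],
    "person2": [0, 0, 1, 0],
    "person3": [1, 1, 0, 1],
    "person4": [0, 1, 1, 1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a)) * math.sqrt(sum(y * y for y in dev_b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


# Hypothetical new person (not in the table): likes burgers and burritos only.
new_person = [1, 0, 0, 1]

similarities = {
    name: cosine_similarity(new_person, likes) for name, likes in people.items()
}
for name, score in sorted(similarities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Higher scores mean more similar tastes, so "nearest" here means "largest similarity" rather than "smallest distance".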

Now you have a way to find the K nearest neighbors and make some decision.
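
As a rough sketch of that last step, the snippet below ranks the four people from the table by cosine similarity and takes a majority vote over the "Likes my food" column. The function names, the query vectors, and the choice of `k` are assumptions for illustration, not part of the answer; a real system would need far more data and some thought about ties.

```python
import math
from collections import Counter

# Each entry: (likes for Burger, Pizza, Fries, Burritos), "Likes my food" label.
training = [
    ([1, 0, 1, 1], 1),  # person1
    ([0, 0, 1, 0], 0),  # person2
    ([1, 1, 0, 1], 1),  # person3
    ([0, 1, 1, 1], 0),  # person4
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, data, k=3):
    """Return the majority label among the k rows most similar to the query."""
    ranked = sorted(
        data,
        key=lambda row: cosine_similarity(query, row[0]),
        reverse=True,  # most similar first
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical new person: likes burgers and burritos only.
prediction = knn_predict([1, 0, 0, 1], training, k=3)
print("Likes my food:", prediction)
```

With binary labels, an odd `k` avoids tied votes.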

For more advanced information on this, look up "collaborative filtering" (but I'll warn you, it gets math-y).

挥剑断情 2024-07-22 18:54:59

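Well, "nearest" implies that you have some metric on which things can be more or less "distant". Quantifying "burger", "pizza", and "fries" isn't so much a KNN problem as a question of fundamental system modeling. If you have a system in which you are doing analysis where "burger", "pizza", and "fries" are terms, then the reason the system exists will determine how they are quantified. For instance, if you are trying to figure out how to get the best taste and the fewest calories for a given amount of money, then you know what your metrics are. (Of course, "best taste" is subjective, but that's another set of issues.)

It is not up to these terms to be inherently quantifiable and thereby tell you how to design your system of analysis; it is up to you to decide what you are trying to accomplish and to design your metrics from there.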
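
As one possible illustration of designing metrics from a goal, suppose the goal involves money and calories. Each food can then be described by a price and a calorie count, and distances follow from those numbers. All values below are made up for the example, and min-max scaling is only one assumed way to keep a single attribute from dominating the distance.

```python
import math

# Hypothetical attributes per food: (price in dollars, calories).
# These numbers are invented for illustration only.
foods = {
    "burger": (6.0, 550.0),
    "pizza": (8.0, 700.0),
    "fries": (3.0, 350.0),
}


def min_max_scale(items):
    """Rescale every attribute to the range [0, 1] across all items."""
    columns = list(zip(*items.values()))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return {
        name: tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(values, lows, highs)
        )
        for name, values in items.items()
    }


def euclidean(a, b):
    """Straight-line distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


scaled = min_max_scale(foods)
burger_to_pizza = euclidean(scaled["burger"], scaled["pizza"])
burger_to_fries = euclidean(scaled["burger"], scaled["fries"])
print(f"burger-pizza: {burger_to_pizza:.3f}, burger-fries: {burger_to_fries:.3f}")
```

A different goal would call for different attributes, and therefore different distances between the same three foods.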
