A simple/practical example of the fuzzy c-means algorithm

Posted on 2024-08-07 13:33:47

I am writing my master's thesis on dynamic keystroke authentication. To support the research, I am writing code to test different methods of feature extraction and feature matching.

My current simple approach checks whether the typed keycodes match the reference password keycodes, and whether the key-press times (dwell) and the key-to-key times (flight) match the reference times within a tolerance of +/- 100 ms. This is of course very limited, and I want to extend it with some sort of fuzzy c-means pattern matching.

For each key, the features are: keycode, dwell time, flight time (the first flight time is always 0).

Obviously the keycodes can be left out of the fuzzy algorithm because they have to match exactly. In this context, what would a practical implementation of fuzzy c-means look like?
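
For reference, the tolerance-based check described above could be sketched roughly as follows. This is only an illustration of the description, not the asker's actual code: the `Keystroke` class, the `matches_reference` function, and the field names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keystroke:
    """One key event; the names are hypothetical, not from the original post."""
    keycode: int
    dwell_ms: float   # how long the key was held down
    flight_ms: float  # time since the previous key; 0 for the first key


def matches_reference(reference, attempt, tolerance_ms=100.0):
    """Return True if keycodes match exactly and every timing is within tolerance."""
    if len(reference) != len(attempt):
        return False
    for ref, att in zip(reference, attempt):
        if ref.keycode != att.keycode:
            return False  # keycodes must be identical, no fuzziness here
        if abs(ref.dwell_ms - att.dwell_ms) > tolerance_ms:
            return False
        if abs(ref.flight_ms - att.flight_ms) > tolerance_ms:
            return False
    return True
```

The hard cutoff is what makes this approach limited: an attempt that is 99 ms off on every key passes, while one that is 101 ms off on a single key fails.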


Comments (2)

稀香 2024-08-14 13:33:47

Generally, you would do the following:

  1. Determine how many clusters you want (2? "Authentic" and "Fake"?)
  2. Determine what elements you want to cluster (individual keystrokes? login attempts?)
  3. Determine what your feature vectors will look like (dwell time, flight time?)
  4. Determine what distance metric you will use (how will you measure the distance of each sample from each cluster?)
  5. Create exemplar training data for each cluster type (what does an authentic login look like?)
  6. Run the FCM algorithm on the training data to generate the clusters
  7. To create the membership vector for any given login attempt sample, run it through the FCM algorithm using the clusters you found in step 6
  8. Use the resulting membership vector to determine (based on some threshold criteria) whether the login attempt is authentic
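
The steps above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not in the original answer: two clusters, one feature vector per login attempt of the form `[dwell_1, flight_1, dwell_2, flight_2]` for a two-key password, Euclidean distance, fuzzifier `m = 2`, a deterministic farthest-point initialization (FCM is more commonly started from random memberships), and a hypothetical threshold of 0.8. The timing values are made up for illustration only.

```python
import math


def memberships(sample, centers, m=2.0):
    """Fuzzy membership of one sample in each cluster; the values sum to 1."""
    dists = [math.dist(sample, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # A sample sitting exactly on a center belongs fully to that center.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def initial_centers(samples, n_clusters):
    """Assumed init: first sample, then repeatedly the sample farthest from all centers."""
    centers = [list(samples[0])]
    while len(centers) < n_clusters:
        farthest = max(samples, key=lambda s: min(math.dist(s, c) for c in centers))
        centers.append(list(farthest))
    return centers


def fuzzy_c_means(samples, n_clusters=2, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = initial_centers(samples, n_clusters)
    dims = len(samples[0])
    for _ in range(max_iter):
        u = [memberships(s, centers, m) for s in samples]
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * s[k] for w, s in zip(weights, samples)) / total
                 for k in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Step 5: made-up training timings in milliseconds, for illustration only.
authentic = [[95, 0, 110, 180], [100, 0, 105, 190], [90, 0, 115, 175]]
impostor = [[160, 0, 60, 320], [150, 0, 70, 300], [170, 0, 65, 310]]

# Step 6: find the cluster centers.
centers = fuzzy_c_means(authentic + impostor, n_clusters=2)

# Treat the cluster that a known authentic sample belongs to most as "authentic".
reference_u = memberships(authentic[0], centers)
authentic_idx = reference_u.index(max(reference_u))

# Steps 7 and 8: score a new attempt against the trained centers.
attempt = [97, 0, 108, 185]
u = memberships(attempt, centers)
is_authentic = u[authentic_idx] >= 0.8  # hypothetical threshold
```

Note that the memberships are relative and always sum to 1, so an attempt that is far from both centers still receives roughly equal membership in each rather than a low score everywhere.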

I'm not an expert, but this seems like an odd approach to determining whether a login attempt is authentic. I've seen FCM used for pattern recognition (e.g. which facial expression am I making?), which makes sense because you're dealing with several categories (e.g. happy, sad, angry) that each have defining characteristics. In your case, you really only have one category (authentic) with defining characteristics. Non-authentic keystrokes are simply "not like" authentic keystrokes, so they won't cluster.

Perhaps I am missing something?

野の 2024-08-14 13:33:47

I don't think you really want to do clustering here. You might want to do some proper fuzzy matching, though, instead of just allowing some delta on each value.

For clustering, you need many data points. Additionally, you'd need to know the proper number of means.

But what are these multiple objects meant to be? You have one data point for every keycode. You don't want to have the user type the password 100 times to see if he can do it consistently. And even then, what do you expect the clusters to be? You already know which keycode comes at which position; you don't want to find out what keycodes the user uses for his password...

Sorry, I really don't see any clustering here. The term "fuzzy" seems to have misled you to this clustering algorithm. Try "fuzzy logic" instead.
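
One possible reading of "fuzzy matching" here is to replace the hard +/- tolerance with a graded score per timing. The sketch below is an assumption, not something this answer specifies: the triangular membership function, the `closeness` and `match_score` names, the `(keycode, dwell_ms, flight_ms)` tuple layout, and averaging as the way to combine scores are all choices made for illustration.

```python
def closeness(observed_ms, reference_ms, spread_ms=100.0):
    """Triangular membership: 1.0 at the reference, falling linearly to 0.0 at +/- spread_ms."""
    return max(0.0, 1.0 - abs(observed_ms - reference_ms) / spread_ms)


def match_score(reference, attempt, spread_ms=100.0):
    """Average closeness over all dwell and flight times; 0.0 if the keycodes differ.

    Each keystroke is a (keycode, dwell_ms, flight_ms) tuple.
    """
    if len(reference) != len(attempt):
        return 0.0
    scores = []
    for (ref_code, ref_dwell, ref_flight), (code, dwell, flight) in zip(reference, attempt):
        if ref_code != code:
            return 0.0  # keycodes must match exactly
        scores.append(closeness(dwell, ref_dwell, spread_ms))
        scores.append(closeness(flight, ref_flight, spread_ms))
    return sum(scores) / len(scores) if scores else 0.0
```

The score can then be compared against a threshold chosen from real data. Taking the minimum of the per-timing scores instead of the mean would correspond to the usual fuzzy-logic AND, which is stricter because a single badly mistimed key drives the whole score to 0.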
