How do I connect MediaPipe Holistic keypoints to sign language gestures?

Posted 2025-01-10

I'm trying to make a sign language detection application. I'm using MediaPipe Holistic to extract key points and will use LSTM to train the model.

MediaPipe Holistic generates a total of 543 landmarks (33 pose landmarks, 468 face landmarks, and 21 hand landmarks per hand) for each sign language gesture.

Now, my question is: how can I connect the 543 landmarks to the gesture?
Is there a way to tell the computer that the keypoints it is extracting belong to a certain gesture?
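A common way to make that connection is to flatten each frame's landmarks into one fixed-length feature vector, stack the frames of a recording into a sequence, and attach the gesture label to the whole sequence; the label, not the landmarks themselves, is what tells the model which gesture it is seeing. A minimal sketch, assuming MediaPipe Holistic's documented output shapes (33 pose landmarks with x, y, z, visibility; 468 face and 2×21 hand landmarks with x, y, z) and using random arrays as stand-ins for real detections:

```python
import numpy as np

def flatten_frame(pose, face, left_hand, right_hand):
    """Flatten one frame's Holistic landmarks into a 1662-dim feature vector.

    pose:       (33, 4) array  - x, y, z, visibility
    face:       (468, 3) array - x, y, z
    left_hand:  (21, 3) array  - x, y, z (use zeros when the hand is not detected)
    right_hand: (21, 3) array
    """
    return np.concatenate([pose.ravel(), face.ravel(),
                           left_hand.ravel(), right_hand.ravel()])

# Stand-in for 30 frames of one recorded gesture.
rng = np.random.default_rng(0)
frames = [flatten_frame(rng.random((33, 4)), rng.random((468, 3)),
                        rng.random((21, 3)), rng.random((21, 3)))
          for _ in range(30)]

sequence = np.stack(frames)   # shape (30, 1662): one LSTM training sample
label = 2                     # e.g. index of "hello" in your list of gestures
print(sequence.shape)         # (30, 1662)
```

Many labeled `(sequence, label)` pairs per gesture then form the training set for the LSTM.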


Comments (1)

合久必婚 2025-01-17 03:26:07


The answer to your question can be found in Gabriel Guerin's excellent article and its accompanying code samples. The code sample only looks at the hand landmarks. I'd pretty much have to paste the whole article to answer the question completely, but I'll give a high-level overview. Convert the landmarks into feature vectors. Build a model consisting of several frames, with each frame containing the vectors of the hand. Use Dynamic Time Warping (DTW) to compare a given sign with a small set of known signs. Use a threshold of similarity to the samples to offer a prediction of the sign.

This technique works well if there is only a small number of samples to recognize. It would break down if a full sign language vocabulary were used. Deep learning with a classifier would be a better technique for a large vocabulary. Even that would probably break down, because a real sign language is not a collection of signs with a one-to-one correspondence to spoken words. Sign languages have complex structures that can have different word orders and prepositions that are expressed only by the direction the signer is facing. I would be very interested in any project that can recognize more than a few signs. I believe the Holistic model will make it possible, but it will require a large corpus of samples and a way to interpret the complex grammars.
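The DTW-plus-threshold idea above can be sketched in a few lines. This is a toy illustration only: the template sequences, the 1-D features, and the threshold value are made up for the example; in practice each template would be a recorded sequence of the hand landmark vectors.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature vectors.

    a: (n, d) array, b: (m, d) array. Classic O(n*m) dynamic program.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# One recorded template per known sign (toy 1-D features for illustration).
templates = {
    "hello":     np.array([[0.1], [0.5], [0.9], [0.5], [0.1]]),
    "thank_you": np.array([[0.9], [0.7], [0.5], [0.3], [0.1]]),
}

query = np.array([[0.1], [0.4], [0.9], [0.6], [0.1]])  # unknown sign

distances = {name: dtw_distance(query, t) for name, t in templates.items()}
best = min(distances, key=distances.get)
threshold = 1.0  # reject a query that is far from every template
prediction = best if distances[best] < threshold else "unknown"
print(prediction)
```

Because DTW compares the query against every stored template, cost grows linearly with vocabulary size, which is one reason the approach stops scaling beyond a small set of signs.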
