Kinect Gesture Analysis
I'm building a Kinect application using the official Kinect SDK.
The results I want:
1) Detect that the body has been waving for 5 seconds, and do something if it has.
2) Detect that the user has been leaning on one leg for 5 seconds, and do something if they have.
Does anyone know how to do this? I'm working in a WPF application.
I'd like to see some examples; I'm rather new to Kinect.
Thanks in advance for all your help!
The Kinect provides you with the skeletons it's tracking; you have to do the rest. Basically, you need to create a definition for each gesture you want and run it against the skeletons every time the SkeletonFrameReady event fires. This isn't easy.
Defining Gestures
Defining the gestures can be surprisingly difficult. The simplest (easiest) gestures are ones that happen at a single point in time, and therefore don't rely on past locations of the limbs. For example, if you want to detect when the user has their hand raised above their head, this can be checked on every individual frame. More complicated gestures need to take a period of time into account. For your waving gesture, you won't be able to tell from a single frame whether a person is waving or just holding their hand up in front of them.
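To make the single-frame case concrete, here is a minimal sketch of the hand-above-head check. The joint names and simple (x, y, z) tuples are stand-ins for the SDK's skeleton data (which in the real C# API lives on the SkeletonData class); only the comparison logic is the point.

```python
def hand_above_head(joints):
    """joints maps joint names to (x, y, z) positions; y points up."""
    head_y = joints["Head"][1]
    return joints["HandLeft"][1] > head_y or joints["HandRight"][1] > head_y

# Example frame: right hand raised above the head.
frame = {"Head": (0.0, 1.6, 2.0),
         "HandLeft": (-0.3, 1.0, 2.0),
         "HandRight": (0.2, 1.8, 2.0)}
print(hand_above_head(frame))  # True
```

Because this needs only the current frame, it can run inside the frame-ready handler with no history at all.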
So now you need to be able to store relevant information from the past, but what information is relevant? Should you keep a store of the last 30 frames and run an algorithm against that? 30 frames only gets you a second's worth of information; perhaps 60 frames? Or, for your 5 seconds, 150 frames? Humans don't move that fast, so maybe you could use every fifth frame, which would bring your 5 seconds back down to 30 frames. A better idea would be to pick and choose the relevant information out of the frames. For a waving gesture, the hand's current velocity, how long it's been moving, how far it's moved, etc. could all be useful information.
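A sampled, fixed-size history like the one described can be sketched with a bounded deque. This is not SDK code; "frame" here is just whatever per-frame summary you decide to keep (a hand position, a velocity, etc.), and the 30 fps rate matches the Kinect's skeleton stream.

```python
from collections import deque

FPS = 30            # Kinect skeleton stream rate
SAMPLE_EVERY = 5    # keep every fifth frame, as suggested above
WINDOW_SECONDS = 5

# The deque silently drops the oldest sample once the window is full.
history = deque(maxlen=WINDOW_SECONDS * FPS // SAMPLE_EVERY)

def on_frame(frame_index, hand_position):
    """Called once per skeleton frame; stores every fifth sample."""
    if frame_index % SAMPLE_EVERY == 0:
        history.append(hand_position)

for i in range(300):                  # simulate 10 seconds of frames
    on_frame(i, (0.1 * i, 1.0, 2.0))

print(len(history))  # 30 -- the window never grows past its maximum
```

The `maxlen` bound means old data falls out automatically, so the per-frame cost stays constant no matter how long the application runs.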
After you've figured out how to get and store all the information pertaining to your gesture, how do you turn those numbers into a definition? Waving could require a certain minimum speed, or a direction (left/right instead of up/down), or a duration. However, this duration isn't the 5 second duration you're interested in. This duration is the absolute minimum required to assume that the user is waving. As mentioned above, you can't determine a wave from one frame. You shouldn't determine a wave from 2, or 3, or 5, because that's just not enough time. If my hand twitches for a fraction of a second, would you consider that a wave? There's probably a sweet spot where most people would agree that a left to right motion constitutes a wave, but I certainly don't know it well enough to define it in an algorithm.
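One hedged way to turn those stored numbers into a per-frame predicate is below. The 0.5 m/s speed threshold is an illustrative guess, not a tuned value from the answer, and a real definition would combine this with the duration checks discussed above.

```python
MIN_SPEED = 0.5  # metres per second, horizontal -- an arbitrary starting point

def looks_like_waving(prev_pos, curr_pos, dt):
    """True if the hand moved mostly sideways, fast enough, this frame.

    prev_pos/curr_pos are (x, y, z) hand positions; dt is seconds between frames.
    """
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    horizontal = abs(dx) > abs(dy)           # left/right, not up/down
    fast_enough = abs(dx) / dt >= MIN_SPEED
    return horizontal and fast_enough

# Hand moved 3 cm sideways in one 30 fps frame (0.9 m/s): counts as waving motion.
print(looks_like_waving((0.0, 1.2, 2.0), (0.03, 1.21, 2.0), 1 / 30))  # True
```

Note this classifies a single frame only; as the text says, a handful of positive frames is still not enough to call the motion a wave.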
There's another problem with requiring a user to do a certain gesture for a period of time. Chances are, not every frame in that five seconds will appear to be a wave, regardless of how well you write the definition. Whereas you can easily determine whether someone held their hand over their head for five seconds (because it can be determined on a single-frame basis), it's much harder to do that for complicated gestures. And while waving isn't that complicated, it still shows this problem. As your hand changes direction at either side of a wave, it stops moving for a fraction of a second. Are you still waving then? If you answered yes, wave more slowly so you pause a little more at either side. Would that pause still be considered a wave? Chances are, at some point in that five-second gesture, the definition will fail to detect a wave. So now you need to build some leniency into the gesture duration: if the waving gesture occurred for 95% of the last five seconds, is that good enough? 90%? 80%?
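The leniency idea can be sketched directly: count the gesture as held if at least some fraction of the recent frame-level detections were positive. The 80% figure is an arbitrary starting point to tune, exactly as the question marks above suggest.

```python
LENIENCY = 0.80  # fraction of the window that must look like the gesture

def gesture_held(detections):
    """detections: list of booleans, one per stored frame in the window."""
    if not detections:
        return False
    return sum(detections) / len(detections) >= LENIENCY

window = [True] * 9 + [False]   # wave detected in 9 of 10 recent samples
print(gesture_held(window))     # True (90% >= 80%)
```

This tolerates the brief pauses at either side of a wave without abandoning the five-second requirement entirely.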
The point I'm trying to make here is there's no easy way to do gesture recognition. You have to think through the gesture and determine some kind of definition that will turn a bunch of joint positions (the skeleton data) into a gesture. You'll need to keep track of relevant data from past frames, but realize that the gesture definition likely won't be perfect.
Consider the Users
So now that I've said why the five second wave would be difficult to detect, allow me to at least give my thoughts on how to do it: don't. You shouldn't force users to repeat a motion based gesture for a set period of time (the five second wave). It is surprisingly tiring and just not what people expect/want from computers. Point and click is instantaneous; as soon as we click, we expect a response. No one wants to have to hold a click down for five seconds before they can open Minesweeper. Repeating a gesture over a period of time is okay if it's continually executing some action, like using a gesture to cycle through a list - the user will understand that they must continue doing the gesture to move farther through the list. This even makes the gesture easier to detect, because instead of needing information for the last 5 seconds, you just need enough information to know if the user is doing the gesture right now.
If you want the user to hold a gesture for a set amount of time, make it a stationary gesture (holding your hand at some position for x seconds is a lot easier than waving). It's also a very good idea to give some visual feedback, to say that the timer has started. If a user screws up the gesture (wrong hand, wrong place, etc) and ends up standing there for 5 or 10 seconds waiting for something to happen, they won't be happy, but that's not really part of this question.
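A rough sketch of timing such a stationary gesture follows: start a timer when the pose first matches, reset it when the pose breaks, and report progress so the UI can draw feedback. The pose check itself is stubbed out (it would be a single-frame test like the hand-over-head comparison), and the class name is my own, not an SDK type.

```python
import time

HOLD_SECONDS = 5.0

class HoldTimer:
    def __init__(self):
        self.started_at = None

    def update(self, pose_matches, now=None):
        """Call once per frame. Returns progress in [0, 1]; 1.0 = hold completed."""
        now = time.monotonic() if now is None else now
        if not pose_matches:
            self.started_at = None     # pose broken: reset the timer
            return 0.0
        if self.started_at is None:
            self.started_at = now      # pose just started matching
        return min((now - self.started_at) / HOLD_SECONDS, 1.0)

timer = HoldTimer()
print(timer.update(True, now=0.0))   # 0.0 -- timer just started
print(timer.update(True, now=2.5))   # 0.5 -- halfway; drive a progress bar with this
print(timer.update(False, now=3.0))  # 0.0 -- pose broken, timer resets
```

The returned progress value is exactly what you would bind a WPF progress indicator to, giving the user the visual feedback mentioned above.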
Starting with Kinect Gestures
Start small... really small. First, make sure you know your way around the SkeletonData class. There are 20 joints tracked on each skeleton, and each has a TrackingState. This tracking state shows whether the Kinect can actually see the joint (Tracked), is figuring out the joint's position from the rest of the skeleton (Inferred), or has entirely abandoned trying to find the joint (NotTracked). These states are important. You don't want to think the user is standing on one leg simply because the Kinect doesn't see the other leg and is reporting a bogus position for it. Each joint has a position, which is how you know where the user is standing... piece by piece. Become familiar with the coordinate system.
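Here is a sketch of honouring per-joint tracking states before trusting positions. The state names mirror the SDK's Tracked / Inferred / NotTracked, but the types here are Python stand-ins, not the real C# classes.

```python
from enum import Enum

class TrackingState(Enum):
    NOT_TRACKED = 0
    INFERRED = 1
    TRACKED = 2

def reliable_joints(joints):
    """Keep only joints the sensor actually saw (drop inferred guesses too)."""
    return {name: (pos, state) for name, (pos, state) in joints.items()
            if state is TrackingState.TRACKED}

skeleton = {
    "FootLeft":  ((0.1, 0.0, 2.0), TrackingState.TRACKED),
    "FootRight": ((0.0, 0.0, 0.0), TrackingState.NOT_TRACKED),  # bogus position
}
print(list(reliable_joints(skeleton)))  # ['FootLeft']
```

Whether you also accept Inferred joints is a judgment call per gesture; for the one-leg-stance example above, an inferred leg position is precisely what you want to ignore.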
After you know the basics of how the skeleton data is reported, try some simple gestures. Print a message to the screen when the user raises a hand above their head. This only requires comparing each hand to the Head joint and seeing if either hand is higher than the head in the coordinate plane. After you get that working, move up to something more complicated. I'd suggest trying a swiping motion (hand in front of the body, moving either right to left or left to right some minimum distance). This requires information from past frames, so you'll have to think through what information to store. If you can get that working, you could try stringing a series of swiping gestures together in a small amount of time and interpreting that as a wave.
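A right-to-left swipe check over stored hand positions might look like this: the hand must travel a minimum horizontal distance without backtracking. The 0.3 m threshold and the strict no-backtracking rule are illustrative choices, not values from the answer.

```python
MIN_SWIPE_DISTANCE = 0.3  # metres -- an illustrative threshold to tune

def is_left_swipe(hand_xs):
    """hand_xs: recent hand x-coordinates, oldest first (e.g. from the history deque)."""
    if len(hand_xs) < 2:
        return False
    # Every step must move leftward (decreasing x), i.e. no backtracking...
    monotonic = all(b <= a for a, b in zip(hand_xs, hand_xs[1:]))
    # ...and the total travel must exceed the minimum distance.
    return monotonic and (hand_xs[0] - hand_xs[-1]) >= MIN_SWIPE_DISTANCE

print(is_left_swipe([0.4, 0.3, 0.15, 0.0]))  # True: steady 0.4 m right-to-left
print(is_left_swipe([0.4, 0.5, 0.3, 0.2]))   # False: the hand moved back right
```

Detecting alternating left and right swipes in quick succession, with this as the building block, is one way to approach the wave.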
tl;dr: Gestures are hard. Start small, build your way up. Don't make users do repetitive motions for a single action, it's tiring and annoying. Include visual feedback for duration based gestures. Read the rest of this post.
The Kinect SDK helps you get the coordinates of different joints. A gesture is nothing but a change in the positions of a set of joints over a period of time.
To recognize gestures, you have to store the coordinates for a period of time and iterate through them to see if they obey the rules for a particular gesture (for example: the right hand always moves upwards).
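The "right hand always moves upwards" rule can be sketched as a simple pass over the stored coordinates. The `min_rise` threshold is my own illustrative addition, not from the post.

```python
def right_hand_raised(y_history, min_rise=0.2):
    """y_history: the right hand's y-coordinates over time, oldest first.

    min_rise (metres) filters out tiny twitches; 0.2 is an arbitrary guess.
    """
    if len(y_history) < 2:
        return False
    # The rule from the text: the hand's height must increase at every step...
    always_up = all(b > a for a, b in zip(y_history, y_history[1:]))
    # ...and rise far enough overall to count as a deliberate gesture.
    return always_up and (y_history[-1] - y_history[0]) >= min_rise

print(right_hand_raised([0.9, 1.0, 1.15, 1.3]))  # True: steady 0.4 m rise
print(right_hand_raised([0.9, 1.1, 1.0, 1.3]))   # False: the hand dipped midway
```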
For more details, check out my blog post on the topic:
http://tinyurl.com/89o7sf5