Gesture recognition with Kinect and Python: HMM learning

Posted on 2024-12-21 08:29:11


I want to do gesture recognition in Python with a Kinect.

After reading up on some theory, I think one of the best methods is unsupervised learning with Hidden Markov Models (HMMs) (Baum-Welch or some other EM method) on some known gesture data, to obtain a set of trained HMMs (one for each gesture that I want to recognize).

I would then do the recognition by matching the maximum log-likelihood (with Viterbi?) of the observed data against the HMMs in the trained set.

For example, I have data (x, y, z coordinates of the right hand) recorded with the Kinect device for some gestures (saying hello, throwing a punch, drawing a circle with the hand), and I do some training:

# training
known_datas = [
    (load_data('punch.mat'),                'PUNCH'),
    (load_data('say_hello.mat'),            'HELLO'),
    (load_data('do_circle_with_hands.mat'), 'CIRCLE'),
]

gestures = []  # ordered (name, model) pairs, so we can index into it later
for x, name in known_datas:
    m = HMM()
    m.baumWelch(x)  # fit this gesture's HMM with Baum-Welch (EM)
    gestures.append((name, m))

Then I perform recognition on newly observed data: I compute the log-likelihood under each trained HMM and choose the previously saved gesture whose model has the maximum log-likelihood:

# recognition
observed = load_data('new_data.mat')
logliks = [m.viterbi(observed) for name, m in gestures]

# pick the gesture whose model gave the highest score
print('observed data is', gestures[logliks.index(max(logliks))][0])
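For reference, the same pipeline could be written against a real HMM library. Below is a minimal sketch assuming hmmlearn's GaussianHMM; the library choice, the number of states, and the helper names are illustrative assumptions, not part of the original post. Note that score() returns the forward-algorithm log-likelihood rather than a Viterbi score:

# minimal sketch assuming hmmlearn (pip install hmmlearn); n_states and the
# helper names are illustrative assumptions, not from the original post
import numpy as np
from hmmlearn import hmm

def train_models(files_and_names, n_states=5):
    models = []
    for path, name in files_and_names:
        # load_data is the poster's own .mat-loading helper;
        # expected shape: (n_frames, 3) x/y/z samples
        X = load_data(path)
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type='diag', n_iter=100)
        m.fit(X)  # Baum-Welch (EM) training
        models.append((name, m))
    return models

def recognize(models, X):
    # score() is the forward-algorithm log-likelihood log P(X | model)
    logliks = [m.score(X) for _, m in models]
    return models[int(np.argmax(logliks))][0]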

My questions are:

  • Is this something totally stupid?
  • How many training samples are needed for a real case?
  • How many states for each HMM?
  • Is it possible to do it in real time?


Comments (2)

零度° 2024-12-28 08:29:11


First of all: this is a very specialized question; you'll need a machine learning expert here. Unfortunately there's no ML equivalent among the Stack Exchange sites yet ... maybe there'll be one some day. :)

I guess your approach is valid, just some remarks:

  • The HMM class which you just instantiate with HMM() needs to be crafted so that the HMM's structure can represent something like a gesture. HMMs have states and transitions between them, so how would you define an HMM for a gesture? I'm positive that this is possible (and even think it's a good approach), but it requires some thinking. Maybe the states are just the corners of a 3D cube, and for each observed point of your gesture you pick the closest corner of this cube; the Baum-Welch algorithm can then approximate the transition likelihoods from your training data. You may need a more fine-grained state model, though, perhaps an n * n * n voxel grid (see the quantization sketch after this list).

  • The Viterbi algorithm gives you not the likelihood of a model but the most likely sequence of states for a given observation sequence. IIRC you would pick the forward algorithm to get the probability of a given observation sequence under a given model (a sketch follows at the end of this answer).
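To make the first remark concrete, here is one way the quantization could look — a sketch in which the grid size and workspace bounds are assumed values, not recommendations:

# sketch: map (x, y, z) hand positions onto an n*n*n voxel grid so that a
# continuous Kinect trajectory becomes a discrete observation sequence;
# n, lo and hi are assumed values, not recommendations
import numpy as np

def quantize(points, n=4, lo=-1.0, hi=1.0):
    points = np.asarray(points, dtype=float)
    # bin each coordinate into one of n cells inside [lo, hi)
    idx = np.clip(((points - lo) / (hi - lo) * n).astype(int), 0, n - 1)
    # flatten the 3D cell index into a single symbol in [0, n**3)
    return idx[:, 0] * n * n + idx[:, 1] * n + idx[:, 2]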

I assume that, given a well-trained and not too complex HMM, you should be able to recognize gestures in real-time, but that's just an educated guess. :)
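And to make the second remark concrete, here is the textbook scaled forward recursion for a discrete HMM — a generic sketch, not any particular library's API:

# sketch: scaled forward algorithm for a discrete HMM
# pi: (N,) initial distribution, A: (N, N) transition matrix,
# B: (N, M) emission matrix, obs: integer symbol sequence
import numpy as np

def forward_loglik(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]           # alpha_1(i) = pi_i * b_i(o_1)
    c = alpha.sum()                     # scale factor for numeric stability
    alpha /= c
    loglik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # recursion: transition, then emit
        c = alpha.sum()
        alpha /= c
        loglik += np.log(c)             # log P(obs) is the sum of log scales
    return loglik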

梦萦几度 2024-12-28 08:29:11


HMM-based gesture recognition has already been applied successfully in many variations: http://scholar.google.co.il/scholar?hl=en&q=HMM+Gesture+Recognition.

Remarks:
