Accurately measuring relative distance between a set of fiducials (augmented reality application)
Let's say I have a set of 5 markers. I am trying to find the relative distances between the markers using an augmented reality framework such as ARToolkit. In my camera feed, the first 20 frames show me only the first 2 markers, so I can work out the transformation between those 2 markers. The second 20 frames show me only the 2nd and 3rd markers, and so on. The last 20 frames show me the 5th and 1st markers. I want to build up a 3D map of the positions of all 5 markers.
My question is, knowing that there will be inaccuracies with the distances due to low quality of the video feed, how do I minimise the inaccuracies given all the information I have gathered?
My naive approach would be to use the first marker as a base point, take the mean of the transformations from the first 20 frames to place the 2nd marker, and so forth for the 3rd and 4th. The 5th marker would be placed between the 4th and 1st by putting it midway between the positions implied by the mean 4th-to-5th and 5th-to-1st transformations. I feel this approach is biased towards the placement of the first marker, though, and it doesn't take into account the camera seeing more than 2 markers per frame.
Ultimately I want my system to be able to work out the map of x markers. In any given frame up to x markers can appear, and there are non-systematic errors due to the image quality.
Any help regarding the correct approach to this problem would be greatly appreciated.
Edit:
More information regarding the problem:
Let's say the real-world map is as follows:
Let's say I get 100 readings for each of the transformations between the points, as represented by the arrows in the image. The real values are written above the arrows.
The values I obtain have some error (assumed to follow a Gaussian distribution about the actual value). For instance, one of the readings obtained for marker 1 to 2 could be x:9.8 y:0.09. Given that I have all these readings, how do I estimate the map? The result should ideally be as close to the real values as possible.
My naive approach has the following problem. If the average of the transforms from 1 to 2 is slightly off, the placement of 3 can be off even though the reading of 2 to 3 is very accurate. This problem is shown below:
The greens are the actual values; the blacks are the calculated values. The average transform of 1 to 2 is x:10 y:2.
Comments (1)
You can use a least-squares method to find the transformation that gives the best fit to all your data. If all you want is the distance between the markers, this is just the average of the distances measured.
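As a rough illustration of the least-squares idea, here is a minimal sketch in Python with NumPy. It assumes translation-only 2D readings that are all expressed in one common orientation (rotation is ignored), and it pins marker 0 at the origin so the solution is unique. The function name `estimate_marker_map`, the `(i, j, offset)` reading format, and the example positions are all made up for illustration; none of this is ARToolkit API.

```python
import numpy as np

def estimate_marker_map(n_markers, measurements, dim=2):
    """Least-squares marker positions from pairwise offset readings.

    measurements: iterable of (i, j, offset), meaning "position of marker j
    minus position of marker i was measured as offset".
    Marker 0 is fixed at the origin (assumption, to remove the free offset).
    """
    measurements = list(measurements)
    # One row per reading; unknowns are the positions of markers 1..n-1.
    A = np.zeros((len(measurements), n_markers - 1))
    b = np.zeros((len(measurements), dim))
    for row, (i, j, offset) in enumerate(measurements):
        if j > 0:
            A[row, j - 1] = 1.0
        if i > 0:
            A[row, i - 1] = -1.0
        b[row] = offset
    # Solves all coordinates at once; every reading gets equal weight.
    positions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.vstack([np.zeros(dim), positions])

# Hypothetical ground truth and simulated readings (not from the question).
rng = np.random.default_rng(0)
true_pos = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [-5, 5]], float)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # includes the loop closure
readings = [
    (i, j, true_pos[j] - true_pos[i] + rng.normal(0.0, 0.5, size=2))
    for i, j in edges
    for _ in range(100)
]
est = estimate_marker_map(5, readings)
print(np.round(est, 2))
```

Because the 5th-to-1st readings enter the same system as all the others, the loop-closure error is spread around the whole loop instead of piling up on the last marker. Frames that show more than two markers simply contribute more rows. If some readings are known to be noisier than others, the corresponding rows of `A` and `b` could be scaled by the inverse of their standard deviation.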
Assuming that your marker positions are fixed (e.g., to a fixed rigid body), and you want their relative position, then you can simply record their positions and average them. If there is a potential for confusing one marker with another, you can track them from frame to frame, and use the continuity of each marker location between its two periods to confirm its identity.
If you expect your rigid body to be moving (or if the body is not rigid, and so forth), then your problem is significantly harder. Two markers at a time is not sufficient to fix the position of a rigid body (which requires three). However, note that, at each transition, you have the location of the old marker, the new marker, and the continuous marker, at almost the same time. If you already have an expected location on the body for each of your markers, this should provide a good estimate of a rigid pose every 20 frames.
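One common way to get a rigid pose from three or more markers with known body-frame locations is SVD-based (Kabsch) alignment. The sketch below is an assumption about how this step could be done, not something taken from the question or from ARToolkit; `rigid_pose` and the example coordinates are hypothetical.

```python
import numpy as np

def rigid_pose(body_points, observed_points):
    """Return (R, t) minimising sum ||R @ p + t - q||^2 over paired points.

    body_points: expected marker locations in the body frame, shape (n, d).
    observed_points: the same markers as measured by the camera, shape (n, d).
    Needs at least three non-collinear points for a unique 3D pose.
    """
    P = np.asarray(body_points, float)
    Q = np.asarray(observed_points, float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0] * (P.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Hypothetical example: old, continuous and new marker at a transition.
body = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0]], float)
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([1.0, -2.0, 5.0])
observed = body @ R_true.T + t_true
R_est, t_est = rigid_pose(body, observed)
```

With noisy observations the same call returns the best-fit pose in the least-squares sense, so adding a fourth or fifth visible marker improves the estimate without changing the code.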
In general, if your body is moving, best performance will require some kind of model for its dynamics, which should be used to track its pose over time. Given a dynamic model, you can use a Kalman filter to do the tracking; Kalman filters are well-adapted to integrating the kind of data you describe.
By including the locations of your markers as part of the Kalman state vector, you may be able to deduce their relative locations purely from sensor data (which appears to be your goal), rather than requiring this information a priori. If you want to be able to handle an arbitrary number of markers efficiently, you may need to come up with some clever mutation of the usual methods; your problem seems designed to avoid solution by conventional decomposition methods such as sequential Kalman filtering.
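To make the "marker locations in the state vector" idea concrete, here is a minimal sketch for the simplest case: static markers, translation-only 2D readings in a common orientation, and marker 0 fixed at the origin. Under those assumptions there is no dynamic model, so the filter has only an update step; a moving body would additionally need pose states and a predict step. The class name `MarkerMapFilter` and the noise values are illustrative assumptions.

```python
import numpy as np

class MarkerMapFilter:
    """Kalman filter whose state is the stacked positions of markers 1..n-1."""

    def __init__(self, n_markers, dim=2, prior_var=1e6, meas_var=0.25):
        self.dim = dim
        n = (n_markers - 1) * dim
        self.x = np.zeros(n)              # state estimate
        self.P = np.eye(n) * prior_var    # large prior: positions unknown
        self.R = np.eye(dim) * meas_var   # assumed reading noise covariance

    def _H(self, i, j):
        # Measurement model: reading = position[j] - position[i].
        H = np.zeros((self.dim, self.x.size))
        for marker, sign in ((j, 1.0), (i, -1.0)):
            if marker > 0:  # marker 0 is the fixed origin, not in the state
                s = (marker - 1) * self.dim
                H[:, s:s + self.dim] = sign * np.eye(self.dim)
        return H

    def update(self, i, j, offset):
        """Fuse one reading of (marker j minus marker i)."""
        H = self._H(i, j)
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(offset, float) - H @ self.x)
        self.P = (np.eye(self.x.size) - K @ H) @ self.P

    def positions(self):
        return np.vstack([np.zeros(self.dim), self.x.reshape(-1, self.dim)])

# Hypothetical map; readings arrive pair by pair, as in the video feed.
true_pos = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [-5, 5]], float)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
rng = np.random.default_rng(1)
kf = MarkerMapFilter(5)
for i, j in edges:
    for _ in range(20):
        kf.update(i, j, true_pos[j] - true_pos[i] + rng.normal(0.0, 0.5, 2))
est_kf = kf.positions()
```

The covariance `kf.P` keeps the correlations between markers, which is what lets the final 5th-to-1st readings pull the earlier markers back into place. Since the state grows with the number of markers, each update costs roughly quadratic time in the state size, which is the efficiency concern raised above.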
Edit, as per the comments below:
If your markers yield a full 3D pose (instead of just a 3D position), the additional data will make it easier to maintain accurate information about the object you are tracking. However, the recommendations above still apply:
New points that come to mind: