跟踪结果是否已使用哈希表计算出来
我的应用程序将在密集矩阵上执行大量矩阵运算(例如加/乘)。 我想缓存唯一的结果以避免重复计算。
密集矩阵:
typdef struct denseMatrix{
int m;
int n;
double **d; // actual matrix
multiplyTable **entry; // key & result
} dns;
表项:
typedef struct multiplyTable{
dns *rightOperand; // key
dns *result; // value
} multiplyTable; // or something like that
dns *A, *B, *C, *D...; // allocated internally
C = mult(A,B); //may be called many many times.
在这种情况下,mult 会将一个项(操作数,结果)对添加到表中
add(A->entry, B, C); //B is the right operand and C is the result
稍后,如果再次调用 D = mult(A, B),则 search(A->entry,B ) 将检索 C。另一方面,如果特定操作数不在列表中,则它将与指向结果矩阵的指针一起添加。
我以前从未做过这样的事情,我什至不确定这是否是解决问题的方法。 根据我有限的理解,哈希表可以用来实现这样的东西。
我的实际问题包括: (a) 哈希表首先是解决该问题的合适方法吗? 他们是否允许指针地址作为键和值?
(b) 将“哈希表”保留为结构中的“字段”是否有意义? 这样,我已经有了左操作数,我只需要在乘法表中搜索右操作数即可。 或者,是否应该有一个独立的表,以左右操作数作为键?
(c) 我是否为加法/乘法等创建单独的表,或者应该有一个包含操作数和运算符的表?
(d) 跟踪所有创建的对象以便可以适当地释放这些对象的最佳方法是什么?
(e) 什么公开可用的库(c 中)适合实现这样的事情?
我正在寻求有关以下方面的意见/建议:(a) 解决问题的替代方法,以及 (b) 此类替代方案的优点/缺点。
最后,我发现这个论坛非常有帮助,并想表达我的谢意。 ++谢谢。
My application would perform a large number of matrix operations (e.g., add/ multiply) on dense matrices. I would like to cache unique results in order to avoid duplicate computations.
Dense matrix:
typdef struct denseMatrix{
int m;
int n;
double **d; // actual matrix
multiplyTable **entry; // key & result
} dns;
Table-entry:
typedef struct multiplyTable{
dns *rightOperand; // key
dns *result; // value
} multiplyTable; // or something like that
dns *A, *B, *C, *D...; // allocated internally
C = mult(A,B); //may be called many many times.
In this case, mult would add an entry (operand, result) pair to the table
add(A->entry, B, C); //B is the right operand and C is the result
Later if D = mult(A, B) were to be called again, search(A->entry,B) would retrieve C. If on the other hand, the specific operand is not in the list, it would be added along with the pointer to the result matrix.
I have never done anything like this before and I am not even sure if this is the way to approach the problem. From my limited understanding, hash-tables may be used to implement something like this.
Among the practical questions I have are:
(a) Is hash-table the appropriate solution to the problem in the first place? Do they allow pointer addresses as keys and values??
(b) Does it make sense to keep the "hash-table" as a "field" in the struct? That way, I already have the left operand, I just need to search for the rightOperand in the multiplication table. Or, should there be an independent table with both the left and right operands as keys?
(c) Do I create separate tables for addition/multiplication etc., or should there be a single table, with operands and operators?
(d) Wht would be the best way of keeping track of all the objects that are created so that these could be freed appropriately??
(e) What publicly available library (in c) would be suitable for implementing something like this?
I am seeking input/suggestions regarding (a) alternative ways in which the problem could be approached, and (b) advantages/disadvantages of such alternatives.
Lastly, I have found this forum to be immensely helpful and wanted to express my gratitude. ++Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
首先回答简单的部分:对于矩阵运算的 C++ 库,请查看 newmat< /a> 内置了大量功能,并且性能方面非常高效。
对于构建哈希以加速计算的特定情况 - 缓存只有在您要在非常有限的矩阵集上执行操作时才值得。 要为矩阵构建唯一的哈希值,您需要访问每个条目 - 并根据每个条目的位置和值计算哈希值。 更糟糕的是,矩阵并不总是可交换的,例如 AB != BA(特殊情况除外)。
这意味着您的缓存必须为每个特定计算存储一个条目。 因此,除非您处理的输入矩阵范围非常小,否则保存所有结果的内存成本将是巨大的。
缓存可以加快结果的速度,但仅限于非常有限的情况。
鉴于您也征求了有关库的建议,这听起来像是过早优化的情况。 我会在没有缓存的情况下实现您的程序,对其进行性能分析,并且如果您发现数组算术中存在性能瓶颈,然后考虑优化数字运算的方法。
编辑:关于计算哈希:如果您有一个 n×m 矩阵 X,那么计算该矩阵的哈希至少与操作 RXC 一样复杂,其中 R 是行向量 [1,..,n],C 是列向量 [1,..,m]。 我还没有计算出最佳回报,但对于 2x2、3x3 数量级的非常小的矩阵,进行原始计算将比计算哈希更便宜。
Answering the easy part first: For a C++ library of matrix operations have a look at newmat which has a large range of functionality built-in, and performance-wise is pretty efficient.
For your specific case of building a hash to speed up computations - caching is only worthwhile unless you are going to be performing operations on a very limited set of matrices. To build a unique hash for the matrix you will need to visit every entry - and calculate the hash based on each entry's location and value. Worse still, matrices are not always commutative, e.g. AB != BA except in special cases.
This means your cache would have to store one entry for each specific calculation. So unless you're dealing with a very small range of input matrices the memory cost of holding all the results will be huge.
Caching would speed up the results, but only in a very limited set of circumstances.
Given that you've asked for advice on a library as well, this sounds like a case of premature optimisation. I would implement your program without caching, performance profile it, and if you find a performance bottleneck in the array arithmetic, then consider ways to optimise the number crunching.
Edit: on calculating the hash: if you have a n-by-m matrix X, then calculating the hash for that one matrix is at least as complicated as the operation RXC where R is the row vector [1,..,n] and C is the column vector [1,..,m]. I haven't worked out the optimal payoff, but for really small matrices on the order of 2x2, 3x3 doing the raw calc is going to be cheaper than calculating the hash.
你必须非常小心哈希值。 如果发生冲突(不同原始值具有相同的哈希值),您可能会得到错误的结果。
您确定计算矩阵的哈希值比执行实际操作更有效吗(显然这完全取决于这些操作的数量/复杂性)
第二个关注点 - 您还没有提及有关缓存逐出策略的任何内容。 您打算只添加到哈希表而不删除吗? 根据不同矩阵的数量,您可能会耗尽内存......
You have to be very careful with hashes. If you have a collision (same hash value for different original values) you may end up with wrong results.
Are you sure that calculating hash of a matrix will be much more efficient than performing the actual operations (it all depends on the number/complexity of these operations obviously)
Second concern - you haven't said anything about your cache eviction policy. Are you going to just add to the hash table without removing? Depending on the number of different matrices you potentially may run out of memory...
此功能称为记忆 - 请参阅 wikipedia 文章了解详细信息。
本文还提到了一些可以为您提供帮助的库。
This feature is called memoization - see wikipedia article for details.
This article also mentions some libraries that can help you.