I have a dense matrix where the indices correspond to genes. While gene identifiers are often integers, they are not contiguous integers. They could also be strings.
I suppose I could use a Boost sparse matrix of some sort with integer keys, and it wouldn't matter whether they're contiguous. Or would this still occupy a great deal of space, particularly if some genes have identifiers that are nine digits long?
Further, I am concerned that sparse storage is not appropriate, since this is an all-by-all matrix (there will be a distance in each and every cell, provided the gene exists).
I'm unlikely to need to perform any matrix operations (e.g., matrix multiplication). I will need to pull vectors out of the matrix (slices).
It seems like the best type of matrix would be keyed by a Boost unordered_map (a hash map), or perhaps even simply an STL map.
Am I looking at this the wrong way? Do I really need to roll my own? I thought I saw such a class somewhere before.
Thanks!
Comments (3)
You could use a std::map to map the gene identifiers to unique, consecutively assigned integers: every time you add a new gene identifier to the map, give it the map's current size as its index (this assumes you never remove genes from the map).
If you want to be able to look up a gene's identifier from its unique integer, you can use a second map, or you could use a boost::bimap, which provides a bidirectional mapping of elements.
As for which matrix container to use, you might consider boost::ublas::matrix; it provides vector-like access to rows and columns of the matrix.
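The index-assignment idea above can be sketched roughly as follows. This is only an illustration under a few assumptions: the class name GeneDistanceMatrix and its member names are hypothetical, a flat std::vector stands in for boost::ublas::matrix so the example needs only the standard library, a plain std::vector of IDs plays the role of the reverse map (instead of a second map or boost::bimap), and distances are assumed to be symmetric.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: a dense all-by-all distance table addressed by gene ID.
class GeneDistanceMatrix {
public:
    // Each new ID receives the map's current size as its index,
    // so indices run 0..n-1 no matter how large or sparse the IDs are.
    explicit GeneDistanceMatrix(const std::vector<std::string>& ids) {
        for (const std::string& id : ids) {
            // emplace() leaves the map untouched if the ID is already present
            if (index_.emplace(id, index_.size()).second) {
                ids_.push_back(id);  // reverse lookup: index -> ID
            }
        }
        n_ = index_.size();
        data_.assign(n_ * n_, 0.0);  // row-major dense storage
    }

    std::size_t indexOf(const std::string& id) const {
        auto it = index_.find(id);
        if (it == index_.end()) throw std::out_of_range("unknown gene: " + id);
        return it->second;
    }

    const std::string& idAt(std::size_t i) const { return ids_.at(i); }

    void set(const std::string& a, const std::string& b, double d) {
        const std::size_t i = indexOf(a), j = indexOf(b);
        data_[i * n_ + j] = d;
        data_[j * n_ + i] = d;  // assumption: distances are symmetric
    }

    double get(const std::string& a, const std::string& b) const {
        return data_[indexOf(a) * n_ + indexOf(b)];
    }

    // Copy out one row (a "slice"): the distances from `id` to every gene.
    std::vector<double> row(const std::string& id) const {
        assert(data_.size() == n_ * n_);
        const std::size_t i = indexOf(id);
        return std::vector<double>(data_.begin() + i * n_,
                                   data_.begin() + (i + 1) * n_);
    }

    std::size_t size() const { return n_; }

private:
    std::map<std::string, std::size_t> index_;  // gene ID -> consecutive index
    std::vector<std::string> ids_;              // consecutive index -> gene ID
    std::vector<double> data_;
    std::size_t n_ = 0;
};
```

With this layout the storage is n * n doubles for n genes, so a nine-digit identifier costs no more than a one-digit one; only the number of genes matters. String identifiers work the same way as integer ones because the ID is only ever used as a map key.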
If you don't need matrix operations, you don't need a matrix. A 2D map with string keys can be done with a nested map such as map<string, map<string, double> > in plain C++, or with the corresponding hash map from Boost.
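A minimal sketch of the nested-map approach might look like this. The names Row, DistanceTable, setDistance, findRow and distance are hypothetical, std::unordered_map (the C++11 standard counterpart of the Boost hash map) is used so that no Boost headers are needed, and distances are again assumed to be symmetric. Replacing std::unordered_map with std::map gives the ordered "plain C++" variant.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical aliases: one row maps the other gene's ID to a distance.
using Row = std::unordered_map<std::string, double>;
using DistanceTable = std::unordered_map<std::string, Row>;

// Store a distance in both directions (assumption: distances are symmetric).
inline void setDistance(DistanceTable& t, const std::string& a,
                        const std::string& b, double d) {
    t[a][b] = d;
    t[b][a] = d;
}

// Return the whole row (slice) for a gene, or nullptr if the gene is unknown.
inline const Row* findRow(const DistanceTable& t, const std::string& id) {
    auto it = t.find(id);
    return it == t.end() ? nullptr : &it->second;
}

// Look up a single distance; the first gene must already be in the table,
// and at() throws std::out_of_range if the pair was never stored.
inline double distance(const DistanceTable& t, const std::string& a,
                       const std::string& b) {
    const Row* row = findRow(t, a);
    assert(row != nullptr && "unknown gene");
    return row->at(b);
}
```

The trade-off against a dense array is per-entry overhead: every cell carries its own copy of a string key plus the container's bookkeeping, which adds up quickly for an all-by-all table. In exchange, genes can be added or removed at any time without renumbering.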
There is Boost.MultiArray, which will allow you to manage non-contiguous indexes.
If you want an efficient implementation for matrices of static size, there is also Boost.LA, which is now on the review schedule.
Lastly, there is also NT2, which should be submitted to Boost soon.