Matrices in Python
Yesterday I had the need for a matrix type in Python.
The obvious answer would be to use numpy.matrix(), but I also want the matrix to store arbitrary values of mixed types, much like a list does. numpy.matrix does not do this. For example:
>>> numpy.matrix([[1,2,3],[4,"5",6]])
matrix([['1', '2', '3'],
        ['4', '5', '6']],
       dtype='|S4')
>>> numpy.matrix([[1,2,3],[4,5,6]])
matrix([[1, 2, 3],
        [4, 5, 6]])
As you can see, a numpy.matrix must be homogeneous in content. If a string value is present in the initialization, every value is implicitly stored as a string. Accessing single values confirms this:
>>> numpy.matrix([[1,2,3],[4,"5",6]])[1,1]
'5'
>>> numpy.matrix([[1,2,3],[4,"5",6]])[1,2]
'6'
The Python list type, by contrast, accepts mixed types. A list can contain an integer and a string, and both keep their type. What I need is something similar to a list, but with matrix-like behavior.
Therefore, I had to implement my own type. I had two choices for the internal implementation: a list of lists, or a dictionary. Both solutions have shortcomings:
- a list of lists requires careful synchronization of the lengths of the inner lists. Swapping two rows is easy, and so is removing a row. Swapping two columns is less easy.
- a dictionary keyed by (row, column) tuples is slightly better, but you have to enforce the bounds of the keys yourself (for example, you cannot insert element (5, 5) if your matrix is 3x3), and inserting, removing, or swapping columns or rows is more complex.
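To make the list-of-lists trade-off concrete, here is a minimal sketch of such a container. The class name ListMatrix and all of its method names are hypothetical, invented for illustration; this is not the implementation the question refers to, just one way the row and column operations could look.

```python
class ListMatrix:
    """Hypothetical 2-D container backed by a list of row lists."""

    def __init__(self, rows):
        rows = [list(r) for r in rows]
        # Keep the inner lists synchronized: every row has the same length.
        if rows and any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("all rows must have the same length")
        self._rows = rows

    def __getitem__(self, key):
        r, c = key
        return self._rows[r][c]

    def __setitem__(self, key, value):
        r, c = key
        self._rows[r][c] = value

    def swap_rows(self, i, j):
        # Rows are single objects, so this is one tuple assignment.
        self._rows[i], self._rows[j] = self._rows[j], self._rows[i]

    def swap_cols(self, i, j):
        # Columns are spread over all rows, so every row is touched.
        for row in self._rows:
            row[i], row[j] = row[j], row[i]

    def remove_row(self, i):
        del self._rows[i]

    def remove_col(self, j):
        for row in self._rows:
            del row[j]

    def insert_row(self, i, values):
        values = list(values)
        if self._rows and len(values) != len(self._rows[0]):
            raise ValueError("row length does not match matrix width")
        self._rows.insert(i, values)

    def to_lists(self):
        # Return a copy so callers cannot break the size invariant.
        return [list(r) for r in self._rows]


m = ListMatrix([[1, 2, 3], [4, "5", 6]])
m.swap_cols(0, 2)
print(m.to_lists())   # [[3, 2, 1], [6, '5', 4]]
print(type(m[1, 1]))  # <class 'str'>
```

Row operations touch one inner list, while column operations loop over all of them, which is the asymmetry described in the first bullet.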
Edit: clarification. The concrete reason I need this functionality is that I am reading CSV files. Once I have collected the values from a CSV file (values that can be strings, integers, or floats), I would like to perform swapping, removal, insertion, and similar operations. For this reason I need a "matrix list".
My questions are:
- Do you know if a Python data type providing this service already exists (maybe in a "non-batteries-included" library out there)?
- Why is this data type not provided in the standard library? Is the interest too limited, maybe?
- How would you have solved this need? Dictionary, list, or another smarter solution?
Comments (6)
You can have inhomogeneous types if your dtype is object. I have no idea what good this does you other than fancy indexing, because, as Don pointed out, you can't do math with this matrix.
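As an illustration of the object dtype mentioned above, here is a small sketch. It uses numpy.array rather than numpy.matrix, which is my own choice for the example, and the variable names are made up; it is not the code from the original answer.

```python
import numpy as np

# With dtype=object the array stores references to the original
# Python objects, so no value is converted to a string.
a = np.array([[1, 2, 3], [4, "5", 6]], dtype=object)
print(type(a[1, 1]))  # <class 'str'>
print(type(a[1, 2]))  # <class 'int'>

# Fancy indexing: swap columns 0 and 2 in place.
# The right-hand side is a copy, so the assignment is safe.
a[:, [0, 2]] = a[:, [2, 0]]
print(a.tolist())  # [[3, 2, 1], [6, '5', 4]]

# Removing a row returns a new array; the original is unchanged.
b = np.delete(a, 0, axis=0)
print(b.tolist())  # [[6, '5', 4]]
```

This covers swapping and removal, though an object array gives up most of NumPy's speed and arithmetic, as the answer points out.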
I'm curious why you want this functionality; as I understand it, the reason for having matrices (in numpy), is primarily for doing linear math (matrix transformations and so on).
I'm not sure what the mathematical definition would be for the product of a decimal and a string.
Internally, you'll probably want to look at sparse matrix implementations (http://www.inf.ethz.ch/personal/arbenz/pycon03_contrib.pdf). There are lots of ways to do this (hash, list, linked list), and each has its own advantages and drawbacks. If your matrix isn't going to have a lot of nulls or zeroes, then you can ditch the sparse implementations.
Have you looked at the numpy.recarray capabilities?
For instance here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html
It's designed to allow arrays with mixed datatypes.
I don't know if an array will suit your purposes, or if you really need a matrix - I haven't worked with the numpy matrices. But if an array is good enough, recarray might work.
Maybe it's a late answer, but why not use pandas?
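To show roughly what the pandas suggestion could look like, here is a hedged sketch. The column labels "a", "b", "c" and "note" are invented for the example, and this is only one way to map the operations from the question onto a DataFrame.

```python
import pandas as pd

# A column that mixes ints and strings gets the object dtype,
# so each value keeps its Python type.
df = pd.DataFrame([[1, 2, 3], [4, "5", 6]], columns=["a", "b", "c"])
print(type(df.loc[1, "b"]))  # <class 'str'>

# Swap two columns by reordering the labels.
df = df[["c", "b", "a"]]

# Insert a new column at position 1 (modifies df in place).
df.insert(1, "note", ["x", "y"])

# Remove the first row; drop returns a new DataFrame.
df = df.drop(index=0)
print(df.values.tolist())  # [[6, 'y', '5', 4]]
```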
Check out sympy: it does quite a good job at polymorphism in its matrices, and you have operations on sympy.matrices.Matrix objects like col_swap, col_insert, col_del, etc.
Have you considered the csv module for working with csv files?
Python docs for csv module
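For reference, a minimal sketch of reading CSV data into a list of lists with the csv module. The convert helper and the sample data are hypothetical; csv.reader itself returns every cell as a string, so the type guessing shown here is an assumption about what the question wants.

```python
import csv
import io


def convert(text):
    """Hypothetical helper: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


# io.StringIO stands in for an open CSV file.
sample = io.StringIO("1,2.5,abc\n4,5,6\n")
rows = [[convert(cell) for cell in row] for row in csv.reader(sample)]
print(rows)  # [[1, 2.5, 'abc'], [4, 5, 6]]
```

The result is a plain list of lists with mixed types, which could then be fed to whichever container is chosen for the swap, insert, and remove operations.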