Accessing array contents from a .mat file loaded with scipy.io.loadmat - python

Posted on 2024-11-14 06:33:21

UPDATE: This is a long question that boils down to, can someone explain the numpy array class to me? I answered my own question below.

I am working on a project to import data from MATLAB into a MySQL database whose contents will be made available through a Django website. I want to use scipy.io.loadmat to get the information from MATLAB into a form I can use in Python, so that I can enter the data into the database with the Django API.

My problem is that I cannot work with the data imported by scipy.io.loadmat. It is loaded in the form of several nested arrays and some of the variable names seem to be lost.

Here is the matlab code for a test structure that I have created for a trial:

sensors.time = [0:1:10].';
sensors.sensor1 = {};
sensors.sensor1.source_type = 'flight';                          
sensors.sensor1.source_name = 'flight-2';                       
sensors.sensor1.channels = {};
sensors.sensor1.channels.channel1.name = '1';                    
sensors.sensor1.channels.channel1.local_ori = 'lateral';         
sensors.sensor1.channels.channel1.vehicle_ori = 'axial';         
sensors.sensor1.channels.channel1.signals = {};
sensors.sensor1.channels.channel1.signals.signal1.filtered = 'N';
sensors.sensor1.channels.channel1.signals.signal1.filtered_description = 'none'; 
sensors.sensor1.channels.channel1.signals.signal1.data = sin(sensors.time)+0.1*rand(11,1); 

>> sensors
      time: [11x1 double]
      sensor1: [1x1 struct]
>> sensors.sensor1
      source_type: 'flight'
      source_name: 'flight-2'
      channels: [1x1 struct]
>> sensors.sensor1.channels
      channel1: [1x1 struct]
>> sensors.sensor1.channels.channel1
      name: '1'
      local_ori: 'lateral'
      vehicle_ori: 'axial'
      signals: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals
      signal1: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals.signal1
      filtered: 'N'
      filtered_description: 'none'
      data: [11x1 double]

I can easily visualize this structure as a python dictionary, so it does not seem like this should be such a complicated exercise.

Here is the python code I used to read the file in (eventually I want to read in multiple files):

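import glob
import os

import scipy.io

# Raw string so the backslashes are not treated as escapes ('\U' is a syntax error in Python 3)
path = r'C:\Users\c\Desktop\import'
for f in glob.glob(os.path.join(path, '*.mat')):
    matfile = scipy.io.loadmat(f, struct_as_record=True)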
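The loop above overwrites `matfile` on every pass. If the goal is to read several files, a minimal sketch is to keep each result in a dict keyed by file name. The helper name `load_mat_files` and the choice of key are illustrative assumptions, not part of SciPy.

```python
import glob
import os

import scipy.io


def load_mat_files(directory):
    """Hypothetical helper: load every .mat file in `directory` into one dict.

    Keys are file names without the extension; values are the dicts
    returned by scipy.io.loadmat.
    """
    results = {}
    # sorted() gives a deterministic order; glob itself does not promise one
    for path in sorted(glob.glob(os.path.join(directory, '*.mat'))):
        name = os.path.splitext(os.path.basename(path))[0]
        results[name] = scipy.io.loadmat(path, struct_as_record=True)
    return results
```

Usage would look like `data = load_mat_files(r'C:\Users\c\Desktop\import')`, followed by `data['somefile']['sensors']`.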

This is the resulting dictionary from loadmat:

>>> matfile
{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]],[[(array([u'flight'], 
      dtype='<U6'), array([u'flight-2'], 
      dtype='<U8'), array([[ ([[(array([u'1'], 
      dtype='<U1'), array([u'lateral'], 
      dtype='<U7'), array([u'axial'], 
      dtype='<U5'), array([[ ([[(array([u'N'], 
      dtype='<U1'), array([u'none'], 
      dtype='<U4'), array([[ 0.06273465],[ 0.84363597],[ 1.00035443],[ 0.22117587],[-0.68221775],[-0.87761299],[-0.24108487],[ 0.71871452],[ 1.04690773],[ 0.46512366],[-0.51651414]]))]],)]],
      dtype=[('signal1', '|O4')]))]],)]], 
      dtype=[('channel1', '|O4')]))]])]], 
      dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Tue Jun 07 18:38:32 2011', '__globals__': []}
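The field names are not actually lost in this output. With `struct_as_record=True`, each MATLAB struct becomes a NumPy record array of shape (1, 1), and the struct's field names live in its `dtype.names`. The sketch below builds a small stand-in file in memory with `scipy.io.savemat`, which writes nested dicts as MATLAB structs. The values are made up and only mimic part of the question's structure.

```python
import io

import numpy as np
import scipy.io

# In-memory stand-in for the question's .mat file (made-up values)
buf = io.BytesIO()
scipy.io.savemat(buf, {'sensors': {
    'time': np.arange(11.0).reshape(-1, 1),
    'sensor1': {'source_type': 'flight', 'source_name': 'flight-2'},
}})
buf.seek(0)
matfile = scipy.io.loadmat(buf, struct_as_record=True)

sensors = matfile['sensors']
print(sensors.shape)        # (1, 1): a MATLAB 1x1 struct is a 2-D record array
print(sensors.dtype.names)  # the struct's field names

# sensors[0, 0] is one record; indexing it by field name returns the nested struct
sensor1 = sensors[0, 0]['sensor1']
print(sensor1.dtype.names)  # field names one level down
```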

The data is all there, but I don't know how to access these class objects. I would like to be able to loop over the contents so that I can process multiple sensors, then multiple channels for each sensor, etc.

Any explanations to help me simplify this data structure or suggested changes to make this easier would be greatly appreciated.
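One way to simplify the structure without changing the MATLAB side is to convert loadmat's default output into nested Python dicts, lists and strings, and then loop over those. This is a sketch that assumes the file holds only structs, char arrays, numeric arrays and cell arrays. The function name `to_python` is hypothetical, and the data is a made-up in-memory stand-in.

```python
import io

import numpy as np
import scipy.io


def to_python(value):
    """Hypothetical helper: unwrap loadmat(struct_as_record=True) output.

    A 1x1 struct becomes a dict, a larger struct array a list of dicts,
    a char array a str, a cell array a list, and a numeric array a
    (squeezed) nested list or scalar.
    """
    if not isinstance(value, np.ndarray):
        return value
    if value.dtype.names:  # struct: record array with one field per struct field
        records = [{name: to_python(rec[name]) for name in value.dtype.names}
                   for rec in value.ravel()]
        return records[0] if len(records) == 1 else records
    if value.dtype.kind == 'U':  # char array: shape-(1,) array of str
        strings = [str(s) for s in value.ravel()]
        return strings[0] if len(strings) == 1 else strings
    if value.dtype == object:  # cell array
        return [to_python(v) for v in value.ravel()]
    return np.squeeze(value).tolist()  # numeric: drop singleton dimensions


buf = io.BytesIO()
scipy.io.savemat(buf, {'sensors': {
    'time': np.arange(11.0).reshape(-1, 1),
    'sensor1': {
        'source_type': 'flight',
        'channels': {'channel1': {'name': '1', 'local_ori': 'lateral'}},
    },
}})
buf.seek(0)
sensors = to_python(scipy.io.loadmat(buf)['sensors'])

# Loop over sensors, then channels, as plain dicts
for sensor_name, sensor in sensors.items():
    if sensor_name == 'time':
        continue
    for channel_name, channel in sensor['channels'].items():
        print(sensor_name, channel_name, channel['local_ori'])  # sensor1 channel1 lateral
```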


Update: based on Nick's suggestion, here are repr(matfile) and dir(matfile):

>>> repr(matfile)
"{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], [[(array([u'flight'], \n      dtype='<U6'), array([u'flight-2'], \n      dtype='<U8'), array([[ ([[(array([u'1'], \n      dtype='<U1'), array([u'lateral'], \n      dtype='<U7'), array([u'axial'], \n      dtype='<U5'), array([[ ([[(array([u'N'], \n      dtype='<U1'), array([u'none'], \n      dtype='<U4'), array([[ 0.0248629 ],\n       [ 0.88663486],\n       [ 0.93206871],\n       [ 0.22156497],\n       [-0.65819207],\n       [-0.95592508],\n       [-0.22584908],\n       [ 0.66569432],\n       [ 1.06956739],\n       [ 0.51103298],\n       [-0.53732649]]))]], [[(array([u'Y'], \n      dtype='<U1'), array([u'1. 5 Hz High Pass, 2. remove offset'], \n      dtype='<U35'), array([[ 0.        ],\n       [ 0.84147098],\n       [ 0.90929743],\n       [ 0.14112001],\n       [-0.7568025 ],\n       [-0.95892427],\n       [-0.2794155 ],\n       [ 0.6569866 ],\n       [ 0.98935825],\n       [ 0.41211849],\n       [-0.54402111]]))]])]], \n      dtype=[('signal1', '|O4'), ('signal2', '|O4')]))]],)]], \n      dtype=[('channel1', '|O4')]))]])]], \n      dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Wed Jun 08 10:58:19 2011', '__globals__': []}"

>>> dir(matfile)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

>>> dir(matfile['sensors'])
['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_wrap__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__delslice__', '__div__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setslice__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']

Obviously I need to learn a bit about objects and classes. How can I pull bits out of the array and put them into variables? For example:

time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
source_type = 'flight'
etc.   
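With the default `struct_as_record=True`, the values can be reached by direct indexing. Every MATLAB value comes back as a 2-D array, so each 1x1 struct needs `[0, 0]` before a field name. A char array arrives as a shape-(1,) array of str, so `[0]` takes the string out. This is a sketch against a made-up in-memory stand-in for the question's file.

```python
import io

import numpy as np
import scipy.io

# Stand-in for the question's file (made-up values)
buf = io.BytesIO()
scipy.io.savemat(buf, {'sensors': {
    'time': np.arange(11).reshape(-1, 1),
    'sensor1': {'source_type': 'flight'},
}})
buf.seek(0)
sensors = scipy.io.loadmat(buf, struct_as_record=True)['sensors']

# (11, 1) array -> flat Python list
time = sensors[0, 0]['time'].ravel().tolist()
# 1x1 struct -> [0, 0]; char field -> shape-(1,) str array -> [0]
source_type = str(sensors[0, 0]['sensor1'][0, 0]['source_type'][0])

print(time)         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(source_type)  # flight
```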


Comments (2)

哭了丶谁疼 2024-11-21 06:33:21

I've run into a similar issue with a fairly complex mat file at our company. I'm still getting my head wrapped around the scipy IO module, but here is what we found.

When you load the file with struct_as_record=False (as in the code below), matfile['sensors'] returns a scipy.io.matlab.mio5_params.mat_struct object, which we can use to access the contents below it. When you print it, it looks like a flat array, but its fields are ordinary attributes, so you can still get at the individual components. So you could run something like this to start accessing them:

from scipy.io import loadmat
matfile = loadmat('myfile.mat', squeeze_me=True, struct_as_record=False)
matfile['sensors'].sensor1.channels.channel1.name

In your case you want to iterate over the elements in the structure, which you can do through the _fieldnames attribute of the mat_struct object. From there you can loop over the field names and access each one with getattr:

for field in matfile['sensors']._fieldnames:
    # getattr returns the value stored under the given field name
    print(getattr(matfile['sensors'], field))

This is at least allowing us to access the deeply nested elements without having to alter our mat files.
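Building on this answer, the nested loop the question asks for (sensors, then channels) can be sketched with `_fieldnames` and `getattr`. This assumes `squeeze_me=True, struct_as_record=False`, so that 1x1 structs come back as `mat_struct` objects. `_fieldnames` has a leading underscore, so treat it as an implementation detail that could change. The data below is a made-up in-memory stand-in.

```python
import io

import numpy as np
import scipy.io

buf = io.BytesIO()
scipy.io.savemat(buf, {'sensors': {
    'time': np.arange(11.0).reshape(-1, 1),
    'sensor1': {
        'source_type': 'flight',
        'channels': {
            'channel1': {'name': '1', 'local_ori': 'lateral'},
            'channel2': {'name': '2', 'local_ori': 'axial'},
        },
    },
}})
buf.seek(0)
sensors = scipy.io.loadmat(buf, squeeze_me=True, struct_as_record=False)['sensors']

rows = []
for sensor_field in sensors._fieldnames:
    sensor = getattr(sensors, sensor_field)
    # Skip fields that are not structs (e.g. the time vector)
    if not hasattr(sensor, '_fieldnames'):
        continue
    for channel_field in sensor.channels._fieldnames:
        channel = getattr(sensor.channels, channel_field)
        rows.append((sensor_field, channel_field, str(channel.local_ori)))

print(rows)
```

Each tuple in `rows` could then be passed to the Django model layer. That step is left out here.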

宁愿没拥抱 2024-11-21 06:33:21

The solution I resorted to was to simplify the MATLAB structure. I eliminated nested structures. Each data set resides in a single file and I used python to loop over all the files of a particular type in the specified directory. (http://bogdan.org.ua/2007/08/12/python-iterate-and-read-all-files-in-a-directory-folder.html, if you would like to see an example of that.)

Importing the flat matlab structure results in a dictionary where the matlab variable names are the keys. Strings come in as arrays of shape (1,) --> [ string ], and numbers come in as arrays of shape (N, M) --> [[ numbers ]].
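The shapes described above can be checked, and unwrapped, with a small sketch. It uses a made-up flat file written in memory, where each value is its own top-level MATLAB variable. It also shows that `squeeze_me=True` drops the singleton dimensions for you.

```python
import io

import numpy as np
import scipy.io

buf = io.BytesIO()
# A flat file: one top-level variable per value (names are illustrative)
scipy.io.savemat(buf, {
    'source_type': 'flight',
    'time': np.arange(11.0).reshape(-1, 1),
})
buf.seek(0)
flat = scipy.io.loadmat(buf)

print(flat['source_type'].shape, flat['time'].shape)  # (1,) (11, 1)
source_type = str(flat['source_type'][0])  # take the single string out
time = flat['time'].ravel()                # 1-D array of 11 floats

# Alternatively, let loadmat remove the singleton dimensions
buf.seek(0)
squeezed = scipy.io.loadmat(buf, squeeze_me=True)
print(squeezed['time'].shape)  # (11,)
```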

I still have to learn a bit more about the numpy arrays.
