Optimizing file reading with NumPy

Posted 2025-01-11 22:04:45

I have a .dat file made by an FPGA. The file contains 3 columns: the first is the input channel (it can be 1 or 2), the second is the timestamp at which an event occurred, and the third is the local time at which the same event occurred. The third column is necessary because sometimes the FPGA has to reset its clock counter, so the counter does not count continuously. An example of what I mean is shown in the next figure.
[Figure: sawtooth trend of the timestamp column across counter resets.]

An example of some lines from the .dat file is the following:

1   80.80051152 2022-02-24T18:28:49.602000
2   80.91821978 2022-02-24T18:28:49.716000
1   80.94284154 2022-02-24T18:28:49.732000
2   0.01856876  2022-02-24T18:29:15.068000
2   0.04225772  2022-02-24T18:29:15.100000
2   0.11766780  2022-02-24T18:29:15.178000

The time column is given by the FPGA (with a resolution of tens of nanoseconds); the date column is written by the Python script that listens for data from the FPGA: whenever it writes a timestamp, it also saves the local time as a date.

I am interested in getting two arrays (one per channel) containing, for each event, the time at which that event occurred relative to the start of the acquisition. For the data given above, the result should look like this:

8.091821978000000115e+01
1.062702197800000050e+02
1.062939087400000062e+02
1.063693188200000179e+02

These data refer to the second channel only. A double check can be made by looking at the third column of the data above.
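To make that double check concrete, here is a quick sanity check of the second expected value (the first channel-2 event after the reset): the lost offset equals the wall-clock gap recovered from the date column. This is just illustrative arithmetic on the sample rows above:

```python
import numpy as np

# Last channel-2 timestamp before the counter reset.
t_before = 80.91821978

# Wall-clock gap between that event and the first post-reset event,
# taken from the date column of the sample data.
gap = (np.datetime64('2022-02-24T18:29:15.068')
       - np.datetime64('2022-02-24T18:28:49.716')) / np.timedelta64(1, 's')

# Matches the expected 1.062702197800000050e+02 (up to float precision).
print(t_before + gap)
```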

I tried to achieve this with a function (too messy for me): every time I check whether the difference between two consecutive event times disagrees with the difference in local time by more than 1 second; if that's the case, I evaluate the time interval through the local-time column and correct the timestamp by the right amount:

import numpy as np

ch, time, date = np.genfromtxt("events220302_1d.dat", unpack=True,
                               dtype=(int, float, 'datetime64[ms]'))

mask1 = ch==1
mask2 = ch==2

time1 = time[mask1]
time2 = time[mask2]
date1 = date[mask1]
date2 = date[mask2]

corr1 = np.zeros(len(time1))

for idx, val in enumerate(time1):
    if idx < len(time1) - 1:
        if check_dif(time1[idx], time1[idx+1], date1[idx], date1[idx+1]) == 0:
            corr1[idx+1] = val + (date1[idx+1]-date1[idx])/np.timedelta64(1,'s') - time1[idx+1]
time1 = time1 + corr1.cumsum()

where check_dif is a function that returns 0 if the difference in time between consecutive events is inconsistent with the difference in date between the same two events, as described above.
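The post never shows check_dif itself, so here is a minimal sketch of what it might look like given that description; the signature, the 1-second tolerance, and the name tol are my assumptions, not the asker's code:

```python
import numpy as np

def check_dif(t0, t1, d0, d1, tol=1.0):
    """Return 0 when the FPGA time step disagrees with the local-time step
    by more than `tol` seconds (i.e. a counter reset happened), 1 otherwise.
    Tolerance and signature are assumptions based on the description above."""
    dt_date = (d1 - d0) / np.timedelta64(1, 's')  # wall-clock step in seconds
    return 0 if abs((t1 - t0) - dt_date) > tol else 1
```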

Is there any more elegant or even faster way to get what I want with maybe some fancy NumPy coding?


平安喜乐 2025-01-18 22:04:45

A simple initial way to optimize your code is to make it if-less, thus getting rid of both if statements. To do so, instead of returning 0 in check_dif, you can return 1 when "the difference in time between consecutive events is inconsistent with the difference in date between the two same events", and 0 otherwise.

Your for loop will be something like that:

for idx in range(len(time1) - 1):
    is_dif = check_dif(time1[idx], time1[idx+1], date1[idx], date1[idx+1])
    # If is_dif == 0 the whole product vanishes and no correction is applied;
    # otherwise the offset is rebuilt from the local-time column.
    corr1[idx+1] = is_dif * (time1[idx]
                             + (date1[idx+1] - date1[idx]) / np.timedelta64(1, 's')
                             - time1[idx+1])

A more NumPy-like way to do this would be through vectorization. I don't know whether you have benchmarks for the speed or how big the file is, but I think that in your case the previous change should be good enough.
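For completeness, the vectorized version hinted at above could be sketched roughly like this. The function name relative_times and the 1-second tolerance are my own choices, and the reset criterion is assumed to be the one check_dif is described to use:

```python
import numpy as np

def relative_times(time, date, tol=1.0):
    """Rebuild continuous timestamps for one channel.

    A counter reset is assumed wherever the FPGA time step disagrees
    with the local-time step by more than `tol` seconds.
    """
    dt_fpga = np.diff(time)                           # FPGA clock steps
    dt_date = np.diff(date) / np.timedelta64(1, 's')  # wall-clock steps
    reset = np.abs(dt_fpga - dt_date) > tol           # True where a reset happened

    # At each reset the missing offset is recovered from the local-time column;
    # cumsum then propagates every offset to all later events.
    corr = np.zeros_like(time)
    corr[1:][reset] = (time[:-1] + dt_date - time[1:])[reset]
    return time + np.cumsum(corr)
```

Applied to the channel-2 sample rows from the question (time2 with the matching dates), this reproduces the four expected values starting at 8.0918...e+01, and it handles multiple resets in one file since each offset is accumulated.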
