如何在不指定期间的情况下分解数据中存在多个周期性？

发布于 2025-01-21 13:33:15 字数 1189 浏览 3 评论 0原文

我试图将信号中存在的周期性分解为其各个组件，以计算其时间周期。

说以下是我的示例信号：

t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*np.sin(2*np.pi*t_week/T)+10
x_weekend = 2*np.sin(2*np.pi*t_weekend/T)+10
x_daily_weekly_sinu = np.concatenate((x_weekday, x_weekend)) 

#Creating the Signal
x_daily_weekly_long_sinu = np.concatenate((x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu))

#Visualization
plt.plot(x_daily_weekly_long_sinu)
plt.show()

周期

如下所示：

我尝试使用STATSMODEL中的STL分解方法：

sm.tsa.seasonal_decompose()

但是，只有在您事先知道该周期的情况下，这是合适的。并且仅适用于一次分解一个周期。虽然，我需要分解任何具有多个周期性的信号，并且其周期未知。

谁能帮助如何实现这一目标？

原文

I am trying to decompose the periodicities present in a signal into its individual components, to calculate their time-periods.

Say the following is my sample signal:

You can reproduce the signal using the following code:

t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*np.sin(2*np.pi*t_week/T)+10
x_weekend = 2*np.sin(2*np.pi*t_weekend/T)+10
x_daily_weekly_sinu = np.concatenate((x_weekday, x_weekend)) 

#Creating the Signal
x_daily_weekly_long_sinu = np.concatenate((x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu))

#Visualization
plt.plot(x_daily_weekly_long_sinu)
plt.show()

My objective is to split this signal into 3 separate isolated component signals consisting of:

Days as period
Weekdays as period
Weekends as period

Periods as shown below:

I tried using the STL decomposition method from statsmodel:

sm.tsa.seasonal_decompose()

But this is suitable only if you know the period beforehand. And is only applicable for decomposing a single period at a time.
While, I need to decompose any signal having multiple periodicities and whose periods are not known beforehand.

Can anyone please help how to achieve this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

极度宠爱 2025-01-28 13:33:15

您是否尝试过更多的算法方法？我们可以首先尝试确定信号的变化，即振幅或频率。在发生重大变化的地方确定所有阈值点，并在该窗口上进行FFT。

这是我的方法：

我发现与Daubechies小波的DWT确实擅长于此。当任何一个更改时，都会有不同的峰值，这使识别窗户的确非常好。
做了高斯混合物，基本上是在2个特定的窗口尺寸上键。在您的示例中，这是固定的，但是使用真实数据，可能不是。
通过施加FFT的峰对向后循环，发现了明显的频率。
现在，您可以使用窗口的宽度，您可以用它从高斯混合物与另一个epsilon识别，以找出窗户之间的周期，而FFT之间的频率在该窗口中具有突出的频率。虽然，如果我是您，我将使用混合模型来识别窗户中的关键频率或振幅。如果我们可以假设您在现实世界中的频率/振幅是正态分布的。

请注意，您有多种方法可以解决这个问题。我想说的是，从小波变换开始，就一个很好的开端。

这是代码，尝试添加一些高斯噪声或其他可变性以对其进行测试。您会看到噪音越多，您的dwt的最小epsilon就越高，因此您需要调整其中的一些。

import pywt
from sklearn.mixture import GaussianMixture
data = x_daily_weekly_long_sinu
times = np.linspace(0, len(data), len(data))
SAMPLING_RATE = len(data) / len(times)  # needed for frequency calc (number of discrete times / time interval) in this case it's 1
cA, cD = pywt.dwt(data, 'db4', mode='periodization')  # Daubechies wavelet good for changes in freq

# find peaks, with db4 good indicator of changes in frequencies, greater than some arbitrary value (you'll have to find by possibly plotting plt.plot(cD))
EPSILON = 0.02
peaks = (np.where(np.abs(cD) > EPSILON)[0] * 2)  # since cD (detailed coef) is len(x) / 2 only true for periodization mode
peaks = [0] + peaks.tolist() + [len(data) -1 ]  # always add start and end as beginning of windows

# iterate through peak pairs
if len(peaks) < 2:
    print('No peaks found...')
    exit(0)

# iterate through the "paired" windows
MIN_WINDOW_WIDTH = 10   # min width for the start of a new window
peak_starts = []
for i in range(len(peaks) - 1):
    s_ind, e_ind = peaks[i], peaks[i + 1]
    if len(peak_starts) > 0 and (s_ind - peak_starts[-1]) < MIN_WINDOW_WIDTH:
        continue  # not wide enough
    peak_starts.append(s_ind)

# calculate the sequential differences between windows
# essentially giving us how wide they are
seq_dist = np.array([t - s for s, t in zip(peak_starts, peak_starts[1:])])

# since peak windows might not be exact in the real world let's make a gaussian mixture
# you're assuming how many different windows there are here)
# with this assumption we're going to assume 2 different kinds of windows
WINDOW_NUM = 2
gmm = GaussianMixture(WINDOW_NUM).fit(seq_dist.reshape(-1, 1))
window_widths = [float(m) for m in gmm.means_]

# for example we would assume this prints (using your example of 2 different window types)
weekday_width, weekend_width = list(sorted(window_widths))
print('Weekday Width, Weekend Width', weekday_width, weekend_width)  # prints 191.9 and 479.59

# now we can process peak pairs with their respective windows
# we specify a padding which essentially will remove edge data which might overlap with another window (we really only care about the frequency)
freq_data = {}
PADDING = 3  # add padding to remove edge elements
WIDTH_EPSILON = 5  # make sure the window found is within the width found in gaussian mixture (to remove other small/large windows with noise)
T2_data = []
T3_data = []
for s, t in zip(peak_starts, peak_starts[1:]):
    width = t - s
    passed = False
    for testw in window_widths:
        if abs(testw - width) < WIDTH_EPSILON:
            passed = True
            break
    
    # weird window ignore it
    if not passed:
        continue

    # for your example let's populate T2 data
    if (width - weekday_width) < WIDTH_EPSILON:
        T2_data.append(s)  # append start
    elif (width - weekend_width) < WIDTH_EPSILON:
        T3_data.append(s)

    # append main frequency in window
    window = data[s + PADDING: t - PADDING]

    # get domininant frequency
    fft = np.real(np.fft.fft(window))
    fftfreq = np.fft.fftfreq(len(window))
    freq = SAMPLING_RATE * fftfreq[np.argmax(np.abs(fft[1:])) + 1]  # ignore constant (shifting) freq 0
    freq_data[int(testw)] = np.abs(freq)

print('T2 = ', np.mean([t - s for s, t in zip(T2_data, T2_data[1:])]))
print('T3 = ', np.mean([t - s for s, t in zip(T3_data, T3_data[1:])]))
print('Frequency data', freq_data)

# convert to periods
period_data = {}
for w in freq_data.keys():
    period_data[w] = 1.0 / freq_data[w]

print('Period data', period_data)

用您的示例打印以下示例（注意结果并不精确）。

Weekday Width, Weekend Width 191.99999999999997 479.5999999999999
T2 =  672.0
T3 =  671.5555555555555
Frequency data {479: 0.010548523206751054, 191: 0.010752688172043012}
Period data {479: 94.8, 191: 92.99999999999999}

请注意，这就是DWT COEFS的外观。

Have you tried more of an algorithmic approach? We could first try to identify the changes in the signal, either amplitude or frequency. Identify all threshold points where there is a major change, with some epsilon, and then do FFT on that window.

Here was my approach:

I found that the DWT with Daubechies wavelet was really good at this. There are distinct peaks when transformed for when either one changes, which makes identifying the windows really nice.
Did a Gaussian mixture, to essentially key in on 2 specific window sizes. In your example, this is fixed but with real data, it might not be.
Looped back through pairs of peaks applied FFT and found prominent frequency.
Now you have the width of windows which you can use to identify from Gaussian mixture with another epsilon to figure out the period between windows, and FFT to have the prominent frequency within that window. Although, if I were you I would use the mixture model to identify the key frequencies or amplitudes in the windows. If we can assume your frequencies/amplitudes in the real world are normally distributed.

Note there are many ways you could mess with this. I would say starting with a wavelet transform, personally, is a good start.

Here is the code, try adding some Gaussian noise or other variability to test it out. You see the more noise the higher your min epsilon for DWT will need to be so you do need to tune some of it.

import pywt
from sklearn.mixture import GaussianMixture
data = x_daily_weekly_long_sinu
times = np.linspace(0, len(data), len(data))
SAMPLING_RATE = len(data) / len(times)  # needed for frequency calc (number of discrete times / time interval) in this case it's 1
cA, cD = pywt.dwt(data, 'db4', mode='periodization')  # Daubechies wavelet good for changes in freq

# find peaks, with db4 good indicator of changes in frequencies, greater than some arbitrary value (you'll have to find by possibly plotting plt.plot(cD))
EPSILON = 0.02
peaks = (np.where(np.abs(cD) > EPSILON)[0] * 2)  # since cD (detailed coef) is len(x) / 2 only true for periodization mode
peaks = [0] + peaks.tolist() + [len(data) -1 ]  # always add start and end as beginning of windows

# iterate through peak pairs
if len(peaks) < 2:
    print('No peaks found...')
    exit(0)

# iterate through the "paired" windows
MIN_WINDOW_WIDTH = 10   # min width for the start of a new window
peak_starts = []
for i in range(len(peaks) - 1):
    s_ind, e_ind = peaks[i], peaks[i + 1]
    if len(peak_starts) > 0 and (s_ind - peak_starts[-1]) < MIN_WINDOW_WIDTH:
        continue  # not wide enough
    peak_starts.append(s_ind)

# calculate the sequential differences between windows
# essentially giving us how wide they are
seq_dist = np.array([t - s for s, t in zip(peak_starts, peak_starts[1:])])

# since peak windows might not be exact in the real world let's make a gaussian mixture
# you're assuming how many different windows there are here)
# with this assumption we're going to assume 2 different kinds of windows
WINDOW_NUM = 2
gmm = GaussianMixture(WINDOW_NUM).fit(seq_dist.reshape(-1, 1))
window_widths = [float(m) for m in gmm.means_]

# for example we would assume this prints (using your example of 2 different window types)
weekday_width, weekend_width = list(sorted(window_widths))
print('Weekday Width, Weekend Width', weekday_width, weekend_width)  # prints 191.9 and 479.59

# now we can process peak pairs with their respective windows
# we specify a padding which essentially will remove edge data which might overlap with another window (we really only care about the frequency)
freq_data = {}
PADDING = 3  # add padding to remove edge elements
WIDTH_EPSILON = 5  # make sure the window found is within the width found in gaussian mixture (to remove other small/large windows with noise)
T2_data = []
T3_data = []
for s, t in zip(peak_starts, peak_starts[1:]):
    width = t - s
    passed = False
    for testw in window_widths:
        if abs(testw - width) < WIDTH_EPSILON:
            passed = True
            break
    
    # weird window ignore it
    if not passed:
        continue

    # for your example let's populate T2 data
    if (width - weekday_width) < WIDTH_EPSILON:
        T2_data.append(s)  # append start
    elif (width - weekend_width) < WIDTH_EPSILON:
        T3_data.append(s)

    # append main frequency in window
    window = data[s + PADDING: t - PADDING]

    # get domininant frequency
    fft = np.real(np.fft.fft(window))
    fftfreq = np.fft.fftfreq(len(window))
    freq = SAMPLING_RATE * fftfreq[np.argmax(np.abs(fft[1:])) + 1]  # ignore constant (shifting) freq 0
    freq_data[int(testw)] = np.abs(freq)

print('T2 = ', np.mean([t - s for s, t in zip(T2_data, T2_data[1:])]))
print('T3 = ', np.mean([t - s for s, t in zip(T3_data, T3_data[1:])]))
print('Frequency data', freq_data)

# convert to periods
period_data = {}
for w in freq_data.keys():
    period_data[w] = 1.0 / freq_data[w]

print('Period data', period_data)

With your example that printed the following (note results weren't exact).

Weekday Width, Weekend Width 191.99999999999997 479.5999999999999
T2 =  672.0
T3 =  671.5555555555555
Frequency data {479: 0.010548523206751054, 191: 0.010752688172043012}
Period data {479: 94.8, 191: 92.99999999999999}

Note this is what the DWT coefs look like.