在 Excel 中查找缺失数据并进行线性插值

发布于 2024-11-10 02:57:23 字数 1892 浏览 4 评论 0原文

我希望有人能帮我在 Excel 中线性插值缺失数据。

我有在不同时间间隔收集的数据,为了操作数据,我需要将其排列在行中。我正在尝试创建一个函数,该函数可以查找列中丢失的数据,并根据上面最接近的值和已经过去的时间量的线性插值来填充它。简单的线性插值适合填充缺失的数据。我有下面显示的示例数据(不幸的是,不允许我发布图像,因此下面用逗号分隔数据)。对于开头或结尾丢失的数据,可以保留空白或复制最接近的值。

感谢您的帮助

示例数据(逗号分隔):

Date(days),Time(hours),Data1,Data2,Data3,Data4,Data5,Data6
40684,0.385670139,,,2.049620821,,,
40684,0.385675069,,,,,,0.133139679
40684,0.385680012,,,8.550069731,,,
40684,0.385684954,,,,,,
40684,0.385689884,,,6.308237045,,,
40684,0.385694826,,,,0.012712923,,0.008131037
40684,0.385699769,,,7.00871559,,,
40684,0.385704699,6.153512677,,,0.00487698,,
40684,0.385709641,,12.45628511,5.053182136,,,
40684,0.385714572,,,,,,
40684,0.385719514,,3.908056327,5.79615269,,0.132418943,
40684,0.385724456,,,,,,
40684,0.385729387,,,1.100001911,,0.011925063,
40684,0.385734329,,,,,,
40684,0.385739271,,,4.576754454,,0.114426916,
40684,0.385744201,,,,,,
40684,0.385749144,,,6.64728305,,0.078670107,
40684,0.385754074,,,,,,
40684,0.385759016,,,3.666219416,,,
40684,0.385763958,,,,0.206991693,,
40684,0.385768889,,,7.619782896,,,
40684,0.385773831,,,,,,
40684,0.385778773,,,2.622500183,,,
40684,0.385783704,,,,0.136512025,,
40684,0.385788646,,,4.506286862,,,
40684,0.385793576,,,,,,0.028082778
40684,0.385798519,,,0.28245389,,,
40684,0.385803461,,,,,,0.161361483
40684,0.385808391,,,8.368688368,,,
40684,0.385813333,,,,,,0.001826172
40684,0.385818264,,,2.513474192,,,
40684,0.385823206,,,,,,0.068932257
40684,0.385828148,,,3.908795401,,,
40684,0.385833079,,,,,,
40684,0.385838021,,,6.855867864,,,
40684,0.385842963,,,,,,
40684,0.385847894,,,11.49078845,,,
40684,0.385852836,,,,,,
40684,0.385857766,,,8.035507345,,0.022647571,
40684,0.385862708,9.617499888,,,0.113758152,,
40684,0.385867639,,23.14163433,10.70361681,,0.095718193,
40684,0.385872581,,,,0.073685334,,
40684,0.385877523,,13.93635422,11.24775181,,,
40684,0.385882454,,,,,,

I was hopping someone could give me a hand with linearly interpolating missing data in Excel.

I have data I gathered at different time intervals and in order to manipulate the data, I need it to line up in the rows. I am trying to make a function which finds missing data in a Column and fills it in based on a linear interpolation of the closest value above and the amount of time which has passed. A simple linear interpolation would be suitable to fill in the missing data. I have example data shown below (unfortunately wont let me post an image so comma delineated data is below). For missing data at the beginning or end, it would be OK to leave blank or copy the closest value.

Thank you for the help

Example Data (comma delineated):

Date(days),Time(hours),Data1,Data2,Data3,Data4,Data5,Data6
40684,0.385670139,,,2.049620821,,,
40684,0.385675069,,,,,,0.133139679
40684,0.385680012,,,8.550069731,,,
40684,0.385684954,,,,,,
40684,0.385689884,,,6.308237045,,,
40684,0.385694826,,,,0.012712923,,0.008131037
40684,0.385699769,,,7.00871559,,,
40684,0.385704699,6.153512677,,,0.00487698,,
40684,0.385709641,,12.45628511,5.053182136,,,
40684,0.385714572,,,,,,
40684,0.385719514,,3.908056327,5.79615269,,0.132418943,
40684,0.385724456,,,,,,
40684,0.385729387,,,1.100001911,,0.011925063,
40684,0.385734329,,,,,,
40684,0.385739271,,,4.576754454,,0.114426916,
40684,0.385744201,,,,,,
40684,0.385749144,,,6.64728305,,0.078670107,
40684,0.385754074,,,,,,
40684,0.385759016,,,3.666219416,,,
40684,0.385763958,,,,0.206991693,,
40684,0.385768889,,,7.619782896,,,
40684,0.385773831,,,,,,
40684,0.385778773,,,2.622500183,,,
40684,0.385783704,,,,0.136512025,,
40684,0.385788646,,,4.506286862,,,
40684,0.385793576,,,,,,0.028082778
40684,0.385798519,,,0.28245389,,,
40684,0.385803461,,,,,,0.161361483
40684,0.385808391,,,8.368688368,,,
40684,0.385813333,,,,,,0.001826172
40684,0.385818264,,,2.513474192,,,
40684,0.385823206,,,,,,0.068932257
40684,0.385828148,,,3.908795401,,,
40684,0.385833079,,,,,,
40684,0.385838021,,,6.855867864,,,
40684,0.385842963,,,,,,
40684,0.385847894,,,11.49078845,,,
40684,0.385852836,,,,,,
40684,0.385857766,,,8.035507345,,0.022647571,
40684,0.385862708,9.617499888,,,0.113758152,,
40684,0.385867639,,23.14163433,10.70361681,,0.095718193,
40684,0.385872581,,,,0.073685334,,
40684,0.385877523,,13.93635422,11.24775181,,,
40684,0.385882454,,,,,,

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

不羁少年 2024-11-17 02:57:23

您需要的是使用线性插值或三次样条对数据进行插值。您可以对齐 x 轴。

根据数据量,您将需要 VBA 代码来为您进行数学计算。不幸的是,从您提供的数据来看,数据中似乎存在很多噪声(或变化)(参见 Data6),并且在短时间内发生很大变化,大多数方案都将不稳定。您可能需要首先平滑数据(如果有很多数据),然后对其进行插值。

对于线性插值,请激活工作表并运行以下命令

Option Explicit

Public Sub LinearFit()
Dim rx As Range, ry As Range
Dim x() As Variant, y() As Variant
Dim i As Integer, j As Integer, N As Integer, M As Integer
M = 6 'Columns
N = 44 'Rows
Set rx = Range("A2").Resize(N, 2) ' get x values (2 columns)
x = rx.Value    'Get Day/Time array from worksheet
Dim i_low, i_high As Integer
Dim x_low As Double, x_high As Double
Dim y_low As Double, y_high As Double
Dim x_temp As Double
For j = 1 To M 'go through all columns of data
    Set ry = Range("C2").Offset(0, j - 1).Resize(N, 1)
    y = ry.Value    'Get value array from worksheet
    i_low = 1: i_high = 0 'initialize search values
    Do  'loop starting empty values
        i_high = i_high + 1
    Loop While IsEmpty(y(i_high, 1)) And i_high < N
    'get x for first non-empty cell
    x_high = CDbl(x(i_high, 1)) + 24# * CDbl(x(i_high, 2))
    If IsEmpty(y(i_high, 1)) Then
        Exit For 'if column is empty leave it alone
    Else
        y_high = CDbl(y(i_high, 1)) ' get first non-zero value
    End If
    'Fill empty first cells with constant value
    For i = 1 To i_high - 1
        y(i, 1) = y_high 'fill empty cells with initial value
    Next i
    Do 'loop through sections of empty cells
        i_low = i_high
        x_low = x_high
        y_low = y_high
        'Find next empty block if it exists
        Do
            i_high = i_high + 1
        Loop While IsEmpty(y(i_high, 1)) And i_high < N
        'get x for next block end
        x_high = CDbl(x(i_high, 1)) + 24# * CDbl(x(i_high, 2))
        If IsEmpty(y(i_high, 1)) Then
            y_high = y_low 'if cells empty to end use last value
        Else
            y_high = CDbl(y(i_high, 1)) 'find next value
        End If
        For i = i_low To i_high
            ' x value of interest
            x_temp = CDbl(x(i, 1)) + 24# * CDbl(x(i, 2))
            ' Linear Interpolation (see wikipedia for details)
            y(i, 1) = y_low + (y_high - y_low) * (x_temp - x_low) / (x_high - x_low)
        Next i
    Loop While i_high < N
    ry.Value = y 'put array back into worksheet
Next j

End Sub

,您将看到像这样填充的值:

Date(days)  Time(hours) Data1   Data2   Data3   Data4   Data5   Data6
40684   0.385670139 6.153512677 12.45628511 2.049620821 0.012712923 0.132418943 0.133139679
40684   0.385675069 6.153512677 12.45628511 5.295565628 0.012712923 0.132418943 0.133139679
40684   0.385680012 6.153512677 12.45628511 8.550069731 0.012712923 0.132418943 0.101863791
40684   0.385684954 6.153512677 12.45628511 7.427790828 0.012712923 0.132418943 0.070594229
40684   0.385689884 6.153512677 12.45628511 6.308237045 0.012712923 0.132418943 0.039400597
40684   0.385694826 6.153512677 12.45628511 6.658440881 0.012712923 0.132418943 0.008131037
40684   0.385699769 6.153512677 12.45628511 7.00871559  0.008789793 0.132418943 0.009129735
40684   0.385704699 6.153512677 12.45628511 6.032137384 0.00487698  0.132418943 0.010125807
40684   0.385709641 6.261854761 12.45628511 5.053182136 0.021732662 0.132418943 0.011124303
40684   0.385714572 6.369955702 8.186932606 5.424253533 0.038550828 0.132418943 0.012120577

What you need is to interpolate the data with either a linear interpolation, or a cubic spline. The you can align the x-axis.

Depending on the amount of data you will need VBA code to do the math for you. Unfortunately from the data you supplied it seems that there is a lot of noise (or variation) in the data (look at Data6) with big changes over short time that most schemes are going to be unstable. What you might need is to smooth the data first (if you have lots of it) and then interpolate it.

For linear interpolation have the sheet active and run the following

Option Explicit

Public Sub LinearFit()
Dim rx As Range, ry As Range
Dim x() As Variant, y() As Variant
Dim i As Integer, j As Integer, N As Integer, M As Integer
M = 6 'Columns
N = 44 'Rows
Set rx = Range("A2").Resize(N, 2) ' get x values (2 columns)
x = rx.Value    'Get Day/Time array from worksheet
Dim i_low, i_high As Integer
Dim x_low As Double, x_high As Double
Dim y_low As Double, y_high As Double
Dim x_temp As Double
For j = 1 To M 'go through all columns of data
    Set ry = Range("C2").Offset(0, j - 1).Resize(N, 1)
    y = ry.Value    'Get value array from worksheet
    i_low = 1: i_high = 0 'initialize search values
    Do  'loop starting empty values
        i_high = i_high + 1
    Loop While IsEmpty(y(i_high, 1)) And i_high < N
    'get x for first non-empty cell
    x_high = CDbl(x(i_high, 1)) + 24# * CDbl(x(i_high, 2))
    If IsEmpty(y(i_high, 1)) Then
        Exit For 'if column is empty leave it alone
    Else
        y_high = CDbl(y(i_high, 1)) ' get first non-zero value
    End If
    'Fill empty first cells with constant value
    For i = 1 To i_high - 1
        y(i, 1) = y_high 'fill empty cells with initial value
    Next i
    Do 'loop through sections of empty cells
        i_low = i_high
        x_low = x_high
        y_low = y_high
        'Find next empty block if it exists
        Do
            i_high = i_high + 1
        Loop While IsEmpty(y(i_high, 1)) And i_high < N
        'get x for next block end
        x_high = CDbl(x(i_high, 1)) + 24# * CDbl(x(i_high, 2))
        If IsEmpty(y(i_high, 1)) Then
            y_high = y_low 'if cells empty to end use last value
        Else
            y_high = CDbl(y(i_high, 1)) 'find next value
        End If
        For i = i_low To i_high
            ' x value of interest
            x_temp = CDbl(x(i, 1)) + 24# * CDbl(x(i, 2))
            ' Linear Interpolation (see wikipedia for details)
            y(i, 1) = y_low + (y_high - y_low) * (x_temp - x_low) / (x_high - x_low)
        Next i
    Loop While i_high < N
    ry.Value = y 'put array back into worksheet
Next j

End Sub

and you will see values filled in like this:

Date(days)  Time(hours) Data1   Data2   Data3   Data4   Data5   Data6
40684   0.385670139 6.153512677 12.45628511 2.049620821 0.012712923 0.132418943 0.133139679
40684   0.385675069 6.153512677 12.45628511 5.295565628 0.012712923 0.132418943 0.133139679
40684   0.385680012 6.153512677 12.45628511 8.550069731 0.012712923 0.132418943 0.101863791
40684   0.385684954 6.153512677 12.45628511 7.427790828 0.012712923 0.132418943 0.070594229
40684   0.385689884 6.153512677 12.45628511 6.308237045 0.012712923 0.132418943 0.039400597
40684   0.385694826 6.153512677 12.45628511 6.658440881 0.012712923 0.132418943 0.008131037
40684   0.385699769 6.153512677 12.45628511 7.00871559  0.008789793 0.132418943 0.009129735
40684   0.385704699 6.153512677 12.45628511 6.032137384 0.00487698  0.132418943 0.010125807
40684   0.385709641 6.261854761 12.45628511 5.053182136 0.021732662 0.132418943 0.011124303
40684   0.385714572 6.369955702 8.186932606 5.424253533 0.038550828 0.132418943 0.012120577
忘羡 2024-11-17 02:57:23

我相信我已经想出了一种在细胞中做到这一点的方法。此公式将从第二行开始计算 A 列中的时间、B 列中的间隙数据、C 列中的内插数据。请注意,但在最后几行上会失败,因为可能没有更多值可供查找。

{=IF(ISBLANK(B2), C1 + (偏移(B2, IF(ISBLANK(B2)), MATCH(1,
IF(ISBLANK(B2:B$999), 0, 1), 0), B2), 0) - C1) / (OFFSET($A2,
IF(ISBLANK(B2), MATCH(1, IF(ISBLANK(B2:B$999), 0, 1), 0), B2), 0) -
$A1) * ($A2 - $A1), B2)}

I believe I have come up with a way to do it in the cell. This formula will work from the second row down for time in column A, gappy data in column B, interpolated data in column C. Note that but fails on the last couple of rows as there may be no more values to find.

{=IF(ISBLANK(B2), C1 + (OFFSET(B2, IF(ISBLANK(B2), MATCH(1,
IF(ISBLANK(B2:B$999), 0, 1), 0), B2), 0) - C1) / (OFFSET($A2,
IF(ISBLANK(B2), MATCH(1, IF(ISBLANK(B2:B$999), 0, 1), 0), B2), 0) -
$A1) * ($A2 - $A1), B2)}

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文