How can I make the MATLAB fillmissing function impute only a limited number of missing values between known values?

Posted 2025-01-15 02:15:04

Consider the following code, used purely as an example:

A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);

The resulting timetable is:

[Image: the starting timetable, with undesired missing values]

I would like to use the MATLAB function fillmissing to impute missing data according to the following rules:

  • missing data at the beginning of the time series should not be
    imputed
  • missing data at the end of the time series should not be
    imputed
  • missing data between known values should be imputed only if
    the number of missing values between two known values is strictly
    less than 2

The resulting timetable should be:

[Image: the final timetable, with the desired missing-data imputation]

Notice that only row 4 of column A2 has been imputed here. Can I do this with fillmissing? If not, how can I do it?

我爱人 2025-01-22 02:15:04

You can find the first and last non-NaN values using find. Based on these indices, you can conditionally fill the missing data if there are fewer than 2 missing values between them. For some vector v:

idxNaN = isnan( v );                          % Logical mask of the NaN entries
idxDataStart = find( ~idxNaN, 1, 'first' );   % Index of the first non-NaN value
idxDataEnd =   find( ~idxNaN, 1, 'last' );    % Index of the last non-NaN value
idxData = idxDataStart:idxDataEnd;            % Span from the first to the last known value
numValsMissing = nnz( idxNaN(idxData) );      % Number of NaNs inside that span
if numValsMissing < 2                         % Fill only if fewer than 2 NaNs are inside
    % fillmissing needs a method; leading/trailing NaNs lie outside idxData and stay NaN
    v(idxData) = fillmissing(v(idxData), 'linear');
end
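The check above counts all NaNs between the first and last known value, so a vector with one single-NaN gap and one two-NaN gap would be left untouched entirely. If the rule is meant per gap, a minimal sketch (the example vector and names such as `maxRunLen` and `interior` are illustrative, not from the original answer) locates each run of interior NaNs with `diff` and fills only the short runs:

```matlab
% Hypothetical example vector: one single-NaN gap, one two-NaN gap, plus leading/trailing NaNs
v = [NaN; 1; NaN; 3; NaN; NaN; 6; NaN];
maxRunLen = 2;   % assumed reading of the rule: fill runs of fewer than 2 consecutive NaNs

idxNaN = isnan(v);
idxDataStart = find(~idxNaN, 1, 'first');   % first non-NaN index
idxDataEnd   = find(~idxNaN, 1, 'last');    % last non-NaN index

% Mark only the NaNs that lie strictly between known values
interior = false(size(v));
if ~isempty(idxDataStart)
    interior(idxDataStart:idxDataEnd) = idxNaN(idxDataStart:idxDataEnd);
end

% Start and end index of each run of consecutive interior NaNs
d = diff([0; double(interior); 0]);
runStart = find(d == 1);
runEnd   = find(d == -1) - 1;
runLen   = runEnd - runStart + 1;

toFill = runLen < maxRunLen;   % runs short enough to impute
if any(toFill)
    % Linear interpolation of interior points depends only on their known neighbours,
    % so extrapolated end values in 'filled' are simply never copied back
    filled = fillmissing(v, 'linear');
    for k = find(toFill).'
        v(runStart(k):runEnd(k)) = filled(runStart(k):runEnd(k));
    end
end
```

Here only v(3) is filled (with 2); the two-NaN run and the leading and trailing NaNs stay missing. To apply it to A, wrap the same steps in the column loop, with each column as v.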

For your array A, you can loop over the columns and apply the above, treating each column as the vector v:

A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 NaN];

for ii = 1:size(A,2)
    v = A(:,ii);                                  % Current column
    idxNaN = isnan( v );
    idxDataStart = find( ~idxNaN, 1, 'first' );   % First known value
    idxDataEnd =   find( ~idxNaN, 1, 'last' );    % Last known value
    idxData = idxDataStart:idxDataEnd;
    numValsMissing = nnz( idxNaN(idxData) );      % NaNs between the first and last known value
    if numValsMissing < 2
        v(idxData) = fillmissing(v(idxData),'linear');
    end
    A(:,ii) = v;                                  % Write the column back
end
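The question also asks whether fillmissing alone can do this. In releases whose fillmissing supports the 'EndValues' and 'MaxGap' name-value arguments (check the documentation for your version), the three rules appear to map onto a single call. This is a sketch under that assumption, applied to the numeric data so that the sample points are the row indices 1 to 5:

```matlab
% Sketch: assumes a MATLAB release whose fillmissing accepts 'EndValues' and 'MaxGap'
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 NaN];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);

% 'EndValues','none' leaves leading and trailing NaNs untouched.
% 'MaxGap',2: a gap's size is the distance between the known values around it,
% so one missing row (known rows 3 and 5) has size 2 and is filled,
% while two missing rows (known rows 2 and 5) have size 3 and are not.
TT.Variables = fillmissing(TT.Variables, 'linear', 'EndValues', 'none', 'MaxGap', 2);
```

With the example data, this should change only row 4 of A2 (filled with 4), matching the desired output. Calling fillmissing on TT directly would instead use the row times as sample points, so MaxGap would then have to be given as a duration.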