Replacing missing values in R
I need help replacing missing values in the following dummy file. The following rules need to be followed when replacing a missing value.
If the values on both sides of a column with a missing cell are the same, the missing value should be replaced with that value.
If the values are the same on both sides of two adjacent missing cells, the missing values should be replaced with that value.
If the values are the same on both sides of a run of 3, 4 or more adjacent missing cells, the missing values should be replaced with that value.
If the value in the 2007 column is missing, it should be replaced with the value of 2008 and 2009 if they are the same.
If the value in the 2017 column is missing, it should be replaced with the value of 2016 and 2015 if they are the same.
If the values are not the same on both sides of the column containing the missing value, the missing value should be replaced with the most frequently occurring value across the 2007 to 2017 columns.
If 2007 and 2008 are missing, replace both missing values with 2009 if 2009==2010==2011
If 2007, 2008 and 2009 are missing, replace all three missing values with 2010 if 2010==2011==2012
If 2007, 2008, 2009 and 2010 are missing, replace all four missing values with 2011 if 2011==2012==2013
If 2017 and 2016 are missing, replace both missing values with 2015 if 2015==2014==2013
If 2017, 2016 and 2015 are missing, replace all three missing values with 2014 if 2014==2013==2012
If 2017, 2016, 2015 and 2014 are missing, replace all four missing values with 2013 if 2013==2012==2011
Create a new variable with the count of unique values between 2007 and 2017 for every case.
The dummy data is below.
dput(gb)
structure(list(ID = 1:20, X2007 = c("a1", "v1", "", "e1", "d1",
"g1", "t1", "b2w", "p1", "q1", "sd1", "fr4", "fr6", "gt7", "",
"ju8", "ki9", "lo9", "", "i88"), X2008 = c("a1", "v1", "c1",
"e1", "d1", "", "t1", "b2w", "", "", "", "", "", "", "", "",
"", "", "", ""), X2009 = c("a1", "", "c1", "", "", "d1", "t1",
"", "p1", "", "sd1", "", "fr6", "", "hj7", "ju8", "ki9", "lo9",
"k99", "i88"), X2010 = c("a1", "", "", "e1", "", "d1", "", "",
"p1", "", "sd1", "", "fr6", "gt7", "hj7", "", "ki9", "", "k99",
""), X2011 = c("", "v1", "", "", "", "d1", "", "b2w", "p1", "q1",
"sd1", "", "fr6", "gt7", "hj7", "", "ki9", "", "k99", ""), X2012 = c("a1",
"v1", "c1", "e1", "", "", "", "b2w", "p1", "q1", "sd1", "", "fr6",
"gt7", "hj7", "ju8", "ki9", "lo9", "k99", ""), X2013 = c("b1 ",
"", "c1", "e1", "d1", "", "t1", "", "p1", "q1", "sd1", "fr4",
"fr6", "gt7", "hj7", "ju8", "ki9", "lo9", "k99", ""), X2014 = c("",
"v1", "", "", "d1", "g1", "t1", "", "", "q1", "", "fr4", "",
"gt7", "", "ju8", "", "lo9", "", "i88"), X2015 = c("b3", "b6",
"", "", "d1", "g1", "t1", "", "", "q1", "", "fr4", "", "", "",
"ju8", "", "lo9", "", "i88"), X2016 = c("b4", "b6", "", "", "d1",
"g1", "t1", "b2w", "", "", "", "fr4", "", "", "", "", "", "lo9",
"", "i88"), X2017 = c("b5", "b6", "c1", "e1", "d1", "g1", "",
"", "", "", "", "fr4", "", "", "", "", "", "lo9", "", "i88")), class = "data.frame", row.names = c(NA,
-20L))
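For the "most frequently occurring value" fallback and the unique-value count in the rules above, a minimal base R sketch might look like the following. The helper names `row_mode` and `n_distinct_values` are hypothetical. The sketch assumes that an empty string means missing, that stray whitespace (as in `"b1 "`) is noise and can be trimmed, and that ties for the most frequent value go to whichever value appears first, since the rules do not say how to break ties.

```r
# Most frequent non-missing value in one row of year columns.
# Assumption: ties are resolved in favour of the value that appears first.
row_mode <- function(x) {
  x <- trimws(x)                        # assumption: "b1 " means "b1"
  x <- x[!is.na(x) & x != ""]           # assumption: "" means missing
  if (length(x) == 0) return(NA_character_)
  ux <- unique(x)                       # values in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]
}

# Number of distinct non-missing values in one row of year columns.
n_distinct_values <- function(x) {
  x <- trimws(x)
  length(unique(x[!is.na(x) & x != ""]))
}

# Row 1 of gb (X2007 ... X2017), typed out by hand
row1 <- c("a1", "a1", "a1", "a1", "", "a1", "b1 ", "", "b3", "b4", "b5")
row_mode(row1)           # "a1"
n_distinct_values(row1)  # 5
```

Applied to the whole data frame, something like `apply(gb[-1], 1, n_distinct_values)` would give one count per case, assuming the first column is `ID` and all remaining columns are year columns.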
Comments (1)
Here is a possible approach:
Using zoo::na.locf, get the "adjacent" values for any sequence of consecutive missing values, then apply the rules with case_when().
Output:
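The idea of looking up the "adjacent" values on each side of a run of missing cells can be sketched in base R as follows. This is not the code from the answer, only an illustration of the technique: `locf` is a hand-rolled stand-in for `zoo::na.locf(x, na.rm = FALSE)`, and `fill_between` and `demo` are hypothetical names. It covers only the rule for gaps flanked by the same value on both sides, and it assumes that empty strings mean missing and that surrounding whitespace can be trimmed.

```r
# Carry the last non-missing value forward; leading NAs stay NA.
# Hand-rolled stand-in for zoo::na.locf(x, na.rm = FALSE).
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # position of last non-NA so far
  idx[idx == 0L] <- NA                               # nothing seen yet
  x[idx]
}

# Fill any run of missing cells whose nearest non-missing neighbours
# on the left and on the right hold the same value.
fill_between <- function(x) {
  x <- trimws(x)
  x[x %in% ""] <- NA               # assumption: "" means missing
  left  <- locf(x)                 # nearest value to the left
  right <- rev(locf(rev(x)))       # nearest value to the right
  fill  <- is.na(x) & !is.na(left) & !is.na(right) & left == right
  x[fill] <- left[fill]
  x
}

# Gaps between equal neighbours are filled; other gaps stay NA
fill_between(c("a", "", "", "a", "b", "", "c", ""))

# Row-wise use on a small hypothetical data frame shaped like gb
demo <- data.frame(ID = 1:2,
                   X2007 = c("e1", "g1"),
                   X2008 = c("", ""),
                   X2009 = c("e1", "d1"),
                   stringsAsFactors = FALSE)
years <- grep("^X[0-9]{4}$", names(demo), value = TRUE)
demo[years] <- t(apply(demo[years], 1, fill_between))
```

Cells that are still NA afterwards (leading or trailing gaps, or gaps between different values) would then go through the remaining rules, for example inside a `case_when()` as the answer suggests.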