Cluster analysis for multiple individuals in R

Posted on 2024-11-07 08:01:18 · 8183 characters · 1 view · 0 comments

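I apologize, I don't know how to use HTML or anything else to make this look "pretty", in particular to make my example data useful for you all. I am just learning this as I go.

I am trying to run a cluster analysis on the variables PersVel, TurnVel, and Velocity (and possibly others, but these will do for the moment). I have already separated the data by year, but the number of individuals varies from year to year (ID holds the name of each individual). I want to run k-means and/or hierarchical cluster analysis on these variables per individual. The sample data below contain only 21 data points. Once the clusters for the variables of interest have been identified, I want to link them back to either the calendar date or the date/time variable. Ultimately, I want to know when the clusters occurred.

I already have code to turn the IDs into levels, and I was told that I need to standardize the variables for k-means clustering (I assume the same applies to hierarchical clustering, but that is not a big deal). How do I get it to loop through the individuals?

IDNames = levels(Data$ID)
for (i in 1:length(IDNames)) {

Now what? How do I write the next part to run this analysis?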

structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("c_002", 
"c_102", "c_104", "c_401", "c_402", "c_406", "c_409", "c_411", 
"c_412", "c_413", "c_414", "c_415", "c_417", "c_418", "c_420", 
"c_421", "c_423", "c_425", "c_426", "c_602", "c_604", "c_9809", 
"c_9814", "c_9815", "c_9816", "c_9819", "c_9908", "c_9911"), class = "factor"), 
    x = c(229539.8109, 231122.438, 231290.6472, 231355.2828, 
    230910.8116, 230928.7384, 231164.6592, 231113.9708, 231186.0565, 
    231270.4396, 231334.5768, 231153.0715, 231215.2728, 231200.7462, 
    231325.1136, 231777.6369, 231522.6185, 231674.6925, 231684.3388, 
    231924.464, 232065.5961), y = c(2229114.92, 2229455.232, 
    2230388.77, 2232003.32, 2232559.623, 2232521.689, 2232434.829, 
    2232996.109, 2233038.608, 2233160.861, 2233371.836, 2233471.823, 
    2233307.792, 2233285.778, 2233204.662, 2231630.353, 2231054.838, 
    2231056.299, 2230981.267, 2230840.082, 2230998.991), DateTime = structure(c(1148853637, 
    1148871660, 1148889637, 1148907637, 1148925637, 1148943666, 
    1148961637, 1148979636, 1148997636, 1149015637, 1149033637, 
    1149051690, 1149069666, 1149087665, 1149105637, 1149123683, 
    1149141654, 1149159637, 1149177636, 1149195696, 1149213696
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), RunClock_days = c(1179.58332175926, 
    1179.79192129630, 1179.99998842593, 1180.20832175926, 1180.41665509259, 
    1180.62532407407, 1180.83332175926, 1181.04164351852, 1181.24997685185, 
    1181.45832175926, 1181.66665509259, 1181.87560185185, 1182.08365740741, 
    1182.29197916667, 1182.49998842593, 1182.70885416667, 1182.91685185185, 
    1183.12498842593, 1183.33331018519, 1183.54233796296, 1183.75067129630
    ), CalDay = c("148", "149", "149", "149", "149", "149", "150", 
    "150", "150", "150", "151", "151", "151", "151", "151", "152", 
    "152", "152", "152", "152", "153"), DistX = c(1582.62709999998, 
    168.209200000012, 64.6355999999796, -444.4712, 17.9268000000156, 
    235.920799999993, -50.6883999999845, 72.085699999996, 84.3831000000064, 
    64.1371999999974, -181.505300000019, 62.2013000000152, -14.5266000000120, 
    124.367400000017, 452.523300000001, -255.018400000001, 152.073999999993, 
    9.64629999999306, 240.125200000009, 141.132099999988, -3159.38569999998
    ), DistY = c(340.311999999918, 933.538000000175, 1614.54999999981, 
    556.303000000305, -37.9340000003576, -86.8599999998696, 561.280000000261, 
    42.4989999998361, 122.253000000026, 210.975000000093, 99.9869999997318, 
    -164.030999999959, -22.0139999999665, -81.1159999999218, 
    -1574.30899999989, -575.51500000013, 1.46100000012666, -75.032000000123, 
    -141.185000000056, 158.908999999985, -5943.84400000004), 
    Dist = c(1618.80227174238, 948.571311188026, 1615.84326693116, 
    712.058758417295, 41.9566265835052, 251.402632191101, 563.564151002218, 
    83.6810202224823, 148.547310896621, 220.508573640299, 207.223488285096, 
    175.428534402698, 26.3749559916007, 148.482509538166, 1638.05560483262, 
    629.48542442515, 152.081017872048, 75.649534880978, 278.555768025113, 
    212.533150194039, 6731.34455348268), LnDist = c(7.38944181635036, 
    6.8549569696676, 7.3876122460922, 6.56816043387389, 3.73663638428818, 
    5.527055766233, 6.33428117083723, 4.42701219219356, 5.00090349939957, 
    5.39593657685343, 5.33379786440982, 5.16723174859221, 3.27241492322993, 
    5.00046717041827, 7.4012652106211, 6.44490269900689, 5.02441339116178, 
    4.32611129191379, 5.62961828357648, 5.35909797711072, 8.81453018774869
    ), TimeDif = c(5.00638888888889, 4.99361111111111, 5, 5, 
    5.00805555555556, 4.99194444444444, 4.99972222222222, 5, 
    5.00027777777778, 5, 5.01472222222222, 4.99333333333333, 
    4.99972222222222, 4.99222222222222, 5.01277777777778, 4.99194444444444, 
    4.99527777777778, 4.99972222222222, 5.01666666666667, 5, 
    4.98361111111111), Velocity = c(323.347288368894, 189.956985051838, 
    323.168653386232, 142.411751683459, 8.3778277053979, 50.3616646757533, 
    112.719092372242, 16.7362040444965, 29.7078117453384, 44.1017147280597, 
    41.3230243076688, 35.1325502809141, 5.27528426966845, 29.7427684363120, 
    326.776026676129, 126.100246393108, 30.4449571450467, 15.130747573283, 
    55.5260667159693, 42.5066300388078, 1350.69619266137), LnVelocity = c(5.77872694180175, 
    5.24679765206538, 5.7781743336581, 4.95872252143979, 2.12558865719019, 
    3.91923026414518, 4.72489881550196, 2.81757427975946, 3.39141003295307, 
    3.78649866441933, 3.72141983391736, 3.55912805917125, 1.66303256789466, 
    3.39258602467242, 5.78927500251085, 4.83707719691907, 3.41592036944078, 
    2.71672893657851, 4.0168525810497, 3.74966006467662, 7.2083754367735
    ), Heading = c(1.35899167682096, 0.178271769107279, 0.040011832151945, 
    5.60907076311214, 2.70012174242416, 1.92356952639201, 6.193121040462, 
    1.03808707214764, 0.604141059039809, 0.295125938335282, 5.21590486031959, 
    2.77914091577713, 3.72488212039469, 2.14873677066758, 2.86169595063768, 
    3.55870493136089, 1.56118945741765, 3.01373153808326, 2.10231890072709, 
    0.726219128764754, 3.63015207232184), Angle = c(0.609592148368293, 
    -1.18071990771368, -0.138259936955334, 5.5690589309602, -2.90894902068798, 
    -0.776552216032153, 4.26955151407000, -5.15503396831437, 
    -0.433946013107828, -0.309015120704527, 4.92077892198431, 
    -2.43676394454246, 0.945741204617556, -1.57614534972711, 
    0.712959179970102, 0.697008980723212, -1.99751547394325, 
    1.45254208066561, -0.911412637356172, -1.37609977196233, 
    2.90393294355708), CosAngle = c(0.81988159459602, 0.380259094713527, 
    0.990457310809811, 0.755665715954353, -0.973060304449898, 
    0.713334063328187, -0.428504949324728, 0.428331029577178, 
    0.907313699540722, 0.952633553896418, 0.206884943442359, 
    -0.761722542920473, 0.585141974104434, -0.00534899742449928, 
    0.756429664977827, 0.766765630720815, -0.413886381311673, 
    0.117978826229562, 0.612629855005907, 0.193468831871728, 
    -0.971891607429047), SinAngle = c(0.572533117682013, -0.924880003507292, 
    -0.137819865996876, -0.654957499179294, -0.230550740410589, 
    -0.700824167745161, -0.90353943378483, 0.903621894987807, 
    -0.420454338336196, -0.304120555028232, -0.978365279523375, 
    -0.647903362861135, 0.810930866437557, -0.999985694010946, 
    0.654075043050515, 0.641927151275992, -0.910328546934967, 
    0.993016110927459, -0.7903699518298, -0.981106421900391, 
    0.235428765041538), PersVel = c(265.106490396188, 72.2328711703229, 
    320.084755370955, 107.615678296195, -8.15213157764327, 35.9246908991267, 
    -48.300688964897, 7.1686355095929, 26.9543045799223, 42.0127732343175, 
    8.54911154675927, -26.7612555392593, 3.08679025151586, -0.159093991763311, 
    247.183080381409, 96.6893349596614, -12.6007531419523, 1.78510783867173, 
    34.0169262012526, 8.22370806041186, -1312.73029383395), TurnVel = c(185.127031103868, 
    -175.687417000979, -44.5390605040812, -93.273644736341, -1.93151438051183, 
    -35.2946717326457, -101.846144898756, 15.1232004135905, -12.4907783308025, 
    -13.4122379607943, -40.4290122275236, -22.7624974728921, 
    4.27789084350665, -29.7423429365923, 213.736043716065, 80.9471719423284, 
    -27.7149135993477, 15.0250761106466, -43.886134675599, -41.7035277044184, 
    317.992736584573)), .Names = c("ID", "x", "y", "DateTime", 
"RunClock_days", "CalDay", "DistX", "DistY", "Dist", "LnDist", 
"TimeDif", "Velocity", "LnVelocity", "Heading", "Angle", "CosAngle", 
"SinAngle", "PersVel", "TurnVel"), row.names = 150:170, class = "data.frame")

Comments (1)

不打扰别人 2024-11-14 08:01:18

To keep it simple (and assuming your original data.frame is called orgData):

results <- list()
IDNames <- levels(orgData$ID)
for (i in seq_along(IDNames)) {
   # select the rows that belong to the i-th individual
   dataForCurrentIndividual <- orgData[orgData$ID == IDNames[i], ]
   # now do whatever analysis you're interested in on data.frame dataForCurrentIndividual

   # after your analysis, I assume the result is in a variable resCurIndv
   results[[i]] <- resCurIndv  # keep your results in the i'th spot in the results list
}

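Once you've done that, a good next step is to make your code more R-ish.

First, turn the above into a function. That is: take all the code you have written where my comments (the lines starting with #) are, and wrap it in a function like this:

analysisPerIndividual <- function(dataForCurrentIndividual) {
   # now do whatever analysis you're interested in on data.frame dataForCurrentIndividual

   # after your analysis, I assume the result is in a variable resCurIndv
   return(resCurIndv)
}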
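As a hedged illustration of what the body of such a function might contain for the k-means case, here is a minimal sketch. The column names come from the question; the choice of k = 2, the nstart value, and the made-up demo data frame are assumptions for demonstration only, not part of the original answer.

```r
# Hypothetical filled-in version of analysisPerIndividual().
# Assumptions: columns PersVel, TurnVel, Velocity and DateTime exist,
# and k = 2 clusters is only a placeholder choice.
analysisPerIndividual <- function(dataForCurrentIndividual, k = 2) {
  vars <- c("PersVel", "TurnVel", "Velocity")
  # Standardize each variable (mean 0, sd 1) so none dominates the distances
  scaled <- scale(dataForCurrentIndividual[, vars])
  # nstart > 1 reruns k-means from several random starts and keeps the best fit
  fit <- kmeans(scaled, centers = k, nstart = 25)
  # Attach the cluster label to each row so it stays linked to DateTime
  dataForCurrentIndividual$cluster <- fit$cluster
  dataForCurrentIndividual
}

# Made-up demonstration data: two individuals, 30 fixes each, 5 hours apart
set.seed(1)
times <- as.POSIXct("2006-05-28 22:00:00", tz = "UTC") + (0:29) * 5 * 3600
demo <- data.frame(
  ID = factor(rep(c("a", "b"), each = 30)),
  DateTime = rep(times, times = 2),
  PersVel = rnorm(60, 50, 100),
  TurnVel = rnorm(60, 0, 80),
  Velocity = abs(rnorm(60, 100, 120))
)

oneIndividual <- analysisPerIndividual(demo[demo$ID == "a", ])
# When did each cluster occur? Cross-tabulate calendar date against cluster
clusterByDay <- table(as.Date(oneIndividual$DateTime), oneIndividual$cluster)
print(clusterByDay)
```

Because the function returns the original rows with an extra cluster column, the date/time information is never separated from the cluster labels, which is what makes the "when did each cluster occur" question easy to answer afterwards.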

Now you can apply it to every individual like so (note that you have to install the plyr package for this):

library(plyr)
results <- dlply(orgData, "ID", analysisPerIndividual)
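If you would rather avoid the extra package, base R can do the same split-and-apply step. The sketch below is an alternative, not what the answer above uses; the tiny orgData and the placeholder analysis function are made up for illustration. It also shows one way to handle factor levels that have no rows, which is the situation in the question's sample data (28 levels, one individual present).

```r
# Made-up data frame standing in for orgData; level "c" has no rows
orgData <- data.frame(
  ID = factor(c("a", "a", "a", "b", "b"), levels = c("a", "b", "c")),
  Velocity = c(10, 20, 30, 5, 15)
)

# Placeholder analysis: just the mean velocity of one individual
analysisPerIndividual <- function(dataForCurrentIndividual) {
  mean(dataForCurrentIndividual$Velocity)
}

# split() makes one data frame per ID; drop = TRUE skips factor levels
# with no rows (here "c"), which would otherwise yield empty data frames
perIndividual <- split(orgData, orgData$ID, drop = TRUE)
results <- lapply(perIndividual, analysisPerIndividual)
print(results)
```

The result is a list named by ID, so results$a or results[["b"]] retrieves the output for one individual.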

To standardize your variables, see ?scale; for k-means clustering, see ?kmeans.
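The question also mentions hierarchical clustering. The same per-individual pattern should carry over using dist, hclust, and cutree from base R. The following is a sketch under assumptions: the helper name hclustPerIndividual, Ward linkage, k = 2, and the demo data are all hypothetical choices.

```r
# Hypothetical helper; the name, the linkage method and k = 2 are assumptions
hclustPerIndividual <- function(dataForCurrentIndividual, k = 2) {
  vars <- c("PersVel", "TurnVel", "Velocity")
  # Standardize first, as for k-means
  scaled <- scale(dataForCurrentIndividual[, vars])
  # Euclidean distances between the standardized observations
  d <- dist(scaled)
  # Ward linkage is one common choice; "complete" or "average" also work
  tree <- hclust(d, method = "ward.D2")
  # Cut the dendrogram into k groups and attach the labels to the rows
  dataForCurrentIndividual$cluster <- cutree(tree, k = k)
  dataForCurrentIndividual
}

# Made-up demonstration data for a single individual
set.seed(2)
demo <- data.frame(
  DateTime = as.POSIXct("2006-05-28 22:00:00", tz = "UTC") + (0:19) * 5 * 3600,
  PersVel = rnorm(20, 50, 100),
  TurnVel = rnorm(20, 0, 80),
  Velocity = abs(rnorm(20, 100, 120))
)

res <- hclustPerIndividual(demo)
# List the time stamps that fall into each cluster
timesByCluster <- split(res$DateTime, res$cluster)
print(timesByCluster)
```

Unlike kmeans, hclust does not need k up front; plotting the tree with plot(tree) inside the function can help decide where to cut.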
Good luck!
