Randomly select values from a list column so that all elements of the lists are selected
Say I have a pandas DataFrame with a list column 'event_ids':
code   canceled  event_ids
xxx    [1.0]     [107385, 128281, 133015]
xxS    [0.0]     [108664, 110515, 113556]
ssD    [1.0]     [134798, 133499, 125396, 114298, 133915]
cvS    [0.0]     [107611]
eeS    [5.0]     [113472, 115236, 108586, 128043, 114106, 10796...
544W   [44.0]    [107650, 128014, 127763, 118036, 116247, 12802...
How can I select k rows sufficiently randomly so that every element across 'event_ids' is represented in the sample? In other words, the event vocabulary of the sample should be the same as that of the population. By "sufficiently" random I mean: is some sort of importance sampling possible, so that rows are first drawn at random and then added or rejected according to some condition?
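One way to read the "random first, then add or reject" idea is a random-then-repair sampler. This is a minimal sketch, not importance sampling in the strict statistical sense and not taken from the thread. `covering_sample` is a hypothetical helper. It works on a plain list of event-id lists, for example `df['event_ids'].tolist()`, and assumes the ids are sortable (such as ints).

The sampler works in three steps:

1. Draw k rows uniformly at random.
2. Add random rows that contain any still-missing event id.
3. While the sample is larger than k, reject redundant rows in random order.

```python
import random


def covering_sample(rows, k, seed=None):
    """Hypothetical helper: return sorted positions of a random sample of
    `rows` whose combined event ids equal the population vocabulary."""
    rng = random.Random(seed)
    vocabulary = set().union(*map(set, rows))

    # Step 1: uniform random sample of k row positions, without replacement.
    chosen = rng.sample(range(len(rows)), min(k, len(rows)))
    covered = set().union(*(set(rows[i]) for i in chosen))

    # Step 2 ("add"): while some event id is missing, add a random row
    # containing one of the missing ids (such a row is never already chosen).
    missing = vocabulary - covered
    while missing:
        target = rng.choice(sorted(missing))
        candidates = [i for i, r in enumerate(rows) if target in r]
        pick = rng.choice(candidates)
        chosen.append(pick)
        covered |= set(rows[pick])
        missing = vocabulary - covered

    # Step 3 ("reject"): if we overshot k, drop rows in random order
    # whenever the remaining rows still cover the whole vocabulary.
    rng.shuffle(chosen)
    for i in list(chosen):
        if len(chosen) <= k:
            break
        rest = [j for j in chosen if j != i]
        if set().union(*(set(rows[j]) for j in rest)) == vocabulary:
            chosen = rest

    return sorted(chosen)
```

With a DataFrame, the returned positions can be passed to `df.iloc[...]`. The result can contain more than k rows when k rows cannot cover the whole vocabulary. The repeated scans over `rows` are fine for small frames but will be slow for large ones.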
It is not clear whether you want every element within the lists in event_ids to be represented, or whether each list should be treated as a single unique element.
In the latter case, this could work (I'm not sure about the performance!):
Given this dataset:
There are 99 unique values in column 'x'. You want to sample so that every unique value in df['x'] is in the obtained sample.
You can change the preferred size to obtain more values in your sample.
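Here is a minimal sketch of this approach. It assumes a DataFrame with a unique index, a hashable column (called `x` above), and a target size `preferred_size`. `sample_covering_column` is a hypothetical name. The sketch relies on `DataFrameGroupBy.sample`, which needs pandas 1.1 or later.

The function takes one random row per unique value, then tops up with random rows from the remainder. Lists are unhashable, so to treat each `event_ids` list as one element, group on a tuple copy of the column instead.

```python
import pandas as pd


def sample_covering_column(df, col, preferred_size, seed=None):
    """Hypothetical helper: sample rows so every unique value of df[col]
    appears at least once, padded with random rows up to preferred_size."""
    # One randomly chosen row for each unique value in `col`.
    base = df.groupby(col).sample(n=1, random_state=seed)
    extra_needed = preferred_size - len(base)
    if extra_needed > 0:
        # Top up with random rows that were not already picked.
        rest = df.drop(index=base.index)
        extra = rest.sample(n=min(extra_needed, len(rest)), random_state=seed)
        base = pd.concat([base, extra])
    return base
```

For the list column, something like `sample_covering_column(df.assign(key=df['event_ids'].map(tuple)), 'key', k)` groups on identical lists. If there are more unique values than `preferred_size`, the sample is larger than requested.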
If instead you want every unique element inside the lists in event_ids to be represented, you can explode the column first and then apply the same code.
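A sketch of the explode variant follows, under stated assumptions. `sample_covering_elements` is a hypothetical helper name, it needs pandas 1.1 or later, and the DataFrame index must be unique. `Series.explode` yields one entry per list element while keeping the original row label. Picking one random entry per distinct element and mapping back through those labels therefore gives a set of rows that covers the whole event vocabulary.

```python
import pandas as pd


def sample_covering_elements(df, col, k, seed=None):
    """Hypothetical helper: sample rows so every element inside the lists
    of df[col] appears in the sample, padded with random rows up to k."""
    # One entry per list element; the index keeps the original row label.
    # Empty lists explode to NaN, so drop those.
    exploded = df[col].explode().dropna()
    # For each distinct element, pick one random row that contains it.
    picked = exploded.groupby(exploded.to_numpy()).sample(n=1, random_state=seed)
    sample = df.loc[picked.index.unique()]
    # Top up with random other rows if coverage needed fewer than k rows.
    extra_needed = k - len(sample)
    if extra_needed > 0:
        rest = df.drop(index=sample.index)
        extra = rest.sample(n=min(extra_needed, len(rest)), random_state=seed)
        sample = pd.concat([sample, extra])
    return sample
```

The result can exceed k rows, and the function makes no attempt to minimize how many rows are needed for coverage.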