Pandas DataFrame - add an else case?
I want to generate test data for my Bayesian network. This is my current code:

```python
import numpy as np
import pandas as pd

data = np.random.randint(2, size=(5, 6))
columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(data=data, columns=columns)
df.loc[(df['p_1'] == 1) & (df['p_2'] == 1), 'OP1'] = 1
df.loc[(df['p_1'] == 1) & (df['p_2'] == 0), 'OP2'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 1), 'OP3'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 0), 'OP4'] = 1
print(df)
```
So, for example, every time p_1 is 1 and p_2 is 1, OP1 should be 1 as well, and all the other OP columns in that row should be 0.
When p_1 is 1 and p_2 is 0, OP2 should be 1 and all the others 0, and so on.
But my current output is the following:
p_1 | p_2 | OP1 | OP2 | OP3 | OP4
---|---|---|---|---|---
0 | 0 | 0 | 0 | 0 | 1
1 | 0 | 1 | 1 | 1 | 1
0 | 0 | 1 | 1 | 0 | 1
0 | 1 | 1 | 1 | 1 | 1
1 | 0 | 0 | 1 | 1 | 0
Is there any way to fix it? What did I do wrong?
I didn't really understand the solutions to other people's questions, so I thought I'd ask here.
I hope that someone can help me.
Comments (2)
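The problem is that when you instantiate `df`, the "OP" columns already contain random values, and your `.loc` assignments only ever write 1s, so the other entries are never reset. One way of fixing it with your code is to force all "OP" columns to 0 before applying the conditions.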
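The original snippet for this fix did not survive on this page, so the following is only a minimal sketch of what zeroing the "OP" columns first could look like. The name `op_cols` is introduced here for illustration; everything else comes from the question's code.

```python
import numpy as np
import pandas as pd

columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(np.random.randint(2, size=(5, 6)), columns=columns)

# op_cols is a helper name assumed for this sketch.
op_cols = ['OP1', 'OP2', 'OP3', 'OP4']

# Discard the random 0/1 values so that only the rules below can set a 1.
df[op_cols] = 0

df.loc[(df['p_1'] == 1) & (df['p_2'] == 1), 'OP1'] = 1
df.loc[(df['p_1'] == 1) & (df['p_2'] == 0), 'OP2'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 1), 'OP3'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 0), 'OP4'] = 1
print(df)
```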
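But then you are generating more random numbers than you need, since every "OP" value gets overwritten anyway. I'd generate only `p_1` and `p_2` randomly and derive the "OP" columns from them instead.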
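The answerer's own code is missing here, so this is one possible way to do it rather than the original snippet: draw random values only for `p_1` and `p_2`, create the "OP" columns as zeros, and then apply the question's rules. Using `assign` to add the zero columns is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

# Only the two input columns are random.
df = pd.DataFrame(np.random.randint(2, size=(5, 2)), columns=['p_1', 'p_2'])

# Add the output columns, all starting at 0.
df = df.assign(OP1=0, OP2=0, OP3=0, OP4=0)

# Exactly one rule matches each row, so exactly one OP column becomes 1.
df.loc[(df['p_1'] == 1) & (df['p_2'] == 1), 'OP1'] = 1
df.loc[(df['p_1'] == 1) & (df['p_2'] == 0), 'OP2'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 1), 'OP3'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 0), 'OP4'] = 1
print(df)
```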
You can define tuples for the tests and create the new columns by casting the mask values to integers, which maps `True`/`False` to `1`/`0`. In your solution, set `0` first, because `1` values are already set in the original `DataFrame`.
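The code that accompanied this answer was lost, so the sketch below is only one reading of the tuple idea. The dictionary `rules` and its layout (output column mapped to the `(p_1, p_2)` pair that switches it on) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(np.random.randint(2, size=(5, 6)), columns=columns)

# Hypothetical mapping: output column -> (p_1, p_2) pair that should yield 1.
rules = {'OP1': (1, 1), 'OP2': (1, 0), 'OP3': (0, 1), 'OP4': (0, 0)}

for col, (a, b) in rules.items():
    mask = (df['p_1'] == a) & (df['p_2'] == b)
    # astype(int) maps True/False to 1/0 and replaces the whole column,
    # so the random values from the constructor are overwritten.
    df[col] = mask.astype(int)
print(df)
```

Because each assignment replaces the entire column, this version does not need a separate step that resets the "OP" columns to 0.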