Swapping rows between two dataframes according to a pattern

Posted on 2025-02-07 04:26:24

I have two dataframes that look like this:

In each dataframe, the values column follows a 1-2 pattern: one value of one kind followed by two of the other. (The values themselves are not significant to my problem; they only demonstrate the pattern.)

df1 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
       'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]}
df2 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
       'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}

I need to swap rows between the two dataframes so that the result is:

df_expected1 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                'values': [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]}

df_expected2 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
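
To make the pattern concrete, here is a minimal sketch that loads the question's data into pandas and checks which rows of df1 already hold the wanted value. The names `expected1`, `keep` and `kept_idx` are introduced here for illustration only and are not part of the original post.

```python
import pandas as pd

df1 = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]})
expected1 = pd.Series([20, 21, 22, 21, 22, 23, 22, 20, 21, 23])

# True where df1 already contains the value wanted in df_expected1
keep = df1['values'] == expected1

# idx of the rows that stay where they are; every other row is swapped
kept_idx = df1.loc[keep, 'idx'].tolist()
print(kept_idx)  # [1, 4, 7, 10]
```

So the rows that stay in place are those with idx = 3n - 2, and the two rows after each of them are exchanged with df2.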

Comments (3)

Knowing that the rows at idx = 3n-2 (1, 4, 7, 10) keep their values and all other rows are swapped, you can build a mask and then use numpy.where:

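import numpy as np

# True where idx = 3n - 2 (1, 4, 7, 10): these rows stay in place
m = df1["idx"].add(2).mod(3).eq(0)
s1 = np.where(m, df1["values"], df2["values"])
s2 = np.where(~m, df1["values"], df2["values"])

# Assign only after both arrays have been computed
df1["values"] = s1
df2["values"] = s2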

Output:

   idx  values
0    1      20
1    2      21
2    3      22
3    4      21
4    5      22
5    6      23
6    7      22
7    8      20
8    9      21
9   10      23

   idx  values
0    1    1000
1    2    1000
2    3   10001
3    4    1000
4    5    1000
5    6    1002
6    7    1000
7    8    1003
8    9    1007
9   10    1000
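
For readers who prefer not to overwrite the inputs, the same mask idea can be wrapped in a small function. This is only a sketch: `swap_outside_mask` and its parameters are hypothetical names, and it assumes both frames have the same length and row order.

```python
import numpy as np
import pandas as pd

def swap_outside_mask(left, right, keep, col="values"):
    """Return copies of both frames with `col` exchanged where `keep` is False."""
    new_left = left.copy()
    new_right = right.copy()
    new_left[col] = np.where(keep, left[col], right[col])
    new_right[col] = np.where(keep, right[col], left[col])
    return new_left, new_right

df1 = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]})
df2 = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]})

# Rows with idx = 1, 4, 7, 10 keep their values
keep = df1["idx"] % 3 == 1
out1, out2 = swap_outside_mask(df1, df2, keep)

print(out1["values"].tolist())  # [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]
print(out2["values"].tolist())  # [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]
```

Because the function reads from the untouched inputs and writes to copies, the order of the two assignments does not matter.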
亣腦蒛氧 2025-02-14 04:26:24

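This should do it, given that the indices are the same in both dataframes. The right-hand side of the tuple assignment is evaluated first, so the rows selected by the mask (idx = 1, 4, 7, 10) are exchanged:

mask = df1['idx'] % 3 == 1
df1[mask], df2[mask] = df2[mask], df1[mask]

Output (df1, then df2):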

   idx  values
0    1    1000
1    2    1000
2    3   10001
3    4    1000
4    5    1000
5    6    1002
6    7    1000
7    8    1003
8    9    1007
9   10    1000

   idx  values
0    1      20
1    2      21
2    3      22
3    4      21
4    5      22
5    6      23
6    7      22
7    8      20
8    9      21
9   10      23
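
Note that swapping the rows where idx % 3 == 1 leaves the large values in df1, which is the mirror image of the question's df_expected1 and df_expected2. A possible variation, sketched below under the assumption that both frames share the same row order, inverts the mask and swaps only the values column. The temporary names `from_df1` and `from_df2` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]})
df2 = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]})

# Swap every row except idx = 1, 4, 7, 10
mask = df1['idx'] % 3 != 1

# Copy both selections before writing, so neither assignment sees modified data
from_df1 = df1.loc[mask, 'values'].to_numpy(copy=True)
from_df2 = df2.loc[mask, 'values'].to_numpy(copy=True)
df1.loc[mask, 'values'] = from_df2
df2.loc[mask, 'values'] = from_df1

print(df1['values'].tolist())  # [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]
print(df2['values'].tolist())  # [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]
```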
温柔少女心 2025-02-14 04:26:24

In the code snippet below, I assumed that the indexes in df1 and df2 are equal and that, after the swap, df1 should always hold the smaller value of each pair.

import pandas as pd

from pprint import pprint

df1 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
       'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]}
df2 = {'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
       'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}

# Index both frames by idx so that rows are matched by label
a = pd.DataFrame(df1).set_index('idx')
b = pd.DataFrame(df2).set_index('idx')

col_name = 'values'
# Work on copies so the loop does not depend on chained assignment into a and b
a_series = a[col_name].copy()
b_series = b[col_name].copy()
for i in a_series.index:
    # Swap whenever the first series holds the larger value of the pair
    if a_series.loc[i] > b_series.loc[i]:
        a_series.loc[i], b_series.loc[i] = b_series.loc[i], a_series.loc[i]

df_expected1 = {'idx': a_series.index.tolist(), 'values': a_series.values.tolist()}
df_expected2 = {'idx': b_series.index.tolist(), 'values': b_series.values.tolist()}

pprint(df_expected1)
pprint(df_expected2)

Output:

{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 'values': [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]}
{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
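
The row-by-row comparison in the loop can also be expressed without a loop. The sketch below swaps in a different technique (element-wise numpy.minimum and numpy.maximum) and relies on the same assumption that the smaller value of each pair belongs in the first frame, as the question's df_expected1 shows. The names `low` and `high` are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007, 23]}).set_index('idx')
b = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}).set_index('idx')

# Element-wise smaller and larger value of each pair, matched on idx
low = np.minimum(a['values'], b['values'])
high = np.maximum(a['values'], b['values'])

print(low.tolist())   # [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]
print(high.tolist())  # [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]
```

This only works when the two kinds of values can be told apart by size; for a purely positional pattern, a mask on idx is the safer choice.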