从python中的熊猫中的数据框中获取特定的列和行数据

发布于 2025-02-05 13:05:43 字数 3156 浏览 1 评论 0原文

我正在使用Python中的Panadas使用数据框架。我已经对表进行了排序并创建了一些额外的列: https://i.sstatic.net/y6lkn.png

{'Part Number': ['K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2'],
 'Date': [Timestamp('2021-05-17 00:00:00'),
  Timestamp('2021-05-23 00:00:00'),
  Timestamp('2021-07-08 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-12-18 00:00:00'),
  Timestamp('2021-12-20 00:00:00'),
  Timestamp('2022-02-10 00:00:00'),
  Timestamp('2022-03-31 00:00:00'),
  Timestamp('2021-10-04 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-17 00:00:00'),
  Timestamp('2021-11-24 00:00:00'),
  Timestamp('2021-11-27 00:00:00'),
  Timestamp('2021-12-22 00:00:00'),
  Timestamp('2021-12-24 00:00:00'),
  Timestamp('2022-03-21 00:00:00')],
 'Code': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52'],
 'Fix': ['na',
  'na',
  'na',
  'Custom Status',
  'na',
  'na',
  'Remade',
  'na',
  'na',
  'na',
  'Testing Procedure Fixed',
  'na',
  'na',
  'na',
  'na',
  'na',
  'na',
  'Light Repair',
  'na',
  'na'],
 'Fixed': ['No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No'],
 'Combined': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52']}

我按日期对数据框进行了排序,现在我想创建一个循环,该循环逐行顺序排列。在循环中,如果“固定”列中的行是“否”,我想将“代码”列中该行中的值附加到列表(我称为list_test)。然后,当“固定”列中的行变为“是”时,我想创建一个新列表变量,该变量是List_test的副本。

然后,我想将list_test清除为一个空列表,以便它可以在列下重复该过程(每次都有“修复”时清除自己)。

在上面的示例表中,我希望输出符合以下路线:

  • fixed_before_3 = [“ sf22”,“ kfs3”,“ 3ffs”]
  • fixe_before_6 = [“ la52”,“ k2ka”,“ k2ka”]
  • fixed_before_10 = [qet6 “,“ Qet6”,“ p0sf”]
  • fixed_before_17 = [“ dp2l”,“ sr2f”,“ jko2”,“ dp2l”,“ dp2l”,“ a2bf”,“ kll2”]

这是我试图解决这个问题的一种方式:

list_test = []
var_test = {}

for index in df.index:
    var_test[index] = "Fixed_Before_" + str(index)
    
    if df['Fixed'][index] == 'No':
        list_test.append(df['Code'])
    
    if df['Fixed'][index] == 'Yes':
            var_test[index] = list_test
            list_test = []
list_test

尽管当我运行代码时,输​​出()是一个很大的列,看起来它在我的列中包含了几次,而不是上面包含的输出。我认为我的问题可能存在:

  • 的方式
  • 在整个数据框中迭代有关数据框的条件语句
    • 也许list_test.append(df ['code'])给了我整列,而不是我条件语句中该行中列的值?
  • 使用字典在我的循环中创建新变量。

I'm working with a dataframe using panadas in python. I have sorted the table and created some extra columns: https://i.sstatic.net/Y6lkN.png

{'Part Number': ['K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2'],
 'Date': [Timestamp('2021-05-17 00:00:00'),
  Timestamp('2021-05-23 00:00:00'),
  Timestamp('2021-07-08 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-12-18 00:00:00'),
  Timestamp('2021-12-20 00:00:00'),
  Timestamp('2022-02-10 00:00:00'),
  Timestamp('2022-03-31 00:00:00'),
  Timestamp('2021-10-04 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-17 00:00:00'),
  Timestamp('2021-11-24 00:00:00'),
  Timestamp('2021-11-27 00:00:00'),
  Timestamp('2021-12-22 00:00:00'),
  Timestamp('2021-12-24 00:00:00'),
  Timestamp('2022-03-21 00:00:00')],
 'Code': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52'],
 'Fix': ['na',
  'na',
  'na',
  'Custom Status',
  'na',
  'na',
  'Remade',
  'na',
  'na',
  'na',
  'Testing Procedure Fixed',
  'na',
  'na',
  'na',
  'na',
  'na',
  'na',
  'Light Repair',
  'na',
  'na'],
 'Fixed': ['No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No'],
 'Combined': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52']}

I sorted the dataframe by date and now I would like to create a loop that goes down the table in order row by row. In the loop, if the row in the "Fixed" column is "No", I want to append the value in that row in the "Code" column to a list (which I called list_test). Then, when the row in the "Fixed" column becomes a "Yes", I want to create a new list variable that is a copy of the list_test.

Then, I want to clear the list_test to be an empty list so that it can repeat the process down the column (clearing itself every time there is a "Fix").

In my example table above, I would want the output to be something along the lines of:

  • Fixed_Before_3 = ["SF22", "KFS3", "3FFS"]
  • Fixed_Before_6 = ["LA52", "K2KA"]
  • Fixed_Before_10 = ["QET6", "QET6", "P0SF"]
  • Fixed_Before_17 = ["DP2L", "SR2F", "JKO2", "DP2L", "A2BF", "KLL2"]

This is one way I tried to approach the problem:

list_test = []
var_test = {}

for index in df.index:
    var_test[index] = "Fixed_Before_" + str(index)
    
    if df['Fixed'][index] == 'No':
        list_test.append(df['Code'])
    
    if df['Fixed'][index] == 'Yes':
            var_test[index] = list_test
            list_test = []
list_test

Although, when I run the code, the output (https://i.sstatic.net/RM0Pj.png) is a very large column and it looks like it includes everything in my column more than a few times rather than the output I included above. I think my problems might be with:

  • The way I iterate throughout the dataframe
  • The conditional statements about dataframes
    • Maybe list_test.append(df['Code']) gives me the whole column instead of the value of the column in the row in my conditional statement?
  • Using a dictionary to create new variables in my loop.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

风月客 2025-02-12 13:05:48

确切的预期输出尚不清楚,但这是一个建议:

mask = df['Fixed'].eq('Yes')
out = (df
 .assign(index=pd.Series(df.index.where(mask), index=df.index).bfill())
 .loc[~mask]
 .groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
 .agg(fix_before=('index', 'first'),
      list=('Code', list))
 .reset_index(drop=True)
 )

输出:

   fix_before                                  list
0         3.0                    [SF22, KFS3, 3FFS]
1         6.0                          [LA52, K2KA]
2        10.0                    [QET6, QET6, P0SF]
3        17.0  [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
4         NaN                          [A3SA, LA52]

The exact expected output is unclear, but here is a suggestion:

mask = df['Fixed'].eq('Yes')
out = (df
 .assign(index=pd.Series(df.index.where(mask), index=df.index).bfill())
 .loc[~mask]
 .groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
 .agg(fix_before=('index', 'first'),
      list=('Code', list))
 .reset_index(drop=True)
 )

Output:

   fix_before                                  list
0         3.0                    [SF22, KFS3, 3FFS]
1         6.0                          [LA52, K2KA]
2        10.0                    [QET6, QET6, P0SF]
3        17.0  [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
4         NaN                          [A3SA, LA52]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文