Getting specific column and row data from a dataframe in pandas in Python

Posted on 2025-02-05 13:05:43

I'm working with a dataframe using pandas in Python. I have sorted the table and created some extra columns: https://i.sstatic.net/Y6lkN.png

{'Part Number': ['K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2'],
 'Date': [Timestamp('2021-05-17 00:00:00'),
  Timestamp('2021-05-23 00:00:00'),
  Timestamp('2021-07-08 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-12-18 00:00:00'),
  Timestamp('2021-12-20 00:00:00'),
  Timestamp('2022-02-10 00:00:00'),
  Timestamp('2022-03-31 00:00:00'),
  Timestamp('2021-10-04 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-17 00:00:00'),
  Timestamp('2021-11-24 00:00:00'),
  Timestamp('2021-11-27 00:00:00'),
  Timestamp('2021-12-22 00:00:00'),
  Timestamp('2021-12-24 00:00:00'),
  Timestamp('2022-03-21 00:00:00')],
 'Code': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52'],
 'Fix': ['na',
  'na',
  'na',
  'Custom Status',
  'na',
  'na',
  'Remade',
  'na',
  'na',
  'na',
  'Testing Procedure Fixed',
  'na',
  'na',
  'na',
  'na',
  'na',
  'na',
  'Light Repair',
  'na',
  'na'],
 'Fixed': ['No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No'],
 'Combined': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52']}

I sorted the dataframe by date and now I would like to create a loop that goes down the table in order row by row. In the loop, if the row in the "Fixed" column is "No", I want to append the value in that row in the "Code" column to a list (which I called list_test). Then, when the row in the "Fixed" column becomes a "Yes", I want to create a new list variable that is a copy of the list_test.

Then, I want to clear the list_test to be an empty list so that it can repeat the process down the column (clearing itself every time there is a "Fix").

In my example table above, I would want the output to be something along the lines of:

Fixed_Before_3 = ["SF22", "KFS3", "3FFS"]
Fixed_Before_6 = ["LA52", "K2KA"]
Fixed_Before_10 = ["QET6", "QET6", "P0SF"]
Fixed_Before_17 = ["DP2L", "SR2F", "JKO2", "DP2L", "A2BF", "KLL2"]

This is one way I tried to approach the problem:

list_test = []
var_test = {}

for index in df.index:
    var_test[index] = "Fixed_Before_" + str(index)
    
    if df['Fixed'][index] == 'No':
        list_test.append(df['Code'])
    
    if df['Fixed'][index] == 'Yes':
            var_test[index] = list_test
            list_test = []
list_test

However, when I run the code, the output (https://i.sstatic.net/RM0Pj.png) is a very large column that seems to contain my whole column several times over, rather than the output I described above. I think my problems might be with:

- The way I iterate through the dataframe
- The conditional statements about dataframes
  - Maybe list_test.append(df['Code']) gives me the whole column instead of the value of the column in the row in my conditional statement?
- Using a dictionary to create new variables in my loop
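
On the second point, a quick check on a tiny stand-in frame suggests the guess is right: df['Code'] is the whole column as a Series, while df.loc[index, 'Code'] is the single value in that row. The three rows below are made up for illustration and are not the real table.

```python
import pandas as pd

# Hypothetical three-row stand-in for the real table.
df = pd.DataFrame({"Code": ["SF22", "KFS3", "Belt Broke"],
                   "Fixed": ["No", "No", "Yes"]})

whole_column = df["Code"]         # a Series holding every row of the column
single_value = df.loc[1, "Code"]  # the scalar stored in row 1 only

print(type(whole_column).__name__)  # Series
print(single_value)                 # KFS3
```

So appending df['Code'] inside the loop adds the entire column once per "No" row, which would explain the very large output.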

风月客 2025-02-12 13:05:48

The exact expected output is unclear, but here is a suggestion:

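import pandas as pd

# True on the rows where a fix was recorded
mask = df['Fixed'].eq('Yes')
out = (df
 # label every row with the index of the next fix row (NaN after the last fix)
 .assign(index=pd.Series(df.index.where(mask), index=df.index).bfill())
 # keep only the rows that are not fixes
 .loc[~mask]
 # start a new group on the row after each fix, within each part number
 .groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
 .agg(fix_before=('index', 'first'),
      list=('Code', list))
 .reset_index(drop=True)
 )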

Output:

   fix_before                                  list
0         3.0                    [SF22, KFS3, 3FFS]
1         6.0                          [LA52, K2KA]
2        10.0                    [QET6, QET6, P0SF]
3        17.0  [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
4         NaN                          [A3SA, LA52]
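
For comparison, here is a sketch of a plain loop that stays closer to the attempt in the question. It assumes the lists are not restarted when the part number changes (which matches the expected output in the question), stores the results in a dict keyed by strings such as "Fixed_Before_3" instead of creating variables dynamically, and leaves out any codes that come after the last fix. The function name collect_codes_before_fix is made up for illustration, and the sample uses only the first eight rows of the Code and Fixed columns from the question.

```python
import pandas as pd

def collect_codes_before_fix(df):
    """Map 'Fixed_Before_<index>' to the codes logged since the previous fix."""
    result = {}
    pending = []
    # Walk the rows in order, reading one scalar per row from each column.
    for index, fixed, code in zip(df.index, df["Fixed"], df["Code"]):
        if fixed == "Yes":
            result[f"Fixed_Before_{index}"] = pending
            pending = []          # rebind, so the stored list is left untouched
        else:
            pending.append(code)  # a single value, not the whole column
    return result

# First eight rows of the question's data (Code and Fixed columns only).
sample = pd.DataFrame({
    "Code": ["SF22", "KFS3", "3FFS", "Replacement needed",
             "LA52", "K2KA", "Belt Broke", "QET6"],
    "Fixed": ["No", "No", "No", "Yes", "No", "No", "Yes", "No"],
})
groups = collect_codes_before_fix(sample)
print(groups)
# {'Fixed_Before_3': ['SF22', 'KFS3', '3FFS'], 'Fixed_Before_6': ['LA52', 'K2KA']}
```

The groupby version above is likely the better fit for a large table; the loop is mainly useful for seeing where the original attempt went wrong.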
