How to isolate the numbers in a column and split them into 3 columns?
I am trying to access a column, extract its numbers and then split them into 3 columns, but I keep getting errors. This is what I am trying:
dsc = df["Descricao"].str.findall(r"\d+")
dsc
The Output:
0 []
1 [475, 2000, 3]
2 [65, 2000, 2]
3 [51, 2000, 3]
4 [320, 2000, 3]
...
2344 NaN
2345 [480, 2000, 1]
2346 [32, 2000, 6]
2347 [250, 2000, 1]
2348 NaN
Name: Descricao, Length: 2349, dtype: object
Then I try to split it, and every time I get this error:
df[['Larg','comp', 'qtd']] = dsc.str.split(',',expand=True)
df.head(5)
The Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15388/2481153233.py in <module>
----> 1 df[['Larg','comp', 'qtd']] = dsc.str.split(',',expand=True)
2 df.head(5)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3598 self._setitem_frame(key, value)
3599 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 3600 self._setitem_array(key, value)
3601 elif isinstance(value, DataFrame):
3602 self._set_item_frame_value(key, value)
~\anaconda3\lib\site-packages\pandas\core\frame.py in _setitem_array(self, key, value)
3637 else:
3638 if isinstance(value, DataFrame):
-> 3639 check_key_length(self.columns, key, value)
3640 for k1, k2 in zip(key, value.columns):
3641 self[k1] = value[k2]
~\anaconda3\lib\site-packages\pandas\core\indexers.py in check_key_length(columns, key, value)
426 if columns.is_unique:
427 if len(value.columns) != len(key):
--> 428 raise ValueError("Columns must be same length as key")
429 else:
430 # Missing keys in columns are represented as -1
ValueError: Columns must be same length as key
I think it has something to do with str.findall generating a list of lists.
Does anybody know how I can solve this?
For reference, all my columns have the object dtype.
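A small reproduction of the error, using a hypothetical `dsc` shaped like the output above: `.str.split` calls `str.split` on each element, and lists (like NaN) are not strings, so every row comes back as NaN. With `expand=True` that yields a single all-NaN column, which cannot be assigned to three column names.

```python
import pandas as pd

# Hypothetical stand-in for the question's `dsc`: an empty list, lists of digit strings, NaN.
dsc = pd.Series([[], ["475", "2000", "3"], ["65", "2000", "2"], float("nan")])

# .str.split only works on strings; each list (and the NaN) becomes NaN.
split = dsc.str.split(",", expand=True)

print(split.shape)               # (4, 1): one column, but three names were requested
print(split.isna().all().all())  # True: nothing was actually split
```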
You could try this:
Note that this may not work if any row contains more than three numbers, since that would produce more than three columns (I assume this will not be the case, given your attempt).
This method came from here.
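A minimal sketch of one way to do this, not necessarily the exact code of this answer: since `findall` already returns the split lists, the new columns can be built straight from them with the `DataFrame` constructor. The sample strings are hypothetical; only the column names come from the question. As noted above, a row with more than three numbers would add extra columns and break the assignment.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Descricao": ["sem medida", "Chapa 475x2000 3un", "Chapa 65x2000 2un", None]})

# Lists of digit strings; missing descriptions give NaN instead of a list.
dsc = df["Descricao"].str.findall(r"\d+")

# The DataFrame constructor needs list-like rows, so turn NaN into an empty list.
rows = [x if isinstance(x, list) else [] for x in dsc]

# One column per list position; shorter lists are padded with missing values.
parts = pd.DataFrame(rows, index=df.index)

# Assumes no row has more than three numbers; otherwise `parts` has extra columns.
df[["Larg", "comp", "qtd"]] = parts.apply(pd.to_numeric)
print(df)
```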
In the general case, some of the inputs may not have strings that parse to 3 numerical values.
Here is a way to do what the question asks while filling the new columns for any unusual rows with NaNs. If the desired behavior for non-standard rows is different, the logic can be adjusted as needed.
Sample Output:
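A minimal sketch of this idea, assuming the hypothetical helper `three_numbers` and the sample strings below: rows that yield exactly three numbers are kept as floats, and any other row (no numbers, too few or too many, or a missing value) gets three NaNs. Changing the `len(nums) != 3` check is where the behavior for non-standard rows would be adjusted.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample rows: one with no numbers, one with only two, one missing.
df = pd.DataFrame({"Descricao": ["sem medida", "475x2000 3un", "65x2000", None, "32x2000 6un"]})


def three_numbers(text):
    """Hypothetical helper: three floats if `text` holds exactly three numbers, else three NaNs."""
    nums = re.findall(r"\d+", text) if isinstance(text, str) else []
    if len(nums) != 3:
        return [np.nan, np.nan, np.nan]
    return [float(n) for n in nums]


parsed = [three_numbers(t) for t in df["Descricao"]]
df[["Larg", "comp", "qtd"]] = pd.DataFrame(parsed, index=df.index)
print(df)
```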
Thank you all! Following @constantstranger's solution, I took part of it and developed a new version; it made for an easy start. In the end, my solution was:
The Output:
I guess it's not the ideal solution for big data, but it works for now. I tried the .findall expression with to_frame(), but for some reason every length went to zero.
So now I'll be looking for a way to optimize it.
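For larger data, one option worth measuring is `Series.str.extract` with three capture groups. This is a different technique from the `findall`-based approaches here: it produces the three columns in a single call, without building intermediate lists, and leaves NaN in rows that do not match. The pattern is an assumption: it expects the three numbers in order, separated by non-digits, and for rows with more than three numbers it keeps only the first three. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({"Descricao": ["sem medida", "475x2000 3un", "65x2000", None, "32x2000 6un"]})

# Assumed layout: three runs of digits in order, separated by at least one non-digit.
# Named groups become the column names of the result; non-matching rows are all NaN.
pattern = r"(?P<Larg>\d+)\D+(?P<comp>\d+)\D+(?P<qtd>\d+)"
parts = df["Descricao"].str.extract(pattern)

# str.extract returns strings, so convert each column to numbers before joining.
df = df.join(parts.apply(pd.to_numeric))
print(df)
```

Whether this is actually faster than the list-based version depends on the data, so timing both on a real sample is the safer way to choose.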