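I am trying to remove duplicates from a large DataFrame of shape (4919214, 2), but I get an error.
Here is what the DataFrame looks like:
df.head()
When I try to drop duplicates I get the error below. I also tried to strip the [ character from the 'coordinates' column. Please help me out with this.
df.drop_duplicates(subset='coordinates')
This is the error I keep getting: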
TypeError Traceback (most recent call last)
<ipython-input-25-f4aacff1447d> in <module>
----> 1 df.drop_duplicates(subset='coordinates')
~\anaconda3\lib\site-packages\pandas\core\frame.py in drop_duplicates(self, subset, keep, inplace, ignore_index)
5269 inplace = validate_bool_kwarg(inplace, "inplace")
5270 ignore_index = validate_bool_kwarg(ignore_index, "ignore_index")
-> 5271 duplicated = self.duplicated(subset, keep=keep)
5272
5273 result = self[-duplicated]
~\anaconda3\lib\site-packages\pandas\core\frame.py in duplicated(self, subset, keep)
5406
5407 vals = (col.values for name, col in self.items() if name in subset)
-> 5408 labels, shape = map(list, zip(*map(f, vals)))
5409
5410 ids = get_group_index(labels, shape, sort=False, xnull=False)
~\anaconda3\lib\site-packages\pandas\core\frame.py in f(vals)
5380
5381 def f(vals):
-> 5382 labels, shape = algorithms.factorize(
5383 vals, size_hint=min(len(self), SIZE_HINT_LIMIT)
5384 )
~\anaconda3\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, na_sentinel, size_hint)
720 na_value = None
721
--> 722 codes, uniques = factorize_array(
723 values, na_sentinel=na_sentinel, size_hint=size_hint, na_value=na_value
724 )
~\anaconda3\lib\site-packages\pandas\core\algorithms.py in factorize_array(values, na_sentinel, size_hint, na_value, mask)
526
527 table = hash_klass(size_hint or len(values))
--> 528 uniques, codes = table.factorize(
529 values, na_sentinel=na_sentinel, na_value=na_value, mask=mask
530 )
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'list'
Comments (1)
Your coordinates column holds Python lists. Lists are unhashable, so drop_duplicates cannot use that column as a subset.
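To see why a list column fails: the traceback ends in PyObjectHashTable, meaning pandas tries to hash every cell, and a Python list has no hash. A minimal sketch in plain Python, where the helper name is_hashable and the sample values are made up for illustration:

```python
def is_hashable(value):
    """Return True if Python can hash `value` (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


list_cell = [12.97, 77.59]      # made-up coordinate pair, stored as a list
tuple_cell = tuple(list_cell)   # same values, stored as a tuple

print(is_hashable(list_cell))   # False
print(is_hashable(tuple_cell))  # True
```

Since tuples of numbers are hashable, turning each list into a tuple is another way to get values that can be compared for duplicates.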
One approach is to treat each list as a string. Note, however, that lists with the same elements in a different order produce different strings and are therefore treated as distinct values.
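A rough sketch of the string approach, assuming the column holds plain Python lists. The small DataFrame and its id column are made up for illustration; only the coordinates column name comes from the question.

```python
import pandas as pd

# Made-up sample data; the real frame has about 4.9 million rows.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "coordinates": [[1.0, 2.0], [1.0, 2.0], [2.0, 1.0], [3.0, 4.0]],
})

# Build a string key per row, e.g. "[1.0, 2.0]". Strings are hashable.
key = df["coordinates"].astype(str)

# Keep the first row for each distinct key; the list column itself is untouched.
deduped = df[~key.duplicated()]

print(deduped["id"].tolist())  # [1, 3, 4]
```

Row 3 survives because "[2.0, 1.0]" and "[1.0, 2.0]" are different strings, which is the ordering caveat above. Using a separate key Series means the coordinates column never has to be converted back from text.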
Note that eval is not a safe function and may leave you vulnerable to code injection, so avoid calling it on strings you do not trust.
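If the strings do need to be turned back into lists, ast.literal_eval from the standard library may be a safer substitute for eval, since it only accepts Python literals. This is a swapped-in suggestion rather than part of the answer above, and the sample strings are made up:

```python
import ast

text = "[1.0, 2.0]"                  # a list that was stored as a string
restored = ast.literal_eval(text)    # parses literals only, runs no code
print(restored)                      # [1.0, 2.0]

malicious = "__import__('os').getcwd()"   # made-up injection attempt
try:
    ast.literal_eval(malicious)
    rejected = False
except ValueError:
    # Function calls are not literals, so the string is refused.
    rejected = True
print(rejected)                      # True
```

On a DataFrame column the same function could be applied row by row, for example with df['coordinates'].apply(ast.literal_eval), assuming every cell is a valid literal string.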