Error when trying to do a vlookup with pandas in Python
So here is what I am trying to do.
I have a large dataframe named newdf. It has several columns, but the relevant ones here are the year and the product name. I need to count how many times each product name appears in each year (from 2018 to 2021) and create a new dataframe that looks like the one below.
Product Name | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|
abc | 0 | 5 | 10 | 8 |
xyz | 2 | 0 | 0 | 5 |
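The table described above is a cross-tabulation of product name against year. A minimal sketch of that idea, assuming newdf has columns named `Product Name` and `YEAR` and that `YEAR` holds strings (both taken from the code in this post); the sample rows are invented purely for illustration:

```python
import pandas as pd

# Hypothetical sample standing in for newdf; only the two relevant columns.
newdf = pd.DataFrame({
    "Product Name": ["abc", "abc", "xyz", "abc", "xyz", "xyz"],
    "YEAR": ["2019", "2020", "2018", "2020", "2018", "2021"],
})

years = ["2018", "2019", "2020", "2021"]

# Count how often each product name occurs in each year.
counts = pd.crosstab(newdf["Product Name"], newdf["YEAR"])

# Guarantee one column per year, filling years with no rows with 0.
counts = counts.reindex(columns=years, fill_value=0)

print(counts)
```

If `YEAR` is stored as integers rather than strings, the `years` list would need integer values instead.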
Here is what I have done so far:
df_target = pd.DataFrame({'Product Name': newdf['Product Name']}) #copied only the product name column to new dataframe df_target
df_target.drop_duplicates(subset= 'Product Name', keep='first') # deleted duplicates from this dataframe.
df_target["2018"]=""
df_target["2019"]="" #adding empty columns to the dataframe where results can later be added
df_target["2020"]=""
df_target["2021"]=""
df_target.set_index("Product Name",inplace = True) #Setting Product Name as index
df_2018 = newdf.query('YEAR == "2018"')
df_2019 = newdf.query('YEAR == "2019"')
df_2020 = newdf.query('YEAR == "2020"') #creating new dataframes for each year by filtering the original one
df_2021 = newdf.query('YEAR == "2021"')
counts_2018 = pd.DataFrame(df_2018['Product Name'].value_counts().reset_index())
counts_2019 = pd.DataFrame(df_2019['Product Name'].value_counts().reset_index())
counts_2020 = pd.DataFrame(df_2020['Product Name'].value_counts().reset_index())
counts_2021 = pd.DataFrame(df_2021['Product Name'].value_counts().reset_index()) # Counting the number of times a product name appears in each year
counts_2018.columns = ['Product Name', ' 2018']
counts_2019.columns = ['Product Name', ' 2019']
counts_2020.columns = ['Product Name', ' 2020']
counts_2021.columns = ['Product Name', ' 2021'] # Labelling the columns in the count dataframes.
df_target["2018"] = df_target.index.map(counts_2018["2018"]) # This last line is where I get the error, when I try to map data from the count dataframe into the target dataframe I created earlier. The error is below.
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2392             try:
-> 2393                 return self._engine.get_loc(key)
   2394             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-18-cf92c30b79a3> in <module>()
----> 1 df_target["2018"] = df_target.index.map(counts_2018["2018"])

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2060             return self._getitem_multilevel(key)
   2061         else:
-> 2062             return self._getitem_column(key)
   2063
   2064     def _getitem_column(self, key):

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2067         # get column
   2068         if self.columns.is_unique:
-> 2069             return self._get_item_cache(key)
   2070
   2071         # duplicate columns & possible reduce dimensionality

C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1532         res = cache.get(item)
   1533         if res is None:
-> 1534             values = self._data.get(item)
   1535             res = self._box_item_values(item, values)
   1536             cache[item] = res

C:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3588
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2393                 return self._engine.get_loc(key)
   2394             except KeyError:
-> 2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2396
   2397         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'
The traceback is long, and I can't find a way to resolve it. Can anyone please advise?
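A likely cause, judging from the code in this post: the count dataframes are given column labels with a leading space (`' 2018'`), so `counts_2018["2018"]` asks for a label that does not exist and raises `KeyError: '2018'`. Note also that `drop_duplicates` returns a new dataframe, so its result has to be assigned for the duplicates to actually disappear. The sketch below reproduces the label mismatch and shows one possible way to fill the target table. The sample data is invented, and aligning per-year counts on the product-name index with `reindex` is an assumption about the intended lookup, not the original author's method.

```python
import pandas as pd

# Hypothetical sample data standing in for newdf.
newdf = pd.DataFrame({
    "Product Name": ["abc", "xyz", "xyz", "abc", "xyz"],
    "YEAR": ["2019", "2018", "2018", "2019", "2021"],
})

# Reproduce the problem: the column label carries a leading space.
counts_2018 = pd.DataFrame({"Product Name": ["xyz"], " 2018": [2]})
try:
    counts_2018["2018"]
    raised = False
except KeyError:
    raised = True  # '2018' is not the same label as ' 2018'

# One possible fix: build the target with unique product names as index
# (assigning the result of drop_duplicates), then align per-year counts on it.
df_target = newdf[["Product Name"]].drop_duplicates().set_index("Product Name")
for year in ["2018", "2019", "2020", "2021"]:
    # value_counts returns a Series indexed by product name.
    per_year = newdf.loc[newdf["YEAR"] == year, "Product Name"].value_counts()
    # Products that do not occur in this year get a count of 0.
    df_target[year] = per_year.reindex(df_target.index, fill_value=0)

print(df_target)
```

Keeping the counts as a Series indexed by product name avoids the intermediate `reset_index` and column renaming, which is where the mismatched label crept in.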