Error when trying to do a vlookup using pandas python

Posted on 2025-01-16 04:35:47

So here is what I am trying to do. I have a large dataframe named newdf. It has several columns, but the relevant ones here are YEAR and Product Name. I need to count the number of times each product name appears in each year (2018 to 2021) and create a new dataframe that looks like the one below.

Product Name	2018	2019	2020	2021
abc	0	5	10	8
xyz	2	0	0	5
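The target table is a count of rows per (product name, year) pair, laid out with one column per year. Below is a minimal sketch of building it in one step with `pd.crosstab`. It uses made-up data standing in for `newdf` and assumes `YEAR` holds strings, as the `query('YEAR == "2018"')` calls in the code suggest. If `YEAR` holds numbers, put integers in `years` instead.

```python
import pandas as pd

# Hypothetical stand-in for newdf with just the two relevant columns;
# YEAR is assumed to hold strings, matching the query('YEAR == "2018"') calls.
sample = pd.DataFrame({
    "Product Name": ["abc", "abc", "xyz", "abc", "xyz", "xyz", "abc"],
    "YEAR": ["2019", "2020", "2018", "2021", "2018", "2021", "2017"],
})

years = ["2018", "2019", "2020", "2021"]

# Keep only the years of interest, then count rows per (product, year) pair.
subset = sample[sample["YEAR"].isin(years)]
table = pd.crosstab(subset["Product Name"], subset["YEAR"])

# A year with no rows at all would be missing as a column; reindex adds it filled with 0.
table = table.reindex(columns=years, fill_value=0)
print(table)
```

An equivalent alternative is `subset.groupby(['Product Name', 'YEAR']).size().unstack(fill_value=0)`, followed by the same `reindex`.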

Here is what I have done so far:

    import pandas as pd

    # copy only the Product Name column to a new dataframe, df_target
    df_target = pd.DataFrame({'Product Name': newdf['Product Name']})

    # delete duplicates from this dataframe (drop_duplicates returns a new dataframe, so assign it back)
    df_target = df_target.drop_duplicates(subset='Product Name', keep='first')

    # add empty columns to the dataframe where the results can later be added
    df_target["2018"] = ""
    df_target["2019"] = ""
    df_target["2020"] = ""
    df_target["2021"] = ""

    # set Product Name as the index
    df_target.set_index("Product Name", inplace=True)

    # create a new dataframe for each year by filtering the original one
    df_2018 = newdf.query('YEAR == "2018"')
    df_2019 = newdf.query('YEAR == "2019"')
    df_2020 = newdf.query('YEAR == "2020"')
    df_2021 = newdf.query('YEAR == "2021"')

    # count the number of times each product name appears in each year
    counts_2018 = pd.DataFrame(df_2018['Product Name'].value_counts().reset_index())
    counts_2019 = pd.DataFrame(df_2019['Product Name'].value_counts().reset_index())
    counts_2020 = pd.DataFrame(df_2020['Product Name'].value_counts().reset_index())
    counts_2021 = pd.DataFrame(df_2021['Product Name'].value_counts().reset_index())

    # label the columns in the count dataframes
    counts_2018.columns = ['Product Name', ' 2018']
    counts_2019.columns = ['Product Name', ' 2019']
    counts_2020.columns = ['Product Name', ' 2020']
    counts_2021.columns = ['Product Name', ' 2021']

    # This last line is where I get the error, when I try to map data from the
    # count dataframe to the target one that I created earlier. The error is below.
    df_target["2018"] = df_target.index.map(counts_2018["2018"])

    KeyError Traceback (most recent call last)
    C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
    2392 try:
    -> 2393 return self._engine.get_loc(key)
    2394 except KeyError:
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()
    KeyError: '2018'

    During handling of the above exception, another exception occurred:

    KeyError Traceback (most recent call last)
    <ipython-input-18-cf92c30b79a3> in <module>()
    ----> 1 df_target["2018"] = df_target.index.map(counts_2018["2018"])

    C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
    2060 return self._getitem_multilevel(key)
    2061 else:
    -> 2062 return self._getitem_column(key)
    2063
    2064 def _getitem_column(self, key):

    C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
    2067 # get column
    2068 if self.columns.is_unique:
    -> 2069 return self._get_item_cache(key)
    2070
    2071 # duplicate columns & possible reduce dimensionality

    C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
    1532 res = cache.get(item)
    1533 if res is None:
    -> 1534 values = self._data.get(item)
    1535 res = self._box_item_values(item, values)
    1536 cache[item] = res

    C:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
    3588
    3589 if not isnull(item):
    -> 3590 loc = self.items.get_loc(item)
    3591 else:
    3592 indexer = np.arange(len(self.items))[isnull(self.items)]

    C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
    2393 return self._engine.get_loc(key)
    2394 except KeyError:
    -> 2395 return self._engine.get_loc(self._maybe_cast_indexer(key))
    2396
    2397 indexer = self.get_indexer([key], method=method, tolerance=tolerance)

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()
    KeyError: '2018'

The error is huge and I can't find a way around it. Can anyone please advise?
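The final `KeyError: '2018'` is raised by `counts_2018["2018"]`, not by the mapping itself. The count columns were labelled `' 2018'`, with a leading space, and pandas matches column labels exactly. Even with the label fixed, `Index.map` looks values up by the Series' index. After `reset_index()`, that index is a default integer index rather than the product names, so every product would map to NaN.

Below is a minimal sketch of both fixes on made-up data. The small `df_target` and `counts_2018` frames are hypothetical stand-ins shaped like the ones in the question.

```python
import pandas as pd

# Hypothetical stand-ins shaped like the question's frames.
df_target = pd.DataFrame(index=pd.Index(["abc", "xyz", "def"], name="Product Name"))
counts_2018 = pd.DataFrame({"Product Name": ["xyz", "def"], " 2018": [2, 7]})

# Reproduce the error: ' 2018' (leading space) is not the same label as '2018'.
try:
    counts_2018["2018"]
    label_found = True
except KeyError:
    label_found = False
print(label_found)  # False

# Fix 1: strip surrounding whitespace from the column labels
# (or simply write '2018' without the space when assigning .columns).
counts_2018.columns = counts_2018.columns.str.strip()

# Fix 2: key the counts by product name so Index.map can look each product up.
lookup = counts_2018.set_index("Product Name")["2018"]

# Products with no rows in 2018 map to NaN; treat them as a count of zero.
df_target["2018"] = df_target.index.map(lookup)
df_target["2018"] = df_target["2018"].fillna(0).astype(int)
print(df_target)
```

The same two changes apply to the 2019, 2020 and 2021 counts.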