Error when trying to do a VLOOKUP using pandas in Python

Posted on 2025-01-16 04:35:47

So here is what I am trying to do.
I have a large dataframe named newdf. It has several columns, but the relevant ones here are the year (YEAR) and the product name (Product Name). I need to count the number of times each product name appears in each year (from 2018 to 2021) and create a new dataframe that looks like the one below.

Product Name  2018  2019  2020  2021
abc           0     5     10    8
xyz           2     0     0     5

Here is what I have done so far

    import pandas as pd

    # Copy only the Product Name column into a new dataframe, df_target
    df_target = pd.DataFrame({'Product Name': newdf['Product Name']})

    # Delete duplicate product names (drop_duplicates returns a new dataframe)
    df_target = df_target.drop_duplicates(subset='Product Name', keep='first')

    # Add empty columns where the results can be added later
    df_target["2018"] = ""
    df_target["2019"] = ""
    df_target["2020"] = ""
    df_target["2021"] = ""

    # Set Product Name as the index
    df_target.set_index("Product Name", inplace=True)

    # Create a dataframe for each year by filtering the original one
    df_2018 = newdf.query('YEAR == "2018"')
    df_2019 = newdf.query('YEAR == "2019"')
    df_2020 = newdf.query('YEAR == "2020"')
    df_2021 = newdf.query('YEAR == "2021"')

    # Count the number of times each product name appears in each year
    counts_2018 = pd.DataFrame(df_2018['Product Name'].value_counts().reset_index())
    counts_2019 = pd.DataFrame(df_2019['Product Name'].value_counts().reset_index())
    counts_2020 = pd.DataFrame(df_2020['Product Name'].value_counts().reset_index())
    counts_2021 = pd.DataFrame(df_2021['Product Name'].value_counts().reset_index())

    # Label the columns in the count dataframes
    counts_2018.columns = ['Product Name', ' 2018']
    counts_2019.columns = ['Product Name', ' 2019']
    counts_2020.columns = ['Product Name', ' 2020']
    counts_2021.columns = ['Product Name', ' 2021']

    # This last line is where I get the error, when I try to map data from the
    # count dataframe to the target one that I created earlier. The error is below.
    df_target["2018"] = df_target.index.map(counts_2018["2018"])

KeyError Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2392 try:
-> 2393 return self._engine.get_loc(key)
2394 except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
<ipython-input-18-cf92c30b79a3> in <module>()
----> 1 df_target["2018"] = df_target.index.map(counts_2018["2018"])

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2060 return self._getitem_multilevel(key)
2061 else:
-> 2062 return self._getitem_column(key)
2063
2064 def _getitem_column(self, key):

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2067 # get column
2068 if self.columns.is_unique:
-> 2069 return self._get_item_cache(key)
2070
2071 # duplicate columns & possible reduce dimensionality

C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1532 res = cache.get(item)
1533 if res is None:
-> 1534 values = self._data.get(item)
1535 res = self._box_item_values(item, values)
1536 cache[item] = res

C:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2393 return self._engine.get_loc(key)
2394 except KeyError:
-> 2395 return self._engine.get_loc(self._maybe_cast_indexer(key))
2396
2397 indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'
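
A possible reading of this traceback: the final KeyError: '2018' means the label '2018' was not found among the columns of counts_2018. One plausible cause is that the count columns were labelled ' 2018', with a leading space, while the lookup asks for '2018'. The sketch below reproduces that with a small made-up counts table (the data, the placeholder column name "n", and the extra product "pqr" are assumptions, not taken from the post). It also indexes the counts by product name before mapping, since Index.map looks values up by the Series index.

```python
import pandas as pd

# Hypothetical stand-in for counts_2018; the real data is not shown in the post.
counts_2018 = pd.DataFrame({"Product Name": ["abc", "xyz"], "n": [3, 2]})
counts_2018.columns = ["Product Name", " 2018"]  # note the leading space

try:
    counts_2018["2018"]
    lookup_failed = False
except KeyError:
    lookup_failed = True  # '2018' is not the same label as ' 2018'

# Stripping whitespace from the labels makes the column lookup succeed.
counts_2018.columns = counts_2018.columns.str.strip()

# Index the counts by product name so the mapping matches on product names
# rather than on the default 0..n-1 row labels.
lookup = counts_2018.set_index("Product Name")["2018"]

# Hypothetical target index; "pqr" has no count for this year.
target_index = pd.Index(["abc", "xyz", "pqr"], name="Product Name")
mapped = target_index.map(lookup)  # products without a count become NaN
```

If this is indeed the cause, labelling the columns without the leading space in the first place would avoid the strip step.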


The traceback is long, and I can't find a way to resolve it. Can anyone please advise?
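
For the table described at the top, one alternative that avoids the per-year dataframes is pd.crosstab, which counts occurrences of each (Product Name, YEAR) pair directly. This is only a sketch under assumptions: the sample newdf below is invented, and YEAR is assumed to hold strings, as the query calls in the post suggest.

```python
import pandas as pd

# Hypothetical sample standing in for newdf; YEAR is assumed to hold strings.
newdf = pd.DataFrame({
    "YEAR": ["2018", "2018", "2019", "2021", "2021", "2017"],
    "Product Name": ["xyz", "xyz", "abc", "abc", "xyz", "abc"],
})

years = ["2018", "2019", "2020", "2021"]

# Keep only the rows for the years of interest.
subset = newdf[newdf["YEAR"].isin(years)]

# Rows are product names, columns are years, cells are occurrence counts.
result = (
    pd.crosstab(subset["Product Name"], subset["YEAR"])
    .reindex(columns=years, fill_value=0)  # years with no rows show up as 0
)
```

Note that a product appearing only outside 2018 to 2021 would be dropped by the filter; reindexing the rows against the full list of product names would bring it back with zeros.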
