My code crashes with ValueError: Index contains duplicate entries, cannot reshape, when using the Yahoo data reader

Posted on 2025-01-13 07:30:11 · Word count 3390 · Views 0 · Comments 0

The code worked just fine but now it gives me this error after these lines:

end = dt.datetime.now()
start = dt.date(end.year - 3, end.month, end.day)
prices = reader.get_data_yahoo(tickers,start,end)['Adj Close']

I tried upgrading the packages, but it didn't help. The code no longer works even for data that I had previously downloaded and analysed with it successfully.

ValueError                                Traceback (most recent call last)
Input In [6], in <cell line: 3>()
      1 end = dt.datetime.now()
      2 start = dt.date(end.year - 3, end.month, end.day)
----> 3 prices = reader.get_data_yahoo(tickers,start,end)['Adj Close']

File C:\Python310\lib\site-packages\pandas_datareader\data.py:80, in get_data_yahoo(*args, **kwargs)
     79 def get_data_yahoo(*args, **kwargs):
---> 80     return YahooDailyReader(*args, **kwargs).read()

File C:\Python310\lib\site-packages\pandas_datareader\base.py:256, in _DailyBaseReader.read(self)
    254 # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    255 elif isinstance(self.symbols, DataFrame):
--> 256     df = self._dl_mult_symbols(self.symbols.index)
    257 else:
    258     df = self._dl_mult_symbols(self.symbols)

File C:\Python310\lib\site-packages\pandas_datareader\base.py:285, in _DailyBaseReader._dl_mult_symbols(self, symbols)
    283         stocks[sym] = df_na
    284 if PANDAS_0230:
--> 285     result = concat(stocks, sort=True).unstack(level=0)
    286 else:
    287     result = concat(stocks).unstack(level=0)

File C:\Python310\lib\site-packages\pandas\core\frame.py:8413, in DataFrame.unstack(self, level, fill_value)
   8351 """
   8352 Pivot a level of the (necessarily hierarchical) index labels.
   8353 
   (...)
   8409 dtype: float64
   8410 """
   8411 from pandas.core.reshape.reshape import unstack
-> 8413 result = unstack(self, level, fill_value)
   8415 return result.__finalize__(self, method="unstack")

File C:\Python310\lib\site-packages\pandas\core\reshape\reshape.py:478, in unstack(obj, level, fill_value)
    476 if isinstance(obj, DataFrame):
    477     if isinstance(obj.index, MultiIndex):
--> 478         return _unstack_frame(obj, level, fill_value=fill_value)
    479     else:
    480         return obj.T.stack(dropna=False)

File C:\Python310\lib\site-packages\pandas\core\reshape\reshape.py:501, in _unstack_frame(obj, level, fill_value)
    499 def _unstack_frame(obj, level, fill_value=None):
    500     if not obj._can_fast_transpose:
--> 501         unstacker = _Unstacker(obj.index, level=level)
    502         mgr = obj._mgr.unstack(unstacker, fill_value=fill_value)
    503         return obj._constructor(mgr)

File C:\Python310\lib\site-packages\pandas\core\reshape\reshape.py:140, in _Unstacker.__init__(self, index, level, constructor)
    133 if num_cells > np.iinfo(np.int32).max:
    134     warnings.warn(
    135         f"The following operation may generate {num_cells} cells "
    136         f"in the resulting pandas object.",
    137         PerformanceWarning,
    138     )
--> 140 self._make_selectors()

File C:\Python310\lib\site-packages\pandas\core\reshape\reshape.py:192, in _Unstacker._make_selectors(self)
    189 mask.put(selector, True)
    191 if mask.sum() < len(self.index):
--> 192     raise ValueError("Index contains duplicate entries, cannot reshape")
    194 self.group_index = comp_index
    195 self.mask = mask

ValueError: Index contains duplicate entries, cannot reshape
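
Reading the traceback, `_dl_mult_symbols` concatenates one frame per symbol and then calls `unstack(level=0)`, which needs every (symbol, date) pair to be unique. The following is a minimal sketch that reproduces the same exception without any network access. The prices and dates are made-up sample values, and the assumption (not confirmed by the traceback itself) is that the downloaded data contained a repeated date:

```python
import pandas as pd

# Synthetic sample data (assumption): the last date appears twice for each symbol.
dates = pd.to_datetime(["2022-03-09", "2022-03-10", "2022-03-10"])
stocks = {
    "AAPL": pd.DataFrame({"Adj Close": [162.95, 158.52, 158.52]}, index=dates),
    "MSFT": pd.DataFrame({"Adj Close": [288.50, 285.59, 285.59]}, index=dates),
}

# Same shape of operation as in the traceback: concat the dict, then unstack the symbol level.
stacked = pd.concat(stocks, sort=True)

try:
    stacked.unstack(level=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If the repeated date is removed from the sample data, the same `unstack` call should succeed, which suggests the problem lies in the data rather than in the calling code.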

Comments (2)

陈年往事 2025-01-20 07:30:11

I know it can be frustrating, but for the moment you have to read each ticker individually. The API has probably been broken since the latest versions of Pandas:

tickers = ['AAPL', 'MSFT']
end = dt.datetime.now()
start = dt.date(end.year - 3, end.month, end.day)

data = {}
for ticker in tickers:
    data[ticker] = reader.get_data_yahoo(ticker, start, end)['Adj Close']
prices = pd.concat(data, axis=1)

Output:

>>> prices
                  AAPL        MSFT
Date                              
2019-03-11   43.548748  109.345795
2019-03-12   44.038033  110.111404
2019-03-13   44.232773  110.964211
2019-03-14   44.724491  111.051437
2019-03-15   45.306278  112.330688
...                ...         ...
2022-03-07  159.300003  278.910004
2022-03-08  157.440002  275.850006
2022-03-09  162.949997  288.500000
2022-03-10  158.520004  285.589996
2022-03-10  158.520004  285.589996

[759 rows x 2 columns]

安静被遗忘 2025-01-20 07:30:11

The error occurs when the query is made on a Saturday or Sunday, since Yahoo Finance repeats the data for Friday twice.

You can check this by looking at the historical data on Yahoo Finance itself.

For a single stock, this can be solved with:

data = data[~data.index.duplicated(keep='last')]

But when downloading data for a list of stocks, the proposed solution is to iterate over the list and then concatenate the series to construct the DataFrame.

Then you can use the code above to remove the duplicate index entries.
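
Putting the two steps together, here is a minimal sketch of the idea. It uses made-up sample series in place of real downloads, and `drop_duplicate_dates` is a hypothetical helper name, not part of pandas or pandas-datareader. In real use, each series would presumably come from a per-ticker `reader.get_data_yahoo(ticker, start, end)['Adj Close']` call as in the other answer:

```python
import pandas as pd


def drop_duplicate_dates(series):
    """Keep only the last row for each repeated index value (hypothetical helper)."""
    return series[~series.index.duplicated(keep="last")]


# Synthetic stand-ins for per-ticker downloads (assumption): the last date is repeated.
dates = pd.to_datetime(["2022-03-09", "2022-03-10", "2022-03-10"])
raw = {
    "AAPL": pd.Series([162.95, 158.52, 158.52], index=dates),
    "MSFT": pd.Series([288.50, 285.59, 285.59], index=dates),
}

# Deduplicate each series first, then build one column per ticker.
prices = pd.concat(
    {ticker: drop_duplicate_dates(series) for ticker, series in raw.items()},
    axis=1,
)
prices.index.name = "Date"

print(prices)
```

Deduplicating each series before concatenating, rather than afterwards, keeps every input index unique, so the column-wise alignment does not depend on all tickers repeating exactly the same dates.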
