Issue trying to read and filter columns with pd.read_csv when columns have the same name

Posted on 2025-02-13 11:18:38

I've saved data using

hist_data = pdr.get_data_yahoo(tickers, period='1mo', interval='5m', prepost=True, group_by='ticker')
hist_data.to_csv('./store/hist_data.csv')

which creates the following CSV.

What I'd like to do now is pull the "M6A=F" & "Volume" column and sum the total volume, but I'm running into an issue trying to get that column out of the CSV file with pandas.

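When I load the column back in with

voldata = pd.read_csv('./store/hist_data.csv', usecols=['M6A=F'])
voldata.to_csv('volumetest.csv')

it only brings in the first "M6A=F" column it finds (the one with "Open" in the second row of the previous image), without the volume, and no amount of filtering has helped me get to the M6A=F Volume column.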

So I used

voldata = pd.read_csv('./store/hist_data.csv')
voldata.to_csv('volumetest.csv')

which creates the following CSV, where all the columns have been renamed to unique names. That makes me think this can be handled when loading with pd.read_csv.
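The renaming comes from how to_csv writes multi-index columns: one header row per level, so the ticker names repeat across the first row. A plain read_csv takes only that first row as the header and de-duplicates repeated names by appending ".1". The sketch below reproduces this in memory; the two-ticker frame and its numbers are made up for illustration and are not taken from the real download.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded data: two tickers, each with
# "Open" and "Volume" sub-columns. The numbers are made up.
index = pd.Index(
    ["2022-06-06 09:40:00-04:00", "2022-06-06 09:45:00-04:00"], name="Datetime"
)
columns = pd.MultiIndex.from_product([["MYM=F", "M6A=F"], ["Open", "Volume"]])
hist_data = pd.DataFrame(
    [[100.0, 5.0, 0.70, 0.0], [101.0, 7.0, 0.71, 67.0]],
    index=index,
    columns=columns,
)

# to_csv writes one header row per column level (tickers, then fields).
buffer = io.StringIO()
hist_data.to_csv(buffer)
print(buffer.getvalue())

# A plain read_csv uses only the first header row. The ticker names repeat,
# so pandas de-duplicates them by appending ".1"; the field names
# ("Open", "Volume") end up as ordinary values in the first data row.
buffer.seek(0)
flat = pd.read_csv(buffer)
print(list(flat.columns))
print(flat.head(2))
```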

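(By the way, I need to do this from the CSV so that I don't have to download the data again every time I want to work with the historical data.)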

What's the correct way to get the "M6A=F" "Volume" column from the CSV?

命比纸薄 2025-02-20 11:18:38

Since you're reading a csv-file that was produced from a dataframe with multi-index columns, you have to take that into account when reading it back into a dataframe.

Try something like the following:

import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf

# Make pdr.get_data_yahoo() fetch its data through yfinance
yf.pdr_override()

tickers = "MYM=F M6A=F"
hist_data = pdr.get_data_yahoo(
    tickers, period="1mo", interval="5m", prepost=True, group_by="ticker"
)

# Writing to csv-file
hist_data.to_csv("hist_data.csv")

# Reading back from csv-file
hist_data = pd.read_csv("hist_data.csv", index_col=0, header=[0, 1])

# Selecting the M6A=F/Volume-column:
volume = hist_data[[("M6A=F", "Volume")]]
print(volume)

The first change is to set the index column with index_col=0 (here, the first column, which holds the timestamps). The second, header=[0, 1], makes sure the first two rows are used to build the multi-index columns. From the pandas.read_csv documentation:

header : int, list of int, None, default ‘infer’
... The header can be a list of integers that specify row locations for a multi-index on the columns ...
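Both options can be checked without downloading anything. This is a minimal sketch, assuming a made-up two-ticker frame with the same (ticker, field) column layout; it writes to an in-memory buffer instead of hist_data.csv and reads it back with index_col=0 and header=[0, 1].

```python
import io

import pandas as pd

# Made-up sample with the same (ticker, field) column layout as the download.
columns = pd.MultiIndex.from_product([["MYM=F", "M6A=F"], ["Open", "Volume"]])
hist_data = pd.DataFrame(
    [[100.0, 5.0, 0.70, 0.0], [101.0, 7.0, 0.71, 67.0]],
    index=pd.Index(
        ["2022-06-06 09:40:00-04:00", "2022-06-06 09:45:00-04:00"], name="Datetime"
    ),
    columns=columns,
)

# Round trip through CSV text instead of a file on disk.
buffer = io.StringIO()
hist_data.to_csv(buffer)
buffer.seek(0)

# header=[0, 1]: the first two rows become the two column levels.
# index_col=0: the first column (the timestamps) becomes the index.
restored = pd.read_csv(buffer, index_col=0, header=[0, 1])

print(restored.columns.tolist())
print(restored[("M6A=F", "Volume")].tolist())
```

Because the two header rows are rebuilt into the same (ticker, field) pairs, ("M6A=F", "Volume") can be selected directly instead of hunting for a renamed "M6A=F.1" column.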

Result:

                           M6A=F
                          Volume
Datetime                        
2022-06-06 09:40:00-04:00    0.0
2022-06-06 09:45:00-04:00   67.0
2022-06-06 09:50:00-04:00   36.0
2022-06-06 09:55:00-04:00   18.0
2022-06-06 10:00:00-04:00   61.0
...                          ...
2022-07-06 09:20:00-04:00   47.0
2022-07-06 09:25:00-04:00   12.0
2022-07-06 09:30:00-04:00    7.0
2022-07-06 09:31:10-04:00    0.0
2022-07-06 09:31:20-04:00    NaN

[6034 rows x 1 columns]

(I've used double brackets here, hist_data[[("M6A=F", "Volume")]], to get a dataframe that shows the column label. If you don't need that, use single brackets, hist_data[("M6A=F", "Volume")], which gives you a Series instead.)
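Since the question's end goal is the total volume, here is a sketch of both selection styles followed by the sum. The small frame and its values are assumptions for illustration; the last bar is NaN, like the final row of the result above.

```python
import numpy as np
import pandas as pd

# Made-up frame with the multi-index column layout produced by header=[0, 1].
columns = pd.MultiIndex.from_product([["MYM=F", "M6A=F"], ["Open", "Volume"]])
hist_data = pd.DataFrame(
    [[100.0, 5.0, 0.70, 0.0],
     [101.0, 7.0, 0.71, 67.0],
     [102.0, 9.0, 0.72, np.nan]],
    columns=columns,
)

# Double brackets: a one-column DataFrame that keeps the ("M6A=F", "Volume") label.
volume_df = hist_data[[("M6A=F", "Volume")]]

# Single brackets: a Series, which is usually what you want for arithmetic.
volume = hist_data[("M6A=F", "Volume")]

# Series.sum() skips NaN by default (skipna=True), so an empty trailing bar
# does not turn the total into NaN.
total_volume = volume.sum()
print(type(volume_df).__name__, type(volume).__name__, total_volume)
```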
