Trying to read and filter columns with pd.read_csv when the columns have the same name

Posted on 2025-02-13 11:18:38

I have imported data from yfinance using

historicaldata = pdr.get_data_yahoo(tickers, period='1mo', interval='5m', prepost=True, group_by='ticker')
historicaldata.to_csv('./store/historicaldata.csv')

which creates the following CSV:

[Image: historicaldata.csv]

What I now want to do is pull in the "M6A=F" "Volume" column and sum the total volume, but I am running into a problem getting at that column from the CSV file with pandas.

If I try to load the columns back in using

voldata=pd.read_csv('./store/historicaldata.csv', usecols=['M6A=F'])

voldata.to_csv('volumetest.csv')

it only brings in the first "M6A=F" column it finds (the one with "Open" in the second row of the previous image), and no amount of filtering got me to the M6A=F volume column.

So I did a test output using

voldata=pd.read_csv('./store/historicaldata.csv')

voldata.to_csv('volumetest.csv')

And discovered that the resulting CSV has every column renamed to a unique name, which makes me think pd.read_csv does that when loading the file.

[Image: the resulting volumetest.csv]
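A minimal offline sketch of this renaming, assuming a hypothetical two-ticker CSV laid out the way DataFrame.to_csv() writes (ticker, field) multi-index columns (ticker row, field row, index-name row, then data; all values are made up). With the default header=0, read_csv takes only the first row as the header and makes repeated names unique by appending ".1", ".2", and so on, while the "Open"/"Volume" row ends up as ordinary data.

```python
import io

import pandas as pd

# Hypothetical CSV in the layout to_csv() produces for MultiIndex columns:
# row 1 = tickers, row 2 = fields, row 3 = index name, then the data rows.
SAMPLE_CSV = """\
,M6A=F,M6A=F,MYM=F,MYM=F
,Open,Volume,Open,Volume
Datetime,,,,
2022-06-06 09:40:00-04:00,0.72,0.0,32900.0,10.0
2022-06-06 09:45:00-04:00,0.73,67.0,32910.0,12.0
"""

# Default header=0: only the ticker row becomes the header. The blank first
# cell becomes "Unnamed: 0" and duplicate tickers get ".1" suffixes.
flat = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(list(flat.columns))
# ['Unnamed: 0', 'M6A=F', 'M6A=F.1', 'MYM=F', 'MYM=F.1']

# The field names ("Open", "Volume") are now just the first data row.
print(flat.head(2))
```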

(Incidentally, I need to do this from the CSV to avoid repeatedly pulling large amounts of data from yfinance every time I want to work on historical data.)

What is the correct way for me to get at the "M6A=F" "Volume" column from a csv?

Answer from 命比纸薄 (2025-02-20 11:18:38):

Since you're reading a csv-file that was produced from a dataframe with multi-index columns, you have to take that into account when reading it back into a dataframe.

Try something like the following:

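import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf

# Route pandas_datareader's get_data_yahoo() through yfinance. Recent yfinance
# releases may no longer provide pdr_override(); there, yf.download() takes
# the same tickers/period/interval/prepost/group_by arguments.
yf.pdr_override()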

tickers = "MYM=F M6A=F"
hist_data = pdr.get_data_yahoo(
    tickers, period="1mo", interval="5m", prepost=True, group_by="ticker"
)

# Writing to csv-file
hist_data.to_csv("hist_data.csv")

# Reading back from csv-file
hist_data = pd.read_csv("hist_data.csv", index_col=0, header=[0, 1])

# Selecting the M6A=F/Volume-column:
volume = hist_data[[("M6A=F", "Volume")]]
print(volume)

The first change is to set the index column with index_col=0 (here, obviously, the first column). The second, header=[0, 1], makes sure that the first two rows are used to build the multi-index columns. From the pandas.read_csv documentation:

header : int, list of int, None, default ‘infer’

... The header can be a list of integers that specify row locations for a multi-index on the columns ...

Result:

                           M6A=F
                          Volume
Datetime                        
2022-06-06 09:40:00-04:00    0.0
2022-06-06 09:45:00-04:00   67.0
2022-06-06 09:50:00-04:00   36.0
2022-06-06 09:55:00-04:00   18.0
2022-06-06 10:00:00-04:00   61.0
...                          ...
2022-07-06 09:20:00-04:00   47.0
2022-07-06 09:25:00-04:00   12.0
2022-07-06 09:30:00-04:00    7.0
2022-07-06 09:31:10-04:00    0.0
2022-07-06 09:31:20-04:00    NaN

[6034 rows x 1 columns]

(I've used double brackets here, hist_data[[("M6A=F", "Volume")]], to get a dataframe that shows the column label. If you don't need that, use single brackets, hist_data[("M6A=F", "Volume")], which returns a Series.)
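The original goal was the total M6A=F volume. Below is a self-contained sketch of the whole round trip that uses a small made-up DataFrame in place of the yfinance download (tickers, timestamps and values are hypothetical) and an in-memory buffer instead of a file. It shows both bracket styles and the sum. DataFrame.xs() with level=1 is one way to grab every ticker's Volume column at once.

```python
import io

import pandas as pd

# Hypothetical stand-in for the yfinance download: (ticker, field)
# MultiIndex columns, as group_by="ticker" produces. Values are made up.
index = pd.DatetimeIndex(
    ["2022-06-06 09:40", "2022-06-06 09:45", "2022-06-06 09:50"], name="Datetime"
)
columns = pd.MultiIndex.from_product([["M6A=F", "MYM=F"], ["Open", "Volume"]])
df = pd.DataFrame(
    [
        [0.72, 0.0, 32900.0, 10.0],
        [0.73, 67.0, 32910.0, 12.0],
        [0.73, 36.0, 32905.0, 8.0],
    ],
    index=index,
    columns=columns,
)

# Write to an in-memory "file" instead of disk, then read it back with the
# two header rows rebuilt into a MultiIndex and the first column as index.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, header=[0, 1], parse_dates=True)

as_frame = restored[[("M6A=F", "Volume")]]  # double brackets -> DataFrame
as_series = restored[("M6A=F", "Volume")]   # single brackets -> Series

total_volume = as_series.sum()
print(total_volume)  # 103.0

# Volume column of every ticker at once (second column level == "Volume").
per_ticker = restored.xs("Volume", axis=1, level=1).sum()
print(per_ticker)
```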
