Problem reading and filtering columns with pd.read_csv when columns have the same name
I have imported data from yfinance using
historicaldata = pdr.get_data_yahoo(tickers, period='1mo', interval='5m', prepost=True, group_by='ticker')
historicaldata.to_csv('./store/historicaldata.csv')
which creates the following CSV.
What I now want to do is pull in the "Volume" column for "M6A=F" and sum it to get the total volume, but I am running into a problem getting at that column from the CSV file using pandas.
If I try to load the columns back in using
voldata=pd.read_csv('./store/historicaldata.csv', usecols=['M6A=F'])
voldata.to_csv('volumetest.csv')
it only brings in the first "M6A=F" column it finds (the one with "Open" in the second row of the CSV above), and no amount of filtering helped me get to the "M6A=F" volume column.
So I did a test output using
voldata=pd.read_csv('./store/historicaldata.csv')
voldata.to_csv('volumetest.csv')
and discovered it creates the following CSV, in which all the column names have been changed to unique values. That makes me think pandas does this when loading the file with pd.read_csv.
(Incidentally, I need to do this from the CSV to avoid repeatedly pulling large amounts of data from yfinance every time I want to work on historical data.)
What is the correct way for me to get at the "M6A=F" "Volume" column from a CSV?
Comments (1)
Since you're reading a CSV file that was produced from a dataframe with multi-index columns, you have to take that into account when reading it back into a dataframe.
Try something like the following:
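A minimal sketch of such a read-back, assuming a CSV laid out the way `to_csv` writes a frame with two-level columns (ticker on the first header row, field on the second). The sample values, the second ticker `ES=F`, the index name `Datetime`, and the in-memory buffer are all made up for illustration; with the real file you would pass `'./store/historicaldata.csv'` to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for historicaldata: two tickers, two fields each.
columns = pd.MultiIndex.from_product([["M6A=F", "ES=F"], ["Open", "Volume"]])
index = pd.Index(["2022-01-03 09:30", "2022-01-03 09:35"], name="Datetime")
sample = pd.DataFrame(
    [[0.72, 10, 4700.0, 100], [0.73, 20, 4701.0, 200]],
    index=index,
    columns=columns,
)

# Write to CSV text (an in-memory buffer stands in for the file on disk).
csv_text = sample.to_csv()

# header=[0, 1] rebuilds the two-level columns from the first two rows;
# index_col=0 turns the first column back into the index.
hist_data = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

print(hist_data.columns.tolist())
```

With both arguments set, each column label is a `(ticker, field)` tuple again, so the two "M6A=F" columns are no longer ambiguous.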
The first change is to set an index column by using index_col=0 (the first column here). The second, header=[0, 1], makes sure that the first 2 rows are used to build the multi-index columns.
(I've used double brackets, hist_data[[("M6A=F", "Volume")]], to get a dataframe that shows the column label. If you don't need that, use single brackets, hist_data[("M6A=F", "Volume")], etc.)
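To make the bracket distinction concrete, and to get the total volume the question asks for, here is a small sketch. The frame is built by hand with invented numbers so the example runs on its own; it assumes `hist_data` already has `(ticker, field)` columns, as it would after reading the CSV with header=[0, 1].

```python
import pandas as pd

# Hypothetical frame with (ticker, field) column labels and made-up values.
columns = pd.MultiIndex.from_tuples([("M6A=F", "Open"), ("M6A=F", "Volume")])
hist_data = pd.DataFrame([[0.72, 10], [0.73, 20], [0.71, 5]], columns=columns)

# Double brackets: a one-column DataFrame that keeps the (ticker, field) label.
vol_frame = hist_data[[("M6A=F", "Volume")]]

# Single brackets: a plain Series holding the volume values.
vol_series = hist_data[("M6A=F", "Volume")]

# Total volume for M6A=F.
total_volume = vol_series.sum()
print(total_volume)
```

The Series form is usually the more convenient one for a single aggregate such as `.sum()`, while the DataFrame form is handy when the result is written back out with `to_csv` and the column label should be preserved.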