Python 3 urlopen adds a \xef\xbb\xbf prefix to downloaded text files

Posted on 2024-09-28 16:05:07

I am working on a script to download and process historical stock prices.
When I used urllib.request.urlopen, I got a strange prefix of text in every file (b'\xef\xbb\xbf) that was not present when I used urllib.request.urlretrieve, nor when I typed the URL into a browser (Firefox).
So I have a workaround, but I don't know why it was causing a problem in the first place. I suspect it may be because I forced the response to be a string, but I don't know why that would matter or how I would work around it (other than by using urlretrieve instead).
The code is below. The relevant line is line 11. The commented-out code after it is what I had when I was using urlopen.

# download a bunch of historical stock quotes from google finance

import urllib.request
symbolarray = []
symbolfile = open("symbols.txt")
for line in symbolfile:
    symbolarray.append(line.strip())
symbolfile.close()

for symbol in symbolarray:
    page = urllib.request.urlretrieve("http://www.google.com/finance/historical?q=NYSE:"+symbol+"&output=csv",symbol+".csv")
    #datafile = open(symbol+".csv","w")
    #datafile.write(str(page.read()))
    #datafile.close()
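
For reference, the bytes \xef\xbb\xbf are the UTF-8 byte order mark (BOM), and calling str() on a bytes object returns its repr rather than decoding it. The sketch below illustrates that difference without touching the network. It assumes the server sends UTF-8 with a leading BOM, and the `raw` value is a hypothetical stand-in for what `page.read()` would return, not real data from the service.

```python
import codecs

# Hypothetical stand-in for the bytes returned by page.read(); a real
# response would carry the CSV data after the three BOM bytes.
raw = codecs.BOM_UTF8 + b"Date,Open,High,Low,Close,Volume\n"

# str() on a bytes object returns its repr, so the b'...' wrapper and the
# escaped BOM bytes become part of the text that gets written to the file.
as_repr = str(raw)

# Decoding with "utf-8-sig" interprets the bytes as UTF-8 and drops a
# leading BOM if one is present.
as_text = raw.decode("utf-8-sig")

print(as_repr)
print(as_text)
```

Under the same assumption, the commented-out code could write `page.read().decode("utf-8-sig")` instead of `str(page.read())`, or open the file in `"wb"` mode and write the bytes unchanged. urlretrieve saves the raw bytes as they arrive, so the BOM is probably still in those files and simply not displayed by most viewers.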
