调用连续几个月的行数据并将其写入列中?
我有气候数据表,基本上,我需要将部分行转置为列,反之亦然。不幸的是,格式有点尴尬。我收到的数据包括年、月、该月的天数、行中气候数据的类型,然后是连续的 93 列,每列代表一个每日值,后面有一个相关的标志(所以每月的每一天有 3 个术语、一个值和 2 个标志)。尽管月份的长度各不相同,但较短的月份已在最后几列中填充了空值。出于处理和建模的目的,我想要的是一个电子表格/.csv 文件,其列如下:
年、月、日(即数字 1 到 31),然后是代表气候数据类型的五列(降水量)。 、雪、雪水、tmax、tmin)。
如果我也可以获得具有适当标志值的列,那就太好了,但这不是优先事项。因此,我编写了下面的代码,将行解压到列表中(可能效率很低,但我是新手),根据行中的位置表示年、月、气候变量类型、变量值、flag1 和 flag2 (对应于一天,1 到 31):
import matplotlib.mlab as mlab
from matplotlib.pyplot import figure, show
import numpy as np
import scipy
import csv
durham='C:\\Users\\LocalUser\\Desktop\\Drought Data\\My_Met_Data\\USHCN\\Durham.csv'
txt='met'
station='Durham'
output=station+"_"+txt+"_"+"new"+".csv"
infile=open(durham,'r')
outfile=open(output,'w')
writer=csv.writer(outfile)
yr=[]; mon=[]; var=[]; unit=[]; flag1= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
flag2=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
valu=[]; flg1=[]; flg2=[]; prcp=[]; snow=[]; snwd=[]; tmax=[]; tmin=[]; row=[]
for line in infile:
stationid, variable, units, year, month, days, flag1[0], value[0], flag2[0], flag1[1], value[1], flag2[1], flag1[2], value[2], flag2[2],\
flag1[3], value[3], flag2[3], flag1[4], value[4], flag2[4], flag1[5], value[5], flag2[5], flag1[6], value[6], flag2[6],\
flag1[7], value[7], flag2[7], flag1[8], value[8], flag2[8] ,flag1[9], value[9], flag2[9], flag1[10], value[10], flag2[10],\
flag1[11], value[11], flag2[11], flag1[12], value[12], flag2[12], flag1[13], value[13], flag2[13], flag1[14], value[14], flag2[14],\
flag1[15], value[15], flag2[15], flag1[16], value[16], flag2[16], flag1[17], value[17], flag2[17], flag1[18], value[18], flag2[18],\
flag1[19], value[19], flag2[19], flag1[20], value[20], flag2[20], flag1[21], value[21], flag2[21], flag1[22], value[22], flag2[22],\
flag1[23], value[23], flag2[23], flag1[24], value[24], flag2[24], flag1[25], value[25], flag2[25], flag1[26], value[26], flag2[26],\
flag1[27], value[27], flag2[27], flag1[28], value[28], flag2[28], flag1[29], value[29], flag2[29], flag1[30], value[30], flag2[30]=line.split(',')
yr=[int(year)]
mon=[int(month)]
var=variable
unit=units
for yr in range(1926, 2003):
for mon in range(1,13):
if var=='PRCP':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
prcp.append(valu[j])
elif var=='SNOW':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snow.append(valu[j])
elif var=='SNWD':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snwd.append(valu[j])
elif var=='TMAX':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmax.append(valu[j])
elif var=='TMIN':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmin.append(valu[j])
row=[yr, mon, j+1, prcp[j], snow[j], snwd[j], tmax[j], tmin[j]]
writer.writerow(row)
infile.close()
outfile.close()
现在,撇开运行此程序时出现内存错误不谈,如果我去掉一些气候变量,那么我会成功地在我想要的格式。问题是,每个月、每个年 (1926-2002) 都会报告相同的气候数据值,即 1926 年 1 月的数据。代码从适当的变量调用数据适当的一天,但月复一月重复相同的数据。我不确定我在哪里出了问题,但任何建议/帮助将不胜感激。
I have spread sheets of climate data for which, essentially, I need to transpose parts of rows into columns and vice versa. Unfortunately, the format is somewhat awkward. The data came to me with columns for year, month, number of days in the month, the type of climate data in the row, and then a successive 93 columns, each representing a daily value, succeeded and preceded by an associated flag (so 3 terms, a value and 2 flags, for each day of the month). Although months vary in length, the shorter months have been filled out with null values in the last few columns. What I want, for processing and modeling purposes, is a spreadsheet/.csv file with columns as follows:
year, month, day of month (i.e. a number 1 to 31), and then five columns representing the type of climate data (precip, snow, snow water, tmax, tmin).
If I could get columns with the appropriate flag values as well, that would be great, but it's not a priority. So, I've written the code below to unpack rows into lists (probably very inefficiently, but I'm new at this) representing year, month, type of climate variable, variable value, flag1 and flag2 based on the location in the row (corresponding to a day, 1 to 31):
import matplotlib.mlab as mlab
from matplotlib.pyplot import figure, show
import numpy as np
import scipy
import csv
durham='C:\\Users\\LocalUser\\Desktop\\Drought Data\\My_Met_Data\\USHCN\\Durham.csv'
txt='met'
station='Durham'
output=station+"_"+txt+"_"+"new"+".csv"
infile=open(durham,'r')
outfile=open(output,'w')
writer=csv.writer(outfile)
yr=[]; mon=[]; var=[]; unit=[]; flag1= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
flag2=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
valu=[]; flg1=[]; flg2=[]; prcp=[]; snow=[]; snwd=[]; tmax=[]; tmin=[]; row=[]
for line in infile:
stationid, variable, units, year, month, days, flag1[0], value[0], flag2[0], flag1[1], value[1], flag2[1], flag1[2], value[2], flag2[2],\
flag1[3], value[3], flag2[3], flag1[4], value[4], flag2[4], flag1[5], value[5], flag2[5], flag1[6], value[6], flag2[6],\
flag1[7], value[7], flag2[7], flag1[8], value[8], flag2[8] ,flag1[9], value[9], flag2[9], flag1[10], value[10], flag2[10],\
flag1[11], value[11], flag2[11], flag1[12], value[12], flag2[12], flag1[13], value[13], flag2[13], flag1[14], value[14], flag2[14],\
flag1[15], value[15], flag2[15], flag1[16], value[16], flag2[16], flag1[17], value[17], flag2[17], flag1[18], value[18], flag2[18],\
flag1[19], value[19], flag2[19], flag1[20], value[20], flag2[20], flag1[21], value[21], flag2[21], flag1[22], value[22], flag2[22],\
flag1[23], value[23], flag2[23], flag1[24], value[24], flag2[24], flag1[25], value[25], flag2[25], flag1[26], value[26], flag2[26],\
flag1[27], value[27], flag2[27], flag1[28], value[28], flag2[28], flag1[29], value[29], flag2[29], flag1[30], value[30], flag2[30]=line.split(',')
yr=[int(year)]
mon=[int(month)]
var=variable
unit=units
for yr in range(1926, 2003):
for mon in range(1,13):
if var=='PRCP':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
prcp.append(valu[j])
elif var=='SNOW':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snow.append(valu[j])
elif var=='SNWD':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snwd.append(valu[j])
elif var=='TMAX':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmax.append(valu[j])
elif var=='TMIN':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmin.append(valu[j])
row=[yr, mon, j+1, prcp[j], snow[j], snwd[j], tmax[j], tmin[j]]
writer.writerow(row)
infile.close()
outfile.close()
Now, leaving aside that I get a memory error when I run this, if I take away a few of the climate variables, then I get a do successfully get a .csv file in the format that I want. The problem is, that every single month, in every single year (1926-2002), reports the same climate data values-- that is, the data for the month of january 1926. The code is calling the data from the appropriate variable for the appropriate day, but repeating the same data month after month. I'm not sure where I've gone wrong with this, but any suggestions/help would be much appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
每个
for
都会形成一个循环;您的代码有两个彼此内部的年份循环 - 它处理文件第一行的所有年份,然后是第二行的所有年份,依此类推。这是您遇到的错误,但如果你以某种方式去修复它,很快就会出现另一个。现在,请在图书馆借一本好的 Python 书籍,花一些时间阅读和做练习。或者参加课程。找一位知识渊博的朋友来帮你检查你的代码。 StackOverflow 可能会帮助您解决特定问题,但遗憾的是它无法教您概念。你走错了路;如果你继续这样下去,只会有麻烦。你应该回去更好地学习基础知识,从长远来看,这会让事情变得更容易。
计算机可以为您完成繁琐且重复的任务。您不应该永远键入大量数字或编号变量。
熟悉列表(以及列表列表)和 range 函数。
对变量使用描述性名称,而不是缩写。这是Python,我们喜欢事情清晰。并将每个语句放在自己的行上;所有这些分号看起来都很丑。如果您想共享代码、获得帮助或只是整理自己的想法,这些事情就很重要。
研究 csv 模块 的文档并使用其阅读器,而不仅仅是作者。
熟悉列表切片,特别是 line[1::3] 类型。
了解文件的 with 语句。
如果您在每个 if/elif 中执行相同的操作,请将其从那里移至公共位置。
祝你有一天成为一名优秀的程序员:)
Each
for
makes a loop; your code has two loops for years inside each other – it processes all the years with the first line of the file, then all the years for the second, etc. This is the error you're running into, but if you just go and fix it somehow, another one will pop up soon enough.Now, please do borrow a good Python book at the library and spend some time reading and doing the exercises. Or join a course. Get a knowledgeable friend to review your code for you. StackOverflow might help you to get through a specific problem, but sadly it can't teach you the concepts. You're on the wrong path; there's nothing but trouble ahead if you just continue like this. You should go back and learn the basics a bit better, it'll make it much easier in the long run.
Computers are there to do tedious and repetitive tasks for you. You should never type out a giant list of numbers or numbered variables.
Get familiar with lists (and lists of lists), and the range function.
Use descriptive names for your variables, not abbreviations. This is Python, we like things clear. And put each statement on its own line; all those semicolons look ugly. These things are important if you want to share code, get help, or just organize your own thoughts on it.
Study the documentation for the csv module and use its reader also, not just the writer.
Get familiar with list slicing, particularly the line[1::3] variety.
Learn the with statement for files.
If you do the same thing in each if/elif, move it out of there to a common place.
Here's to you becoming a good programmer one day :)