Reading row data for successive months and writing it into columns?

Posted on 2024-12-10 12:44:00 · 3597 characters · 0 views · 0 comments

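I have spreadsheets of climate data in which, essentially, I need to transpose parts of rows into columns and vice versa. Unfortunately, the format is somewhat awkward. The data came to me with columns for year, month, number of days in the month, and the type of climate data in the row, followed by 93 successive columns: each daily value sits between two associated flags (so 3 terms per day of the month, a value and 2 flags, for 31 days). Although months vary in length, the shorter months have been padded with null values in the last few columns. What I want, for processing and modeling purposes, is a spreadsheet/.csv file with columns as follows:

year, month, day of month (i.e. a number 1 to 31), and then five columns, one per type of climate data (precip, snow, snow water, tmax, tmin).

If I could get columns with the appropriate flag values as well, that would be great, but it's not a priority. So, I've written the code below to unpack rows into lists (probably very inefficiently, but I'm new at this) representing year, month, type of climate variable, variable value, flag1 and flag2 based on the location in the row (corresponding to a day, 1 to 31):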

import matplotlib.mlab as mlab
from matplotlib.pyplot import figure, show
import numpy as np

import scipy
import csv

durham='C:\\Users\\LocalUser\\Desktop\\Drought Data\\My_Met_Data\\USHCN\\Durham.csv'

txt='met'
station='Durham'

output=station+"_"+txt+"_"+"new"+".csv"

infile=open(durham,'r')
outfile=open(output,'w')
writer=csv.writer(outfile)

yr=[]; mon=[]; var=[]; unit=[]; flag1=   [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
flag2=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
valu=[]; flg1=[]; flg2=[]; prcp=[]; snow=[]; snwd=[]; tmax=[]; tmin=[]; row=[]
for line in infile:
  stationid, variable, units, year, month, days, flag1[0], value[0], flag2[0], flag1[1], value[1], flag2[1], flag1[2], value[2], flag2[2],\
  flag1[3], value[3], flag2[3], flag1[4], value[4], flag2[4], flag1[5], value[5], flag2[5], flag1[6], value[6], flag2[6],\
  flag1[7], value[7], flag2[7], flag1[8], value[8], flag2[8] ,flag1[9], value[9], flag2[9], flag1[10], value[10], flag2[10],\
  flag1[11], value[11], flag2[11], flag1[12], value[12], flag2[12], flag1[13], value[13], flag2[13], flag1[14], value[14], flag2[14],\
  flag1[15], value[15], flag2[15], flag1[16], value[16], flag2[16], flag1[17], value[17], flag2[17], flag1[18], value[18], flag2[18],\
  flag1[19], value[19], flag2[19], flag1[20], value[20], flag2[20], flag1[21], value[21], flag2[21], flag1[22], value[22], flag2[22],\
  flag1[23], value[23], flag2[23], flag1[24], value[24], flag2[24], flag1[25], value[25], flag2[25], flag1[26], value[26], flag2[26],\
  flag1[27], value[27], flag2[27], flag1[28], value[28], flag2[28], flag1[29], value[29], flag2[29], flag1[30], value[30], flag2[30]=line.split(',')
  yr=[int(year)]
  mon=[int(month)]
  var=variable
  unit=units

  for yr in range(1926, 2003):
     for mon in range(1,13):
        if var=='PRCP':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            prcp.append(valu[j])

        elif var=='SNOW':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            snow.append(valu[j])

        elif var=='SNWD':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            snwd.append(valu[j])

        elif var=='TMAX':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            tmax.append(valu[j])

        elif var=='TMIN':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            tmin.append(valu[j])

            row=[yr, mon, j+1, prcp[j], snow[j], snwd[j], tmax[j], tmin[j]]
            writer.writerow(row)


infile.close()
outfile.close()

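Now, leaving aside that I get a memory error when I run this, if I take away a few of the climate variables, then I do successfully get a .csv file in the format that I want. The problem is that every single month, in every single year (1926-2002), reports the same climate data values, namely the data for January 1926. The code is calling the data from the appropriate variable for the appropriate day, but repeating the same data month after month. I'm not sure where I've gone wrong with this, but any suggestions/help would be much appreciated.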



北风几吹夏 2024-12-17 12:44:00

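Each for creates a loop, and your code nests the year and month loops inside the loop over the file's lines: it runs through all the years for the first line of the file, then all the years for the second line, and so on. That is the error you are hitting, but if you somehow fix it, another one will show up soon.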
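A minimal sketch of that effect, using two made-up (year, value) pairs in place of the Durham file. The names lines, nested and flat are illustrative only. With the inner loop, the year that gets written is the loop counter rather than the year read from the line, so each line's value is repeated for every year:

```python
# Hypothetical stand-in for the input file: one (year, value) pair per line.
lines = [(1926, 0.5), (1927, 1.2)]

nested = []
for year, value in lines:
    # Bug pattern: the inner loop overwrites `year` and repeats `value`.
    for year in range(1926, 1928):
        nested.append((year, value))

flat = []
for year, value in lines:
    # Without the inner loop, the year that came with the line is used.
    flat.append((year, value))

print(nested)  # each value appears once per year in the range
print(flat)    # each value appears once, with its own year
```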

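For now, borrow a good Python book from the library and spend some time reading it and doing the exercises. Or take a course. Find a knowledgeable friend to go over your code with you. StackOverflow may help you with specific problems, but sadly it cannot teach you concepts. You are on the wrong track; if you carry on like this, there will only be trouble. Go back and learn the basics better, which will make things easier in the long run.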

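Computers can do tedious, repetitive tasks for you. You should never have to type out long runs of numbers or numbered variables.
Get familiar with lists (and lists of lists) and the range function.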
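As an illustration only, here is how the hand-typed 1 to 31 lists from the question could be produced. The names DAYS_IN_ROW, days, values and table are assumptions made for this sketch, not part of the original code:

```python
DAYS_IN_ROW = 31  # each input row carries 31 day slots

# One call instead of typing 1..31 by hand.
days = list(range(1, DAYS_IN_ROW + 1))

# A placeholder list of the right length, if one is needed at all.
values = [None] * DAYS_IN_ROW

# A list of lists: one independent inner list per climate variable.
variables = ["PRCP", "SNOW", "SNWD", "TMAX", "TMIN"]
table = [[None] * DAYS_IN_ROW for _ in variables]

print(days[0], days[-1], len(table))
```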

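Use descriptive names for variables, not abbreviations. This is Python; we like things clear. And put each statement on its own line; all those semicolons look ugly. These things matter if you want to share code, get help, or just sort out your own thoughts.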

Study the documentation of the csv module and use its reader, not just the writer.
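A small sketch of csv.reader on a made-up, shortened row. The leading column order (station id, variable, units, year, month, days) is taken from the unpacking in the question; the sample values and the in-memory file are assumptions. Unlike line.split(','), the reader also strips the trailing newline from the last field:

```python
import csv
import io

# Hypothetical sample with only two days; real rows have 6 + 93 fields.
sample = io.StringIO("USC001,PRCP,mm,1926,1,31,a,0.5,b,c,1.2,d\n")

reader = csv.reader(sample)
for fields in reader:
    # The first six fields describe the row; the rest are flag/value/flag triples.
    station_id, variable, units, year, month, days = fields[:6]
    rest = fields[6:]
    print(variable, int(year), int(month), rest)
```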

Get familiar with list slicing, especially the line[1::3] kind.
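For example, assuming the six leading fields have already been removed and the remaining fields follow the flag1, value, flag2 order used in the question's unpacking, a step of 3 separates the three kinds of field. The sample data here is made up:

```python
# Hypothetical row tail for three days, in the order flag1, value, flag2.
rest = ["a", "0.5", "b", "c", "1.2", "d", "e", "0.0", "f"]

flag1 = rest[0::3]                       # items 0, 3, 6, ...
values = [float(v) for v in rest[1::3]]  # items 1, 4, 7, ...
flag2 = rest[2::3]                       # items 2, 5, 8, ...

print(flag1, values, flag2)
```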

Learn about the with statement for files.
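A minimal sketch in Python 3, writing to a temporary path chosen only for illustration. The with block closes the file on exit, so no explicit close() call is needed:

```python
import csv
import os
import tempfile

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "Durham_met_new.csv")

# newline="" is what the csv documentation recommends for csv file objects.
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["year", "month", "day", "prcp"])
    writer.writerow([1926, 1, 1, 0.5])
# The file is closed here, even if an exception was raised inside the block.

with open(path, newline="") as infile:
    rows = list(csv.reader(infile))

print(rows)
```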

If you do the same thing in every if/elif branch, move it out of there to a common place.
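One possible way to do that, sketched under several assumptions: rows look like the unpacking in the question (six leading fields, then flag1, value, flag2 per day), the days field holds the month length as an integer, and a variable missing for a given day is written as an empty string. The function name pivot and the sample data are hypothetical. Keying a dictionary on (year, month, day) lets one piece of code handle every variable:

```python
import csv
import io
from collections import defaultdict

VARIABLES = ["PRCP", "SNOW", "SNWD", "TMAX", "TMIN"]

def pivot(rows):
    """Turn one row per variable per month into one row per day."""
    table = defaultdict(dict)  # (year, month, day) -> {variable: value}
    for fields in rows:
        variable = fields[1]
        year, month, n_days = (int(x) for x in fields[3:6])
        values = fields[7::3]  # each value sits between its two flags
        # The same code runs for every variable, so no if/elif chain is needed.
        for day, value in enumerate(values[:n_days], start=1):
            table[(year, month, day)][variable] = float(value)
    for key in sorted(table):
        yield list(key) + [table[key].get(v, "") for v in VARIABLES]

# Hypothetical two-day sample with two of the five variables.
sample = io.StringIO(
    "ID,PRCP,mm,1926,1,2,a,0.5,b,c,1.2,d\n"
    "ID,TMAX,C,1926,1,2,a,10,b,c,12,d\n"
)
result = list(pivot(csv.reader(sample)))
for row in result:
    print(row)
```

Slicing with values[:n_days] drops the padded columns of shorter months before float() is applied to them.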

I hope you become a great programmer one day :)