Selecting columns of data from a .txt file into a .csv

Posted 2024-09-13 15:49:07 · 569 characters · 3 views · 0 comments

I am quite new to Python (I've only been using it for the past week). My task seems fairly simple, yet I am struggling. I have several large text files, each with many columns of data from different regions. I would like to take the data from one text file, extract only the columns I need, and write them to a new .csv file. Currently the files are tab-delimited, but I would like the output to be comma-delimited.

I have:

#YY  MM DD hh mm WVHT  SwH  SwP  WWH  WWP SwD WWD   MWD
#yr  mo dy hr mn    m    m  sec    m  sec  -  degT  degT
2010 07 16 17 00  0.5  0.5  5.0  0.3  4.0 SSE SSE   163
2010 07 16 16 00  0.6  0.5  5.9  0.3  3.8 SSE SSE   165
2010 07 16 15 00  0.5  0.5  6.7  0.3  3.6 SSE  SW   151
2010 07 16 14 00  0.6  0.5  5.6  0.3  3.8 SSE SSE   153

I only want to keep: DD, WVHT, and MWD

Thanks in advance,
Harper

Comments (4)

诗酒趁年少 2024-09-20 15:49:07

You need to format this question a little more legibly. :)

Take a look at the Python csv module for writing your csv files from your now-stored data: http://docs.python.org/library/csv.html

EDIT: Here's some better, more concise code, based on comments + csv module:

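import csv

# newline='' lets the csv module control the line endings (Python 3)
with open('myfile.txt') as f, open('out.csv', 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter=',')
    for line in f:
        # strip the trailing newline so it does not end up in the last field
        vals = line.rstrip('\n').split('\t')
        # DD, WVHT, MWD; writerow takes a single sequence, not separate arguments
        csv_out.writerow([vals[2], vals[5], vals[12]])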
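
If hard-coding the indices 2, 5 and 12 feels fragile, the columns can instead be looked up by name from the first header line. The following is only a sketch under a few assumptions: the first line starting with `#` names the columns, the second `#` line holds units and can be dropped, and fields are separated by any run of whitespace. The helper name `select_columns` is made up for this example.

```python
import csv
import io

def select_columns(lines, wanted):
    """Yield rows holding only the `wanted` columns, looked up by header name."""
    indices = None
    for line in lines:
        fields = line.split()  # split on any run of whitespace (tabs or spaces)
        if not fields:
            continue  # skip blank lines
        if line.startswith('#'):
            if indices is None:
                # The first '#' line names the columns, e.g. '#YY' -> 'YY'
                names = [name.lstrip('#') for name in fields]
                indices = [names.index(name) for name in wanted]
                yield list(wanted)
            continue  # any later '#' line (the units) is dropped
        yield [fields[i] for i in indices]

sample = """\
#YY  MM DD hh mm WVHT  SwH  SwP  WWH  WWP SwD WWD   MWD
#yr  mo dy hr mn    m    m  sec    m  sec  -  degT  degT
2010 07 16 17 00  0.5  0.5  5.0  0.3  4.0 SSE SSE   163
2010 07 16 16 00  0.6  0.5  5.9  0.3  3.8 SSE SSE   165
"""

out = io.StringIO()
csv.writer(out).writerows(select_columns(sample.splitlines(), ['DD', 'WVHT', 'MWD']))
print(out.getvalue())
```

To run this against a real file, pass an open file object instead of `sample.splitlines()` and write to a file opened with `newline=''`.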
恬淡成诗 2024-09-20 15:49:07

One easy way to achieve this is by using the csv module in the standard library.

First, create a csv.reader and a csv.writer object:

>>> import csv
>>> # In Python 3, open csv files in text mode with newline='' (not 'rb')
>>> csv_in = csv.reader(open('eggs.txt', newline=''), delimiter='\t')
>>> csv_out = csv.writer(open('spam.csv', 'w', newline=''), delimiter=',')

Then just put the information you want into the new csv file.

>>> for line in csv_in:
...     # writerow takes one sequence of fields
...     csv_out.writerow([line[2], line[5], line[-1]])
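
The same reader/writer pairing can be wrapped in a small function that works on any file-like objects, which makes it easy to try out without touching the disk. This is a sketch, not a definitive implementation: the name `tsv_to_csv` is invented here, and it assumes the input really is tab-separated. If the file turns out to be aligned with spaces instead, `csv.reader` with `delimiter='\t'` returns each line as a single field and the indexing fails.

```python
import csv
import io

def tsv_to_csv(src, dst, columns):
    """Copy the given column indices from a tab-separated source to a CSV destination."""
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter=',')
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        writer.writerow([row[i] for i in columns])

# Two records from the question, rewritten with explicit tabs
src = io.StringIO(
    "2010\t07\t16\t17\t00\t0.5\t0.5\t5.0\t0.3\t4.0\tSSE\tSSE\t163\n"
    "2010\t07\t16\t16\t00\t0.6\t0.5\t5.9\t0.3\t3.8\tSSE\tSSE\t165\n"
)
dst = io.StringIO()
tsv_to_csv(src, dst, [2, 5, -1])  # DD, WVHT, MWD (last column)
print(dst.getvalue())
```

With real files, open the source with `open(path, newline='')` and the destination with `open(path, 'w', newline='')`, and close both (or use `with`) so the output is flushed.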
小瓶盖 2024-09-20 15:49:07

One of the problems appears to be that all of your data is on a single line:

2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 2010 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165 2010 07 16 15 00 0.5 0.5 6.7 0.3 3.6 SSE SW 151 2010 07 16 14 00 0.6 0.5 5.6 0.3 3.8 SSE SSE 153

If this is the case, you will need to split the input line up. If you know your data are regular, then you could be sneaky and split on the 2010:

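with open('data.txt') as f:
    for line in f:
        # The leading space is significant: it only matches a year that starts a new record
        for portion in line.split(' 2010'):
            # Portions after the first have lost their leading year,
            # so their field indices are shifted down by one
            fields = portion.split()
            # write the wanted fields to csv here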
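
A variant of the split idea that does not hard-code the year is to split on whitespace that is followed by something shaped like a date. This swaps in a regular expression for the plain `str.split`, and it assumes every record starts with a four-digit year, a two-digit month and a two-digit day, and that no other field looks like that. The second record's year is changed to 2011 purely for illustration.

```python
import re

# Whitespace that is immediately followed by "YYYY MM DD " (a lookahead, so the date is kept)
RECORD_START = re.compile(r'\s+(?=\d{4}\s+\d{2}\s+\d{2}\s)')

def split_records(line):
    """Split one long line into records, each a list of whitespace-separated fields."""
    return [record.split() for record in RECORD_START.split(line.strip())]

line = ("2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 "
        "2011 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165")

for fields in split_records(line):
    # Every record keeps its year, so the indices stay 2, 5 and 12
    print(fields[2], fields[5], fields[12])
```

Because the lookahead does not consume the date, each record keeps all 13 fields and the column indices do not shift.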

If your data span multiple years, then the Python itertools module can be very handy. I often find myself using the grouper recipe.

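import csv
from itertools import zip_longest  # named izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
  """
  >>> list(grouper(3, 'ABCDEFG', 'x'))
  [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
  """
  args = [iter(iterable)] * n
  return zip_longest(*args, fillvalue=fillvalue)

with open('spam.txt') as f, open('eggs.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter=',')
    # The sample data has 13 fields per record; split() handles tabs and spaces
    for record in grouper(13, f.read().split()):
        # DD, WVHT, MWD; writerow takes a single sequence
        csv_writer.writerow([record[2], record[5], record[12]])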
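
One thing to watch with the grouper recipe is that an incomplete final record is padded with the fill value rather than dropped. The sketch below runs the recipe on an in-memory string, assuming 13 fields per record (the number of columns in the question's header) and a deliberately truncated third record, and filters out the padded group.

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

FIELDS_PER_RECORD = 13  # assumption: matches the 13 columns in the sample header

flat = ("2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 "
        "2010 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165 "
        "2010 07 16")  # truncated third record, for illustration

records = list(grouper(FIELDS_PER_RECORD, flat.split()))
# Keep only the groups that were not padded with None
complete = [r for r in records if None not in r]

print(len(records), len(complete))
print([(r[2], r[5], r[12]) for r in complete])
```

Whether to drop, log, or raise on a padded record depends on how much you trust the input; silently writing `None` values to the CSV is usually the least helpful option.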
夜巴黎 2024-09-20 15:49:07

Since this is a basic need and does not call for extensive use of csv, here's a snippet that does it without the csv module.

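# Column indices of the fields to keep
DD = 2
WVHT = 5
MWD = 12
INPUT = "input.txt"
OUTPUT = "output.csv"

def main():
    rows = []
    # "with" closes both files even if an error occurs
    with open(INPUT) as fi, open(OUTPUT, "w") as fo:
        for line in fi:  # xreadlines() no longer exists in Python 3
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            rows.append("%s,%s,%s" % (fields[DD], fields[WVHT], fields[MWD]))
        # Text mode already translates "\n" to the platform line ending,
        # so os.linesep is not needed here
        fo.write("\n".join(rows) + "\n")

if __name__ == "__main__":
    main()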
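
Joining fields by hand works for this data because none of the values contain a comma. As a hedged illustration of where the two approaches diverge, the sketch below compares a manual join with `csv.writer` on a made-up row whose last value contains a comma (that value is hypothetical and does not come from the question's data).

```python
import csv
import io

row = ["16", "0.5", "SSE, gusty"]  # hypothetical value containing a comma

# Manual join: the embedded comma is indistinguishable from a separator
manual = ",".join(row)

# csv.writer quotes any field that contains the delimiter
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted = buf.getvalue().rstrip("\r\n")

print(manual)
print(quoted)
```

Reading the manual line back yields four fields instead of three, while the quoted line round-trips through `csv.reader` unchanged.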