Creating neat CSV files by filtering data and unnecessary information from TXT files
I have an assignment to export neat CSV files in which only the headers and data are present; everything else must be filtered out. There are 500+ text files.
Each file must be a separate CSV file, the format must be "YEAR-MONTH-DAY (ORIGINAL_FILE_NAME)".
An example of this is:
Original file: pm990902.b17
CSV file: 1999-09-02 (pm990902.b17).csv
I already have code for filtering the data:
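import pandas as pd
import numpy as np
import glob

# Skip the first 192 lines (row indices 0-191) of each file
pred = lambda x: x in np.arange(0, 192, 1)
# Placeholder values that should be read as missing data
inval = [99999.9, 999.0, 999.9900, 999.9]
# Raw string so the backslashes in the Windows path are taken literally
files = glob.glob(r'C:\Users\Lenovo\Desktop\Python\Files\*')
for file in files:
    df = pd.read_csv(file, header=0, delim_whitespace=True, skiprows=pred,
                     engine='python', na_values=inval)
    # Drop the first row after the header
    df = df[1:]
    df.to_csv('Name of the new file.csv', index=False)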
I still can't figure out how to build the new file name (the date), which is the actual problem for me.
This is what the file looks like with the date in the first line:
*AAAAAAAAAAAAAAAAAAAAAAAAAA zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 05-JAN-2000 12:21:0005-JAN-2000 14:00:300102
160 2160
1.00 1.0 1.00 1.00 1.0000 1.0 1.0 1.0 1.0000 1.0000 1.00 1.000 1.0 1.0 1.0000 1.0000
9999.90 99999.0 999.90 999.00 99.9900 999.0 999.9 99999.9 999.9900 999.9900 999.90 99.990 999.9 999.9 99.9900 99.9900
Pressure [hPa]
Geopotential height [gpm]
Temperature [K]
Relative humidity [%]
Ozone partial pressure [mPa]
Horizontal wind direction [decimal degrees]
Horizontal wind speed [m/s]
GPS geometric height [m]
GPS longitude [decimal degrees E]
GPS latitude [decimal degrees N]
Internal temperature [K]
Ozone raw current [microA]
Battery voltage [V]
Pump current [mA]
Ozone mixing ratio per volume [ppm]
Ozone partial pressure uncertainty estimate [mPa]
I can't attach the whole text file, but this is an example of the beginning of every text file.
So how can I get the desired date for the file name out of this line?
Comments (1)
If the input files always have the same format, with the date/time elements always at the end of the line, you can split the line, and just take the third element from the end.
You can do this with negative indexing, as per w3schools
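The answer's original snippet is not preserved here, so the following is only a minimal sketch of the split-and-index idea, using the first line of the sample file from the question. The variable names are illustrative.

```python
# First line of the sample file shown in the question.
first_line = ("*AAAAAAAAAAAAAAAAAAAAAAAAAA zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz "
              "05-JAN-2000 12:21:0005-JAN-2000 14:00:300102")

# Split on whitespace, then count from the end with a negative index.
parts = first_line.split()
raw_date = parts[-3]  # third element from the end

print(raw_date)  # 05-JAN-2000
```

This relies on every file having the same layout at the end of its first line, as stated above.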
As for applying this to your logic, you'll need to change the df.to_csv line in your loop so that it writes to the new filename rather than a fixed one.
You need to import os, as I use os.path and os.sep to get the resulting filename. Note that this requires Python 3.6+, as I'm using f-strings.
Also note that you need to open the original files and actually read the first line of the file. This will work.
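Since the answer's code is not preserved here, the sketch below shows one way the pieces could fit together; it is not the answerer's exact code. It assumes the date always looks like 05-JAN-2000, that month abbreviations are English, and it uses os.path.split and os.path.join where the answer mentions os.sep. The helper name csv_name_for and the sample file name pm000105.b17 are hypothetical.

```python
import os
import tempfile
from datetime import datetime


def csv_name_for(path):
    """Return 'YYYY-MM-DD (original_name).csv' in the same folder as path."""
    # Open the original file and read only its first line.
    with open(path) as f:
        first_line = f.readline()

    # Third element from the end, e.g. '05-JAN-2000'.
    raw_date = first_line.split()[-3]

    # %b matches abbreviated month names (assumes an English locale).
    date = datetime.strptime(raw_date, "%d-%b-%Y").strftime("%Y-%m-%d")

    folder, original_name = os.path.split(path)
    return os.path.join(folder, f"{date} ({original_name}).csv")


# Demonstration on a throwaway file that mimics the question's first line.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "pm000105.b17")  # hypothetical file name
    with open(sample, "w") as f:
        f.write("*AAAA zzzz 05-JAN-2000 12:21:0005-JAN-2000 14:00:300102\n")
    new_name = os.path.basename(csv_name_for(sample))

print(new_name)  # 2000-01-05 (pm000105.b17).csv
```

In the loop from the question, the value returned by csv_name_for(file) would take the place of the fixed 'Name of the new file.csv' argument to df.to_csv.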