Using type() information to cast values stored as strings

Posted on 2024-12-04 14:24:11

In my application I have generated a number of values (three columns, of type int, str and datetime; see the example below), and these values are stored in a flat file as comma-separated strings. I also store a second file containing the type of each column (see below). How can I use this information to cast the values from the flat file to the correct data types in Python? Is it possible or do I need to do some other stuff?

Data file:

#id,value,date
1,a,2011-09-13 15:00:00
2,b,2011-09-13 15:10:00
3,c,2011-09-13 15:20:00
4,d,2011-09-13 15:30:00

Type file:

id,<type 'int'>
value,<type 'str'>
date,<type 'datetime.datetime'>

Comments (7)

滥情哥ㄟ 2024-12-11 14:24:11

As I understand it, you have already parsed the file and now just need to get the right types. So let's say id_, type_ and value are three strings that contain the values from the file. (Note that type_ should contain 'int', for example, not "<type 'int'>".)

import importlib

def convert(value, type_):
    try:
        # Check whether it's a builtin type
        # (the module is 'builtins' on Python 3; it was '__builtin__' on Python 2)
        module = importlib.import_module('builtins')
        cls = getattr(module, type_)
    except AttributeError:
        # If not, split the dotted name into module and class
        module, type_ = type_.rsplit(".", 1)
        module = importlib.import_module(module)
        cls = getattr(module, type_)
    return cls(value)

Then you can use it like this:

value = convert("5", "int")

Unfortunately, this doesn't work for datetime, because a datetime cannot be constructed directly from its string representation.
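Two gaps remain in the answer above: getting 'int' out of "<type 'int'>", and parsing the datetime column. The following is only a sketch of one way to close them. The names TYPE_REPR, type_name_from_repr and parse_datetime are made up for illustration, and the timestamp format is assumed from the sample data file in the question.

```python
import datetime
import re

# Matches the repr of a type object:
# "<type 'int'>" on Python 2 or "<class 'int'>" on Python 3.
TYPE_REPR = re.compile(r"^<(?:type|class) '([\w.]+)'>$")

def type_name_from_repr(text):
    """Return the dotted type name inside a type repr, e.g. 'datetime.datetime'."""
    match = TYPE_REPR.match(text.strip())
    if match is None:
        raise ValueError("not a type repr: %r" % text)
    return match.group(1)

def parse_datetime(text):
    """Parse a timestamp; the format is an assumption based on the sample data."""
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
```

The extracted name could then be passed to convert() for types whose constructor accepts a string, with parse_datetime used whenever the name is 'datetime.datetime'.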

梦途 2024-12-11 14:24:11

Your types file can be simpler:

id=int
value=str
date=datetime.datetime

Then in your main program you can do this:

import datetime

def convert_datetime(text):
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Map each type name used in the types file to a converter function
data_types = {'int': int, 'str': str, 'datetime.datetime': convert_datetime}
fields = {}

with open('example_types.txt') as types_file:
    for line in types_file:
        key, val = line.strip().split('=')
        fields[key] = val

values = []  # store it all here for now

with open('actual_data.txt') as data_file:
    # The header line looks like "#id,value,date"
    field_info = data_file.readline().strip('#\n ').split(',')
    for line in data_file:
        row = []
        for i, element in enumerate(line.strip().split(',')):
            element_type = fields[field_info[i]]  # 'int', 'str' or 'datetime.datetime'
            convert = data_types[element_type]
            row.append(convert(element))
        values.append(row)

# to show it working...
for row in values:
    print(row)

彩虹直至黑白 2024-12-11 14:24:11

Follow these steps:

  1. Read the file line by line and do the following for each line.
  2. Split the line with split(','), using the comma as the separator.
  3. Convert the first element of the resulting list to an int, keep the second element as a string, and parse the third element (e.g. using slices) to build a datetime object from it.
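The steps above can be sketched roughly as follows. The helper names parse_line and parse_lines are made up for illustration, and the sketch uses strptime instead of the slicing the answer mentions, with the format assumed from the sample data in the question.

```python
import datetime

def parse_line(line):
    """Split one data line and convert its three fields to int, str and datetime."""
    id_text, value, date_text = line.strip().split(",")
    # strptime replaces manual slicing here; the format is an assumption.
    date = datetime.datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S")
    return int(id_text), value, date

def parse_lines(lines):
    """Convert every non-empty line that is not a '#' header line."""
    return [parse_line(line) for line in lines
            if line.strip() and not line.startswith("#")]
```

Because parse_lines accepts any iterable of strings, an open file object can be passed to it directly.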

青瓷清茶倾城歌 2024-12-11 14:24:11

I had to deal with a similar situation in a recent program that had to convert many fields. I used a list of tuples, where one element of each tuple was the conversion function to use. Sometimes it was int or float, sometimes it was a simple lambda, and sometimes it was the name of a function defined elsewhere.
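A list of tuples like that might look something like the sketch below. This is not the original program: FIELD_SPEC, parse_timestamp and convert_row are hypothetical names, and the columns are borrowed from the question's sample file.

```python
import datetime

def parse_timestamp(text):
    """A converter defined elsewhere; the format is assumed from the sample data."""
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# One (column name, converter) tuple per column, in file order.
FIELD_SPEC = [
    ("id", int),                           # a builtin type
    ("value", lambda text: text.strip()),  # a simple lambda
    ("date", parse_timestamp),             # a named function
]

def convert_row(cells):
    """Apply each column's converter to the matching cell and return a dict."""
    if len(cells) != len(FIELD_SPEC):
        raise ValueError("expected %d cells, got %d" % (len(FIELD_SPEC), len(cells)))
    return {name: func(cell) for (name, func), cell in zip(FIELD_SPEC, cells)}
```

Keeping the converters as real function objects avoids having to look types up by name at all.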

ゝ偶尔ゞ 2024-12-11 14:24:11

Instead of having a separate "type" file, take your list of tuples of (id, value, date) and just pickle it.

Otherwise you'll have to solve the problem of storing your string-to-type converters as text (in your "type" file). That might be a fun problem to solve, but if you're just trying to get something done, go with pickle or cPickle.
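A minimal sketch of the pickle route, assuming the rows are already Python objects in memory. The names save_rows and load_rows and the sample rows are illustrative only. On Python 3 there is no separate cPickle module; the plain pickle module uses the C implementation when it is available.

```python
import datetime
import pickle

# Sample rows in the (id, value, date) shape from the question.
rows = [
    (1, "a", datetime.datetime(2011, 9, 13, 15, 0, 0)),
    (2, "b", datetime.datetime(2011, 9, 13, 15, 10, 0)),
]

def save_rows(rows, path):
    """Write the list of tuples to `path`; the types are stored with the values."""
    with open(path, "wb") as handle:
        pickle.dump(rows, handle)

def load_rows(path):
    """Read the rows back with their original types."""
    # Only unpickle files you created yourself: loading a pickle can run code.
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

The trade-off is that the file is no longer human-readable text, so this fits best when only your own program reads it.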

意中人 2024-12-11 14:24:11

First, you cannot write a "universal" or "smart" conversion that magically handles anything.

Second, trying to summarize a string-to-data conversion in anything other than code never seems to work out well. So rather than write a string that names the conversion, just write the conversion.

Finally, trying to write a configuration file in a domain-specific language is silly. Just write Python code. It's not much more complicated than trying to parse some configuration file.

"Is it possible or do I need to do some other stuff?"

Don't waste time trying to create a "type file" that's not simply Python. It doesn't help. It is simpler to write the conversion as a Python function. You can import that function as if it were your "type file".

import datetime

def convert(row):
    # Explicit conversion for each named column
    return dict(
        id=int(row['id']),
        value=str(row['value']),
        date=datetime.datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S"),
    )

That's all you have in your "type file".

Now you can read (and process) your input like this.

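import csv

from type_file import convert

# Python 3: open CSV files in text mode with newline="" (Python 2 used "rb").
# Assumes the header row reads id,value,date with no leading '#'.
with open("date", newline="") as source:
    rdr = csv.DictReader(source)
    for row in rdr:
        useful_row = convert(row)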

"In many cases I do not know the number of columns or the data type before runtime."

This means you are doomed.

You must have an actual definition of the file content or you cannot do any processing.

"id","value","other value"
1,23507,3

You don't know if "23507" should be an integer, a string, a postal code, a floating-point number (with the period omitted), a duration (in days or seconds) or some other more complex thing. You can't hope and you can't guess.

After getting a definition, you need to write an explicit conversion function based on the actual definition.

After writing the conversion, you need to (a) test the conversion with a simple unit test, and (b) test the data to be sure it really converts.

Then you can process the file.
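Steps (a) and (b) could be sketched along these lines. convert is repeated here only so the example is self-contained. ConvertTest and find_bad_rows are hypothetical names, not part of the answer above.

```python
import datetime
import unittest

def convert(row):
    """Explicit conversion for the three columns in the question's sample file."""
    return dict(
        id=int(row["id"]),
        value=str(row["value"]),
        date=datetime.datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
    )

class ConvertTest(unittest.TestCase):
    """(a) A simple unit test for the conversion itself."""

    def test_valid_row(self):
        row = {"id": "1", "value": "a", "date": "2011-09-13 15:00:00"}
        expected = {"id": 1, "value": "a",
                    "date": datetime.datetime(2011, 9, 13, 15, 0, 0)}
        self.assertEqual(convert(row), expected)

    def test_bad_id_is_rejected(self):
        with self.assertRaises(ValueError):
            convert({"id": "x", "value": "a", "date": "2011-09-13 15:00:00"})

def find_bad_rows(rows):
    """(b) Check the data: return (index, error) for each row that fails to convert."""
    bad = []
    for index, row in enumerate(rows):
        try:
            convert(row)
        except (KeyError, ValueError) as error:
            bad.append((index, error))
    return bad
```

The test case can be run with python -m unittest, and find_bad_rows can be applied to the rows from csv.DictReader before any real processing starts.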

巴黎盛开的樱花 2024-12-11 14:24:11

You might want to look at the xlrd module. If you can load your data into Excel, and Excel knows what type is associated with each column, xlrd will give you the type when you read the Excel file. Of course, if the data is given to you as a CSV, then someone would have to go into the Excel file and change the column types by hand.

I'm not sure this gets you all the way to where you want to go, but it might help.
