Reading a file of dictionaries with improperly placed quotation marks
I have a file containing a list of dictionaries, most of them improperly marked with quotation marks. Here is an example:
{game:Available,player:Available,location:"Chelsea, London, England",time:Available}
{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}
As you can see, the keys can also differ from one dictionary to another.
I tried to read the file with the json module and with the DictReader of the csv module, but each time I run into trouble because the quotation marks are always present around the location value but not always around the other keys or values. So far I see two possibilities:
- Replacing the "," with ";" in the location value, and getting rid of all the quotes.
- Adding quotes around every key and value, except the location value, which already has them.
PS: My end goal is to format all these dictionaries into an SQL table whose columns are the union of all the dictionaries' keys, with one row per dictionary and blanks where values are missing.
Comments (4)
Here's a fairly complete solution, I think.
First, I created the following file:
.
.
The following code then processes the content of the file in two phases:
- First, a pass over the content collects every key that occurs in any of the dictionaries. From these keys, a dictionary posis is derived that gives, for each key, the position its corresponding value must occupy in a row.
- Second, another pass through the file builds the rows one after the other and collects them in a list rows.
By the way, note that the condition on the value associated with the key location (or "location") is respected.
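The two phases described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this answer: the regular expression PAIR and the helpers parse_line, collect_positions and build_rows are hypothetical names, and the sketch assumes one dictionary per line, no quotation marks inside quoted values, and no colons in unquoted keys. The names posis and rows follow the description above.

```python
import re

# One key:value pair; each side is either a "quoted string" or a bare token.
# Assumption: quoted strings never contain a double quote themselves.
PAIR = re.compile(r'("[^"]*"|[^,:{}"]+)\s*:\s*("[^"]*"|[^,{}"]*)')

def parse_line(line):
    """Turn one '{...}' line into a plain dict, dropping optional quotes."""
    return {k.strip().strip('"'): v.strip().strip('"')
            for k, v in PAIR.findall(line)}

def collect_positions(lines):
    """Phase 1: map every key seen in any dictionary to a column index."""
    posis = {}
    for line in lines:
        for key in parse_line(line):
            posis.setdefault(key, len(posis))
    return posis

def build_rows(lines, posis):
    """Phase 2: build one row per dictionary, blank where a key is missing."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        row = [''] * len(posis)
        for key, value in parse_line(line).items():
            row[posis[key]] = value
        rows.append(row)
    return rows

# The two sample lines from the question.
sample = [
    '{game:Available,player:Available,location:"Chelsea, London, England",time:Available}',
    '{"game":"Available","player":"Available","location":"Chelsea, London, England",'
    '"time":"Available","date":"Available"}',
]
posis = collect_positions(sample)
rows = build_rows(sample, posis)
```

The commas inside the quoted location value survive because a quoted string is matched as a whole before the bare-token alternative is tried.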
.
result
.
.
I wrote the code above with an enormous file of several GB in mind, one that could not be read into memory in full: such a large file has to be processed chunk by chunk. That's why it contains these instructions:
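As a minimal sketch of what chunk-by-chunk reading can look like, assuming one dictionary per line: the generator name iter_records and the chunk_size default are illustrative choices, not names taken from the code above.

```python
import io

def iter_records(fileobj, chunk_size=1 << 20):
    """Yield complete, non-empty lines while reading fixed-size chunks.

    Only the current chunk plus one unfinished line is held in memory.
    """
    pending = ''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, pending = pending.split('\n')
        for line in complete:
            if line.strip():
                yield line.strip()
    if pending.strip():
        yield pending.strip()  # last record without a trailing newline

# Tiny demonstration with an in-memory file and a deliberately small chunk.
demo = io.StringIO('{a:1}\n{b:2,c:"x, y"}\n{d:4}')
records = list(iter_records(demo, chunk_size=4))
```

For a two-pass approach, the file can be rewound with fileobj.seek(0) between the passes, so that neither pass needs the whole content at once.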
Of course, if the file is small enough to be read in one go, the code can be simplified:
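A simplified one-shot version might look something like this. It is a sketch under assumptions rather than the original listing: RECORD, PAIR, clean and read_all are hypothetical names, and the patterns assume that quoted values contain neither braces nor double quotes.

```python
import re

# A record is everything between one pair of braces.
RECORD = re.compile(r'\{[^{}]*\}')
# One key:value pair; each side is a "quoted string" or a bare token.
PAIR = re.compile(r'("[^"]*"|[^,:{}"]+)\s*:\s*("[^"]*"|[^,{}"]*)')

def clean(token):
    """Remove surrounding whitespace and optional double quotes."""
    return token.strip().strip('"')

def read_all(text):
    """Parse the whole text at once and return (columns, rows)."""
    dicts = [{clean(k): clean(v) for k, v in PAIR.findall(rec)}
             for rec in RECORD.findall(text)]
    # Union of all keys, in order of first appearance.
    columns = list(dict.fromkeys(key for d in dicts for key in d))
    rows = [[d.get(col, '') for col in columns] for d in dicts]
    return columns, rows

text = (
    '{game:Available,player:Available,location:"Chelsea, London, England",time:Available}\n'
    '{"game":"Available","location":"Chelsea, London, England","date":"Available"}\n'
)
columns, rows = read_all(text)
```

Because the whole text is in memory, the key union and the rows can be built from the same list of parsed dictionaries, so no second pass over the file is needed.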
If your data is more complicated than the examples you have given, or if it has to be faster, you should probably look into pyparsing.
Otherwise, you could write something more hacky, like this:
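One hacky possibility, sketched here as an assumption rather than as the code originally posted, is to add the missing quotes with a regular expression and then hand the line to the json module. TOKEN, quote_bare and loads_sloppy are made-up names. The sketch breaks on unquoted values that contain a colon or comma, and every value comes back as a string.

```python
import json
import re

# Either an already quoted string, or a run of characters that are not
# structural JSON punctuation.
TOKEN = re.compile(r'"[^"]*"|[^{}:,"]+')

def quote_bare(match):
    """Leave quoted strings and pure whitespace alone; quote everything else."""
    token = match.group(0)
    if token.startswith('"') or not token.strip():
        return token
    return json.dumps(token.strip())

def loads_sloppy(line):
    """Parse a dictionary line whose keys and values may lack quotes."""
    return json.loads(TOKEN.sub(quote_bare, line))

first = loads_sloppy(
    '{game:Available,player:Available,location:"Chelsea, London, England",time:Available}'
)
second = loads_sloppy('{"game":"Available","date":"Available"}\n')
```

The quoted location value is matched as a single token, so its inner commas are never mistaken for separators.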
Hopefully this pyparsing solution is easier to follow and maintain over time:
Prints:
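For reference, a pyparsing grammar for this format could be sketched along the following lines. It assumes the third-party pyparsing package is installed; the grammar and the names quoted, bare, entry, record and parse_line are illustrative and are not the listing from this answer. The camelCase method names are used because they exist in both pyparsing 2 and 3.

```python
import pyparsing as pp

# Punctuation that structures a record but should not appear in the result.
LBRACE, RBRACE, COLON, COMMA = map(pp.Suppress, '{}:,')

# A token is either a "quoted string" (quotes are removed) or a bare run of
# characters up to the next piece of punctuation.
quoted = pp.QuotedString('"')
bare = pp.CharsNotIn('{}:,"').setParseAction(lambda t: t[0].strip())
token = quoted | bare

# key:value pairs, grouped so that pp.Dict can use the first item as the key.
entry = pp.Group(token + COLON + token)
record = LBRACE + pp.Dict(entry + pp.ZeroOrMore(COMMA + entry)) + RBRACE

def parse_line(line):
    """Parse one '{...}' line into a plain Python dict."""
    return record.parseString(line, parseAll=True).asDict()

parsed = parse_line(
    '{game:Available,player:Available,location:"Chelsea, London, England",time:Available}'
)
```

Trying quoted before bare is what keeps the commas inside the location value from being read as separators.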