Reading a file of dictionaries with incorrectly placed quotation marks

Posted on 2024-12-04 01:01:59

I have a file containing a list of dictionaries, most of which are improperly quoted. For example:

{game:Available,player:Available,location:"Chelsea, London, England",time:Available}
{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}

As you can see, the keys can also differ from one dictionary to another.

I tried to read the file with the json module and with the DictReader of the csv module, but each time I ran into difficulties because the quotation marks are always present around the location value, but not always around the other keys or values. So far I see two possibilities:

Replacing the "," with ";" in the location value, and getting rid of all the quotes.
Adding quotes to every key and value, except the location one.

PS: My final goal is to format all these dictionaries to create an SQL table whose columns are the union of the keys of all the dictionaries, with each row being one of my dictionaries and blanks where values are missing.

分開簡單 2024-12-11 01:01:59

Here is what I think is a fairly complete solution.

First, I created the following test file:

{surprise : "perturbating at start  ", game:Available Universal Dices Game,
    player:FTROE875574,location
:"Lakeview School, Kingsmere Boulevard, Saskatoon, Saskatchewan , Canada",time:15h18}

{"game":"Available","   player":"LOI4531",
"location":  "Perth, Australia","time":"08h13","date":"Available"}

{"game":Available,player:PLLI874,location:"Chelsea, London, England",time:20h35}

{special:"midnight happening",game:"Available","player":YTR44,
"location":"Paris, France","time":"02h24"
,
"date":"Available"}

{game:Available,surprise:"  hretyuuhuhu  ",player:FT875,location
:,"time":11h22}

{"game":"Available","player":"LOI4531","location":
"Damas,Syria","time":"unavailable","date":"Available"}

{"surprise   " : GARAMANANATALA Tower ,  game:Available Dices,player  :
  PuLuLu874,location:"  Westminster, London, England  ",time:20h01}

{"game":"Available",special:"overnight",   "player":YTR44,"location":
"Madrid, Spain"    ,     "time":
"12h33",
date:"Available"
}

The following code then processes the content of the file in two phases:

First, a pass through the content collects every key that occurs in any of the dictionaries.
From these keys, a dictionary posis is derived that gives, for each key, the position its value must occupy in a row.
Second, another pass through the file builds the rows one by one and collects them in a list rows.

By the way, note that the condition on the location value is respected: the value associated with the key location (or "location") keeps its quotation marks, while the quotes around all other keys and values are stripped.

import re

# Body of one complete dictionary: everything after '{' up to and including '}'
dicreg = re.compile(r'(?<=\{)[^}]*}')

# One key:value pair: group 2 is the key, group 5 the value.
# Quotes are stripped everywhere except around the value of location.
kvregx = re.compile(r'[ \r\n]*'
                    r'(" *)?((location)|[^:]+?)(?(1) *")'
                    r'[ \r\n]*'
                    r':'
                    r'[ \r\n]*'
                    r'(?(3)|(" *)?)([^:]*?)(?(4) *")'
                    r'[ \r\n]*(?:,(?=[^,]+?:)|\})')


checking_dict = {}   # key -> number of occurrences
checking_list = []   # keys in order of first appearance

filename = 'zzz.txt'

with open(filename) as f:

    ######## First part: to gather all the keys in all the dictionaries

    prec,chunk = '','go'
    ecr = []   # debugging trace, printed after the first pass
    while chunk:
        chunk = f.read(120)
        ss = ''.join((prec,chunk))
        ecr.append('\n\n------------------------------------------------------------\nss   == %r' %ss)
        mat_dic = None
        for mat_dic in dicreg.finditer(ss):
            ecr.append('\nmmmmmmm dictionary found in ss mmmmmmmmmmmmmm')
            for mat_kv in kvregx.finditer(mat_dic.group()):
                k,v = mat_kv.group(2,5)
                ecr.append('%s  :  %s' % (k,v))
                if k in checking_list:
                    checking_dict[k] += 1
                else:
                    checking_list.append(k)
                    checking_dict[k] = 1
        if mat_dic:
            # keep only the text after the last complete dictionary
            prec = ss[mat_dic.end():]
        else:
            # no complete dictionary yet: keep accumulating
            prec += chunk

    print('\n'.join(ecr))
    print('\n\n\nchecking_dict == %s\n\nchecking_list        == %s' %(checking_dict,checking_list))

    ######## The keys are sorted so that the less frequent ones are at the end
    checking_list.sort(key=lambda k: checking_dict[k], reverse=True)
    posis = dict((k,i) for i,k in enumerate(checking_list))
    print('\nchecking_list sorted == %s\n\nposis == %s' % (checking_list,posis))



    ######## Now, the file is read again to build a list of rows

    f.seek(0,0)  # the file pointer is moved back to the beginning of the file

    prec,chunk = '','go'
    base = [''] * len(checking_list)   # template row: one blank per column
    rows = []
    while chunk:
        chunk = f.read(110)
        ss = ''.join((prec,chunk))
        mat_dic = None
        for mat_dic in dicreg.finditer(ss):
            li = base[:]
            for mat_kv in kvregx.finditer(mat_dic.group()):
                k,v = mat_kv.group(2,5)
                li[posis[k]] = v
            rows.append(li)
        if mat_dic:
            prec = ss[mat_dic.end():]
        else:
            prec += chunk


    print('\n\n%s\n%s' % (checking_list,30*'___'))
    print('\n'.join(str(li) for li in rows))

Result:

------------------------------------------------------------
ss   == '{surprise : "perturbating at start  ", game:Available Universal Dices Game,\n    player:FTROE875574,location\n:"Lakeview S'


------------------------------------------------------------
ss   == '{surprise : "perturbating at start  ", game:Available Universal Dices Game,\n    player:FTROE875574,location\n:"Lakeview School, Kingsmere Boulevard, Saskatoon, Saskatchewan , Canada",time:15h18}\n\n{"game":"Available","   player":"LOI4531",\n"l'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
surprise  :  perturbating at start
game  :  Available Universal Dices Game
player  :  FTROE875574
location  :  "Lakeview School, Kingsmere Boulevard, Saskatoon, Saskatchewan , Canada"
time  :  15h18


------------------------------------------------------------
ss   == '\n\n{"game":"Available","   player":"LOI4531",\n"location":  "Perth, Australia","time":"08h13","date":"Available"}\n\n{"game":Available,player:PLLI874,location:"Chelsea, Lo'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
game  :  Available
player  :  LOI4531
location  :  "Perth, Australia"
time  :  08h13
date  :  Available


------------------------------------------------------------
ss   == '\n\n{"game":Available,player:PLLI874,location:"Chelsea, London, England",time:20h35}\n\n{special:"midnight happening",game:"Available","player":YTR44,\n"location":"Paris, France","t'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
game  :  Available
player  :  PLLI874
location  :  "Chelsea, London, England"
time  :  20h35


------------------------------------------------------------
ss   == '\n\n{special:"midnight happening",game:"Available","player":YTR44,\n"location":"Paris, France","time":"02h24"\n,\n"date":"Available"}\n\n{game:Available,surprise:"  hretyuuhuhu  ",player:FT875,location\n:,"time":11h22}\n\n{"'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
special  :  midnight happening
game  :  Available
player  :  YTR44
location  :  "Paris, France"
time  :  02h24
date  :  Available

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
game  :  Available
surprise  :  hretyuuhuhu
player  :  FT875
location  :  
time  :  11h22


------------------------------------------------------------
ss   == '\n\n{"game":"Available","player":"LOI4531","location":\n"Damas,Syria","time":"unavailable","date":"Available"}\n\n{"surprise   " '

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
game  :  Available
player  :  LOI4531
location  :  "Damas,Syria"
time  :  unavailable
date  :  Available


------------------------------------------------------------
ss   == '\n\n{"surprise   " : GARAMANANATALA Tower ,  game:Available Dices,player  :\n  PuLuLu874,location:"  Westminster, London, England  ",time:20'


------------------------------------------------------------
ss   == '\n\n{"surprise   " : GARAMANANATALA Tower ,  game:Available Dices,player  :\n  PuLuLu874,location:"  Westminster, London, England  ",time:20h01}\n\n{"game":"Available",special:"overnight",   "player":YTR44,"location":\n"Madrid, Spain"    ,     "time":\n"12h33",\nda'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
surprise  :  GARAMANANATALA Tower
game  :  Available Dices
player  :  PuLuLu874
location  :  "  Westminster, London, England  "
time  :  20h01


------------------------------------------------------------
ss   == '\n\n{"game":"Available",special:"overnight",   "player":YTR44,"location":\n"Madrid, Spain"    ,     "time":\n"12h33",\ndate:"Available"\n}'

mmmmmmm dictionary found in ss mmmmmmmmmmmmmm
game  :  Available
special  :  overnight
player  :  YTR44
location  :  "Madrid, Spain"
time  :  12h33
date  :  Available


------------------------------------------------------------
ss   == ''



checking_dict == {'player': 8, 'game': 8, 'location': 8, 'time': 8, 'date': 4, 'surprise': 3, 'special': 2}

checking_list        == ['surprise', 'game', 'player', 'location', 'time', 'date', 'special']

checking_list sorted == ['game', 'player', 'location', 'time', 'date', 'surprise', 'special']

posis == {'player': 1, 'game': 0, 'location': 2, 'time': 3, 'date': 4, 'surprise': 5, 'special': 6}


['game', 'player', 'location', 'time', 'date', 'surprise', 'special']
__________________________________________________________________________________________
['Available Universal Dices Game', 'FTROE875574', '"Lakeview School, Kingsmere Boulevard, Saskatoon, Saskatchewan , Canada"', '15h18', '', 'perturbating at start', '']
['Available', 'LOI4531', '"Perth, Australia"', '08h13', 'Available', '', '']
['Available', 'PLLI874', '"Chelsea, London, England"', '20h35', '', '', '']
['Available', 'YTR44', '"Paris, France"', '02h24', 'Available', '', 'midnight happening']
['Available', 'FT875', '', '11h22', '', 'hretyuuhuhu', '']
['Available', 'LOI4531', '"Damas,Syria"', 'unavailable', 'Available', '', '']
['Available Dices', 'PuLuLu874', '"  Westminster, London, England  "', '20h01', '', 'GARAMANANATALA Tower', '']
['Available', 'YTR44', '"Madrid, Spain"', '12h33', 'Available', '', 'overnight']

I wrote the code above with an enormous file of several GB in mind, one that cannot be read into memory in one go: such a file must be processed chunk by chunk. That is why the code contains these instructions:

while chunk:
    chunk = f.read(120)
    ss = ''.join((prec,chunk))
    ecr.append('\n\n------------------------------------------------------------\nss   == %r' %ss)
    mat_dic = None
    for mat_dic in dicreg.finditer(ss):
        ............
        ...............
    if mat_dic:
        prec = ss[mat_dic.end():]
    else:
        prec += chunk
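
The carry-over logic in this loop can also be isolated in a generator. The following is only a sketch of that idea in Python 3, not part of the original answer: the names iter_dict_bodies, chunk_size and sample are made up for illustration, and an in-memory io.StringIO stands in for the real file.

```python
import io
import re

# Same pattern as dicreg: the body of one complete dictionary, '}' included
DICT_RE = re.compile(r'(?<=\{)[^}]*}')

def iter_dict_bodies(f, chunk_size=120):
    """Yield the text of each complete {...} body, reading f chunk by chunk."""
    pending = ''                      # text not yet consumed by a complete match
    while True:
        chunk = f.read(chunk_size)
        pending += chunk
        last_end = 0
        for m in DICT_RE.finditer(pending):
            yield m.group()
            last_end = m.end()
        # Keep only what follows the last complete dictionary
        pending = pending[last_end:]
        if not chunk:                 # end of file reached
            break

# Hypothetical sample data; a tiny chunk size forces dictionaries to straddle chunks
sample = io.StringIO('{a:1,b:2}\n\n{"c":"3",\nd:4}\n{e:5}')
bodies = list(iter_dict_bodies(sample, chunk_size=7))
print(bodies)
```

Each yielded string still ends with the closing brace, which is what kvregx expects as the terminator of the last pair.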

Of course, if the file is small enough to be read in one go, the code can be simplified:

import re

# Body of one complete dictionary: everything after '{' up to and including '}'
dicreg = re.compile(r'(?<=\{)[^}]*}')

# One key:value pair: group 2 is the key, group 5 the value.
# Quotes are stripped everywhere except around the value of location.
kvregx = re.compile(r'[ \r\n]*'
                    r'(" *)?((location)|[^:]+?)(?(1) *")'
                    r'[ \r\n]*'
                    r':'
                    r'[ \r\n]*'
                    r'(?(3)|(" *)?)([^:]*?)(?(4) *")'
                    r'[ \r\n]*(?:,(?=[^,]+?:)|\})')


checking_dict = {}   # key -> number of occurrences
checking_list = []   # keys in order of first appearance

filename = 'zzz.txt'

with open(filename) as f:
    content = f.read()




######## First part: to gather all the keys in all the dictionaries

ecr = []   # debugging trace, printed after the first pass

for mat_dic in dicreg.finditer(content):
    ecr.append('\nmmmmmmm dictionary found in ss mmmmmmmmmmmmmm')
    for mat_kv in kvregx.finditer(mat_dic.group()):
        k,v = mat_kv.group(2,5)
        ecr.append('%s  :  %s' % (k,v))
        if k in checking_list:
            checking_dict[k] += 1
        else:
            checking_list.append(k)
            checking_dict[k] = 1


print('\n'.join(ecr))
print('\n\n\nchecking_dict == %s\n\nchecking_list        == %s' %(checking_dict,checking_list))

######## The keys are sorted so that the less frequent ones are at the end
checking_list.sort(key=lambda k: checking_dict[k], reverse=True)
posis = dict((k,i) for i,k in enumerate(checking_list))
print('\nchecking_list sorted == %s\n\nposis == %s' % (checking_list,posis))



######## Now, the content is scanned again to build a list of rows


base = [''] * len(checking_list)   # template row: one blank per column
rows = []

for mat_dic in dicreg.finditer(content):
    li = base[:]
    for mat_kv in kvregx.finditer(mat_dic.group()):
        k,v = mat_kv.group(2,5)
        li[posis[k]] = v
    rows.append(li)


print('\n\n%s\n%s' % (checking_list,30*'___'))
print('\n'.join(str(li) for li in rows))
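
Since the stated goal of the question is an SQL table, rows shaped like the ones built above could then be loaded into a database. What follows is a minimal Python 3 sketch using the standard sqlite3 module, not something from the original answer: the table name games, the helper quote_ident, the shortened sample data and the in-memory database are all assumptions made for illustration.

```python
import sqlite3

# Assumed inputs, shaped like checking_list and rows in the code above
columns = ['game', 'player', 'location', 'time', 'date']
rows = [
    ['Available', 'LOI4531', '"Perth, Australia"', '08h13', 'Available'],
    ['Available', 'PLLI874', '"Chelsea, London, England"', '20h35', ''],
]

def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(':memory:')   # in-memory database, for illustration only

# One TEXT column per key; identifiers are quoted because keys come from the file
col_defs = ', '.join('%s TEXT' % quote_ident(c) for c in columns)
conn.execute('CREATE TABLE games (%s)' % col_defs)

# Values go through placeholders, never through string formatting
placeholders = ', '.join('?' for _ in columns)
conn.executemany('INSERT INTO games VALUES (%s)' % placeholders, rows)
conn.commit()

result = conn.execute('SELECT "player", "date" FROM games ORDER BY rowid').fetchall()
for player, date in result:
    print(player, repr(date))
```

Missing values are stored here as empty strings, matching the blanks in rows; storing NULL instead would only require replacing '' with None before inserting.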

眼角的笑意。 2024-12-11 01:01:59

If your data is more complicated than the examples you gave, or if it has to be faster, you should probably look into pyparsing.

Otherwise you could write something more hacky like this:

contentlines = ["""{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}""", """{game:Available,player:Available,location:"Chelsea, London, England",time:Available}"""]
def get_dict(line):
    keys = []
    values = []
    line = line.replace("{", "").replace("}", "")
    contlist = line.split(":")
    keys.append(contlist[0].strip('"').strip("'"))
    for entry in contlist[1:-1]:
        entry = entry.strip()
        if entry[0] == "'" or entry[0] == '"':
            endpos = entry[1:].find(entry[0]) + 2
        else:
            endpos = entry.find(",")
        values.append(entry[0:endpos].strip('"').strip("'"))
        keys.append(entry[endpos + 1:].strip('"').strip("'"))
    values.append(contlist[-1].strip('"').strip("'"))
    return dict(zip(keys, values))


for line in contentlines:
    print(get_dict(line))

小兔几 2024-12-11 01:01:59

import re

text = """
{game:Available,player:Available,location:"Chelsea, London, England",time:Available}
{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}
"""

dicts = re.findall(r"{.+?}", text)                         # Split the dicts
for dict_ in dicts:
    dict_ = dict(re.findall(r'(\w+|".*?"):(\w+|".*?")', dict_))    # Get the elements
    print(dict_)

>>>{'player': 'Available', 'game': 'Available', 'location': '"Chelsea, London, England"', 'time': 'Available'}
>>>{'"game"': '"Available"', '"time"': '"Available"', '"player"': '"Available"', '"date"': '"Available"', '"location"': '"Chelsea, London, England"'}
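
As the output shows, quoted keys and values keep their quotation marks, so "game" and game end up as different keys. One possible variation, given here only as a Python 3 sketch under that observation, strips the surrounding double quotes from every key and value (including the location value). The names PAIR_RE, parse_dict and parsed are illustrative and not part of the answer.

```python
import re

text = """
{game:Available,player:Available,location:"Chelsea, London, England",time:Available}
{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}
"""

# Same pair pattern as above: a bare word or a double-quoted string on each side of ':'
PAIR_RE = re.compile(r'(\w+|".*?"):(\w+|".*?")')

def parse_dict(body):
    """Return a dict with surrounding double quotes removed from keys and values."""
    return {k.strip('"'): v.strip('"') for k, v in PAIR_RE.findall(body)}

# Split the text into dictionaries, then normalize each one
parsed = [parse_dict(body) for body in re.findall(r'\{.+?\}', text)]
for d in parsed:
    print(d)
```

With the quotes removed, both sample lines produce the same key names, which makes it straightforward to take the union of keys afterwards.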


策马西风 2024-12-11 01:01:59

Hopefully this pyparsing solution is easier to follow and maintain over time:

data = """\
{game:Available,player:Available,location:"Chelsea, London, England",time:Available} 
{"game":"Available","player":"Available","location":"Chelsea, London, England","time":"Available","date":"Available"}"""

from pyparsing import Suppress, Word, alphas, alphanums, QuotedString, Group, Dict, delimitedList

LBRACE,RBRACE,COLON = map(Suppress, "{}:")
key = QuotedString('"') | Word(alphas) 
value =  QuotedString('"') | Word(alphanums+"_")
keyvalue = Group(key + COLON + value)

dictExpr = LBRACE + Dict(delimitedList(keyvalue)) + RBRACE

for d in dictExpr.searchString(data):
    print(d.asDict())

Prints:

{'player': 'Available', 'game': 'Available', 'location': 'Chelsea, London, England', 'time': 'Available'}
{'date': 'Available', 'player': 'Available', 'game': 'Available', 'location': 'Chelsea, London, England', 'time': 'Available'}
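
Once every line has been turned into a clean dict like these, the table described in the question (columns are the union of all keys, blanks for missing values) can be produced with the standard csv module. This is a Python 3 sketch under the assumption that the dicts are already parsed; the names dicts, fieldnames and out are illustrative, and an in-memory buffer is used in place of a real output file.

```python
import csv
import io

# Assumed input: the two dicts printed above
dicts = [
    {'player': 'Available', 'game': 'Available',
     'location': 'Chelsea, London, England', 'time': 'Available'},
    {'date': 'Available', 'player': 'Available', 'game': 'Available',
     'location': 'Chelsea, London, England', 'time': 'Available'},
]

# Union of all keys, in order of first appearance
fieldnames = []
for d in dicts:
    for key in d:
        if key not in fieldnames:
            fieldnames.append(key)

out = io.StringIO()   # stands in for a real file opened with newline=''
# restval is written for any key that a given dict does not have
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(dicts)
print(out.getvalue())
```

The csv writer quotes the location value on its own because it contains commas, so no manual handling of the quotation marks is needed at this stage.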
