Importing a CSV (lookup table) with Python and adding GPS coordinates to another output CSV

Posted on 2024-09-13 04:38:15

So I have already imported one XML-ish file with 3000 elements and parsed them into a CSV for output. But I also need to import a second CSV file with 'keyword','latitude','longitude' as columns and use it to add the GPS coordinates to additional columns on the first file.

Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.

But either way - I start with:

    import csv

    floc = open(r'c:\python\kenya_location_lookup.csv', 'r')
    l = csv.DictReader(floc)
    for row in l: print row.keys()

The output looks like:

    {'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011', 'KEYWORD': 'Kianda'}
    {'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331', 'KEYWORD': 'Soweto'}
    {'LATITUDE': '-1.315446430425027', 'LONGITUDE': '36.78170621395111', 'KEYWORD': 'Gatwekera'}
    {'LATITUDE': '-1.3136151425171327', 'LONGITUDE': '36.785863637924194', 'KEYWORD': 'Kisumu Ndogo'}

I'm a newbie (and not a programmer). The question is: how do I use the keys to pluck out the corresponding row data and match it against words in the body of each element in the other set?

Comments (3)

谜兔 2024-09-20 04:38:15

"Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know."

They're both fine choices for this task.

"print row.keys() ... The output look like: {'LATITUDE': '-1.311467078', ..."

No it doesn't! This is the output from print row, most definitely NOT print row.keys(). Please don't supply disinformation in your questions, it makes them really hard to answer effectively (being a newbie makes no difference: surely you can check that the output you provide actually comes from the code you also provide!).
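To make the distinction concrete, here is a minimal sketch of what each expression yields. It is written for Python 3 (the snippets in this thread are Python 2), and the two-row in-memory sample is a hypothetical stand-in for the real lookup file, using values quoted in the question.

```python
import csv
import io

# Hypothetical two-row sample in the same shape as the lookup file.
sample = io.StringIO(
    "KEYWORD,LATITUDE,LONGITUDE\n"
    "Kianda,-1.311467078,36.77352011\n"
    "Soweto,-1.315288401,36.77614331\n"
)

rows = list(csv.DictReader(sample))
first = rows[0]

print(list(first.keys()))  # column names only
print(dict(first))         # the whole row: column names and their values
```

The first print shows only the three column names; the second shows a mapping like the lines pasted in the question.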

"I'm a newbie (and not a programmer). Question is how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?"

Since you give us absolutely zero information on the structure of "the other set", you make it of course impossible to answer this question. Guessing wildly, if for example the entries in "the other set" are also dicts each with a key of KEYWORD, you want to build an auxiliary dict first, then merge (some of) its entries in the "other set":

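    l = csv.DictReader(floc)
    # Auxiliary dict: keyword -> full row from the lookup CSV
    dloc = dict((d['KEYWORD'], d) for d in l)
    for d in otherset:
        # Merge the matching location row, or nothing if the keyword is unknown
        d.update(dloc.get(d['KEYWORD'], ()))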

This will leave the location missing from the other set when not present in a corresponding keyword entry in the CSV -- if that's a problem you may want to use a "fake location" dictionary as the default for missing entries instead of that () in the last statement I've shown. But, this is all wild speculation anyway, due to the dearth of info in your Q.
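A minimal Python 3 sketch of the "fake location" default mentioned above. It keeps the same guess about the data, namely that each entry in the other set is a dict with a KEYWORD key. The names `otherset`, `MISSING`, and the sample rows are hypothetical, introduced only for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for the lookup CSV.
lookup_csv = io.StringIO(
    "KEYWORD,LATITUDE,LONGITUDE\n"
    "Kianda,-1.311467078,36.77352011\n"
    "Soweto,-1.315288401,36.77614331\n"
)

# Hypothetical stand-in for "the other set": dicts that each carry a KEYWORD.
otherset = [
    {"KEYWORD": "Soweto", "TITLE": "first story"},
    {"KEYWORD": "Nowhere", "TITLE": "second story"},
]

# Placeholder merged in when a keyword has no entry in the lookup table,
# so every output row ends up with the same columns.
MISSING = {"LATITUDE": "", "LONGITUDE": ""}

# Auxiliary dict: keyword -> full lookup row.
dloc = {row["KEYWORD"]: row for row in csv.DictReader(lookup_csv)}

for d in otherset:
    d.update(dloc.get(d["KEYWORD"], MISSING))

print(otherset)
```

With an empty-string placeholder, every entry gets LATITUDE and LONGITUDE keys, which keeps the columns aligned if the result is later written out with `csv.DictWriter`.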

中性美 2024-09-20 04:38:15

If you dump the DictReader into a list (data = [row for row in csv.DictReader(file)]), and you have unique keywords for each row, convert that list of dictionaries into a dictionary of dictionaries, using that keyword as the key.

    >>> data = [row for row in csv.DictReader(open('C:\\my.csv'),
    ...                                       ('num','time','time2'))]
    >>> len(data)  # lots of old data :P
    1410
    >>> data[1].keys()
    ['time2', 'num', 'time']
    >>> keyeddata = {}
    >>> for row in data[2:]:  # I have some junk rows
    ...     keyeddata[row['num']] = row
    ...
    >>> keyeddata['32']
    {'num': '32', 'time2': '8', 'time': '13269'}

Once you have the keyword pulled out, you can iterate through your other list, grab the keyword from it, and use it as the index for the lat/long list. Pull out the lat/long from that index and add it to the other list.
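The last step might look roughly like the following Python 3 sketch. It assumes each row of the first file already carries its keyword; the `items` list, its column layout, and the output header are hypothetical, and an in-memory buffer stands in for the real output file.

```python
import csv
import io

# Hypothetical in-memory stand-in for the lookup CSV.
lookup_csv = io.StringIO(
    "KEYWORD,LATITUDE,LONGITUDE\n"
    "Kianda,-1.311467078,36.77352011\n"
    "Soweto,-1.315288401,36.77614331\n"
)

# Dictionary of dictionaries, keyed by keyword.
keyed = {}
for row in csv.DictReader(lookup_csv):
    keyed[row["KEYWORD"]] = row

# Hypothetical rows of the first file: (id, title, keyword).
items = [("1", "first story", "Kianda"), ("2", "second story", "Unknown")]

out = io.StringIO()  # swap for open(path, "w", newline="") to write a file
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "title", "keyword", "latitude", "longitude"])
for item_id, title, keyword in items:
    loc = keyed.get(keyword)
    if loc is None:
        lat, lon = "", ""  # keyword not in the lookup table
    else:
        lat, lon = loc["LATITUDE"], loc["LONGITUDE"]
    writer.writerow([item_id, title, keyword, lat, lon])

print(out.getvalue())
```

Rows whose keyword is not in the lookup table keep empty latitude and longitude cells instead of raising a `KeyError`.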

当梦初醒 2024-09-20 04:38:15

Thanks -

Alex: My code for the other set is working, and the only relevant part is that I have a string that may or may not contain the 'keyword' that is in this dictionary.

Structurally, this is how I organized it:

    def main():
        f = open(r'c:\python\ggce.sms', 'r')
        sensetree = etree.parse(f)
        senses = sensetree.getiterator('SenseMakingItem')
        bodies = sensetree.getiterator('Body')
        stories = []
        for body in bodies:
            fix_body(body)
            storybyte = unicode(body.text)
            # Drop any non-ASCII characters from the story text
            storybit = storybyte.encode('ascii', 'ignore')
            stories.append(storybit)
        rows = [ids, titles, locations, stories]
        # Python 2 idiom: zip the columns into rows, padding short ones with None
        out = map(None, *rows)
        print out[120:121]
        write_data(out, r'c:\python\output_test.csv')

(I omitted the code for getting ids, titles, locations because they work and will not be used to get the real locations from the data within stories)

Hope this helps.
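Given that clarification (a story string that may or may not contain one of the keywords), one possible approach is to scan each story for every keyword in the lookup table. The Python 3 sketch below is only an illustration under assumptions: `find_location`, the `locations` dict, and the sample stories are hypothetical, matching is case-insensitive on whole words, and only one keyword is reported per story.

```python
import re

# Hypothetical lookup: keyword -> (latitude, longitude), values from the question.
locations = {
    "Kianda": ("-1.311467078", "36.77352011"),
    "Kisumu Ndogo": ("-1.3136151425171327", "36.785863637924194"),
}

def find_location(story, locations):
    """Return (keyword, lat, lon) for the first keyword, in lookup order,
    that occurs in the story text, or None if no keyword occurs."""
    for keyword, (lat, lon) in locations.items():
        # \b keeps a keyword from matching inside a longer word.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, story, flags=re.IGNORECASE):
            return keyword, lat, lon
    return None

stories = [
    "We walked from kisumu ndogo to the clinic.",
    "No place is named here.",
]
matches = [find_location(s, locations) for s in stories]
print(matches)
```

With about 3000 stories and a modest lookup table, this nested scan should be fast enough; a story that names several places would need a rule for choosing among them.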
