Extracting data from a .txt file using Python
I have many, many .xml files and I need to extract some coordinates from them.
Extracting data straight from .xml files seems to be very, very complicated, so I am saving the .xml files as .txt files and extracting the data that way. However, when I open the .txt file, my data is all bunched together on about 6 lines. All the scripts I have found so far select the data by reading the first word on each line, but obviously that won't work for me!
I need to extract the numbers in between these tags:
<gml:lowerCorner>137796 483752</gml:lowerCorner> <gml:upperCorner>138178 484222</gml:upperCorner>
In the text file they are all grouped together! Does anyone know how to extract this data? Thank you!
Comments (4)
This is absolutely the wrong approach. Leave it alone and improve your ways :-)
Seriously, if the file is XML, then just use an XML parser to read it. Learning how to do that in Python isn't hard, and it will make your life easier now and much easier in the future: when you find yourself facing more complex parsing needs, you won't have to relearn anything.
Look at xml.etree.ElementTree.ElementTree, which loads an XML file into a tree object. Then just read the documentation of the module and see what you can do with that tree. You'll be surprised how simple it is to get to information this way. If you have specific questions about extracting data, I suggest you open another question in which you specify the format of the XML file you have to parse and what data you have to take out of it. I'm sure you will have working code suggested to you in a matter of minutes.
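To make the parser suggestion concrete, here is a minimal sketch of pulling the corner coordinates out with xml.etree.ElementTree. It is not the answerer's original sample code. The wrapper elements in SAMPLE, the namespace URI http://www.opengis.net/gml, and the helper names parse_corner and extract_corners are assumptions for illustration; check the xmlns:gml declaration in your own files.

```python
import xml.etree.ElementTree as ET

# Assumption: the files bind the "gml" prefix to this namespace URI.
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Hypothetical stand-in for one of the .xml files; only the two corner
# elements come from the question, the wrapper elements are made up.
SAMPLE = """<root xmlns:gml="http://www.opengis.net/gml">
  <gml:Envelope>
    <gml:lowerCorner>137796 483752</gml:lowerCorner>
    <gml:upperCorner>138178 484222</gml:upperCorner>
  </gml:Envelope>
</root>"""


def parse_corner(text):
    """Turn '137796 483752' into (137796, 483752); assumes integer values."""
    x, y = text.split()
    return int(x), int(y)


def extract_corners(root):
    """Return a list of (lower, upper) coordinate pairs found under root."""
    lowers = root.findall(".//gml:lowerCorner", GML_NS)
    uppers = root.findall(".//gml:upperCorner", GML_NS)
    return [(parse_corner(lo.text), parse_corner(up.text))
            for lo, up in zip(lowers, uppers)]


# For a real file, ET.parse(path).getroot() would replace ET.fromstring.
root = ET.fromstring(SAMPLE)
corners = extract_corners(root)
print(corners)
```

Because the parser works on the document structure, it does not matter that everything is bunched together on a few lines.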
You can also open the .xml file from a Python script just as you would open a .txt file.
Then you can use regular expressions to find those numbers you want so badly.
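A rough sketch of that regex route, assuming the coordinates are always two whitespace-separated integers and the tags carry no attributes (the names CORNER_RE and find_corners are made up for this example):

```python
import re

# Matches <gml:lowerCorner>X Y</gml:lowerCorner> and the upperCorner variant.
# Assumption: X and Y are plain non-negative integers, as in the question.
CORNER_RE = re.compile(
    r"<gml:(lowerCorner|upperCorner)>\s*(\d+)\s+(\d+)\s*</gml:\1>"
)


def find_corners(text):
    """Return (tag_name, x, y) tuples in the order they appear in text."""
    return [(name, int(x), int(y)) for name, x, y in CORNER_RE.findall(text)]


sample = (
    "<gml:lowerCorner>137796 483752</gml:lowerCorner> "
    "<gml:upperCorner>138178 484222</gml:upperCorner>"
)
result = find_corners(sample)
print(result)

# For a real file, read the whole text at once so that line breaks
# do not matter:
#     with open(path, encoding="utf-8") as fh:
#         result = find_corners(fh.read())
```

This works directly on the .xml file, so there is no need to save a .txt copy first. It will silently miss elements whose formatting differs from the pattern, which is the usual trade-off against a real parser.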
The top answer is still the top answer. However, I've been doing just this with HTML, and I have found lxml and XPath ideal for it.
Open your browser to the site (or data) that is of interest. In Chrome, right-click and choose 'Inspect Element'. In the Developer window, right-click again on the highlighted text and choose 'Copy XPath'. For google.com, clicking on the main search box gives me an XPath for that element.
You can use lxml to grab various data from this item. See what you get when you append 'text()' or '@value' or '@href' to the end.
For really simple XML I just use a regex; I can't be bothered to start a slow XML parser for a simple XML packet.