How do I convert XML data to a data.frame?
I'm trying to learn R's XML package. I'm trying to create a data.frame from the books.xml sample XML data file. Here's what I have so far:
library(XML)
books <- "http://www.w3schools.com/XQuery/books.xml"
doc <- xmlTreeParse(books, useInternalNodes = TRUE)
doc
xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x))))
xpathSApply(doc, "//book", function(x) strsplit(xmlValue(x), " "))
xpathSApply(doc, "//book/child::*", xmlValue)
None of these xpathApply/xpathSApply calls gets me anywhere close to what I want. How should I go about building a well-formed data.frame?
Comments (1)
Ordinarily, I would suggest trying the xmlToDataFrame() function, but I believe that will be fairly tricky here because the data isn't well structured to begin with.
I would recommend working with this function:
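The function call itself is not shown on this page. One function in the XML package that fits the description (it turns the parsed document into a nested list of books) is xmlToList(); treating it as the intended one is an assumption. The sketch below uses a small inline document with the same element names as books.xml instead of the remote file, and leaves out the attributes the real file carries.

```r
library(XML)

# Inline stand-in for books.xml (assumption: same element names, no attributes).
xml_text <- paste0(
  "<bookstore>",
  "<book><title>Everyday Italian</title><author>Giada De Laurentiis</author>",
  "<year>2005</year><price>30.00</price></book>",
  "<book><title>XQuery Kick Start</title><author>James McGovern</author>",
  "<author>Per Bothner</author><year>2003</year><price>49.99</price></book>",
  "</bookstore>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# Convert the document into a list with one element per <book>.
book_list <- xmlToList(doc)

# Inspect the nested structure; the second book has two "author" entries.
str(book_list)
```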
One problem is that a book can have multiple authors, so you will need to decide how to handle that when you're structuring your data frame.
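One possible way to settle the multiple-author question is to collapse all authors of a book into a single string, so that every book ends up with the same columns. This is only one option among several, not necessarily the one the answer had in mind. The helper collapse_authors() and the hand-built book record are hypothetical names used for illustration.

```r
# Hypothetical book record with a repeated "author" element, the shape
# assumed to result from converting one <book> node to a list.
book <- list(title  = "XQuery Kick Start",
             author = "James McGovern",
             author = "Per Bothner",
             year   = "2003",
             price  = "49.99")

# Merge every "author" entry into one string and keep the other fields.
collapse_authors <- function(book, sep = "; ") {
  is_author <- names(book) == "author"
  authors <- paste(unlist(book[is_author]), collapse = sep)
  c(book[!is_author], list(author = authors))
}

flat <- collapse_authors(book)

# With unique names, the record converts to a one-row data frame.
flat_df <- as.data.frame(flat, stringsAsFactors = FALSE)
print(flat_df)
```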
Once you have decided what to do about the multiple authors, it's fairly straightforward to turn your book list into a data frame with the ldply() function in plyr (or just use lapply and convert the return value into a data.frame with do.call("rbind", ...)).
Here's a complete example (excluding author):
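The answer's own example is not shown on this page. A minimal sketch of the lapply plus do.call("rbind", ...) route it describes could look like this, assuming xmlToList() is used to build the book list and using an inline stand-in for books.xml (same element names, attributes omitted).

```r
library(XML)

# Inline stand-in for books.xml (assumption: same element names, no attributes).
xml_text <- paste0(
  "<bookstore>",
  "<book><title>Everyday Italian</title><author>Giada De Laurentiis</author>",
  "<year>2005</year><price>30.00</price></book>",
  "<book><title>XQuery Kick Start</title><author>James McGovern</author>",
  "<author>Per Bothner</author><year>2003</year><price>49.99</price></book>",
  "</bookstore>"
)

book_list <- xmlToList(xmlParse(xml_text, asText = TRUE))

# Drop every "author" entry so all books have the same fields,
# then turn each book into a one-row data frame.
rows <- lapply(book_list, function(x) {
  as.data.frame(x[names(x) != "author"], stringsAsFactors = FALSE)
})

# Stack the one-row data frames; unname() avoids passing duplicate
# argument names ("book", "book") to rbind().
books_df <- do.call(rbind, unname(rows))
print(books_df)
```

All values come back as character strings; year and price would still need as.integer() or as.numeric() if numeric columns are wanted.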
Here's what it looks like with author included. You need to use ldply in this instance since the list is "jagged"; lapply can't handle that properly. [Otherwise you can use lapply with rbind.fill (also courtesy of Hadley), but why bother when plyr automatically does it for you?]:
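The example that followed is not shown on this page. As a rough sketch of the jagged case, the code below builds two book records by hand (hypothetical data shaped like the books.xml entries) and lets ldply() pad the missing author column with NA. The helper to_row() is a hypothetical name; renaming repeated authors with make.unique() is a choice made here for illustration.

```r
library(plyr)  # provides ldply() and rbind.fill()

# Hypothetical book records; the second one has two authors, so the
# records do not all have the same number of fields ("jagged").
book_list <- list(
  list(title = "Everyday Italian", author = "Giada De Laurentiis",
       year = "2005", price = "30.00"),
  list(title = "XQuery Kick Start", author = "James McGovern",
       author = "Per Bothner", year = "2003", price = "49.99")
)

# Make repeated names unique (author, author.1, ...) and build one row.
to_row <- function(x) {
  names(x) <- make.unique(names(x))
  as.data.frame(x, stringsAsFactors = FALSE)
}

# ldply() applies to_row() to each book and combines the rows,
# filling columns that a book lacks with NA.
books_df <- ldply(book_list, to_row)
print(books_df)
```

Replacing the last step with do.call(rbind, lapply(book_list, to_row)) fails here because the rows have different columns, which is why rbind.fill(lapply(book_list, to_row)) is the lapply-based alternative.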