Flattened XML data: how to recover the original data

Posted on 2025-01-10 02:49:57


Comments (2)

浪推晚风 2025-01-17 02:49:58

You mention you have your existing fields in another file. Provided you can use these to make a named blank character vector like this:

existing_list <- c(id = "", v1 = "", v2 = "", v3 = "", v4 = "")

Then you can do:

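# Pass the file name directly; `text =` would treat the string itself as the data.
# Read both columns as character (no factor or numeric conversion).
df <- read.csv("my_textfile.csv", header = FALSE, colClasses = "character")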

id_list <- split(df, cumsum(df$V1 == "id"))

do.call(rbind, lapply(id_list, function(x) {
  vec <- setNames(x$V2, x$V1)
  existing_list[match(names(vec), names(existing_list))] <- vec
  as.data.frame(as.list(existing_list))
  }))

#>    id         v1         v2 v3         v4
#> 1 001 some_value                         
#> 2 002 some_value some_value              
#> 3 003            some_value              
#> 4 004                          some_value

Again, no external packages are used here.
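
For comparison, here is a minimal Python sketch of the same idea: start a new record at every "id" line and fill a blank template built from the known fields. The names EXISTING_FIELDS, SAMPLE and unflatten are made up for illustration, and the sample input is an assumption chosen to match the output above, since the real file contents are not shown in this answer.

```python
import csv

# Assumed field list, standing in for the fields kept in the other file.
EXISTING_FIELDS = ["id", "v1", "v2", "v3", "v4"]

# Assumed sample input; the real file contents are not shown in the answer.
SAMPLE = """\
id,001
v1,some_value
id,002
v1,some_value
v2,some_value
id,003
v2,some_value
id,004
v4,some_value
"""

def unflatten(lines, fields=EXISTING_FIELDS, key="id"):
    """Return one dict per record; a new record starts at each `key` line."""
    records = []
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, value = row
        if name == key:
            # Start from a blank template so every known field is present.
            records.append(dict.fromkeys(fields, ""))
        if records and name in records[-1]:
            records[-1][name] = value
    return records

records = unflatten(SAMPLE.splitlines())
for record in records:
    print(record)
```

Like the R version, this only uses the standard library. Field names that are not in the template are silently ignored here, which is a design choice of this sketch rather than something the R code does.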

如若梦似彩虹 2025-01-17 02:49:58

Here's a way to do it in Python.

Assuming the input file 'flattened_xml.txt' contains:

id,001
v1,some_value1
id,002
v2,some_value2
v1,some_value3
id,003
v2,some_value4
id,004
v4,some_value5

Code:

# Get lines of text file.
with open('flattened_xml.txt') as file:
    data = file.read().splitlines()

# Determine which fields are present, in order of first appearance.
fields = []
for line in data:
    name, _ = line.split(',')
    if name not in fields:
        fields.append(name)

table = [{field: field for field in fields}]  # Header row.
row = {}
for line in data:
    field, value = line.split(',')

    # An id line starts a new row with every other field empty.
    if field == fields[0]:
        row = {field: value}
        row.update({key: None for key in fields[1:]})
        continue

    # Any other line fills in its own field, emits the row, and resets it.
    row.update({key: value if key == field else None for key in fields[1:]})
    table.append(row)
    row = {key: None for key in fields}

# Print each row with every cell padded to 11 characters.
for row in table:
    s = [f'{value if value is not None else "":11}' for value in row.values()]
    print('|' + '|'.join(s) + '|')

Result:

|id         |v1         |v2         |v4         |
|001        |some_value1|           |           |
|002        |           |some_value2|           |
|           |some_value3|           |           |
|003        |           |some_value4|           |
|004        |           |           |some_value5|
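
Note that the result above puts id 002 on two rows, because a row is emitted for every non-id line. If one row per id is wanted instead, a variant along these lines might work. It is only a sketch: merge_records and format_table are hypothetical helper names, and the input is assumed to be the same 'name,value' lines as above.

```python
SAMPLE = """\
id,001
v1,some_value1
id,002
v2,some_value2
v1,some_value3
id,003
v2,some_value4
id,004
v4,some_value5
"""

def merge_records(lines, key="id"):
    """Collect all fields that follow a `key` line into a single row."""
    fields = []  # field names in order of first appearance
    rows = []    # one dict per record
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        field, value = line.split(",", 1)
        if field not in fields:
            fields.append(field)
        if field == key or not rows:
            rows.append({})  # a key line (or the very first line) opens a row
        rows[-1][field] = value
    return fields, rows

def format_table(fields, rows, width=11):
    """Render the header and rows as '|'-delimited, fixed-width lines."""
    out = ["|" + "|".join(f"{name:{width}}" for name in fields) + "|"]
    for row in rows:
        cells = [row.get(name, "") for name in fields]
        out.append("|" + "|".join(f"{cell:{width}}" for cell in cells) + "|")
    return out

fields, rows = merge_records(SAMPLE.splitlines())
table = format_table(fields, rows)
for line in table:
    print(line)
```

With the sample input, id 002 ends up on one row holding both some_value3 (v1) and some_value2 (v2). Unlike the code above, an id with no following fields is still kept as a row of its own.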