Print XML field values to CSV

Posted on 2025-01-24 09:26:23


I have a directory with lots of XML files that are not all formatted the same way, but the ones I am after all contain the same 3 required fields. I need to know the SNPID of each library, as they must be unique. For this I need to list each library and its ID value.

First, I only need results from files where the "PoweredBy" field contains the value "Kontakt".
Then, for each of those, I need the Name, SNPID and RegKey printed into a 3-column CSV, e.g. Mystica | 547 | Best Service - Mystica

The XML files are like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">

  <Product version="2">
    <UPID>70026fd5-8f6f-429e-b891-12c2f94bc566</UPID>
    <Name>Mystica</Name>
    <Type>Content</Type>
    <NKSEnabled minVersion="1.1.0.0">true</NKSEnabled>
    <Relevance maxVersion="1.0.9.0" minVersion="1.0.0.0">
      <Application minVersion="5.0.0.0" nativeContent="true">kontakt</Application>
    </Relevance>
    <Relevance minVersion="1.1.0.0">
      <Application minVersion="5.8.0.0" nativeContent="true">kontakt</Application>
      <Application minVersion="2.6.5.0">maschine</Application>
      <Application minVersion="1.8.2.0">kkontrol</Application>
    </Relevance>
    <PoweredBy>Kontakt</PoweredBy>
    <Visibility maxVersion="1.0.9.0" minVersion="1.0.0.0" target="Standalone">1</Visibility>
    <Company>Best Service</Company>
    <AuthSystem>RAS2</AuthSystem>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
    <Icon>bestservice</Icon>
    <ProductSpecific>
      <HU>496B8CF4F8B1402C4A6650214DF2514C</HU>
      <JDX>C9A2B6D9549FD159D8A3CFF054AAE934C2AC849EC74827847288DF07577A8F22</JDX>
      <Visibility type="Number">3</Visibility>
    </ProductSpecific>
  </Product>

</ProductHints>

I've tried

Computer:~ user$ cd /Library/Application\ Support/Native\ Instruments/Service\ Center
Computer:Service Center

while read -r Name SNPID RegKey
do
    echo "+++++++++++++++++++"
    echo "Name:  ${Name}"
    echo "SNPID: ${SNPID}"
    echo "RegKey: ${RegKey}"
    awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}'
done

but the results are inconsistent (I know bash isn't great for parsing XML but the requirements are pretty basic).

find -name "*.xml" | xargs cat | tr -d "\n" | sed 's/<\/Name>/\n/g' | sed 's/<\/SNPID>/\n/g' | sed 's/<\/RegKey>/: /g' | sed 's/<[^>]*>//g' | egrep "Name:|SNPID:|RegKey:" | sed 's/Name: /---\nName: /g'

also results in the error: find: illegal option -- n

Can any Bash wizards out there help me, please? (If possible, I would prefer to do this in the terminal without installing anything else.) Thanks


兮颜 2025-01-31 09:26:23


This might work for you (GNU sed):

sed -sE '/^\s*<(Name|SNPID|PoweredBy|RegKey)>(.*)<\/\1>\s*$/{s//\1:\2/;H}
         $!d;z;x;/PoweredBy:Kontakt/!d
         s/Name:([^\n]*)(.*)/\2\n\1/
         s/SNPID:([^\n]*)(.*)/\2|\1/
         s/RegKey:([^\n]*)(.*)/\2|\1/
         s/.*\n//' file ...

Use the -s option to treat each input file separately and -E to simplify the regexps.

Using alternation, extract the required fields into the hold buffer.

At the end of each file, empty the pattern space and swap it with the hold buffer (z;x). This moves the collected fields into the pattern space and leaves the hold buffer empty for the next file, since -s resets $ for each file but does not clear the hold buffer. If the PoweredBy field does not contain Kontakt, no further processing is required.

Otherwise, format the fields into the required format and print the result.


初懵 2025-01-31 09:26:23

Highly recommended: don't try to parse XML with anything other than a proper XML parser. For that you can use either xmlstarlet (edited to reflect @Reino's comment):

xml sel -T -t -m "//ProductHints//Product[PoweredBy='Kontakt']" -v SNPID --nl -v Name --nl -v RegKey --nl your_file.xml

or, even simpler, xidel:

xidel your_file.xml -e '//ProductHints//Product[PoweredBy="Kontakt"]//(Name,SNPID,RegKey)'

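For the sample XML in your question, the xmlstarlet command should print the following (xidel prints the same three values, but in document order: Name, SNPID, RegKey):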

547
Mystica
Best Service - Mystica


晨光如昨 2025-01-31 09:26:23

1. Locating required files using `find` and `grep`

 grep -l "PoweredBy>Kontakt" $(find . -name "*.xml")

Check that you get the correct list of files.
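The `.` after `find` is the starting directory. The BSD find that ships with macOS requires one, which is why `find -name "*.xml"` in the question fails with `illegal option -- n`. Separately, `$(find ...)` splits file names that contain spaces into several words. A minimal sketch that avoids both by letting find hand the names to grep through `-exec ... {} +`; `kontakt_files` is a hypothetical helper name:

```sh
# kontakt_files is a hypothetical helper name.
# Lists the .xml files under a directory (default: the current one) that
# contain "PoweredBy>Kontakt". The explicit start path keeps BSD find happy,
# and -exec ... {} + passes each file name to grep intact, spaces included.
kontakt_files() {
  find "${1:-.}" -name '*.xml' -exec grep -l 'PoweredBy>Kontakt' {} +
}
```

For example: `kontakt_files "/Library/Application Support/Native Instruments/Service Center"`.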

2. Extracting and formatting the data from the required files using `awk`

This awk script processes only the required files, one by one.

 awk -F"[><]" '
   $2 == "Name" {ret = $3 " | "}
   $2 == "SNPID" {ret = ret $3 " | "}
   $2 == "RegKey" {print ret $3}
 ' $(grep -l "PoweredBy>Kontakt" $(find . -name "*.xml"))
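The question's underlying goal is to confirm that every SNPID is unique, so the `Name | SNPID | RegKey` lines printed by the awk script above can be piped into a duplicate check. A minimal sketch; `dup_snpids` is a hypothetical helper name, and it assumes SNPID values contain no spaces or `|` characters:

```sh
# dup_snpids is a hypothetical helper name.
# Reads "Name | SNPID | RegKey" lines on stdin and prints each SNPID that
# occurs more than once: take the 2nd |-separated field, strip the padding
# spaces, then let sort + uniq -d report the repeated values.
dup_snpids() {
  cut -d'|' -f2 | tr -d ' ' | sort | uniq -d
}
```

Append `| dup_snpids` to the awk command; it prints nothing when every SNPID is unique.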

An alternative using awk alone, without the grep command:

 awk -F"[><]" '
   $2 == "PoweredBy" && $3 != "Kontakt" {nextfile} 
   $2 == "Name" {ret = $3 " | "}
   $2 == "SNPID" {ret = ret $3 " | "}
   $2 == "RegKey" {print ret $3}
 ' $(find . -name "*.xml")
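The nextfile version only skips a file when a PoweredBy line with another value appears before RegKey; a file with no PoweredBy element at all is still printed, and some older awk implementations lack nextfile. A sketch that instead collects the fields for each file and decides once the file is finished, using only POSIX awk features. `kontakt_rows` is a hypothetical helper name, and it assumes one Product per file with each element on its own line:

```sh
# kontakt_rows is a hypothetical helper name.
# Prints "Name | SNPID | RegKey" for every .xml file under a directory
# (default: the current one) whose PoweredBy element is exactly Kontakt.
kontakt_rows() {
  find "${1:-.}" -name '*.xml' -exec awk -F '[<>]' '
    # Print the row for the file just finished, but only if it was Kontakt.
    function flush() {
      if (powered == "Kontakt") print name " | " snpid " | " regkey
    }
    # First line of a new file: flush the previous file, then reset the fields.
    FNR == 1 { flush(); name = snpid = regkey = powered = "" }
    $2 == "Name"      { name = $3 }
    $2 == "SNPID"     { snpid = $3 }
    $2 == "RegKey"    { regkey = $3 }
    $2 == "PoweredBy" { powered = $3 }
    # Flush the last file handled by this awk process.
    END { flush() }
  ' {} +
}
```

For example: `kontakt_rows "/Library/Application Support/Native Instruments/Service Center" > snpids.csv` (the output file name is only an example).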

