Print XML field values to CSV
I have a directory with loads of XML files that are not all formatted the same way, but the ones I am after all contain the same 3 required fields. I need to know the SNPID of each library, because they must be unique. For this I need to list each library and its ID value.
First, I only need results from files whose "PoweredBy" field contains the value "Kontakt".
Then, for each of those files, I need the Name, SNPID and RegKey printed into a 3-column CSV, e.g. Mystica | 547 | Best Service - Mystica
The XML files are like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
<Product version="2">
<UPID>70026fd5-8f6f-429e-b891-12c2f94bc566</UPID>
<Name>Mystica</Name>
<Type>Content</Type>
<NKSEnabled minVersion="1.1.0.0">true</NKSEnabled>
<Relevance maxVersion="1.0.9.0" minVersion="1.0.0.0">
<Application minVersion="5.0.0.0" nativeContent="true">kontakt</Application>
</Relevance>
<Relevance minVersion="1.1.0.0">
<Application minVersion="5.8.0.0" nativeContent="true">kontakt</Application>
<Application minVersion="2.6.5.0">maschine</Application>
<Application minVersion="1.8.2.0">kkontrol</Application>
</Relevance>
<PoweredBy>Kontakt</PoweredBy>
<Visibility maxVersion="1.0.9.0" minVersion="1.0.0.0" target="Standalone">1</Visibility>
<Company>Best Service</Company>
<AuthSystem>RAS2</AuthSystem>
<SNPID>547</SNPID>
<RegKey>Best Service - Mystica</RegKey>
<Icon>bestservice</Icon>
<ProductSpecific>
<HU>496B8CF4F8B1402C4A6650214DF2514C</HU>
<JDX>C9A2B6D9549FD159D8A3CFF054AAE934C2AC849EC74827847288DF07577A8F22</JDX>
<Visibility type="Number">3</Visibility>
</ProductSpecific>
</Product>
</ProductHints>
I've tried
Computer:~ user$ cd /Library/Application\ Support/Native\ Instruments/Service\ Center
Computer:Service Center
while read -r Name SNPID RegKey
do
echo "+++++++++++++++++++"
echo "Name: ${Name}"
echo "SNPID: ${SNPID}"
echo "RegKey: ${RegKey}"
awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}'
done
but the results are inconsistent (I know bash isn't great for parsing XML, but the requirements are pretty basic).
find -name "*.xml" | xargs cat | tr -d "\n" | sed 's/<\/Name>/\n/g' | sed 's/<\/SNPID>/\n/g' | sed 's/<\/RegKey>/: /g' | sed 's/<[^>]*>//g' | egrep "Name:|SNPID:|RegKey:" | sed 's/Name: /---\nName: /g'
This also results in an error: find: illegal option -- n
Any Bash wizards out there who can help me, please? (If possible, I would prefer to use the terminal without installing anything else.) Thanks
This might work for you (GNU sed):
Use the options -s to treat each input file separately and -E to simplify the regexps. Using alternation, extract the required fields into the hold buffer.
At the end of each file, if the PoweredBy field does not contain Kontakt, no further processing is required. Otherwise, format the fields in the hold buffer into the required format and print the result.
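A minimal sketch of that approach follows. It assumes GNU sed (-s, -E and the z command are GNU features, so the stock BSD sed on macOS will not run it as-is) and that, inside each file, Name appears before SNPID, which appears before RegKey, as in the sample. The function name kontakt_csv_sed and the two trimmed demo files are made up for illustration; this is not necessarily the exact script this answer used.

```shell
# Hypothetical helper: print "Name,SNPID,RegKey" for every XML file whose
# PoweredBy element contains Kontakt. Requires GNU sed (-s, -E and z).
# Assumes Name, SNPID and RegKey appear in that order inside each file.
kontakt_csv_sed() {
  sed -s -E -n '
    # Append every line carrying one of the four fields to the hold space.
    /<(Name|SNPID|RegKey|PoweredBy)>/H
    # On the last line of each file (-s makes $ match per file):
    ${
      # Swap the collected lines into the pattern space.
      x
      # For Kontakt products only, reduce them to one CSV line and print it.
      /<PoweredBy>[^<]*Kontakt[^<]*<\/PoweredBy>/s/.*<Name>([^<]*)<\/Name>.*<SNPID>([^<]*)<\/SNPID>.*<RegKey>([^<]*)<\/RegKey>.*/\1,\2,\3/p
      # Empty the pattern space and swap it back, so the hold space is
      # empty again when the next file starts.
      z
      x
    }
  ' "$@"
}

# Demo on two made-up, trimmed files in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mystica.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Mystica</Name>
    <PoweredBy>Kontakt</PoweredBy>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
  </Product>
</ProductHints>
EOF
cat > "$demo_dir/other.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Other</Name>
    <PoweredBy>Reaktor</PoweredBy>
    <SNPID>999</SNPID>
    <RegKey>Example - Other</RegKey>
  </Product>
</ProductHints>
EOF
result=$(kontakt_csv_sed "$demo_dir"/*.xml)
printf '%s\n' "$result"    # prints: Mystica,547,Best Service - Mystica
rm -r "$demo_dir"
```

Clearing the hold space at the end of each file matters because -s only resets line numbering per file; the hold space itself carries over from one file to the next.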
Highly recommended: don't try to parse XML with anything other than a proper XML parser. For that you can use either xmlstarlet (edited to reflect @Reino's comment below):
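A minimal sketch of what an xmlstarlet call could look like, assuming xmlstarlet is installed (it is not part of a stock macOS install). The XPath, the option choice and the helper name kontakt_csv_xmlstarlet are an illustration, not necessarily the command this answer or @Reino's comment used.

```shell
# Hypothetical helper: for every <Product> whose <PoweredBy> is exactly
# "Kontakt", print Name,SNPID,RegKey on one line. Requires xmlstarlet.
# -T selects text output (instead of XML), so a "&" in a RegKey is not
# printed back as an entity.
kontakt_csv_xmlstarlet() {
  xmlstarlet sel -T -t \
    -m '//Product[PoweredBy="Kontakt"]' \
    -v 'Name' -o ',' -v 'SNPID' -o ',' -v 'RegKey' -n \
    "$@"
}

# Demo on two made-up, trimmed files; skipped if xmlstarlet is missing.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mystica.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Mystica</Name>
    <PoweredBy>Kontakt</PoweredBy>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
  </Product>
</ProductHints>
EOF
cat > "$demo_dir/other.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Other</Name>
    <PoweredBy>Reaktor</PoweredBy>
    <SNPID>999</SNPID>
    <RegKey>Example - Other</RegKey>
  </Product>
</ProductHints>
EOF
if command -v xmlstarlet >/dev/null 2>&1; then
  result=$(kontakt_csv_xmlstarlet "$demo_dir"/*.xml)
  printf '%s\n' "$result"    # expected: Mystica,547,Best Service - Mystica
else
  echo "xmlstarlet is not installed; skipping the demo" >&2
fi
rm -r "$demo_dir"
```

Because the parser handles the document structure, this does not depend on line breaks, indentation or the order of the elements inside Product.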
or, even simpler, xidel:
Output (based on the sample xml in your question) should be:
1. Locating the required files using find and grep
Validate that you get the correct file list.
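Step 1 could look like the following sketch. The function name kontakt_files and the demo files are hypothetical, and the grep pattern assumes PoweredBy is written on a single line, as in the sample. It passes an explicit starting path to find, because BSD find on macOS requires one; "find -name ..." without a path is what produces the question's "illegal option -- n" error.

```shell
# Hypothetical helper: list the .xml files under a directory (default: the
# current one) whose PoweredBy element contains Kontakt.
# BSD find on macOS needs an explicit starting path such as "."
kontakt_files() {
  find "${1:-.}" -type f -name '*.xml' \
    -exec grep -l '<PoweredBy>[^<]*Kontakt' {} +
}

# Demo on two made-up, trimmed files in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mystica.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Mystica</Name>
    <PoweredBy>Kontakt</PoweredBy>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
  </Product>
</ProductHints>
EOF
cat > "$demo_dir/other.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Other</Name>
    <PoweredBy>Reaktor</PoweredBy>
    <SNPID>999</SNPID>
    <RegKey>Example - Other</RegKey>
  </Product>
</ProductHints>
EOF
files=$(kontakt_files "$demo_dir")
printf '%s\n' "$files"    # prints only the path of mystica.xml
rm -r "$demo_dir"
```

Run from the Service Center directory, kontakt_files with no argument searches the current directory.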
2. Extracting and formatting data from the required files using awk
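A possible shape for step 2, assuming each element sits on its own line as in the sample. Lines are split on < and >, and find hands awk only the files that pass the grep test from step 1. The names extract_prog and kontakt_csv_awk are made up, and this is a sketch rather than necessarily the awk script this answer used.

```shell
# awk program: with fields split on < and >, a line such as
#   <SNPID>547</SNPID>
# gives $2 = "SNPID" and $3 = "547". One CSV line is printed per file.
extract_prog='
  function emit() {
    if (name != "") print name "," snpid "," regkey
    name = snpid = regkey = ""
  }
  FNR == 1       { emit() }    # a new file starts: print the previous one
  $2 == "Name"   { name = $3 }
  $2 == "SNPID"  { snpid = $3 }
  $2 == "RegKey" { regkey = $3 }
  END            { emit() }    # print the last file
'

# Hypothetical helper: only files passing the grep test reach awk. find
# passes the paths directly, so spaces (as in "Service Center") are safe.
kontakt_csv_awk() {
  find "${1:-.}" -type f -name '*.xml' \
    -exec grep -q '<PoweredBy>[^<]*Kontakt' {} \; \
    -exec awk -F'[<>]' "$extract_prog" {} +
}

# Demo on two made-up, trimmed files in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mystica.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Mystica</Name>
    <PoweredBy>Kontakt</PoweredBy>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
  </Product>
</ProductHints>
EOF
cat > "$demo_dir/other.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Other</Name>
    <PoweredBy>Reaktor</PoweredBy>
    <SNPID>999</SNPID>
    <RegKey>Example - Other</RegKey>
  </Product>
</ProductHints>
EOF
result=$(kontakt_csv_awk "$demo_dir")
printf '%s\n' "$result"    # prints: Mystica,547,Best Service - Mystica
rm -r "$demo_dir"
```

Printing at FNR == 1 and in END is a portable stand-in for a per-file end hook, which POSIX awk does not have.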
This awk script processes only the required files, one by one.
An alternative using awk without the grep command.
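A sketch of the awk-only variant under the same one-element-per-line assumption: awk also records PoweredBy and prints a file's fields only when that value contains Kontakt, so no grep pass is needed. The names filter_prog and kontakt_csv_awk_only are hypothetical.

```shell
# awk program: collect the four fields per file (fields split on < and >)
# and print Name,SNPID,RegKey only when PoweredBy contains Kontakt.
filter_prog='
  function emit() {
    if (powered ~ /Kontakt/ && name != "") print name "," snpid "," regkey
    name = snpid = regkey = powered = ""
  }
  FNR == 1          { emit() }    # a new file starts: flush the previous one
  $2 == "Name"      { name = $3 }
  $2 == "SNPID"     { snpid = $3 }
  $2 == "RegKey"    { regkey = $3 }
  $2 == "PoweredBy" { powered = $3 }
  END               { emit() }    # flush the last file
'

# Hypothetical helper: every .xml file goes to awk, which does the filtering.
kontakt_csv_awk_only() {
  find "${1:-.}" -type f -name '*.xml' -exec awk -F'[<>]' "$filter_prog" {} +
}

# Demo on two made-up, trimmed files in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mystica.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Mystica</Name>
    <PoweredBy>Kontakt</PoweredBy>
    <SNPID>547</SNPID>
    <RegKey>Best Service - Mystica</RegKey>
  </Product>
</ProductHints>
EOF
cat > "$demo_dir/other.xml" <<'EOF'
<ProductHints spec="1.0.16">
  <Product version="2">
    <Name>Other</Name>
    <PoweredBy>Reaktor</PoweredBy>
    <SNPID>999</SNPID>
    <RegKey>Example - Other</RegKey>
  </Product>
</ProductHints>
EOF
result=$(kontakt_csv_awk_only "$demo_dir")
printf '%s\n' "$result"    # prints: Mystica,547,Best Service - Mystica
rm -r "$demo_dir"
```

Since the question needs SNPIDs to be unique, one option is to pipe this CSV through cut -d, -f2 | sort | uniq -d, which prints only the SNPIDs that occur more than once (assuming no commas inside Name).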