仅在 XML 标记内替换;使用 bash 命令从 Referencer .reflib 导出为 bibtex 格式,文件名完整并删除 URL 编码
我在Referencer 中有很多参考文献。从 Referencer 导出时,我尝试在 bibtex 文件中包含文件名。由于该软件默认情况下不会执行此操作,因此我尝试在导出之前使用 sed 命令将文件名作为 bibtex 信息包含在 XML 文件中,从而包含文件名。
输入
<doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>
输出
<doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<bib_extra key="File">article:../Transcription\ Factor\ Binding/A\ Common\ Nuclear\ Signal\ Transduction\ Pathway\ Activated\ by\ Growth\ Factor\ and\ Cytokine.pdf:pdf</bib_extra>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>
我可以使用以下 sed 命令部分执行我想要的操作,但 URL 编码“%20”仍然保留。我如何仅在 bibtex 标签中摆脱它?
sed -e 's/\(\ \ \ \ <filename>file:\/\/\/home\/dwickrama\/Desktop\/stevenJonesLab\/papers\)\([^.]*\)\(\.\?\)\(.*\)\(<\/filename>\)/\1\2\3\4\5\n\ \ \ \ <bib_extra\ key=\"File\">article:\.\.\2\3\4:\4<\/bib_extra>/g' NewPapers.reflib > NewPapers.new.reflib
I have many references in Referencer. I'm trying to include filenames in my bibtex file when exporting from Referencer. Since the software doesn't do this by default I'm trying to use a sed command to include the filename as a bibtex information in the XML file before I export and thus include the filename.
Input
<doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>
Ouput
<doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<bib_extra key="File">article:../Transcription\ Factor\ Binding/A\ Common\ Nuclear\ Signal\ Transduction\ Pathway\ Activated\ by\ Growth\ Factor\ and\ Cytokine.pdf:pdf</bib_extra>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>
I can use the following sed command to partially do what I want, but the URL encoding "%20" remains. How do I get rid of that in only the bibtex tag ?
sed -e 's/\(\ \ \ \ <filename>file:\/\/\/home\/dwickrama\/Desktop\/stevenJonesLab\/papers\)\([^.]*\)\(\.\?\)\(.*\)\(<\/filename>\)/\1\2\3\4\5\n\ \ \ \ <bib_extra\ key=\"File\">article:\.\.\2\3\4:\4<\/bib_extra>/g' NewPapers.reflib > NewPapers.new.reflib
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
Regex 和 sed 并不是处理 XML 或 URL 解码的好工具。
使用更完整的脚本语言的快速脚本将能够更清晰、更可靠地完成它。例如在Python中:(
我没有在给定的示例输出中包含一个函数来重现反斜杠转义,因为它不清楚到底是什么格式。最简单的方法是
filename= filename.replace(' ' , '\\ ')
,但我不确定这是否正确。)Regex and sed are not very good tools for processing XML, or URL-decoding.
A quick script in more complete scripting language would be able to do it more clearly and reliably. For example in Python:
(I haven't included a function to reproduce the backslash-escaping in the given example output as it's not clearly exactly what format that is. The simplest approach would be
filename= filename.replace(' ', '\\ ')
, but I'm not sure that would be correct.)您所需要的只是在 right 之后添加一行?所以搜索完后打印出来就可以了。
all you need is to add a line after right?? So just print it out after is searched.