I made a VBA macro for Excel on Windows that gets DOIs for a batch of plain-text citations via Crossref. With the DOIs, you can then fetch all the citations in BibTeX format with R.
1. Download the Excel file with the macro.
2. Put your plain-text citations in column A, and adjust the numbers in columns F and H slightly so that the title comes out correctly in column K.
3. Press Ctrl+A and wait about 5 seconds per citation. (To use the sheet again, restore the formulas from the backup sheet.)
4. Locate the saved .csv file with the DOIs, or save it again manually from sheet2.
5. Use something like this to query your DOIs in R:
library(RefManageR)
setwd("/your/folder/")  # folder that contains the .csv file
list.files()            # check that the .csv file is there
# read the DOIs (one per row, no header) as plain character strings
doi <- read.csv("dois.csv", header = FALSE, stringsAsFactors = FALSE)
# fetch the BibTeX entries and keep them in a new .bib file
GetBibEntryWithDOI(unlist(doi), temp.file = "mycitations.bib", delete.file = FALSE)
# read the .bib file back and export it as a .csv table
bib <- ReadBib("mycitations.bib")
dfbib <- as.data.frame(bib)
write.csv(dfbib, "table.csv")
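If you would rather skip Excel, the lookup step can probably be approximated in R as well. The following is only a rough sketch, not what the macro itself does: it assumes the public Crossref REST API endpoint with its query.bibliographic parameter and the jsonlite package, and the function names (crossref_query_url, lookup_doi, clean_dois) are made up for illustration. The top match is not guaranteed to be the right paper, so check the results by eye.

```r
# Build a Crossref REST API query URL for one plain-text citation.
# Assumption: the public endpoint and its query.bibliographic parameter.
crossref_query_url <- function(citation) {
  paste0("https://api.crossref.org/works?rows=1&query.bibliographic=",
         utils::URLencode(citation, reserved = TRUE))
}

# Return the DOI of the top-ranked match, or NA if nothing comes back.
# Needs network access and the jsonlite package.
lookup_doi <- function(citation) {
  res <- jsonlite::fromJSON(crossref_query_url(citation),
                            simplifyVector = FALSE)
  items <- res$message$items
  if (length(items) == 0) return(NA_character_)
  items[[1]]$DOI
}

# Tidy a vector of DOIs before handing it to GetBibEntryWithDOI():
# trim whitespace, strip resolver prefixes, drop anything that does not
# look like a DOI, and remove duplicates.
clean_dois <- function(x) {
  x <- trimws(as.character(x))
  x <- sub("^(https?://(dx\\.)?doi\\.org/|doi:\\s*)", "", x,
           ignore.case = TRUE, perl = TRUE)
  x <- x[grepl("^10\\.[0-9]{4,9}/\\S+$", x, perl = TRUE)]
  unique(x)
}
```

Something like dois <- clean_dois(vapply(citations, lookup_doi, character(1))) would then give a vector you can pass to GetBibEntryWithDOI(), where citations is your own character vector of plain-text references. Pause between requests if the list is long.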
You are spoilt for choice. Google for "reference metadata extraction" and start clicking.
There's free software to extract from PDFs: see Metadata Extraction Tool.
If you have a Word 2007 file, that format has (at last) a standardised representation of reference-list entries, and EndNote can extract from it reliably.
If you just want to see the citations in an article, RefRuns is a useful tool, and has a simple web interface.
After downloading the Metadata Extraction Tool, I discovered that it captures the metadata of a particular object (file name, size, date, etc.); it does not look at the references inside that object and extract them.
The best solution I've found for scraping references from Word and PDF files is cb2Bib.
You may try WordToBibTeX.
I used it once to convert my old Word bibliography file to BibTeX. :)
The usual path of the Word XML bibliography file is something like:
C:\Documents and Settings\<username>\Application Data\Microsoft\Bibliography\Sources.xml
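If you are not sure where the file lives on your machine, a quick check from R might look like the sketch below. The first candidate is the path quoted above; the second, under %APPDATA%, is my assumption about where newer Windows versions keep it. The function name word_sources_candidates and the user name "alice" are placeholders.

```r
# List the places where Word's Sources.xml is likely to be.
# 'username' fills in the path quoted above; 'appdata' defaults to the
# APPDATA environment variable (an assumption for newer Windows versions).
word_sources_candidates <- function(username, appdata = Sys.getenv("APPDATA")) {
  rel <- file.path("Microsoft", "Bibliography", "Sources.xml")
  paths <- file.path("C:/Documents and Settings", username,
                     "Application Data", rel)
  if (nzchar(appdata)) {
    paths <- c(paths, file.path(appdata, rel))
  }
  paths
}

candidates <- word_sources_candidates("alice")  # placeholder user name
found <- candidates[file.exists(candidates)]
if (length(found) > 0) {
  message("Found: ", found[1])
} else {
  message("Sources.xml not found in the usual locations")
}
```

Whichever path turns up is the file to hand to the converter.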
This works fairly well: http://www.snowelm.com/~t/doc/tips/makebib.en.html