
Using R to download a gzipped data file, extract it, and import the data

Posted on 2024-11-29 08:26:11

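A follow-up to this question: how do I download and decompress a gzip-compressed file with R? For example, I have a file of insurance data from the UCI Machine Learning Repository. How can I download it with R?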

Here is the data URL: http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz

勿忘初心 2024-12-06 08:26:11

I like Ramnath's approach, but I would use temp files like so:

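tmpdir <- tempdir()

url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- file.path(tmpdir, basename(url))   # save the archive in the temp dir, not the working dir
download.file(url, file, mode = 'wb')      # binary mode keeps the archive intact on Windows

exdir <- file.path(tmpdir, 'tic')          # extract into its own subdirectory
untar(file, compressed = 'gzip', exdir = exdir)
list.files(exdir)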

The list.files() call should produce something like this:

[1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt"

which you could parse if you needed to automate this process for a lot of files.
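To automate this for many files, one option is to filter the list.files() output with a regular-expression pattern and read each match in a loop. The sketch below is hypothetical: so that it runs offline, it builds a tiny stand-in archive locally (the file names data_a.txt, data_b.txt and README.txt and their contents are made up, not the real tic.tar.gz data), and it assumes the data files are whitespace-delimited with no header row.

```r
# Build a small stand-in tar.gz locally so the sketch runs offline.
# File names and contents are hypothetical placeholders.
make_demo_tarball <- function(tarfile) {
  src <- file.path(tempdir(), "demo_src")
  dir.create(src, showWarnings = FALSE)
  writeLines(c("1 2 3", "4 5 6"), file.path(src, "data_a.txt"))
  writeLines(c("7 8", "9 10"), file.path(src, "data_b.txt"))
  writeLines("free-text description", file.path(src, "README.txt"))
  old <- setwd(src)        # so the archive stores relative paths
  on.exit(setwd(old))
  tar(tarfile, files = c("data_a.txt", "data_b.txt", "README.txt"),
      compression = "gzip")
  invisible(tarfile)
}

tarball <- file.path(tempdir(), "demo.tar.gz")
make_demo_tarball(tarball)

# Extract into a dedicated subdirectory of the session's temp dir
exdir <- file.path(tempdir(), "demo_extract")
untar(tarball, compressed = "gzip", exdir = exdir)

# Keep only the data files (here: names starting with "data_"),
# then read each one into a named list of data frames
data_files <- list.files(exdir, pattern = "^data_.*\\.txt$", full.names = TRUE)
tables <- lapply(data_files, read.table, header = FALSE)
names(tables) <- basename(data_files)
str(tables)
```

For the real archive, the pattern would need to match whichever of the listed files actually hold tabular data; free-text files such as a description or dictionary would be left out.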

梅倚清风 2024-12-06 08:26:11

Here is a quick way to do it.

# create the download directory and build the destination file path
.exdir = '~/Desktop/tmp'
dir.create(.exdir, showWarnings = FALSE, recursive = TRUE)
.file = file.path(.exdir, 'tic.tar.gz')

# download file (binary mode so the archive is not corrupted on Windows)
url = 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
download.file(url, .file, mode = 'wb')

# untar it
untar(.file, compressed = 'gzip', exdir = path.expand(.exdir))
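The download and untar steps above can be wrapped into a small reusable function. This is a hypothetical helper (fetch_and_untar is not part of base R or any package): it skips the download when the archive is already on disk and returns the names of the extracted files. So that the sketch runs offline, it serves a locally built archive (a made-up example.txt) through a file:// URL, which download.file() accepts.

```r
# Hypothetical helper wrapping the download + untar steps.
# Skips the download if the archive already exists in exdir.
fetch_and_untar <- function(url, exdir) {
  exdir <- path.expand(exdir)
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  destfile <- file.path(exdir, basename(url))
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  untar(destfile, compressed = "gzip", exdir = exdir)
  # Return the extracted files, excluding the archive itself
  setdiff(list.files(exdir), basename(url))
}

# Offline usage: build a tiny archive and serve it via a file:// URL
src <- file.path(tempdir(), "fetch_src")
dir.create(src, showWarnings = FALSE)
writeLines("a b", file.path(src, "example.txt"))
old <- setwd(src)
tar(file.path(tempdir(), "example.tar.gz"), files = "example.txt",
    compression = "gzip")
setwd(old)

demo_url <- paste0("file://", normalizePath(file.path(tempdir(), "example.tar.gz")))
extracted <- fetch_and_untar(demo_url, file.path(tempdir(), "fetch_out"))
print(extracted)
```

With network access, the equivalent call for the question's data would be fetch_and_untar('http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz', '~/Desktop/tmp').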

听风吹 2024-12-06 08:26:11

Please see help(download.file) for that. If the file in question is merely gzipped but otherwise readable, you can also pass the complete URL to read.table() and friends.
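As a minimal local illustration of that point: read.table() opens a file path with file(), which recognises gzip-compressed input when reading text, so a .gz file can be passed to it directly. The file name scores.txt.gz and its contents are hypothetical. For a remote .gz URL, the examples in help(gzcon) wrap the url() connection in gzcon() before reading.

```r
# Write a small gzip-compressed, whitespace-delimited file (hypothetical data)
gz_path <- file.path(tempdir(), "scores.txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("id score", "1 0.5", "2 0.75"), con)
close(con)

# read.table() decompresses transparently when given the path
scores <- read.table(gz_path, header = TRUE)
print(scores)
```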

对你的占有欲 2024-12-06 08:26:11

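With library(archive) you can also read a particular csv file inside an archive without extracting it first (read_csv() and cols() come from readr): read_csv(archive_read("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz", file = 1), col_types = cols()), which is quite a bit faster.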

To extract everything, you can run archive_extract("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz", dir=XXX).

That worked very well for me and is faster than the built-in untar(). It also works on all platforms and supports the "tar", "ZIP", "7-zip", "RAR", "CAB", "gzip", "bzip2", "compress", "lzma" and "xz" formats.
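If installing archive is not an option, base R's untar() covers part of this: list = TRUE lists an archive's members without extracting anything, and the files argument extracts only the named members. This is a swapped-in base-R technique, not what the answer above uses, and unlike archive_read() it still writes the selected member to disk. The archive and member names (members.tar.gz, wanted.csv, other.txt) are hypothetical.

```r
# Build a stand-in archive with two members (hypothetical names)
src <- file.path(tempdir(), "member_src")
dir.create(src, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")),
          file.path(src, "wanted.csv"), row.names = FALSE)
writeLines("not needed", file.path(src, "other.txt"))
tarball <- file.path(tempdir(), "members.tar.gz")
old <- setwd(src)
tar(tarball, files = c("wanted.csv", "other.txt"), compression = "gzip")
setwd(old)

# List the members without extracting anything
members <- untar(tarball, list = TRUE, compressed = "gzip")
print(members)

# Extract only the member we need, then read it
exdir <- file.path(tempdir(), "member_out")
untar(tarball, files = "wanted.csv", exdir = exdir, compressed = "gzip")
wanted <- read.csv(file.path(exdir, "wanted.csv"))
print(wanted)
```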