Reading only selected columns
Can anyone please tell me how to read only the first 6 months (7 columns) for each year of the data below, for example by using read.table()?
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29
Say the data are in the file data.txt. You can use the colClasses argument of read.table() to skip columns. Here the data in the first 7 columns are "integer", and we set the remaining 6 columns to "NULL" to indicate that they should be skipped.
Change "integer" to one of the accepted types detailed in ?read.table, depending on the real type of the data.
data.txt holds the table shown in the question and was created by writing out an R object named dat.
If the number of columns is not known beforehand, the utility function count.fields will read through the file and count the number of fields in each line.
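The colClasses approach described above can be sketched as follows. This is a minimal example rather than the answer's original code: it writes the question's sample table to a temporary file (standing in for data.txt), and the names path, n_cols, keep, and first_half are made up for illustration.

```r
# Write the question's sample data to a temporary file (stand-in for data.txt).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), path)

# Count the fields on each line (whitespace-separated by default)
# and take the maximum as the number of columns in the file.
n_cols <- max(count.fields(path))

# Read the first 7 columns as integer and skip the rest via "NULL".
keep <- 7
first_half <- read.table(
  path,
  header = TRUE,
  colClasses = c(rep("integer", keep), rep("NULL", n_cols - keep))
)
print(first_half)
```

Deriving the number of skipped columns from count.fields avoids hard-coding the file width, at the cost of one extra pass over the file.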
To read a specific set of columns from a dataset, there are several other options:
1) With fread from the data.table package:
You can specify the desired columns with the select parameter of fread from the data.table package. You can specify the columns with a vector of column names or column numbers.
Alternatively, you can use the drop parameter to indicate which columns should not be read. For the example dataset, all of these give the same result: the Year column followed by Jan through Jun.
UPDATE: When you don't want fread to return a data.table, use the data.table = FALSE parameter, e.g.: fread("data.txt", select = c(1:7), data.table = FALSE)
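As a rough illustration of the select and drop parameters, the sketch below assumes the data.table package is installed and again uses a temporary file in place of data.txt. The variable names (path, cols, by_number, by_name, by_drop, as_df) are illustrative only.

```r
library(data.table)

# Write the question's sample data to a temporary file (stand-in for data.txt).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), path)

cols <- c("Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun")

# Select by column number, select by column name, or drop the unwanted columns.
by_number <- fread(path, select = 1:7)
by_name   <- fread(path, select = cols)
by_drop   <- fread(path, drop = 8:13)

# Return a plain data.frame instead of a data.table.
as_df <- fread(path, select = 1:7, data.table = FALSE)

print(by_number)
```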
2) With read.csv.sql from the sqldf package:
Another alternative is the read.csv.sql function from the sqldf package.
3) With the read_* functions from the readr package:
The col_types argument of the read_* functions controls which columns are read. In its compact string form, each character stands for one column, for example i for an integer column and _ or - for a column to skip, as explained in the readr documentation.
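Options 2 and 3 might look like the sketch below. It assumes the sqldf and readr packages are installed and that the file is space-separated. The temporary file and the names path, via_sql, via_string, and via_cols_only are assumptions made for illustration, not the original answer's code.

```r
library(sqldf)
library(readr)

# Write the question's sample data to a temporary file (stand-in for data.txt).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), path)

# sqldf: name the wanted columns in the SQL statement; "file" refers to the input file.
via_sql <- read.csv.sql(
  path,
  sql = "select Year, Jan, Feb, Mar, Apr, May, Jun from file",
  sep = " "
)

# readr, compact string: seven integer columns (i), then six skipped columns (_).
via_string <- read_table(path, col_types = "iiiiiii______")

# readr, cols_only(): read only the columns that are named here.
via_cols_only <- read_table(
  path,
  col_types = cols_only(
    Year = col_integer(), Jan = col_integer(), Feb = col_integer(),
    Mar = col_integer(), Apr = col_integer(), May = col_integer(),
    Jun = col_integer()
  )
)

print(via_string)
```

The compact string is shorter but depends on column positions, while cols_only() depends on column names and keeps working if the column order changes.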
You could also use JDBC to achieve this. Let's create a sample csv file.
Download and save the CSV JDBC driver from this link: http://sourceforge.net/projects/csvjdbc/files/latest/download
The vroom package provides a 'tidy' method of selecting / dropping columns by name during import. Docs: https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/#column-selection
Column selection (col_select)
The vroom argument 'col_select' makes selecting columns to keep (or omit) more straightforward. The interface for col_select is the same as dplyr::select().
Select columns by name
Drop columns by name
Or rename columns by name
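The three cases listed above can be sketched as follows, assuming the vroom package is installed and the file is space-separated. The temporary file and the names path, selected, dropped, and renamed are illustrative, and vroom guesses the column types itself here.

```r
library(vroom)

# Write the question's sample data to a temporary file (stand-in for data.txt).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), path)

# Select columns by name (a range such as Jan:Jun works as in dplyr::select()).
selected <- vroom(path, delim = " ", col_select = c(Year, Jan:Jun))

# Drop columns by name.
dropped <- vroom(path, delim = " ", col_select = -c(Jul:Dec))

# Rename a column while selecting.
renamed <- vroom(path, delim = " ", col_select = c(year = Year, Jan:Jun))

print(selected)
```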
You do it like this:
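One simple way that fits this description is to read the whole table and then keep only the first seven columns. This is offered as an assumption rather than as this answer's actual code. The temporary file stands in for data.txt, and the names path and first_half are illustrative.

```r
# Write the question's sample data to a temporary file (stand-in for data.txt).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), path)

# Read every column, then subset to the first 7 (Year plus Jan to Jun).
first_half <- read.table(path, header = TRUE)[, 1:7]
print(first_half)
```

Unlike the colClasses approach, this parses all 13 columns before discarding six of them, which is fine for small files but wasteful for large ones.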