Trouble implementing a scatter plot
I'm having trouble following along with an example provided by my professor. We're meant to follow the provided examples to understand the code and how it is implemented, and then do a different assignment based on the topics covered in the examples.
I'm having problems implementing a scatter plot from the example. The code uses the Adult dataset from the UCI Machine Learning Repository and is as follows:
#install.packages("ggplot2")
library(ggplot2)
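#import data (strip.white = TRUE removes the space after each comma in adult.data, so "?" values can be matched later)
adult = read.csv("adult.DATA", header = FALSE, stringsAsFactors = TRUE, strip.white = TRUE)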
summary(adult)
colnames(adult)
#remove similar columns and rename
adult_trim = adult[,-c(3,4,11,12)]
names(adult_trim) <- c("Age", "WorkClass", "Education", "Marital.Status", "Occupation", "Relationship", "Race",
"Sex", "Hours.per.Week", "Native.Country", "Income")
#remove empty values & Race/NativeCountry
adult_trim <- adult_trim[rowSums(adult_trim == "?") ==0, -c(7,10), drop = FALSE]
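A minimal sketch of the import-and-trim steps above, run on three made-up rows instead of the real file. It assumes adult.data separates fields with a comma followed by a space; if so, read.csv keeps that leading space in text fields unless strip.white = TRUE, and the comparison with "?" would match nothing. The object names sample_lines and raw are hypothetical and used only for illustration.

```r
# Three made-up rows in the assumed adult.data layout: 15 unnamed columns, ", " separators.
# These are illustrative placeholders, not records from the real dataset.
sample_lines <- c(
  "30, Private, 1000, Bachelors, 13, Never-married, Sales, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K",
  "45, ?, 2000, HS-grad, 9, Divorced, Sales, Unmarried, White, Female, 0, 0, 35, United-States, <=50K",
  "52, Private, 3000, Masters, 14, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 50, United-States, >50K"
)

# Without strip.white, text fields keep their leading space (" ?" instead of "?").
raw <- read.csv(text = sample_lines, header = FALSE, stringsAsFactors = TRUE)

# With strip.white = TRUE the spaces are removed, so "?" can be matched.
adult <- read.csv(text = sample_lines, header = FALSE,
                  stringsAsFactors = TRUE, strip.white = TRUE)

# Same trimming and renaming as in the question.
adult_trim <- adult[, -c(3, 4, 11, 12)]
names(adult_trim) <- c("Age", "WorkClass", "Education", "Marital.Status",
                       "Occupation", "Relationship", "Race", "Sex",
                       "Hours.per.Week", "Native.Country", "Income")
adult_trim <- adult_trim[rowSums(adult_trim == "?") == 0, -c(7, 10), drop = FALSE]

print(sum(raw == "?"))   # 0: the raw value is " ?", not "?"
print(nrow(adult_trim))  # 2: the row containing "?" was dropped
print(names(adult_trim)) # Race and Native.Country are gone
```

If the real file turns out not to have the extra spaces, strip.white = TRUE is harmless, so the sketch leaves it on.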
The problem is in the following scatter plot. The data doesn't have a header row with column names, so the columns are imported as V1, V2, etc.
adult$V4 = as.factor(as.character(adult$V4))
levels(adult$V4)
plot(
jitter(as.numeric(adult$V4),0.5) ~ jitter(as.numeric(adult$V4), 0.5),
data = adult_trim,
xlab = "Income",
ylab = "Education",
pch = 19,
cex = 1,
bty = "n",
xlim = c(1:2),
col = rgb(180,0,180,30, maxColorValue = 255)
)
When I try to run this plot on my machine, it just gives me an error:
Warning message:
In plot.formula(jitter(as.numeric(adult$V4), 0.5) ~ jitter(as.numeric(adult$V4), :
c("the formula 'jitter(as.numeric(adult$V4), 0.5) ~ jitter(as.numeric(adult$V4), '
is treated as 'jitter(as.numeric(adult$V4), 0.5) ~ 1'", "the formula ' 0.5)'
is treated as 'jitter(as.numeric(adult$V4), 0.5) ~ 1'")
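A minimal sketch of what this warning describes, using made-up vectors v and w (both hypothetical) in place of as.numeric(adult$V4). It assumes, as the plot.formula name in the warning suggests, that plot() first turns the formula into a model frame; when the same expression sits on both sides of ~, that frame holds only one column, so there is nothing left to put on the x axis.

```r
# Made-up numeric vector standing in for as.numeric(adult$V4).
v <- c(1, 2, 2, 3)

# Same expression on both sides of ~, as in the question's formula.
mf_same <- model.frame(v ~ v)

# A different variable on the right-hand side for comparison.
w <- c(10, 20, 30, 40)
mf_diff <- model.frame(v ~ w)

print(ncol(mf_same))  # 1: only the response, so the formula acts like v ~ 1
print(ncol(mf_diff))  # 2: one column for y and one for x
```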
It's supposed to look like this graph, but with education: https://i.sstatic.net/EPfhX.png. Instead, I'm just getting the error. Also, is there any reason this uses the original "adult" instead of "adult_trim"?
Any help or explanation would be appreciated.
Answer
It uses the original adult instead of adult_trim because in the jitter function you explicitly specify adult$V4. Your use of adult there overrides the data = adult_trim argument later on. With the data argument provided, you should just use the column name and rely on the data argument to point plot to the correct data frame to look in to find the column.
But you also show code to replace the default column names in adult_trim. After you run the names(adult_trim) <- c(...) line, adult_trim has those column names, and it doesn't remember anything about V1, V2, V3, V4, etc.
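A small sketch of the renaming point, using a hypothetical all-zero stand-in with the same 15 columns as adult (only the column positions matter). It also shows something that follows from the question's own code: adult[, -c(3, 4, 11, 12)] drops column 4, so the original V4 (the text education column in the UCI layout, assuming the standard column order) is not in adult_trim at all, and the column later named Education is the original V5.

```r
# Hypothetical stand-in with the same column layout as adult: V1 ... V15.
adult <- as.data.frame(matrix(0, nrow = 2, ncol = 15))
print(names(adult)[4])               # "V4"

# Same column removal as in the question.
adult_trim <- adult[, -c(3, 4, 11, 12)]
print("V4" %in% names(adult_trim))   # FALSE: column 4 was dropped
print(names(adult_trim)[3])          # "V5": the column that becomes Education

# Same renaming as in the question.
names(adult_trim) <- c("Age", "WorkClass", "Education", "Marital.Status",
                       "Occupation", "Relationship", "Race", "Sex",
                       "Hours.per.Week", "Native.Country", "Income")

# After renaming, no V1, V2, ... names remain.
print(any(grepl("^V[0-9]+$", names(adult_trim))))  # FALSE
```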
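When you use a formula (with ~) inside plot(), you should use yvalues ~ xvalues. Your formula, jitter(as.numeric(adult$V4), 0.5) ~ jitter(as.numeric(adult$V4), 0.5), uses the same expression for both the x and y values, uses the wrong data frame (overriding the data = argument), and uses an old column name. Because both sides are identical, R ends up with a single variable and nothing for the x axis, which is what the warning means when it says the formula "is treated as 'jitter(as.numeric(adult$V4), 0.5) ~ 1'". I would instead try putting the renamed columns in the formula, Education on the y side and Income on the x side, and letting data = adult_trim supply them.

It's also too bad that people are still teaching beginners base plots instead of ggplot. What I'd really recommend is learning ggplot2 for plots like this.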
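One possible version of the corrected plot, as a sketch rather than the exact code the answer had in mind. It runs on a made-up stand-in for adult_trim with only the two columns the plot needs (Education as the numeric education level, Income as a two-level factor); the random values are placeholders, not real data. The x limits are widened to c(0.5, 2.5) on the assumption that jittered points just outside 1 and 2 should stay visible. The ggplot2 part runs only if the package is installed.

```r
# Made-up stand-in for adult_trim: placeholder values, not real records.
set.seed(1)
adult_trim <- data.frame(
  Education = sample(1:16, 200, replace = TRUE),
  Income = factor(sample(c("<=50K", ">50K"), 200, replace = TRUE))
)

# Base graphics: y ~ x, bare column names, and data = adult_trim supplies them.
plot(
  jitter(as.numeric(Education), 0.5) ~ jitter(as.numeric(Income), 0.5),
  data = adult_trim,
  xlab = "Income",
  ylab = "Education",
  pch = 19,
  bty = "n",
  xlim = c(0.5, 2.5),
  xaxt = "n",  # suppress the numeric 1/2 axis and label the factor levels instead
  col = rgb(180, 0, 180, 30, maxColorValue = 255)
)
axis(1, at = 1:2, labels = levels(adult_trim$Income))

# ggplot2 equivalent: geom_jitter spreads overlapping points, alpha keeps them translucent.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(adult_trim, aes(x = Income, y = Education)) +
    geom_jitter(width = 0.1, height = 0.1, alpha = 0.1, colour = "purple")
  print(p)
}
```

With the real data, the stand-in data frame would simply be replaced by the adult_trim built in the question.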
And lastly, there are important differences between warnings (which your code shows) and errors (which you say you have, but don't). A warning means your code executed, but there may have been problems, so R warns you to check carefully. An error means that your code could not be executed: nothing was changed, and you need to fix it before it will run.
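A minimal sketch of that difference, using two standard base R calls chosen only for illustration: as.numeric("abc") signals a warning but still returns a value, while log("abc") signals an error and returns nothing. The handlers just report the message so the script keeps going.

```r
# Warning: the call runs to completion and returns a value (NA here).
res_warn <- withCallingHandlers(
  as.numeric("abc"),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")  # continue after reporting the warning
  }
)
print(res_warn)  # NA

# Error: the call stops, so there is no result; the handler returns NULL instead.
res_err <- tryCatch(
  log("abc"),
  error = function(e) {
    message("Error: ", conditionMessage(e))
    NULL
  }
)
print(is.null(res_err))  # TRUE
```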