Need help plotting counts of two variables in a scatterplot and then fitting the line in R

Posted 2025-01-11 06:56:16


I need help with all these questions, but specifically plotting the scatterplot and fitting the linear regression model.

Filter out any zip code where the number of emergency visits was less than 20
Plot the count of influenza-like illness and/or pneumonia visits against the count of all emergency department visits
Plot the line of best fit (linear regression) and the R-squared
From the some.zips data set, aggregate the mean of ED visits by zip code

Here is my code, but it is not working. I keep getting "Warning in abline(m) : only using the first two of 135 regression coefficients". Can someone help? Code below. Also, here is the dataset:

fromJSON("https://data.cityofnewyork.us/resource/2nwg-uqyg.json")

library(jsonlite)
library(tidyverse)
library(ALSM)
data(package="ALSM")

filtered_data = filter(er, emergency.visits > 20)

plot(ili_pne_visits~total_ed_visits,data=filtered_data,xlab="Total ER Visits",ylab="Influenza Visits")

m <-lm(ili_pne_visits~total_ed_visits,data=filtered_data)

abline(m)


无畏 2025-01-18 06:56:16


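Code-wise, this will do the job. The warning appears because the count columns come back from fromJSON() as character, so lm() treats total_ed_visits as a categorical predictor and estimates one coefficient per distinct value; abline() can only draw the first two. Converting the columns to numbers first fixes that:

library(jsonlite)
library(tidyverse)

df <- fromJSON("https://data.cityofnewyork.us/resource/2nwg-uqyg.json")

df %>%
    ## convert variables from character to numeric where appropriate:
    mutate(across(mod_zcta:ili_pne_admissions, ~ as.integer(.x))) %>%
    ## drop rows with fewer than 20 ED visits, as the task asks
    filter(total_ed_visits >= 20) %>%
    ## the task asks for ILI/pneumonia visits (not admissions) on the y axis
    ggplot(aes(x = total_ed_visits, y = ili_pne_visits)) +
    geom_point() +
    ## add regression line and confidence band
    geom_smooth(method = 'lm')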
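
For the base-R route the question started with (plot(), lm(), abline()), plus the R-squared the task asks for, here is a minimal sketch. The toy data frame is made up for illustration and stores its numbers as character on purpose, to mimic how the columns arrive from the API; the names toy, toy_num, m_chr, and r2 are placeholders, not part of the original code.

```r
## Toy data, made up for illustration; columns are character on purpose
toy <- data.frame(
    total_ed_visits = c("25", "40", "60", "80", "100", "15"),
    ili_pne_visits  = c("3", "5", "6", "9", "11", "1"),
    stringsAsFactors = FALSE
)

## With a character predictor, lm() creates one dummy variable per distinct value
m_chr <- lm(as.numeric(ili_pne_visits) ~ total_ed_visits, data = toy)
n_coef_chr <- length(coef(m_chr))   # intercept + one coefficient per extra level

## Convert to numeric first, then keep rows with at least 20 ED visits
toy_num <- data.frame(lapply(toy, as.numeric))
toy_num <- toy_num[toy_num$total_ed_visits >= 20, ]

## Simple linear regression: intercept and slope only
m <- lm(ili_pne_visits ~ total_ed_visits, data = toy_num)
n_coef <- length(coef(m))
r2 <- summary(m)$r.squared

plot(ili_pne_visits ~ total_ed_visits, data = toy_num,
     xlab = "Total ED visits", ylab = "ILI/pneumonia visits")
abline(m)   # draws cleanly, since the model has exactly two coefficients
legend("topleft", legend = paste("R-squared =", round(r2, 3)), bty = "n")
```

On the real data, the same steps would apply after the as.integer() conversion shown above; summary(m)$r.squared is the value the task refers to as R-squared.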

However, pouring all the data indiscriminately into one scatterplot and linear model hides interesting patterns such as seasonality. Plot the share of ILI/pneumonia visits in total ED visits against time, and the pattern shows up:

library(lubridate) ## for easy date-time-manipulation

df %>%
    ## convert variables from character to numeric where appropriate:
    mutate(
        across(mod_zcta:ili_pne_admissions, ~ as.integer(.x)),
        date = lubridate::as_datetime(date),
        ili_pne_share = ili_pne_visits / total_ed_visits
        ) %>% 
    filter(total_ed_visits >= 20) %>%
    arrange(date) %>%
    ggplot(aes(x = date, y = ili_pne_share)) + 
    geom_line() +
    geom_smooth(span = .1)
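
The last task in the question, the mean of ED visits by zip code, can be sketched with base R's aggregate(). The some.zips data set is not shown in the question, so the toy data frame below is a made-up stand-in, and reusing the column names mod_zcta and total_ed_visits for it is an assumption.

```r
## Toy stand-in for the zip-code data, made up for illustration
toy <- data.frame(
    mod_zcta        = c(10001, 10001, 10002, 10002, 10003),
    total_ed_visits = c(30, 50, 20, 40, 70)
)

## Mean of total ED visits per zip code (formula interface: value ~ group)
mean_by_zip <- aggregate(total_ed_visits ~ mod_zcta, data = toy, FUN = mean)
mean_by_zip
```

With the tidyverse already loaded, group_by(mod_zcta) followed by summarise(mean_ed = mean(total_ed_visits)) should give the same table; either way the columns need to be numeric first.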
