从 SAS 到 SQL Server 的多个缺失类型

发布于 2024-10-14 11:12:40 字数 332 浏览 5 评论 0原文

我有一个 SAS 数据集,我正在将其转换为 sql server 2005。在 sas 数据集中,有一些允许多个缺失值的数值变量。

例如,列 AGE 指定为

0-124 = 数字
。 = 失踪
.A = 无效
.B = 不可用

现在我想使用一个数字作为 sql server 数据类型并且需要维护数据的完整性。我的第一个想法是为 sas 的实际值创建一个参考表:

-1 = .A = Invalid
-2 = .B = 不可用

这看起来很混乱,因为每个年龄 1-124 都需要输入年龄。好奇是否有其他人遇到过这个问题并有更好的解决方案。

I have a SAS dataset that I am converting to sql server 2005. In the sas dataset there are numeric variables that allow for multiple missing values.

For example column AGE is specified as

0-124 = number
. = missing
.A = Invalid
.B = Unavailable

Now I want to use a number for the sql server datatype and need to maintain the integrity of the data. My first thought is to create a reference table for the actual value from sas:

-1 = .A = Invalid
-2 = .B = Unavailable

This seems messy since age would require entry for each age 1-124. Curious if anyone else has run across this and has a better solution.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

清君侧 2024-10-21 11:12:40

我想大多数时候你不会关心为什么值会丢失,只是它们丢失了 - 但是通过将其编码为负整数,你必须确保每次执行任何操作时都将其过滤掉与该变量。这是倒退的:您应该做一些特殊的事情来合并缺失的信息,而不是运行基本的数学和统计。

与其尝试在 SQL Server 中重新创建它没有的数据类型,为什么不使用它所拥有的数据类型呢?我可能只是将缺失值作为 NULL 发送到 SQL Server,然后在第二个变量中对缺失类型进行编码。当您确实需要知道数据丢失的原因时,它仍然存在;但是如果您不这样做,SQL Server 将使用其默认的处理方法来处理变量中的缺失值。

I imagine that most of the time you're not going to care about why values are missing, just that they are missing - but by encoding it as negative integers, you'll have to be sure to filter those out every time you do anything with that variable. That's backwards: you should do something special to incorporate the kind-of-missingness information, not to run basic math and statistics.

Rather than trying to recreate in SQL Server a data type that it doesn't have, why not work with what it's got? I would probably just send the missing values as NULL to SQL Server and then encode the type of missingness in a second variable. When you actually need to know why data are missing, it's still there; but when you don't, SQL Server will handle the missing values in your variable using its default methods for dealing with them.

守不住的情 2024-10-21 11:12:40

您可以使用某种格式将缺失值重新编码为负值,同时保持非缺失值不变。

proc format;
  value chgmiss
    .=.
    .a=-1
    .b=-2
    .c=-3
;
run;

data a;
input original;
new=put(original, chgmiss3.);
put original=  new=;
datalines;
.
9
102
3
2
15
90
.a
.b
.
78
;
run;

You could use a format to recode the missings to negatives while leaving the non-missings unchanged.

proc format;
  value chgmiss
    .=.
    .a=-1
    .b=-2
    .c=-3
;
run;

data a;
input original;
new=put(original, chgmiss3.);
put original=  new=;
datalines;
.
9
102
3
2
15
90
.a
.b
.
78
;
run;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文