从 SAS 到 SQL Server 的多个缺失类型
我有一个 SAS 数据集,我正在将其转换为 sql server 2005。在 sas 数据集中,有一些允许多个缺失值的数值变量。
例如,列 AGE 指定为
0-124 = 数字
。 = 失踪
.A = 无效
.B = 不可用
现在我想使用一个数字作为 sql server 数据类型并且需要维护数据的完整性。我的第一个想法是为 sas 的实际值创建一个参考表:
-1 = .A = Invalid
-2 = .B = 不可用
这看起来很混乱,因为每个年龄 1-124 都需要输入年龄。好奇是否有其他人遇到过这个问题并有更好的解决方案。
I have a SAS dataset that I am converting to sql server 2005. In the sas dataset there are numeric variables that allow for multiple missing values.
For example column AGE is specified as
0-124 = number
. = missing
.A = Invalid
.B = Unavailable
Now I want to use a number for the sql server datatype and need to maintain the integrity of the data. My first thought is to create a reference table for the actual value from sas:
-1 = .A = Invalid
-2 = .B = Unavailable
This seems messy since age would require entry for each age 1-124. Curious if anyone else has run across this and has a better solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我想大多数时候你不会关心为什么值会丢失,只是它们丢失了 - 但是通过将其编码为负整数,你必须确保每次执行任何操作时都将其过滤掉与该变量。这是倒退的:您应该做一些特殊的事情来合并缺失的信息,而不是运行基本的数学和统计。
与其尝试在 SQL Server 中重新创建它没有的数据类型,为什么不使用它所拥有的数据类型呢?我可能只是将缺失值作为 NULL 发送到 SQL Server,然后在第二个变量中对缺失类型进行编码。当您确实需要知道数据丢失的原因时,它仍然存在;但是如果您不这样做,SQL Server 将使用其默认的处理方法来处理变量中的缺失值。
I imagine that most of the time you're not going to care about why values are missing, just that they are missing - but by encoding it as negative integers, you'll have to be sure to filter those out every time you do anything with that variable. That's backwards: you should do something special to incorporate the kind-of-missingness information, not to run basic math and statistics.
Rather than trying to recreate in SQL Server a data type that it doesn't have, why not work with what it's got? I would probably just send the missing values as NULL to SQL Server and then encode the type of missingness in a second variable. When you actually need to know why data are missing, it's still there; but when you don't, SQL Server will handle the missing values in your variable using its default methods for dealing with them.
您可以使用某种格式将缺失值重新编码为负值,同时保持非缺失值不变。
You could use a format to recode the missings to negatives while leaving the non-missings unchanged.