How do I make Stata report zeros in a tabulation?
I'm trying to use the tabulate command in Stata to create a time series of frequencies. The problem arises when I try to combine the output of tabulate after running it for each date: tabulate does not include 0 as an entry when no observation exists for a value of the variable in question. For instance, if I wanted to count the 10-, 11- and 12-year-olds in a class over a three-year period, Stata might output (8) if only one of the groups were represented, and then we don't know which group the 8 students belong to: it could be (8,0,0), (0,8,0) or (0,0,8).
This is not a problem if the time series is short, as the Results window shows which categories are or are not represented. My data cover a much longer time series. Does anyone know of a solution or method that forces Stata to include zeros in these tabulations? The relevant part of my code follows:
# delimit;
set more off;
clear;
matrix drop _all;
set mem 1200m;
cd ;
global InputFile "/Users/.../1973-2010.dta";
global OutputFile "/Users/.../results.txt";
use $InputFile;
compress;
log using "/Users/.../log.txt", append;
gen yr_mn = ym(year(datadate), month(datadate));
la var yr_mn "Year-Month Date";
xtset, clear;
xtset id datadate, monthly;
/*Converting the Ratings Scale to Numeric*/;
gen LT_num = .;
replace LT_num = 1 if splticrm=="AAA";
replace LT_num = 2 if (splticrm=="AA"||splticrm=="AA+"||splticrm=="AA-");
replace LT_num = 3 if (splticrm=="A"||splticrm=="A+"||splticrm=="A-");
replace LT_num = 4 if (splticrm=="BBB"||splticrm=="BBB+"||splticrm=="BBB-");
replace LT_num = 5 if (splticrm=="BB"||splticrm=="BB+"||splticrm=="BB-");
replace LT_num = 6 if (splticrm=="B"||splticrm=="B+"||splticrm=="B-");
replace LT_num = 7 if (splticrm=="CCC"||splticrm=="CCC+"||splticrm=="CCC-");
replace LT_num = 8 if (splticrm=="CC");
replace LT_num = 9 if (splticrm=="SD");
replace LT_num = 10 if (splticrm=="D");
summarize(yr_mn);
local start = r(min);
local finish = r(max);
forv x = `start'/`finish' {;
qui tab LT_num if yr_mn == `x', matcell(freq_`x');
};
log close;
Comments (3)
What you want is not an option of the tab command. If you want to display the results on screen, you might be able to use table ..., missing successfully.
Instead of the loop, you could try the following, which I think will work for your purposes:
I think that is what you're after (though I can't tell how you use the matrices you are trying to generate).
P.S. I prefer to use the inlist function for things like:
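A minimal sketch of both suggestions, not the commenter's original code. It assumes the default line delimiter (no #delimit ;), a small made-up dataset, and that contract with its zero option is an acceptable substitute for the tabulate loop. Note that zero only fills in combinations of values that occur somewhere in the data; a rating that never appears at all gets no column.

```stata
* Toy data (hypothetical): one row per firm-month rating
clear
input yr_mn str4 splticrm
600 "AAA"
600 "AA+"
600 "AA-"
601 "BBB"
601 "BBB"
602 "AAA"
end
format yr_mn %tm

* Recode with inlist() instead of chained == comparisons
gen LT_num = .
replace LT_num = 1 if splticrm == "AAA"
replace LT_num = 2 if inlist(splticrm, "AA+", "AA", "AA-")
replace LT_num = 3 if inlist(splticrm, "A+", "A", "A-")
replace LT_num = 4 if inlist(splticrm, "BBB+", "BBB", "BBB-")

* One row per month-rating combination; zero keeps the empty cells,
* nomiss drops observations with a missing rating
contract yr_mn LT_num, zero nomiss

* One row per month, one _freq column per rating that occurs in the data
reshape wide _freq, i(yr_mn) j(LT_num)
list, noobs
```

The result is a dataset rather than a set of matrices, so each month's counts line up by column without any bookkeeping.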
This problem is addressed by tabcount. See the 2003 paper http://www.stata-journal.com/article.html?article=pr0011 and download the program code and help files after getting a link with search tabcount.
This is the solution that I used. Keith's is probably better, and I will explore his solution in the future.
I saved the row labels (using matrow) in a vector and used it as an index into a matrix of the correct dimensions, initialized to zero. That way I could place each frequency into the matrix at the correct position and keep all of the zeros. The solution follows the above code after "local finish = r(max)". [Note that I include a counter to skip the first observations, which are empty for this variable.]
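The matrow approach described above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's code: it uses the default line delimiter, made-up data, hypothetical matrix names (counts, freq, rowval), and assumes LT_num takes integer values from 1 to 10. Instead of the counter the poster mentions, it skips empty months by checking r(N).

```stata
* Toy data (hypothetical): month and numeric rating
clear
input yr_mn LT_num
600 1
600 2
600 2
601 4
601 4
602 1
end

local ncat = 10                       // assumed number of rating categories
summarize yr_mn, meanonly
local start  = r(min)
local finish = r(max)
local nper   = `finish' - `start' + 1

* One row per month, one column per rating, all cells start at zero
matrix counts = J(`nper', `ncat', 0)

forvalues x = `start'/`finish' {
    quietly tabulate LT_num if yr_mn == `x', matcell(freq) matrow(rowval)
    if r(N) == 0 {
        continue                      // no observations this month: row stays zero
    }
    local i  = `x' - `start' + 1
    local nr = rowsof(rowval)
    forvalues k = 1/`nr' {
        local col = rowval[`k', 1]    // rating value observed in this month
        local n   = freq[`k', 1]      // its frequency
        matrix counts[`i', `col'] = `n'
    }
}
matrix list counts
```

Because each frequency is written to the column named by its rating value, every month's row has the same layout regardless of which ratings were observed.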