基于 SAS 中的 2 个变量合并数据集

发布于 2024-08-24 23:33:43 字数 767 浏览 8 评论 0原文

我正在使用不同的数据库。它们全部包含 1000 多家公司的信息。公司由其股票代码（通常在股票报价板上看到的名称的缩写（即福特为 F）定义）。

除了要合并的股票代码之外，我还必须按时间合并。我在整个时间序列中使用月份作为计数变量。的回归

最终目的是进行Y(jt) = c + X(jt) +X1(jt) 等与 j = company （股票代码）和t = 时间（月）。

想象一下我有两个数据库，其中一个是基本数据库，其中包含股票、月份、公司贝塔值（风险度量）等变量，第二个数据库包含额外变量（比如说市值）。

然后我想做的是根据股票代码和月份合并这两个数据库。

示例：基础数据库：

Ticker ____ Month ____ Betas

AA ____ 4 ____ 1.2

BB ____ 8 ____ 1.18

第二个数据库：

Ticker ____ Month ____ MCAP

AA ____ 4 ____ 8542

BB ____ 6 ____ 1245

然后合并后我希望得到这样的结果：

Ticker ____ Month ____ Betas ____ MCAP

AA ____ 4 ____ 1.2 ____ 8542

因此，所有与日期和股票代码都不匹配的观察结果都必须被删除。我确信这是可能的，只是找不到正确类型的代码。

PS：我猜下划线与字体布局有关，但粗体和斜体都应该是正常的:)

原文

I'm working with different databases. All of them contain information about 1000+ companies. A company is defined by its ticker code (the short version of the name (i.e. Ford as F) usually seen on stock quotation boards).

Aside from the ticker code to merge on I also have to merge on the time. I used month as a count variable throughout my time series. The final purpose is to have a regression in the kind of

Y(jt) = c + X(jt) +X1(jt) etc with j = company (ticker) and t = time (month).

So imagine I have 2 databases, one of which is the base database with variables such as Tickers, months, betas of a company (risk measure) etc. and a second database which has an extra variable (let's say market capitalisation).

What I want to do then is to merge these 2 databases based on the ticker and the month.

Example: Base database:

Ticker ____ Month ____ Betas

AA ____ 4 ____ 1.2

BB ____ 8 ____ 1.18

Second database:

Ticker ____ Month ____ MCAP

AA ____ 4 ____ 8542

BB ____ 6 ____ 1245

Then after merge I would like to have something like this:

Ticker ____ Month ____ Betas ____ MCAP

AA ____ 4 ____ 1.2 ____ 8542

So all observations that do not match BOTH the date and ticker have to be dropped. I'm sure this is possible, just can't find the right type of code.

PS: I'm guessing the underscores have something to do with font layout but both the bold as italic is supposed to be normal :)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

清旖 2024-08-31 23:33:43

同意乔纳森的观点...在按代码和时间对两个数据集进行独立排序后，合并的数据步骤就是我将使用的......稍加修改

data want; 
   merge base(in = b) mcap(in = m); 
   by ticker time;
   if m & b; 
run;

两个数据集中没有共同代码和时间的记录将被自动删除..

Agree with Jonathan... after sorting both datasets independently by ticker and time, the data step of merging is what I would use..... little modification

data want; 
   merge base(in = b) mcap(in = m); 
   by ticker time;
   if m & b; 
run;

Records that don't have common ticker and time in both datasets would be dropped automatically..

回复收藏 0 原文

被翻牌 2024-08-31 23:33:43

调用两个数据集 base 和 mcap，并假设它们都已按代码和月份排序，您可以这样做：

data want;
  merge base(in = b)
        mcap(in = m);
  if m & b;
run;

子集 if 不会接受任何与 Bath 数据集中不匹配的行。

Calling the two datasets base and mcap, and assuming that they have both been sorted by ticker and month, you can do it this way:

data want;
  merge base(in = b)
        mcap(in = m);
  if m & b;
run;

The subsetting if will not accept any row that does not match in bath datasets.

回复收藏 0 原文

心房的律动 2024-08-31 23:33:43

好吧，看来你可以很容易地做到这一点：

proc sort data=work;
by ticker month;
run;
proc sort data=wsize;
by ticker month;
run;
data test;
merge work(in=a) wsize(in=b);
by ticker month;
frommerg=a;
fromwtvol=b;
run;
data test;
set test;
if frommerg=0 then delete;
run;
data test;
set test;
if fromwtvol = 0 then delete;
run;
data test;
set test;
drop frommerg fromwtvol;
run;

这是我使用的代码，我在发布之前尝试过这个，因为我不想看起来像个水蛭，但碰巧我尝试的两个数据库没有任何共同点（ 70.000 个观察值的几率是多少 :D)，我重试了它并且它有效（现在！）

无论如何谢谢！

Ok so it appears you can just do it very easily by:

proc sort data=work;
by ticker month;
run;
proc sort data=wsize;
by ticker month;
run;
data test;
merge work(in=a) wsize(in=b);
by ticker month;
frommerg=a;
fromwtvol=b;
run;
data test;
set test;
if frommerg=0 then delete;
run;
data test;
set test;
if fromwtvol = 0 then delete;
run;
data test;
set test;
drop frommerg fromwtvol;
run;

That's the code I used, I tried this before posting because I didn't want to look like a leecher but it so happens that the 2 databases i tried had nothing in common (what are the odds with 70.000 observations :D), I retried it and it works (for now!)

Thanks anyway!

回复收藏 0 原文

爱冒险 2024-08-31 23:33:43

proc sort data=database1;
by ticker month;
run;

proc sort data=database2;
by ticker month;
run;
data gh;
merge database1(in=a) database2(in=b);
by ticker month;
if a and b;
run;

proc sort data=database1;
by ticker month;
run;

proc sort data=database2;
by ticker month;
run;
data gh;
merge database1(in=a) database2(in=b);
by ticker month;
if a and b;
run;

回复收藏 0 原文

~没有更多了~