SAS Frequently Asked Questions
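Q: How can the unordered variables of an R x C table be entered with loop statements? For example:
Blood type    A    B    O   AB
Acute        58   49   59   18
Chronic      43   27   33    8
A: They are entered exactly as ordered variables would be. Because the counts are typed row by row, the outer loop runs over the two rows and the inner loop over the four columns. The rough structure is:
for a=1 to 2
  for b=1 to 4
    input x@@
  next
next
…
This is not written in SAS syntax, but the idea should be clear.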
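The loop structure above could look roughly like this in actual SAS syntax. This is only a sketch: the names blood, disease, bloodtype and count are invented for illustration, and the counts are assumed to be typed row by row, so the outer loop runs over the rows.

```sas
/* Sketch only: blood, disease, bloodtype and count are made-up names. */
data blood;
  do disease = 1 to 2;          /* 1 = acute, 2 = chronic */
    do bloodtype = 1 to 4;      /* 1 = A, 2 = B, 3 = O, 4 = AB */
      input count @@;           /* @@ holds the line so the next value is read from it */
      output;                   /* one observation per table cell */
    end;
  end;
  cards;
58 49 59 18
43 27 33 8
;
run;

/* Treat each cell count as a frequency weight and test for association. */
proc freq data=blood;
  weight count;
  tables disease*bloodtype / chisq;
run;
```

The WEIGHT statement is what lets PROC FREQ rebuild the 2 x 4 table from eight observations rather than from one record per patient.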
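Q: How can data files in other formats, such as DBF files, be read in?
A: SAS for Windows offers a menu-driven route: make the Program Editor (PGM) window the active window, then choose Import from the File menu. The SAS Import Wizard appears; just follow its prompts. In older versions of SAS, PROC DBF can be used to bring in dBASE III files.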
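Both routes can also be written as code. The sketch below is illustrative only: the file path and the data set names are hypothetical, and PROC IMPORT with DBMS=DBF assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Sketch only: the path and the work.* data set names are hypothetical. */
filename dbfin 'c:\data\study.dbf';

/* Older releases: PROC DBF converts a dBASE III file to a SAS data set. */
proc dbf db3=dbfin out=work.study;
run;

/* Programmatic counterpart of the Import Wizard
   (assumes SAS/ACCESS Interface to PC Files is available). */
proc import datafile='c:\data\study.dbf'
            out=work.study2
            dbms=dbf
            replace;
run;
```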
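Q: After SAS has been running for a while, submitted programs are merely echoed line by line in the LOG window; they do not execute, and no NOTE or ERROR appears. What is going on?
A: I am not sure of the exact cause. One thing worth checking is an unbalanced quotation mark or an unclosed /* comment in previously submitted code, which makes SAS treat everything that follows as part of a string or comment. Otherwise it is probably a problem in SAS itself. (If you use a pirated copy, you will run into this often. SAS's copy protection is very good: I once saw a pirated copy that could run only three PROCs per session before it gave up. But I use a licensed copy and still see the problem occasionally.) The fix is to save your work, close SAS, reopen it, and carry on.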
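Q: Why do all the horizontal lines in tables produced by SAS turn into the character "傻"?
A: There are two answers to this question.
1. The problem lies in the configuration file CONFIG.SAS, located in the SAS root directory. In it you will find the table-character setting, whose default is -FORMCHAR 們剠唶垑妺?=|-/\* (these are the Western line-drawing characters, which show up as garbage in a Chinese environment). Comment that line out with a /* */ pair, then find the line below it, /* -FORMCHAR |----|+|---+=|-/\* */, and delete the comment markers on both sides. Table lines will display normally from then on.
2. It shows how little regard SAS Institute has for its Chinese users. How can such a large software company, with so many Chinese users, fail to notice a problem this small? Could it not produce a Chinese-language version of SAS?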
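If editing CONFIG.SAS is not convenient, the same setting can be tried for the current session with an OPTIONS statement. This is a sketch under one assumption: the character string below is the plain-ASCII FORMCHAR value given in the SAS documentation, which is slightly longer than the string shown in the configuration line above.

```sas
/* Sketch: use plain-ASCII table characters for the current session only. */
options formchar="|----|+|---+=|-/\<>*";

/* Any tabular output can be used to check the result;
   sashelp.class is a sample data set shipped with SAS. */
proc freq data=sashelp.class;
  tables sex*age;
run;
```

Unlike the configuration-file change, this has to be resubmitted in every new session.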
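Q: How are dummy variables created in SAS?
A: To use dummy variables in SAS you must create new variables in a data step, mainly by means of logical tests. They can be built directly with IF statements, but that is tedious; a simpler way is shown below.
Example: suppose "treat" takes three character values, A, B and C. To create dummy variables, the data step is written as follows: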
data sample;
input treat $ @@;
treata=(treat='A');
treatb=(treat='B');
cards;
A B A C C B A B A
;
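This creates two dummy variables, TREATA and TREATB. Observations with treat='C' have both set to 0, so C serves as the reference category.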
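A quick way to confirm the coding is to cross-tabulate the original variable against the new ones. The sketch below repeats the example data step so that it runs on its own; the only addition is the PROC FREQ check.

```sas
/* Same example data as above, repeated so this sketch runs on its own. */
data sample;
  input treat $ @@;
  treata = (treat = 'A');   /* a comparison evaluates to 1 if true, 0 if false */
  treatb = (treat = 'B');
  cards;
A B A C C B A B A
;
run;

/* Cross-check: each level of treat should map to exactly one
   (treata, treatb) combination. */
proc freq data=sample;
  tables treat*treata*treatb / list;
run;
```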
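Q: How are rank-sum tests carried out in SAS?
A: A common way of doing rank-based tests in SAS differs a little from what we usually learn (a world-class statistics package is entitled to a few quirks). The data are first ranked according to the intended analysis (independent groups, randomized blocks, and so on); once every measurement has its rank, the ordinary parametric analysis is run directly on the ranks. As a result, SAS output for a rank test is sometimes not quite the same as a hand calculation. Which one is right? Both can be justified, but it is best to go with your supervisor's opinion. (For independent groups, PROC NPAR1WAY with the WILCOXON option also computes the Wilcoxon rank-sum or Kruskal-Wallis test directly.)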
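The rank-then-analyze approach might be sketched as follows for two independent groups. Everything here is illustrative: the data are simulated, and the names twogroup, ranked, group, y and ry are hypothetical.

```sas
/* Sketch only: simulated data; all names are hypothetical. */
data twogroup;
  do group = 1 to 2;
    do i = 1 to 10;
      y = rannor(2024) + 0.8*(group = 2);   /* group 2 is shifted upward */
      output;
    end;
  end;
  drop i;
run;

/* Step 1: replace each measurement by its rank
   (tied values receive the mean rank by default). */
proc rank data=twogroup out=ranked;
  var y;
  ranks ry;
run;

/* Step 2: ordinary parametric analysis, applied to the ranks. */
proc glm data=ranked;
  class group;
  model ry = group;
run;
quit;

/* For comparison: the rank-sum test computed directly. */
proc npar1way data=twogroup wilcoxon;
  class group;
  var y;
run;
```

Comparing the p-values from the last two steps shows how close the analysis on ranks comes to the textbook rank-sum test.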
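Q: How can a conditional logistic regression model for a (1:r) matched case-control study be fitted in SAS?
A: SAS (at least in the 6.12 release discussed in this FAQ) cannot fit this model directly, but it offers several workarounds:
1. Transform the data. The example below comes from the SAS Sample Library and can also be found directly in the SAS online help. Note that the data transformation it uses is fairly difficult, so make sure you understand the example thoroughly.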
/****************************************************************/
/* S A S S A M P L E L I B R A R Y */
/* */
/* NAME: LOGIOFF */
/* TITLE: Special Case of the Conditional Logistic Regression */
/* PRODUCT: STAT */
/* SYSTEM: ALL */
/* KEYS: logistic regression analysis, */
/* PROCS: LOGISTIC */
/* DATA: */
/* REF: Breslow N. (1982) "Covariance adjustment of */
/* relative-risk estimates in matched studies," */
/* Biometrics, 38, 661-683 */
/* MISC: */
/* */
/****************************************************************/
/*---------------------------------------------------------------
Example: 1:M Matching - A Special Case
Consider the special case that (i) there is only one case in
each matched set and the number of controls is the same for all
sets, and (ii) there is only a single binary covariable with
relative risk exp(beta). The Samsu consumption data in Breslow
(1982) contain 80 matched sets. Each matched set has one case
and four controls. The distribution according to Samsu exposure
is given in the following table:

                Number Exposed (Case + Controls)
Case              0     1     2     3     4     5
Exposed           .     5    19    10     6     0
Not Exposed      10    15     8     7     0     .
Total            10    20    27    17     6     .

Let n_0m be the number of sets in which exactly m controls are
exposed and the case is not; and let n_1m be the number of sets
in which the case and m-1 controls are exposed, so that in both
kinds of set m of the five subjects are exposed. The likelihood
of the conditional logistic regression model is proportional to
the product of the likelihoods of the binomial distributions
B(N_m, theta_m), where
   N_m = n_0m + n_1m
and
   theta_m = m * exp(beta) / [m * exp(beta) + 5 - m]
Since logit(theta_m) = log(m/(5-m)) + beta, beta can be
estimated as a parameter in a logistic regression model with
no intercept and an offset value of log(m/(5-m)).
---------------------------------------------------------------*/
data samsu;
input m r n;
samsu = 1;
off= log(m/(5-m));
cards;
1 5 20
2 19 27
3 10 17
4 6 6
;
run;
proc logistic data=samsu;
model r/n = samsu / offset=off noint;
run;
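2. Use the CATMOD procedure.
3. Use the PHREG procedure. This method also works for m:n matching and is described on page 831 of the SAS/STAT manual. It is the recommended approach, because the data layout does not have to be transformed and the method is easier to understand. The following example was contributed by a reader, Luo Jun: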
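The data come from an epidemiological survey studying the relationship between case status (case) and factors such as smoking (smoke) and drinking (drink). In the design, a serial number (xh) identifies each matched set of cases and controls (records with the same xh belong to one set). PHREG needs a survival-type response, so the code below assumes that case is coded 1 for cases and 0 for controls and that a dummy time variable has been created in an earlier data step as time = 2 - case; each case then "fails" at time 1 while its controls are censored at time 2. Part of the program is:
proc phreg;
  model time*case(0) = smoke drink / ties=discrete;
  strata xh;
run;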
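PS: SPSS, too, can fit only binary and multinomial logistic regression models. For this kind of work we strongly recommend getting hold of Stata, whose logistic modelling features are extremely powerful. If you cannot find it, download the one-month trial version of EGRET (22 MB; the address is on the software introduction page), which also fully meets the need.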
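Q: How can ridge regression be done in SAS 6.12?
A: The REG procedure can perform ridge regression: add the RIDGE=values option to the PROC REG statement, together with OUTEST= so that the ridge estimates are written to a data set. The RSREG (quadratic response surface regression) procedure also has a RIDGE statement, but that statement performs ridge analysis of the fitted response surface (tracing the ridge of maximum or minimum response), which is a different technique from ridge regression for collinear predictors. For details of usage, consult the relevant reference books.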
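A minimal sketch of the PROC REG route is given below. The data are simulated purely to produce collinear predictors; the data set names collin and ridge_est, the variable names, and the grid of ridge constants are all hypothetical choices.

```sas
/* Sketch only: simulated collinear data; all names are hypothetical. */
data collin;
  do i = 1 to 50;
    x1 = rannor(12345);
    x2 = x1 + 0.05*rannor(12345);   /* nearly a copy of x1, hence collinear */
    x3 = rannor(12345);
    y  = 1 + 2*x1 + x3 + rannor(12345);
    output;
  end;
  drop i;
run;

/* RIDGE= lists the ridge constants to try; the estimates for each
   constant go to the OUTEST= data set, and OUTVIF adds the
   variance inflation factors. */
proc reg data=collin outest=ridge_est outvif ridge=0 to 0.1 by 0.01;
  model y = x1 x2 x3;
run;
quit;

proc print data=ridge_est;
run;
```

Inspecting how the coefficients and variance inflation factors in ridge_est change with the ridge constant is the usual basis for choosing a value.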
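Q: How can partial least squares regression be done in SAS 6.12?
A: This one is too hard; I do not know how either. However, I found the following example in the SAS Sample Library and hope it is of some help.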
/****************************************************************/
/* S A S S A M P L E L I B R A R Y */
/* */
/* NAME: PLSEG1 */
/* TITLE: Partial Least Squares Analysis */
/* PRODUCT: STAT */
/* SYSTEM: ALL */
/* KEYS: PLS, regression analysis, chemometrics */
/* PROCS: PLS */
/* DATA: */
/* */
/* REF: */
/* MISC: */
/* */
/****************************************************************/
/*
/ A certain chemical process involves five different reactions. For
/ 20 different runs of the process, reaction time, temperature, and
/ pressure as well as chemical yield are observed for each
/ substituent reaction. The following data step reads in the data.
/---------------------------------------------------------------------*/
data process;
input time1-time5 temp1-temp5 pres1-pres5 yield1-yield5;
cards;
6.4 3.6 4.0 6.2 4.4
7 11 48 19 7
.19 .48 .17 .09 .15 37.9 99.6 88.9 54.3 73.4
7.9 7.1 7.7 9.9 5.5
22 33 60 54 34
.46 .74 .51 .26 .43 62.9 159.8 130.5 88.3 106.3
3.2 1.1 1.6 2.5 2.1
5 12 25 0 6
.04 .21 .05 .00 .03 17.4 44.6 36.6 21.5 34.2
4.3 2.6 2.2 4.3 3.0
0 0 31 20 0
.12 .32 .06 .11 .08 25.9 67.0 57.8 38.4 54.0
5.5 7.4 8.7 9.3 3.7
39 62 43 62 63
.55 .66 .73 .30 .58 62.6 146.5 117.1 83.3 84.4
2.1 2.8 2.2 3.3 1.6
5 9 15 32 12
.17 .23 .15 .18 .15 18.7 50.8 36.3 25.7 36.5
6.2 6.8 8.7 9.2 4.1
39 65 49 49 61
.51 .67 .71 .22 .54 59.2 150.7 116.2 84.4 88.2
4.2 2.5 2.9 4.2 2.8
8 14 32 15 11
.14 .33 .15 .07 .12 26.6 67.9 60.5 40.6 51.3
7.4 5.8 6.9 8.7 5.0
23 36 57 36 32
.37 .65 .45 .16 .36 55.1 143.3 114.1 80.5 97.9
5.8 4.4 5.5 6.8 3.9
20 34 45 26 29
.28 .51 .36 .11 .28 41.2 110.1 88.0 56.9 74.3
7.1 7.3 8.7 9.9 4.8
34 54 55 55 52
.52 .73 .66 .26 .52 60.7 161.2 127.1 90.4 100.2
7.7 5.7 8.0 9.0 5.0
33 57 61 28 47
.40 .68 .57 .10 .42 61.7 147.7 120.0 82.9 96.7
4.4 4.3 4.5 5.9 3.1
14 22 33 36 23
.28 .43 .30 .19 .26 38.4 96.5 73.6 50.2 63.7
1.1 2.8 1.6 2.8 1.0
3 3 7 40 10
.18 .18 .12 .23 .14 16.8 38.9 34.8 21.2 23.7
7.6 6.6 7.1 9.5 5.3
20 30 58 48 30
.42 .70 .46 .23 .39 59.3 152.1 123.6 82.9 103.0
0.0 0.0 0.0 0.0 0.0
3 9 0 4 8
.00 .00 .00 .03 .00 0.0 0.0 0.0 0.0 0.0
2.9 1.7 2.1 2.9 2.0
7 14 22 10 11
.10 .23 .11 .05 .08 21.9 48.0 37.5 24.9 37.7
4.1 2.5 3.6 4.2 2.6
15 29 32 10 22
.16 .33 .23 .04 .16 23.3 68.6 57.7 39.1 48.4
2.4 3.0 3.0 3.8 1.7
12 20 18 31 22
.21 .27 .23 .16 .20 24.8 58.2 47.9 30.5 36.3
3.8 1.9 3.0 3.6 2.4
13 26 30 4 19
.11 .29 .18 .01 .12 23.6 62.8 50.1 33.2 47.6
;
/*
/ You can use the method of partial least squares to model the
/ yields as a function of all the reaction variables. The following
/ statements print a table which summarizes how much variation each
/ PLS component accounts for.
/---------------------------------------------------------------------*/
proc pls data=process;
model yield1-yield5 = time1-time5 temp1-temp5 pres1-pres5;
run;
/*
/ Notice that the percentage of variation in Y accounted for by the
/ PLS analysis doesn't change very much after the first few
/ components. You can use the CV=ONE option to select the number of
/ components by cross-validation.
/---------------------------------------------------------------------*/
proc pls data=process cv=one;
model yield1-yield5 = time1-time5 temp1-temp5 pres1-pres5;
run;
/*
/ While three PLS components give the absolute minimum predicted
/ residual sum of squares of 0.76 for the cross-validation, this
/ isn't very different from the PRESS of 0.82 for only two PLS
/ components. You can use the CVTEST option to test for the
/ significance of this difference.
/---------------------------------------------------------------------*/
proc pls data=process cv=one cvtest;
model yield1-yield5 = time1-time5 temp1-temp5 pres1-pres5;
run;
/*
/ The result is that three PLS components don't explain
/ significantly more variation than just two.
/---------------------------------------------------------------------*/
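Given that conclusion, a natural last step would be to refit the model with exactly two components. The sketch below reuses the process data set created by the data step earlier in this example, so it is not runnable on its own. NFAC= is the option name used in current SAS/STAT documentation for the number of factors; whether the 6.12 release spells it the same way is an assumption to check against your own manual.

```sas
/* Sketch only: assumes the process data set from the example above
   already exists, and that NFAC= is supported by your release. */
proc pls data=process nfac=2;
  model yield1-yield5 = time1-time5 temp1-temp5 pres1-pres5;
run;
```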
This article comes from the ChinaUnix blog. To view the original, go to: http://blog.chinaunix.net/u2/69783/showart_698417.html