统计211

25 Tips Every SAS Programming Expert Should Read
Posted on 2011-6-18 18:48:01
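       S Users. An introduction to the article and its author is available here. The tips looked fun, so I have added a translated note to each one, with more detailed notes on the suggestions about SAS software quality management and views: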
  1. After running a SAS program, immediately review the SAS log for notes, warnings, and error messages. Avoid turning off SAS System options that turn off SAS log notes, messages, and warnings.
  After running SAS code, check the log file right away.
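  2. Turn on the SOURCE2 SAS System option to display included source code on the log. Best practice coding techniques should mandate inclusion and display of any and all information that is available during a SAS session.
  Turn on the SOURCE2 SAS system option. There is a subtlety here. SAS code is usually submitted in one of two ways. The first is to write or open it in the SAS editor and run it; whether the source code appears in the log is then controlled by the system option SOURCE, whose default value is 1. The second is to submit it with a statement such as %include "test.sas"; whether the source of test.sas appears in the log is then controlled by the system option SOURCE2, whose default value is 0. To turn SOURCE2 on, open Tools-Options-System, go to Options-log and procedure output control-SAS log, find SOURCE2, and change its Value to 1.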
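For tip 2, the same option can also be set in code with an OPTIONS statement instead of through the menus. A minimal sketch, assuming the file test.sas from the note above exists in the current directory:

```sas
/* Echo both submitted code and %INCLUDEd code to the log. */
options source source2;

/* test.sas is the file name used in the note above;
   adjust the path to point at your own file. */
%include "test.sas";
```

With SOURCE2 in effect, the lines read from test.sas should appear in the log alongside the notes they generate.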
  3. Consider procedures like PROC SQL and PROC REPORT for code simplification. Because multiple processes can frequently be accomplished in a single procedure step, I/O may be reduced.
  To simplify code, consider using PROC SQL or PROC REPORT. The author wrote the best-selling SAS book PROC SQL: Beyond the Basics Using SAS. SAS is a huge toolbox that supports many programming styles. The choice should be pragmatic: use whichever works best.
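As an illustration of tip 3, the sketch below filters, summarizes, and sorts in a single PROC SQL step, work that would otherwise take a DATA step, a summary procedure, and PROC SORT. It uses the SASHELP.CLASS sample data set that ships with SAS; the output table name class_summary is made up for the example.

```sas
proc sql;
  /* Subset, aggregate, and order in one step. */
  create table work.class_summary as
  select sex,
         count(*)    as n_students,
         avg(height) as avg_height format=6.1
  from sashelp.class
  where age >= 12
  group by sex
  order by sex;
quit;
```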
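  4. When a DATA step or PROC can do the same job, consider using procedures whenever possible. Procedures are tried and proven throughout the world's SAS installations, so testing requirements are considerably less.
  If a DATA step and a PROC step can do the same job, prefer the PROC step. SAS's built-in procedures have been tested many times by its developers and are generally more robust than what we write ourselves.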
  5. Create user-defined format libraries to store formatted values in one place. User-defined format libraries have the added advantage of making programs easier to maintain since formatted data values are not hard coded.
  Build a library in one place to hold all user-defined formats.
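A rough sketch of tip 5 follows. The libref fmtlib, its path, and the format name $sexfmt are placeholders invented for the example; only SASHELP.CLASS is a real data set.

```sas
/* Hypothetical location of the shared format catalog. */
libname fmtlib "/path/to/formats";

/* Store the format once, in fmtlib.formats. */
proc format library=fmtlib;
  value $sexfmt 'F'   = 'Female'
                'M'   = 'Male'
                other = 'Unknown';
run;

/* Tell SAS to search the shared catalog when resolving formats. */
options fmtsearch=(fmtlib);

proc freq data=sashelp.class;
  tables sex;
  format sex $sexfmt.;
run;
```

Any program that sets the same LIBNAME and FMTSEARCH can then use $sexfmt. without redefining it.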
  6. Include RUN statements at the end of each DATA or PROC step (to separate step boundaries) to print benchmark statistics on the SAS log immediately following each step.
  Add a RUN statement after every DATA step and PROC step.
  7. Document programs and routines with comments. In addition to the value associated with explaining program logic, comments should provide important information about complex code and logic conditions in a program. This helps to document important program processes and minimizes the learning curve associated with program maintenance and enhancement for other users.
  Get into the habit of commenting your code, especially its logic and algorithms.
  8. Assign descriptive and meaningful variable names. Besides improving the readability of program code, this serves as an important form of documentation.
  Get into the habit of naming variables well. Choose names that are clear at a glance rather than names like var1 and var2.
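  9. Construct program header information to serve as program documentation for all programs. The following example illustrates the type of information that can be added so others have a useful documented history.
  Software project management may seem to concern only C++ and Java, but in a large SAS development effort or application system it matters just as much. This tip is about code documentation. The code comments in tip 7 are one form of documentation. The program header is the block of guidance comments at the top of every source file, written for other programmers or for clients; the familiar "flower box" comment is one example. A document called A Programming Development Environment for SAS Programs is a useful reference and can be found here.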
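The example from the original article is not reproduced in this post. As a stand-in, here is one possible "flower box" header; the field names are my own choice, not the original author's, and the parenthesized entries are placeholders to fill in.

```sas
/*****************************************************************
* Program name : (file name of this program)
* Purpose      : (what the program does)
* Author       : (name)
* Date written : (date)
* Input        : (data sets or files read)
* Output       : (data sets, reports, or files produced)
* Modification history:
*   Date        Programmer   Description
*   ----------  -----------  ----------------------------------
*****************************************************************/
```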
  10. Simplify complex code and operations into smaller, more manageable parts. By splitting complex code into two or more programming statements, a program becomes easier to read as well as more maintainable.
  Break complex code into small, manageable blocks.
  11. Specify SAS data set names when invoking procedures to help improve documentation efforts as well as to prevent an incorrect data set from being processed.
  When invoking a PROC step, name the data set it should use: write proc print data=a; rather than proc print;.
  12. Utilize macros for redundant code and enable autocall processing by specifying the MAUTOSOURCE system option.
  Use macros to manage your code. Turn on the MAUTOSOURCE system option so that macros can be called automatically (this is the default).
  13. Create macro libraries to store common macro routines in one place.
  Build a library in one place to hold all related macro files.
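Tips 12 and 13 work together: an autocall library is a folder with one macro per file, and the SASAUTOS option tells SAS where to look. A minimal sketch, in which the folder path and the macro name printtop are hypothetical:

```sas
/* Search the hypothetical macro folder first, then the
   autocall libraries supplied with SAS. */
options mautosource sasautos=("/path/to/macros" sasautos);

/* Contents of /path/to/macros/printtop.sas
   (the file name matches the macro name): */
%macro printtop(data=, n=5);
  proc print data=&data (obs=&n);
  run;
%mend printtop;

/* Once the file is in the autocall folder, a call like this
   compiles the macro on first use. */
%printtop(data=sashelp.class, n=3)
```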
  14. Create permanent libraries containing information from daily, weekly, monthly, quarterly, and annual runs. These libraries consist of scripts, SAS programs, SAS logs, output lists, and documentation of instructions for others to follow.
  For routine code that runs daily, weekly, monthly, quarterly, and annually, set up separate permanent libraries to hold the related information.
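  15. Create views based on user input to simplify and streamline redundant, complex and/or burdensome tasks. Consider creating views in a central view library to support maintenance and documentation requirements.
  A view is a database concept. Put simply, suppose you wrote ten lines of SQL to run a query, and your boss (or client) is interested in the result. Whenever your boss wants to see that result again, you could have him write or run those ten lines every time. Alternatively, you can turn the query result into a virtual table. It is virtual because the table does not actually exist in the database; only the ten lines of SQL do. It is a table because running the script generates the table on the fly. This virtual table is called a view. Once you have packaged all of this as a view, your boss only has to say "view, open sesame" to get the result. Views have many benefits; here, they simplify both the boss's understanding of the data and the steps he has to perform.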
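To make the note on tip 15 concrete, here is a small PROC SQL view. The view name tall_students and the height cutoff are invented for the example; SASHELP.CLASS is the sample data set that ships with SAS.

```sas
/* Store the query, not its result. */
proc sql;
  create view work.tall_students as
  select name, sex, age, height
  from sashelp.class
  where height > 60;
quit;

/* The "boss" only needs this one step; the query runs
   against the current data each time the view is read. */
proc print data=work.tall_students;
run;
```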
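  16. Code for unknown data values. This will prevent unassigned or null data values from falling through logic conditions.
  Handle unknown data values explicitly in code, for example by representing them with a code such as 99999999. Null values that are left unhandled can cause unexpected logic errors.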
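One way tip 16 plays out in a DATA step: a missing numeric value in SAS compares as smaller than any number, so without an explicit check it would fall into the first "less than" branch. The group labels and cutoffs below are made up for illustration.

```sas
data work.class_grouped;
  set sashelp.class;
  length age_group $8;
  /* Test for missing first; otherwise a missing age would
     satisfy age < 13 and be labeled 'Child'. */
  if missing(age)  then age_group = 'Unknown';
  else if age < 13 then age_group = 'Child';
  else if age < 20 then age_group = 'Teen';
  else                  age_group = 'Adult';
run;
```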
  17. Store informats, formats, and labels with the SAS data sets that use them. Informats, formats, and labels should be stored with important SAS data sets to minimize processing time. An important reason for using this technique is that many popular procedures use stored formats and labels as they produce output, eliminating the need to assign them in each individual step. This provides added incentives and value for programmers and users, especially since reporting requirements are usually time critical.
  Store the descriptor portion of the data, such as informats, formats, and labels, together with the data.
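A small sketch of tip 17: the attributes are assigned once in the DATA step that creates the data set, and later procedures pick them up automatically. The data set, variable names, and values are invented for the example.

```sas
data work.visits;
  /* Attributes assigned here are saved in the descriptor portion. */
  informat visit_date date9.;
  format   visit_date date9. cost dollar10.2;
  label    visit_date = 'Date of visit'
           cost       = 'Cost of visit';
  input visit_date cost;
  datalines;
01JAN2011 125.5
15FEB2011 80
;
run;

/* No FORMAT or LABEL statements needed in the reporting step. */
proc print data=work.visits label;
run;
```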
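  18. Construct conditions that would render data unusable and abort (or end) the program. This prevents unwanted or harmful data from being processed or written to a data set.
  Write conditional checks that detect data that would be unusable and stop the program, so that bad data is not processed or written to a data set. (I still need to think this one over.)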
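My reading of tip 18, as a sketch only: test for values that make the data unusable and stop with the ABORT statement. The validity rule (a missing or negative age) is an assumption chosen for the example.

```sas
data work.checked;
  set sashelp.class;
  if missing(age) or age < 0 then do;
    /* Record the offending observation in the log. */
    put 'ERROR: invalid AGE for ' name= age=;
    /* ABORT stops the current DATA step; what happens to the
       rest of the job depends on the execution mode and on
       arguments such as ABEND or RETURN. */
    abort;
  end;
run;
```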
  19. Test program code using "complete" test data, particularly if the data set is small or represents a random sample of a large data set.
  When testing code, use all of the test data, especially when the test data is small or is itself a good sample of the full data.
  20. Set OBS=0 to test syntax and compile time errors without the risk of executing any observations through a DATA or PROC step.
  To check code for syntax errors, use the OBS=0 option.
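A minimal sketch of tip 20. The data set name bmi_check and the BMI formula are only there to give SAS something to compile; note that the output data set is still created, just with no observations, so the example writes to WORK.

```sas
/* Compile every step but process no observations. */
options obs=0;

data work.bmi_check;
  set sashelp.class;
  bmi = (weight / height**2) * 703;
run;

proc means data=work.bmi_check;
  var bmi;
run;

/* Restore normal processing afterwards. */
options obs=max;
```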
  21. Use the PROC SQL VALIDATE clause to test syntax and compile time errors in PROC SQL code.
  In PROC SQL, use the VALIDATE statement to check for syntax errors.
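For tip 21, VALIDATE is placed in front of a query. The query is checked but not run, and the log should report whether its syntax is valid. The query itself is just an example against SASHELP.CLASS.

```sas
proc sql;
  /* Checks the SELECT statement without executing it. */
  validate
  select name, age
  from sashelp.class
  where age > 12;
quit;
```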
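  22. Specify the NOREPLACE system option to prevent permanent SAS data sets from accidentally being overwritten while writing or testing a program.
  To keep data files in permanent libraries from being modified or overwritten, use the NOREPLACE system option. In practice, this means changing the value of the system option REPLACE to 0. Use this option with care when doing exercises.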
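Tip 22 can also be set in code rather than through the options window. A sketch, assuming a permanent library at a placeholder path; the libref perm is invented for the example.

```sas
/* Hypothetical permanent library. */
libname perm "/path/to/permanent/library";

/* Protect existing permanent data sets while testing. */
options noreplace;

/* If perm.class already exists, this step does not replace it. */
data perm.class;
  set sashelp.class;
run;

/* Switch back when testing is done. */
options replace;
```

NOREPLACE applies to permanent data sets only, so data sets in WORK can still be overwritten during testing.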
  23. Take advantage of procedures that summarize large amounts of data by saving and using the results in order to avoid reading a large data set again.
  To avoid reading a large data set repeatedly, use procedures that summarize the data and save the processed results for reuse.
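One way to apply tip 23 is to have PROC SUMMARY write its statistics to a data set, then report from that small data set instead of the large one. SASHELP.CLASS stands in for the large data set here, and class_stats is a made-up name.

```sas
/* Read the (large) data set once and save the summary. */
proc summary data=sashelp.class nway;
  class sex;
  var height weight;
  output out=work.class_stats (drop=_type_ _freq_)
         mean= n= / autoname;
run;

/* Later steps use the saved summary, not the original data. */
proc print data=work.class_stats;
run;
```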
  24. Add options that are frequently used into the SAS configuration file. This eliminates the time and keystrokes necessary to enter them during a SAS session.
  Add the options you use often to the SAS configuration file. Taking SAS 9 as an example, the configuration file is SASV9.CFG under the SAS root. More information can be found here.
  25. Add statements that are frequently used into the SAS autoexec file. This eliminates the time and keystrokes necessary to enter them during a SAS session.
  Add some personalized statements to the SAS autoexec file, AUTOEXEC.SAS. If you are not familiar with it, do not try this.
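As a sketch of what an AUTOEXEC.SAS for tip 25 might contain: the libref proj, both paths, and the title text are placeholders, and the choice of statements is only an illustration.

```sas
/* AUTOEXEC.SAS: runs automatically when a SAS session starts. */

/* Hypothetical project library used in most sessions. */
libname proj "/path/to/project/data";

/* Options wanted in every session. */
options source2 mautosource
        sasautos=("/path/to/macros" sasautos);

/* A default title for all output. */
title "Project reports";
```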
Posted on 2012-8-21 17:32:00
An expert! I have been hoping for something like this.