åã®
èšäºã§ãå€éšçããŒã¿ãèªã¿åãæ¹æ³ãåŠã³ãŸããã 仿¥ã¯ãæšæºã®SASããŒã¿ã»ããïŒSASããŒã¿ã»ããïŒãèªã¿åããããŒã¿ã¹ã©ã€ã¹ã®äœææ¹æ³ãæ°žç¶çãªå±æ§ã®æ§ææ¹æ³ãããã³ããã€ãã®äŸ¿å©ãªSAS颿°ãåŠç¿ããSETæŒç®åã«ã€ããŠèª¬æããŸãã ç¹°ãè¿ããŸãããã§ããã ãå€ãã®äŸã䜿çšããŠãå¯èœãªéãåçŽãªèšèªã§è³æã衚瀺ããããšããŸãã
ããŒã¿ããã£ã¬ã¯ããª
CïŒ\ workshop \ habrahabrã«EXCEL圢åŒã§ä¿åãããŠãããšããŸãã ã¹ãã¬ããã·ãŒããã€ã³ããŒãããããããã¹ã©ã€ã¹ãäœæããSAS颿°ã䜿çšããŠæ°ããèšç®åãäœæãããã®ããŒã¿ã»ããã2ã€ã«åå²ããŸãã
ã¹ãã¬ããã·ãŒããã€ã³ããŒãããŠãã£ã«ã¿ãŒãèšå®ãã
Excelãã¡ã€ã«ã¯äžèšã®ãã£ã¬ã¯ããªã«ä¿åãããæ¬¡ã®ããã«ãªããŸãã
ãã¡ã€ã«ã¹ããããïŒ
PROC IMPORTããã·ãŒãžã£ãé©çšããŠãã¹ãã¬ããã·ãŒããSASããŒã¿ã»ããã«å€æããŸãã
options validvarname=v7; proc import datafile="C:\workshop\habrahabr\company.xlsx" dbms=xlsx out=company replace; getnames=yes; run;
validvarname = V7ãªãã·ã§ã³ã¯ãSASã®èгç¹ããæ£ãããã£ãŒã«ãåãèšå®ããŸãããã¹ãŠã®ç¡å¹ãªæåãã¢ã³ããŒã¹ã³ã¢ã«çœ®ãæããŸãã 倿°ã®åœåèŠåã¯
ã¬ãã¹ã³1ã«ãããŸããå€éšãã¡ã€ã«ã®èªã¿åãæã«ãã£ã«ã¿ãŒãããã«èšå®ããŸããããšãã°ãå®äºæ¥ãéããªã芳枬ã®ã¿ãéžæããŸãã whereãã©ã¡ãŒã¿ãŒã®æ§æã«æ³šæããŠãã ããã
options validvarname=v7; proc import datafile="C:\workshop\habrahabr\company.xlsx" dbms=xlsx out=company (where=(End_Date not is missing)) replace; getnames=yes; run;
PROC IMPORTã¹ãããæŒç®åãè©³çŽ°ã«æ€èšããŠãã ããã
ããŒã¿ãã¡ã€ã«-å€éšãã¡ã€ã«ã®ãã«ãã¹ãšååãå®çŸ©ããŸã
Dbms-ã€ã³ããŒãããããŒã¿åãå®çŸ©ããŸãã
Out -1ã¬ãã«ãŸãã¯2ã¬ãã«ã®SASåïŒã©ã€ãã©ãªåãšããŒã¿ã»ããåïŒã§SASåºåããŒã¿ã»ãããèå¥ããŸãã
眮æ -æ¢åã®SASããŒã¿ã»ãããäžæžãããŸãã
Getnames -PROC IMPORTãå
¥åå€éšãã¡ã€ã«ã®æåã®è¡ã®ããŒã¿å€ããSAS倿°åãçæãããã©ããã瀺ããŸãã
PROC IMPORTã¹ããããå®è¡ããŠããã°ã調ã¹ãŸãã
çµæã®SASããŒã¿ã»ãããå°å·ããŸãã
proc print data=work.company noobs; run;
PROC PRINTããã·ãŒãžã£ã®åºåã¯æ¬¡ã®ãšããã§ãã
ãã©ã°ã¡ã³ãïŒ
ãŸããSAS UEã§ã¯ã[çµæ]ã¿ãã䜿çšããŠãã€ã³ããŒããããSASããŒã¿ã»ããã«æ
£ããããšãã§ããŸãã
SASããŒã¿ã»ããã®èªã¿åã
SASããŒã¿ã»ããã®èªã¿åãã¯ã
SETã¹ããŒãã¡ã³ãã䜿çšããŠDATAã¹ãããã§å®è£
ãããŸãã
SETã¹ããŒãã¡ã³ãã®äžè¬çãªæ§æãæ€èšããŠãã ããã
SET<SAS-data-set(s) <(data-set-options(s) )> > <options>
SETã¹ããŒãã¡ã³ãã§ããŒã¿ã»ãããæå®ããªãå ŽåãæåŸã«äœæãããSASããŒã¿ã»ãããã芳枬å€ãèªã¿åããŸãã
SETã¹ããŒãã¡ã³ãã§ã¯ãè€æ°ã®ããŒã¿ã»ãããæå®ã§ããŸã;ãã®å ŽåãSASããŒã¿ã»ããã¯1ã€äžã«è¿œå ãããŸãïŒSQLã®UNIONãšåæ§ïŒã
ãŸããDATAã¹ãããã§ã¯ã2ã€ã®SETã¹ããŒãã¡ã³ãã䜿çšã§ããŸãããã®å ŽåãããŒãã«ã¯å
±éã®åã§çµåãããŸãã ããšãã°ã
ãã®èšäºã® 2ã€ã®SETã¹ããŒãã¡ã³ãã«ã€ããŠè©³ããèªãããšãã§ããŸãã
SASããŒã¿ã»ããã®ã³ããŒãäœæããæãç°¡åãªã³ãŒãã¯æ¬¡ã®ãšããã§ãã
data company1; set company; run;
SASããŒã¿ã»ããèšè¿°åã®æ§æ
PROC CONTENTSããã·ãŒãžã£ã䜿çšããŠSASããŒã¿ã»ããèšè¿°åã調ã¹ãããšãã§ããŸãïŒ
ã¬ãã¹ã³2ãåç
§ ïŒã ãã®ãã¥ãŒããªã¢ã«ã§ã¯ã
PROC DATASETSããã·ãŒãžã£ã䜿çšããŠèšè¿°åã³ã³ããŒãã³ãã
åºåããŸãã
proc datasets library=work nolist; contents data=company order=varnum; quit;
çµæã®æçïŒ
Travel_Expensesããã³Budget倿°ã®å®æ°åœ¢åŒãèšå®ããŸãã
data company; set company; format Travel_Expenses Budget dollar10.2; run;
SASããŒã¿ã»ããã®å±æ§ã確èªããŸãã
proc datasets library=work nolist; contents data=company order=varnum; quit;
èšç®åã®äœæ
ãã¹ãŠã®SAS颿°ã¯ã
SAS 9.4颿°ããã³CALLã«ãŒãã³ïŒãªãã¡ã¬ã³ã¹ã第5çã§èª¿ã¹ãããšãã§ããŸãã
ããã«ãç¹å®ã®ã¿ã¹ã¯ãå®è¡ããããã®é©åãªæ©èœããªãå Žåã¯ã
PROC FCMPããã·ãŒãžã£ã䜿çšããŠç¬èªã®æ©èœãäœæã§ããŸãã
ãã®ã¬ãã¹ã³ã§ã¯ãYRDIFãSUMãããã³CATSã®3ã€ã®æ©èœã«ã€ããŠèª¬æããŸãã
幎
åäœã®æ¥ä»ã®å·®ãèšç®ããã«ã¯
ãYRDIF颿°ã䜿çš
ããŸã ã
SAS圢åŒã®æ¥ä»ã¯ã1960幎1æ1æ¥ããå§ãŸãæ¥æ°ã§ããããšãæãåºãããŠãã ããïŒ
ã¬ãã¹ã³1ãåç
§ ïŒã æç€ºãããããŒã¿ã§ãå®è¡æéãèšç®ããå¿
èŠããããŸãã
data company1; set work.company; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); format Travel_Expenses Budget dollar10.2 Lead_Time 3.1; run;
Lead_Time倿°ã«3.1圢åŒã䜿çšããŠãããããã¬ããŒãã®èšç®å€ïŒïŒïŒã¯å°æ°ç¹ä»¥äž1æ¡ã«äžžããããŠããããšã«æ³šæããŠãã ããã ãã©ãŒãããæŒç®å
㯠ãSASããŒã¿ã»ããã®å€ã
倿ŽããŸãã ïŒ
çµæã®æçïŒ
次ã«ãæ
è²»ãªãã®äœæ¥ã³ã¹ããèšç®ããŸãã
data company1; set work.company; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; run;
çµæã®æçïŒ
ã¿ã¹ã¯ã®äžéšãšããŠã颿°ã䜿çšããã«æ
è²»ãèæ
®ããã«äœæ¥ã³ã¹ããèšç®ããŸããã ããŒãã«ã«æ¬ æå€ã¯ãããŸããã倿°ïŒBudgetãŸãã¯Travel_ExpensesïŒã®ããããã«æ¬ æå€ãããå Žåãçµæã¯ãããã·ã§ã³ãã§ããã
äŸïŒ
ãã¹ãããŒã¿ã»ãããäœæããŸãã
data test; input Budget Travel_Expenses; datalines; 12345 233 . 345 12543 . ;
倿°Budget Travel_Expensesã®å·®ãèšç®ããŸã
data test; set test; value=Budget-Travel_Expenses; run;
ãã®ã¹ãããã®çµæïŒ
æ£ããçµæãååŸããã«ã¯ã
SUM颿°ã䜿çšã§ããŸãã
ãã®é¢æ°ã¯ã
èšè¿°çµ±èšé¢æ°ã®ã«ããŽãªãŒã«å±ããŸãã èšè¿°çµ±èšé¢æ°ã¯æ¬ æå€ã
ç¡èŠããŸãã
SUMãä»ããã³ãŒãã®èšè¿°ïŒ
data test; set test; value=sum(Budget,-Travel_Expenses); run;
ãã®å Žåãã¹ãããã®çµæã¯æ¬¡ã®ãšããã§ãã
3çªç®ã®èšç®åã¯ããããŒãžã£ãŒã®ã¡ãŒã«ã¢ãã¬ã¹ã§ãã åManager_FirstNameãManager_LastNameããã³å€
habr .comãããã¢ã»ã³ãã«ãã§ããŸãã
CATS颿°ã䜿çš
ããŠãããã¹ãå€ã1è¡ã«çµåã§ããŸãã
data company1; set work.company; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; Email=cats(Manager_FirstName, '.',Manager_LastName, '@habr.com'); format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; run;
çµæã®æçïŒ
äœæãããããŒã¿ã»ããã®èšè¿°åã調ã¹ãŠã¿ãŸãããã
proc contents data=work.company1 varnum; run;
ãã³ãã«ãã©ã°ã¡ã³ãïŒ
Email倿°ã®é·ãã«æ³šæããŠãã ãã; 200ãã€ãã§ãããããã¯CATS颿°ã«ãã£ãŠè¿ãããããã©ã«ãã®é·ãã§ãã 倿°Manager_FirstNameããã³Manager_LastNameã®å±æ§ã調ã¹ããšãEmail倿°ã¯8 + 6 +æååã®é·ãã@ habr.comããã€ãŸããããã«9ãã€ããåèš23ã§ããããšãããããŸããããã«æ³šæããå¿
èŠãããã®ã¯ãªãã§ããã æ¬ èœããŠãããã¹ãŠã®æåã¯ã¹ããŒã¹ãå®çŸããŸããã¹ããŒã¹ã¯ããŒã¿ã»ããã®ãµã€ãºã«åœ±é¿ãã倧éã®ããŒã¿ã®ããã©ãŒãã³ã¹ã«åœ±é¿ããŸãã
Email倿°ã®é·ããæç€ºçã«èšå®ããã«ã¯ãLENGTHæŒç®åã䜿çšããå¿
èŠããããŸãã
data company1; set work.company; length Email $23; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; Email=cats(Manager_FirstName, '.',Manager_LastName, '@habr.com'); format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; run;
ãã©ã°ã¡ã³ããåŠçãã
æ¬¡ã®æ¡ä»¶ãèæ
®ããŠãLead_Time倿°ã«åºã¥ããŠè©³çްãªåãäœæããŸãã
- Lead_Time倿°ã®å€ã1æªæºã®å Žåã[詳现]åã®å€ã¯1å¹Žæªæºã§ãã
- Lead_Time倿°ã®å€ãå¢çãå«ããŠ1ãã2ã®ç¯å²ã«ããå Žåã[詳现]åã®å€ã¯1ã2幎ã§ãã
- Lead_Time倿°ã®å€ã2ãé€ã2ã3ã®ç¯å²ã«ããå Žåã[詳现]åã®å€ã¯2ã3幎ã§ãã
- Lead_Time倿°ã®å€ã3ãé€ã3ã4ã®ç¯å²ã«ããå Žåã[詳现]åã®å€ã¯3ã4幎ã§ãã
- Lead_Time倿°ã®å€ã4ãé€ã4ã5ã®ç¯å²ã«ããå Žåã[詳现]åã®å€ã¯4ã5幎ã§ãã
- ãã®ä»ã®å Žåã¯ãã¹ãŠã[詳现]åã®å€ã¯5幎ãè¶
ããŠããŸãã
詳现ãªåã¯ããŸããŸãªæ¹æ³ã§äœæã§ããŸããããšãã°ãæãåçŽã§æãæçœãªãªãã·ã§ã³ã¯æ¡ä»¶ä»ãåŠçã䜿çšããããšã§ãã æ¬¡ã®æŒç®åã䜿çšããŠå®è£
ã§ããŸãã
- IF-THEN-ELSE
- ãã®ä»
- éžææ
倧éã®ããŒã¿ã®å ŽåãæåŸã®2ã€ã®ãªãã·ã§ã³ã䜿çšããæ¹ãå¹ççã§ãã
data company1; set work.company; length Email $23; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; Email=cats(Manager_FirstName, '.',Manager_LastName, '@habr.com'); format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; if Lead_Time<1 then detail='less than a year'; else if Lead_Time=>1 and Lead_Time<=2 then detail='1-2 years'; else if Lead_Time>2 and Lead_Time<=3 then detail='2-3 years'; else if Lead_Time>3 and Lead_Time<=4 then detail='3-4 years'; else if Lead_Time>4 and Lead_Time<=5 then detail='4-5 years'; else detail='above 5 years'; run;
Detail倿°ã®å€ãã5幎以äžãã«çãããªã芳枬ã®ã¿ãéžæããæ¡ä»¶ã远å ããŸãã whereããã£ã«ã¿ãŒãšããŠäœ¿çšãããšãæ§æãšã©ãŒãçºçããŸãã
whereå¥ã¯èšç®åã«ã¯äœ¿çšãããŸããã å¿
èŠãªå€æ°ãéžæããã«ã¯ãéžæçãªIFã¹ããŒãã¡ã³ããå¿
èŠã§ãã äœæãããããŒã¿ã»ãããžã®èŠ³æž¬ã®åºåããã£ã³ã»ã«ããŸãã
data company1; set work.company; length Email $23; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; Email=cats(Manager_FirstName, '.',Manager_LastName, '@habr.com'); format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; if Lead_Time<1 then detail='less than a year'; else if Lead_Time=>1 and Lead_Time<=2 then detail='1-2 years'; else if Lead_Time>2 and Lead_Time<=3 then detail='2-3 years'; else if Lead_Time>3 and Lead_Time<=4 then detail='3-4 years'; else if Lead_Time>4 and Lead_Time<=5 then detail='2-3 years'; else detail='above 5 years'; if detail ne 'above 5 years'; run;
ãŸããéžæçIFã¹ããŒãã¡ã³ãã«ã¯ç®è¡æŒç®åãå¿
èŠã§ãã ããšãã°ã次ã®ããã«æžãããšã¯ã§ããŸããã
if detail contains 'above 5 years';
ãã°ã«ãšã©ãŒã衚瀺ãããŸãïŒ
SASããŒã¿ã»ãããæ§æããŸãã
Manager_FirstNameããã³Manager_LastName倿°ã¯ãæ°ããSASããŒã¿ã»ããã«ååšããŠã¯ãªããŸããã ãã®èŠä»¶ã¯DROPãã©ã¡ãŒã¿ãŒã䜿çšããŠå®è£
ãããDROPæŒç®åã䜿çšã§ããŸãã
data company1 (drop=Manager_FirstName Manager_LastName); set work.company; length Email $23; Lead_Time=yrdif(Start_Date, End_Date, 'actual'); Cost=Budget-Travel_Expenses; Email=cats(Manager_FirstName, '.',Manager_LastName, '@habr.com'); format Cost Travel_Expenses Budget dollar10.2 Lead_Time 3.1; if Lead_Time<1 then detail='less than a year'; else if Lead_Time=>1 and Lead_Time<=2 then detail='1-2 years'; else if Lead_Time>2 and Lead_Time<=3 then detail='2-3 years'; else if Lead_Time>3 and Lead_Time<=4 then detail='3-4 years'; else if Lead_Time>4 and Lead_Time<=5 then detail='2-3 years'; else detail='above 5 years'; if detail ne 'above 5 years'; run;
äœæãããSASããŒã¿ã»ããããæå®ãããæ¡ä»¶ã«åŸã£ãŠ2ã€ã«åå²ããŸã
1ã€ã®DATAã¹ãããã§ãè€æ°ã®SASããŒã¿ã»ãããäœæã§ããŸãã åœããšã«åå¥ã®ããŒã¿ã»ãããäœæããŸãã
ããšãã°ãCountryåã®å€ã確èªããã«ã¯ã
PROC FREQããã·ãŒãžã£ã䜿çšã§ããŸãã
proc freq data=company1; table Country /nocum nopercent; run;
ãã®æé ã§ã¯ãdata =ãââã©ã¡ãŒã¿ãŒã§æå®ãããSASããŒã¿ã»ããã§Country倿°ã®ç¹å®ã®å€ãäœåçºçããããèæ
®ããŸãã
ãã®æé ã®çµæã¯æ¬¡ã®ããã«ãªããŸãã
ãããã£ãŠãOUTPUTæŒç®åãšæ¡ä»¶ä»ãåŠçã䜿çšããŠã1ã€ã®DATAã¹ãããã§2ã€ã®ããŒã¿ã»ãããäœæããŸãã
data US AU; set work.company1; if Country='AU' then output AU; if Country='US' then output US; run;
ã³ãŒããå®è¡ããŠããã°ãåç
§ããŠãã ããã
SASããŒã¿ã»ããã®èªã¿åããšæ§æã«ã€ããŠç°¡åã«èª¬æããŸãã æ¬¡ã®èšäºã§ã¯ãMERGEããã³SETã¹ããŒãã¡ã³ãã䜿çšããŠããŒã¿ã»ãããçµåããæ¹æ³ã玹ä»ããŸãã
PSãšããŠãSAS BASEã¬ãã¹ã³ã®æ§é ãæãåºãããŸãã
ãã§ã«å
¬éãããŠããèšäºïŒ
- SAS BASEã§ã®ããã°ã©ãã³ã°ã®åºç€ã ã¬ãã¹ã³1ã
- SAS BASEã§ã®ããã°ã©ãã³ã°ã®åºç€ã ã¬ãã¹ã³2.ããŒã¿ã¢ã¯ã»ã¹
- SAS BASEã§ã®ããã°ã©ãã³ã°ã®åºç€ã ã¬ãã¹ã³3.ããã¹ããã¡ã€ã«ã®èªã¿åãã
- 4çªç®ã®ã¬ãã¹ã³ãåŠç¿ããŸããã
次ã®èšäºã§ã¯ãSAS Baseã§ããŒãã«ãçµåããïŒããŒãžãã»ããïŒãæ¡ä»¶ä»ãåŠçãã«ãŒããSAS颿°ãã«ã¹ã¿ã ãã©ãŒãããã®äœæãSASãã¯ããPROC SQLãªã©ã®åé¡ã匷調ããããšæããŸãã

ã³ã¡ã³ãã§ãã£ãŒãããã¯ããåŸ
ã¡ããŠãããŸãïŒ èšäºã§ä»ã«ã©ã®ãããªãããã¯ãèŠããã§ããïŒ