Regardless of the destination, the output from this example of the COMPARE procedure includes sections for the following:
Data set summary: data set names, number of variables, number of observations
Variables summary: number of variables in common, along with the number in each data set that do not appear in the other
Observation summary: location of the first and last unequal records and the number of matching and nonmatching observations
Values comparison summary: number of compared variables that match or mismatch, with a listing of the mismatched variables and their differences
The PROC COMPARE output shows that, in this case, only two variables are compared even though the Fish and Heart data sets contain 7 and 17 variables, respectively. This is because only two variables (Weight and Height) have names in common. Thus, even if the results indicate no mismatches between the base and comparison data sets, it is important to confirm that all variables were compared before declaring the data sets identical. Similarly, the number of records compared is the smaller of the record counts in the two data sets, so the record counts must be checked as well. Several options and statements exist to alter how comparisons are done and to direct some comparison information to data sets.
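The checks described above can be sketched as follows. This is a minimal illustration rather than one of the book's programs: it assumes that the Fish and Heart data sets are the ones shipped in the Sashelp library, and it uses the LISTVAR option of PROC COMPARE together with the DICTIONARY.TABLES view in PROC SQL.

/* Assumption: Fish and Heart are the Sashelp versions of these data sets. */
/* LISTVAR lists the variables that are found in only one of the data sets. */
proc compare base = sashelp.fish compare = sashelp.heart listvar;
run;

/* Check the record and variable counts directly.           */
/* DICTIONARY.TABLES stores library and member names in uppercase. */
proc sql;
  select memname, nobs, nvar
    from dictionary.tables
    where libname = 'SASHELP' and memname in ('FISH', 'HEART');
quit;

If the two values of nobs differ, some records in the larger data set have no counterpart in the comparison, regardless of what the values comparison summary reports.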
Since the Heart and Fish data sets are not expected to be similar, applying PROC COMPARE to them is a simplistic demonstration of the procedure. A more typical comparison is given in Program 2.10.2, which applies the COMPARE procedure to the data set read in by Program 2.8.8 (using fixed-position data) and the IPUMS2005Basic data set in the BookData library.
Program 2.10.2: Comparing IPUMS 2005 Basic Data Generated from Different Sources
/* Create a comparison data set that differs from the original */
data work.ipums2005basicSubset;
  set work.ipums2005basicFPa;
  where homeValue ne 9999999;
run;

/* Compare the two data sets and write the differences to work.diff */
proc compare base = BookData.ipums2005basic compare = work.ipums2005basicSubset
             out = work.diff outbase outcompare outdif outnoequal
             method = absolute criterion = 1E-9;
run;

/* Inspect the first few records of the differences data set */
proc print data = work.diff(obs=6);
  var _type_ _obs_ serial countyfips metro citypop homevalue;
run;

proc print data = work.diff(obs=6);
  var _type_ _obs_ city ownership;
run;
To create a data set that differs from the provided BookData.IPUMS2005Basic data set, a WHERE statement is used to remove every record with a home value of $9,999,999.
OUT= produces a data set containing information about the differences for each pair of compared observations for all matching variables. The OUT= data set includes all compared variables and two automatic variables, _TYPE_ and _OBS_.
OUTBASE copies the record being compared in the BASE= data set into the OUT= data set.
Like OUTBASE, OUTCOMPARE copies the record being compared in the COMPARE= data set into the OUT= data set.
OUTDIF produces a record that contains the difference between the OUTBASE and OUTCOMPARE records.
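As a hedged sketch of how the record types requested by OUTBASE, OUTCOMPARE, and OUTDIF can be told apart, the steps below assume that work.diff has already been created by Program 2.10.2. They rely on the automatic variable _TYPE_, which takes the values BASE, COMPARE, and DIF for these three kinds of records; the variables chosen in the VAR statement are only an illustration.

/* Assumes work.diff was created by the PROC COMPARE step in Program 2.10.2. */
/* Count how many records of each type were written to the OUT= data set.    */
proc freq data = work.diff;
  tables _type_;
run;

/* Print only the difference records for a few of the compared variables. */
proc print data = work.diff(obs=6);
  where _type_ = 'DIF';
  var _type_ _obs_ serial homevalue;
run;

Records that share a value of _OBS_ belong to the same pair of compared observations, so sorting or filtering on _OBS_ keeps each base record next to its comparison and difference records.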