SAS COMPARE Procedure PDF

Summary

This document describes the SAS COMPARE procedure and how it compares two SAS data sets. It covers how the procedure determines matching variables and matching observations, and the information it generates, such as whether matching variables have different values, formats, labels, or types. It also explains how to use a DATA step with a MERGE statement to build data sets of the observations that appear in only one data set or in both.

Full Transcript

What Does the COMPARE Procedure Do?

The COMPARE procedure compares the contents of two SAS data sets, selected variables in different data sets, or variables within the same data set. PROC COMPARE compares two data sets: the base data set and the comparison data set. The procedure determines matching variables and matching observations.

Matching variables are variables with the same name or variables that you pair by using the VAR and WITH statements. Matching variables must be of the same type.

Matching observations are observations that have the same values for all ID variables that you specify or, if you do not use the ID statement, that occur in the same position in the data sets. If you match observations by ID variables, then both data sets must be sorted by all ID variables.

What Information Does PROC COMPARE Provide?

PROC COMPARE generates the following information about the two data sets that are being compared:

- whether matching variables have different values
- whether one data set has more observations than the other
- what variables the two data sets have in common
- how many variables are in one data set but not in the other
- whether matching variables have different formats, labels, or types
- a comparison of the values of matching observations

Note: You can create a view that has two columns with the same variable name. If duplicate variable names exist in the view, PROC COMPARE cannot determine which column in the base data set should be compared to the compare data set. PROC COMPARE issues an error if it finds duplicate variable names.

Further, PROC COMPARE creates two types of output data sets that give detailed information about the differences between observations of variables that it is comparing.
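The matching rules described above can be illustrated with a minimal sketch. The data sets work.base_ds and work.comp_ds and the variables id_var, score, and new_score are hypothetical names made up for this illustration; they do not come from the original text. The sketch assumes you want to match observations on an ID variable and to pair two differently named variables with the VAR and WITH statements.

```sas
/* Hypothetical sample data; all data set and variable names are illustrative only. */
data work.base_ds;
   input id_var score;
   datalines;
1 85
2 90
3 78
;

data work.comp_ds;
   input id_var new_score;
   datalines;
1 85
2 91
3 78
;

/* Matching by ID variables requires both data sets to be sorted by them. */
proc sort data=work.base_ds;
   by id_var;
run;

proc sort data=work.comp_ds;
   by id_var;
run;

proc compare base=work.base_ds compare=work.comp_ds;
   id id_var;        /* match observations on id_var instead of by position */
   var score;        /* variable taken from the base data set */
   with new_score;   /* variable it is paired with in the comparison data set */
run;
```

In this sample data, only the observation with id_var=2 holds different values in the paired variables. Without the ID statement, the same call would match observations by their position in each data set.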
The following example compares the data sets Proclib.One and Proclib.Two, which contain similar data about students:

    data proclib.one(label='First Data Set');
       input student year $ state $ gr1 gr2;
       label year='Year of Birth';
       format gr1 4.1;
       datalines;
    1000 1990 NC 85 87
    1042 1991 MS 90 92
    1095 1989 TN 78 92
    1187 1990 MA 87 94
    ;

    data proclib.two(label='Second Data Set');
       input student $ year $ state $ gr1 gr2 major $;
       label state='Home State';
       format gr1 5.2;
       datalines;
    1000 1990 NC 85 87 Math
    1042 1991 MS 90 92 History
    1095 1989 TN 78 92 Physics
    1187 1990 MA 87 94 Music
    1204 1991 NC 82 96 English
    ;

PROC COMPARE reports only values that differ between the two data sets; it does not produce information about values that are the same.

PROC COMPARE also does not produce a data set that contains the observations that are in one of the data sets but not in the other, or that are in both data sets. The options for the PROC COMPARE statement can produce much of this information, but they do not produce a data set. If you want to produce a data set that contains this information, use a DATA step that contains a MERGE statement. Here is an example of a DATA step that uses a MERGE statement to create such data sets:

    data inone intwo inboth;
       /* IN= creates temporary flags showing which data set contributed to each observation */
       merge a(in=ina) b(in=inb);
       /* both a and b must already be sorted by byvar */
       by byvar;
       if ina and not inb then output inone;   /* only in a */
       if inb and not ina then output intwo;   /* only in b */
       if ina and inb then output inboth;      /* in both a and b */
    run;
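To try the MERGE pattern above end to end, the following sketch supplies two small input data sets. The contents of a and b, and the variables x and y, are invented for this illustration; only the names a, b, byvar, inone, intwo, and inboth come from the original step.

```sas
/* Hypothetical input data: byvar 1 exists only in a, byvar 4 only in b. */
data a;
   input byvar x;
   datalines;
1 10
2 20
3 30
;

data b;
   input byvar y;
   datalines;
2 200
3 300
4 400
;

/* The BY statement in the merge requires both inputs to be sorted by byvar. */
proc sort data=a;
   by byvar;
run;

proc sort data=b;
   by byvar;
run;

data inone intwo inboth;
   merge a(in=ina) b(in=inb);
   by byvar;
   if ina and not inb then output inone;   /* observations found only in a */
   if inb and not ina then output intwo;   /* observations found only in b */
   if ina and inb then output inboth;      /* observations found in both */
run;
```

With this sample data, inone receives the observation with byvar=1, intwo receives the observation with byvar=4, and inboth receives the observations with byvar=2 and byvar=3.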
