Lora D. Delwiche

Exercises and Projects for The Little SAS Book, Sixth Edition



the same number of variables and observations as you stated in part a).

      c. Print the data set.

      48. Each year, Forbes magazine publishes a list of the world’s 100 biggest companies. Each company receives a score using four metrics: sales, profits, assets, and market value. The final overall ranking is based on a composite score of these metrics. The variables in the raw data file BigCompanies.dat are ranking, company name, country, sales (billions), profits (billions), assets (billions), and market value (billions).

      a. Open the raw data file BigCompanies.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

      b. Read the raw data file into SAS.

      c. Print the data set.

      49. Crayola crayons were introduced in 1903, and since then numerous standard colors have been released. Each crayon has a unique name, which corresponds to a hexadecimal code and RGB triplet. The raw data file Crayons.dat contains information on these standard crayon colors with variables corresponding to crayon number, color name, hexadecimal code, RGB triplet, pack size, year issued, and year retired.

      a. Open the raw data file Crayons.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

      b. Read the raw data file into a permanent SAS data set.

      c. Print the data set.

      50. The tallest mountains in the world are located in central and southern Asia. The raw data file Mountains.dat contains information on mountains over 7,200 meters (23,622 ft). Researchers measure the prominence of a mountain as the height above the highest saddle connecting it to a higher summit. The variables in this file are mountain name, height (m), height (ft), year of first ascent, and prominence (m).

      a. Open the raw data file Mountains.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

      b. Read the raw data file into SAS.

      c. Print the data set.

      51. Information Technology Services (ITS) at Central State University has a computing service called “the Grid,” which is offered to faculty, staff, and students. This supercomputer is a cluster of 10 computers that, if programmed correctly in a grid environment, can process work much faster by distributing it across the 10 machines. University users who would like to use the Grid computing environment must register with ITS. The raw data file CompUsers.dat contains the variables user ID, classification group (faculty, staff, or student), first name, last name, email address, campus phone number, and department.

      a. Examine the raw data file CompUsers.dat and read it into SAS.

      b. Print the data set.

      c. Write another DATA step to read the raw data file and remove the student records. Do this as efficiently as possible by testing the classification group as it is being read in with the INPUT statement.

      d. Print the data set.
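      The technique named in part c is usually done with a trailing @, which holds the current line so that a value can be tested before the rest of the record is read. The following is a minimal sketch, not a solution: it assumes a blank-delimited layout with user ID and classification group as the first two fields. The file path, data set name, variable names, informat widths, and the value 'Student' are all hypothetical and must be checked against the actual file.

```sas
/* Sketch only: path, layout, names, and the value 'Student' are assumptions. */
DATA work.gridusers;
   INFILE 'c:\MyRawData\CompUsers.dat';      /* hypothetical path */
   /* Read only the fields needed for the test; the trailing @ holds the line */
   INPUT UserID :$12. Group :$10. @;
   /* Drop student records before spending time on the remaining fields */
   IF Group = 'Student' THEN DELETE;
   /* Continue reading the same held line for the records that are kept */
   INPUT FirstName :$20. LastName :$20.;     /* remaining variables omitted */
RUN;

PROC PRINT DATA = work.gridusers;
   TITLE 'Grid users excluding students (sketch)';
RUN;
```

      Because DELETE returns to the top of the DATA step, the second INPUT statement never runs for a deleted record, which is where the efficiency comes from.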

      52. The World Health Organization (WHO) collected data in countries across the world regarding the outbreak of swine flu cases and deaths in 2009. The data in the file SwineFlu2009.dat include counts per country by month during the epidemic. There are many variables in the raw data file with the following descriptions:

      By date, ID for sorting by first case date

      By continent, ID (X.YY) for sorting by first case date within a continent where X represents continent X, and YY represents the YYth country with the next first case

      Country

      Date of first case reported

      Number of cumulative cases reported on the first day of the month for April, May, June, July, and August (across the columns, respectively)

      Last reported cumulative number of cases reported to WHO as of August 9, 2009

      By date, ID for sorting by first death date

      By continent, ID (X.YY) for sorting by first death date within a continent where X represents continent X, and YY represents the YYth country with the next first death

      Date of first death

      Number of cumulative deaths reported on the first day of the month for May, June, July, August, September, October, November, and December (across the columns, respectively)

      a. Examine the raw data file SwineFlu2009.dat and read it into SAS.

      b. Print a report that describes the contents of the data set including attributes of the variables.

      53. The data in the file BenAndJerrys.dat represent various ice cream flavors and their nutritional information. The variables in the raw data file are flavor name, portion size (g), calories, calories from fat, fat (g), saturated fat (g), trans fat (g), cholesterol (mg), sodium (mg), total carbohydrate (g), dietary fiber (g), sugars (g), protein (g), year introduced, year retired, content description, and notes.

      a. Examine the raw data file BenAndJerrys.dat and read it into SAS using a DATA step.

      b. Read the raw data file using PROC IMPORT.

      c. Create reports that describe the contents for each data set.

      d. Note any differences between the two data sets as a comment in your program.
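      Parts b and c might be approached along the following lines. This is a sketch under stated assumptions: the delimiter, the absence of a header row, the file path, and the data set name are guesses and have to be confirmed by examining BenAndJerrys.dat first.

```sas
/* Sketch only: delimiter, header row, path, and names are assumptions. */
PROC IMPORT DATAFILE = 'c:\MyRawData\BenAndJerrys.dat'   /* hypothetical path */
            OUT = work.icecream_import
            DBMS = DLM
            REPLACE;
   DELIMITER = ',';        /* assumed delimiter */
   GETNAMES = NO;          /* assumes no header row; variables are named VAR1, VAR2, ... */
   GUESSINGROWS = MAX;     /* scan all rows before choosing types and lengths */
RUN;

/* Describe the imported data set; run the same step on the DATA step version to compare */
PROC CONTENTS DATA = work.icecream_import;
RUN;
```

      Comparing the two PROC CONTENTS reports side by side shows where PROC IMPORT chose variable names, types, or lengths that differ from those declared in the DATA step.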

      54. Data on previous winners of the Oscars are stored in a Microsoft Excel file named Oscars.xlsx. The variables in this file are ID, year, host, best picture, best actor, best actress, best director, and best screenplay.

      a. Examine the Microsoft Excel file Oscars.xlsx and read it into a permanent SAS data set using the IMPORT procedure.

      b. Print a report that describes the contents of the data set including the attributes of the variables and data set.

      c. In a comment in your program, discuss any limitations of the functionality of the resulting data set.

      d. Print the Oscars.xlsx data file using the XLSX LIBNAME engine. In a comment in your program, discuss any limitations of using this method to read in the data.
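      The two ways of reading the workbook in this exercise can be sketched as follows. Both require SAS/ACCESS Interface to PC Files. The library paths, librefs, data set name, and the sheet name Sheet1 are hypothetical and should be replaced with the actual values.

```sas
/* Sketch only: paths, librefs, and the sheet name are assumptions. */
LIBNAME awards 'c:\MySASLib';                 /* hypothetical permanent library */

/* Parts a and b: import the workbook into a permanent data set and describe it */
PROC IMPORT DATAFILE = 'c:\MyRawData\Oscars.xlsx'
            OUT = awards.oscars
            DBMS = XLSX
            REPLACE;
RUN;

PROC CONTENTS DATA = awards.oscars;
RUN;

/* Part d: read the workbook directly through the XLSX LIBNAME engine */
LIBNAME xl XLSX 'c:\MyRawData\Oscars.xlsx';

PROC PRINT DATA = xl.Sheet1;                  /* sheet name assumed */
RUN;

LIBNAME xl CLEAR;                             /* release the workbook */
```

      With PROC IMPORT the result is a copy that does not change when the workbook is edited, while the LIBNAME engine reads the current contents of the file each time it is referenced.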

      55. Researchers randomly assigned subjects to either a treatment group taking a cholesterol-lowering medication daily, or a control group taking a placebo daily. The difference in total cholesterol was measured after four months. The variables in the Tchol.dat file are subject ID, treatment group, difference in cholesterol, pre-treatment total cholesterol, and post-treatment total cholesterol.

      a. Examine the raw data file Tchol.dat and read it into SAS.

      b. Print the data set.

      c. Create a new DATA step and read in the data for only the subjects assigned to the treatment group. Do this as efficiently as possible by testing the treatment group variable as it is being read in with the INPUT statement.

      d. Print the data set.

      56. A gourmet pizza restaurant is considering adding new toppings to its menu. Each month they survey 10 customers about their preferences for three different toppings. They want data on several different toppings, so they don’t always ask about the same three toppings. Customers rate each topping on a scale of 1 (would never order) to 5 (would order often). The restaurant wants to compute average ratings for all toppings, so the ratings variables need to be numeric. The raw data file Pizza.csv has variables for the respondent’s survey number, and the ratings for five different toppings: arugula, pine nuts, roasted butternut squash, shrimp, and grilled eggplant. The first two digits in the survey number correspond to the month of the survey.

      a. Examine the raw data file Pizza.csv and read it into SAS using the IMPORT procedure.

      b. Print the data set.

      c. Print a report that describes the contents of the data set to make sure all the variables are the correct type.

      d.