could be recorded for each case. Second, you could use a number to represent each answer. For example, you could choose to enter 0 for all respondents reporting “Male” and 1 for all respondents reporting “Female.”
If you record the responses in the first way, the result is what Stata refers to as a string variable. A string variable is a variable whose contents are stored as text, such as actual words. String variables can be very useful for many purposes. For example, you can enter verbatim answers to questions directly into Stata, as was done for the variable religoth in the Chapter 1 Data.dta file.
The drawback of storing a variable such as gender as a string variable is that some statistical operations require numbers. For example, if you want to calculate the mean (i.e., mathematical average) of a variable, you must first assign a numeric value to each category. For this reason, it is generally advisable, when possible, to use the second method and enter variables as numeric variables. These are variables that have actual numbers attached to each response.
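To make this drawback concrete, the following is a minimal sketch rather than an excerpt from the book's data. It builds a five-case data set in which the same answers are stored both ways; the variable names gender_str and gender_num and the values themselves are made up for illustration.

```stata
* Hypothetical example data: the same answers stored as text and as numbers
clear
input str6 gender_str byte gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
"Female" 1
end

* A string variable has no numeric values to average
summarize gender_str

* The numeric version has a mean: the proportion of cases coded 1
summarize gender_num
display r(mean)
```

With 0/1 coding, the mean of the numeric version is simply the share of respondents coded 1, which is one reason this coding scheme is convenient.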
Fortunately, many of the Stata commands that will be discussed in this book operate similarly with numeric or string variables. The commands that work only with numeric variables are those that perform statistical operations requiring numbers, such as calculating the mean or fitting a linear regression. Because numeric variables are used in the vast majority of data analyses, the commands discussed in this book focus on their use with numeric variables (keeping in mind that many operate identically for string variables). The primary commands that are used (and are different) for string variables, including methods for changing a string variable to a numeric variable, are addressed in the Data Management: Using String Variables section in Chapter 3.
As discussed earlier, you will often be using data that you did not enter yourself, so you may not have a choice, or even be certain, about the way in which variables were entered. There are several ways to determine whether a variable is a numeric or string variable. The most straightforward way is to open the Data Browser window. In Stata 10 or later, string variables are shown in a red font, whereas numeric variables are shown in either a black or blue font. In the Chapter 1 Data.dta file, you will see that only the variable religoth is a string variable.
Another option to see which variables are string variables is to click on a particular variable in the Variables window. In the Properties window, you will see an entry for Type. When the variable type starts with the letters “str,” the variable is stored as a string variable.
You may have noticed that more information about the variable type is listed in the Properties window. For example, gender is shown to be a byte variable, ids is a long variable, and religoth is a str31 variable.
These distinctions further demarcate variables within the general categories of numeric and string. They are also related to how much file space is allotted to store the variable.
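The same information shown in the Properties window can also be requested from the Command window. The sketch below is illustrative only: it recreates three variables with the storage types mentioned above (byte, long, and str31), but the values entered are invented and are not the contents of the Chapter 1 Data.dta file.

```stata
* Hypothetical two-case data set using the storage types named in the text
clear
input byte gender long ids str31 religoth
0 100001 "Quaker"
1 100002 "Unitarian Universalist"
end

* describe lists each variable with its storage type
describe

* ds can restrict the listing to string or to numeric variables
ds, has(type string)
ds, has(type numeric)
```

Here describe plays the role of the Type entry in the Properties window, and ds with the has(type string) option lists only the variables whose type begins with "str".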
All string variables have the “str” prefix, and the number indicates the maximum number of characters that can be stored in that string variable. So the longest denomination that could be entered in the variable religoth is 31 characters. As you will see, this constraint can be altered, but it is advisable to use only the minimum number of characters needed. Otherwise you are using memory to store empty spaces.
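One way to trim unused characters is Stata's compress command, which shrinks each variable to the smallest storage type that holds its current contents without losing information. The following is a small sketch under the assumption of made-up religoth values; it is not necessarily the approach the later chapters take.

```stata
* Hypothetical data: a str31 variable whose longest entry is 9 characters
clear
input str31 religoth
"Quaker"
"Mennonite"
end

* Storage type before compressing
describe religoth

* compress reduces the variable to the shortest string type that fits
compress

* Storage type after compressing
describe religoth
```

Because compress never changes the stored values, it is a low-risk way to reclaim space in a data set that someone else created.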
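Similarly, the various subtypes of numeric variables indicate the range and precision of the numbers that each variable can hold. In order from the smallest to the largest, the numeric variable types are byte, int, long, float, and double. The first three store whole numbers only, whereas float and double can also store decimal values.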
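If you ever need to choose a numeric storage type yourself, you can name it when creating the variable. The sketch below uses hypothetical variable names (small, medium, big, ratio, precise) purely to show one variable of each type.

```stata
* Create three empty cases, then one variable of each numeric type
clear
set obs 3

* Whole-number types, from smallest to largest
generate byte small  = _n
generate int  medium = _n * 1000
generate long big    = _n * 100000

* Types that can also hold decimal values
generate float  ratio   = _n / 3
generate double precise = _n / 3

* describe shows the storage type requested for each variable
describe
```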
Generally, Stata will store variables in the most efficient and effective way when you create them. Moreover, most Stata users will conduct countless analyses without ever having to worry about or manipulate these specific distinctions.
When you have the Data Browser open, you will probably notice, however, that the variables gender and employst look different from the variables ids and agecats. This difference is due to the fact that gender and employst have what are called value labels attached to them. Value labels will be covered in much more detail later, but they are labels that can be applied to the numeric codes used to represent responses. Remember that you could decide to use the number 1 to represent the answer “Female.” This choice may be difficult to remember (i.e., whether 1 was Male or whether 1 was Female); therefore, you can use value labels to keep track of the coding scheme. The variables ids and agecats record numerical responses, so they do not have any value labels attached to them. You can see the actual numerical codes for each variable in the Data Browser window by clicking on the Tools menu, selecting Value Labels, and clicking Hide All Value Labels. When you do so, you will see that the cases that were “Male” now display “0,” and the cases that were “Female” now display “1.” Alternatively, you can highlight (using either the direction keys or the mouse) a particular cell (e.g., “Male”). When you do so, the actual value is listed in a pane just underneath the icons.
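As a brief preview of how value labels relate to the underlying codes, the following sketch attaches labels to a 0/1 gender variable and then displays the data with and without them. The label name genderlbl and the four cases are hypothetical; the full treatment of value labels comes later in the book.

```stata
* Hypothetical data: gender coded 0 and 1
clear
input byte gender
0
1
1
0
end

* Define a set of labels and attach it to the variable
label define genderlbl 0 "Male" 1 "Female"
label values gender genderlbl

* With labels: the table shows Male and Female
tabulate gender

* Without labels: the table shows the stored codes 0 and 1
tabulate gender, nolabel

* list also accepts nolabel to show the stored codes case by case
list gender, nolabel
```

The nolabel option serves the same purpose as Hide All Value Labels in the Data Browser: the stored values never change, only the way they are displayed.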
Exercises
1. Open the “Chapter 1 Exercise Data.dta” data file.
2. Save a copy of the open data file named “Chapter 1 Ex mycopy.dta.”
3. Using the Data Browser, determine how many cases and variables are in the data set.
4. Which of the variables is a string variable?
5. Use the Data Editor to change the agefstdt value of the last case from 14 to 13.
6. Use the Data Editor to input an additional case with the following characteristics: ids value of 1004, is Male, completed the 12th grade, went on his first date at 16, lives in the Pacific census region, and does not live with his parents.