<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2052114</id><updated>2011-04-21T18:09:55.156-07:00</updated><title type='text'>2127</title><subtitle type='html'>coursework for students</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://2127.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>14</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2052114.post-3037536</id><published>2001-04-02T17:08:00.000-07:00</published><updated>2001-04-02T17:08:33.906-07:00</updated><title type='text'></title><content type='html'>Review questions, final examination&lt;br /&gt;&lt;br /&gt;Sampling:&lt;br /&gt;&lt;br /&gt;1. What are the main advantages of data collection by sampling, rather than doing a complete census?&lt;br /&gt;&lt;br /&gt;2. What are the main differences between probability and non-probability samples?&lt;br /&gt;&lt;br /&gt;3. 
Describe the main types of non-probability samples, and the conditions under which their use might be warranted.&lt;br /&gt;&lt;br /&gt;4. Describe the basic procedures of simple random and systematic sampling, and their possible advantages and disadvantages.&lt;br /&gt;&lt;br /&gt;5. What do we mean by proportionate and disproportionate stratified sampling, and under which conditions might we apply these techniques?&lt;br /&gt;&lt;br /&gt;6. Describe cluster sampling, and its advantages and disadvantages.&lt;br /&gt;&lt;br /&gt;7. What do we mean by a “sample frame”? Give a few examples, and describe their potential defects.&lt;br /&gt;&lt;br /&gt;8. What are the dangers of non-response in a sample survey?&lt;br /&gt;&lt;br /&gt;9. Describe what is meant by sampling “with probabilities proportionate to size”?&lt;br /&gt;&lt;br /&gt;Experimental design:&lt;br /&gt;&lt;br /&gt;1. Explain the purpose of random assignment of experimental subjects.&lt;br /&gt;&lt;br /&gt;2. Explain the main features of a double-blind medical experiment.&lt;br /&gt;&lt;br /&gt;3. Explain the main features of the classical design experiment (pretest-posttest with a control group).&lt;br /&gt;&lt;br /&gt;4. Explain the main features of “true” experiments, and the major subtypes.&lt;br /&gt;&lt;br /&gt;5. Explain the main features of “pre-experiments,” list  the major subtypes, and their main shortcomings.&lt;br /&gt;&lt;br /&gt;6. Describe two types of quasi-experiments.&lt;br /&gt;&lt;br /&gt;7. What is meant by “internal validity,” and what are the main threats to it?&lt;br /&gt;&lt;br /&gt;8. What is meant by “external validity,” and what are the main threats to it?&lt;br /&gt;&lt;br /&gt;Evaluation Research:&lt;br /&gt;&lt;br /&gt;1. Describe the similarities between outcome evaluation and experimental research.&lt;br /&gt;&lt;br /&gt;2. What is meant by “needs assessment,” and what are the data types we can use in such a study?&lt;br /&gt;&lt;br /&gt;3. 
What are “social indicators,” and how can they be used?&lt;br /&gt;&lt;br /&gt;4. In what ways do concept formation, measurement and sampling differ in applied research, when compared to theoretical research?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Qualitative research:&lt;br /&gt;&lt;br /&gt;1. Describe the controversy surrounding Freeman’s criticism of Margaret Mead.&lt;br /&gt;&lt;br /&gt;2. What are Kvale’s 12 steps that make up the “mode of understanding” in the qualitative interview?&lt;br /&gt;&lt;br /&gt;3. Describe the role of the participant observer in qualitative research.&lt;br /&gt;&lt;br /&gt;4. What is an example of “triangulation” in qualitative research?&lt;br /&gt;&lt;br /&gt;5. What are the general steps in the design of a qualitative field study?&lt;br /&gt;&lt;br /&gt;Methods of analyzing available data:&lt;br /&gt;&lt;br /&gt;1. What are the main characteristics of a content analysis study? Give two examples.&lt;br /&gt;&lt;br /&gt;2. What are the main components of a content analysis study?&lt;br /&gt;&lt;br /&gt;3. Describe Holsti’s requirements for a content analysis study.&lt;br /&gt;&lt;br /&gt;4. List the two main types of unobtrusive measures, and give examples.&lt;br /&gt;&lt;br /&gt;Univariate, bivariate, and multivariate analyses:&lt;br /&gt;&lt;br /&gt;1. When we do a cross-tabulation, why do we percentage down and compare across?&lt;br /&gt;&lt;br /&gt;2. Describe the following possible outcomes of trivariate analyses, and give examples:&lt;br /&gt;spurious relationships; replication; specification; interpretations and intervening variables; a suppressor variable; distorter variables. 
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-3037536?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/3037536'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/3037536'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_04_01_archive.html#3037536' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2806523</id><published>2001-03-16T10:33:00.000-08:00</published><updated>2001-03-16T10:46:58.946-08:00</updated><title type='text'></title><content type='html'>This is a cancellation notice for classes scheduled Monday March 19, 2001.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2806523?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2806523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2806523'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_03_11_archive.html#2806523' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2642376</id><published>2001-03-05T09:51:00.000-08:00</published><updated>2001-03-05T10:01:48.986-08:00</updated><title type='text'></title><content type='html'>MONDAY, MARCH 5, 2001&lt;br /&gt;&lt;br /&gt;DUE TO UNFORESEEN CIRCUMSTANCES TODAY'S CLASS IS CANCELLED.&lt;br /&gt;WEDNESDAY'S CLASS (MARCH 7) WILL PROCEED AS PLANNED.&lt;br /&gt;PLEASE READ THE CHAPTER ON EXPERIMENTAL DESIGN.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2642376?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2642376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2642376'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_03_04_archive.html#2642376' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2588604</id><published>2001-03-01T10:55:00.000-08:00</published><updated>2001-03-01T11:05:25.203-08:00</updated><title type='text'></title><content type='html'>SOCI 2127/POLI 3007: SECOND LIST OF REVIEW QUESTIONS FOR THE MIDTERM TEST&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1.             What are the four main ways of administering surveys?&lt;br /&gt;2.	Name the main advantages and main disadvantages of each strategy.&lt;br /&gt;3.             What are the major guidelines  we have to follow in question wording?&lt;br /&gt;4.	How can we maximize response rates for mail surveys?&lt;br /&gt;5.	What is CATI? How does it work?&lt;br /&gt;6.	
Describe the characteristics and operation of a focus group.&lt;br /&gt;7.             What types of topics can be addressed in survey research?&lt;br /&gt;8.             Name five major biases and errors that can occur in question formulation in surveys.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2588604?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2588604'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2588604'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_02_25_archive.html#2588604' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2467858</id><published>2001-02-21T08:28:00.000-08:00</published><updated>2001-02-21T08:36:29.686-08:00</updated><title type='text'></title><content type='html'>REVIEW QUESTIONS, MIDTERM TEST, PART 1.&lt;br /&gt;&lt;br /&gt;More midterm test questions will be added next week, based on the assigned readings and my lectures.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1. Describe the main features of the following research approaches: social surveys; social experiments; qualitative field studies.&lt;br /&gt;&lt;br /&gt;2. What would you consider the main advantages and disadvantages of these methods?&lt;br /&gt;&lt;br /&gt;3. What are  the eleven steps in the development of a survey ?&lt;br /&gt;&lt;br /&gt;4. Describe the problem of causality and time in social surveys, and the survey designs which address this issue (panel analysis, etc.)&lt;br /&gt;&lt;br /&gt;5. 
Describe the various phases of Wallace's model of science.&lt;br /&gt;&lt;br /&gt;6. What does Kuhn mean by a "paradigm" and a "scientific revolution"?&lt;br /&gt;&lt;br /&gt;7. What are the advantages of associations between variables and concepts?&lt;br /&gt;&lt;br /&gt;8. Describe the main characteristics of scientific theories.&lt;br /&gt;&lt;br /&gt;9. Describe what we mean by "an abstract concept," and the main advantages and disadvantages of using such concepts.&lt;br /&gt;&lt;br /&gt;10. What are the conditions for the establishment of a causal relationship, and what is meant by a spurious relationship?&lt;br /&gt;&lt;br /&gt;11. Describe three types of definitions.&lt;br /&gt;&lt;br /&gt;12. Describe the main types of reliability.&lt;br /&gt;&lt;br /&gt;13. What is the general approach of criterion validation?&lt;br /&gt;&lt;br /&gt;14. What is face validity, and why is it unsatisfactory?&lt;br /&gt;&lt;br /&gt;15. Describe the general process of construct validation, using the "Becoming Modern" study.&lt;br /&gt;&lt;br /&gt;16. What is the relationship between reliability and validity?&lt;br /&gt;&lt;br /&gt;17. Describe the four levels of measurement. 
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2467858?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2467858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2467858'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_02_18_archive.html#2467858' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2287613</id><published>2001-02-07T18:14:00.000-08:00</published><updated>2001-02-07T18:19:54.256-08:00</updated><title type='text'></title><content type='html'> Walter Schwager: Variables and levels of measurement&lt;br /&gt;&lt;br /&gt;	In doing social research it is current practice to store the data gathered in a computer, so the data can be analyzed using various software programs such as SPSS (Statistical Package for the Social Sciences). To prepare data for computer storage the various values on a variable are usually given numerical codes. Thus we may assign the category “male” the code 1 on the variable sex, and “females” the code 2. Thus, in developing coded information we assign a numerical code to each of the values or categories of our variables. Allocating numerical codes to data allows us to store information in a computer more efficiently.  But another advantage of doing so is actually more important: the use of numbers enables us to use the powerful and elegant language of mathematics in dealing with these data.  This advantage is associated with a drawback, however.  
A major peril is that the use of numbers frequently causes an unjustified feeling of exactness and reliability in dealing with the research data.  As Moroney put it: "It is an easy and fatal step to think that the accuracy of our arithmetic is equivalent to the accuracy of our knowledge about the problem at hand." So the use of numbers brings major advantages, but also potential dangers.  The question we shall address in this section is: what do these numbers mean?  What arithmetical or mathematical characteristics are associated with our use of numbers for the values or categories of different variables?  What interpretations of these numbers are warranted?&lt;br /&gt;	A brief example may help to clarify the issue.  You are undoubtedly familiar with the notion of an "average", or more accurately, the arithmetic mean.  If our sample consists of 5 individuals who, on the variable age, have the following values (in years): 20, 26, 30, 34 and 40, what is their mean age?  We find the mean by adding up the numbers in the series, and dividing their total by the number of elements.  In our example the mean age would be:&lt;br /&gt;&lt;br /&gt;	(20 + 26 + 30 + 34 + 40)/5 = 150/5 = 30&lt;br /&gt;&lt;br /&gt;	But now take the variable of religion.  Let us assume that the five individuals referred to have the following religions, with the numerical code for each category in brackets:&lt;br /&gt;	one Protestant (1);&lt;br /&gt;	one Roman Catholic (2);&lt;br /&gt;	one Jew (3);&lt;br /&gt;	one with No Religion (4);&lt;br /&gt;	and one with an "Other" religion (5).&lt;br /&gt;What is the "average" religion for this sample of 5?&lt;br /&gt;	&lt;br /&gt;	Well, we can proceed with our computations in the same way: add up all the values, and divide the total by the number of cases.  In this instance:&lt;br /&gt;&lt;br /&gt;	(1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3&lt;br /&gt;&lt;br /&gt;	As we know, 3 refers to the Jewish category.  Does this mean that the "average" religion is Jewish?  
And how would we have interpreted a result that would have given us a "mean" religion of, say, 2.1? What does an "average" religion refer to, anyway?&lt;br /&gt;	Let's scrutinize what we just did: we gave numerical codes to religious categories, but those numerical codes were no more than labels.  We could with equal justification have assigned entirely different numbers in an entirely different sequence: Protestant (8), Roman Catholic (1), Jewish (6), No Religion (0), Other (7).  The only restriction is that each category should be assigned a unique code number, so that it cannot be confused with another category.  If we had given these other numbers to the categories, the "average" religion would have been quite different:&lt;br /&gt;&lt;br /&gt;	(8 + 1 + 6 + 0 + 7)/5 = 22/5 = 4.4&lt;br /&gt;&lt;br /&gt;	The result that we obtained depended totally on our arbitrary assignment of given codes, and therefore cannot be interpreted in any meaningful fashion.&lt;br /&gt;	In the example where we computed the mean for the variable “age”, the mean referred to a value that could be interpreted:  a mean of 30 means that the average age is 30 years.  But a "mean" religion makes as much sense as an average telephone number for a sample, or a mean car license number.  In other words: the notion of a mean age makes mathematical sense, whereas the notion of a mean religion does not.  The numbers attached as codes to religious categories are no more than labels: we only know that cases having the same code are the same on religion, and those having different codes have different religions.&lt;br /&gt;	You can add and subtract years, and say that someone who is 14 is half as old as someone else who is 28, and twice as old as someone who is 7 years old.  But can you say that a Protestant (1) is half as religious as a Roman Catholic (with code 2) and one-fourth as religious as someone with No Religion (with code 4)?  
That statement would not make sense, as the results are, once again, purely caused by our arbitrary assignment of numerical codes to religious categories.  Differences between these numbers (as indicated by subtracting them) do not refer to differences in "religiosity".  In other words: you cannot add, subtract, divide or multiply the numerical codes attached to the categories of the variable "religion".&lt;br /&gt;	This clarifies the question raised in the first paragraph of this section: what arithmetical or mathematical characteristics are associated with our use of numbers for the values or categories of different variables? The various ways in which we can use numbers are called levels of measurement, and each level is called a scale.  The four levels of measurement that we shall discuss here are the following:&lt;br /&gt;	1.  nominal scales;&lt;br /&gt;	2.  ordinal scales;&lt;br /&gt;	3.  interval scales;&lt;br /&gt;	4.  ratio scales.&lt;br /&gt;&lt;br /&gt;	These four levels form a kind of ladder.  The bottom level, nominal scales, is the most rudimentary; each subsequent level becomes more refined, but includes all the characteristics of the preceding one.  You may be glad to hear that you already know all there is to know about the most sophisticated level of measurement, that of ratio scales.&lt;br /&gt;	The fact that for nominal scales we cannot apply what we generally consider "standard" mathematical operations points to the following problem.  Whether we can apply a certain mathematical operation to some aspect of reality is a question which can only be answered by checking whether the assumptions of that operation fit the characteristics of that situation.  Many students have found this hard to grasp.  When asked: is 1 plus 2 always 3? nearly everybody answers affirmatively.  But what about 
This is not a trick question: it demonstrates that the addition of units, according to arithmetical rules, is only possible if the units remain the same, and are not decreased or increased in number due to the physical aspects of the actual addition operation.  This requirement is generally satisfied when we deal with cookies or apples, or even dollars or years; it is not when we deal with coffee and sugar, or even a male and a female rabbit, given a couple of months.  Therefore a mathematical operation can only be applied if the assumptions of that operation are satisfied by the subject matter that the operation is applied to!  This fit between the requirements of a mathematical operation and the characteristics of some subject matter is called isomorphism: similarity of form.&lt;br /&gt;	We shall now proceed to a more systematic discussion of these four levels of measurement, and the basic questions we shall be addressing for each one are the following:&lt;br /&gt;	a.  what are the implications of the way in which numbers are used for each level of measurement?&lt;br /&gt;	b.  associated with this is the following problem: what mathematical operations are permitted for each level of measurement?&lt;br /&gt;	c.  this in turn leads to the final problem: what statistical measures are appropriate for each level of measurement?  We shall deal with this final question in an introductory manner only.&lt;br /&gt;	Before starting on this discussion first a word about terminology.  We are, in this topic, always discussing what we can do with the numbers that represent various values on a given variable, as in the examples above, and what these numbers represent or mean.  Such a variable is said to be at a certain measurement level, or to be a certain scale.  The variable "religion" is at the nominal level of measurement, and can be said to be a nominal scale.  A nominal scale may be any variable at that level.  
A ratio scale is a variable at the ratio level of measurement; as we shall see later, that might be "age", or "years married." (Unfortunately the term "scale" also refers to instruments to measure attitudes, so some confusion may arise; beware.)&lt;br /&gt;&lt;br /&gt;Nominal Scales, or: when is 1 plus 2 not 3?&lt;br /&gt;&lt;br /&gt;	In the example of religion, as we discussed a moment ago, the allocation of numbers was merely a labelling exercise: we assign a (numerical) name to a given category.  This is why we call this use of numbers measurement at the level of categorical or nominal scales (from the Latin "nomen": name).&lt;br /&gt;	What characteristics are associated with this way of using numbers?  Only those of similarities and differences: a unit of analysis with a given code is similar (on that variable!) to all other units with the same code, and it differs on that variable from all units of analysis with a different code. (In algebraic terms, a=a, and b=b; and a is not equal to b, and vice versa.) Put technically, the numerical codes identify equivalence classes, as all the elements within a certain category are equivalent: equal in value.&lt;br /&gt;	The allocation of numbers is purely arbitrary, however, as we already discussed.  As one author put it, "any two numbers may be interchanged without affecting anything but the notation." As long as we keep the numbers distinct for different categories, and we assign the same number to all cases within the same category, we may allocate any numbers we wish.&lt;br /&gt;	What mathematical operations are permitted for nominal scales?  Apart from operations dealing with similarities and dissimilarities, none.  The only operations allowed for equivalence classes are frequency counts: e.g., how many Protestants do we have in our sample?  Let's review this systematically.&lt;br /&gt;	1.  We can count the number of cases with a given code, e.g. 
the number of Protestants, or the number of Catholics;&lt;br /&gt;	2.  Can we compare these numerical codes in terms of more or less?  In other words, can we rank them?  No, as we have assigned them in an arbitrary fashion. (In our example No religion -code 4- would be "more" on some fictional variable than the preceding three categories: Protestant, Catholic, and Jewish, with codes 1, 2, and 3!)&lt;br /&gt;	3.  Can we add or subtract these code numbers?  No, as we have assigned them totally arbitrarily, and what would additions and subtractions mean here?  Although we all tend to believe that 1 + 2 = 3, at the level of nominal scales we cannot add one Protestant to one Roman Catholic to make one member of the Jewish faith; that would be an odd kind of interreligious procreation.  Thus at this level of measurement 1 + 2 ≠ 3!&lt;br /&gt;	4.  Can we multiply or divide these numbers?  Again, the answer is no.  Our assignment of numbers has been arbitrary, and what would it mean to do the following division: 2/2 = 1?  Something like the following: RC/RC = Protestant?&lt;br /&gt;&lt;br /&gt;	In summary it can be stated that at the level of nominal scales we can only count (heads); we cannot rank, add, subtract, multiply or divide.&lt;br /&gt;	The statistical measures that are appropriate at the level of nominal scales are those that are based on head counts only: percentages, proportions, and modes or modal categories, as well as frequencies.&lt;br /&gt;	The notion of nominal scales is puzzling at first sight, so you may want to have a look at some other nominal or categorical level variables. Some of the most important categorical variables in sociology are: sex, ethnicity, race, religion, occupation, party affiliation and marital status.&lt;br /&gt;	Nominal level variables are also known as categorical variables, as the values on them are distinct categories.  
They are also known as qualitative variables, in contrast to the next three types, which are lumped together as quantitative variables. (The Baker text considers ordinal variables as qualitative as well.)&lt;br /&gt;	&lt;br /&gt;The ranking of numbers: ordinal scales&lt;br /&gt;&lt;br /&gt;	In many situations we use variables with values that can be ranked in terms of more or less, or of greater or smaller. The educational achievement of a respondent’s mother or father can be fitted into one of the following categories:&lt;br /&gt;&lt;br /&gt;What is the highest level of formal education completed by your parents?&lt;br /&gt;&lt;br /&gt;EDUCATION 				MOTHER 			FATHER&lt;br /&gt;No schooling..............................................1				1&lt;br /&gt;Some Elementary schooling..................... 2 				2&lt;br /&gt;Completed Elementary school...................3 				3&lt;br /&gt;Some Secondary school............................4 				4&lt;br /&gt;Completed Secondary school....................5 				5&lt;br /&gt;Some University or College......................6 				6&lt;br /&gt;University degree or degrees....................7 				7&lt;br /&gt;Other (write in)&lt;br /&gt;Mother__________________________  8&lt;br /&gt;Father ___________________________				 8&lt;br /&gt;Don't know..............................................9				 9&lt;br /&gt;&lt;br /&gt;The first observation we can make is that the numerical codes can be interpreted in terms of similarity and dissimilarity, as in nominal scales. (After we have completed our discussion of levels of measurement you will note that each higher level of measurement has all the characteristics of preceding levels of measurement, plus some new ones.) 
But the codes also make sense in terms of more and less: "no schooling" (1) is clearly less than, say, "some elementary" (2);  (5), "completed secondary school", is clearly more than (4), "some secondary school."&lt;br /&gt;	So in what way do ordinal scales differ from nominal scales?  The codes of an ordinal scale can be ranked in terms of more or less on a given variable. (With the exception of the 8 -other- and 9 -don't know- categories this applies to the example above, as shown.) This ranking possibility results in a rank order, hence the term ordinal scales.&lt;br /&gt;	What can we say about the size of the differences between two values on an ordinal scale?  In general, little or nothing.  How would you compare the difference between 2 and 1, or 7 and 6, in the example just given?  Because the differences are unequal or unknown, we cannot compare them in mathematical terms: we cannot add or subtract, and therefore we cannot assume that (7-6) is equal to (2-1).&lt;br /&gt;	For the same reason we also cannot multiply or divide these numbers, as is discussed below.&lt;br /&gt;	What statistical measures can we apply to ordinal scales?  Basically the same as for nominal scales, plus the ones based on ranking.  These include percentiles (and quartiles) and such measures as the median.  If you have ranked a class of 15 students with scores on a music test, the score of the middle student -the 8th in this case- is the median value.&lt;br /&gt;	Many of the variables in social science research are of an ordinal kind: job prestige, educational level, a country's level of industrialization, and so on.  The largest collection probably consists of individual attitudes and aptitudes: the strength of your opinion in favour of or against capital punishment, economic nationalism, sexual equality, or your scores on IQ tests, classroom tests, academic subjects and so on.  
If a student gets a score of 70 on an academic test, we can presumably say that she has a higher score than someone with a score of 35, but can we say that the difference between a score of 70 and 35 is the same as that between 0 and 35?  We cannot.&lt;br /&gt;	This also implies that we cannot multiply or divide numbers at the level of an ordinal scale.  Return to our example for a moment: would you be able to say that 6/3 = 2?  In other words, would you be able to say that someone with some university or college education has twice as much schooling as someone who finished elementary school?  That would not be a very meaningful statement to make.&lt;br /&gt;	To summarize our discussion more systematically, we can state that:&lt;br /&gt;	A.  the mathematical connotations of numbers used at the level of ordinal scales are:&lt;br /&gt;	1.  those of similarity or dissimilarity, as for nominal scales;&lt;br /&gt;	2.  those of ranking, resulting in rank orders;&lt;br /&gt;&lt;br /&gt;	B.  the mathematical operations permitted for ordinal scales are:&lt;br /&gt;	1.  those of counting:  how many elements are in the 1- category, for example;&lt;br /&gt;	2.  those of ranking, or comparisons in terms of more or less;&lt;br /&gt;&lt;br /&gt;	C.  the mathematical operations that are not permitted are those of:&lt;br /&gt;	1.  subtraction and addition;&lt;br /&gt;	2.  multiplication;&lt;br /&gt;	3.  division.&lt;br /&gt;&lt;br /&gt;	D.  the statistical measures appropriate at the level of ordinal scales are:&lt;br /&gt;	1.  those associated with head counts: percentages, proportions, and frequencies; modes and modal categories;&lt;br /&gt;	2.  
those associated with rank orders: the median, and percentiles, to give only two examples.&lt;br /&gt;&lt;br /&gt;	  The mathematical characteristics of an ordinal system include the requirement of transitivity: if a is larger than b, and b is larger than c, then a must be larger than c; i.e., under those conditions c can never be larger than a. In reality this transitivity requirement may be violated. The simplest example concerns sports teams: Team A may beat Team B (i.e. be better); Team B may beat Team C; but Team C may beat Team A! This is an example of intransitivity. (In such a situation the criteria for an ordinal scale are not fulfilled.  But in all sports competitive rules ensure that such intransitivity does not occur.)&lt;br /&gt;	In the natural sciences a few examples of ordinal scales still exist, including the Beaufort scale of wind velocity (1: leaves move slightly; 10: buildings blow over), the Richter scale for earthquake strength, and the Mohs scale for the hardness of minerals.&lt;br /&gt;	In the social sciences many variables are at the level of ordinal scales, but because much more powerful and useful statistics are available for the next level of measurement, most of these ordinal variables are treated as interval level variables.  The consequences of this are debated in the profession, but these debates are of no great concern to us for the moment.&lt;br /&gt;&lt;br /&gt; Interval Scales, or: When is 10 not twice 5?&lt;br /&gt;&lt;br /&gt;	The clearest example to illustrate interval scales comes from the measurement of temperature.  In comparing Fahrenheit and Celsius scales, for instance, we can state that, roughly speaking,&lt;br /&gt;	34 degrees  F = 1 degree  C; and&lt;br /&gt;	68 degrees  F = 20 degrees  C.&lt;br /&gt;How do these temperatures compare to each other?  Well, in terms of Fahrenheit, 68/34=2, so it is tempting to say that one temperature is twice as warm as the other.  
But how do they compare in terms of the Celsius or centigrade scale?  Now, 20/1=20, so here we might want to say that one temperature is twenty times warmer than the other.  Why do the two measurement systems give us two different results?  After all, our mathematical computations have been correct.  Why are our conclusions contradictory?&lt;br /&gt;	Could it be that we get these peculiar results because we employ different measurement systems?  No, because in measuring lengths in imperial and metric measures, two different systems, we still end up with the same results:&lt;br /&gt;	2 yards = 1.83 metres; and&lt;br /&gt;	4 yards = 3.66 metres.&lt;br /&gt;&lt;br /&gt;	How do these two lengths compare?  Well, in yards, clearly, 4/2 = 2, and in metres, 3.66/1.83 = 2 as well.  So changing measures of length did not influence the results here.  Nor does it for surface and volume measures. By doing the conversion on a simple example you can check that yourself, if you want.&lt;br /&gt;	The explanation for our conflicting results is that the two scales we compared have different zero points: the two scales start counting at different points.  These two scales, plus a third one, the Kelvin scale, can be illustrated as in Figure 5.2. The line represents the variable "temperature".&lt;br /&gt;&lt;br /&gt;	degrees Celsius:	-273	-18	0	38	100&lt;br /&gt;	TEMPERATURE POINTS:	*	*	*	*	*&lt;br /&gt;___________________________________________________________________________________&lt;br /&gt;	degrees Fahrenheit:	-460	0	32	100	212&lt;br /&gt;	degrees Kelvin:	0	255	273	311	373&lt;br /&gt;&lt;br /&gt;Figure 5.2: A GRAPHIC COMPARISON OF THREE TEMPERATURE SCALES&lt;br /&gt;&lt;br /&gt;	The Kelvin scale starts at the "natural" zero of absolute zero, but the Celsius and Fahrenheit scales start at relatively arbitrary points along the temperature line.  
That is why you cannot divide temperatures on these two scales. (You would also run into problems with negative temperatures on these scales.)&lt;br /&gt;	These scales do have the quality that each unit difference on the scale is equivalent to every other unit difference; in other words, a difference of one degree Celsius is always the same, no matter where it is located.  Thus the difference between 10 degrees Celsius and 5 degrees Celsius is the same as that between 15 degrees and 10 degrees, for example.  The intervals between successive numbers are the same, or as it is sometimes put, they are equidistant.  That is different from the situation in the preceding level of measurement, ordinal scaling: there a point difference might mean, in one case, the difference between "some elementary schooling" and "no schooling" (codes 2 and 1 on question 40 in the preceding section), or between "some university or college", and "completed secondary school" (codes 6 and 5).  So in ordinal scaling the implications or meaning of a difference of a point depend on where on the scale you are.  At this new level of measurement, however, each point difference can be seen as an interval of equal size, hence the name "interval scale." (The reasons why these intervals are the same are rather complex, so we'll bypass that discussion.)&lt;br /&gt;	Because these intervals are the same, it is meaningful to compare differences by adding and subtracting numbers on the same scale, as we did in stating that the difference between 10 and 5 degrees Celsius was the same as that between 15 and 10. The same holds for, say, 33 and 28.  After all, (10-5)=5, and (33-28)=5.&lt;br /&gt;	What are the permitted mathematical operations for an interval scale?  First of all, we have to mention those that apply also to nominal and ordinal scales, and then introduce the new feature specific to interval scales:&lt;br /&gt;	1.  similarity and dissimilarity;&lt;br /&gt;	2.  ranking;&lt;br /&gt;	3. 
addition and subtraction.&lt;br /&gt;&lt;br /&gt;	We cannot, however, multiply or divide, as our comparison of Fahrenheit and Celsius scales illustrated.&lt;br /&gt;&lt;br /&gt;	The statistical measures applicable to interval scales are first the ones we have encountered already:&lt;br /&gt;	1.  those based on counts: frequencies, percentages, proportions; modes and modal categories;&lt;br /&gt;	2.  those based on ranks: percentiles, medians, etc.&lt;br /&gt;&lt;br /&gt;	But for interval scales there are important new additions:&lt;br /&gt;	3.  statistical measures based on equal intervals: the (arithmetic) mean, and its associated measures of dispersion: the variance and the standard deviation;&lt;br /&gt;	4.  standard correlational techniques.&lt;br /&gt;&lt;br /&gt;	 Because the mean, the standard deviation, and correlational analyses are very useful statistical tools, social scientists like to use ordinal scales as if they were interval scales, as was stated in the previous section.  Most social scientists now seem to accept this practice. ("Pure" examples of interval scales in the social sciences are actually rather rare.)&lt;br /&gt;	Finally, do not confuse "interval scales" with a specific type of attitude scale, the Thurstone "equal appearing interval scales." Apart from the similarities in name, these two scales are very different.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Ratio scales, or: back to basics&lt;br /&gt;&lt;br /&gt;	Ratio scales bring us back to familiar arithmetical examples: cookies, apples, or whatever else was used to teach you basic math.&lt;br /&gt;	Ratio scales have the characteristics of interval scales, plus the advantages of a natural zero point.  Take income, for example: dollars can be added and subtracted, multiplied, divided, and so on.  And you can start from a natural zero!&lt;br /&gt;	The mathematical operations applicable to ratio scales are all the ones that you are familiar with:&lt;br /&gt;	1.  
counting;&lt;br /&gt;	2.  ranking;&lt;br /&gt;	3.  addition and subtraction;&lt;br /&gt;	as well as the most important new operations:&lt;br /&gt;	4.  multiplication and division.&lt;br /&gt;&lt;br /&gt;	As divisions establish ratios between numbers, this level of measurement is called a "ratio scale".&lt;br /&gt;	The statistical measures appropriate to ratio scales are the same ones that we applied to interval scaling, plus a rather obscure one hardly ever used in social science: the geometric mean, which you can safely ignore.&lt;br /&gt;	Examples of ratio scales, or variables at the ratio level of measurement in the social sciences, mainly deal with persons, objects or physical characteristics, such as space and time.  They include: number of residents in a community; annual income in dollars; number of children per family; number of years married; number of years of education; number of working days; square feet per house; and so on.  Sometimes ratio scales deal with the frequency of events, e.g., the frequency of moves over the last ten years.&lt;br /&gt;	This tidiness of ratio scales is often lost to some degree when we combine values into groupings, or grouped data.  Take this question, for example:&lt;br /&gt;&lt;br /&gt;	41.  
To the best of your knowledge, what was your total income in the past year?&lt;br /&gt;	Up to $15,000.................................................................................................1&lt;br /&gt;	$15,001-$25,000.............................................................................................2&lt;br /&gt;	$25,001-$35,000.............................................................................................3&lt;br /&gt;	$35,001-$45,000.............................................................................................4&lt;br /&gt;	$45,001-$55,000.............................................................................................5&lt;br /&gt;	$55,001-$65,000.............................................................................................6&lt;br /&gt;	$65,001 and over............................................................................................7&lt;br /&gt;	Don't know......................................................................................................9&lt;br /&gt;&lt;br /&gt;	Given the unequal size of the groupings, especially the ones at both ends, it may be dangerous to treat this variable as a ratio scale.  The treatment of grouped data requires some statistical caution, but we cannot go into that now.&lt;br /&gt;&lt;br /&gt;A Summary of Levels of Measurement Scales&lt;br /&gt;&lt;br /&gt;	What are the characteristics and implications of systems of numbers in social measurement?  The answer depends on the kind of variable that these numbers are used with: nominal variables, ordinal variables, interval variables, and ratio variables.  These variables are at different levels of measurement, and they form different measurement scales. &lt;br /&gt;	Nominal level variables are often called qualitative, as we are dealing here with qualitative differences between various categories, which are otherwise incomparable.  
In ordinary language you might call these "cheese and chalk" or "apples and oranges" variables.  The other three levels of measurement are called quantitative variables, as they measure quantities of some characteristic. (As stated earlier, Baker classifies these scales somewhat differently.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2287613?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2287613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2287613'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_02_04_archive.html#2287613' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2282770</id><published>2001-02-07T11:50:00.000-08:00</published><updated>2001-02-07T11:55:10.043-08:00</updated><title type='text'></title><content type='html'>15. AN OVERVIEW OF THE MAIN TYPES OF RELIABILITY AND VALIDITY&lt;br /&gt;&lt;br /&gt;THE MAIN TYPES OF RELIABILITY&lt;br /&gt;The reliability of a measure is defined here as the extent to which its measurement results remain the same under various (supposedly irrelevant) measurement conditions.&lt;br /&gt;&lt;br /&gt;I. EQUIVALENCE&lt;br /&gt;GENERAL QUESTION: what influence do various (supposedly irrelevant)  measurement conditions have on the data collected, where these data are collected at the same point in time?&lt;br /&gt;GENERAL APPROACH: compare data collected with the same instrument, but under slightly varying conditions. 
This results in the following main subtypes of equivalence:&lt;br /&gt;&lt;br /&gt;IA. Intersubjective agreement: vary human element: &lt;br /&gt;	1. inter-interviewer agreement: compare data collected by different interviewers on comparable samples;&lt;br /&gt;	2. inter-coder agreement: compare results after different coders have coded the same set of interviews or other materials;&lt;br /&gt;	3. inter-rater agreement: compare results after different raters have rated the same set of subjects (or the same material);&lt;br /&gt;	4. inter-observer agreement: compare results after different observers have observed the same set of situations.&lt;br /&gt;&lt;br /&gt;IB. Comparability: vary instrument slightly:&lt;br /&gt;	1. agreement between single questions: compare results (for the same, or similar, samples) of somewhat different question wordings, e.g., split ballot format.&lt;br /&gt;	2. alternate forms: compare results for supposedly similar multiple item instruments, composed of different items.&lt;br /&gt;&lt;br /&gt;II. STABILITY: repeat measures at different times&lt;br /&gt;GENERAL QUESTION: how do results on the same measure for the same subject, taken at two points in time, compare? It is assumed that the variable, and so the measurement results, should have remained the same.&lt;br /&gt;GENERAL APPROACH: compare data collected at two points in time for the same respondents, and the same instrument, assuming no change has occurred in the target variable.  This results in one main type of reliability:&lt;br /&gt;&lt;br /&gt;IIA. Test-retest reliability: see description just given.&lt;br /&gt;&lt;br /&gt;III.  INTERNAL CONSISTENCY: check coherence of items&lt;br /&gt;GENERAL QUESTION: what is the general relationship between the items of a multiple item instrument, or between the items and their total score?  
How do these results fit the theoretical expectations of the measurement model?&lt;br /&gt;GENERAL APPROACH: study correlations between the items of a multiple-item instrument, or between items and their total score, in various ways.  This results in the following subtypes:&lt;br /&gt;&lt;br /&gt;IIIA.  Split-half reliability: split the total instrument into two halves in various ways and compare the results.&lt;br /&gt;&lt;br /&gt;	1. odd-even: odd-numbered items form one half; even-numbered items the other; compare results for the same subjects on the two halves.&lt;br /&gt;	2. coefficient alpha: calculates average split-half correlations for all possible halves of the instrument.&lt;br /&gt;	3. Kuder-Richardson formula 20: comparable to coefficient alpha, for dichotomously scored items.&lt;br /&gt;&lt;br /&gt;IIIB.  Guttman scaling: assume Guttman model, and apply scalogram analysis.&lt;br /&gt;&lt;br /&gt;IIIC.  Inter-item correlations: study patterns of correlations between items: techniques such as factor analysis.&lt;br /&gt;&lt;br /&gt;IIID.  Item-score correlations: analyze correlations between items and their total score.&lt;br /&gt;&lt;br /&gt;	1. Likert item selection technique: analyze means for the same item for bottom and top 25 percent of scorers.&lt;br /&gt;	2. Phi coefficient: comparable to Likert, but compares bottom and top 50 percent of scorers.&lt;br /&gt;	3. Item analysis procedures of a more refined type.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;THE MAIN TYPES OF VALIDITY AND VALIDATION&lt;br /&gt;&lt;br /&gt;VALIDITY: "the validity of a measurement is the extent to which it measures what it is intended to measure."&lt;br /&gt;&lt;br /&gt;Terminological note: The latest recommendations of the American Psychological Association and the American Educational Research Association now consider all types of validity sub-types of construct validation. 
Thus, they propose that we refer to the criterion-oriented type of construct validation, etc. That changes little in our discussion, however. (The APA and AERA do not recognize face validity as an acceptable type of validation.)&lt;br /&gt;&lt;br /&gt;I. CRITERION VALIDATION&lt;br /&gt;GENERAL QUESTION: how adequate is a new measure for measuring an already established concept or practical application that already has an existing measure?&lt;br /&gt;GENERAL APPROACH: some accepted measurement procedure already exists for the target concept: the criterion measure.  The results for the new measurement procedure are compared with the results for the criterion measure for the same sample.  This results in criterion validation, with two subtypes:&lt;br /&gt;&lt;br /&gt;IA. concurrent validity: data for the new measure and the criterion measure are collected at the same time, concurrently;&lt;br /&gt;&lt;br /&gt;IB. predictive validity: data for the criterion measure are collected at a later time than those for the new measure, the predictor measure.&lt;br /&gt;&lt;br /&gt;II. 
CONSTRUCT VALIDATION&lt;br /&gt;GENERAL QUESTION: to what extent will the results of a new measurement procedure for a new concept, when correlated with measures of other concepts, confirm the hypotheses that follow from the theory associated with the introduction of the new concept?&lt;br /&gt;&lt;br /&gt;GENERAL APPROACH: collect data on the new measurement procedure for the target concept, and for theoretically related concepts, and check whether these results confirm the theoretically predicted results.&lt;br /&gt;&lt;br /&gt;MISCELLANEOUS TYPES OF VALIDITY&lt;br /&gt;&lt;br /&gt;FACE VALIDITY&lt;br /&gt;&lt;br /&gt;GENERAL QUESTION: does the measurement procedure look adequate for its intended purpose?&lt;br /&gt;&lt;br /&gt;GENERAL APPROACH: prima facie inspection of the instrument for possible shortcomings; no comparison with other data.&lt;br /&gt;&lt;br /&gt;CONTENT VALIDITY&lt;br /&gt;&lt;br /&gt;GENERAL QUESTION: does the measurement instrument adequately represent the content area of the target variable, or the target concept?&lt;br /&gt;GENERAL APPROACH: (a) where the content area is clearly defined, check whether the measure covers intended content; (b) "expert judgment": see whether other researchers agree with measure&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2282770?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2282770'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2282770'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_02_04_archive.html#2282770' 
title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2282368</id><published>2001-02-07T11:16:00.000-08:00</published><updated>2001-02-07T11:21:32.250-08:00</updated><title type='text'></title><content type='html'>8. Validity in theoretical research &lt;br /&gt;&lt;br /&gt;The aim of theoretical research is the development of conceptual frameworks, linking concepts by a network of causal or statistical associations. The approaches to the problem of validity in theoretical research can only be understood against the background of the aims and purposes of theory development. It is therefore important that you have some grasp of the goals of theory development in dealing with theoretical validity. &lt;br /&gt;Validation procedures in theoretical research largely concentrate on two main issues: 1. Is a new measurement procedure a good way to measure an already well-established, well-measured concept? 2. Do the data resulting from a new measurement procedure for a new concept support the theoretical implications of a new concept, and its related theory. Although these issues are quite general and abstract, they shall be clarified by relevant examples from actual social research.&lt;br /&gt;&lt;br /&gt;9. The validation of new measures for old concepts &lt;br /&gt;&lt;br /&gt;In every subfield of social science a number of well-established theories can be found. The meaning of the concepts within the theory is relatively clear, and measurement procedures have been developed for the major concepts that are reasonably reliable and valid. As a major example in sociology we can point to social stratification theory, linking the concepts of education, income, and occupation to many other concepts. 
Other examples of such theories can be found in any introductory text, although no theory will be without its critics. Within social stratification theory the concept of socio-economic status and other concepts have demonstrated their utility, in that many studies have found that these concepts are strongly related to many other concepts. In other theories we can also find concepts that have similarly proven their value. Although these concepts and their measurement procedures are well-established, we may still want to validate new ways of measuring them. &lt;br /&gt;Why would we want new measures for old concepts with satisfactory measures? We may do so for a number of reasons. First of all, the new measure may be easier to use. Short versions of long attitude or personality measures have sometimes been developed, as they would take less time to administer. The results for the new, shorter version can then be compared with the results for the older, longer version, when the two are administered at the same time to a sample of subjects or respondents. Secondly, the new measure may be less reactive or obtrusive than the older measure. An observational measure of socio-economic status may be less intrusive than questions about income and education, for instance. Unobtrusive techniques were intended to produce measures which were less reactive. Ethnicity can often be accurately measured by pronunciation, even where the subject is trying to disguise his identity. (There are at least three incidents reported in history where invading foreigners were identified by their pronunciation of a single word. The most recent example concerns the Netherlands in 1940, where suspected German spies operating behind the Dutch lines had to pronounce the name "Scheveningen.") Thirdly, a new measure may have a wider or different range of applicability. A clear example comes from the area of mental health studies. 
Achenbach and his associates developed a child problem checklist that could be used to pinpoint potential mental health problems in boys and girls between 11 and 16 years of age. The checklist had to be filled in by parents. Since that original instrument new versions have been developed for younger children, as well as forms that could be filled in by the child's teacher or by the child itself (Achenbach, 1983). As a fourth reason, the new measure may be more precise or reliable. Although in daily life most of us do not encounter problems distinguishing men from women, for the Olympic Games more precise sex tests, in the form of chromosome counts are required. (The pronunciation test for ethnicity in Holland was also considered more reliable than other measures.) As the fifth and final reason, the new measure may be higher in systematic import, i.e., it may be linked to other concepts via stronger (and simpler) correlations than the existing measure. This is a claim made for the Wilson C-scale for conservatism, for instance, apart from additional claims that it is simpler to use and more reliable than other measures related to the same concept (Wilson, 1973). &lt;br /&gt;Social science examples of old measures being replaced by new ones tend to show only a few of the advantages just listed. In the physical sciences new measures may replace older measures for most of the reasons listed. The measurement of time, e.g., has moved from sun dials to pendulum clocks, to other kinds of clocks, to the present state of the art, atomic clocks. If you compare a sun dial as a measure of time with a quartz clock, you will find that most of the advantages listed apply. &lt;br /&gt;In validating a new measure for an established concept we proceed in general by correlating the results for the new measure with the results for an already existing measure. 
As the older measure can be considered as a criterion measure, we can consider such validation as a theoretical application of criterion validation.&lt;br /&gt;&lt;br /&gt;10. Validation strategies for new concepts &lt;br /&gt;&lt;br /&gt;In the last section we discussed the ways of validating new measures for established concepts with already existing measurement procedures. The next question follows naturally: how do we validate new measures for new concepts, for which no satisfactory measurements have as yet been developed? In general we shall assume that a new theoretical concept is introduced together with a related theory. In such  cases we are usually dealing with what often are called cluster concepts, linking many characteristics. Attitudes, to name one prominent kind of cluster concept in the social sciences, are not merely predispositions to act in one specific way toward one specific object. If an individual gives money to the Salvation Army once, we usually do not say on the basis of one such action that he has an altruistic attitude. Such an attitude refers to a general disposition to act, feel, and evaluate in a variety of altruistic ways toward a variety of objects, such as charitable organizations, deprived groups, and so forth. Being prejudiced does not merely imply that one acts once in a contemptuous manner towards one member of a minority group; the concept of prejudice assumes that individuals tend to react in a generalized manner, usually unfavourably, toward most, if not all, members of minority groups, in highly diverse situations. Another type of cluster concepts consists of personality traits.&lt;br /&gt;Cluster concepts can be considered as mini-theories, as they posit that specific regularities will be found; concepts are inconceivable without regularities. A simple example can clarify the point that a concept may have unwarranted assumptions, and thus be invalid. 
The theory of witchcraft, as developed in Europe in the Middle Ages and later, assumed that various empirical regularities tended to coincide: older women, who had formed a pact with the Devil, were able to slip through exceedingly small holes and fly through the air on broomsticks, goats, and other objects. These women were also capable of inflicting physical and other damage on other human beings, animals, buildings, crops, and other targets by supernatural means. Because of their propensity to fly, witches were also characterized by light body weight. In some European towns, such as Oudewater in Holland, the operationalist consequences of this witchcraft theory can still be seen: large weighing scales, to be used for the identification (measurement) of witches, who were supposedly light. How should we, as social researchers, approach the obvious validation question: is body weight a valid measure for the concept of witchdom? Can weighing scales distinguish validly between witches and non-witches? Unless you believe in witches, the absurdity of these questions should be evident. Confronted with such questions, we should answer that we do not share their assumptions: there is no acceptable evidence that witches exist. In other words: no such cluster of characteristics as low body weight, propensity to fly, etc., has been demonstrated (not to mention the assumption of the Devil’s existence). As a result, merely asking whether body weight is a valid measure of being a witch is a misleading way of posing the question; it assumes that witches exist. We can only assess the validity of a cluster concept if we also consider the strength of the evidence supporting the assumptions implicit in that concept. That evidence has been called, not surprisingly, justificatory evidence by the philosopher Kaplan. (This points to one of the major differences between pragmatic and theoretical validation. 
In pragmatic validation, as long as a socially accepted criterion measure exists, you can engage in criterion validation.)&lt;br /&gt;Cluster concepts are frequently called theoretical constructs, and the method of validating them is therefore called construct validation.&lt;br /&gt;&lt;br /&gt;11. Construct validation: the example of “Becoming Modern”&lt;br /&gt;&lt;br /&gt;In construct validation it is necessary that the introduction of a new cluster concept be accompanied by a set of hypotheses linking that concept to other ones. These other, associated concepts may be causal factors, effects, or other correlates of the concept in question, as our example will demonstrate. We shall illustrate the procedures of construct validation by describing one large study in some detail. The study selected, conducted by Alex Inkeles and his co-workers, notably David Smith and Karen Miller, focused on the concept of individual modernity. The main results were published in a monograph, Becoming Modern, co-authored by Inkeles and Smith (1974), and a series of articles by Inkeles and others. The data for the project consisted of lengthy interviews with 5,500 men in a number of developing countries: Argentina, Bangladesh (then East Pakistan), Chile, India, Israel, and Nigeria. The fieldwork took place in the early sixties.&lt;br /&gt;What was the study topic of Becoming Modern? What was its theoretical perspective? &lt;br /&gt;&lt;br /&gt;In the Harvard six-nation study we were guided by a particular theoretical perspective. In most general terms, the main purpose of the research was to test whether, where, and how far individuals come to incorporate as personal attributes qualities which are analogous to or derive from the organizational properties of the institutions and the roles in which these individuals are regularly and deeply involved. 
To give this model greater specificity, we selected the factory as the embodiment of one major type of modern institution, so that our general question could be rephrased more concretely, as follows: "What are some of the personal qualities which extended service in a factory might inculcate in individuals who moved into such service after growing up in the typical agricultural village of one of the less developed countries?" (Inkeles, 1977: 142-143)&lt;br /&gt;&lt;br /&gt;Inkeles and his fellow researchers then performed an analysis of the characteristics of the factory system and the set of qualities they assumed the new factory worker would learn as a result of his occupational role.&lt;br /&gt;&lt;br /&gt;Among the qualities we expected... were a sense of personal efficacy, openness to new experience, respect for science and technology, acceptance of the necessity for strict scheduling of time, and a positive orientation toward planning ahead.  Each of these characteristics we then designated as components in our definition of the modern man conceived in psychosocial terms.&lt;br /&gt;&lt;br /&gt;To that set a series of other attributes, based on an analysis of the expected or required attributes of modern man as a participant citizen, as a family member, and in other roles, was added.&lt;br /&gt;&lt;br /&gt;For example, in the family realm we defined as more modern the insistence on selecting one's own spouse rather than accepting a spouse chosen by one's parents or other "elders," the preference for small rather than large families, and the willingness to practice birth control and the actual limitation of family size, as against the passive acceptance of "whatever number of children God might send."&lt;br /&gt;&lt;br /&gt;This analysis produced a list of 24 main themes, each considered&lt;br /&gt;part of the larger set of qualities defining individual modernity... 
It reflected a definite theoretical position, which we believe it should have, since that permitted testing whether certain explicit expectations underlying the definition were sound (Inkeles, 1977: 142-143).&lt;br /&gt;&lt;br /&gt;These lengthy quotations set out the basic theory of individual modernity, and partially describe the main elements of the modernity cluster or syndrome. A great number of items, 119, were then developed to cover each of these elements. A number of item selection procedures were used to produce an individual modernity scale. (Actually a number of scales were produced by  different item selection techniques, but a detailed overview is unnecessary for our purpose.) This scale was subsequently checked for its internal consistency, and these results were satisfactory. &lt;br /&gt;How could the researchers now proceed in testing the validity of their modernity scale? Could they use criterion validation? Clearly not, as there is no single, clear and accepted criterion of individual modernity. This resulted in a dilemma, which Inkeles and Smith describe as follows. (The OM scale is the Overall Modernity scale, combining the various aspects or dimensions of the modernity cluster.) As the authors set out their problem very clearly we shall quote a rather lengthy section of Becoming Modern. &lt;br /&gt;&lt;br /&gt;To prove its worth a scale not only must distinguish one individual from another, but must do so accurately. The usual method for establishing the validity of a scale is to apply it to people whose characteristics are already known by some independent criterion, which is why this approach is called the "criterion method of scale validation." If we were devising a test of psychic adjustment, for example, we might compare the scores of patients in a mental hospital with those of individuals whom psychiatrists had rated as well adjusted. 
If the scale was any good we would expect it correctly to identify the criterion group of hospital patients. Even the method of validation by a criterion group of "known" quality is full of vicissitudes, as the example just given will surely suggest. Our problem, however, was even more serious. There simply is no generally accepted external criterion by which we can certify a man to be modern. Indeed, one objective of our project was precisely to establish who were the modern men.&lt;br /&gt;Our theory of modernization offered a way out, but it also put us on the horns of a dilemma. The theory held that certain institutions and experiences have the capacity to change men in ways which make them more modern. We assumed that the more such experiences a man had been exposed to, the greater would be the degree of his individual modernity, as expressed in his attitudes, values, and behavior. Therefore, if the OM scale was valid it should have assigned higher scores to men who had been much exposed to modernizing institutions and experiences. In other words, our theory indicated that we should take certain objective social characteristics as defining the external criteria by which to test the OM scale. Accordingly, those who were better educated, who worked in industry rather than agriculture, who lived in the city rather than the countryside, and who made above-average use of the mass media should have scored as more modern.&lt;br /&gt;&lt;br /&gt;Although this sounded very plausible, we had good reason to hesitate before committing ourselves to this method for testing the validity of the OM scale. The proposed approach suffered from the defect that it assumed the correctness of the very theory we were attempting to test. We therefore faced the prospect of being confronted by a dilemma should we discover that individuals more exposed to modernizing experiences failed to score higher on the OM scale. 
We had to recognize that if such were the outcome we would be faced with two alternative explanations without being able to choose between them. One alternative would be to argue that the fault was in the OM scale. In other words, one might maintain that the institutions cited did actually change men in ways which made them more modern, but that the OM scale failed to reflect those changes. Adopting that explanation would imply that our theory of change had been correct, but the OM scale was invalid. The second alternative would be to assume that the OM scale was quite good at telling which men were truly modern, but that the institutions cited did not contribute to making them so. Adopting that interpretation would lead to the conclusion that the scale was valid, but that our theory about the causes of individual change had been incorrect.&lt;br /&gt;Although we were distressed by the prospect of great ambiguity should the OM scale fail to be positively associated with modernizing experiences, we saw no alternative for establishing the validity of the scale. And we took comfort in the realization that should the OM score indicate greater modernity among those more exposed to modernizing experiences, we would be a double winner. That result, we felt, would establish simultaneously that the OM scale was valid as judged by an external criterion and that increased exposure to modernizing institutions brought about greater individual modernity. Thus, our causal theory would be proved correct, and the scale established as valid, simultaneously.  (Inkeles and Smith, 1974: 119-121; emphases in original.)  &lt;br /&gt;&lt;br /&gt;Let's review how Inkeles and Smith proposed validating their Overall Modernity (OM) scale. Initially they tested the scale for internal consistency, which can be considered as a step in the trait validation process. After that: &lt;br /&gt;1. 
first they set out the theory, listing which causal factors or "modernizing experiences" would be associated with high scores on the OM scale; &lt;br /&gt;2. they then checked whether the data indeed supported their theoretical predictions that increased exposure to these modernizing experiences would be associated with high scores on the OM scale; &lt;br /&gt;3. if this were the case, they would be double winners: their theory would be supported, and their measure proven valid; &lt;br /&gt;4. on the other hand, if the results were negative, there would be a number of possibilities:  &lt;br /&gt;a. the scale was valid, but the theory incorrect;  &lt;br /&gt;b. the scale was invalid, but the theory correct; &lt;br /&gt;and, although Inkeles and Smith overlooked this possibility:  &lt;br /&gt;c. the scale was invalid, and the theory incorrect.  &lt;br /&gt;&lt;br /&gt;Inkeles and Smith had specified that nine different experiences would lead to an increase in the OM scores, and they developed specific hypotheses for these nine causal factors. An example is that "Higher OM Scale scores will be associated with higher mass media exposure levels" (Inkeles and Smith, 1974:161). These hypotheses were tested statistically, and in general the predictions were well supported by the results.    &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;12. The strategies of construct validation &lt;br /&gt;&lt;br /&gt;The example of the Harvard six-nation study illustrates the main characteristics of construct validation. This set of procedures can be applied when a new concept is proposed that is linked to other concepts in a scientific theory. The justification for this procedure was given by Cronbach and Meehl, when they stated:  &lt;br /&gt;&lt;br /&gt;Scientifically speaking, to "make clear what something is" means to set forth the laws in which it occurs. We shall refer to the interlocking system of laws which constitute a theory as a construct network (Cronbach and Meehl, 1956).  
&lt;br /&gt;&lt;br /&gt;Construct validation can be applied in the situation where a tentative measure of a proposed new concept has to be tested as to its validity. That concept should be clearly linked to other, measurable concepts via a theoretical network. The procedures of construct validation should follow these general steps:  &lt;br /&gt;&lt;br /&gt;1. As a preliminary step the reliability of the measuring instrument should be assessed; &lt;br /&gt;2. The hypotheses linking the new measure to other concepts are then tested by analyzing whatever data are available; &lt;br /&gt;3. Where the results tend to support the hypotheses, both the adequacy of the proposed theory and the validity of the new measure are supported; &lt;br /&gt;4. Where the results do not tend to support the hypotheses put forward, this can be due to:  &lt;br /&gt;a. the inadequacy of the proposed new theory;  &lt;br /&gt;b. the lack of validity of the new measure; or  &lt;br /&gt;c. some combination of these two possibilities.  &lt;br /&gt;&lt;br /&gt;(We assume here that other aspects of our research are not to blame for the negative results, e.g., the results are not caused by inadequacies of experimental design.)   &lt;br /&gt;&lt;br /&gt;13. The notion of content validity  &lt;br /&gt;&lt;br /&gt;A very different type of validity from the ones we have just discussed is content validity. Content validity refers to the extent to which a measure, usually a multiple item instrument, represents the specific content area of a given target variable or target concept. As such it is most useful in educational tests, where the curriculum or other guidelines often delimit the content area for a test of, say, grade 13 calculus, rather carefully. Nevertheless content validity is also an important criterion in other areas of applied and pure research. 
The authors of an inventory of measures of occupational attitudes evaluated all the instruments considered on the "proper sampling of content":  &lt;br /&gt;&lt;br /&gt;Proper sampling is not easy to achieve, nor can exact rules be specified for ensuring that it is done properly... Nevertheless, there is little doubt of the critical nature of the general aim in scale construction... In the job satisfaction area, we have given detailed consideration to the analysis of responses to open-ended questions from representative samples which ask the respondent, "What things (do you like best) (don't you like) about your job?" We feel that these responses offer invaluable guidelines to the researcher as to both the universe of factors he should be covering and the weight that should be given to such factors. (Robinson, Athanasiou, Head, 1969:4)&lt;br /&gt;&lt;br /&gt;Paul Lazarsfeld has suggested a method, called conceptual analysis, to increase the content validity of attitude scales and similar instruments. He suggests the following four steps:&lt;br /&gt;&lt;br /&gt;1. "the creation of a rather vague image or construct;"&lt;br /&gt;2. the concept is then elaborated by the specification of aspects or dimensions of the concept;&lt;br /&gt;3. for each of these dimensions indicators or items are then developed;&lt;br /&gt;4. the best items for each dimension are then combined in an overall scale. (Lazarsfeld, 1959)&lt;br /&gt;&lt;br /&gt;In the development of the Wilson Conservatism Scale the following dimensions of conservatism were specified, for instance (Wilson and Patterson, 1967:3):&lt;br /&gt;&lt;br /&gt;a. Religious fundamentalism&lt;br /&gt;b. Right-wing political orientation&lt;br /&gt;c. Insistence on strict rules and punishment&lt;br /&gt;d. Intolerance of minority groups&lt;br /&gt;e. Preference for conventional art, clothing, and institutions&lt;br /&gt;f. 
Anti-hedonistic outlook&lt;br /&gt;g. Superstitious resistance to science&lt;br /&gt;&lt;br /&gt;In this manner the content area of a particular scale will at least be considered explicitly, rather than being left implicit. (In other parts of scale development one would have to consider aspects of item selection, internal consistency, validity, and so on.)&lt;br /&gt;Although the content validity of an instrument can vary from poor to good, its assessment still has to rely on the considered review by researchers. In the end the level of a scale's content validity is therefore a matter of professional judgment.&lt;br /&gt; &lt;br /&gt;14. Validity, reliability, and measurement error&lt;br /&gt;&lt;br /&gt;What is the relationship between reliability and validity? There are various ways to answer this question. First of all, if a measure is valid, it necessarily is reliable; it cannot be subject to high levels of measurement error, as that would necessarily decrease its validity. If a scale measures, say, the concept of prejudice accurately, it cannot be strongly influenced by extraneous factors.&lt;br /&gt;If we take the first two types of reliability, equivalence and stability, we can state that low reliability virtually excludes high validity. If a measure is highly susceptible to measurement error, it is unlikely to be a good measure for any concept. (The only exception is that such a measure may be useful as an indicator of measurement error, say, social desirability set.) On the other hand, high reliability does not guarantee high validity: a consistent measure may still be off! In summary, a valid measure is necessarily reliable; a reliable measure is not necessarily valid, but an unreliable measure is always invalid. 
In other terms, reliability is a necessary but not sufficient condition of validity. (The only combination not possible is unreliable but valid.)&lt;br /&gt;As far as internal consistency forms of reliability and their relationships to validity are concerned, they are complex and depend on the measurement model for the target variable. But we shall not dwell further on this topic.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In sociological methodology a somewhat different approach to the adequacy of data is presently gaining in prominence. In this model, which has largely been derived from psychology, a respondent's score on a test or scale is considered the result of two factors: the respondent's true score on the variable measured, and a separate error score, which is due to various measurement errors and biases. (Sometimes the measurement error is divided into two separate components, the systematic error, or bias, and the random error.) This model is formalized in the following formula:&lt;br /&gt;&lt;br /&gt;O(m) = T(s) + E(s),&lt;br /&gt;&lt;br /&gt;or: the observed measurement O(m) equals the sum of the true score T(s) and the error score E(s).&lt;br /&gt;This measurement error approach is very useful in developing statistical approaches to measurement error, as well as to the influence of measurement error on statistical relationships between variables. Some authors have suggested that this model should replace the older approaches to validity and reliability that have been covered in this chapter.  
Still it is likely that the latter approaches will remain important, as they offer substantive    approaches to the assessment of measurement adequacy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2282368?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2282368'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2282368'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_02_04_archive.html#2282368' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2203846</id><published>2001-02-01T08:02:00.000-08:00</published><updated>2001-02-01T08:06:11.750-08:00</updated><title type='text'></title><content type='html'>ATTENTION; THE CLASS OF MONDAY, FEBRUARY 5, HAS BEEN CANCELLED. PLEASE READ THE MATERIAL BELOW. 
PLEASE LET YOUR FRIENDS AND CLASSMATES KNOW!&lt;br /&gt;&lt;br /&gt;SEE YOU ON WEDNESDAY!&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2203846?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2203846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2203846'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_28_archive.html#2203846' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2165931</id><published>2001-01-29T12:08:00.000-08:00</published><updated>2001-01-29T14:12:41.463-08:00</updated><title type='text'></title><content type='html'>WALTER SCHWAGER: THE RELIABILITY AND VALIDITY OF DATA  &lt;br /&gt;&lt;br /&gt;1. The adequacy of data: the general problem &lt;br /&gt;&lt;br /&gt;In social research numerous factors can influence the results of our data collection process, such as the wording of the questions, interviewer characteristics and behaviour, coder interpretations of open-ended questions, and others. Given the various aspects that can influence our data collection, we may well wonder how trustworthy or adequate the resulting data really are. But how can we check the adequacy of our data? We check the trustworthiness of the testimony of a witness in a court case by comparing it with the testimony provided by other witnesses. 
These witnesses may be more or less trustworthy, but none of them is beyond doubt; even the evidence provided by a police officer or an "expert witness" may be questioned. In a similar fashion, we can only evaluate our data by comparing them with other information that is available. But that results in a peculiar problem: the only alternative information available consists of other sets of data, which are also open to doubt. Doesn’t that lead us into a vicious circle of checking doubtful data with other dubious data? Although this is to some extent true, the situation is fortunately not totally hopeless. Courts usually manage to cope with comparable situations quite adequately, even where all available evidence is (in principle) open to doubt. What a court is supposed to do is to relate and compare the various parts of testimony, and to piece them together into a coherent web of evidence, which as a whole is beyond reasonable doubt. In that process some witnesses, who appear more credible, may be given more credence than others. In a similar fashion we can evaluate the adequacy or trustworthiness of some set of data by relating it to other available sets of data. We can escape being locked into a vicious circle of dubious data because we can check the various types of supporting (or conflicting) evidence, and the amount of such evidence. In addition not all kinds of data are equally open to doubt. An example, taken from census data: we can consider whether data indicating that a person is Chinese are trustworthy. If that person claims to speak Hindi, we seem to have some conflicting data (assuming that Chinese persons are unlikely to speak that language). However, if we subsequently find that that person was born in China, we are more likely to accept the data on ethnicity, and to reject the data on language. Such consistency checking forms one possible test of the quality of our data.
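&lt;br /&gt;&lt;br /&gt;The consistency check in the census example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, and the table of plausible values are hypothetical and are not taken from any actual census file.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical census-style records; the field names are illustrative assumptions.
records = [
    {"id": 1, "ethnicity": "Chinese", "language": "Hindi", "birthplace": "China"},
    {"id": 2, "ethnicity": "Chinese", "language": "Mandarin", "birthplace": "China"},
]

# Assumed plausibility table: values we would expect to accompany each ethnicity.
EXPECTED = {
    "Chinese": {"language": {"Mandarin", "Cantonese"}, "birthplace": {"China"}},
}


def check_consistency(record):
    """Return the fields that agree and that conflict with the reported ethnicity."""
    expected = EXPECTED.get(record["ethnicity"], {})
    agree, conflict = [], []
    for field, plausible in expected.items():
        if record[field] in plausible:
            agree.append(field)
        else:
            conflict.append(field)
    return agree, conflict


for record in records:
    agree, conflict = check_consistency(record)
    print(record["id"], "agrees:", agree, "conflicts:", conflict)
```
&lt;/pre&gt;
For the first record the birthplace supports the reported ethnicity while the language conflicts with it, which mirrors the reasoning above: the ethnicity entry is kept and the language entry is the one placed in doubt.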
&lt;br /&gt;Data are never beyond doubt, and neither are the resulting conclusions, findings, and theories. Scientific findings always remain tentative, open to criticism, modification, or rejection. Some philosophers have tried to identify "indubitable data," which could provide an unshakeable foundation for the edifice of science, but it now appears that their quest has been fruitless. "Hard" data, in the sense of data that are beyond doubt, do not exist.&lt;br /&gt;The topic of the adequacy of data in social science has generally been discussed under the twin headings of reliability and validity. The problems subsumed under the heading of reliability mainly deal with the sensitivity of a measure, such as a question or an attitude scale, to the influence of (supposedly irrelevant) measurement conditions. To test for the influence of a given condition, e.g., question wording, we vary that condition, and compare the results for various conditions. Where the results differ, the varied condition has had an effect, and the reliability of such an instrument is therefore low. As this example illustrates, in reliability tests we generally compare evidence that is very similar, and only differs in a few respects, such as wording.       &lt;br /&gt; The problem of validity deals with the suitability of a measure for its intended purpose. If we use a specific question to measure "ethnic origin," how good is that question for its purpose? Validity is sometimes defined as "the extent to which a measure measures what it purports to measure." In evaluating the validity of a given measure of a certain target concept, we generally introduce other measures of the same concept. This other information has usually been collected by different methods. 
Consequently in validity tests we generally compare evidence that is quite dissimilar (say, different measures of the same concept), whereas in reliability tests we compare more similar evidence: results of the same measures collected under somewhat different measurement conditions. Aspects of validity we shall not discuss here include "internal" and "external" validity, which deal with the adequacy of experiments, rather than of measurement.&lt;br /&gt; A complication for our purposes arises from the fact that most analyses of reliability and validity stem from the disciplines of psychology and education, and are therefore only partially applicable to other types of social research. Psychological and educational measurement are very pervasive, especially in the U.S.A. but also elsewhere, and these tests often have a direct and major impact on people’s lives: you may, for example, be refused entry to a job, a prestigious university, or an educational program if your score is too low. Therefore tests in these fields have been subject to considerable scrutiny, and have often even been subject to court challenges. There is, for instance, an enormous research and general literature dealing with these tests and measures, such as intelligence tests and the college entrance SAT (Scholastic Aptitude Test). As a result these disciplines have had to pay much more attention than sociology to issues of measurement adequacy.&lt;br /&gt;In psychology research topics include personality traits, the measurement of intellectual abilities, attitudes, and so on. In education attention is mostly focussed on the development of assessment tests in specific educational subjects, such as high school physics, basic English, or general aptitudes. In both these fields there was and is little interest in single questions, as used in social surveys; instead, often only multiple item instruments were used (as well as other methods, such as projective techniques).  
In other forms of social research single questions are very often used to measure opinions, intentions and beliefs, to collect factual information, and in other ways. Whenever multiple item instruments are used, they tend to be attitude scales, rather than personality tests or aptitude tests. As a result, the analyses developed in psychology and education fit other social research only partially. Nevertheless we shall follow these analyses, because they have been the most systematic, sophisticated, and also the most influential reviews of these problems, even in other disciplines. &lt;br /&gt;&lt;br /&gt;2. The reliability of data&lt;br /&gt;&lt;br /&gt;How do we know whether the way a question is formulated, or an interviewer's characteristics, make a difference in the results of our data collection? Simply by comparing the results that we get for different question formulations, by checking the results obtained by different interviewers, and so on. In this approach to data quality we compare data obtained under largely similar conditions, with only one aspect changed. We simply check whether changing the measurement conditions (wordings, interviewers, etc.) produced a change in our results. In a general sense the reliability of a measurement procedure can be defined as the extent to which the measurement results remain the same under various measurement conditions. (These measurement conditions are supposed not to have such an influence.) Procedures for measuring reliability largely consist of comparing data that are very similar in nature, as is shown in the examples given above. If we compare data collected at the same time, but under slightly differing conditions, we test for a general type of reliability called equivalence. Equivalence refers to the similarity of data under conditions where the researcher or research assistant may vary, or the written instrument or question is changed. 
A subtype: where the "human factor" in the research situation is varied, we have tests for intersubjective agreement or objectivity. The question addressed here is: will two interviewers (raters, coders, observers) obtain the same results? A clear example of inter-rater agreement can be seen in figure skating competitions, where the judges have to rate a competitor on a ten-point scale. In other situations we may refer to inter-interviewer, inter-rater, inter-coder, or inter-observer agreement. In practice we often refer to the lack of such reliability, in terms of interviewer effect, coder error, and so on. Because science is a social enterprise, this type of intersubjective agreement is very important. Your results as a researcher must be repeatable by other researchers, and you must be able to come to agreement about the facts, i.e., the data. Without such basic intersubjective agreement science as a shared, social undertaking would become impossible. It is the lack of such agreement and openness to intersubjective checks which makes some matters, e.g., psychic phenomena, so difficult to study. A psychic may claim to "see" things, but other persons cannot repeat this experience. If different interviewers, working with similar samples, return with very different results, the research process is in trouble. Under such conditions it is hard to come to any conclusions, as the data appear largely the result of the individual biases of the interviewers. (On the other hand, mere intersubjective agreement is not sufficient to establish validity: a unanimous jury may still be wrong!)&lt;br /&gt; Another type of equivalence concerns comparable instruments or questions. We may, say, compare results for a single question, phrased in a positive fashion ("Do you think the United States should allow public speeches against democracy?") and results for a similar question, phrased in a negative way. 
("Do you think the United States should forbid public speeches against democracy?") One would expect that the two questions would elicit the same results, but that is often not the case. (In the case just mentioned, there was at least 20% difference between the results.) This kind of equivalence might be called comparability. In the same manner we can compare results on the same question, applied via a personal interview, a telephone interview, mail questionnaires, and so on. Other approaches to comparability address the following question: do two attitude scales, consisting of different items lead to comparable results? The simplest examples are from the educational field: how do we know that a grade 13 mathematics test in one high school is comparable to another grade 13 math test in another high school? (Without standardized  exams this is a serious problem, which may be related to grade inflation.) The way to compare different samples of items measuring the same content is to construct alternate forms of the same attitude scale or aptitude test, administer both to the same sample, and compare the results. &lt;br /&gt;In summary: the first major type of reliability we have discussed is equivalence, where we compare data obtained at the same time, under slightly different conditions, which are not supposed to influence the results.&lt;br /&gt; Another major type of reliability is called stability. In stability checks we compare two or more measurements of the same respondent, taken at different times, assuming that the variable we measure has not changed in the interval. The standard example is that of a bathroom scale: if you measure yourself one day, and again the next, these two results should be similar, supposing your weight has not changed over the period between the two measurements. This type of reliability is therefore called test-retest reliability. 
It acts as a general check on all the relevant measurement conditions that may have a bearing on the result: the question or the attitude scale, the interviewer, and so on. In contrast to the previous kinds of reliability we discussed, test-retest reliability is not a check on the influence of specific measurement conditions (where we compare, say, different interviewers) but rather a global check on the total measurement situation. The crucial assumption underlying test-retest reliability is that no change has occurred between the two measurements at different moments, and this may be a dangerous and incorrect assumption. Apart from all other factors causing change, the first interview or other measurement may have an effect on respondents' attitudes, opinions, etc. (In experimental design, this is called the pretest effect.) Test-retest reliability checks are thus unsuitable for changeable personal characteristics or states; instead they assume fixed individual traits.&lt;br /&gt;Low test-retest reliability may also indicate that the sample studied has no clear ideas on the study topic. If you question respondents on some topic that has little relevance for them, they may be unable to answer, because they don't know much about the matter at hand. In that situation they have to guess, or make up an answer. The next time you come around, they may answer differently, thus making for low test-retest reliability. In using test-retest approaches it is accordingly important that such random responses are weeded out, by allowing "don't know" responses, or by other means, say, filter questions (where you ask respondents first whether they are familiar with a certain topic, before you ask their opinion on that subject). Otherwise you may reject an instrument which works fine for that segment of the sample with well-crystallized attitudes or opinions, but which has low test-retest reliability for the uninformed guessers. 
In a slightly different fashion test-retest measures can also be used to check other, more specific kinds of stability. After a coder has marked 500 interviews you can ask her to recode some of her earliest work "blind", i.e. without her knowing how she coded them before. That way you can check whether her interpretations have changed with time or remained stable, i.e. whether "coder drift" has occurred or not. In similar fashion observers can be asked to categorize filmed or videotaped situations once more, or teachers to re-mark the first exams they marked. In this approach retesting is used as a control on a specific type of measurement condition, rather than a global check. &lt;br /&gt;The use of test-retest methods is most appropriate in learning, laboratory, or clinical situations. It is used quite often in the development of attitude scales. In social surveys the retesting of respondents is not very practical for a number of reasons; still retesting does occur in social surveys, especially ones designed to test the reliability of measurement, and to check coder drift and other possible sources of error. The stability of measures is therefore a major subtype of reliability, with the test-retest approach as the associated method of comparing results. The major types of reliability we have discussed in this section follow.&lt;br /&gt;&lt;br /&gt;I. EQUIVALENCE&lt;br /&gt;A. Intersubjective agreement&lt;br /&gt;     1. inter-interviewer agreement&lt;br /&gt;     2. inter-coder agreement&lt;br /&gt;     3. inter-rater agreement&lt;br /&gt;     4. inter-observer agreement&lt;br /&gt;B. Comparability&lt;br /&gt;     1. agreement between single questions&lt;br /&gt;     2. different forms of multiple item instruments: alternate  forms&lt;br /&gt;&lt;br /&gt;II. STABILITY&lt;br /&gt;     1. test-retest reliability&lt;br /&gt;&lt;br /&gt;A final point concerns the relationship between reliability and the precision of your results.  
Put simply, if your data are precise and refined, then they are more likely to be unreliable. In other words, when precision increases, reliability decreases and vice versa.  If two raters have to mark a student on a scale of 1 to 100, they are more likely to disagree than when asked to use a pass-fail scale, for instance.  You consequently cannot push your data beyond a certain level of precision, without paying the price of decreased reliability.  As a researcher you can ask your respondents how many minutes they watched television last week, but those precise results probably are unreliable and untrustworthy.			&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;3. The internal consistency of multiple item instruments&lt;br /&gt;&lt;br /&gt;In the alternate forms type of reliability we develop two different (but comparable) tests or scales, measuring the same general content area, and then correlate a respondent's scores on the one test with his or her scores on the other. After all, if two attitude scales both claim to measure religiosity, or if two tests both claim to measure musical knowledge (at the same level), then they should correlate. The same kind of argument can be developed for the items that together make up a single scale. In that case one can split an attitude scale of, say, twenty items into two subscales, each containing ten items, and compare the scores on both subscales. This comparison produces a measure of split-half reliability.  In the early applications of split-half reliability generally two equivalent halves were compared.  One way of finding those halves is by putting all the odd-numbered items in one sub-scale, and all the even-numbered in the other.  Not surprisingly, this is called the odd-even method of split-half reliability.  Later statistical measures were developed which reflect the correlations between all possible halves, but these are too complicated statistically for our purposes. 
(They include coefficient alpha and the Kuder-Richardson formula 20.) Split-half reliability is nowadays considered as one aspect of a general set of reliability problems, dealing with the relationships between the items of a multiple item instrument, such as an attitude scale.  The aspects of reliability dealt with here are referred to as the internal consistency of an instrument.&lt;br /&gt;	With regard to the items in a scale, you may ask: are these items comparable, i.e. are they sufficiently correlated? If two items both measure, say, alienation, one would hope that the items are highly correlated. (A similar reasoning is applied when we compare two single questions under the heading of equivalence.) Unfortunately matters are not that straightforward: what is "sufficient" as a correlation between items (or between items and their total score) depends on the measurement model. An example should clarify this matter, but we return to these issues of scaling later in this course. &lt;br /&gt;	In Guttman scaling we have a specific model assuming that respondents will not react in the same way to different items. Items are supposedly ranked from weak to strong, and before you can get to the strong items you must pass through the weaker ones first. The Guttman scale can be compared to an onion (to get to the heart, you have to get through the outer layers first) or to a staircase (to get to the top, you must climb the lower steps first). &lt;br /&gt;	The Bogardus social distance scale is a famous example of the Guttman model, which measures the acceptability of one group to members of another group, say an ethnic category. In a study of attitudes to Martians you might ask your sample: &lt;br /&gt;&lt;br /&gt;Would you be willing to accept Martians as&lt;br /&gt;1. visitors to your country?&lt;br /&gt;2. as citizens in your country?&lt;br /&gt;3. to employment in your occupation?&lt;br /&gt;4. to your street as neighbours?&lt;br /&gt;5. 
to your club as personal chums?&lt;br /&gt;6. to close kinship by marriage?&lt;br /&gt;&lt;br /&gt;The idea of this scale is that if you accept Martians at level 5, you also have to accept them at levels 1 to 4. Similarly, if you reject them at level 3, you would also reject them at levels 4, 5, and 6. The Bogardus scale, as a Guttman scale, has particular assumptions, and associated ideas on what constitutes "error": it would be an error if respondents answered “yes” to items 4 and 6, but “no” to items 1, 2, 3, and 5.&lt;br /&gt; &lt;br /&gt;(The model for the Likert summated ratings technique of attitude scaling assumes a correlation between each item and the total score (for the two extreme groups): this item-score correlation is therefore a reliability measure for the Likert scale. Other item-total score approaches are possible, e.g., by not comparing the two extreme groups of scorers, but rather the bottom half and the top half of the scorers. Under some conditions this may lead to different results, but we cannot go into that.) &lt;br /&gt; &lt;br /&gt;	Other methods to check the intercorrelations between the items in a multiple item instrument can be found, such as factor analysis and other statistical techniques. The point remains, however, that such intercorrelations between items, or other evidence of internal consistency, are only desirable if your measurement model requires them. As we saw for the Guttman scaling model, that is not always the case. &lt;br /&gt;	Other situations in which a high level of internal consistency may be unnecessary, if not undesirable, occur in applied research. An industrial sociologist may want to measure "job satisfaction," and include satisfaction with the following job components in her measure: a. satisfaction with financial conditions; b. satisfaction with the physical job environment; c. satisfaction with job security; d. satisfaction with supervisory personnel; e. satisfaction with one's immediate colleagues. 
Strong correlations between these various components are not of interest to her, as long as the measure "works", i.e. is valid as a measure of job satisfaction. A typical aspect of internal consistency measures is that you do not merely relate a couple of data sets, as was the case in other types of reliability; instead you consider the interrelationships between a whole set of items at the same time. (Some authors have pointed out that very strong correlations between items are actually undesirable. Having perfect correlations between 10 items in an attitude scale means that each respondent who agreed with one item agreed with all, and each respondent who disagreed with an item disagreed with all others as well. Your respondents would then fall into only two groups: those with a score of 0, and those with a score of 10. In that case you might as well have a scale that consisted of a single question.)&lt;br /&gt;In summary we can list the following points regarding internal consistency measures: &lt;br /&gt;a. internal consistency measures address the question of the interrelationships between the items of a multiple item instrument; &lt;br /&gt;b. they do so by analyzing the relationships between these items, or between the items and their total score; &lt;br /&gt;c. the major types of internal consistency measures are split-half reliability (with odd-even reliability as one subtype); coefficient alpha and Kuder-Richardson formula 20 as extensions of split-half reliability; the Guttman coefficient of reproducibility; and other techniques such as factor analysis; &lt;br /&gt;d. these measures do not always give you the same results; &lt;br /&gt;e. 
what constitutes a desirable result depends upon your measurement model and its criteria.&lt;br /&gt;&lt;br /&gt;*************************************************************************&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2165931?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2165931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2165931'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_28_archive.html#2165931' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2165920</id><published>2001-01-29T12:07:00.000-08:00</published><updated>2001-01-29T10:53:54.326-08:00</updated><title type='text'></title><content type='html'>4. The validity of measurement&lt;br /&gt;&lt;br /&gt; In the preceding sections reliability of measurement has been defined as the extent to which the results of a measurement process remain the same under various measurement conditions. The problems of validity are concerned with the aims and objectives of such measurement processes, rather than the influence of measurement conditions. We collect data for specific purposes, and the general question of validity concerns the adequacy of the data for such ends. 
The aim for which the data are intended in practical research shall be called the target variable; the purpose of theoretical measurement shall be referred to as the target concept.&lt;br /&gt;	Please note that the same data can be used for various purposes: unemployment information can be used to measure available surplus labour, or it can be used to measure destitution or poverty. Voting intentions can be used to predict election results, or they can be used to measure political attitudes. The answers to a question like "Do you feel in good health?" may be a useful and valid measure of subjective health, but only a dubious, invalid measure of objective health status. The adequacy of the data depends upon the purpose to which they are put; when we talk about the validity of some set of data, we therefore always have to specify the purpose for which they are more or less valid. The validity of a specific measure is always its validity for a given purpose. One question deserves a separate discussion: why is the validity of a measure a problem at all?&lt;br /&gt;&lt;br /&gt;5. The inadequacy of face validity &lt;br /&gt;&lt;br /&gt;	If we want to know a person's age, what would be more reasonable than to ask: "What is your age?" And why can't we accept the answers as they are given? The question deals with the topic, and it seems to be formulated reasonably clearly, so what is the problem? Such superficial inspection of a question (or questions) may result in the conclusion that the instrument "looks good"; in a more technical fashion it is sometimes said that the instrument has "face validity."&lt;br /&gt;	Why do we make such a fuss about validity problems; why can't we just accept face validity? In many studies we may have to rely on face validity, it should be admitted. 
If you want to know how many bottles of beer a person drinks per week, you may not have the resources to count the empties in the basement, and instead you may have to accept the answers on faith. And in a single sample survey you may be collecting information on so many variables that you cannot extensively assess each of the questions used. Put in a more glorified fashion, you may have to rely on the face validity of your questions. In general, however, such an uncritical acceptance of measures that merely look good is less than desirable. There are many reasons why we should not be willing to accept measurements on face validity. One major set of reasons has already been given: the reliability of the questions may be low. People may not understand the question; they may be influenced by a social desirability set, the options provided, apparently slight changes in wording, and so forth. People may lie about their age, or give incorrect answers for some other reason. It has been found that respondents tend to give ages clustering around the tens: twenty, thirty, and so on. Various errors and biases can arise in the development of a question, and we found that a superficial inspection of such a question will often not reveal the defects in its phrasing. If our results could be so unreliable, how can we accept their validity on faith?&lt;br /&gt;	In theoretical research a more complicated objection arises against face validity, which can be put briefly as follows: concepts imply certain regularities in the world, and face validation may naively assume that these regularities actually exist. We may assume that our respondents will generally be left-wing or right-wing politically, while they may actually have shifting and varying orientations on different issues. A single question: "Would you generally consider yourself a political liberal, or a political conservative?" 
would seem to have face validity, but is based on erroneous assumptions about political attitudes. In summary, face validity can be described as a superficial inspection of the suitability of specific data for a specific purpose; however no other data are related to the data to be tested, i.e., no new evidence is introduced. Face validity may serve a purpose in checking to what extent an operational definition fits common usage. If you define "middle-aged" as starting at age 30, we may wonder whether this is what Mauritians in general understand by that term. This, however, could more appropriately be considered as an aspect of the acceptability of definitions, than as the validity of measurement. &lt;br /&gt;	One might say that face validity is sometimes accepted legally: the tick you put on an election ballot is accepted as the valid expression of your political preferences. Even under those circumstances the law tries to standardize the measurement situation: your vote cannot have been bought, it must have been a secret ballot, and other conditions have to apply. In this way the reliability of the measurement situation is maximized. One might say that in these situations face validity has been decreed by law, but even such declarations are not beyond criticism. After all, we often criticize dubious elections in dictatorships or totalitarian states. In face validation we do not bring in additional comparative data, but remain satisfied with a superficial inspection of the instrument (question or attitude scale) itself.&lt;br /&gt;&lt;br /&gt;6. Validation strategies in applied research&lt;br /&gt;&lt;br /&gt;	In applied research the variables we are interested in have generally been selected for their practical importance. We want to gather information, but that information is not going to be used for the development of a general theory. Instead it will be used to answer questions which are usually specific and limited: who is going to vote for the M.M.M. 
in the next election? How many high school students intend to go on to university? Which children are likely to become involved with the law? Which students will make good lawyers, or good dentists? Which employees are good prospects for promotion? The purpose that data are supposed to serve in applied research, the characteristic they have to measure, is called the target variable. The major validation question in applied research can now be formulated as follows: how adequate are the data for their target variable? This is often called practical or pragmatic validation.&lt;br /&gt;	We can use “known groups,” i.e., groups that are known to differ on the target variable, to check whether a scale "works," i.e., whether it will discriminate between these criterion groups after the scale has been developed (see the reading on attitude scaling). The Wilson Conservatism C-scale was validated by testing whether groups known to differ on conservatism would differ, as predicted, in their scores on the C-scale. Wilson was interested in measuring conservatism, which here is the target variable. In the same manner checklists of problem behaviours can be developed that will distinguish between, say, children who have mental health problems, and children who don't. In the latter case a psychiatric or psychological assessment of the children's mental health would be the benchmark that we could use to develop our checklist. Such an assessment constitutes the criterion measure for the development of a child mental health checklist. A criterion measure is some applied and accepted measure of the target variable that can be used to test or validate a new measure. This overall strategy can be used in many contexts in education, personnel and industrial psychology, and many other fields. In general we study a sample of subjects that vary on some variable of importance: success as law students, mental health, competence as parents, level of child abuse, and so on. 
In educational and personnel testing such other evidence usually takes the form of criterion measures. When intelligence testing started, the validity of intelligence tests was assessed by correlating them with the criterion measure of teacher ratings. A test of occupational aptitude might be compared with a supervisor's rating, as the criterion variable.&lt;br /&gt;	This type of validation approach has therefore been given the label of criterion validation. In criterion validation the data to be tested are compared with the data produced by a criterion measure, which usually is an accepted, practical measure of the target variable. You may run into a problem, however, if these other data are of low validity: the teacher or supervisor may not be very good at rating students or employees. But even in those cases there is some practical agreement on what a good student or employee should be like, and you can continue the search for better criterion variables: test marks, job promotions and demotions, or other indicators measuring the criterion variables concerned. &lt;br /&gt;	But if a valid criterion variable already exists, why should we try to validate a new measure for the same variable? If we can trust supervisor ratings, why do we need a new occupational aptitude test? In applied research such a new measure may be desirable for the following reasons: 1. the new test may be easier or cheaper to apply: a teacher may need three months to get to know a student, and a test may only require an hour; 2. the new test may be applicable in more varied situations, say, where no teacher or supervisor is available; &lt;br /&gt;3. the new measure may be more reliable, and less susceptible to errors and bias; 4. the criterion variable may only apply later, while the new test may have predictive value. A graduate selection test may be validated on the later performance of the students tested, but such a test is useful as a predictor instrument (see below). 
Other reasons will be discussed in the sections on theoretical validation.&lt;br /&gt;	In some cases the comparative data for the criterion measure are collected around the same time as the data for the new measure being validated. In that case we talk about concurrent validation.  Although outside the area of psychological and educational testing this occurs only infrequently, questions or attitude scales may also be used to predict some future behaviour. Selection tests may be used to find promising law or dentistry students, to select good prospects for insurance sales careers, and so forth. In such cases the later success (or lack thereof) can be used to assess the validity of a selection test, a predictor, administered many years earlier. This validation strategy is, not surprisingly, called predictive validation. In other social science research predictive validation may be used where one studies educational intentions, the likelihood for an adolescent to end up as a juvenile delinquent (as in research by the Gluecks), or where it is attempted to predict how successful a marriage is going to be (as in studies by Burgess and others). The best known example of predictive validation in survey studies -although it is hardly ever considered under that heading- is in voting intention studies, used to predict the outcome of coming elections. (In some situations older, earlier evidence may be used to validate retrospective questions. This may occur where, say, hospital records are used to check whether former patients correctly  remember earlier hospitalizations. This is sometimes called a “reverse record check”.)&lt;br /&gt;	Most of the examples we have given here are of an applied nature: the target variables are of a practical kind, and are of importance in a specific setting. Ratings by supervisors, teachers, psychiatrists and others may form the criterion measures, as well as occupational or other forms of success (or lack thereof). 
Criterion validation procedures  will work successfully with such applied criteria, without us having to consider them critically. The new measure to be tested only has to work, i.e., correlate with the criterion measure; why or how it works is not important, although that may be of interest to the researcher. This uncritical acceptance of current criteria has not gone unchallenged, however. Imagine a study which finds that nearly all successful university administrators are male and white; should we then develop a selection test that excludes women and non-whites?  Tests that reflect such a status quo may be discriminatory.&lt;br /&gt;	In summary, the process of validating data in applied social research is sometimes called pragmatic validation, with criterion validation as the main strategy. In this procedure the data collected on the new instrument to be assessed are compared with the results for a more established measure for the target variable, i.e., the criterion measure. The two main subtypes of criterion validation are concurrent and predictive validation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;7. Pragmatic validation in survey research&lt;br /&gt;&lt;br /&gt;	Validation procedures have not been developed in a coherent fashion for survey research. A number of possible reasons may explain this. In survey research we generally collect data on a large number of variables, and it is unfeasible to try and validate each single question in a reasonably lengthy interview. Consequently survey researchers may rely on a pretest to weed out unsuitable questions, and accept questions and scales otherwise on face validity. In experimental research in psychology usually only a few variables are involved, and it becomes feasible and important to try to validate these variables. For many sample surveys no comparative information is available that can be used for validation purposes, and in a crude sense most sample surveys can be said to "work."  
Where discussions of validity in survey research can be found, they are frequently couched in different terms, such as measurement adequacy. Nevertheless pragmatic validation procedures do exist for survey research, and an overview of the major approaches will be given here. In criterion validation approaches we compare the results obtained by an instrument to be validated (a question or an attitude scale) with other data that are valid indicators of the target variable. Thus, if somebody claims to own a car, you can ask to see the ownership papers as supporting evidence (although policemen are more likely to do so than survey researchers). That supporting evidence can be used in two ways: in an individual, case-by-case fashion, where you compare data for each single case; or in a collective fashion, where you check the overall patterns of the sets of data being compared. The first approach is sometimes called a micro-match approach, and the second an aggregate approach.&lt;br /&gt;	An example of the micro-match approach can be found in hospitalization or crime victimization studies. In the latter type of study the names of crime victims are selected from police files, stretching back, say, three to five years. These respondents are then asked, as part of a survey, whether they have ever been victims of a crime, and their answers are compared with the record. The researcher can now study how valid retrospective questions on this subject matter are, and how the answers vary with factors like the time elapsed since the crime, type of crime, and so on.&lt;br /&gt;	In these studies documented evidence is available that can be used as a criterion measure in methodological studies of retrospective questions. In other studies the amount of housework husbands claimed to do was compared with the estimates given by their wives (the latter generally gave a lower estimate). 
In many cases such an approach is clearly unfeasible; there you may have to rely on selected reports by interviewers. An interviewer may give you independent information on housing type and other observable variables. (Market researchers may check the contents of people's bathroom cupboards for brand names of aftershaves, etc., but such practices are ethically dubious, unless the respondent's permission is obtained.) Here the interviewer's information acts as a criterion measure.&lt;br /&gt;	With aggregate checks you expect that the data to be validated and the data that act as external evidence will show the same general distribution. If half your sample states that they're going to vote for a specific political party, and in the election only a quarter actually do so, then something is wrong with the questions asked (assuming that your sample is a fair one). This is an example of aggregate checks in predictive validation, as the survey tries to predict future voting behaviour.&lt;br /&gt;	A well-known example of an aggregate check comes from a U.S. census study, where one million more women than men claimed to be married. The reason for this discrepancy, so it was speculated, is that a number of single mothers laid claim to the more respectable status of matrimony. Hyman reports on a study where college alumni were asked for their average grade while in school. Although one might suspect some grade inflation here, the reported grades apparently showed a reasonable distribution, so no systematic bias seems to have occurred. Finally, in a social services planning study social workers asked the following question: "Does your family need services for children with special needs?" referring to children with severe mental or physical problems. The researchers assumed that parents would understand what was meant by that question, but more than half of the respondents answered the question in the affirmative. 
This implausibly high proportion seems to indicate that few parents grasped the intent of the question.&lt;br /&gt;	Voting studies are one kind of study where one can compare reported voting intentions against the actual results. In a majority of such studies the methods used have proven effective, and where problems did occur they could often be blamed on sampling problems rather than data collection methods. As a result survey researchers may tend to assume that their methods "work," and be less concerned about reliability and validity checks. Yet one cannot take the reliability and validity of survey methods for granted, and more attention should be paid to methods of assessing data quality, such as the rather inexpensive split ballot technique. (In many survey studies questions are checked by correlating the answers with other, related, questions. The earlier example of Hindi-speaking Chinese is one such check. The use of such consistency checks or control questions is more a test of reliability than of validity; here the internal consistency of the data is checked, and no different evidence is introduced. A respondent may be consistent, i.e., give reliable answers, and still not be truthful, as would be the case for a consistent, but lying, court witness. To assess the truthfulness or validity of an answer you have to bring in external evidence of a different kind; to check for reliability you have to scrutinize internal consistency. 
However, if reliability is low, validity will also be dubious.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2165920?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2165920'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2165920'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_28_archive.html#2165920' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2075639</id><published>2001-01-22T11:48:00.000-08:00</published><updated>2001-01-22T11:48:33.870-08:00</updated><title type='text'></title><content type='html'>WALTER SCHWAGER: THE ELEMENTS OF THEORY: CONCEPTS AND CORRELATIONS&lt;br /&gt;      &lt;br /&gt;1. Introduction&lt;br /&gt;&lt;br /&gt;In this reading two basic components of scientific theories are presented.  The first element involves the idea of general&lt;br /&gt;concepts; the second consists of the notion of relationships between&lt;br /&gt;variables. Both these elements are of major importance for the construction of scientific theories, and their characteristics can best be understood as means towards the end of theory of development.  In social research we utilize both concepts and correlations, and we do so to construct scientific theories. Because we need to understand the ends (the development of scientific theories) in order to understand the means (concepts and correlations) we shall start this chapter with a discussion of the most relevant characteristics of scientific theories. 
&lt;br /&gt;&lt;br /&gt;2. Generalization and selectivity in scientific theories&lt;br /&gt;&lt;br /&gt;It can safely be said that each and every object and event is unique, as the chances of an exact match are infinitely small: even seemingly uniform industrial products can always be found to differ in some detail.  One-egg twins have a different life history from the moment the fertilized egg divides, and certainly each of us can justly claim being unique in some detail or combination of details: finger prints, moment of birth, development, and other characteristics.  Biographers, historians, geographers, and others delving deeply into their topics of study can rightfully assert that their subject matter is unique, no matter what it is.  History may be said to repeat itself, but it is never repeated in exact detail.&lt;br /&gt;The activities of detailed description and analysis of some unique subject matter are a well-established and prestigious part of the intellectual tradition, especially in the Humanities.  Writing a biography is a respected undertaking.  Esteemed art historians may focus on a single painting, and try to describe and interpret that work of art as fully as possible.  Outside academic studies detailed descriptions have their place too: the more meticulous a police description of a fugitive is, the more useful it is likely to be.&lt;br /&gt;The German philosopher Windelband labeled these descriptive activities as idiographic: describing the unique, and many historical and cultural studies, as well as others in the social sciences, such as geography, are clearly idiographic in nature.  At the opposite end of this pole Windelband placed the generalizing, nomothetic or law-positing sciences.  It is at this end that most social scientists find themselves. Put simply, the idiographic scholar describes the unique, and individualizes; the nomothetic scientist describes the recurring, and generalizes. 
We can also say that the idiographic scholar tends to deal with the concrete, actual example, in all its uniqueness and richness of detail. The nomothetic scientist, in contrast, tends to deal with the abstract: "withdrawn or separated from ... actual examples."&lt;br /&gt;The nomothetic social scientist has a different aim, and a perspective different from that of the scholar trying to depict the unique. A sociologist may write a detailed analysis of a community, but usually such a case-study is undertaken to gain insight into more general patterns. Instead of writing a detailed study of one strike, she will try to find out which occupational groups are most strike-prone; and instead of dwelling in exhaustive detail on one case of multiple murder, as Truman Capote did, she will try to find patterns in homicide, showing, say, that many victims were relatives, friends, or acquaintances of the murderer. In focusing her attention on a few characteristics of many instances of some phenomenon she will undoubtedly lose sight of the richness of the individual object or event, hoping instead to relate a selected few characteristics in many diverse situations. An apple falling off the tree, a rose thrown from the balcony, a man jumping from the top floor, raindrops falling: all these different instances are covered by the law of gravity. As Lundberg phrased it: "No two cases of any of these events are ever identical in all respects nor are the natural conditions under which they occur ever the same.  Yet by a process of ignoring all this variety and concentrating our attention on some single characteristic or aspect of the event (abstracting), we can make general statements that are equally true for all falling men, all raindrops, and all projectiles." And to stress the point again, Newton's theory is so useful because it is not limited to falling apples in seventeenth-century England, but applies to all physical objects in all circumstances. 
It even applies to the movements of the planets, and the tides.  So, instead of considering phenomena in their unique individual variation, the social scientist looks at selected characteristics of her subject matter and tries to relate these properties in lawlike generalizations.  A social worker, having to take many details into account, may claim that each case he has is unique; for the sociologist this attitude would be self-defeating.  Of course the social scientist's attitude is also shared by lay people: in saying that "fat people are jolly" or "Latins are temperamental" they state that in their opinion some personal traits are frequently related to others. (The systematic evaluation of evidence for such statements, and their integration into a theoretical framework would make such statements more "scientific".) The point to note, in the words of Lazarsfeld and Rosenberg, is that "no science deals with its objects of study in their full concreteness.  It selects certain of their properties and tries to establish relations among them.  The finding of such laws is the ultimate goal of scientific inquiries."&lt;br /&gt;Sociology has not produced many laws comparable to Newton's or similar natural science laws, and it is doubtful whether it will ever produce many universal laws relating concepts in a relatively straightforward way.  But let us assume that a general law is found stating that "the larger the size of a group, the more impersonal it is likely to be, and the more interaction in the group will be regulated by formal, impersonal rules." This is a powerful statement describing what is likely to occur in any group that increases in size, even those groups that consciously set out with the aim not to be impersonal: communes, or interracial groups that are trying to break down racial prejudice by meeting as individuals, rather than as Blacks and Whites.  
This generalization would apply to any group, and as such it is powerful, as it has a large area of applicability or scope.&lt;br /&gt;Why do scientific theories have this attraction?  Because at their best they allow you to understand causal patterns, which enable you to explain, predict, and maybe manipulate or prevent certain phenomena.  This is clearest in examples from other disciplines: if specific factors are the causes of epidemics of a given kind, then the occurrence of that kind can be explained by indicating that these factors did occur.  You may also be able to predict that if those conditions recur, then a similar epidemic may result, and you may want to initiate actions to prevent such a recurrence.  If you know what causes radio waves, and their characteristics, you can create them under appropriate conditions and manipulate them to your advantage.  This vision of knowledge, understanding, and maybe power also inspires social scientists.  What are the causes of racial prejudice?  How can we explain racial prejudice that has occurred in the past?  Under what conditions is it likely to occur, or to increase in intensity?  What can we do to change the course of events?  Other theories deal with the rise of totalitarian governments, the occurrence of revolutions, family breakdowns, the power structure in societies, and an endless catalogue of other topics.&lt;br /&gt;Given a certain subject matter, how does a researcher select the variables or concepts to be included in a study?  One criterion is that, no matter how indirectly, they be subject to empirical evidence.  In other words, they must be researchable, and one should not have to accept them on faith.  This is actually quite a complex requirement, and it is hard to specify exactly what this testability criterion means.  For one thing, some concepts are much more complicated than others: showing that a person is "male" is considerably easier than showing that a nation is a "democracy", for instance. 
(For reasons which will become clearer later in this chapter, we generally talk about concepts, as well as variables, in the context of this discussion.)&lt;br /&gt;A most important second requirement is that the concepts selected be scientifically fruitful: one must be able to relate them to many other concepts. (The general notion of "a relationship between variables" will be discussed later, but the basic idea is that those variables or concepts covary: say, an increase in one -group size- is related to an increase in another -formality-.) Different terms are used to describe this scientific fruitfulness of concepts, and systematic import and theoretical import are two of them.  These notions are worth a lengthy introduction.&lt;br /&gt;In the sciences, as in the arts, the principle of freedom of creation applies: the social scientist is free to introduce any concept that he or she thinks is going to be fruitful, or modify any existing concept for a similar reason.  These attempts should not be judged by their orthodoxy, nor should they be dismissed if spectacular results cannot be produced immediately: they should be given a fair chance.  In the long run, however, the major evaluation of these variables will center around the question: are the introduced variables or concepts systematically related to other concepts in that field of study?  Socio-economic status is a concept of high systematic import in sociology: it is related to virtually every aspect of a person's or group's existence.  Once we know an individual’s or family's socioeconomic status we can predict many other characteristics of that person or group.  Sex is such a characteristic of high import too: it is correlated with modes of dress, role expectations, behaviour, occupational possibilities, and many other variables.&lt;br /&gt;Different sciences or pseudo-sciences have selected different concepts as the "central" concepts, that is, the concepts that are high in theoretical import.  
What these concepts have in common is that they are all supposed to correlate strongly, or have been proven to do so, with many other concepts in that discipline.  That is to say, given information on that concept, an expert in that discipline will be able to predict many other characteristics of the subject under study, or at least claim to be able to do so.  In orthodox Freudian theory, such experiences as early or late weaning, and early versus late toilet training were supposed to correlate with many other aspects of one's later personality.  What is the variable of high systematic import in astrology?&lt;br /&gt;Consequently, one crucial test for a concept is that it is correlated with many other concepts.  The progress of any science is associated with the dumping of concepts (and associated theories) that have not lived up to their claims.  Phrenology attempted to relate skull characteristics to moral and intellectual capacities, and it largely failed (although it is now known that specific regions of the brain carry distinct functions).  To cite a recent, and less recognized, example from this debris of abandoned concepts and theories: a social scientist claimed some decades ago that traits of one's digestive system were correlated with personality traits: the more constipated you were, the more dogmatic or authoritarian you were likely to be. (It probably should not surprise us that this gentleman was employed by a dried prune company.)&lt;br /&gt;Summing up we can state that nomothetic scientists, the ones interested in developing generalizing scientific theories, have to search for concepts with high systematic import, that are linked to many other concepts via law-like generalizations, which together constitute a scientific theory.  Scientific creativity consists, at least partly, of the knack of selecting the "right" variables: those with high systematic import, and explanatory power.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;3. 
The main characteristics of scientific theories&lt;br /&gt;&lt;br /&gt;In pure, theory-oriented research, research methods are essentially intended to collect factual information that can be used to develop scientific theories.  As scientific theories are based on research data, i.e., factual information about the world, science can be said to describe the world, and social science can be said to describe the social world.  As a result the whole of science is sometimes conceived of as a gigantic storehouse of facts about the world, a large catalogue of isolated bits of information. This interpretation of science as an unselective repository of facts is incorrect for two reasons.  First, such a repository is practically impossible, as the number of facts about the world is unlimited and inexhaustible.  The passing of every second creates a new, infinitely large series of facts, even if we merely wanted to describe where every living human being on earth found himself or herself at that fleeting moment.  Secondly, this view of science is factually incorrect, as science is considerably more selective and discriminating in its attention to facts.  Rather than being interested in all facts, or states of affairs of whatever kind, nomothetic science -including social science- tries to find patterns and regularities in the unwieldy mass of available information, as was described in the preceding section.  By finding such regularities facts can be stored, explained, and predicted with a great deal of efficiency and power.  This selectivity of science is best understood by describing the salient features of scientific theories.&lt;br /&gt;A scientific theory can be understood as having the following characteristics: it is&lt;br /&gt;1.	an empirically based,&lt;br /&gt;2.	widely applicable&lt;br /&gt;3.	system of relationships&lt;br /&gt;4.	between concepts;&lt;br /&gt;5.	that is economic, efficient, and powerful,&lt;br /&gt;6.	in describing,&lt;br /&gt;7.	explaining,&lt;br /&gt;8.	
and predicting factual aspects of reality,&lt;br /&gt;9.	that can be intersubjectively shared.&lt;br /&gt;&lt;br /&gt;In the next few sections these aspects of scientific theories will be discussed separately.  Regarding the first item, in preceding chapters it has been pointed out that science is based on factual descriptions, in the form of data.  This foundation on information, based on sense perceptions, is what is meant by the term "empirically based:" based on observation and experiment.  The empirical base of science, and the links between theory and data constitute an intricate problem, that will recur throughout this text.  For the moment, however, a basic understanding of the empirical base of science should do.&lt;br /&gt;The next three items in our description of theories deserve considerably more attention.  First we shall turn our attention to the notion of concepts.&lt;br /&gt;&lt;br /&gt;4. The utility of abstract concepts&lt;br /&gt;&lt;br /&gt;One of the distinguishing requirements for a scientific theory, that we discussed earlier, was that of wide applicability.  A theory would not be terribly useful if it could only be applied in very specific settings, say, a specific town in a given year. Lefkowitz, Blake, and Mouton studied the influence of a person, dressed in a specific way, on others when he crossed a pedestrian crossing against a red light.   We are interested in the findings of that research  because they probably are valid for a wide range of situations, rather than merely for pedestrian crossings in downtown Austin, Texas, in the early Fifties.  Kohn's first study of social class and parental values is of interest to us, not merely because it describes findings for Washington, D.C., at a given moment, but because we hope the findings are applicable elsewhere as well.  
Newton's laws were so important, not because they applied to falling apples in seventeenth-century England, but because they applied to all kinds of very diverse phenomena (the solar system, the movement of the tides, and many other events).  (This is frequently not the case in applied research, where we are often only interested in specific, rather than general, situations.) &lt;br /&gt;How do we extend the applicability of research findings in science?  We do so basically in two ways: first, by generalizing from a sample to a wider population; this is a problem of inferential statistics.  The second method, which is of interest to us here, consists of the use of general or abstract concepts.  The term "concept" is used rather frequently and loosely in the professional literature, but it is hard to find a precise and clear description of what concepts are.  A concept can be defined as a general idea, a mental picture of some characteristic that some otherwise diverse set of phenomena have in common.  This definition may not help you very much, and it has to be admitted that concepts are rather vague and ambiguous entities.  Paradoxically, you will find, however, that concepts are so important for the development of theories because they are so indefinite and imprecise!  A concept is basically a more general, less specific, kind of variable.&lt;br /&gt;Let's go back to the Lefkowitz study, described earlier.  What was said about the experimenter's model?  He was a 39-year-old male, dressed sometimes in "a freshly pressed suit, shined shoes, white shirt, tie, and straw hat," and at other times in "well-worn scuffed shoes, soiled patched trousers and an unpressed denim shirt.” What was being observed?  Whether other pedestrians reached or passed the "white line in the center of the street while the signal still flashed 'wait'." 
Strictly speaking the conclusions derived from this study should have been phrased in terms of the details which we just cited: dress details, pedestrian behaviour, and so forth.  Instead the conclusions offered by the researchers were formulated in general terms of "status" and "conformance" or "violation".  These terms are much more general or abstract than the concrete details just cited.  In other words: the researchers phrased their findings in terms of general concepts.  Rather than talking about "crossing against the light", they talked about "violations".  The first term provides us with a great deal of actual detail, and is as a result called "concrete"; the second, more abstract, term includes crossing against the light, but also many other types of rule-breaking behaviour.  The concept of “violation” includes all those forms of behaviour that imply the breaking of some social rule.  Because it covers so many different specific instances, and provides little actual detail on what actions are included, a term like "violation" is much more general or abstract than the term "crossing against the light", or other, more specific, terms included in the general category of violations, say, cheating on one's taxes.&lt;br /&gt;Why did these researchers choose to phrase their conclusions in terms of general concepts, rather than using more concrete, detailed descriptions?  This was done because the general conclusions are much more useful, in that they can be applied and tested in many different settings, while more specific conclusions would only apply to very few situations.  Let's take two additional pieces of research that are somehow comparable to the Lefkowitz study:&lt;br /&gt;a. a study in the military, where it is tested whether an example (whether "good" or "bad") is more likely to be followed when the initiator is a high status individual, and status is measured by military rank;&lt;br /&gt;b. 
a study in a university, where it is studied whether students in a library are more willing to react to a conversation (thus breaking the norm of silence) started by a high ranking member of the faculty, than to one initiated by a lowly undergraduate student.&lt;br /&gt;&lt;br /&gt;If we compared these three pieces of research in concrete terms, we would not be able to find similarities: after all, what do pedestrian crossings have to do with the military, or with university libraries?  But by introducing abstract concepts like "status" and "conformance" we are able to compare them, and use the total evidence in a cumulative fashion.  By formulating a general conclusion, couched in abstract concepts, we have achieved a high level of applicability, and we can also derive hypotheses that can now be tested in entirely different contexts: the police force, other societies, etc.  In other words: we have achieved a large domain of applicability, or scope.&lt;br /&gt;From this example we can derive the major advantages of the use of general concepts:&lt;br /&gt;1. an increase in comparability: by using abstract concepts we can now compare studies that differ quite considerably in actual detail; this, in turn, leads to&lt;br /&gt;2. an increase in cumulativeness: more studies can now be used to develop and strengthen a given theory;&lt;br /&gt;3. an increase in scope or applicability: more, and more varied situations can now be covered by a theory;&lt;br /&gt;4. an increase in economy and efficiency: one concept can now do the job of numerous more specific concepts, as in the concept of status in the examples above.&lt;br /&gt;All these advantages do not come without drawbacks, of course.  The main question that can be raised is whether it is indeed warranted to subsume all these different instances under the same general headings, of, say, status, or violation. 
(This is an interesting problem in the history and philosophy of science, but not often discussed in social science, apart from Campbell’s analysis of “discriminant validation”.) In sociology this problem is usually formulated differently: how adequate is the measure of a general concept, like "status", consisting of a specific aspect of it, such as military rank, or mode of dress, as in the Lefkowitz study?  This issue will be discussed in the section on measurement adequacy.  But even this potential risk is well worth taking, considering the possible benefits.&lt;br /&gt;Another difficulty in using general concepts is that in moving from concrete details to abstract concepts we are, in a sense, jumping to theoretical conclusions that are sometimes unjustified.  A major pitfall is that an abstract concept may be unwarranted in its implications.  Example: many people move quite assuredly from some visible physical signs (skin colour, hair type, eye form) to the concept of "race".  A dark person with curly hair is tagged as "Black", another person, for other reasons, as “Caucasian” or "White", etc.  The problem is that the concept of "race" often carries additional connotations to those of skin colour or hair type: it assumes that the world population can be divided into a series of major types, with relatively distinct physical, intellectual, psychological, temperamental, and other characteristics, which are all assumed to be transmitted genetically.  Many, if not most, reputable scholars doubt whether these various assumptions and implications are correct, and consequently, whether the concept of "race" is warranted.  Abstract concepts often carry this surplus meaning, consisting of implicit or explicit theoretical assumptions, and by using such concepts we may, unwittingly, endorse such implications.  
You may test this by asking yourself what connotations the terms "male" and "female" carry for you: do they merely refer to biological and physiological differences, or do you also imply that there are innate differences in temperament, various abilities, achievements, interests, and so on?  The latter differences are the ones that may be due to social influences, according to some critics, of course. We shall return to these topics in the section on construct validity.&lt;br /&gt;To finish off this section two minor issues should be clarified.  Why do we refer to "general" or "more abstract" concepts?  Essentially because even a term referring to a very specific set of objects ("pressed suit") is an abstract concept, because it can refer to many pressed suits (brown, blue, small, large, and so on), rather than referring to a single, concrete, identifiable suit.  Usually we are not interested in such concepts, which are still rather specific; we'd much rather use more inclusive, more abstract concepts, such as "status symbols." From now on, when talking about concepts, I shall assume that we are talking about these more general concepts.&lt;br /&gt;Finally, what is the difference between concepts and variables?  The two terms are used interchangeably in the discipline, but concepts usually refer to more abstract, more general types of characteristics, whereas variables may be of any kind.  In a later discussion on applied research I shall suggest that the term "concept" is not appropriate in much applied research, where we are generally more concerned about specifics rather than generalities.  We don't want to know in market research, say, whether an individual has (generally) a high standard of personal cleanliness, but rather whether he or she buys soap of Brand X. 
Now, the buying of Brand X, although for the applied researcher a most important variable, is generally not a characteristic called a "concept"; it is far too specific for that usage.&lt;br /&gt;&lt;br /&gt;The types of concept discussed are more specifically called property concepts, as they deal with characteristics and attributes. (Concepts of relationship deal with the associations possible between various property concepts, such as correlation.) The same general concepts can be tagged or referred to by different names or terms; "affluence" and "poverty" may be two different terms used for the same concept, i.e. wealth.&lt;br /&gt;We started this section with the question how general concepts could be used to increase the applicability, the scope, of a theory, i.e., the variety of situations in which a theory could be applied. The answer was that the use of more abstract concepts enabled us to apply results, laws, and theories to more, and more diverse, situations.  It also has the additional benefit of reducing the number of variables required in theory development, as a number of specific variables can be included in a more general concept.  Yet the use of general concepts also gives rise to a set of related problems, those of the measurement of concepts, as well as committing us to the implicit assumptions, the surplus meaning, of theoretical concepts.&lt;br /&gt;&lt;br /&gt;A note on induction and deduction&lt;br /&gt;&lt;br /&gt;In the text by Baker, pp. 51-54, induction, as a scientific method, is often linked to empirical generalization. We have to separate two types of arguments: &lt;br /&gt;empirical generalization, which most resembles statistical conclusions (all swans we have observed until now are white, therefore all swans are white, is a favourite philosophical example);&lt;br /&gt;abstraction: Protestants and urban residents have high suicide rates; therefore people with low social integration have high suicide rates. 
This is a creative jump, where Durkheim identifies what Protestants and city dwellers have in common.&lt;br /&gt;The same analysis can be applied to deduction: it can be a statistical or logical argument (all humans are mortal, John is a human, therefore he is mortal), or a case of including a specific case in a general category. Are short people a minority group? Are intellectuals suffering from low social integration? This second process is especially important where you try to deduce a specific, testable hypothesis from a general theory (see Baker, 54-57).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2075639?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2075639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2075639'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_21_archive.html#2075639' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2066315</id><published>2001-01-21T18:28:00.000-08:00</published><updated>2001-01-29T11:04:12.393-08:00</updated><title type='text'></title><content type='html'>PATTERNS AND ASSOCIATIONS AMONG VARIABLES&lt;br /&gt;&lt;br /&gt;In science we do not usually look at characteristics or variables of the phenomena we study one by one, in isolation: we always try to find correlations and associations between these variables (which may finally result in an integrated theory). Why do we do that? 
If two or more variables are linked, preferably in a law-like manner, we gain a number of major advantages:&lt;br /&gt;&lt;br /&gt;1. The advantage of prediction: if education and income are related, then you can predict that (in general) a higher level of education will be related to a higher income;&lt;br /&gt;&lt;br /&gt;2. The advantage of causal explanation: you can now explain a person’s income by his or her education;&lt;br /&gt;&lt;br /&gt;3. The advantage of manipulation: we can now increase people’s incomes by giving them higher levels of education;&lt;br /&gt;&lt;br /&gt;4. The advantage of prevention: if we know that smoking leads to lung cancer, we can take steps to reduce smoking and so prevent lung cancer.&lt;br /&gt;&lt;br /&gt;The more variables are related to each other, the more characteristics you can predict by just knowing one “central” variable. If many characteristics are related to religion or ethnicity, then knowing someone’s religion will enable you to predict many other characteristics of that individual. Such clusters of variables are frequently called typologies. &lt;br /&gt;Research dealing with single variables is not unusual: such “descriptive” studies are often found in demographic reports, where we try to describe the population of a town or city by sex, age, and other variables. However, social research is more likely undertaken specifically to find correlations and  regularities: what is the effect of specific child-rearing practices on adult behaviour? What is the relationship between social class and values? Between population size and criminality?&lt;br /&gt;&lt;br /&gt;In Chapter 12 we shall see that we can go beyond the relationships between two variables by introducing additional variables. This way we can understand the conditions under which certain regularities occur, or we can understand causal mechanisms better. 
These statistical and analytical methods are discussed as multivariate analyses.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2066315?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2066315'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2066315'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_21_archive.html#2066315' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2052114.post-2066280</id><published>2001-01-21T18:25:00.000-08:00</published><updated>2001-01-21T18:25:50.690-08:00</updated><title type='text'></title><content type='html'>WALTER SCHWAGER: SCIENCE, THEORY, AND OBSERVATIONS&lt;br /&gt;Comments on Baker, Chapter 2, part 1: pages 45-57.&lt;br /&gt;&lt;br /&gt;1. Theory and observations&lt;br /&gt;&lt;br /&gt;It is generally accepted that science is based on empirical evidence, that is, observational results. Philosophers of science have frequently interpreted empirical evidence as “direct” sense observations (mainly seeing, but also smelling, feeling, etc.).  However,  in natural science empirical evidence is often produced by measurement instruments which themselves are based on scientific theory. Even a simple mercury thermometer does not measure temperature directly, but uses the law that a column of mercury will expand with an increase in temperature, and therefore it will move up higher in a glass tube. 
Therefore, the higher the temperature, the higher the mercury column, and vice versa.&lt;br /&gt;The relationship between scientific theories and observational evidence has been the subject of much controversy over the last three decades, especially as the result of the work of Thomas Kuhn and some other philosophers of science, such as Norwood Hanson. Before Kuhn and Hanson the accepted idea in the philosophy of science was that science rested on solid observational bases, much as a building rests on solid foundations. Because in this view all competent scientists could agree on these observations, science was therefore also seen as objective. As a consequence, a scientific theory could not be open to doubt, and had to be accepted by all observers. In a phrase, seeing is believing. Observations might support, modify, or reject a theory, but a theory could not change observations, according to this interpretation. In other words, in this view of science, observations are “pre-theoretical.” This is sometimes called a “foundationist” theory of science.&lt;br /&gt;Kuhn changed this view of science drastically. Often, according to Kuhn, scientists work within the context of a certain theory, which they accept as given, trying to refine and extend it; this he calls doing “normal science.” A number of problems may remain for this accepted theory, but these are ignored for the moment. Even conflicting observational evidence may be left aside, in the hope that these “anomalies” will be resolved later. But then a contrary theory may arise, which not only throws out the previous theory, but often also the observations it rested on, or at least re-interprets them. In Kuhn’s famous phrase, a “scientific revolution” is occurring.&lt;br /&gt;The concept most associated with Kuhn is that of “paradigm”. 
In grammar a paradigm is a standard solution or model, like the way we conjugate French verbs like “sentir.” In Kuhn’s theory of science, a paradigm is a specific theory, associated with a certain way of looking at things. In Marxist sociology class differences are explained in certain ways, but a functionalist will look at stratification in a different manner. Paradigms are therefore like theoretical model solutions: look for the class conflict! or: look for the contributions to society!  (Kuhn used the term “paradigm” in many different ways, so don’t expect total clarity on this issue.) A scientific revolution occurs when one dominant paradigm, or theoretical interpretation, is replaced by another. &lt;br /&gt;A change in paradigms means that we look at the world differently: we no longer see the sun and the stars as circling a stationary earth, but we see the moving earth circling the sun. A paradigm switch, or a scientific revolution, is like a Gestalt switch, a phenomenon you may encounter when viewing an ambiguous picture. Instead of seeing an old woman with a large nose, you suddenly see a young woman with a fur collar, and the details make sense in the new interpretation.&lt;br /&gt;&lt;br /&gt;One of the most recent revolutions to occur in natural science concerns the theory of  continental drift, or, as it is called nowadays, of tectonic plates. Until recently it was generally assumed that the continents were rather fixed and unchanged, with minor exceptions, such as changes due to erosion. But some puzzling anomalies remained: how could we explain the presence of fish fossils high in the Alps? 
Some Christian scientists believed this supported the historical truth of Noah’s flood; others, like the French philosopher Voltaire, came up with tortuous arguments about seagulls that flew high into the mountains and by accident dropped their prey.&lt;br /&gt;A German meteorologist, Wegener, was not the first to note a number of puzzling but rather weak findings, such as the fact that the Atlantic coastlines of South America and Africa match quite nicely, and that these coasts also share a number of fossils and geological features. To defend his theory of “continental drift” (1912) he also introduced other evidence. However, his theory encountered its own share of problems: how could these massive continents move such large distances? To cut a long story short, Wegener’s views were roundly rejected by the scientific community until some thirty years ago, when they were re-introduced as tectonic plate theory. This more recent theory rests on more and newer evidence. It also makes sense of more geological features, such as the development of mountains, fault lines associated with volcanoes, earthquakes, and a host of other phenomena. (See web site: http://pubs.usgs.gov/publications/text/historical.html) (Notice that Wegener was not a geologist, but a meteorologist: scientific revolutionaries are often outsiders who have not been trained in the prevailing scientific theories.) So Wegener’s initial weak findings have now been strengthened by being integrated into a theoretical network that links other findings and explanations. &lt;br /&gt;An interpretation of scientific knowledge where weak bits of evidence mutually reinforce each other has been called “coherentist.” I have sometimes called this view of science a web theory, based on the analogy of a spider web. According to this interpretation of science, a theory links observations which may, on their own, be quite weak. However, by being linked to other observations they each gain in trustworthiness. 
In this view, science resembles a court case, where the testimony of all witnesses is open to doubt, but if their statements match they reinforce each other mutually. The problem with this view is that each theory leads to a search for selective evidence, and evidence is only accepted in the context of a certain theory. In other words, believing is seeing (although that is likely too extreme an interpretation of Kuhn’s views). And a theory may change the nature of observations (as we shall see below): in the current phrase, even observations are theory-laden. (These issues were not of central concern to Kuhn, but much more to philosophers like Hanson and Quine.)&lt;br /&gt;How can a theory re-interpret earlier observations? Take yourself, reading these notes. Unless you are reading them in a moving vehicle, you probably have the feeling that you are anchored quite firmly in space, sitting still. You also have the feeling that your physical environment is stable and stationary. How, then, can we accept the theory that the earth is spinning at high speeds through space? The early defenders of the moving earth theory claimed that if you move at a steady speed, as in a boat or, nowadays, an elevator, you don’t notice that you are moving. So here Galileo and his followers modified what at first looked like a simple, indubitable observation to have it fit their theory. &lt;br /&gt;Does this mean that science is subjective and circular? Not necessarily. The cumulative evidence in support of a certain theory may be so strong that we can accept it as objectively true: nobody doubts any longer that the earth is round and that it revolves around the sun. However, sociological theories are not so well-established, and therefore there remain a number of competing theories in sociology.&lt;br /&gt;In the social sciences we can find a number of paradigm conflicts, where theoretical approaches to specific issues are fundamentally different. 
Examples can be found in the area of mental illness (biological or genetic factors, or social factors?), and crime and delinquency (genetic? physiological? social factors? social construction?).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2052114-2066280?l=2127.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2066280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2052114/posts/default/2066280'/><link rel='alternate' type='text/html' href='http://2127.blogspot.com/2001_01_21_archive.html#2066280' title=''/><author><name>Walter</name><uri>http://www.blogger.com/profile/05717979879000061088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry></feed>
