Scaling of Scores and Ratings Assignment Help


The previous chapter briefly introduced a few perspectives on testing, with an emphasis on validity as a measure of the effectiveness of test scores. Validity is an overarching issue that encompasses all stages in the test development and administration processes, from blueprint to bubble sheet, including the stage wherein we choose the empirical operations that will assign numbers or labels to test takers based on their performance or responses. In this chapter, we’ll examine the measurement process at its most fundamental level, the measurement level. We’ll define the three requirements for measurement, and consider the simplicity of physical measurement in comparison to the complexities of educational and psychological measurement, where the thing we measure is often intangible and best represented using item sets and composite scores. Along the way, we’ll describe the four types of measurement scales that are available, and we’ll look into why Stevens (1946) concluded that not all scales are created equal. Last are scoring and score referencing, including examples of norm and criterion referencing.

How do we define it?

We usually define the term measurement as the assignment of values to objects according to some system of rules. This definition originates with Stevens (1946), who presented what have become the four traditional scales or types of measurement. We’ll talk about these shortly. For now, let’s focus on the general measurement process, which involves giving an object, the person or thing for whom we’re measuring, a value that represents something about it. Measurement is happening all the time, all around us. Daily, we measure what we eat, where we go, and what we do. For example, drink sizes are measured using categories like tall, grande, and venti. A jog or a commute is measured in miles or kilometers. We measure the temperature of our homes, the air pressure in our tires, and the carbon dioxide in our atmosphere. The wearable technology you might have strapped to your wrist could be monitoring your lack of movement and decreasing heart rate as you doze off reading this sentence. After you wake up, you might check your watch and measure the length of your nap in minutes or hours.

These are all examples of physical measurement. In each example, you should be able to identify 1) the object of measurement, 2) the property or quality that’s being measured for it, and 3) the kinds of values that could be used to represent amounts of this quality or property. The property or quality that’s being measured for an object is called the variable. The kinds of values we assign to an object, for example, grams or degrees Celsius or beats per minute, are referred to as the units of measurement that are captured within that variable. So, three things are required for measurement to happen: an object, a variable, and values or units. Again, the variable is the quality or property we measure, the object is for whom we measure it, and the values are the numbers or labels we assign. Once you can identify these three components for each physical measurement example above, make sure you can come up with your own examples that contain all three parts.
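As a rough illustration, the three components can be written down as a small record. The `Measurement` class, its field names, and the specific values below are hypothetical, chosen only for this sketch; they are not part of any standard library or of the chapter itself.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Measurement:
    """One measurement: an object, a variable, and an assigned value."""
    obj: str                  # the object: the person or thing being measured
    variable: str             # the property or quality being measured
    value: Union[float, str]  # the number or label assigned to the object
    unit: str = ""            # unit of measurement, empty for pure labels


def describe(m: Measurement) -> str:
    """Return a one-line summary naming all three components."""
    return f"{m.obj}: {m.variable} = {m.value} {m.unit}".strip()


# Illustrative values only, loosely based on the everyday examples above.
examples = [
    Measurement("nap", "duration", 25, "minutes"),
    Measurement("drink", "size", "grande"),
    Measurement("commute", "distance", 12.5, "kilometers"),
]

for m in examples:
    print(describe(m))
```

If any one of the three fields cannot be filled in for an example you come up with, that is a sign the example is not yet a complete measurement.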

From Physical to Intangible

With most physical measurements, the property that we’re trying to represent or capture with our values can be clearly defined and consistently measured. For example, amounts of food are commonly measured in grams. A cup of cola has about 44 grams of sugar in it. When you see that number printed on your can of soda pop or fizzy water, the meaning is pretty clear, and there’s really no need to question if it’s accurate. Cola has a lot of sugar in it. But, just as often, we take a number like the amount of sugar in our food and use it to represent something abstract or intangible like how healthy or nutritious the food is. A food’s healthiness isn’t as easy to define as its mass or volume. A measurement of healthiness or nutritional value might account for the other ingredients in the food and how many calories they boil down to. Furthermore, different foods can be more or less nutritional for different people, depending on a variety of factors. Healthiness, unlike physical properties, is intangible and difficult to measure.

The social sciences of education and psychology typically focus on the measurement of constructs, intangible and unobservable qualities, attributes, or traits that we assume are causing certain observable behavior or responses. In this course, our objects of measurement are typically people, and our goal is to give these people numbers or labels that tell us something meaningful about qualities such as their intelligence, their math ability, or their social anxiety. Constructs such as these are difficult to measure. That’s why we need an entire course to discuss how to best measure them. A good question to ask at this point is, how can we measure and provide values for something that’s unobservable? How do we score a person’s math ability if we can’t observe it directly? What we need is an operationalization of our construct, an observable behavior or response that increases or decreases as a person moves up or down on the construct. With math ability, that operationalization might be the number of math questions a person answers correctly out of 20. With social anxiety, it might be the frequency of feeling anxious over a given period of time. When using a proxy for our construct, we have to assume or infer that the operationalization we’re actually observing and measuring accurately represents the underlying quality or property that we’re interested in. This brings us to the overarching question for this course.
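The math ability example can be sketched in a few lines of code. This is a minimal illustration, assuming a 20-item multiple-choice test; the function name `number_correct`, the answer key, and the responses are all made up for the example.

```python
def number_correct(responses, key):
    """Count how many responses match the answer key, item by item."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(r == k for r, k in zip(responses, key))


# Hypothetical answer key for a 20-item multiple-choice math test.
key = list("ABCDABCDABCDABCDABCD")

# A hypothetical test taker who misses items 4, 11, and 20.
responses = list(key)
responses[3] = "A"
responses[10] = "A"
responses[19] = "B"

score = number_correct(responses, key)
print(f"Number correct: {score} out of {len(key)}")
```

The score is only the observable side of the measurement. Treating it as a measure of math ability still rests on the inference that answering more items correctly reflects more of the underlying construct.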

What makes measurement good?

In the last year of my undergraduate studies in psychology I conducted a research study on the constructs of aggression, sociability, and victimization with Italian preschoolers (D. A. Nelson, Robinson, Hart, Albano, & Marshall, 2010). I spent about four weeks collecting data in preschools. Data collection involved covering a large piece of cardboard with pictures of all the children in a classroom, and then asking each child, individually, questions about their peers. To measure sociability, we asked three simple questions: “who is fun to talk to?” “who is fun to do pretend things with?” and “who has many friends?” Kids with lots of peer nominations on these questions received a higher score, indicating that they were more sociable. After asking these and other questions to about 300 preschoolers, and then tallying up the scores, I wondered how well we were actually measuring the constructs we were targeting. Were these scores any good? Were three or five questions enough? Maybe we were missing something important? Maybe some of these questions, which had to be translated from English into Italian, meant different things on the coast of the Mediterranean than they did in the Midwest US?
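The tallying step described above can be sketched as a simple count of peer nominations. This is only an illustration under stated assumptions: the children's names, the question labels, and the function `sociability_scores` are hypothetical, and the scoring actually used in the study may have differed (for example, by adjusting for classroom size).

```python
from collections import Counter


def sociability_scores(nominations, classroom):
    """Total the nominations each child receives across all questions."""
    counts = Counter()
    for by_question in nominations.values():
        for nominees in by_question.values():
            counts.update(nominees)
    # Children who were never nominated still get a score of zero.
    return {child: counts[child] for child in classroom}


# Hypothetical classroom of three children and their answers to the
# three sociability questions (fun to talk to, pretend with, many friends).
classroom = ["Anna", "Luca", "Marco"]
nominations = {
    "Anna": {"talk": ["Luca"], "pretend": ["Luca", "Marco"], "friends": ["Luca"]},
    "Luca": {"talk": ["Anna"], "pretend": ["Anna"], "friends": ["Marco"]},
    "Marco": {"talk": ["Luca"], "pretend": [], "friends": ["Luca"]},
}

scores = sociability_scores(nominations, classroom)
print(scores)
```

Here Luca receives the most nominations and so the highest sociability score. Whether that number reflects sociability well is exactly the kind of question raised above.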

This project was my first experience on the measuring side of measurement, and it fascinated me. The questions that I asked then are the same questions that we’ll ask and answer in this course. How consistently and accurately are we measuring what we intend to measure? What can we do to improve our measurement? And how can we identify instruments that are better or worse than others? These questions all have to do with what makes measurement good. Many different things make measurement good, from writing high-quality questions and items to adherence to established test development guidelines. For the most part, the resulting scores are considered good, or effective, when they consistently and accurately describe a target construct. Consistency and accuracy refer to the reliability and validity of test scores, that is, the extent to which the same scores would be obtained across repeated administrations of a test, and the extent to which scores fully represent the construct they are intended to measure. These two terms, reliability and validity, will come up many times throughout the course. The second one, validity, will help us clarify our definition of measurement in terms of its purpose. Of all the considerations that make for effective measurement, the first to address is purpose.
