Five Decisions Researchers Make When Constructing Itemized Rating Scales

Posted Oct 04, 2018

When designing a questionnaire, a researcher has a variety of rating scales to choose from.

If you recall, rating scales can be comparative or noncomparative. Comparative rating scales are used to directly compare stimuli, and the collected data (which has ordinal or rank properties) can only be analyzed in relative terms. Conversely, noncomparative rating scales, also known as monadic scales, evaluate only one stimulus at a time. Noncomparative scales can be either continuous or itemized.

As a researcher, you’ll want to use an itemized rating scale whenever you want to present respondents with a scale that has a number or brief description associated with each key scale point. Itemized scales are extremely common in market research, and you have three main options at your disposal: the Likert scale, the semantic differential, and the Stapel scale. Once you understand the types of itemized scales available and have chosen which one(s) to use in your survey, there are five decisions you need to make to ensure the scale you construct will meet your data collection needs.

Understanding Likert, semantic differential, and Stapel scales

The Likert scale is named after its developer, Rensis Likert. It is a very widely used scale that asks respondents to indicate their degree of agreement or disagreement with each of a series of statements about a stimulus. Typically, a 5- or 7-point scale is used, and the data is typically treated as interval. It is easy to create, and respondents are familiar with it and understand it. A disadvantage, however, is that it is time-consuming for respondents to complete, because they must read and respond to each statement separately.
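To make the “treated as interval” point concrete, here is a minimal Python sketch, assuming a 5-point agree/disagree scale coded 1 to 5. The label wording, the `LIKERT_5` mapping, and the `likert_score` helper are hypothetical illustrations, and averaging coded answers is a common convention rather than the only defensible way to analyze Likert data.

```python
from statistics import fmean

# Hypothetical 5-point Likert coding: 1 = strongly disagree, 5 = strongly agree.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

def likert_score(responses: list[str]) -> float:
    """Average one respondent's coded answers across several statements.

    Averaging assumes the data can be treated as interval (equal spacing
    between adjacent points), which is a common but debated convention.
    """
    codes = [LIKERT_5[r.lower()] for r in responses]
    return fmean(codes)

# One respondent's answers to three statements about a stimulus (illustrative data).
answers = ["agree", "strongly agree", "neither agree nor disagree"]
print(likert_score(answers))  # (4 + 5 + 3) / 3, prints 4.0
```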

The semantic differential is typically a 7-point scale with endpoint labels that have opposite (bipolar) meanings, such as “hot” and “cold”. Respondents usually rate a stimulus on a number of these itemized scales. It is important to vary which endpoint carries the positive label, to limit the bias that results from some respondents’ tendency to mark the right- or left-hand side without reading the labels carefully. This is another very popular and versatile scale, used across many types of market research projects, such as comparing brands, products, or company images, developing advertising and promotional strategies, and developing new products.
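Varying which endpoint carries the positive label has a practical consequence at analysis time: items whose positive label sits on the left must be reverse-coded before they can be combined. Below is a minimal Python sketch, assuming each item is coded 1 (left label) to 7 (right label); the item names and the `ITEMS`, `reverse`, and `align` names are hypothetical.

```python
# Semantic differential items coded 1 (left label) to 7 (right label).
# Hypothetical items; True means the positive label was placed on the LEFT,
# so a raw 1 is the most favorable answer and the item must be reversed.
ITEMS = {
    "modern_vs_old_fashioned": True,   # "modern" on the left
    "unreliable_vs_reliable": False,   # "reliable" on the right
    "friendly_vs_unfriendly": True,    # "friendly" on the left
}

SCALE_MIN, SCALE_MAX = 1, 7

def reverse(code: int) -> int:
    """Mirror a code on the 1-7 scale: 1 <-> 7, 2 <-> 6, and 4 stays 4."""
    return SCALE_MIN + SCALE_MAX - code

def align(raw: dict[str, int]) -> dict[str, int]:
    """Recode answers so that 7 is the favorable end for every item."""
    return {item: reverse(code) if ITEMS[item] else code for item, code in raw.items()}

raw = {"modern_vs_old_fashioned": 2, "unreliable_vs_reliable": 6, "friendly_vs_unfriendly": 1}
print(align(raw))
# {'modern_vs_old_fashioned': 6, 'unreliable_vs_reliable': 6, 'friendly_vs_unfriendly': 7}
```

A respondent who simply ticked the left-hand end of every item (raw 1, 1, 1) would come out as 7, 1, 7 after alignment, which makes that side-marking pattern easy to spot.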

The Stapel scale is also named after its developer, Jan Stapel. It measures respondents’ attitudes by asking how accurately or inaccurately a single descriptor describes an object. This unipolar scale is usually presented vertically and contains 10 scale points numbered from -5 to +5, with no neutral (zero) point. For example, respondents could be asked to rate shampoo brands (the objects) on a number of descriptors such as “High Quality”, “Attractive Packaging”, and “Thoroughly Cleans”.
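The structure of the Stapel scale described above (10 points from -5 to +5, no zero, presented vertically) can be expressed in a short Python sketch. The `render_vertical` layout, which places the descriptor between the positive and negative points, and the `is_valid` check are illustrative assumptions rather than a prescribed format.

```python
# Stapel scale as described above: -5 to +5 with no zero point (10 points).
STAPEL_POINTS = [p for p in range(-5, 6) if p != 0]

def render_vertical(descriptor: str) -> str:
    """Lay the scale out vertically with the descriptor in the middle (hypothetical layout)."""
    upper = [f"{p:+d}" for p in STAPEL_POINTS if p > 0][::-1]  # +5 down to +1
    lower = [f"{p:+d}" for p in STAPEL_POINTS if p < 0][::-1]  # -1 down to -5
    return "\n".join(upper + [descriptor.upper()] + lower)

def is_valid(response: int) -> bool:
    """A response must be one of the 10 scale points; 0 is not an option."""
    return response in STAPEL_POINTS

print(render_vertical("High Quality"))
```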

Itemized Rating Scale Decisions

Regardless of the itemized scale used, there are several important decisions researchers must make when constructing each scale:

Number of scale points/answer options
Balanced vs. unbalanced scale
Including/excluding a neutral answer option
Forced vs. non-forced scales (whether to include an N/A answer option)
Nature and degree of scale point descriptions

Selecting the number of scale points/answer options

Selecting the number of scale points or answer options is tricky because you must balance what you want as a researcher against what respondents can realistically handle. On one hand, including more scale points allows finer discrimination among stimuli. On the other hand, respondents find it tiresome to review more than a few answer options. Although there is no single optimal number, a common recommendation is around seven options, plus or minus two (5 to 9 in total). If your respondents are likely to be knowledgeable about the topic, you have leeway to include more scale points; if they are likely to be less knowledgeable, include fewer. Also consider the nature of the stimuli: some things can be discriminated more finely than others.

When to use a balanced vs. unbalanced scale

As the name suggests, a balanced scale has an equal number of favorable and unfavorable answer options. For example, a balanced scale could include these answer options: “extremely good”, “very good”, “good”, “bad”, “very bad”, “extremely bad”. Generally, you want to provide a balanced scale to obtain objective data. But if you suspect that the distribution of responses will be skewed positively or negatively, an unbalanced scale with more options in the direction of the skew is appropriate. An example of an unbalanced scale, with four favorable and two unfavorable options, is: “extremely good”, “very good”, “good”, “somewhat good”, “bad”, “very bad”. Be sure to take the nature and degree of the imbalance into account during your analysis.

Should you include a neutral answer option?

When your scale uses an odd number of answer choices, the middle choice is typically a neutral option. The presence, placement, and labeling of a neutral answer choice can significantly influence respondents. You should include a neutral option when at least some respondents are likely to select it. However, if you want to force a response, if you believe few or no respondents will select a neutral option, or if a neutral option doesn’t make sense in the question’s context, use an even number of answer choices without one.

Proceed with caution when using a forced scale

Relatedly, a forced scale requires respondents to express an opinion because it offers no N/A (Not Applicable) answer choice. Keep in mind that respondents who want an N/A option when there isn’t one may default to a choice near the middle of the scale. If a large proportion of respondents do this, your data will be distorted in terms of central tendency and variance. Therefore, be very cautious when choosing to use a forced scale.
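The distortion can be illustrated with a small example, sketched in Python under the assumption of a 1-to-7 scale whose midpoint is 4. The response values are invented for illustration, and the `summarize` helper is hypothetical; only the direction of the effect in this particular example matters.

```python
from statistics import fmean, pstdev

# Made-up answers on a 1-7 scale from respondents who actually hold an opinion.
informed = [1, 2, 2, 6, 6, 7, 7, 7]

# Assume 4 more respondents found the question not applicable but, with no
# N/A option available, defaulted to the scale midpoint (4).
forced = informed + [4, 4, 4, 4]

def summarize(data: list[int]) -> tuple[float, float]:
    """Return the mean and population standard deviation of coded answers."""
    return fmean(data), pstdev(data)

for label, data in [("opinion holders only", informed), ("forced scale", forced)]:
    m, sd = summarize(data)
    print(f"{label}: mean={m:.2f}, sd={sd:.2f}")
# opinion holders only: mean=4.75, sd=2.44
# forced scale: mean=4.50, sd=2.02
```

In this example the mean drifts toward the midpoint (4.75 to 4.50) and the standard deviation falls from about 2.44 to about 2.02, even though the four added answers reflect no opinion at all.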

How to describe the scale points

Scale points can have verbal (e.g., “very likely”), numerical (e.g., “1”), or pictorial (e.g., an emoji or icon) descriptions, or a combination. You also need to decide whether to label every scale point (e.g., “very likely”, “likely”, “not very likely”, “not likely”), only some, or only the extremes (e.g., “agree completely” ------------------- “disagree completely”). Interestingly, providing verbal descriptions may not improve the accuracy or reliability of the data. However, it can be argued that labeling at least some of the scale points helps to reduce ambiguity. When describing scale points, also be aware that strong phrases such as “completely” or “extremely” may produce less variable responses, because respondents may avoid committing to such strong anchor points. By contrast, weak descriptors like “generally” may produce a more uniform distribution of responses.

The Takeaway

You probably know by now that designing a questionnaire and its rating scales is no simple task. While there is no standard formula for creating the “perfect” scale, there are several pros and cons researchers need to consider to ensure the itemized rating scales they construct will enable them to collect the data needed for answering their business objectives. Be sure to review these decision-making criteria next time you are forming rating scales: