MaxDiff is used for understanding preference and importance for multiple attributes: brand preferences, brand images, product features, advertising claims, etc. Compared to standard ranking questions, MaxDiff can offer a better understanding of the overall preference order of a set of items as well as the distance between them.
What is Maximum Difference Scaling (MaxDiff) analysis?
MaxDiff is an analytical technique used in market research to measure preference and importance among consumers. The goal of a MaxDiff is to help researchers understand which items in a set to prioritize by asking respondents to make trade-offs between them: on each screen, respondents indicate the “best” and “worst” options in a given subset, and those choices are used to quantify the preference or importance of every item on the list.
Here are some examples of questions that can be answered using a MaxDiff:
- What advertising slogans/messages do consumers find most appealing?
- What product features are most important when consumers are thinking about a purchase?
- What are travelers looking for in a vacation destination?
- What type of cooking TV shows do consumers prefer to watch?
Here’s an example
Let’s say you want to determine the top emotions/feelings that respondents want to experience when they watch a movie.
A simple approach would be to ask respondents to rate—on a scale of not desirable to very desirable—various feelings such as nostalgic, informed, creative, scared, brave, uplifted, excited, etc. You could then determine which of these have the highest top box or top 2 box. While there is nothing inherently “wrong” with traditional rating-scale analyses, one drawback is that they can produce skewed data: respondents often rate many feelings as highly important, likely because they are never asked to trade one option off against another. In other words, when each feeling is evaluated on its own, respondents never have to express a preference or weigh the pros and cons of one item over another.
This is where a MaxDiff can be useful, and it often yields substantially more powerful data. Unlike scaled questions, this method DOES require trade-offs, because it asks respondents to make explicit choices.
Why should you use MaxDiff?
In the movie example above, instead of learning which feelings respondents rate as most desirable on a five-point scale (which could in fact be all of them!), a MaxDiff analysis reveals not only which feelings are most preferred, BUT also by how much, because we see the full range of scores.
The power and value of a MaxDiff analysis lies in asking respondents to make choices rather than expressing strength of preference by using some type of numeric scale. The trade-offs that survey respondents make are indicative of the relative importance they place on certain features or attributes, which can help companies make better and more strategic decisions.
Check out our advanced MaxDiff
aytm’s Advanced MaxDiff lets users test up to 200 items, but avoids overburdening respondents by only asking them to evaluate 3–5 items at a time. It comes in two varieties: Aggregate (formerly MaxDiff Express) and HB.
Aggregate focuses on collecting general aggregate information, without the intention of obtaining individual-level estimates. Typically, respondents would see 3–5 screens. And while it’s still possible to see results for a subset of respondents, keep in mind that the model then only covers that subset—it doesn’t take into account the rest of the responses.
In HB mode, the method focuses on collecting high-resolution individual-level data that can be analyzed with a hierarchical Bayesian model. In typical settings, respondents would see 10–20 screens. Besides the additional option to extract individual logistic coefficients, HB mode makes it possible to analyze a subset of respondents with confidence that the results are more robust, because the model allows for individual-level estimation.
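To make the raw data concrete, here is a minimal count-based scoring sketch on hypothetical movie-feeling responses. This is only an illustration of what a MaxDiff exercise collects: real aggregate and HB analyses fit logit or Bayesian models rather than simple counts, and the item names and responses below are invented.

```python
from collections import Counter

# Each response records the items shown on one screen, plus the
# respondent's "best" and "worst" picks (hypothetical data).
responses = [
    {"shown": ["excited", "scared", "uplifted", "nostalgic"], "best": "excited", "worst": "scared"},
    {"shown": ["uplifted", "informed", "excited", "brave"],   "best": "uplifted", "worst": "informed"},
    {"shown": ["nostalgic", "brave", "scared", "informed"],   "best": "nostalgic", "worst": "scared"},
]

best, worst, shown = Counter(), Counter(), Counter()
for r in responses:
    best[r["best"]] += 1
    worst[r["worst"]] += 1
    for item in r["shown"]:
        shown[item] += 1

# Best-minus-worst score, normalized by how often each item appeared.
scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
for item, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:10s} {s:+.2f}")
```

Even this crude score already reflects the trade-offs respondents made: an item picked as “worst” every time it appears lands at the bottom regardless of how it might have fared on a rating scale.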
We also recommend including a short instruction for the test so respondents understand the task and how many times they’ll be asked to perform it. For example:
- Reorder: Please rank the following items in the order of your preference: from most preferred on top to least preferred on the bottom.
- Best/Worst: Please select your best choice (thumbs up) and your worst choice (thumbs down).
- Image Grid: Please select your most preferred and then least preferred image below.
One more thing to note—sample size plays an important role in the success of a MaxDiff. And because it can be sensitive to the number of completes, we recommend a minimum of 400 survey completes.
MaxDiff summary metrics
There are three standard metrics used to analyze MaxDiff: preference likelihood, average-based preference likelihood, and utility scores.
Below is a brief description of each, but do note that none of these are superior to the others—the metric used should be determined by what is most important to a particular project.
- Preference likelihood: The probability that a given item would be selected within the survey. The baseline percentage is determined by the number of items per screen programmed in the MaxDiff.
- Average-based preference likelihood: The probability that a given item would be selected when paired with any other one item. For average-based PL, the baseline is automatically set to 50%. A score above 50% represents an above-average performer.
- Utility scores: Raw scores (centered around zero) that reflect directional preferences of items. Zero represents the average performance; the more positive an item's utility, the more it is preferred by respondents, and the more negative an item's utility, the less it is preferred.
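The three metrics are closely related: once you have zero-centered logit utilities, the two likelihood metrics follow from a standard logit transform. The sketch below assumes hypothetical utilities and a 4-item screen, and it compares each item against others of average (zero) utility—a common simplification, not necessarily the exact computation any particular tool uses.

```python
import math

# Hypothetical zero-centered utilities for the movie-feeling example.
utilities = {"excited": 1.2, "uplifted": 0.4, "nostalgic": 0.0,
             "brave": -0.5, "scared": -1.1}
K = 4  # items per MaxDiff screen (assumed)

def preference_likelihood(u, k=K):
    # Chance the item is picked "best" on a k-item screen where the other
    # k-1 items have average (zero) utility; the baseline is 1/k.
    return math.exp(u) / (math.exp(u) + (k - 1))

def avg_based_pl(u):
    # Chance the item beats a single average (zero-utility) item;
    # the baseline is 50%.
    return math.exp(u) / (math.exp(u) + 1)

for item, u in utilities.items():
    print(f"{item:10s} PL={preference_likelihood(u):5.1%}  "
          f"avg-PL={avg_based_pl(u):5.1%}")
```

Note how an item with a utility of exactly zero lands on the baseline of both metrics (1/k for preference likelihood, 50% for average-based PL), which is what makes scores above or below those baselines easy to interpret.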
What is Anchored MaxDiff?
Anchored MaxDiff supplements standard MaxDiff with additional questions designed to assess the absolute importance of items, attributes, claims, etc. While a traditional MaxDiff identifies the relative importance of items, an Anchored MaxDiff allows researchers to draw conclusions about whether specific items are actually important or not.
Here’s how it works: First, respondents complete a standard MaxDiff exercise in which, over multiple screens, they indicate the most and least important items among the subset that is shown. The number of items shown in a subset and the total number of screens shown varies according to the total number of items being tested.
After the MaxDiff, respondents complete a second exercise in which they identify which items in the full set are important, or “must-haves,” and which are generally unimportant, or “nice to have.”
The analysis takes into account responses in both exercises to determine relative performance of all items tested, anchored around a utility boundary that provides an absolute threshold for better interpretation of results.
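As a rough illustration of what the anchor adds, the sketch below classifies items by comparing their utilities to an anchor utility fixed at zero. The item names, utilities, and the zero threshold are all hypothetical; actual anchored-MaxDiff estimation derives the boundary from the second exercise rather than fixing it by hand.

```python
# Hypothetical anchored utilities: the anchor defines an absolute
# importance boundary, so items can be classified, not just ranked.
utilities = {"price": 1.3, "battery life": 0.6, "color options": -0.2,
             "brand": -0.9}
ANCHOR = 0.0  # utility of the importance boundary (assumed)

important = [i for i, u in utilities.items() if u > ANCHOR]
unimportant = [i for i, u in utilities.items() if u <= ANCHOR]
print("important:", important)
print("unimportant:", unimportant)
```

Without the anchor, we could only say that “color options” beats “brand”; with it, we can also say that neither clears the bar of actually mattering to respondents.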