Your strategic and tactical quantitative research work (designing, programming, and fielding an online questionnaire) results in raw data files containing every respondent's answers to your survey. Typically, some form of data preparation must be completed before your analysis begins. Neglecting to prepare your raw data carefully can jeopardize the statistical results and bias your interpretations and subsequent findings.
Sometimes your data must be statistically adjusted, for example to make it representative of your target population or to make it ready for analysis. These adjustments are not always necessary, but when they are, they can improve the quality of your data. Three techniques are at your disposal: weighting, variable respecification, and scale transformation.
Weighting is a statistical adjustment that assigns a weight to each respondent in the database to reflect that respondent's importance relative to the other respondents. Its usual purpose is to increase or decrease the influence of respondents with certain characteristics so that the sample data is more representative of the target population. You may also weight the data to give greater or lesser importance to respondents with certain characteristics. For example, if you're surveying respondents to determine which features of a product already on the market should be modified, you may want to place greater weight on heavy users of the product. In this example, you could assign a weight of 3.0 to heavy users, 2.0 to medium users, and 1.0 to light users and non-users. (A weight of 1.0 represents an unweighted respondent.)
Weighting also comes in handy when target quotas for particular groups of respondents were not met during fielding. For example, suppose you know that 18- to 24-year-olds make up 12% of the total pet owner population, but only 8% of the pet owners in your sample are 18 to 24. You can weight the data so that 18- to 24-year-old pet owners represent 12% of your weighted dataset, matching the population. To calculate the weighting factor, divide the group's population share by its sample share: 12 / 8 = a weighting factor of 1.5 for 18- to 24-year-old pet owners. Apply the same rule to every other group so that the weighted shares still sum to 100%. Keep in mind that a weighting factor can be less than 1.0 when you need to reduce a group's weight to match the population distribution: if a group makes up 15% of the population but 20% of the sample, its factor is 15 / 20 = 0.75.
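The weighting-factor arithmetic above can be sketched in Python as follows. This is a minimal illustration rather than a production weighting routine: the function name `weighting_factors`, the group labels, and the 100-respondent sample are hypothetical, and the sketch assumes every respondent belongs to exactly one group whose population share is known.

```python
"""Sketch: cell weighting factors = population share / sample share.

Assumes each group's population share is known (e.g., from census or
industry data) and each respondent's group membership is recorded.
Group labels and counts are hypothetical, chosen to match the 12% vs. 8% example.
"""
from collections import Counter


def weighting_factors(population_share: dict[str, float],
                      sample_groups: list[str]) -> dict[str, float]:
    """Return one weight per group so the weighted sample matches the population."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    factors = {}
    for group, pop_share in population_share.items():
        sample_share = counts[group] / n
        if sample_share == 0:
            # A group with no respondents cannot be weighted up to its population share.
            raise ValueError(f"no respondents in group {group!r}")
        factors[group] = pop_share / sample_share
    return factors


# Hypothetical sample of 100 pet owners: 8 aged 18-24, 92 aged 25+.
sample = ["18-24"] * 8 + ["25+"] * 92
population = {"18-24": 0.12, "25+": 0.88}

factors = weighting_factors(population, sample)
weights = [factors[g] for g in sample]  # one weight per respondent

# Weighted share of 18-24s now matches the 12% population share.
weighted_share = sum(w for w, g in zip(weights, sample) if g == "18-24") / sum(weights)
print(round(factors["18-24"], 2), round(factors["25+"], 4), round(weighted_share, 2))
```

Because a factor is computed for every group, the weighted shares add up to 100% and the weighted total equals the original sample size (here, 100).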
Proceed with caution when weighting data, because weighting destroys the self-weighting nature of the sample design. If you do weight, always document the weighting procedure in your reporting deliverables.
Variable respecification involves transforming the data to create new variables or modify existing ones. Its purpose is to create variables that are consistent with the objectives of the research. For example, suppose you ask respondents about purchase intent on a 7-point scale. Your survey then has seven response categories, which you might collapse into three or four categories in the dataset: for instance, "most likely to buy" (those who select 7, 6, or 5), "neutral" (those who select 4), and "least likely to buy" (those who select 3, 2, or 1). Alternatively, you could create a new variable that combines several other variables, or one that is the ratio of two existing variables.
Dummy variables are another respecification technique: they take only two values, typically 0 or 1, and are used to respecify categorical values. Dummy variables, also called binary, dichotomous, instrumental, or qualitative variables, are helpful when the category coding is not meaningful for statistical analysis; instead, you represent the categories with dummy variables. For example, if heavy users, light users, and non-users are coded 3, 2, and 1 respectively, you can represent them with the dummy variables X3, X2, and X1. X3 = 1 for heavy users and 0 for everyone else, X2 = 1 for light users and 0 for everyone else, and X1 = 1 for non-users and 0 for everyone else.
| Product Usage Category | Original Variable Code | Dummy Variable Code X1 | Dummy Variable Code X2 | Dummy Variable Code X3 |
|---|---|---|---|---|
| Heavy users | 3 | 0 | 0 | 1 |
| Light users | 2 | 0 | 1 | 0 |
| Non-users | 1 | 1 | 0 | 0 |
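The collapsing and dummy-coding steps described above might look like this in Python. It is a sketch under stated assumptions: the helper names `collapse_intent` and `dummy_code`, the respondent records, and their field names are hypothetical, and the code follows the three-category purchase-intent split and the X1/X2/X3 coding shown in the table.

```python
"""Sketch of two respecification steps:
1. collapsing a 7-point purchase-intent scale into three categories, and
2. dummy-coding product usage coded 3 = heavy, 2 = light, 1 = non-user.

Helper names, field names, and respondent values are hypothetical.
"""


def collapse_intent(score: int) -> str:
    """Map a 1-7 purchase-intent rating to one of three collapsed categories."""
    if not 1 <= score <= 7:
        raise ValueError(f"purchase intent must be 1-7, got {score}")
    if score >= 5:
        return "most likely to buy"   # 5, 6, 7
    if score == 4:
        return "neutral"              # 4
    return "least likely to buy"      # 1, 2, 3


def dummy_code(usage_code: int) -> dict[str, int]:
    """Return X1, X2, X3 indicators: X1 = non-user, X2 = light, X3 = heavy."""
    if usage_code not in (1, 2, 3):
        raise ValueError(f"usage code must be 1, 2, or 3, got {usage_code}")
    # Each dummy is 1 only for the category whose original code matches it.
    return {f"X{k}": int(usage_code == k) for k in (1, 2, 3)}


respondents = [
    {"id": 101, "intent": 6, "usage": 3},
    {"id": 102, "intent": 4, "usage": 2},
    {"id": 103, "intent": 2, "usage": 1},
]

for r in respondents:
    r["intent_group"] = collapse_intent(r["intent"])  # new, collapsed variable
    r.update(dummy_code(r["usage"]))                  # adds X1, X2, X3
    print(r)
```

If the dummies feed a regression with an intercept, analysts normally keep only two of the three (k − 1 dummies for k categories) and treat the omitted category as the baseline, because all three together are perfectly collinear with the intercept.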
Scale transformation is the manipulation of scale values to ensure comparability with other scales or otherwise prepare the data for analysis. For example, your survey may use several different scales, such as a Likert scale, a continuous rating scale, a semantic differential scale, and a Stapel scale. Data from these scales cannot be compared directly, so if you wanted to compare brand image scores from a Stapel scale with purchase interest scores from a Likert scale, you would first need to transform them. Even when the same scale is used across questions, different respondents may interpret it differently (for example, some respondents consistently use the upper end of a rating scale), and scale transformation can correct for this. Standardization, which converts each score into a z score by subtracting the variable's mean and dividing by its standard deviation, is a common transformation that lets you compare variables measured on different types of scales. Scale transformation may also be required in international marketing research to ensure that units of measurement are comparable across countries or cultures, so that you can make meaningful comparisons.
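A minimal Python sketch of standardization follows, together with one common way to correct for respondents who use a scale differently: expressing each of a respondent's ratings as a deviation from that respondent's own mean. The function names and the Stapel and Likert responses are hypothetical, and the sketch uses the sample standard deviation (`statistics.stdev`); some analysts use the population standard deviation (`statistics.pstdev`) instead.

```python
"""Sketch: standardizing scores from different scales, z = (x - mean) / sd.

Scale ranges and responses are hypothetical: a Stapel scale (-5 to +5)
for brand image and a 5-point Likert scale for purchase interest.
"""
from statistics import mean, stdev


def standardize(values: list[float]) -> list[float]:
    """Convert one variable's raw scores to z scores across respondents."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a variable with zero variance")
    return [(v - m) / s for v in values]


def center_within_respondent(ratings: list[float]) -> list[float]:
    """Express one respondent's ratings as deviations from their own mean,
    offsetting respondents who use only one end of the scale."""
    m = mean(ratings)
    return [r - m for r in ratings]


brand_image_stapel = [4, -2, 3, 0, 5]        # Stapel scale, -5 to +5
purchase_interest_likert = [5, 2, 4, 3, 5]   # Likert scale, 1 to 5

z_image = standardize(brand_image_stapel)
z_interest = standardize(purchase_interest_likert)

# Both variables now have mean 0 and standard deviation 1, so each
# respondent's two scores can be compared on a common footing.
print([round(z, 2) for z in z_image])
print([round(z, 2) for z in z_interest])

# A respondent who rates everything high: centering removes the overall lift.
print(center_within_respondent([6, 7, 6, 7]))
```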
Data may require statistical adjustments for several reasons, and you have a few different techniques available to use. Consider weighting if you need to make the sample data more representative of the target population. Variable respecification is beneficial when you need to modify or create new variables that are more consistent with your research objectives. Employ dummy variables if the coding used is not conducive to statistical analysis. Lastly, scale transformation enables you to compare data across various scales and can be particularly helpful for multi-country research projects.