Aims of this study

More accurate estimates of happiness from available data
Happiness is typically measured using single questions such as "Taking all together, how happy would you say you are these days?" Such questions are answered by choosing from a list of response options that are ordered from more to less happiness. Often these options are denoted by words such as 'very happy', 'pretty happy' and 'not too happy'. In order to quantify the responses to such questions, researchers assign numerical values to the verbal response options. In the above case of three response options these are typically 3 for 'very happy', 2 for 'pretty happy' and 1 for 'not too happy'. On this basis they compute means and standard deviations.
   In doing so, researchers implicitly assume that the distance between 'very' and 'pretty' happy is the same as that between 'pretty' and 'not too' happy. This may not be the case; respondents may see a greater distance between the latter pair than between the former. If so, unhappiness will be underestimated. Individual scores of 3, 2 and 1 will not fully reflect the real differences in happiness and, as a result, the variance shared with other variables will not be fully reflected in correlations. Aggregated scores will also be affected: mean scores will be higher and standard deviations lower than they really are.
   In this study we assess what people have in mind when they tick 'very' or 'pretty' happy and use that information to estimate more accurate numerical values for verbal response options to questions about happiness. For example, it might turn out that the value of 'pretty happy' on a scale of 1 to 3 is actually 2.5 instead of 2, and that of 'not too happy' 0.8 instead of 1. We then use these estimates to re-analyze available data and check whether this refinement really makes a difference.
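   The re-analysis described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `weighted_mean_sd` and the response counts are invented for the example, and the rated values are merely the illustrative figures mentioned in the text (2.5 and 0.8), not actual results.

```python
from math import sqrt


def weighted_mean_sd(freqs, values):
    """Return mean and population standard deviation of a frequency
    distribution, given a numerical value for each response option."""
    n = sum(freqs.values())
    mean = sum(freqs[option] * values[option] for option in freqs) / n
    variance = sum(freqs[option] * (values[option] - mean) ** 2 for option in freqs) / n
    return mean, sqrt(variance)


# Hypothetical response counts, invented for illustration only.
freqs = {"very happy": 30, "pretty happy": 50, "not too happy": 20}

# Conventional equidistant coding.
equidistant = {"very happy": 3, "pretty happy": 2, "not too happy": 1}

# Illustrative rated values taken from the example in the text; not real estimates.
rated = {"very happy": 3, "pretty happy": 2.5, "not too happy": 0.8}

print("equidistant:", weighted_mean_sd(freqs, equidistant))
print("rated:      ", weighted_mean_sd(freqs, rated))
```

   Comparing the two printed pairs shows how far the mean and standard deviation shift when the same frequency distribution is scored with different values.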

Better comparability of responses to different questions on happiness
In synthetic studies on happiness, researchers often combine findings obtained using slightly different questions. As they try to maximize the number of observations, they accept some diversity in the studies they include. This causes several problems, which can to a large extent be solved by this study.

Overcoming difference in wording of response options
Researchers typically assign the same numbers to response options denoted with slightly different words. For instance, if the third response option in the above question is 'unhappy' rather than 'not too happy', they also code this response as 1. Obviously, this involves a loss of information.
    This study will help us to do better, because it should generate more precise estimates of the numerical values to be used for specific response options. For example, it might show that the average respondent would equate 'not too happy' with 1.1 points on a 1 to 3 scale and 'unhappy' with 0.8 points. These values can then be used to compute weighted scores that more accurately reflect the actual differences in the happiness at stake.

Overcoming differences in number of response options
Another problem in research synthesis is that response scales differ in the number of response options. The above example of a survey question on happiness involves three response options, but there are also questions with four response options and even questions that offer seven. Some researchers solve this problem by transforming observed scores to one common scale. One way is to downsize the longer scales, e.g. shortening a 4-step scale to range 1-3 by lumping the last two options together. This involves a loss of information and the danger of distortion. Alternatively, one can stretch the scales linearly to a common range, e.g. when drawn out to range 0-10, score 2 on range 1-3 becomes 5. This method appears to produce implausible results, in particular when applied to short scales.
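    For reference, the linear stretching mentioned above amounts to the following transformation. This is a minimal sketch; the function name `stretch_linear` is chosen for this example only.

```python
def stretch_linear(score, old_min, old_max, new_min=0.0, new_max=10.0):
    """Map a score linearly from the range [old_min, old_max]
    to the range [new_min, new_max]."""
    return new_min + (score - old_min) * (new_max - new_min) / (old_max - old_min)


# The example from the text: score 2 on range 1-3, drawn out to range 0-10.
print(stretch_linear(2, 1, 3))  # 5.0

# All steps of a 3-step and a 4-step scale on the common 0-10 range.
print([stretch_linear(s, 1, 3) for s in (1, 2, 3)])
print([stretch_linear(s, 1, 4) for s in (1, 2, 3, 4)])
```

    Note that this transformation always places the lowest option at 0 and the highest at 10, whatever words are used to denote them.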
    In this study we solve the problem in another way. We ask proto-respondents to do the job and to assign the values of response options on a common numerical scale. We present them with a scale ranging from 0 to 10 and ask them to divide that scale into intervals that correspond to the degrees of happiness denoted by the words used for response options to questions about happiness. The intervals will be wider with shorter response scales and the meaning of words will vary accordingly. For example, on a 3-step scale the response option 'very happy' may be seen to cover the range 10 to 8, with a midpoint of 9, whereas on a 5-step scale the option 'very happy' may be seen to denote the range 10 to 9, with a midpoint of 9.5. Once obtained, these values will enable us to calculate comparable scores from available frequency distributions of responses to such different questions.
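    The step from rated intervals to a comparable score can be sketched as below. This is an illustration under stated assumptions, not the study's actual computation: the function names are invented, only the interval 8 to 10 for 'very happy' comes from the example in the text, and the other interval boundaries and the response counts are hypothetical.

```python
def midpoints(intervals):
    """Return the midpoint of each option's interval on the 0-10 scale."""
    return {option: (low + high) / 2 for option, (low, high) in intervals.items()}


def mean_on_common_scale(freqs, intervals):
    """Mean of a frequency distribution, scoring each option at its interval midpoint."""
    mids = midpoints(intervals)
    n = sum(freqs.values())
    return sum(freqs[option] * mids[option] for option in freqs) / n


# 3-step scale: 'very happy' (8 to 10) is the example from the text;
# the other two boundaries are hypothetical.
three_step = {
    "very happy": (8, 10),
    "pretty happy": (5, 8),
    "not too happy": (0, 5),
}

# Hypothetical response counts, invented for illustration only.
freqs = {"very happy": 30, "pretty happy": 50, "not too happy": 20}

print(midpoints(three_step))
print(mean_on_common_scale(freqs, three_step))
```

    Applying the same two functions to the intervals rated for a 4-step or 5-step question would yield a mean on the same 0-10 range, which is what makes the results comparable.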

Overcoming language differences
Still another problem is that the same question is often asked in different languages. The words used to qualify the term happiness may rate differently in different tongues. This is a problem in nations where multiple languages are spoken, such as India, and it is also a problem for cross-national comparisons of happiness. The most common solution for this difficulty is to reduce translation error, typically by using the technique of back-and-forth translation. Yet perfect translation is often not possible. A commonly mentioned example is that the English word 'happy' can only be translated into French as 'heureux', which, however, may denote a higher degree of satisfaction.
    With this study we bypass the translation problem. We ask native speakers to rate the numerical value of words used for response options in their own language. If it is true that the French are more choosy about how they use the word 'happy', they might place the option 'très heureux' in the range 10 to 9, whereas the English raters would place 'very happy' in the range 10 to 8. This would result in different numerical mid-values, 9.5 and 9.0 respectively. These differences are taken into account when we use the values to compute weighted averages from available frequency distributions.

Better comparability of happiness over time
The average happiness of citizens was assessed for the first time in the USA in 1945. Since then, more than 200 assessments have followed, but it is still not clear whether Americans have become happier or not. One problem is that increments tend to be small close to the ceiling. Another problem is that the survey questions have differed slightly over time, which is likely to obscure any small trend. The latter problem is typically solved by restricting the analysis to identical questions, but this means that about half of the available data must be left out, while large amounts of data are needed to discern the trend from random variation.
    This study alleviates this problem in two ways. Firstly, it enhances the comparability of responses to questions that differ only slightly in the wording of response options, e.g. 3-step items using 'pretty happy' for the second option instead of 'fairly happy'. As argued above, the subtle difference between such words will be reflected in the different values assigned by our raters, which are then taken into account in the weighted mean we calculate. Secondly, this study enables the comparison of responses to questions that differ in the number of response options. As indicated above, the rating procedure is likely to neutralize the differences in length of response scales. Together this will broaden the database that can be used for analyzing change in happiness in nations.

Better comparability of happiness across nations
In November 2005 the World Database of Happiness counted 123 nations where general population surveys had included questions on happiness. Yet again the questions used are not identical. The most commonly used question had only been applied in 78 nations and the translation of this item into the different languages is questionable in some cases.  This impedes our ability to make comparative analyses of happiness in nations.
    This study can lessen the above problem in three ways. As in the case of comparison over time, it should broaden the database available, firstly because differences in the phrasing of response options should cease to be a problem and secondly because differences in the number of response options should cease to be a problem. Thirdly, translation error should be much reduced by our method.

More opportunity for meta-analysis of studies on happiness
In November 2005 the World Database of Happiness included 8,833 correlational findings on happiness yielded by 891 empirical studies. These numbers will double when all the available findings are entered. This collection has been gathered for the purpose of facilitating research synthesis, and meta-analysis in particular. However, the collection has hardly been used for this purpose as yet, and one of the main reasons is that the measures of happiness used in the various studies differ too much.
    This study should also serve to resolve this problem. All the benefits mentioned above also apply to this problem of heterogeneity in the measurement of happiness. Meta-analysis will benefit from more accurate estimates of happiness from the available data and from better comparability of responses to questions that differ in wording, number of response options and language. However, these benefits can only be reaped if the full distribution of responses is available, which is not always the case with correlational findings.

Better measurement of happiness
This study should improve the measurement of happiness. Ideally, the meaning of response options to a question is the same for all respondents. Yet in practice there are always differences in interpretations of words and some words give rise to more differences than others. The use of such words must be avoided and therefore it is worth knowing which words cause confusion.
    In this study we can identify such words using the standard deviation of the ratings. For example, if our English respondents differ more in their rating of the term 'rather happy' than of 'fairly happy' while the midpoints are the same, future researchers would do better to avoid the former term.
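    The comparison described above can be sketched as follows. The ratings are hypothetical numbers invented for illustration, and the helper name `rating_summary` is an assumption of this example; the sample standard deviation is used here, which is one possible choice.

```python
from statistics import mean, stdev


def rating_summary(ratings):
    """Return (mean, sample standard deviation) of the ratings for each term."""
    return {term: (mean(values), stdev(values)) for term, values in ratings.items()}


# Hypothetical ratings on the 0-10 scale by five raters, invented for illustration.
ratings = {
    "rather happy": [5, 6, 7, 8, 9],
    "fairly happy": [6.5, 7, 7, 7, 7.5],
}

summary = rating_summary(ratings)
for term, (avg, spread) in summary.items():
    print(f"{term}: mean={avg:.2f}, sd={spread:.2f}")

# The term with the largest spread is the one raters agree on least.
least_agreed = max(summary, key=lambda term: summary[term][1])
print("least agreement:", least_agreed)
```

    In this invented case both terms have the same mean rating, but the ratings of 'rather happy' are spread more widely, which would mark it as the term to avoid.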
