The 5-Item Likert Scale and Percentage Scale Correspondence with Implications for the Use of Models with (Fuzzy) Linguistic Variables

The aim of this paper is to examine how people perceive the correspondence between the 5-item Likert scale and the percentage scale (the LS-PS correspondence hereinafter). Are all five items of the Likert scale equidistant? Do people use the same scale when evaluating different objects? Do men and women differ? Do people from different countries or cultures differ? The method of the study was a questionnaire with 661 participating respondents altogether from the Czech Republic, Ecuador


Introduction.
Evaluation of objects such as goods, services, or companies on various scales is almost omnipresent. However, it is difficult to measure human attitude, character, and personality because of their subjective nature (Prasad, 2016), and to transform subjective opinions into objective reality (Joshi, Kale, Chandel & Pal, 2015). The use of scales enables comparing, rating, or ranking objects, which is important in many areas of business, science, entertainment, or personal life. The scales used for evaluation are usually numerical or linguistic. In the former case, objects are usually evaluated on scales from 1 to 5 (typically in the form of 1 star to 5 stars), from 1 to 10, or in percent from 0 to 100%. This approach is followed by many Internet platforms such as Booking.com, which assigns each accommodation option a value from 1 to 10; Goodreads.com, where users rate books from 1 star to 5 stars; and IMDb.com, where a movie fan can find the rating of a favorite movie from 1 to 10.
In the latter case, objects are assigned "labels", such as from A to F when evaluating students' performance in the USA and the majority of EU countries, from A to H for the energy consumption of houses, credit ratings from A to C or D, and so on. In the case of linguistic scales, the values (linguistic terms) are called categories, and the set of all categories of a given scale is required to be ordinal, which means the categories are ordered from the worst (least desirable) to the best (most desirable).
A Likert scale, introduced by Rensis Likert (Likert, 1932), is a bipolar scaling method that measures a positive or negative response to a given statement. The scale is very popular among researchers in psychology, sociology, pedagogy, marketing, or business who work with questionnaires. A Likert scale is usually treated as ordinal or interval data. It is usually constructed from an odd number (3, 5, 7, or 9) of categories, where the middle category expresses ambivalence, indecisiveness, or a respondent's lack of an opinion, while the first and the last category express an extreme opinion. Typically, the five-item Likert scale has the following form: {very good; good; average; poor; very poor}, or {strongly agree; agree; neither agree nor disagree; disagree; strongly disagree}. It has become common practice to treat this scale as a rating scale and to assume that equal intervals hold between the response categories, see e.g. Blaikie (2003). However, as Creswell (2008) points out, we have no guarantee that the intervals are indeed equal. The assumption that the "distance" between each successive pair of categories is the same is challenged in this paper. More on Likert scales, their proper use, statistical properties, or applications can be found in Allen and Seaman (2007), Blaikie (2003), Clason and Dormody (1994), Creswell (2008), Dawes (2008), Edwards and Kenney (1946), Hasson and Arnetz (2005), Joshi et al. (2015), Norman (2010), Traylor (1983), Prasad (2016), or Willits et al. (2016).
Finally, Booking.com is an example of a platform that combines the numerical and the linguistic approach. According to Booking.com, an accommodation rated 9+ is "superb", 7-9 means "good", "okay" is 5-7, "poor" 3-5, and "very poor" 1-3. The origin of this correspondence remains unclear, since there is no study linking linguistic terms such as "superb" with the percentage scale.
Apart from Internet platforms, linguistic variables are extensively used in decision-making modeling under uncertainty. Usually, the models express linguistic terms via triangular (or other) fuzzy numbers, and Likert scales are assumed to be equidistant, see e.g. Carrasco et al. (2012), Casola et al. (2005), Holeček and Talašová (2010), Lima et al. (2016), Lin and Yeh (2012), or Yan et al. (2017). However, if individuals do not perceive Likert scales to be equidistant, this input assumption might lead to unwanted distortion of results.
Therefore, the aim of this paper is to examine the correspondence between a Likert scale and the percentage scale (the LS-PS correspondence in short) to find answers to the following questions: Is the 5-item Likert scale perceived to be equidistant by respondents (decision makers)? Are there any differences with respect to the location, gender, or age of respondents? Do respondents use the same (universal) scale for evaluation? To answer these questions, several research hypotheses were formulated and statistically tested. The method of the study was an experiment carried out via a questionnaire.
To the authors' knowledge, the presented study is the first of its kind, and its results are not only of theoretical worth: they also have important practical implications. The results can be used as an input to decision-making models with (fuzzy) linguistic variables, improving their accuracy.
The paper is organized as follows. In Section 2 the research problem is presented. In Section 3 data and the method of the study are described. Section 4 provides results of the study and in Section 5 a more accurate 5-item Likert scale for modeling with triangular fuzzy numbers is proposed. Conclusions close the article.

The problem and research hypotheses.
To date, there is no empirical study known to the authors on how exactly people perceive linguistic scales or how they identify them with the percentage scale. Moreover, there is no evidence on whether people use one (general) scale for evaluation or different scales in different situations. Likert scales are assumed to be symmetrical and equidistant. But how realistic is such an assumption? Do all people use the same scale? Do they use the same scale in different situations and for different objects? Does the scale differ among respondents with respect to country/culture, gender, or age?
To find out the answers to the aforementioned questions, the relationship between the 5-item Likert scale and the percentage scale was examined by the experiment described in the next section so that the following null hypotheses can be statistically tested:
• H01: Items on the 5-item Likert scale are evenly distributed (are equidistant) with respect to the percentage scale.
• H02: Individuals from different countries use the same percentage scale for the corresponding 5-item Likert scale (LS-PS correspondence).
• H03: Men and women use the same LS-PS correspondence.
• H04: Younger and older respondents use the same LS-PS correspondence.
Further, one research question (Q1) was examined:
• Q1: Do individuals use one (general, universal) scale for the evaluation of different objects (goods or services)?

Data and method.
The experiment linking the 5-item Likert scale with the percentage scale was carried out during 2017-2018. Table 1 summarizes the locations and the number of respondents.
The method of the study was a questionnaire disseminated online in Spanish, French, and Czech. High school students from Clermont-Ferrand and university students from locations such as Karvina, Quito, Cuenca, Marseille, and Angers (see Table 1) formed the set of 661 respondents. The respondents were instructed to assign each of the 5 categories of the Likert scale (very good, good, average, poor, very poor) an interval in percent corresponding to their perception of what is "very good", "good", and so on, when evaluating two items: a movie and a cell phone. Thus, each student provided two LS-PS correspondences, which amounts to 1,322 LS-PS correspondences altogether. Naturally, the intervals assigned to the five items had to cover the entire interval [0, 100%] without overlapping.
Let $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ denote the boundary values between "very poor" and "poor", "poor" and "average", "average" and "good", and finally "good" and "very good" categories, respectively. Further, let $\beta_0 = 0$ and $\beta_5 = 100$. Clearly, we have:
$$0 = \beta_0 \le \beta_1 \le \beta_2 \le \beta_3 \le \beta_4 \le \beta_5 = 100. \tag{2}$$
Let $\delta_1$, $\delta_2$, $\delta_3$, $\delta_4$, and $\delta_5$ correspond to the "lengths" of the intervals corresponding to the "very poor", "poor", "average", "good", and "very good" categories, respectively, that is, $\delta_i = \beta_i - \beta_{i-1}$ for $i = 1, \dots, 5$.
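The relation between the boundary values and the interval lengths can be illustrated by a minimal Python sketch. The function name `interval_widths` and the sample respondent are hypothetical and are not taken from the study's data; the sketch only assumes that each respondent supplies four inner boundaries in percent.

```python
def interval_widths(boundaries):
    """Return the widths delta_1..delta_5 for inner boundaries beta_1..beta_4 (percent)."""
    # beta_0 = 0 and beta_5 = 100 are fixed by the percentage scale
    betas = [0.0, *map(float, boundaries), 100.0]
    if len(betas) != 6:
        raise ValueError("expected exactly four inner boundaries")
    # The intervals must cover [0, 100] without overlapping
    if any(lo > hi for lo, hi in zip(betas, betas[1:])):
        raise ValueError("boundaries must be non-decreasing within [0, 100]")
    return [hi - lo for lo, hi in zip(betas, betas[1:])]


# Hypothetical respondent (illustration only)
widths = interval_widths([30, 50, 70, 85])
print(widths)       # [30.0, 20.0, 20.0, 15.0, 15.0]
print(sum(widths))  # 100.0
```

An equidistant scale corresponds to the boundaries 20, 40, 60, and 80, for which all five widths equal 20.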
To test the null hypotheses H02-H04, independent two-sample t-tests with unequal variances (with normality tested as well) and one-factor ANOVA were applied (see the next section). Hypothesis H01 was tested via a paired two-sample t-test and one-factor ANOVA. Testing was performed via MS Excel and Gretl.

Results.
The results of the study are summarized in Tables 2-6, where the overall results and results with respect to geographic location, gender, and age of respondents are provided.
As can be seen from Tables 2 and 3, respondents did not, on average, divide the 5-item Likert scale uniformly with boundaries {0, 20, 40, 60, 80, 100}.
Another interesting feature was the regional differences in $\beta_i$ (see Table 2). The $\beta_i$ values of respondents from Ecuador were the highest, meaning the respondents were the most "demanding" in their evaluations. On the other hand, the values from respondents in France were the lowest. Also, respondents from France assigned the largest width ($\delta_5$) to the category "very good", while respondents from the Czech Republic and Ecuador assigned the largest width ($\delta_1$) to the category "very poor". The differences between France and the other two countries could be caused by the more heterogeneous sample of French respondents with respect to age and education.
As for gender differences (Table 4), men assigned higher values to all $\beta_i$ than women, both overall and in each country separately. Further, only 23.3% of individuals used the same scale in the evaluation of a mobile phone and a movie (Table 5). The results with respect to age are given in Table 6.

Equidistance of the 5-item Likert scale.
The first problem examined in the presented study was the problem of the (alleged) equidistance of the 5-item Likert scale. The null hypothesis states that all five categories have the same "width".
The null hypothesis H01 was statistically tested via a paired two-sample t-test with unequal variances (in MS Excel, after checking for data normality), where all 10 pairs of $\delta_i$ and $\delta_j$ ($i < j$) were tested; the results are shown in Table 7a.
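As an illustration of the pairwise comparison of interval widths, the following Python sketch computes a plain paired t statistic for all 10 pairs of categories. The helper `paired_t` and the data in `widths` are hypothetical; the exact test variant run in MS Excel may differ, and a p-value would still have to be read from a t-distribution with the returned degrees of freedom.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two equally long samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical widths delta_1..delta_5, one row per respondent (illustration only)
widths = [
    [30, 20, 20, 15, 15],
    [25, 20, 20, 20, 15],
    [35, 15, 20, 15, 15],
    [30, 25, 15, 15, 15],
]
columns = list(zip(*widths))  # one column per Likert category

# Test every pair (i, j) with i < j: 10 pairs for 5 categories
results = {
    (i + 1, j + 1): paired_t(columns[i], columns[j])
    for i, j in combinations(range(5), 2)
}
for pair, (t, df) in results.items():
    print(pair, round(t, 3), df)
```

Because ten tests are run on the same respondents, a correction for multiple comparisons may be worth considering when interpreting the individual p-values.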

National (geographic, cultural) differences in LS-PS correspondence.
Since respondents originated from different countries and cultures, the following null hypothesis was tested:
• H02: Individuals from different countries use the same LS-PS correspondence.
The hypothesis was tested (for each $\beta_i$ separately) by one-factor ANOVA, where the independent factor was the country of respondents: Czech Republic, Ecuador, and France. The results of ANOVA (performed in MS Excel) are provided in Table 8.
The null hypothesis H02 was rejected at least at the $p = 10^{-34}$ (!) level (the other p-values were even smaller and were rounded by MS Excel to 0).
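The one-factor ANOVA used for the country comparison can be sketched as follows. This is a textbook computation of the F statistic, not the MS Excel procedure itself; the function name `one_way_anova_f` and the three samples are hypothetical and serve only as an illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical beta_4 values for three countries (illustration only)
samples = [
    [80, 85, 82, 84],
    [86, 88, 85, 90],
    [75, 78, 74, 80],
]
print(one_way_anova_f(samples))
```

The resulting F value is compared with the F-distribution with (df_between, df_within) degrees of freedom to obtain the p-value.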

Gender differences in LS-PS correspondence.
To test (possible) differences between men and women with respect to the LS-PS correspondence, the following null hypothesis was formulated:
• H03: Men and women use the same LS-PS correspondence ($\beta_i^{\text{men}} = \beta_i^{\text{women}}$ for $i = 1, \dots, 4$).
For the test, a two-sample unpaired t-test with unequal variances was performed (for each $\beta_i$ separately), and the results are shown in Table 9a. Since $\beta_i^{\text{men}} \ne \beta_i^{\text{women}}$ for i = 2, 3, and 4, the null hypothesis H03 was rejected at the p = 0.0018 level. In addition, gender differences in all three countries (separately) were examined. Table 9b shows the p-values of the null hypothesis H03 tested by a two-sample t-test.
At the p = 0.01 level, the null hypothesis H03 was rejected in the case of French respondents.
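The two-sample t-test with unequal variances (often called Welch's test) can be sketched in Python as below. The function `welch_t` and both samples are hypothetical; the sketch returns only the test statistic and the Welch-Satterthwaite degrees of freedom, so the p-value would have to be obtained from a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard errors of the two sample means
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical beta_3 values for men and women (illustration only)
men = [62, 65, 60, 70, 66]
women = [58, 60, 61, 57, 63, 59]
print(welch_t(men, women))
```

In this sketch a positive statistic would indicate a higher mean boundary value in the first sample.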

Age differences in LS-PS correspondence.
To test possible differences between younger and older respondents, respondents were divided into two age groups: 15-29, and 30 and more.
• H04: Younger and older respondents use the same LS-PS correspondence.
Since the data from France were the only ones including older respondents (above 29 years of age), they were the only data applicable for testing hypothesis H04. The result of the two-sample unpaired t-test with unequal variances (for each $\beta_i$ separately) is shown in Table 10.
The null hypothesis H04 was rejected at the p = 0.01 level.

Uniqueness of the 5-item Likert scale.
Many models based on (fuzzy) linguistic variables assume the Likert scale to be equidistant, which means the scale is also invariant with respect to the subject of evaluation. Consequently, humans are supposed to use one (universal) scale for evaluation. Therefore, the following question was examined:
• Q1: Do individuals use one (general, universal) scale for the evaluation of different objects (goods, services, etc.)?
According to the empirical data acquired in this study, only 23.27% of respondents used the same correspondence between percentage scale and Likert scale for a movie and a cell phone. Thus, the meaning of linguistic terms "very good", "good", etc., is not universal and varies with respect to a subject of evaluation.
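The share of respondents who use one scale for both objects can be computed as in the following sketch. The function `share_same_scale` and the four sample respondents are hypothetical; each respondent is assumed to be stored as a pair of boundary lists, one for the movie and one for the cell phone.

```python
def share_same_scale(responses):
    """Fraction of respondents whose boundaries coincide for both evaluated objects."""
    same = sum(1 for movie, phone in responses if tuple(movie) == tuple(phone))
    return same / len(responses)


# Hypothetical respondents: (boundaries for a movie, boundaries for a cell phone)
responses = [
    ([20, 40, 60, 80], [20, 40, 60, 80]),
    ([30, 50, 70, 85], [25, 50, 70, 90]),
    ([30, 50, 70, 90], [30, 50, 70, 90]),
    ([10, 40, 60, 80], [20, 40, 60, 80]),
]
print(share_same_scale(responses))  # 0.5
```

Exact equality of all four boundaries is a strict criterion; whether a tolerance was applied in the study is not stated, so none is assumed here.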
The answer to the question Q1 is therefore negative.

The 5-item Likert scale and its representation by triangular fuzzy numbers.
In the context of decision making, linguistic variables are used to model the uncertainty present in many real-world problems. As such, they are often expressed in terms of fuzzy sets. Here, we recall two useful definitions:
Definition 1 (the fuzzy set). Let U denote the universal set of discourse. Then, a fuzzy set $\tilde{A}$ on U is given by its membership function $\mu_{\tilde{A}}: U \to [0, 1]$.
Definition 2 (the triangular fuzzy number). Let $a \le b \le c$ be real numbers. Then, the triangular fuzzy number (TFN) is a triplet (a, b, c) with the membership function given as follows:
$$\mu(x) = \begin{cases} \dfrac{x - a}{b - a}, & a \le x < b, \\ 1, & x = b, \\ \dfrac{c - x}{c - b}, & b < x \le c, \\ 0, & \text{otherwise.} \end{cases}$$
In models with linguistic variables, the terms of the 5-item Likert scale are usually expressed by equidistant and symmetrical TFNs. However, according to our findings, the 5-item Likert scale is neither equidistant nor symmetrical. A new, more realistic expression of the 5-item Likert scale with TFNs, based on the empirical data acquired by the presented study, is provided in the 3rd column of Table 11. Although the numerical differences between both sets of TFNs are not large, they may contribute to more accurate modeling and decision making involving 5-item Likert scales. Of course, the triplets (a, b, c) in the TFNs' definition can also be further adjusted for gender, age, or geographic location, since differences in the perception of the LS-PS correspondence with respect to the aforementioned factors were found to be statistically (highly) significant. These refinements may lead to even more precise results.
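The membership function of a triangular fuzzy number can be evaluated with a short Python sketch. It follows the standard textbook definition of a TFN; the function name `tfn_membership` and the triplets used below are hypothetical and are not the values of Table 11.

```python
def tfn_membership(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if not a <= b <= c:
        raise ValueError("a TFN requires a <= b <= c")
    if x == b:
        return 1.0  # the peak of the triangle
    if a < x < b:
        return (x - a) / (b - a)  # rising edge
    if b < x < c:
        return (c - x) / (c - b)  # falling edge
    return 0.0  # outside the support (or on its boundary)


# Hypothetical TFN for an "average" category on the percentage scale
print(tfn_membership(50, 25, 50, 75))    # 1.0
print(tfn_membership(37.5, 25, 50, 75))  # 0.5
print(tfn_membership(80, 25, 50, 75))    # 0.0
```

Treating the peak separately lets the same function handle boundary categories such as (0, 0, 25), where a = b and the rising edge degenerates.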

Conclusions.
The aim of the paper was to examine the empirical relationship between the 5-item Likert scale and the percentage scale. The results indicate that decision makers use different scales when evaluating different subjects. Therefore, linguistic terms such as "very good" are not used universally, as they depend on the subject of evaluation. Another important finding of the presented study is that the 5-item Likert scale is not perceived to be equidistant by decision makers, and this conclusion is statistically highly significant. The linguistic term "very poor" was found to be "wider" than the other four terms.
Moreover, the study found that regional, gender, and age differences were also highly statistically significant. While respondents from Ecuador and the Czech Republic used an almost identical LS-PS correspondence, respondents from France were markedly "less demanding". In addition, men in general provided an LS-PS correspondence with higher boundary values than women. Similarly, older respondents provided an LS-PS correspondence with higher boundary values than younger respondents.
The results of the presented study suggest that the problem of human perception of linguistic terms is much more complex than previously thought, and that decision-making models with (fuzzy) linguistic variables, which employ equidistant and universal Likert scales, constitute an excessive oversimplification of reality and might lead to undesired distortion of results.
We believe the findings of our study can be used as a more accurate input for models applying linguistic variables of the form of the 5-item Likert scale and more realistic linguistic evaluation of goods or services. However, more thorough research on the topic would be desirable.