Gamified environmental multi-criteria decision analysis: information on objectives and range insensitivity bias

Multi-criteria decision analysis (MCDA) is well suited to address complex public policy problems but could benefit from new tools to involve many laypeople. Online information on specialized topics could be more engaging if it included game elements. This paper reports an experiment that assessed a gamified interface to (1) inform laypeople about the objectives to consider in wastewater management decisions, (2) assist them in constructing range-based preferences, and (3) provide a positive experience. We measured the effects with (1) a knowledge pre- and posttest, (2) the elicited weights and a range sensitivity index, and (3) an experience questionnaire based on self-determination theory. Answers from 174 participants indicated that participants learnt about the objectives and constructed preferences in both the gamified and control treatments. However, in neither treatment were the weights sufficiently adjusted to the attribute ranges. Our gamification, which made the ranges salient, did not help overcome this bias. Both treatments were experienced as neutral to positive, the gamified one being more entertaining. We discuss implications: if gamification of tools for participatory decision-making is to be promoted, further research is required. Range insensitivity remains an unresolved bias in MCDA.


Background and motivation
Need for participatory multi-criteria decision analysis (MCDA). Public policy and environmental decision problems are complex, with many stakeholders involved who may have very different […] Thus, training and careful use of such methods should mitigate the most severe biases and the use of heuristics. When eliciting weights, good practice requires asking consistency check questions (Hobbs and Meier, 1994), emphasizing the attribute ranges to counteract the range insensitivity bias (Gabrielli and von Winterfeldt, 1978; von Nitzsch and Weber, 1993; Eisenführ et al., 2010; Montibeller and von Winterfeldt, 2015), and using symmetrical hierarchies of objectives (Jacobi and Hobbs, 2007; Marttunen et al., 2018).
The present work contributes to behavioral OR (Franco and Hämäläinen, 2016; Franco et al., 2021). It focuses on local preferences, which are based on the attribute range in the specific context (Goldstein, 1990; van Ittersum and Pennings, 2012), rather than on general attitudes toward the objectives, also referred to as global preferences. In this study, local preferences are captured as weights given to objectives. According to the value-comparison hypothesis (Fischer, 1995), local weights for objectives are best elicited with methods that emphasize cross-attribute comparisons, such as trade-off and swing methods (e.g., Eisenführ et al., 2010).
For instance, when deciding about a job, a decision-maker might state that a high salary is the most important objective. This is a general statement, based on a global scale from very low to very high salary. However, consider a specific decision where (1) salary varies only marginally, that is, the alternatives to choose from offer similar salaries, and (2) a second criterion that is unimportant on the global scale varies greatly on the local scale. For example, the number of holiday days per year could be four times larger for one alternative than for the others. The decision-maker is then asked to express trade-offs between "high salary," which varies marginally, and "high number of holiday days," which varies greatly. Considering the local scale of this specific decision, the decision-maker should give little importance to "high salary," because it discriminates poorly between the alternatives, whereas "high number of holiday days" discriminates well. The trade-off and swing weight elicitation methods emphasize such comparisons between attribute levels and should therefore be range sensitive. If the value function is renormalized to the local range, a given change in attribute level corresponds to a larger value difference on a narrow range than on a broad range. As a consequence, the weight should be lower for a narrow attribute range than for a broad one.
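The range sensitivity principle can be illustrated with a small worked example. The sketch below assumes an additive model with linear value functions, in which each weight is proportional to the value difference spanned by its attribute range; the salary and holiday numbers are purely hypothetical and are not taken from any study.

```python
def local_weights(global_weights, range_fraction):
    """Range-adjusted weights under additive, linear value functions.

    global_weights: weights stated on the global attribute ranges (sum to 1).
    range_fraction: local range / global range for each attribute (0 < f <= 1).
    Each weight is scaled by its range fraction, then all are renormalized.
    """
    raw = {k: global_weights[k] * range_fraction[k] for k in global_weights}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


# Hypothetical numbers: salary matters most globally but barely varies here,
# while holidays span their full global range in this specific decision.
w_global = {"salary": 0.6, "holidays": 0.4}
fraction = {"salary": 0.1, "holidays": 1.0}
w_local = local_weights(w_global, fraction)
print({k: round(v, 2) for k, v in w_local.items()})  # {'salary': 0.13, 'holidays': 0.87}
```

Although salary carries the larger global weight, it receives the smaller local weight once its narrow range is taken into account, which is the adjustment the range sensitivity principle requires.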
However, experiments report that the adjustment of weights to manipulated attribute ranges is smaller than the normatively required adjustment (Gabrielli and von Winterfeldt, 1978; von Nitzsch and Weber, 1993; Fischer, 1995). The range sensitivity principle is so generally accepted in the decision analysis literature (see, e.g., the textbook by Eisenführ et al., 2010, pp. 151-154) that, to our knowledge, very few recent follow-up experiments or analyses have been conducted, despite great concern about decision-makers' insufficient adjustment of weights to ranges (Morton and Fasolo, 2009; Montibeller and von Winterfeldt, 2015). Moreover, these early experiments used simplified decision problems, such as job, flat, or car selection, with four or fewer attributes. In our experiment, we used a complex real-world decision on wastewater management to investigate range sensitivity. Our second research question reads as follows: (RQ2) Does informing participants on objectives and making attribute ranges especially explicit facilitate preference construction based on the provided range of the attribute? Specifically, do we observe the expected change in weights when the range is manipulated?
Gamification to improve the experience. User experience is central to interface evaluation. We used self-determination theory (SDT), in particular basic psychological needs theory (BPNT), to investigate the user experience of the gamified information on objectives, as is often done in the literature on gamification (Seaborn and Fels, 2015; Mekler et al., 2017; Nacke and Deterding, 2017; van Roy and Zaman, 2017; Ryan and Deci, 2017a). The original proposition reads, "there are three basic psychological needs, the satisfaction of which is essential to (…) integrity, and wellbeing. These are the needs for autonomy, competence, and relatedness. (…) Need frustration is associated with impoverished functioning." (Ryan and Deci, 2017b, Proposition 1a, p.242). Additionally, these authors propose that "psychological need satisfaction and frustrations vary within persons over time, contexts, and social interactions. Any factor or event that produces variations in need satisfaction or need frustration will also produce variations in wellness, and this principle extends from highly aggregated levels of analysis down to moment-to-moment or situation-to-situation variations in functioning." (Ryan and Deci, 2017b, Proposition 1b, p.243). Findings from qualitative approaches close to grounded theory also align with the BPNT (Heimann and Roepstorff, 2018). In this study, we evaluate user experience by exploring whether gamification satisfies basic needs. Previous studies suggest that gamification fulfills three fundamental human needs: autonomy, competence, and relatedness (e.g., Nacke and Deterding, 2017, and references therein). Autonomy represents the need for volition, to decide voluntarily for oneself. Providing choices is a common way to manipulate autonomy (Sheldon and Filak, 2008; Przybylski et al., 2010; Ryan and Rigby, 2019). Competence represents the need to achieve something, to be successful and effective.
It can be manipulated by providing clear goals and unlocking the next difficulty level when easier levels are achieved (Ryan and Rigby, 2019). Relatedness represents the need to connect with others in mutual exchange. In single-player video games, nonplayer characters are a major feature influencing relatedness (Ryan and Rigby, 2019). They support the player and enable some "exchange with others." We designed our gamification to satisfy these three needs (Section 1.3) and thereby improve the user experience. Our third research question is: (RQ3) Does the proposed gamified interface to inform laypeople on objectives provide a positive user experience? Do the constructs of self-determination theory explain this effect?
To answer the three research questions, we carried out an experiment. We tested a gamified interface to inform laypeople on objectives, based on a real-world decision about sustainable wastewater management. We also varied some of the range information provided. The experimental design is fully described in Section 2.

Developing a gamified tool to inform laypeople on objectives
Here, we report on the gamification design following the method by Morschheuser et al. (2017). This method includes seven phases: project objectives (I); analysis of context and users (II); ideation (III); design of prototype(s) (IV); implementation of the prototype design (V); evaluation (VI); and monitoring (VII). The present paper describes the first six phases, focusing on the evaluation. Monitoring is beyond the scope of our work.
(I) Project objective. First, we identified the aim of the gamified tool as providing information on objectives in such a way that it makes the attribute range explicit and salient. We expect that this will increase range awareness and help overcome the range insensitivity bias. Following current best practices (Payne et al., 2006; Anderson and Clemen, 2013), the tool should inform users about the objectives to consider, in our example when deciding about wastewater management. Specifically, the gamified tool should provide each objective's name, its definition, its measure or attribute, the unit in which it is measured, the best, worst, and status quo levels of each attribute, and the expected impact of these attribute levels. It should draw attention to the ranges, that is, the best and worst possible levels of each attribute. In other words, the gamified tool should provide the facts needed to understand the decision at stake and thus facilitate factual learning. In turn, this should help laypeople construct range-sensitive preferences on the relative importance of the objectives. Moreover, the interface should provide an engaging user experience. Ideally, the gamified tool should be flexible enough to allow case-specific variations, for instance, in the objectives to consider or in the ranges of the attributes. To be accessible to many, it should be an online interface that requires no software download. An online interface would also allow centralized data collection from distributed respondents. As a final aim, it should be possible to extend this gamified tool and merge it with a prototype that we have developed to elicit weights assigned to objectives (Aubert and Lienert, 2019). Given our objectives, we decided to develop a "soft gamification" (Bailey et al., 2015). Soft gamification means that we included game elements in our informative interface, as opposed to a fully fledged game that includes information.
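The list of facts the tool should present for each objective maps naturally onto a simple record type. The sketch below is a hypothetical data structure, not the implementation used by the developers; the field names are our own, and the example levels for organic matter removal are derived from the ranges reported in the experimental design (worst level 37%, range 56%, hence best level 93%). The status quo and impact text are left as placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectiveInfo:
    """Hypothetical record of the information shown for one objective."""
    name: str
    definition: str
    attribute: str
    unit: str
    worst: float
    best: float
    status_quo: Optional[float] = None  # current level in the decision case
    impact: str = ""  # text describing the expected impact of the levels

    @property
    def attribute_range(self) -> float:
        """Distance between the best and worst levels, i.e., the attribute range."""
        return abs(self.best - self.worst)


orgm = ObjectiveInfo(
    name="High removal of organic matter",
    definition="(definition text shown to users)",  # placeholder
    attribute="Removal of organic matter",
    unit="%",
    worst=37.0,
    best=37.0 + 56.0,  # worst level plus the original range
)
print(orgm.attribute_range)  # 56.0
```

Storing the best and worst levels explicitly, rather than only the range, lets an interface display both ends of the fulfilment bar and swap in case-specific ranges without changing the code.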
(II) Context and users. The target users are citizens affected by decisions about wastewater management in rural Switzerland. Thus, the target users are German-speaking adults and young adults with at least basic education. They will use the gamified tool online without assistance from an experienced decision analyst.
(III) Ideation. First, we brainstormed a storyline. The storyline should introduce a challenge motivating users to learn about wastewater management, thereby enabling them to understand the objectives. To develop the story, we considered the feedback from game designers on our first gamified prototype (Aubert and Lienert, 2019). They appreciated the initial storyline but made several suggestions for improving it. In a nutshell, the game designers suggested making the story more immersive by connecting it more strongly to the wastewater topic and the local decision context. Second, we consulted game element lists (Duke, 1980; Hamari et al., 2014) and added those elements expected to positively influence the constructs of basic psychological needs theory (Table 1) that had not yet been included. We based this brainstorming phase on value-focused thinking (Keeney, 1992). This means that we developed options for telling an interesting story about wastewater management that would enable us to reach the objectives identified by the game designers and by the theories used to explain the mechanism of gamification.
(IV) Prototype development and pretests. In early 2018, we twice pretested paper prototype versions with game design lecturers at the Zurich University of the Arts (https://gamedesign.zhdk.ch/, retrieved on 11 October 2019). Each pretest led to improvements, resulting in a single, improved prototype. Thereafter, two decision analysts with experience in wastewater management tested the improved paper prototype. This enabled further development and sparked discussion on selecting the objectives, based on a Swiss case study (Beutler and Lienert, 2020). An internal workshop was organized with three decision analysts involved in real-case MCDA applications for decisions about wastewater management in Switzerland. We identified 10 objectives (Beutler and Lienert, 2020) and organized them in a two-level hierarchy of objectives. High removal of organic matter and high removal of micropollutants defined the higher-level objective of high environmental protection. High recovery of nutrients, low water consumption, and low net energy consumption defined high sustainable resource use. High health protection, high attractiveness, and low time demand for the end-users defined high societal well-being. Low cost and high flexibility defined high economic performance. We wrote specification documents describing the design (available upon request). These included a technical document (eight pages) specifying the following features: supporting technology and exploitation system, structure of the information on objectives and alternatives provided to the player, game elements, language, data to be recorded, access, license, and confidentiality terms. Specifically, the description of game elements included an overview of the five chapters of the story, a description of the characters in the story, the feedback provided by the interface, the progression display, and the random events.
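The two-level hierarchy of objectives described above can be written down as a nested mapping, which also makes its (near-)symmetry easy to inspect. This is an illustrative encoding of the text only; the variable names are hypothetical.

```python
# Two-level hierarchy: higher-level objective -> list of lower-level objectives.
hierarchy = {
    "High environmental protection": [
        "High removal of organic matter",
        "High removal of micropollutants",
    ],
    "High sustainable resource use": [
        "High recovery of nutrients",
        "Low water consumption",
        "Low net energy consumption",
    ],
    "High societal well-being": [
        "High health protection",
        "High attractiveness",
        "Low time demand for the end-users",
    ],
    "High economic performance": [
        "Low cost",
        "High flexibility",
    ],
}

# The lower-level objectives are the ones that receive swing weights.
leaves = [obj for children in hierarchy.values() for obj in children]
branch_sizes = [len(children) for children in hierarchy.values()]
print(len(hierarchy), len(leaves), branch_sizes)  # 4 10 [2, 3, 3, 2]
```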
An appendix to the technical document (19 pages) specified an exemplary path through the interface, with pictures of the paper prototype and text for the instructions and dialogues.
(V) Prototype implementation. The company Opinion Games GmbH (Brugg, Switzerland: https://opiniongames.ch/, retrieved on 15 August 2019) won the competitive call to implement the concept described in the specification documents, and deployed it for web-based use. The gamification consisted of a narrative frame composed of five chapters, each a web page (Fig. 1; for screenshots and descriptions of each chapter, see Supplementary Information SI 1). The overall mission is to discover the challenges faced by Mr. Akles, a wastewater engineer. These challenges correspond to the objectives of the MCDA. The gamification, designed on the basis of the BPNT, released the information gradually and included a chapter dedicated to the attribute ranges. Autonomy game elements (action choice and random events) were intended to give users a sense of volition. Competence game elements consisted of small, clearly defined tasks and gradual delivery of information; the aim was that users should feel progress and realize achievements. Relatedness game elements consisted of interacting with a guiding nonplayer character, Mr. Akles. Table 1 also summarizes the users' actions and the related game elements.

© 2022 The Authors. International Transactions in Operational Research published by John Wiley & Sons Ltd on behalf of International Federation of Operational Research Societies

A. H. Aubert et al. / Intl. Trans. in Op. Res. 30 (2023) 3738-3770

Fig. 1. Screenshots of the gamified information on objectives for a wastewater management decision in rural Switzerland. (a) Page 2: discovering the objective (name, icon, and definition). (b) Page 4: pop-up window depicting an attribute level in the status quo, the current state in the decision case; that is, the fulfilment bar for "low costs," the text (in German) describing the attribute level and its expected impact, the attribute's unit (e.g., CHF per person and year), and the numerical values for the worst (red, lowest number) and best (green, number at the top) states. Clicking on the red ("worst possible state") or green ("best possible state") buttons changes the fulfilment bar and the descriptive text.

[…], and were then automatically directed to one of the four treatments, thereafter to a weight elicitation interface, and finally to a survey (programmed in LimeSurvey).
(VI) Evaluation. The focus of this paper is on the evaluation of the gamified interface prototype. We provide method details concerning evaluation in the Introduction (Section 1.2), in the Methods (Section 2), and the results in Section 3.

Experimental design and procedure
We evaluated our gamified interface quantitatively and qualitatively. First, we compared the gamification (Fig. 1) with a control treatment. The control consisted of a pdf file containing the identical information on the 10 objectives. Each objective description fitted on an A4 page (example screenshot, see SI 2). The experiment was conducted in a controlled environment in a lab at the Institute of Psychology at the University of Zurich. Participants completed the survey seated in individual cubicles, using the university computers.
We designed a 2 × 2 between-subject experiment. The varying factors were (1) the information medium (gamified vs. nongamified control) and (2) the attribute ranges (manipulated vs. original control), leading to four treatments in total (Fig. 2).
On arrival, participants received an information sheet about the experiment and signed a consent form. Thereafter, they answered a pretest on the computer about their knowledge of wastewater management (SI 3). One of the four treatments followed (Fig. 2). Next, we elicited participants' preferences, specifically the weights, using the swing method (von Winterfeldt and Edwards, 1986; Eisenführ et al., 2010). Its procedure makes attribute ranges explicit (see SI 4 for a description with screenshots); swing elicitation is therefore recommended to limit the range insensitivity bias (Montibeller and von Winterfeldt, 2015). How to elicit swing weights online is discussed elsewhere (e.g., Aubert et al., 2020). Finally, the knowledge test was repeated (SI 3), and participants responded to a feedback questionnaire designed to measure autonomy, competence, relatedness, perceived learning, and experience (SI 5).
Second, we qualitatively evaluated the gamification orally and in writing to strengthen our understanding of the quantitative results. The written qualitative data were answers to open text boxes in the feedback questionnaire. The oral qualitative data were collected during a test user workshop with game design students from the Zurich University of the Arts, facilitated by the first author. After the BPNT had been introduced, participants signed an informed consent form and individually tested the gamified interface. Thereafter, they individually assessed it based on three tasks (SI 6). First, they wrote down two words that spontaneously came to mind to describe their experience. Second, they rated their overall experience on scales from 1 to 5 (a selection of questions from the experiment). After a break, we discussed their answers in an audio-recorded session.
Finally, we tested the gamified interface in a real case study (without a control). We assumed that the real-case respondents, whom the decision substantially affects, would engage in the survey more than participants from the experiment would. A Swiss rural municipality was deciding about the future of its wastewater management following a structured MCDA process in facilitated workshops (Beutler and Lienert, 2020). The local decision-makers wanted to include the citizens' opinions, since some alternatives directly affect them. For instance, novel decentralized wastewater technologies could be installed in their houses (Larsen et al., 2016). The local decision-makers welcomed the gamified information on objectives and the weight elicitation survey. We included a shortened version of the feedback questionnaire. The municipal administration invited the citizens to participate between August and September 2019. A lottery incentivized the respondents: vouchers for one of four stores, with a total value of CHF 1000, were split among the respondents who registered.

Sample definition and recruitment
We randomly assigned participants to the four treatments, balancing for gender. In addition, to limit the potential confounding effect of age due to the digital divide between generations, we focused on the age group of 18-30 years. This age group is (1) part of the target group, being citizens of voting age, (2) used to digital interfaces, and consequently (3) familiar with their design and usability (Cunliffe, 2000). Thus, its members are unlikely to drop out of the survey for lack of digital knowledge and can provide critical feedback, both of which are important for informing future improvements.
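The text does not specify how the gender-balanced random assignment was implemented. One common way is stratified allocation: within each gender stratum, participants are shuffled and then cycled through the four treatments of the 2 × 2 design, so treatment counts within a stratum differ by at most one. The sketch below shows that technique under this assumption; the function name and the example participant list are hypothetical.

```python
import itertools
import random
from collections import Counter

# The 2 x 2 between-subject design: information medium x attribute ranges.
TREATMENTS = list(itertools.product(["gamified", "nongamified"], ["original", "manipulated"]))


def assign_balanced(participants, seed=None):
    """Hypothetical stratified assignment, balancing treatments within gender.

    participants: list of (participant_id, gender) tuples.
    Returns a dict participant_id -> treatment tuple.
    """
    rng = random.Random(seed)
    by_gender = {}
    for pid, gender in participants:
        by_gender.setdefault(gender, []).append(pid)
    assignment = {}
    for ids in by_gender.values():
        rng.shuffle(ids)  # random order within the stratum
        offset = rng.randrange(len(TREATMENTS))  # spread leftovers across treatments
        for i, pid in enumerate(ids):
            assignment[pid] = TREATMENTS[(i + offset) % len(TREATMENTS)]
    return assignment


# Hypothetical pool: 20 female and 10 male participants.
people = [(i, "female" if i % 3 else "male") for i in range(30)]
plan = assign_balanced(people, seed=1)
print(Counter(plan.values()))
```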
We advertised the experiment on the internal pin board at Eawag, on the student virtual marketplace (marktplatz.uzhalumni.ch/de, retrieved on 24 January 2020) of the ETH Zurich and the University of Zurich (UZH), on the test person server of the UZH (www.psychologie.uzh.ch/probandenserver/public/, retrieved on 24 January 2020), on mini-jobs.ch, and with flyers in various ETH and UZH buildings; the ad was further shared on social media and by e-mail. The invitation was in German. Each participant received CHF 30.
We determined the sample size by an a priori statistical power analysis using G*Power 3.1.9.2 (Faul et al., 2009), with the results reported in Ryan (2017). We targeted 200 participants, equally distributed across our four treatments. We collected data from January to mid-March 2019. The project underwent ethical review by the Eawag directorate and was approved; it was classified as a minimal-risk project involving human subjects.
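The parameters of the power analysis (effect size, test family, significance level, power) are not given here. As an illustration only, the sketch below computes the per-group sample size for a two-sided two-sample comparison using the normal approximation n = 2((z_{1-α/2} + z_{1-β})/d)²; the effect size d = 0.5, α = 0.05, and power = 0.80 are assumed values, not the study's. Exact t-based tools such as G*Power return slightly larger numbers than this approximation.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample test (normal approximation).

    d: standardized effect size (Cohen's d), an assumed input.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


print(n_per_group(0.5))  # 63 with this approximation
```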

Measures
For each research question, we developed measures based on the literature (Table 2). Some measures are quantitative (performance tests), whereas others are subjective self-reported evaluations by each participant. For the sake of conciseness, we detail only the range sensitivity index below; the specifics of all measures are given in the SI.
To measure preference construction, we elicited swing weights for the 10 objectives (full procedure described in SI 4, with screenshots). First, participants rank-ordered hypothetical scenarios in which only one objective was at its best attribute level. Second, participants rated the ordered hypothetical scenarios relative to one another, assigning points between 100 (for the preferred scenario, ranked first) and 0 (for the worst-case scenario, in which all objectives were set to their worst level by default). We explored whether informing participants on objectives, with the attribute ranges made explicit, affected range sensitivity.
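In the swing method, the points assigned to each single-swing scenario are normalized into weights. The sketch below shows the standard normalization, assuming weights are expressed as percentages that sum to 100; the function name and the three-objective ratings are hypothetical and serve only to show the arithmetic.

```python
def swing_weights(points):
    """Normalize swing points (0-100 scale) into weights summing to 100.

    points: dict objective -> points for the scenario with only that objective
    at its best level. The all-worst reference scenario (0 points) is excluded.
    """
    if max(points.values()) != 100:
        raise ValueError("the top-ranked swing should receive 100 points")
    total = sum(points.values())
    return {obj: 100 * p / total for obj, p in points.items()}


# Hypothetical ratings for three objectives (the study used ten).
w = swing_weights({"Cost": 100, "Wat": 50, "Orgm": 50})
print(w)  # {'Cost': 50.0, 'Wat': 25.0, 'Orgm': 25.0}
```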
To investigate RQ2, we varied the attribute ranges. Previous experiments halved or doubled the ranges (von Nitzsch and Fischer, 1995). Gabrielli and von Winterfeldt (1978) called for a strong manipulation of attribute ranges, as they noted that they may not have manipulated the ranges sufficiently in their own experiments. We modified two of the original ranges of our case, choosing two attributes that (1) were expressed in the same readily understood unit, the percentage, and (2) could be varied in opposite directions. The objective of high removal of organic matter (Orgm) was made relatively unimportant by narrowing the range of its attribute: the original worst level of only 37% removal of organic matter was raised to 82%, so the original range of 56% was reduced to 11%. The objective of high flexibility (Flex) was made relatively important by widening the range of its attribute: the original worst level of 37% flexibility was lowered to 1%, so the original range of 51% increased to 87%.
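The range manipulation can be checked arithmetically. Since only the worst levels were changed, the best levels follow as worst level plus original range (93% for Orgm, 88% for Flex). The sketch below recomputes both ranges and, assuming a linear value function, the share of the wide range covered by the narrow one; the variable names are illustrative.

```python
# Levels in % taken from the text; best levels are derived as worst + original range.
ORIGINAL_WORST = {"Orgm": 37.0, "Flex": 37.0}
ORIGINAL_RANGE = {"Orgm": 56.0, "Flex": 51.0}
MANIPULATED_WORST = {"Orgm": 82.0, "Flex": 1.0}

ranges = {}
for obj in ORIGINAL_WORST:
    best = ORIGINAL_WORST[obj] + ORIGINAL_RANGE[obj]  # best level is unchanged
    original = best - ORIGINAL_WORST[obj]
    manipulated = best - MANIPULATED_WORST[obj]
    # With a linear value function, the narrow range covers this share of the wide one.
    share = min(original, manipulated) / max(original, manipulated)
    ranges[obj] = (original, manipulated, round(share, 3))

print(ranges)  # {'Orgm': (56.0, 11.0, 0.196), 'Flex': (51.0, 87.0, 0.586)}
```

For Orgm the manipulated range is the narrow one, whereas for Flex the original range is the narrow one, which is why the share is computed from the smaller and larger range rather than from a fixed treatment.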
We opted for a between-participant design for two reasons (Fischer, 1995). First, it prevents participants from directly comparing attribute ranges and adjusting weights accordingly. Second, it is closer to real-world conditions: when making a decision, there is usually only a single local context. Thus, we compared the weight distributions (mean, median, and standard deviation) of the two objectives with manipulated attribute ranges to the weight distributions with the original ranges. We calculated the range sensitivity index (RSI) developed by von Nitzsch and Weber (1993) across participants, with low cost (Cost), which is easy to understand, as the reference objective. We also calculated the RSI with a second reference objective, low water consumption (Wat), whose average weight did not differ statistically between the two range conditions (8.5 (SD = 3.1) and 8.2 (SD = 3.3), respectively; based on the results of the experiment). The RSI expresses the percentage of the range difference that participants took into account. It is calculated from $M$, the expected change of weight resulting from the normative analysis of attribute range effects (in our case with a linear value function), and $M_{\mathrm{eli}}$, the elicited or empirical change of weight (von Nitzsch and Weber, 1993; Fischer, 1995; Eisenführ et al., 2010), as in Equation (1):

$$\mathrm{RSI} = \frac{1 - M_{\mathrm{eli}}}{1 - M} \qquad (1)$$

We averaged the weights across participants and scaled the weights of Orgm (or Flex) and Wat (or Cost) so that (1) the ratio between Orgm (or Flex) and Wat (or Cost) stayed constant, and (2) the sum of the weights of Orgm (or Flex) and Wat (or Cost) equaled 1. Thereafter, we used the largest range as the global value model and calculated $M$, the expected adjustment factor of the weight for the narrow-range conditions, as in Equation (2):

$$M = v_r(B_r) \qquad (2)$$

where $v_r$ represents the linear value function for Orgm (or Flex) on the global attribute range, and $B_r$ the local attribute range. We calculated $M_{\mathrm{eli}}$, the elicited (or empirical) adjustment factor of the weight, as in Equation (3):

$$M_{\mathrm{eli}} = \frac{w^{\mathrm{manip}}_{\mathrm{local}} \,/\, w^{\mathrm{ref}}_{\mathrm{local}}}{w^{\mathrm{manip}}_{\mathrm{global}} \,/\, w^{\mathrm{ref}}_{\mathrm{global}}} \qquad (3)$$

with the scaled averages of the elicited weights $w^{\mathrm{manip}}_{\mathrm{local}}$ for the objective Orgm (or Flex) in the narrower attribute range treatment, $w^{\mathrm{ref}}_{\mathrm{local}}$ for the objective Wat (or Cost) in the narrower attribute range treatment, $w^{\mathrm{manip}}_{\mathrm{global}}$ for the objective Orgm (or Flex) in the larger attribute range treatment, and $w^{\mathrm{ref}}_{\mathrm{global}}$ for the objective Wat (or Cost) in the larger attribute range treatment. According to the range sensitivity principle, the RSI should be 1 (elicited weights adjusted normatively). If the weights are range insensitive, the RSI equals 0.
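The three equations can be chained into a short computation. The sketch below is a minimal implementation assuming a linear value function, so that the expected factor M reduces to the ratio of the narrow to the wide range. The mean weights in the example are hypothetical placeholders, not the study's results. Note that the pairwise scaling to a sum of 1 does not change a ratio, so it has no effect on M_eli; it is kept only to mirror the described procedure.

```python
def scale_pair(w_manip, w_ref):
    """Scale two mean weights so they sum to 1 while keeping their ratio."""
    total = w_manip + w_ref
    return w_manip / total, w_ref / total


def m_expected(narrow_range, wide_range):
    """Equation (2) for a linear value function defined on the wide (global) range."""
    return narrow_range / wide_range


def m_elicited(manip_local, ref_local, manip_global, ref_global):
    """Equation (3): weight ratio in the narrow-range treatment divided by the
    weight ratio in the wide-range treatment (inputs are mean weights)."""
    ml, rl = scale_pair(manip_local, ref_local)
    mg, rg = scale_pair(manip_global, ref_global)
    return (ml / rl) / (mg / rg)


def rsi(m, m_eli):
    """Equation (1): 1 = normative adjustment, 0 = no adjustment, < 0 = reversed."""
    return (1 - m_eli) / (1 - m)


# Orgm: the narrow range (11 %) is the manipulated one, the wide range (56 %) the original.
# For Flex, the narrow range (51 %) would be the original one and the wide range (87 %) the manipulated one.
m = m_expected(11, 56)
# Hypothetical mean weights (NOT the study's results), with Wat as reference objective.
m_eli = m_elicited(manip_local=8.0, ref_local=8.2, manip_global=10.0, ref_global=8.5)
print(round(m, 3), round(m_eli, 3), round(rsi(m, m_eli), 3))  # 0.196 0.829 0.212
```

An elicited factor above 1 makes the numerator of Equation (1) negative, which is how the negative RSIs reported in the results arise.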

Data analysis
We analyzed the data using R (R Development Core Team, 2020). Given the sample size (>50), we evaluated differences between groups using Student's t-tests. Effect sizes were calculated with the effsize package. Further details are given in Section 3. We removed 20 participants from the entire analysis based on their responses to a filter question (SI 5d), which asked whether participants had encountered a nonplayer character. Participants receiving the gamified information should have answered yes, and the others no. Six participants receiving the gamified treatment answered that they had not seen a character, and fourteen participants receiving the control answered that they had seen one. We discarded all 20 because we doubted the reliability of their answers.
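For readers who want to reproduce the kind of group comparison described above outside R, the sketch below computes Student's two-sample t statistic (equal variances assumed) and Cohen's d with the pooled standard deviation, using only the Python standard library. It illustrates the common textbook formulas; it is not a re-implementation of the effsize package, and the example data are hypothetical.

```python
import math
from statistics import mean, variance


def student_t_and_d(x, y):
    """Student's two-sample t (pooled variance), degrees of freedom, and Cohen's d."""
    nx, ny = len(x), len(y)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    pooled_sd = math.sqrt(pooled_var)
    diff = mean(x) - mean(y)
    t = diff / (pooled_sd * math.sqrt(1 / nx + 1 / ny))
    d = diff / pooled_sd
    return t, nx + ny - 2, d


# Hypothetical knowledge-score gains for two small groups.
t, df, d = student_t_and_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(t, 3), df, round(d, 3))
```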

Participants of the experiment
In total, 204 people participated in our experiment. We analyzed data from 184 participants, of whom 132 were female and 51 male; one person did not disclose their gender. The mean age of participants was 23.5 years (SD = 3.16). The majority (75.54%) had a university degree. Most participants had little or no previous knowledge of wastewater: 97.28% answered that they knew "nothing at all," "rather little," or "a little" about wastewater before the experiment. The sample was split into two groups, gamified (N = 95) and nongamified (N = 89). The gender ratio was equal in both groups (see SI 7 for a detailed sample description).

Factual learning (RQ1)
Overall, the participants' delta knowledge score (ΔKS, the difference between the final and initial scores) was positive (Table 3): they learnt about the objectives after receiving information. The mean initial and final scores differed statistically significantly (paired t-test; t(183) = -23.08, p < 0.001, d = -1.97). ΔKS did not differ significantly between the nongamified and gamified treatments, suggesting that a similar amount of factual learning occurred in both treatments (two-sample t-test; t(182) = -0.74, p = 0.462, d = -0.11).
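For Student's two-sample t-test, Cohen's d with the pooled standard deviation can be recovered from the t statistic and the group sizes as d = t·√(1/n₁ + 1/n₂). The short check below applies this identity to the reported two-sample comparison (t(182) = -0.74, with 95 gamified and 89 nongamified participants) and recovers the reported effect size; it is a consistency check, not part of the original analysis.

```python
import math


def d_from_t(t, n1, n2):
    """Cohen's d (pooled SD) implied by Student's two-sample t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Reported: t(182) = -0.74 with group sizes 95 and 89.
print(round(d_from_t(-0.74, 95, 89), 2))  # -0.11, matching the reported d
```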
To validate the knowledge test, we administered it twice, a median of six days apart, to 52 additional participants who did not receive information on objectives. This test showed that (1) overall, without information on objectives, there were no statistically significant differences between the initial and final scores; (2) within participants, the initial and final scores correlated, indicating good test-retest reliability (r = 0.75); and (3) the ΔKS of our informed participants, those taking part in the actual experiment, were statistically significantly higher than the ΔKS of uninformed participants (see SI 8a for the results of the statistical test). Thus, we are confident that our interventions facilitated factual learning.
There was also no statistically significant difference between the treatments in the self-reported factual learning ratings on 5-point Likert scales (SI 5a). Participants reported having "rather learnt" about wastewater management (gamified: mean μ = 3.88, SD = 0.68; nongamified: μ = 3.93, SD = 0.77). In addition, the 20 citizens from the Swiss case study (Beutler and Lienert, 2020) obtained final knowledge scores of the same order of magnitude (μ = 6.4, SD = 2.1, median Mdn = 7.0).

Table 3 Knowledge scores (KS) for the pretest before receiving information on objectives (initial) and the posttest after receiving information (final). KS varied between 0 and 10. ΔKS = final KS - initial KS; thus, -10 ≤ ΔKS ≤ 10. μ = mean; SD = standard deviation; Mdn = median.

Table 4 Mean weights (μ), standard deviations (SD), and median weights (Mdn) given to the objectives of high removal of organic matter (Orgm) and high flexibility (Flex), for the original (Origin.) and manipulated (Manip.) ranges of attribute.

Preference construction (RQ2)
For RQ2, we discarded another 10 answers (N = 174). These 10 participants entered their identification number incorrectly in the swing weight elicitation survey, which prevented us from linking their weights to the rest of the data. We compared the effect of the range manipulation on the weights using t-tests for independent samples. Because the range of the objective "high removal of organic matter" (Orgm) was decreased in the manipulation, we expected Orgm to receive a lower weight in the manipulated treatment than in the control. This expectation was confirmed with statistical significance for the total sample, albeit with a small effect size (Table 4; SI 8b). Because the range of the objective "high flexibility" (Flex) was increased in the manipulation, Flex should receive a higher weight in the manipulated treatment than in the control. Over the total sample, the weight given to Flex in the manipulated treatment was higher than in the control; however, the difference was not statistically significant (Table 4; SI 8b). Overall, the range manipulation seemed to have a small effect on the weights, but this effect was mostly not significant, in one case even went in a counterintuitive direction (mean weight for Flex in the nongamified group), and did not systematically favor gamification, despite our design making the attribute ranges explicit.
The range sensitivity indices (RSIs) varied from negative values to 0.63 at best (Table 5). The RSI relates the elicited weights to the weights expected from the attribute ranges (see Methods, and von Nitzsch and Weber, 1993). Negative RSIs, not reported so far in the literature, imply that the elicited weights lead to M_eli > 1. This means that the ratio of the elicited weights for the narrow attribute interval was higher than the ratio of the elicited weights for the larger attribute interval. We observed negative RSIs for high flexibility (Flex) in the nongamified group, consistent with the mean weight results (Table 4). This indicates that the weights were adjusted to the range in the opposite direction from that suggested by the range sensitivity principle. The highest positive RSI was observed for Flex in the gamified group. In contrast, the nongamified group was more range sensitive than the gamified group for the objective "high removal of organic matter" (Orgm). In sum, the effects of gamification on range sensitivity were equivocal: our gamification, which emphasized the attribute ranges when informing participants about objectives, did not systematically lead to the normatively expected weight adjustment.
Table 5. Expected (M) and elicited (M_eli) weights (mean (standard deviation)), and range sensitivity index (RSI) for the objectives "high removal of organic matter" (Orgm) and "high flexibility" (Flex), in reference to the objectives "low water consumption" (Wat) and "low cost" (Cost).
When testing the perceived range sensitivity with feedback questions about the percentage of improvement between the best and worst cases (see Section 2.3.2), participants who received the nongamified treatment achieved a significantly better score (μ = 2.11, SD = 0.93, Mdn = 2.00) than those who received the gamified treatment (μ = 1.52, SD = 1.04, Mdn = 1.00; t(172) = 3.90, p < 0.001, d = 0.59).
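The Methods section that defines the RSI is not part of this excerpt, so the sketch below assumes one formulation consistent with the statements above: for a target objective (e.g., Orgm) and a reference objective (e.g., Wat or Cost), M is the narrow-to-wide ratio of the target's attribute range (the weight-ratio change expected under linear value functions), M_eli is the elicited weight ratio for the narrow range divided by that for the wide range, and RSI = (1 - M_eli) / (1 - M). Under this assumed formulation, a negative RSI corresponds exactly to M_eli > 1. Function names are hypothetical.

```python
def weight_ratio(w_target, w_reference):
    """Weight of the target objective (e.g., Orgm) relative to a reference objective (e.g., Wat)."""
    if w_reference <= 0:
        raise ValueError("reference weight must be positive")
    return w_target / w_reference


def range_sensitivity_index(range_narrow, range_wide, ratio_narrow, ratio_wide):
    """ASSUMED formulation: RSI = (1 - M_eli) / (1 - M).

    M     = range_narrow / range_wide   (expected, linear value functions)
    M_eli = ratio_narrow / ratio_wide   (elicited weight ratios)
    RSI = 1: weights follow the ranges fully; RSI = 0: weights ignore the ranges;
    RSI < 0: M_eli > 1, i.e., the narrow range received relatively MORE weight.
    """
    if not 0 < range_narrow < range_wide:
        raise ValueError("need 0 < range_narrow < range_wide")
    m_expected = range_narrow / range_wide
    m_elicited = ratio_narrow / ratio_wide
    return (1 - m_elicited) / (1 - m_expected)


# Hypothetical example: the range is halved, but the weight ratio drops only from 1.0 to 0.8.
r_narrow = weight_ratio(0.24, 0.30)
r_wide = weight_ratio(0.30, 0.30)
print(range_sensitivity_index(50, 100, r_narrow, r_wide))
```

In this hypothetical example the printed RSI is about 0.4: the weights moved in the expected direction, but only partway toward the change implied by the ranges.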
However, the self-reported range sensitivity on the 5-point Likert scales did not differ between the treatments: on average, participants reported being somewhat (3) to rather aware (4) of the best and worst levels (gamified: μ = 3.41, SD = 0.69, Mdn = 3.50; nongamified: μ = 3.48, SD = 0.61, Mdn = 3.50; SI 8b). Most participants reported learning about their own preferences in both treatments (gamified: μ = 3.51, SD = 0.74, Mdn = 4; nongamified: μ = 3.44, SD = 0.67, Mdn = 3; SI 8b). They also reported that those preferences reflect their opinions (gamified: μ = 4.35, SD = 0.66, Mdn = 4; nongamified: μ = 4.28, SD = 0.64, Mdn = 4; SI 8b).

Experience (RQ3)
We measured how users experienced the interface for informing about wastewater objectives with eight items (see SI 5a). We then compared this perceived experience between the gamified and nongamified treatments using independent-samples t-tests, adjusting the significance level with Bonferroni corrections for multiple testing. The gamified group experienced the survey as more entertaining than the nongamified group (Table 6). No differences were found between the groups for any of the other single items (Table 6).
To our knowledge, no properly validated scales exist to measure the constructs of basic psychological needs when studying the effects and mechanisms of gamification. Therefore, as is common in game and gamification studies, we adapted the scales that we found in the literature (Sheldon and Filak, 2008;Przybylski et al., 2010;Tamborini et al., 2010;Peng et al., 2012;Ryan and Deci, 2017a;Xi and Hamari, 2019). We did not attempt to create a new scale, which would require testing for convergent and discriminant validity. The following results can therefore be considered a prestudy. Our scale to measure competence showed acceptable internal reliability (α = 0.74), comparable to the levels reported in the literature (e.g., Sheldon and Filak, 2008;Przybylski et al., 2010;Ryan, 2017). This indicated that the items measured the construct consistently. The internal reliability of our scale for autonomy was also acceptable (α = 0.70). However, the internal reliability of our scale for relatedness was not acceptable (α = 0.19), indicating that its items did not measure a single construct. We thus refrained from aggregating its items into a single scale and report comparisons at the single-item level. This result suggests that two different subdimensions of relatedness may not be captured well on a single scale (discussed in Section 4.3).
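The internal reliabilities above are Cronbach's alpha values, and the item comparisons used a Bonferroni-adjusted significance level. The sketch below uses the standard alpha formula with sample variances; the family-wise level of 0.05 and the division by eight tests are illustrative assumptions (the section mentions eight experience items but not the family-wise level), and the function names and example answers are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items; items[j][i] is respondent i's score on item j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed scale)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("each item needs the same number (>= 2) of respondent scores")
    totals = [sum(scores) for scores in zip(*items)]  # summed scale score per respondent
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("summed scores have zero variance; alpha is undefined")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def bonferroni_level(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


# Hypothetical 5-point Likert answers of four respondents to two items (not study data).
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
print(bonferroni_level(0.05, 8))
```

A low alpha, such as the 0.19 found for relatedness, arises when item variances are large relative to the variance of the summed scale, that is, when the items do not co-vary.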
There were no statistically significant differences between the two treatments for any of the items measuring experienced competence, autonomy, or relatedness (Table 7). For all, answers were neutral to positive (3 and above).
Autonomy and competence were each positively correlated with entertainment (N = 184; r = 0.40 in both cases; SI 8c). These correlations were of the same order of magnitude for the gamified group (r = 0.38 and r = 0.45 for autonomy and competence, respectively) and the nongamified group (r = 0.41 and r = 0.36). The tendencies for competence to correlate more closely with entertainment in the gamified group, and for autonomy to correlate more closely with entertainment in the nongamified group, were not significant (SI 8c). Nevertheless, the needs for autonomy and competence seemed to have been satisfied, which potentially enhanced the user experience. Finally, we checked the relatedness with the nonplayer character (NPC) in the gamified group (SI 8d). Participants reported rather low relatedness to the NPC, with item scores ranging from 2.16 for the creation of a virtual relationship (worst 1, best 5) to 3.26 for how encouraging the NPC was.
Table 7. Mean (M), standard deviation (SD), and median (Mdn) of the constructs of the basic psychological needs theory for all participants (total), for participants in the gamified treatment, and for participants in the nongamified treatment.
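The correlations above are Pearson coefficients, and comparing a correlation between the two independent treatment groups is commonly done with Fisher's r-to-z test. The excerpt does not state which test was used (see SI 8c), so the sketch below is one standard option, not the authors' procedure; the group sizes of 92 each are a hypothetical even split of N = 184.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired lists with at least three observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between correlations from two independent groups.

    Returns (z, two-sided p) based on the normal approximation.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than three observations")
    z_diff = math.atanh(r1) - math.atanh(r2)  # Fisher transform stabilizes the variance
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = z_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value of a standard normal z
    return z, p


# Competence-entertainment correlations reported above (0.45 gamified, 0.36 nongamified),
# with HYPOTHETICAL group sizes of 92 each (per-group n is not given in this excerpt).
z, p = compare_independent_correlations(0.45, 92, 0.36, 92)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these assumed group sizes, z is roughly 0.72 and p roughly 0.47, which illustrates why a difference of about 0.1 between correlations of this size is far from significant.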

Workshop, Swiss case study, and qualitative results
This section focuses on two additional data sources: the workshop with game design students and the Swiss case study. On 29 April 2019, 11 game design students and two lecturers took part in the workshop. Feedback on the general experience ranged from positive to negative (results of the first workshop task, SI 9a). The gamified information was perceived as informative, interesting, and a source of information. However, the amount and level of information were perceived differently: some found it too complex, overwhelming, and demotivating (SI 9a), whereas others thought it too simplified and not detailed enough. Participants discussed the game elements critically, noting both strengths and weaknesses. For instance, one student warned that it was possible to click through without reading the text. Three participants perceived the artwork positively, although one student reported mismatches between the styles of the icons and the backgrounds. SI 9b presents comments about the nonplayer character Mr. Akles, who was intended to create relatedness. Many participants reported being annoyed by the automatic voice-over: they found that it sounded awkward and unnatural even though a professional actor had read the text. Four of the game design students described the gamification as bland (not spicy at all), boring, and monotonous, while positive comments included friendly and cute (SI 9a). Four participants asked about the target audience, doubting that the gamification was suitable for the citizens of a rural area. This last comment resonated with the qualitative feedback collected in the main experiment and the case study (SI 9b, SI 10b). The game design students suggested ways to improve the gamification (Sections 4.3 and 4.4).
In the Swiss municipality, 20 citizens completed both the weight elicitation and the shortened feedback surveys. Three of the respondents were decision-makers who were involved in the complete MCDA process and had participated in decision-making workshops (Beutler and Lienert, 2020). Perceptions of the gamified information on objectives were mixed, ranging from positive to negative (SI 10c). All items about the overall experience scored from neutral (3.1) to positive (4.5) (SI 8f).

Discussion
This paper focuses on using gamification to inform laypeople about the objectives relevant to wastewater management decisions in Switzerland. We tested whether a gamified interface highlighting the ranges enabled factual learning and range-sensitive construction of preferences, and whether it provided a positive user experience. We compared the interface with a nongamified control treatment in an experiment and used the gamified version in a real decision case. In the following, we reflect on our three research questions (Sections 4.1-4.3). Since our study revealed the limits of our proposed gamification, we then suggest ways to address the difficulties encountered (Section 4.4). Finally, our experiment opened up further research questions, for which we provide preliminary guidance (Section 4.5).

Factual learning about objectives (RQ1)
Yes, the proposed gamified interface to inform on objectives facilitated factual learning, as did the nongamified information.
Both the gamified interface and the identical text in the nongamified control treatment facilitated factual learning. Knowledge scores showed that participants significantly increased their knowledge about the wastewater objectives, by roughly 25% (Table 3). This was also supported by the qualitative feedback (SI 10a); for instance, the game design students found the gamified interface informative (SI 9a). Thus, the gamified interface appears suited to teaching laypeople about objectives relevant to wastewater management. However, contrary to our expectations, it led to just as much factual learning as the control.
This lack of difference in learning between the treatments dovetails with the fairly equivocal findings regarding the usefulness of gamification for facilitating learning. Some reviews report positive results (e.g., Kasurinen and Knutas, 2018;Subhash and Cudney, 2018), whereas others are more mixed. For instance, a review paper on social learning with collaborative serious games reports evidence for cognitive and normative learning, but finds that relational learning has not been investigated sufficiently to reach conclusive results (den Haan and van der Voort, 2018). One theoretical paper is provocatively titled "why gamification fails in education (…)" (van Roy and Zaman, 2017). Dichev and Dicheva (2017) also report that 64% of 41 reviewed papers are inconclusive, another 10% report negative results, and just 26% report results supporting the usefulness of gamification for learning in higher education. This suggests that gamification can backfire, for instance, if participants are caught up in the game and forget the "serious" part, or if they are overloaded by learning how to play and are thereby unable to process the new information. Given this literature, our results are plausible.
In sum, it is somewhat disappointing that the gamified information on objectives was only as good as the control for factual learning, especially because the results stem from a rigorously designed experiment. On the positive side, performing as well as the control suggests that the gamified interface succeeded in conveying the relevant knowledge while also being experienced as more entertaining.
In addition, the lack of differences in our study could at least partially be due to our sample composition and testing conditions. Despite recruiting participants broadly (including advertising on a platform for small jobs, on social media, through broad word-of-mouth, and by e-mail to apprentices), 75.5% of the participants had a university degree. Most participants were used to reading and processing long and complex texts. Additionally, the formal experimental situation in cubicles at the university and the monetary incentive may have increased participants' motivation to perform well and their sense of obligation to follow the instructions. Further studies should investigate the extent to which individual characteristics and experimental conditions influence the perception of gamification and its outcomes.
The Swiss case study provided preliminary insights into this sampling issue. Only three of the 20 respondents in the case study had a higher education; the remaining 17 respondents belonged to our target group, laypeople. The final knowledge score of these 17 respondents did not differ strongly from that of the respondents with higher education (6.4 and 7.0, respectively) and was about the same as our experimental results (…) observations. Proper experiments with larger samples of the public that control for potential moderating factors, such as education level and learner type (Buckley and Doyle, 2017), would enable a better understanding of the effects of gamification (Dichev and Dicheva, 2017;Nacke and Deterding, 2017;Landers et al., 2018).
Finally, this study focused on a gamified online interface to inform about objectives of wastewater management. The proposed gamification can be improved to increase the feelings of autonomy, competence, and relatedness (see Sections 4.3 and 4.4). Future research will need to test whether changing aspects of the gamification would increase factual learning. Alternatively, other means to enhance factual learning could be investigated. This endeavor would require reviewing the literature on adult learning (Merriam and Baumgartner, 2020) and/or multimedia learning (Niegemann and Heidig, 2012). To mention one example, instructional videos might be a means to inform laypeople about the objectives (e.g., Norman, 2017;Espino et al., 2021).

Preference construction and range insensitivity bias (RQ2)
Informing participants about objectives while making attribute ranges explicit facilitated preference construction, which, however, was insufficiently range-sensitive.
Our results for the gamified treatment suggest that participants constructed preferences and that the elicited weights reflected their opinion (Section 3.3). Additionally, participants reported considering the best and worst attribute levels. The dedicated test questions partially supported this self-reporting: the participants' perceived range sensitivity was medium on average (about two of four points). Thus, the gamified interface facilitated range-based preference construction to some degree.
However, despite designing our gamified interface to emphasize the attribute ranges, our gamification failed to create stronger range sensitivity than the nongamified information on objectives. One reason for the lack of evidence for higher range sensitivity in the gamified treatment than in the control could be that range sensitivity relates to factual learning, because factual knowledge about the objectives is a prerequisite for considering their ranges. Given that we did not find differences in factual learning (RQ1), the lack of differences in range sensitivity (RQ2) might not be surprising. Alternatively, the way we made the ranges explicit in the dedicated chapter on the worst and best levels (SI 1) could have been suboptimal. In hindsight, we are uncertain whether our texts about the consequences of the extreme attribute levels were tangible enough to make sense to laypeople. For future research, we suggest explicitly testing different types of textual information, as well as other information modes, for conveying ranges to laypeople. This could include a small training task at the start so that respondents can experience the effects of range insensitivity (Anderson and Clemen, 2013). Moreover, our preliminary search for correlations between knowledge scores and range perception was not conclusive (SI 8b) and requires further investigation. For instance, RSIs could be calculated for individual participants instead of only between groups of participants.
The range sensitivity index (RSI) relates the elicited weights to those expected from the attribute ranges (von Nitzsch and Weber, 1993). The RSIs were rather low in our experiment (at best 0.63; Table 5), but in line with those found in the literature, which were at best 0.62 (Monat, 2009). Negative RSIs, which do not comply with the range sensitivity principle, show that an objective received a higher weight with a narrow attribute range than with a wide one. The weight-change model (Wedell, 1998) could explain this. This model suggests that in a pairwise comparison, when the range of attribute levels is extended on Dimension A, the relative weight given to differences along Dimension A decreases, because the weights for both Dimensions A and B are adjusted relative to the larger ratio of difference in the attribute levels (see the numerical example in Wedell, 1998). Wedell (1998) tested the weight-change model in a series of three experiments with 159, 98, and 143 participants. However, those results did not support the weight-change model, whereas ours might. If future studies also report negative RSIs, research should test this model in detail.
Low RSIs, indicating insufficient weight adjustments to range manipulation, are common (Monat, 2009). Possibly, the linear single-attribute value functions we used to calculate the RSIs were inadequate. RSIs depend on the shape of the value function (Weber and Borcherding, 1993), which is congruent with the value-shift model (Wedell, 1998;Verlegh et al., 2002). This model suggests that participants rescale the value functions to the changed attribute ranges instead of adapting the weights. Accordingly, van Ittersum and Pennings (2012) inferred local weights based on global nonlinear value functions. This seems to be a promising avenue, with or without gamification. Future research should continue exploring the reasons for, and potential remedies to, the range insensitivity bias, which is fundamentally important in MCDA and for which we see some urgency in finding effective debiasing methods (Montibeller and von Winterfeldt, 2015).
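To illustrate why RSIs depend on the shape of the value function, the sketch below computes the weight ratio one would expect after a range change when values are read from a global nonlinear value function instead of a linear one. The exponential form and the parameter c are our illustrative assumptions, not necessarily the functions used by van Ittersum and Pennings (2012); function names are hypothetical.

```python
import math


def exponential_value(x, worst, best, c):
    """Global single-attribute value function scaled so v(worst) = 0 and v(best) = 1.

    c = 0: linear; c > 0: concave (first improvements count most); c < 0: convex.
    Works for increasing (best > worst) and decreasing (best < worst) attributes.
    """
    if best == worst:
        raise ValueError("best and worst levels must differ")
    s = (x - worst) / (best - worst)  # position on the global range, 0..1
    if c == 0:
        return s
    return (1 - math.exp(-c * s)) / (1 - math.exp(-c))


def expected_weight_ratio(narrow, wide, worst, best, c):
    """Expected M: value span of the narrow range divided by that of the wide range.

    narrow and wide are (worst_level, best_level) pairs on the same attribute.
    """
    def span(rng):
        return exponential_value(rng[1], worst, best, c) - exponential_value(rng[0], worst, best, c)

    return span(narrow) / span(wide)


# Hypothetical attribute with a global scale from 0 (worst) to 100 (best).
for c in (0, 2, -2):
    print(c, round(expected_weight_ratio((0, 50), (0, 100), 0, 100, c), 3))
```

For halving the range from (0, 100) to (0, 50), the expected ratio is 0.5 under the linear function but roughly 0.73 under the concave (c = 2) and 0.27 under the convex (c = -2) function, so an RSI computed against a linear benchmark can over- or understate range sensitivity.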
Concerns about range insensitivity relate to what the weights represent. In particular, the multiattribute value theory literature discusses global versus local weights. Recent articles suggest relevancy analysis (Marttunen et al., 2019) or the swing weight matrix method (Ewing et al., 2006) to overcome the range insensitivity bias. These approaches view weights as a combination of global importance, which is based on people's values, and local importance, which is based on the attribute range in the specific application case. Alternatively, Monat (2009) suggests first eliciting the respondents' own global scale, that is, the best and worst attribute levels based on experience or imagination, and then eliciting the weights using direct rating. Thereafter, the decision analyst can calculate the weights to fit the local scale. Eliciting global ranges from participants in trade-offs was tested in the 1990s (Baron, 1997). However, it may create an anchoring bias (Montibeller and von Winterfeldt, 2015). Further experiments could compare these procedures with the standard trade-off and swing weight elicitation methods.
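The last step of the procedure suggested by Monat (2009), in which the analyst recalculates weights elicited on the respondents' global scales to fit the local scale, can be sketched as follows. This assumes linear single-attribute value functions and local ranges lying within the global ranges; Monat's exact procedure may differ, and the function and objective names are hypothetical.

```python
def rescale_to_local(global_weights, global_ranges, local_ranges):
    """Turn weights elicited on respondents' global scales into weights for local ranges.

    ASSUMPTION: linear value functions, so each objective's weight shrinks in
    proportion to the share of its global range covered by the local range.
    Ranges are (worst, best) pairs; the result is renormalized to sum to 1.
    """
    raw = {}
    for name, weight in global_weights.items():
        g_worst, g_best = global_ranges[name]
        l_worst, l_best = local_ranges[name]
        if g_best == g_worst:
            raise ValueError(f"global range of {name!r} is empty")
        share = abs(l_best - l_worst) / abs(g_best - g_worst)
        raw[name] = weight * share
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all local ranges are empty")
    return {name: value / total for name, value in raw.items()}


# Hypothetical: equal global weights, but the local case covers only half of the
# global range for water consumption.
local = rescale_to_local(
    {"water": 0.5, "cost": 0.5},
    {"water": (100, 0), "cost": (10, 2)},
    {"water": (50, 0), "cost": (10, 2)},
)
print(local)
```

Here the water objective drops from one half to one third of the total weight, because the local decision spans only half of the respondent's global range for that attribute.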
We can learn from experience gathered in tackling other biases. For instance, the splitting bias persisted despite informing participants about the bias, raising awareness, and educating them about the elicitation process and its pitfalls (Weber et al., 1988;Pöyhönen and Hämäläinen, 2001;Jacobi and Hobbs, 2007). It might be worth developing debiasing approaches in case the range insensitivity bias also persists despite information that points to avoiding it (Jacobi and Hobbs, 2007). Debiasing approaches could include (1) multiple elicitations from the same person to reveal possible conflicts in answers, asking participants to correct their obviously biased preferences, or (2) mathematical correction of the biases, for example, following up on van Ittersum and Pennings (2012). In general, OR researchers could reflect on whether gamification could help overcome known biases beyond the already mentioned range insensitivity and anchoring biases. Potentially interesting biases include groupthink and the status quo bias. Furthermore, it is important to ensure that gamification does not introduce or strengthen biases. For instance, a poorly developed storyline or graphic design could make one objective more salient than others, thereby potentially triggering the availability bias (Montibeller and von Winterfeldt, 2015).
A potential limitation of our study is the weight elicitation procedure, which was supported by an online interface but conducted without an experienced decision analyst. We used an improved interface compared with previous attempts (Aubert et al., 2020). Its evaluation was positive, but we suggested further improvements (Aubert et al., [accepted]). Online weight elicitation presents both advantages and challenges (Mustajoki et al., 2004;Insua and French, 2010;Aubert et al., 2020). Specifically, online participation, aided by a decision support system but without an experienced analyst, has the following advantages: space and time constraints no longer apply.
Once the procedure has been set up by an experienced analyst, less experienced operators can easily carry it out, adapt it, or replicate it in different contexts. Because it is accessible to almost everybody, affected laypeople can be involved, potentially broadening the range of opinions and helping to identify conflicts. Online participation in such surveys enables many individuals to be involved in decision-making and answers a call for increased participation in practice (e.g., French and Argyris, 2018;Johnson et al., 2018). However, challenges remain (Marttunen and Hämäläinen, 2008). To date, the digital divide remains a reality. Cognitively demanding and long surveys can be irksome and demotivating. It is unclear whether participants really understand the instructions or the process as a whole (Mustajoki et al., 2004;Introne and Iandoli, 2014), which could increase the risk of people using heuristics and exhibiting biases. Good understanding of the instructions and the process is a prerequisite for high-quality weight elicitation, whether facilitated by an analyst or not. To compensate for the absence of facilitation and the risk of misunderstanding the process, some researchers favor online interactive group decision and negotiation (Insua and French, 2010). These approaches aim to deepen the understanding of selected representative participants (Bayley and French, 2008;Benyoucef and Verrons, 2008;Vieira et al., 2020).
The MCDA community is exploring how to contribute to e-democracy (Insua and French, 2010). Several paths are possible: one can focus on the democratic process or on the tools and instruments used in the process (Lavin and Rios Insua, 2010). Researchers can also focus on increasing the understanding of involved participants or on increasing the number of participants. Online preference elicitation tools designed for independent use can increase the number of participants. Few studies are available so far (Mustajoki et al., 2004;Insua and French, 2010;Gregory et al., 2016;Lienert et al., 2016;Aubert et al., 2020). Future research should specifically control for process understanding, biases, and the use of heuristics, and systematically evaluate the online interfaces. As Phillips (2006, p.188) wrote: "Is an unsound tool better than unaided judgement? One wonders."

Gamification to improve user experience (RQ3)
The gamified information on objectives was more entertaining.
Comparing items of user experience, the only statistically significant difference between the gamified and nongamified treatments was that the gamified information on objectives was more entertaining than the control (Section 3.4, SI 8c). In all other respects, both treatments received neutral to positive assessments. There are two potential reasons for this lack of a significant effect in the overall experience between treatments. One reason could be at the manipulation level: our items were reliable, but there was no difference to measure, because the manipulation was too weak. Another reason could be at the measurement level: the manipulation was fine, but the items did not measure reliably enough what we manipulated. Items for the overall experience came from the literature (e.g., Harms et al., 2014). We tend toward the first explanation at the manipulation level, because the differences in the constructs meant to influence the overall experience (the three basic psychological needs) were themselves minor. We discuss this hereafter.
There were no statistically significant differences between the gamified and nongamified treatments for the constructs of autonomy and competence or for the relatedness items (Section 3.4, Table 7). The constructs of autonomy and competence showed acceptable internal reliability and received neutral to positive assessments. However, the internal reliability (Cronbach's alpha) of the relatedness scale suggested issues in our selection of items. We discuss these results in more detail below.
SDT, and in particular the basic psychological needs theory (BPNT), is one of the most commonly used theories to explain the effects of gamification (Seaborn and Fels, 2015;Nacke and Deterding, 2017;Ryan and Deci, 2017a;Ryan and Rigby, 2019). Several studies have reported positive effects of gamification on the three constructs of the BPNT (Gagné, 2003;Wee and Choong, 2019;Xi and Hamari, 2019). To the best of our knowledge, no scales to measure the constructs of the BPNT for gamification have yet been properly validated. Therefore, we adapted the scales that we found in the literature, as those authors did themselves (Sheldon and Filak, 2008;Przybylski et al., 2010;Tamborini et al., 2010;Peng et al., 2012;Ryan and Deci, 2017a;Xi and Hamari, 2019). In practice, one should formulate the questions in the same way as the original ones, while ensuring that they make sense in the new context. Given our results (particularly for relatedness, see below), our adjustments might have been too strong, and we recommend reconsidering the original items (Gagné, 2003) or newer, partially validated scales for gamification (Högberg et al., 2019).
We found that competence and autonomy were positively correlated with entertainment in both conditions. This suggests that feeling competent and volitional is associated with a more positive perception of the interface. However, perceived competence, that is, how successful and effective one feels, was not significantly higher in the gamified condition than in the nongamified one, and its relation to entertainment was similar in both conditions. We manipulated competence within the gamified interface through small tasks with clear goals, such as matching tasks and progress. Apparently, this was not enough to significantly increase perceived competence or to strengthen the correlation between competence and entertainment.
Autonomy, that is, deciding voluntarily for oneself, also showed no statistically significant differences between the gamified and nongamified conditions. This is unexpected because the gamified interface contained game elements that are usually connected with the experience of autonomy, such as being able to choose the order of events (Ryan and Rigby, 2019). In our interface, participants could choose the order in which they discovered and read the information about the objectives. Possibly, including more game elements connected with autonomy would have produced a different result. Game elements associated with immersion, including narratives and avatars, have a positive effect on perceived autonomy (Xi and Hamari, 2019). Our interface contained a nonplayer character, Mr. Akles, but the lack of a virtual relationship with that character (see next paragraph) suggests that the created environment was insufficiently immersive. Moreover, a nonfixed structure, such as a nonlinear storyline, might better satisfy the need for autonomy (Wee and Choong, 2019), a feature that our gamified interface did not offer. Finally, experimental conditions can also restrict participants' feeling of autonomy (Heimann and Roepstorff, 2018), which is essential for playfulness. Certainly, sitting in a cubicle in a university lab to complete a computer experiment is hardly a playful condition. The medium correlation between experienced autonomy and entertainment, however, indicates that it is promising to consider game elements that increase autonomy when developing a game-based interface, as an increased feeling of autonomy may lead to a better experience.
Third, we failed to measure a single construct for relatedness, which represents connectedness and exchange with others. Our three items for measuring relatedness (SI 5a) showed unacceptable internal reliability.
Further analyses (SI 8e) confirmed that these three items measured different aspects rather than one single construct. This could be because relatedness, in the case of gamified information on objectives for societal decisions, might have two aspects: (1) relatedness to the nonplayer characters and (2) relatedness to society as a whole. This dichotomy of relatedness is also discussed in the literature: players can feel related to the nonplayer characters in a game-based interface independently of feeling related to other participants (in our case, society) (Ryan and Rigby, 2019). Our construct combined different subdimensions into one, which turned out to be unsuccessful. A recent questionnaire to measure the experience of gamification, gamefulquest (Högberg et al., 2019), includes a specific section for social experience, one of the subdimensions of relatedness. However, its authors do not identify the connection with nonplayer characters as a stand-alone dimension. Investigating gamification with the BPNT is an emerging field of research, and standards need to be developed, for instance, for the items and the variables investigated. Our results contribute to this rapidly growing gamification literature and can help further research identify critical issues and converge toward common standards.
Finally, despite the proposition that relatedness can be experienced through a virtual relationship with a nonplayer character (Ryan and Deci, 2017a;Ryan and Rigby, 2019), hardly any research has investigated the importance of nonplayer characters per se (Mekler et al., 2017;Wee and Choong, 2019). Instead, studies have addressed the need for relatedness through competition, cooperation in a team, and social networks (Wee and Choong, 2019;Xi and Hamari, 2019). The results most closely related to nonplayer characters show that avatars, a meaningful story, and teammates are elements that positively influence social relatedness (Sailer et al., 2017). Most studies treat nonplayer characters as part of the narrative or meaningful story. Our results showed a lack of virtual relationship and call for specifically investigating the effect of nonplayer characters in gamification. Their design can demand substantial resources, up to using artificial intelligence to adapt the nonplayer character to the player, as already occurs in some video games (e.g., Kopel and Hajas, 2018). This question has already been raised: "we are intrigued by how needs for relatedness may be met by 'computer generated' personalities and artificial intelligence" (Ryan et al., 2006, p. 350). Can participants relate to simple, less costly nonplayer characters? Should nonplayer characters have specific traits to facilitate virtual relationships? What are these traits? We call for investigating relatedness to nonplayer characters in addition to social relatedness.

Limits of the proposed gamification
Our results showed that our gamified information failed to create immersion and, in particular, relatedness to the nonplayer character Mr. Akles. The nonplayer character was a cornerstone of the proposed gamification, and the absence of relatedness to Mr. Akles could explain the overall lack of effect of our gamification. The nonplayer character and avatar could be improved, for instance, by adding interactive dialogues that create emotions or at least empathy. Such elements make a storyline meaningful, which facilitates immersion and, in turn, the feeling of autonomy (Xi and Hamari, 2019). Additionally, improving the voice-over and sound ambiance could contribute to immersion. Qualitative feedback indicated that the voice-over used in our interface should be less theatrical and more natural. The whole experience could benefit from background sounds, and each objective could have a specific sound signature in addition to its icon. Finally, more diverse nonplayer characters, each with different opinions, could be included. This would highlight the diversity of worldviews while creating options for more interactions and dialogues. Implementing these aspects in a nonlinear structure would also increase the feeling of autonomy (Heimann and Roepstorff, 2018).
Random events are game elements that break the linearity of fixed structures; in our gamified interface, we implemented them as drinking coffee with Mr. Akles (Table 1). These also need improvement, as they should contribute to fulfilling the need for autonomy (Ryan and Rigby, 2019). In the current version, the main random events occurred when the participant was inactive, to remind them to take action. Thus, participants who remained active did not encounter any random events. For those who did, the events had little to no consequence for the fixed storyline, as they only served as reminders. Consequently, game design students judged them useless. The students suggested relating the events to the topic of wastewater management to provide additional information and give the storyline a stronger meaning, thus increasing its immersive potential. As with nonplayer characters, we did not find any discussion of the effect of random events per se in the recent gamification literature.
The present study focused on Evaluation (phase VI of the gamification design method we used; Morschheuser et al., 2017). After this phase, it is common to iterate back to phase IV Design, or even III Ideation, to "solve" the problems revealed in the Evaluation phase. In hindsight, we could have engaged users earlier via test panels, in shorter and more frequent iteration loops. Co-designing is common practice when developing serious games for complex decisions, as often done in participatory modelling (e.g., Barreteau et al., 2014;Ditzler et al., 2018). Co-designing usually ensures that the gamification matches the audience and can reveal "wrong ideas" early in the design process. It is also in line with user-centered design (Hollender et al., 2010). However, it is highly resource-intensive. In some cases, the co-design process is itself the aim: it helps participants learn and make their tacit knowledge explicit (Barreteau et al., 2014).

Further outlooks for research and gamification
SDT is not the only valid theoretical framework for investigating gamification, but it is currently widely used (Seaborn and Fels, 2015;Nacke and Deterding, 2017;Ryan and Deci, 2017a;Ryan and Rigby, 2019), mostly in the form of its basic psychological needs subtheory. However, Loughrey and Broin (2018) argue that most gamification research oversimplifies SDT and ignores its other subtheories, such as cognitive evaluation, organismic integration, causality orientations, and goal contents (Ryan and Deci, 2017b). Loughrey and Broin (2018) invite follow-up research (1) to create a taxonomy of game elements and measure whether they are perceived as extrinsically or intrinsically motivating, noting that this will be user and context dependent (causality orientations); (2) to verify whether individuals perceiving more external regulation are more likely to integrate extrinsic motivational elements such as game elements, as opposed to individuals perceiving more internal regulation (organismic integration); and (3) to differentiate individuals who are intrinsically motivated, as extrinsically motivating elements, even with an intrinsically motivating functional significance, may thwart their existing motivation. This last aspect might explain the mixed judgment of our gamified survey by the Swiss citizens who participated voluntarily in the case study: they were intrinsically motivated people who did not need the extrinsic motivational elements of gamification. In addition, consideration should be given to the interactions between basic needs (Ryan and Deci, 2017b) and to the effects of age, level of education, and general exposure to games on the perception of gamification. This relates to an unexplored question: whether gamification is also suitable for older citizens.
Gamification research is growing. This will shed light on whether its promising features are as powerful as expected, but only if rigorous evaluations showing negative or null results are also published, thereby overcoming some of the publication bias towards positive results (Sridharan and Greenland, 2009). Potential drawbacks of gamification also need to be investigated. For instance, does gamification divert attention and cognitive capacity from the main task?
Finally, if the gamification is digital, as in our case, frameworks from the field of Human-Computer Interaction (HCI) can be relevant (e.g., Carroll, 1997;Hochheiser and Lazar, 2007). Among other topics, HCI deals with the usability of programs and digital tools, that is, how users interact with them and whether they accept them. Interestingly, some of this work also refers to SDT subtheories (Venkatesh, 2000).

Conclusion
We developed and thoroughly tested an online interface in a controlled experiment. The aim of the online interface was to inform laypeople about objectives in a real decision about sustainable wastewater management, and to elicit their informed weight preferences for later use in decision-making. This is important because online MCDA interfaces that can be used without the help of a decision analyst could make MCDA processes more participatory. For instance, many affected citizens could take part in the decision process, which has so far been difficult to achieve in decision workshops.
We tested the gamified information on objectives with respect to three aspects. First, factual learning occurred: participants were better informed about the objectives after completing the survey. However, contrary to our expectations, factual learning in the gamified treatment was not higher than in the nongamified control treatment. This is in line with a growing body of studies on game-based learning that report no evidence of increased factual learning due to gamification (Dichev and Dicheva, 2017;van Roy and Zaman, 2017). In our case, one reason could be a suboptimal gamification design, as discussed above.
Second, range-based preference construction was confirmed. However, in line with the existing literature, our results showed that the weights were not sufficiently adjusted to the manipulated attribute ranges. Differences in range sensitivity between the gamified and control treatments were equivocal. Future experiments could, for instance, use a within-subject design for the range manipulation. Third, the gamified interface to inform about objectives provided a neutral to positive user experience and was more entertaining than the nongamified control. However, gamification did not increase feelings of competence and autonomy. Our questionnaire, which is fully available to other researchers, requires further development. Our preliminary results are not readily comparable to others in the literature, partly because gamification is an emerging field in which no standards exist yet. Our results contribute to ongoing scientific debates about (1) serious games and gamification to enhance learning; (2) preference construction, the validity of the range sensitivity principle, and the meaning of weights in MCDA; and (3) the evaluation of gamification based on SDT.
Although not all results were statistically significant, our study provides useful insights. We learnt that our gamified online information on objectives may have suffered from suboptimal gamification, in particular from not sufficiently satisfying participants' need for autonomy because the story was too linear. In addition, our scale for measuring relatedness was insufficient. It should distinguish between connectedness to the nonplayer character and social connectedness, which is highly relevant in public decision-making. Co-designing games in participatory modelling would be another promising approach to support gamified decision-making in complex cases.
We identified several lines of future research: (1) continuing efforts to assess the feasibility of online MCDA interfaces that allow preferences to be elicited without the help of a decision analyst; (2) testing different models that could explain range insensitivity, as our results questioned the range sensitivity principle; (3) developing a more comprehensive understanding of SDT in gamification studies by replicating experiments with different participants; (4) developing validated and reliable gamification-specific scales of basic psychological needs; and (5) testing the effectiveness of different interventions. More generally, we addressed a growing societal demand by investigating an innovative way to engage laypeople in complex environmental decision-making while exploring its effect on a well-known bias. We encourage our colleagues, particularly from behavioral operational research, to join us in the exciting expansion of traditional decision support approaches made possible by online technical developments.
the case study. Finally, we are grateful for the anonymous reviewers' comments, which helped to greatly improve this paper.
Open access funding provided by ETH-Bereich Forschungsanstalten.

Funding
The Swiss National Science Foundation Ambizione grant #173973 to Aubert supports this work.

Software availability
Name