Background

Evidence suggests that choice causes an illusion of control: people feel more likely to achieve preferable outcomes when they choose for themselves, even when the options are functionally identical (Langer, 1975). Lottery tickets are often used to demonstrate this effect. Every ticket in a lottery has the same probability of being selected, and the selection occurs at random. Nonetheless, it has been demonstrated that having a choice creates an “illusory sense” that leads a person to believe that their chosen ticket contains the winning numbers (Langer, 1975). Klusowski et al. (2021) argue, on the contrary, that other factors could lead a person to choose a specific option. Specifically, the authors propose that choice reflects people’s preexisting beliefs rather than causing an illusion of control. But what if, instead, a person’s preexisting beliefs cause an illusion of control, and this is what drives the person to make a choice? And how does the level of education affect that illusion of control?

Research questions

This project therefore uses R to analyze 1) whether preexisting beliefs cause an illusion of control that makes a person feel confident about a specific choice, and 2) whether a higher level of education makes it less likely that preexisting beliefs cause an illusion of control when making a choice.
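Question 2 is a moderation hypothesis: it asks whether the effect of preexisting beliefs on confidence differs across education levels. One common way to express such a hypothesis in a linear model is an interaction term. The sketch below runs on simulated data only; the variable names `belief`, `education`, and `confidence` and all numeric values are assumptions made for illustration and are not taken from the study.

```r
# Simulated stand-in data (NOT the study data); all names and values are hypothetical.
set.seed(3)
n <- 400
sim <- data.frame(
  belief    = rbinom(n, 1, 0.5),                      # hypothetical belief indicator (0/1)
  education = factor(sample(1:5, n, replace = TRUE))  # five education levels
)
sim$confidence <- 4 + 1.5 * sim$belief + rnorm(n, sd = 2)

# Main-effects model: one belief effect shared by all education levels
fit_main <- lm(confidence ~ belief + education, data = sim)

# Interaction model: the belief effect may differ by education level
fit_int <- lm(confidence ~ belief * education, data = sim)

# F-test of whether the interaction terms improve the fit
anova(fit_main, fit_int)
```

If the interaction terms do not improve the fit, the data give no indication that education changes the effect of beliefs on confidence.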

Data origins

The dataset for this project was taken from the paper “Does Choice Cause an Illusion of Control?” by Joowon Klusowski, Deborah A. Small, and Joseph P. Simmons (2021). It comes from Experiment 17, which is very extensive. Only a few variables were kept for the analysis, including the sociodemographic variables education, age, and gender, as shown below:

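# 2. Load the data

library(readr)
library(dplyr)  # provides %>% and select()

mydata <- read_csv("data.csv")

# Clean data: keep only the variables used in the analysis
# (Finished is included because it appears in the printed output below)

db <- mydata %>% select(ResponseId, p_selected, p_highest, likelihood,
                        confidence2, age, gender, education,
                        choice_tertiary, Finished)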

head(db, 5)
## # A tibble: 5 x 10
##   p_highest confidence2 p_selected   age gender education likelihood Finished
##       <dbl>       <dbl>      <dbl> <dbl>  <dbl>     <dbl>      <dbl>    <dbl>
## 1      34             5       34      26      1         2          5        1
## 2      33.3           5       33.3    28      1         4          5        1
## 3      33.4           5       33.3    24      1         2          5        1
## 4      70            10       70      41      0         5         10        1
## 5      60             9       20      41      0         5          9        1
## # … with 2 more variables: ResponseId <chr>, choice_tertiary <dbl>

The meaning of the variables used in this project is documented in the codebook on GitHub. For the original experimental design and the dataset, refer to the paper by Klusowski et al. (2021).

Data preparation

#3.Run Data analysis

#3.1. Rename the variables to group them in the data frame

ID <- db$ResponseId
Confidence <- db$confidence2
Age <- db$age
Gender <- db$gender
Selection <- db$p_selected
Probability <- db$p_highest
Groups <- db$choice_tertiary # choice condition (pre_ choice=, post choice=, post_no_choice=)
Education <- db$education # 1: no high school degree; 2: high school degree; 3: associate's degree; 4: bachelor's degree; 5: graduate degree (e.g., master's, Ph.D.)

#3.2. Group all the variables as data.frame: 

dat <- data.frame(Probability= Probability, Groups=Groups, 
                  Selection= Selection, Age= Age, Gender= Gender,
                  Education=Education, Confidence=Confidence)

head(dat, 5)
##   Probability Groups Selection Age Gender Education Confidence
## 1       34.00      2     34.00  26      1         2          5
## 2       33.34      1     33.33  28      1         4          5
## 3       33.40      0     33.30  24      1         2          5
## 4       70.00      1     70.00  41      0         5         10
## 5       60.00      2     20.00  41      0         5          9

Calculate summary statistics for the variables

library(psych)  # provides describe()

describe(dat)
##             vars   n  mean    sd median trimmed  mad   min max  range  skew
## Probability    1 599 38.65 12.94   34.0   35.35 0.74 33.33 100  66.67  3.52
## Groups         2 599  0.99  0.82    1.0    0.99 1.48  0.00   2   2.00  0.01
## Selection      3 599 36.20 13.02   33.4   33.91 0.89  0.00 100 100.00  3.21
## Age            4 599 35.66 10.73   33.0   34.44 8.90 18.00  69  51.00  0.97
## Gender         5 599  0.55  0.51    1.0    0.55 0.00  0.00   2   2.00 -0.08
## Education      6 599  3.38  1.07    4.0    3.36 1.48  1.00   5   4.00 -0.19
## Confidence     7 599  4.69  2.55    4.0    4.55 1.48  0.00  10  10.00  0.49
##             kurtosis   se
## Probability    12.63 0.53
## Groups         -1.50 0.03
## Selection      13.61 0.53
## Age             0.36 0.44
## Gender         -1.71 0.02
## Education      -1.21 0.04
## Confidence     -0.28 0.10

Run the statistical analysis using OLS regression

#Ordinary least squares (OLS) regression

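# Note: Equal is not created in the code shown above and must already exist in dat.
# Groups and Education must be factors to produce the per-level terms in the output.
m1 <- lm(Confidence ~ Equal + Age + Groups + Education + Gender, data = dat)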
summary(m1)
## 
## Call:
## lm(formula = Confidence ~ Equal + Age + Groups + Education + 
##     Gender, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5361 -1.8366 -0.4388  1.3824  6.4538 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.791881   1.042814   1.718   0.0863 .  
## Equal        1.587217   0.201864   7.863  1.8e-14 ***
## Age         -0.008124   0.009288  -0.875   0.3821    
## Groups1      0.070787   0.244898   0.289   0.7726    
## Groups2      0.032249   0.243716   0.132   0.8948    
## Education2   2.183502   1.003735   2.175   0.0300 *  
## Education3   2.639962   1.021796   2.584   0.0100 *  
## Education4   2.101443   0.999802   2.102   0.0360 *  
## Education5   1.998024   1.025937   1.948   0.0519 .  
## Gender       0.210148   0.196137   1.071   0.2844    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.415 on 589 degrees of freedom
## Multiple R-squared:  0.1148, Adjusted R-squared:  0.1013 
## F-statistic: 8.491 on 9 and 589 DF,  p-value: 5.452e-12
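The terms Groups1, Groups2, and Education2 to Education5 in the output above indicate that Groups and Education entered the model as factors. R then uses treatment coding: the first level is the reference, and each remaining level gets a coefficient measuring its difference from that reference. The conversion step is not shown in this report, so the sketch below only illustrates the mechanism on simulated data. The data frame `sim` and its values are assumptions and not the study data.

```r
# Simulated stand-in data (NOT the study data); values are arbitrary.
set.seed(1)
n <- 200
sim <- data.frame(
  Confidence = sample(0:10, n, replace = TRUE),
  Groups     = sample(0:2, n, replace = TRUE),  # numeric codes, as in describe(dat)
  Education  = sample(1:5, n, replace = TRUE)
)

# Convert the coded columns to factors so that lm() estimates one
# coefficient per non-reference level instead of a single linear slope.
sim$Groups    <- factor(sim$Groups)
sim$Education <- factor(sim$Education)

fit <- lm(Confidence ~ Groups + Education, data = sim)

# Coefficient names follow the pattern <variable><level>; the first
# level of each factor (Groups 0, Education 1) is absorbed into the intercept.
names(coef(fit))
```

Under this coding, Education2 is the estimated difference in confidence between level 2 and level 1, holding the other predictors fixed, and not the mean confidence of level 2.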

Visualisation

library(ggplot2)
library(plotly)

# Education is treated as a factor so that each level gets its own box
p <- ggplot(data = dat, aes(x = factor(Education), y = Confidence,
                            colour = factor(Education))) +
  geom_boxplot() +
  labs(title = "Do preexisting beliefs cause an illusion of control?",
       colour = "Education") +
  xlab("Level of Education") +
  ylab("Level of Confidence")

# Plot interactive visualization

ggplotly(p)

Results and Summary

The data analysis and visualization show that age, gender, and Groups (Pre-Choice, Post-Choice, and Post-No Choice) are not statistically significant predictors of the illusion of control, measured as confidence that one’s box will win. By contrast, Equal is the strongest predictor in the model (estimate = 1.59, p < .001).

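Furthermore, the results show that participants without a high school degree (the reference category) tend to report less confidence than participants with more education. Nevertheless, more education does not necessarily mean more confidence. For instance, the estimated coefficients for bachelor’s degree holders (2.10) and graduate degree holders (2.00) are lower than those for high school (2.18) and associate’s degree holders (2.64), although the model does not test these differences directly, and the graduate degree coefficient is only marginally significant (p = 0.0519).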
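Each education coefficient in the regression output is a comparison with the reference level (no high school degree). Comparing two non-reference levels, such as bachelor’s degree against high school degree, requires a different reference category or an explicit contrast. A minimal sketch using `relevel()` on simulated data follows; the data frame `sim` and the column `Edu_hs_ref` are hypothetical names introduced for illustration.

```r
# Simulated stand-in data (NOT the study data); values are arbitrary.
set.seed(2)
n <- 300
sim <- data.frame(
  Education  = factor(sample(1:5, n, replace = TRUE)),
  Confidence = rnorm(n, mean = 5, sd = 2)
)

# Default coding: level "1" (no high school degree) is the reference
fit_ref1 <- lm(Confidence ~ Education, data = sim)

# Make level "2" (high school degree) the reference instead
sim$Edu_hs_ref <- relevel(sim$Education, ref = "2")
fit_ref2 <- lm(Confidence ~ Edu_hs_ref, data = sim)

# Rows Edu_hs_ref3, Edu_hs_ref4, Edu_hs_ref5 now test each level
# directly against high school degree holders.
summary(fit_ref2)$coefficients
```

Both fits describe the same model. Only the comparisons reported in the coefficient table change, so the Edu_hs_ref4 estimate equals the difference between the Education4 and Education2 estimates from the first fit.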

Although the authors of the article cover the relation between choice and preexisting beliefs precisely, it was challenging to decide which variables were needed to represent and analyze the hypotheses of this project adequately. It would therefore be interesting to consider alternative experimental designs and statistical methods for collecting and analyzing the data.

For further studies, it would be of great interest to visualize how the illusion of control caused by preexisting beliefs correlates with both age and gender, as well as with the winning box, to assess the accuracy of participants’ selections.

References

Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32(2), 311–328.

Klusowski, J., Small, D. A., & Simmons, J. P. (2021). Does choice cause an illusion of control? Psychological Science, 32(2), 159–172.