P1: Test a Perceptual Phenomenon - Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally sized lists. Each participant goes through both conditions and records a time for each.

Questions For Investigation


What is our independent variable? What is our dependent variable?

  • Independent variable: the condition, i.e. whether the name of each color word matches the ink color in which it is printed (Congruent) or does not match it (Incongruent).
  • Dependent variable: the time taken to name the ink colors of the word list in each condition.

What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

    • Null hypothesis: the population mean times for the Congruent ($\mu_C$) and Incongruent ($\mu_I$) conditions are equal. $$H_0: \mu_C - \mu_I =0$$
    • Alternative hypothesis: the population mean times for the Congruent ($\mu_C$) and Incongruent ($\mu_I$) conditions are different. $$H_A: \mu_C - \mu_I\neq0$$
  • Since each participant is measured under both conditions, the two samples are dependent, so a two-tailed paired t-test (equivalently, a one-sample t-test on the per-participant differences) with $\alpha = .05$ is proposed. A t-test rather than a z-test is appropriate because the population standard deviation is unknown and the sample is small ($n = 24 < 30$); the Bessel-corrected sample standard deviation is used in its place. The test will determine whether the Congruent and Incongruent times differ significantly.

  • Assumptions made:
    • We assume the two dependent samples, and in particular their per-participant differences, are approximately normally distributed (Gaussian).
    • We assume the samples are randomly selected.
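For reference, the quantities reported in the test results can be written out with the symbols used in this section ($n$ participants, differences $x_{D_i}=x_{C_i}-x_{I_i}$, and the Bessel-corrected standard deviation $\sigma_D$). This is a sketch of the standard paired t-test formulas; the $r^2$ expression is the usual effect-size formula for a t-test:

$$\overline{x_D}=\frac{1}{n}\sum_{i=1}^{n}x_{D_i},\qquad \sigma_D=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{D_i}-\overline{x_D}\right)^2}$$

$$SEM=\frac{\sigma_D}{\sqrt{n}},\qquad t_{statistic}=\frac{\overline{x_D}-0}{SEM},\qquad df=n-1$$

$$CI=\overline{x_D}\pm t_{critical}\cdot SEM,\qquad r^2=\frac{t_{statistic}^2}{t_{statistic}^2+df}$$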

In [3]:
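import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
%matplotlib inline

# Read the reaction times (one row per participant, one column per condition)
data = pd.read_csv('dataset.csv')
# Paired difference for each participant: Congruent minus Incongruent
data['Diff_CminusI'] = data['Congruent'] - data['Incongruent']
display(data)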
    Congruent  Incongruent  Diff_CminusI
0      12.079       19.278        -7.199
1      16.791       18.741        -1.950
2       9.564       21.214       -11.650
3       8.630       15.687        -7.057
4      14.669       22.803        -8.134
5      12.238       20.878        -8.640
6      14.692       24.572        -9.880
7       8.987       17.394        -8.407
8       9.401       20.762       -11.361
9      14.480       26.282       -11.802
10     22.328       24.524        -2.196
11     15.298       18.644        -3.346
12     15.073       17.510        -2.437
13     16.929       20.330        -3.401
14     18.200       35.255       -17.055
15     12.130       22.158       -10.028
16     18.495       25.139        -6.644
17     10.639       20.429        -9.790
18     11.344       17.425        -6.081
19     12.369       34.288       -21.919
20     12.944       23.894       -10.950
21     14.233       17.960        -3.727
22     19.710       22.058        -2.348
23     16.004       21.157        -5.153

Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.

The mean and the Bessel-corrected sample standard deviation of each condition are:

For congruent case (n = 24): $$\overline{x_C} = 14.051\quad \sigma_C = 3.559$$ For incongruent case (n = 24): $$\overline{x_I}=22.016\quad \sigma_I = 4.797$$
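The figures above can be reproduced with a minimal stdlib-only sketch, assuming the 24 values per condition are exactly those in the dataset table above (copied in by hand here instead of reading `dataset.csv`). The helper name `describe` is hypothetical; `statistics.stdev` divides by n - 1, matching the Bessel-corrected standard deviation used in this notebook.

```python
import statistics

# Reaction times copied from the dataset table above (one value per participant).
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]


def describe(times):
    """Return (n, mean, sample standard deviation) for a list of times."""
    # statistics.stdev divides by n - 1, i.e. it applies Bessel's correction.
    return len(times), statistics.mean(times), statistics.stdev(times)


for name, times in (("Congruent", congruent), ("Incongruent", incongruent)):
    n, mean, sd = describe(times)
    print(f"{name}: n = {n}, mean = {mean:.3f}, sd = {sd:.3f}")
```

In the notebook itself, `data[["Congruent", "Incongruent"]].agg(["mean", "std"])` gives the same numbers, since pandas' `std` also uses n - 1 by default.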


In [4]:
fig = plt.figure(figsize=(7, 5.5))

# Top left: histogram of the Congruent times
plt.subplot(221)
plt.hist(data["Congruent"], color="#D86E3F")
plt.xlabel('Time Scores for Congruent', fontsize=10)
plt.ylabel('Frequency', fontsize=10)

# Top right: histogram of the Incongruent times
plt.subplot(222)
plt.hist(data["Incongruent"], color="#2088B2")
plt.xlabel('Time Scores for Incongruent', fontsize=10)
plt.ylabel('Frequency', fontsize=10)

# Bottom left: both histograms overlaid for direct comparison
plt.subplot(223)
plt.hist(data["Congruent"], color="#D86E3F", alpha=0.75, label="Congruent")
plt.hist(data["Incongruent"], color="#2088B2", alpha=0.75, label="Incongruent")
plt.xlabel('Time Scores', fontsize=10)
plt.ylabel('Frequency', fontsize=10)
plt.legend(loc=1, prop={'size': 9})

# Bottom right: side-by-side boxplots (medians, quartiles, outliers)
ax = plt.subplot(224)
data[["Congruent", "Incongruent"]].boxplot(ax=ax, grid=False)
plt.ylabel('Time Scores', fontsize=10)
plt.xlabel('Condition', fontsize=10)

# Call tight_layout only after all four subplots exist so none of them overlap
fig.tight_layout()
plt.show()

Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

  • The distributions of the Congruent and Incongruent times are shown above.
  • Observations from the plots:
    • Most Congruent times are lower than the Incongruent times, although the two distributions overlap.
    • In each histogram the tallest bin (6 participants) lies near the middle of the distribution, and the Congruent modal bin sits at lower times than the Incongruent one,
      i.e. Mode of Congruent < Mode of Incongruent
    • The boxplot shows a lower median for the Congruent case than for the Incongruent case, and two high outliers (around 34 to 35) in the Incongruent case.
      i.e. Median of Congruent < Median of Incongruent
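The visual observations can be cross-checked numerically. This is a stdlib-only sketch under two stated assumptions: the histograms use matplotlib's default of 10 equal-width bins spanning each sample's range, and the boxplot flags points beyond 1.5 × IQR from the quartiles (matplotlib's default whisker rule). The helper names `box_summary` and `modal_bin` are hypothetical, and the data are copied by hand from the table above.

```python
import statistics

# Reaction times copied from the dataset table above (one value per participant).
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]


def box_summary(times):
    """Return (median, q1, q3, outliers) using the 1.5 * IQR whisker rule."""
    # method="inclusive" interpolates linearly between order statistics, the
    # same rule as numpy.percentile's default, which matplotlib's boxplot uses.
    q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sorted(t for t in times if t < low or t > high)
    return median, q1, q3, outliers


def modal_bin(times, bins=10):
    """Return (count, left_edge, right_edge) of the fullest equal-width bin."""
    lo, hi = min(times), max(times)
    width = (hi - lo) / bins
    counts = [0] * bins
    for t in times:
        # Clamp the maximum into the last bin, which is closed on the right.
        counts[min(int((t - lo) / width), bins - 1)] += 1
    best = max(range(bins), key=counts.__getitem__)
    return counts[best], lo + best * width, lo + (best + 1) * width


for name, times in (("Congruent", congruent), ("Incongruent", incongruent)):
    median, q1, q3, outliers = box_summary(times)
    count, left, right = modal_bin(times)
    print(f"{name}: median = {median:.3f}, IQR = [{q1:.3f}, {q3:.3f}], "
          f"outliers = {outliers}, modal bin [{left:.2f}, {right:.2f}) "
          f"holds {count} values")
```

Values that fall exactly on a bin edge could be binned differently from `numpy.histogram` because of floating-point rounding; none of the values in this dataset sit on an edge.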

Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task? Did the results match up with your expectations?

  • Defining the per-participant differences as $x_{D_i}=x_{C_i}-x_{I_i}$, we can report

    • mean $\overline{x_D} = -7.965$
    • standard deviation $\sigma_D = 4.865$
    • degrees of freedom $df = 23$
    • standard error of the mean $SEM = 0.993$
    • $t_{statistic} = -8.021$
    • for a two-tailed test at $\alpha=0.05$, the critical t-value $t_{critical}=\pm 2.0687$
    • coefficient of determination (effect size) $r^2=.737$
    • $p$-value $<0.0001$
    • 95% confidence interval $CI =(-10.019, -5.911)$
  • Since $|t_{statistic}| = 8.021$ exceeds $t_{critical} = 2.0687$, the statistic falls in the rejection region for $\alpha=0.05$: the difference between the congruent and incongruent times is statistically significant, i.e. unlikely to be due to random chance alone. Equivalently, if the two population means were equal, the probability of observing a mean difference at least this extreme would be less than 0.01%. Hence the null hypothesis is rejected.
  • With 95% confidence, the population mean time to name the ink colors is about 5.9 to 10.0 time units shorter for congruent words than for incongruent words.
  • $r^2 = .737$ means that about 73.7% of the variability in the data is explained by the difference between the two conditions (congruent vs. incongruent).
  • Since the data come from a controlled experiment, we can conclude that whether the word matched its ink color significantly affected the time participants took to name that ink color.
  • Yes, the results match our expectations: naming the ink colors took longer when the word and the ink color conflicted.
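The test results can be recomputed from the raw times with a minimal stdlib-only sketch. To avoid a SciPy dependency it hard-codes the critical value 2.0687 reported above for df = 23 instead of computing it, and it does not compute a p-value. The function name `paired_t_test` is hypothetical, and the data are copied by hand from the dataset table above.

```python
import math
import statistics

# Reaction times copied from the dataset table above (one value per participant).
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]

# Two-tailed critical value for alpha = 0.05 and df = 23, taken from the text
# above; hard-coded here (assumption) rather than computed from the t distribution.
T_CRITICAL = 2.0687


def paired_t_test(a, b, t_critical=T_CRITICAL):
    """Paired t-test on the per-participant differences a_i - b_i."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # Bessel-corrected (divides by n - 1)
    sem = sd_d / math.sqrt(n)               # standard error of the mean difference
    t_stat = mean_d / sem                   # H0: mean difference is 0
    df = n - 1
    r_squared = t_stat ** 2 / (t_stat ** 2 + df)
    margin = t_critical * sem
    return {
        "mean_d": mean_d,
        "sd_d": sd_d,
        "sem": sem,
        "t": t_stat,
        "df": df,
        "r_squared": r_squared,
        "ci": (mean_d - margin, mean_d + margin),
        "reject_h0": abs(t_stat) > t_critical,
    }


result = paired_t_test(congruent, incongruent)
for key, value in result.items():
    print(f"{key}: {value}")
```

If SciPy is available, `scipy.stats.ttest_rel(congruent, incongruent)` should return the same t statistic together with an exact two-tailed p-value.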

Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

The verbal and visual processing centers of the brain seem to be linked. When the word and its ink color contradict each other, the brain appears to take longer to process the information. It would be interesting to see whether there is a difference in the time needed to identify words with swapped letters.