P1: Test a Perceptual Phenomenon - Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant goes through both conditions and records a time for each.

Questions For Investigation

What is our independent variable? What is our dependent variable?

Independent variable: the word condition, i.e. whether the word's meaning matches the ink color it is printed in (Congruent) or does not match it (Incongruent).
Dependent variable: the time taken to name the ink colors of the word list in each condition (Congruent or Incongruent).

What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

- Null hypothesis : Population mean of Congruent ($\mu_C$) and Incongruent ($\mu_I$) cases are equal. $$H_0: \mu_C - \mu_I =0$$
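- Alternative hypothesis : Population mean of Congruent ($\mu_C$) and Incongruent ($\mu_I$) cases are different. $$H_A: \mu_C - \mu_I\neq0$$
Each participant is measured in both conditions, so the two samples are dependent. A two-tailed paired-samples t-test with $\alpha = .05$ is therefore proposed. This is equivalent to a one-sample t-test on the per-participant differences. It will determine whether the Congruent and Incongruent times differ significantly. The test is two-tailed because $H_A$ allows a difference in either direction. A t-test rather than a z-test is appropriate for two reasons: the population standard deviation is unknown, and the sample is small ($n = 24 < 30$). For the same reason, the Bessel-corrected sample standard deviation (denominator $n-1$) is used.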
Assumptions made:
- We assume the times in each condition, and in particular their paired differences, are approximately normally distributed (Gaussian).
- We assume the samples are randomly selected.
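The normality assumption matters most for the paired differences. As a rough check, which is not part of the original analysis, the sketch below runs a Shapiro-Wilk test on the differences using `scipy.stats.shapiro`. SciPy is an added dependency here. The values are the 24 pairs listed in the data table at the end of this section.

```python
import numpy as np
from scipy import stats

# Per-participant times copied from the data table (rounded to 3 decimals)
congruent = np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                      9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                      18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004])
incongruent = np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                        20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                        25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157])

# Paired differences x_D = x_C - x_I
diff = congruent - incongruent

# Shapiro-Wilk: H0 is that the differences come from a normal distribution
result = stats.shapiro(diff)
w_stat, p_normal = result.statistic, result.pvalue
print(f"W = {w_stat:.3f}, p = {p_normal:.3f}")
# A small p (e.g. < 0.05) would cast doubt on the normality assumption
```

A large p-value does not prove normality. With only 24 observations the test has limited power, so a histogram or Q-Q plot of `diff` is a useful companion.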

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
%matplotlib inline

# Load the Stroop timings: one row per participant, one column per condition
data = pd.read_csv('dataset.csv')
display(data)
```
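The data table at the end of this section also contains a `Diff_CminusI` column. Below is a minimal sketch of how that column can be derived, assuming `dataset.csv` provides `Congruent` and `Incongruent` columns. To keep the example self-contained, it builds the first five rows of the table inline instead of reading from disk.

```python
import pandas as pd

# Inline stand-in for pd.read_csv('dataset.csv'): first five rows of the data table
data = pd.DataFrame({
    "Congruent":   [12.079, 16.791, 9.564, 8.630, 14.669],
    "Incongruent": [19.278, 18.741, 21.214, 15.687, 22.803],
})

# Per-participant difference: congruent time minus incongruent time
data["Diff_CminusI"] = data["Congruent"] - data["Incongruent"]
print(data.round(3))
```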

Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.

The mean (central tendency) and the Bessel-corrected sample standard deviation (variability) of each condition are given below.

For the congruent case (n = 24): $$\overline{x_C} = 14.051\quad \sigma_C = 3.559$$ For the incongruent case (n = 24): $$\overline{x_I}=22.016\quad \sigma_I = 4.797$$
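These figures can be reproduced with pandas. The minimal sketch below uses the 24 pairs from the data table. Note that pandas' `std()` defaults to `ddof=1`, i.e. the Bessel-corrected sample standard deviation, whereas NumPy's `np.std` defaults to `ddof=0`.

```python
import pandas as pd

# Times copied from the data table (rounded to 3 decimals)
data = pd.DataFrame({
    "Congruent": [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                  9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                  18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004],
    "Incongruent": [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                    20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                    25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157],
})

# Central tendency and variability for each condition
summary = pd.DataFrame({
    "n": data.count(),
    "mean": data.mean(),
    "median": data.median(),
    "std (ddof=1)": data.std(ddof=1),  # Bessel-corrected sample SD
})
print(summary.round(3))
```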

```python
fig = plt.figure(figsize=(7, 5.5))

# Top left: histogram of congruent times
plt.subplot(221)
plt.hist(data["Congruent"], color="#D86E3F")
plt.xlabel('Time Scores for Congruent', fontsize=10)
plt.ylabel('Frequency', fontsize=10)

# Top right: histogram of incongruent times
plt.subplot(222)
plt.hist(data["Incongruent"], color="#2088B2")
plt.xlabel('Time Scores for Incongruent', fontsize=10)
plt.ylabel('Frequency', fontsize=10)

# Bottom left: both histograms overlaid for direct comparison
plt.subplot(223)
plt.hist(data["Congruent"], color="#D86E3F", alpha=0.75, label="Congruent")
plt.hist(data["Incongruent"], color="#2088B2", alpha=0.75, label="Incongruent")
plt.xlabel('Time Scores', fontsize=10)
plt.ylabel('Frequency', fontsize=10)
plt.legend(loc=1, prop={'size': 9})

# Bottom right: side-by-side boxplots of the two conditions
plt.subplot(224)
data[["Congruent", "Incongruent"]].boxplot(return_type='dict', grid=False)
plt.ylabel('Time Scores', fontsize=10)
plt.xlabel('Type', fontsize=10)

# Adjust spacing once all subplots exist, then render
fig.tight_layout()
plt.show()
```

Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

The distributions of the Congruent and Incongruent times are shown above.
Observations from the plots:
- Most Congruent times are lower than the Incongruent times, although the two distributions overlap.
- Each histogram peaks at a frequency of 6 near the middle of its distribution, and the Congruent peak lies at lower times than the Incongruent peak,
  i.e. Mode of Congruent < Mode of Incongruent
- The boxplots show a lower median for Congruent than for Incongruent. There are two high outliers (around 34 to 35) in the Incongruent case and none in the Congruent case.
  i.e. Median of Congruent < Median of Incongruent
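By default, matplotlib's boxplot, which pandas' `DataFrame.boxplot` draws, flags points more than 1.5 × IQR beyond the quartiles as outliers. The small sketch below applies the same rule numerically. It computes quartiles with `np.percentile` and its default linear interpolation, and the helper `iqr_outliers` is a hypothetical name used only for illustration.

```python
import numpy as np

# Times copied from the data table (rounded to 3 decimals)
times = {
    "Congruent": np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                           9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                           18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]),
    "Incongruent": np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                             20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                             25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]),
}

def iqr_outliers(x, whis=1.5):
    """Return the values outside [Q1 - whis*IQR, Q3 + whis*IQR], sorted ascending."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return np.sort(x[(x < low) | (x > high)])

outliers = {name: iqr_outliers(x) for name, x in times.items()}
for name, vals in outliers.items():
    print(name, vals)
```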

Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task? Did the results match up with your expectations?

Defining the per-participant differences as $x_{D_i}=x_{C_i}-x_{I_i}$, we obtain:
- mean $\overline{x_D} = -7.965$
- standard deviation $\sigma_D = 4.865$
- degrees of freedom $df = 23$
- Standard Error of Mean $SEM = 0.993$
- $t_{statistic} = -8.021$
- For a two-tailed test @ $\alpha=0.05$, the critical t-value $t_{critical}=\pm 2.0687$
- Effect size $r^2 = \frac{t^2}{t^2 + df} = .737$
- $p < 0.0001$
- Confidence Interval $CI =(-10.019, -5.910)$
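The list above can be reproduced step by step. The minimal sketch below uses the 24 pairs from the data table. SciPy is an added dependency, not imported in the original notebook. It supplies the t distribution's quantile and tail probability, and `scipy.stats.ttest_rel` serves as a cross-check.

```python
import numpy as np
from scipy import stats

# Times copied from the data table (rounded to 3 decimals)
congruent = np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                      9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                      18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004])
incongruent = np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                        20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                        25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157])

diff = congruent - incongruent           # x_D = x_C - x_I
n = diff.size
df = n - 1                               # degrees of freedom
mean_d = diff.mean()
sd_d = diff.std(ddof=1)                  # Bessel-corrected SD of the differences
sem = sd_d / np.sqrt(n)                  # standard error of the mean difference
t_stat = mean_d / sem                    # tests H0: mu_C - mu_I = 0

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)
ci = (mean_d - t_crit * sem, mean_d + t_crit * sem)

# Cross-check against SciPy's paired-samples t-test
res = stats.ttest_rel(congruent, incongruent)

print(f"mean_d={mean_d:.3f}, sd_d={sd_d:.3f}, SEM={sem:.3f}")
print(f"t={t_stat:.3f}, t_crit=±{t_crit:.4f}, p={p_value:.2e}")
print(f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

Computing the statistic by hand makes each quantity in the list explicit, while `ttest_rel` confirms the same t and p. With the table values this matches the reported figures to within rounding.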

Since $|t_{statistic}| = 8.021$ exceeds $t_{critical} = 2.0687$, the statistic falls in the rejection region for $\alpha=0.05$. The difference between the congruent and incongruent times is therefore statistically significant, i.e. unlikely to be due to random chance alone. Equivalently, if the null hypothesis were true, the probability of observing a t-statistic at least this extreme would be less than 0.01%. Hence the null hypothesis is rejected.
With 95% confidence, the population mean difference $\mu_C - \mu_I$ lies between $-10.019$ and $-5.910$. In other words, participants need roughly 6 to 10 time units less to name the ink colors in the congruent condition than in the incongruent condition.
About 73.7% of the variation in naming times is explained by the condition (congruent vs. incongruent).
Because this is experimental data (the condition was manipulated by the experimenter), we can conclude something causal. The time participants took to name the ink color was significantly affected by whether the word matched the ink color.
Yes, the results match our expectations.
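The 73.7% figure is the effect size $r^2$, which for a t-test is obtained from the t-statistic and the degrees of freedom as $r^2 = t^2/(t^2 + df)$. A quick check using the values reported in the statistics list:

```python
# Values reported in the statistics list
t_stat = -8.021
df = 23

# Proportion of variance in naming time attributable to the condition
r_squared = t_stat**2 / (t_stat**2 + df)
print(f"r^2 = {r_squared:.3f}")
```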

Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

The verbal and visual processing centers of the brain seem to be linked. When the word and the ink color contradict each other, the brain appears to take longer to process the information. It would be interesting to see whether there is a similar difference in the time needed to identify words with swapped letters.

|    | Congruent | Incongruent | Diff_CminusI |
|----|-----------|-------------|--------------|
| 0  | 12.079 | 19.278 | -7.199 |
| 1  | 16.791 | 18.741 | -1.950 |
| 2  | 9.564  | 21.214 | -11.650 |
| 3  | 8.630  | 15.687 | -7.057 |
| 4  | 14.669 | 22.803 | -8.134 |
| 5  | 12.238 | 20.878 | -8.640 |
| 6  | 14.692 | 24.572 | -9.880 |
| 7  | 8.987  | 17.394 | -8.407 |
| 8  | 9.401  | 20.762 | -11.361 |
| 9  | 14.480 | 26.282 | -11.802 |
| 10 | 22.328 | 24.524 | -2.196 |
| 11 | 15.298 | 18.644 | -3.346 |
| 12 | 15.073 | 17.510 | -2.437 |
| 13 | 16.929 | 20.330 | -3.401 |
| 14 | 18.200 | 35.255 | -17.055 |
| 15 | 12.130 | 22.158 | -10.028 |
| 16 | 18.495 | 25.139 | -6.644 |
| 17 | 10.639 | 20.429 | -9.790 |
| 18 | 11.344 | 17.425 | -6.081 |
| 19 | 12.369 | 34.288 | -21.919 |
| 20 | 12.944 | 23.894 | -10.950 |
| 21 | 14.233 | 17.960 | -3.727 |
| 22 | 19.710 | 22.058 | -2.348 |
| 23 | 16.004 | 21.157 | -5.153 |