Subtleties in controlling confounds in inferential statistics

some surprising, some obvious in retrospect

Phillip M. Alday

29 April 2016, Uni Adelaide

When all you have is a hammer ...

Animate and inanimate words chosen as stimulus materials did not differ in word frequency (p > 0.05).

Controls and aphasics did not differ in age (p > 0.05).

Control and ecological validity in conflict

(Sassenhagen & Alday, under review at B&L)

What's the problem?

Animate and inanimate words chosen as stimulus materials did not differ in word frequency (p > 0.05).

Controls and aphasics did not differ in age (p > 0.05).

Where do I start?

Philosophy

  1. You can't accept the null in NHST, only fail to reject it.

Statistics

  1. You're violating the test's assumptions because, by design, you did not sample randomly.
  2. You're performing inference on a population you don't care about.

Pragmatics

  1. You're failing to perform the inference you actually care about.
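A hypothetical simulation makes the philosophical point concrete: even when two stimulus populations genuinely differ, a t-test on typically sized item sets frequently returns p > 0.05. The sample sizes and effect size below are illustrative assumptions, not values from the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed scenario: animate and inanimate stimulus sets are drawn from
# populations whose mean (log) word frequencies REALLY differ by 0.5 SD.
n_items = 20      # items per condition, a typical stimulus set size
true_diff = 0.5   # real population difference, in standard units

n_sims = 2000
nonsig = 0
for _ in range(n_sims):
    animate = rng.normal(0.0, 1.0, n_items)
    inanimate = rng.normal(true_diff, 1.0, n_items)
    _, p = stats.ttest_ind(animate, inanimate)
    nonsig += p > 0.05

# A large share of samples yields p > 0.05 despite the real difference:
# "did not differ significantly" is not evidence of "did not differ".
print(f"proportion with p > 0.05: {nonsig / n_sims:.2f}")
```

With these settings the test is underpowered, so most draws fail to reject even though the null is false, which is exactly why a non-significant matching check cannot license accepting the null.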

Philosophy: Accepting the null.

Statistics: Getting useless information.

Random sampling

Populations vs. samples

Pragmatics: Testing what you care about.

What to do, what to do

I scream, you scream ...

Fresh off the presses


(DOI: 10.1371/journal.pone.0152719)

Arrows show true causal relationships

(All model diagrams from John Myles White)

Highlighting shows conditioning (in modelling)

But what if I didn't measure the actual causal variable?

Conditional probabilities and modelling (is so hard for frequentists)

Going against the arrow and not hitting a bright stop sign in time leads to confusion and trouble.
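A minimal sketch of the trouble conditioning can cause: two independent causes of a common effect (a collider) become spuriously correlated once you condition on that effect. The variable names are an invented toy example, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical collider structure: talent -> fame <- looks,
# with talent and looks independent by construction.
talent = rng.normal(0, 1, n)
looks = rng.normal(0, 1, n)
famous = (talent + looks) > 1.0  # fame requires enough of either

corr_all = np.corrcoef(talent, looks)[0, 1]
corr_famous = np.corrcoef(talent[famous], looks[famous])[0, 1]

# Unconditionally the causes are uncorrelated; conditioning on the
# collider induces a clearly negative correlation among the famous.
print(f"unconditional r = {corr_all:.3f}")
print(f"among the famous r = {corr_famous:.3f}")
```

Conditioning "against the arrows" here manufactures a relationship that does not exist in the causal structure, which is the confusion and trouble the slide warns about.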

Simulated for typical data

(DOI: 10.1371/journal.pone.0152719.g002)

But all of this follows directly from the GLM

Standard GLM applications have a "vertical" error/variance term.


Y = β0 + β1X1 + ε

ε ∼ N(0, σ)

In other words, we assume:

  1. (Measurement) error/variance only occurs in the dependent variable.
  2. We manipulate the independent variables directly and without error.
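When assumption 2 fails and the predictor itself is measured with error, the GLM's "vertical" error term is the wrong model, and the estimated slope is attenuated toward zero. A small simulation with assumed unit variances (so the observed measure has reliability 0.5) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# True generative model: Y = 1.0 * X + vertical noise
x_true = rng.normal(0, 1, n)
y = 1.0 * x_true + rng.normal(0, 1, n)

# But we only observe X contaminated by measurement error
# (equal error and signal variance, i.e. reliability 0.5).
x_obs = x_true + rng.normal(0, 1, n)

def ols_slope(x, y):
    """Simple-regression slope cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_true = ols_slope(x_true, y)  # close to the true beta of 1.0
slope_obs = ols_slope(x_obs, y)    # attenuated toward reliability * beta
print(f"slope on true X: {slope_true:.2f}")
print(f"slope on noisy X: {slope_obs:.2f}")
```

The slope on the noisy measure recovers roughly reliability × β rather than β itself; ordinary least squares has no way to distinguish horizontal error in X from a genuinely weaker effect.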

But doesn't subjective temperature influence summertime fun?

Subjective handwaving is easy ...

... objectivity is hard

Modelling the full structure helps
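The same measurement-error problem undermines statistical "control" of a confound, the central point of Westfall and Yarkoni (2016). In this invented ice-cream scenario (variable names and variances are my assumptions), adjusting for the true confound removes the spurious effect, while adjusting for a noisy proxy of it does not:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed causal structure: temperature causes both ice cream sales
# and drownings; ice cream has NO direct effect on drownings.
temp = rng.normal(0, 1, n)
icecream = temp + rng.normal(0, 1, n)
drownings = temp + rng.normal(0, 1, n)

# We can only measure the confound noisily ("subjective temperature").
temp_measured = temp + rng.normal(0, 1, n)

def partial_slope(y, x, z):
    """Coefficient of x in the multiple regression of y on [1, x, z]."""
    X = np.column_stack([np.ones_like(x), x, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Controlling for the TRUE confound eliminates the spurious effect ...
b_true_control = partial_slope(drownings, icecream, temp)
# ... but controlling for its noisy proxy leaves a clearly nonzero
# "effect" of ice cream on drownings.
b_proxy_control = partial_slope(drownings, icecream, temp_measured)
print(f"controlling true temp: {b_true_control:.3f}")
print(f"controlling measured temp: {b_proxy_control:.3f}")
```

Only a model that represents the measurement error in the confound (the full structure) can fully de-confound the estimate; partialling out an imperfect proxy silently under-adjusts.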

The end is near ...

So what do we do in practice?

As always ...

You can find my stuff online: palday.bitbucket.org

The end is here.

If you have no questions about this stuff ....

We can discuss

We will not discuss

I really don't understand it either.

References

Clark, Herbert H. 1973. “The Language-as-Fixed-Effect Fallacy: A Critique of Language Statistics in Psychological Research.” Journal of Verbal Learning and Verbal Behavior 12: 335–59. doi:10.1016/S0022-5371(73)80014-3.

Judd, Charles M., Jacob Westfall, and David A. Kenny. 2012. “Treating Stimuli as a Random Factor in Social Psychology: A New and Comprehensive Solution to a Pervasive but Largely Ignored Problem.” Journal of Personality and Social Psychology 103 (1): 54–69. doi:10.1037/a0028347.

Westfall, Jacob, and Tal Yarkoni. 2016. “Statistically Controlling for Confounding Constructs Is Harder Than You Think.” PLoS ONE 11 (3). Public Library of Science: 1–22. doi:10.1371/journal.pone.0152719.

Westfall, Jacob, David A. Kenny, and Charles M. Judd. 2014. “Statistical Power and Optimal Design in Experiments in Which Samples of Participants Respond to Samples of Stimuli.” Journal of Experimental Psychology: General 143 (5): 2030–45. doi:10.1037/xge0000014.