The Theory of Group Operating Characteristic Analysis in Discrimination
Tasks
Vit Drga
1999
A thesis submitted in fulfillment of the requirements for the degree of
Doctor of Philosophy in Psychology.
Victoria University of Wellington, New Zealand.
Overview
My thesis had two main parts: Part I was about the theory of GOC analysis, and
Part II was about practical extensions of GOC analysis. For an overview of the
thesis, read the separate prefaces for Part I and Part II. In retrospect, the
prefaces together read better than the thesis abstract.
Part I: Group Operating Characteristic Analysis
Part I is concerned with the effects of observer inconsistency in
discrimination tasks, and how the effects can be removed within the context of
the Theory of Signal Detectability (TSD). Chapter 1 gives an overview of TSD,
with details of methodologies and measures of performance that are used in later
chapters. Chapter 2 describes models of observer inconsistency, and presents mean
receiver operating characteristic analysis and group operating characteristic (GOC)
analysis as means of removing variability due to inconsistency. Chapter 3
describes transform-average GOC analysis, which is a generalisation of GOC
analysis that encompasses generalised mean ratings and arbitrary ordinal scaling
of a rating scale. Chapter 4 introduces the transfer function, which relates
values on a decision axis to values on a rating scale, and shows how a transfer
function can be estimated from data. Chapter 5 provides a theory of GOC analysis
that incorporates the developments of previous chapters within a single
framework. Stochastic ordering is shown to be the key statistical property
needed in order for GOC analysis to remove the effects of inconsistency from
experimental data. If stochastic ordering holds, then GOC analysis works for
arbitrary transfer functions and arbitrary scalings of a rating scale.
Part II: Functions Of Replications Added
For a multiple-replication data set, GOC analysis may be used to minimise
unique noise effects and improve performance in a discrimination task. As more
replications are combined, performance improves as a function of replications
added (FORA). Stable empirical FORAs result from all combinations analysis (ACA),
where average performance is calculated over all possible GOC curves for a given
number of replications. A widely applicable FORA regression function is
introduced. Extrapolation of this function to an infinite number of replications
makes it possible to estimate asymptotic unique-noise-free performance, based on
a finite data set. Chapter 6 introduces a FORA regression procedure, which is
able to estimate known theoretical performance to better than two decimal
places. Chapter 7 applies FORA regression to an amplitude discrimination
experiment in which 100 replications were run. The very large data set makes it
possible not only to estimate asymptotic performance, but also to estimate sample
statistics and error bounds of the asymptote. Chapter 8 shows FORA results for
four sets of experiments on frequency discrimination and amplitude
discrimination. FORA regression is shown to be very robust across experimental
paradigms, observers, types of stimuli, stimulus parameters, performance levels
and measures of sensitivity. Chapter 9 is a summary chapter.
Abstract
Inconsistent decision making is a long-standing problem in psychophysics,
where decisions based on the same stimulus often differ across replications of
an experiment. Inconsistency is described statistically by the concept of unique
noise, the effects of which are removed by averaging ratings across replications
on a per-stimulus basis. A group operating characteristic (GOC) curve is a type
of receiver operating characteristic (ROC) curve based on the mean rating per
stimulus. GOC analysis is shown to improve task performance dramatically
compared to ROC analysis, and can recover theoretical ROC curves from noisy
data. This thesis presents a theory of GOC analysis showing why the procedure
works. It also develops transform-average GOC analysis and transfer function
analysis, and shows how to estimate unique-noise-free performance from a finite,
unique-noise-affected data set.
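The contrast between single-replication ROC analysis and GOC analysis can be
illustrated with a small simulation. The sketch below is not the procedure used
in the thesis. It assumes that each stimulus carries a fixed evidence value, that
unique noise is independent Gaussian noise added afresh on every replication, and
that ratings are continuous. All function names and parameter values (`d_prime`,
`unique_sd`, the seed) are hypothetical. Performance is summarised by the area
under the ROC curve.

```python
import random
from statistics import mean


def auc(noise_scores, signal_scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal_scores) * len(noise_scores))


def simulate_ratings(n_per_class=200, n_reps=20, d_prime=1.0, unique_sd=1.5, seed=1):
    """Return (noise_reps, signal_reps); each is a list of replications,
    and each replication is a list with one rating per stimulus."""
    rng = random.Random(seed)
    # Assumption: evidence per stimulus is fixed across replications.
    noise_common = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    signal_common = [rng.gauss(d_prime, 1.0) for _ in range(n_per_class)]

    def replicate(common):
        # Assumption: unique noise is additive, Gaussian, and independent
        # across stimuli and replications.
        return [[x + rng.gauss(0.0, unique_sd) for x in common]
                for _ in range(n_reps)]

    return replicate(noise_common), replicate(signal_common)


def goc_scores(reps):
    """Mean rating per stimulus across replications."""
    return [mean(ratings) for ratings in zip(*reps)]


noise_reps, signal_reps = simulate_ratings()

# Ordinary ROC analysis: one area per replication, then averaged.
mean_roc_auc = mean(auc(n, s) for n, s in zip(noise_reps, signal_reps))

# GOC analysis: average ratings per stimulus first, then one ROC area.
goc_auc = auc(goc_scores(noise_reps), goc_scores(signal_reps))

print(f"mean single-replication area: {mean_roc_auc:.3f}")
print(f"GOC area:                     {goc_auc:.3f}")
```

Under these assumptions the GOC area should exceed the mean single-replication
area, because averaging shrinks the unique-noise variance by the number of
replications while leaving the per-stimulus evidence unchanged.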
Transform-averaging of ratings (for example, by using geometric or harmonic
means) extends GOC analysis to include strictly monotonic increasing (s.m.i.)
transformations of rating scale data. Although s.m.i. transforms do not alter
ROC curves on any single replication, it is shown that they do alter GOC curves
because of unique noise. Nevertheless, GOC analysis may be transform-invariant,
apart from residual unique noise effects. Empirical evidence is given showing
how GOC performance improves towards theoretical performance regardless of the
particular rating scale that is involved.
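A small worked example shows why the choice of mean matters for GOC curves even
though an s.m.i. transform leaves any single replication's ordering intact. The
sketch below writes a transform-average as "transform, average, invert". The
two stimuli and their ratings are invented for illustration, and the function
names are hypothetical.

```python
import math


def transform_average(ratings, f, f_inv):
    """Generalised mean: apply f to each rating, average, then invert."""
    return f_inv(sum(f(r) for r in ratings) / len(ratings))


def arithmetic_mean(ratings):
    return transform_average(ratings, lambda r: r, lambda m: m)


def geometric_mean(ratings):
    # log is strictly monotonic increasing on positive ratings.
    return transform_average(ratings, math.log, math.exp)


def harmonic_mean(ratings):
    # -1/r is strictly monotonic increasing on positive ratings.
    return transform_average(ratings, lambda r: -1.0 / r, lambda m: -1.0 / m)


# Invented ratings on a 1-6 scale, two replications per stimulus.
stimulus_a = [1, 6]   # rated inconsistently
stimulus_b = [3, 3]   # rated consistently

for name, fn in [("arithmetic", arithmetic_mean),
                 ("geometric", geometric_mean),
                 ("harmonic", harmonic_mean)]:
    print(f"{name:10s}  A = {fn(stimulus_a):.3f}  B = {fn(stimulus_b):.3f}")
```

The arithmetic mean places stimulus A above stimulus B (3.5 against 3), while the
geometric and harmonic means place it below. Because a GOC curve depends only on
the ordering of the per-stimulus means, such reversals are how a transform can
alter a GOC curve in the presence of unique noise.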
A psychophysical transfer function is an s.m.i. mapping from a decision axis
onto a rating scale. Transfer functions underlie theoretical interpretation of
empirical ROC analysis, and it is shown how they can be estimated from empirical
data. The theory of GOC analysis incorporates both transfer functions and
transform-average GOC analysis under the same framework. The theory shows that
GOC analysis will work under arbitrary (and possibly unknown) transfer
functions, and under arbitrary ordinal scalings of a rating scale, but only when
a family of unique-noise-affected evidence distributions is stochastically
ordered on the decision axis. If stochastic ordering does not hold,
unique-noise-free GOC performance changes according to the scaling of a rating
scale. When that is the case, empirical results and subsequent theoretical
interpretation become somewhat arbitrary. This finding about
unique-noise-affected rating scales also extends to theoretical models that
incorporate unique noise. Without stochastic ordering on a decision axis, the
theoretical unique-noise-free ROC curve can change following an s.m.i.
transform of the decision axis.
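As a rough illustration of the ordering condition, the sketch below checks the
usual first-order definition of stochastic ordering on two samples: one
distribution is stochastically greater than another if its cumulative
distribution function is never above the other's. Whether this matches the exact
formulation used in the thesis is an assumption, and the function names and
sample values are invented.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def stochastically_dominates(upper, lower):
    """True if the empirical CDF of `upper` is never above that of `lower`."""
    f_upper, f_lower = ecdf(upper), ecdf(lower)
    grid = sorted(set(upper) | set(lower))
    return all(f_upper(x) <= f_lower(x) for x in grid)


base = [1, 2, 3, 4]
shifted = [2, 3, 4, 5]    # same spread, shifted upward
spread = [-1, 2, 3, 7]    # wider spread, so the two CDFs cross

print(stochastically_dominates(shifted, base))   # ordered
print(stochastically_dominates(spread, base))    # not ordered
print(stochastically_dominates(base, spread))    # not ordered either
```

The shifted sample is ordered relative to the base sample, whereas the wider
sample is ordered in neither direction because the two cumulative distribution
functions cross.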
GOC performance improves as a function of replications added (FORA). Stable
empirical FORAs result from all combinations analysis (ACA), where average
performance is calculated over all possible GOC curves for a given number of
replications. The logarithm of FORA increments is generally a linear function of
the logarithm of the number of replications, typically with r² > 0.995.
This pattern implies a three-parameter data model that provided an excellent
description of FORAs from six different experimental projects. These projects
involved different aural discrimination tasks, experimental paradigms, decision
methodologies, individual observers, levels of performance, stimulus parameters,
and measures of sensitivity. Dozens of different FORAs followed the same
mathematical form; only the three parameters of the data model changed.
Extrapolation of a FORA to an infinite number of replications makes it possible
to estimate asymptotic unique-noise-free performance and its sample statistics
based on a finite data set. Empirical FORA analysis showed that the observer
with the best (unique-noise-affected) ROC performance was often not the observer
with the best unique-noise-free performance. This shows that unique noise can
generate deceptive results in psychophysics, but that its effects can be removed
by using GOC analysis.
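The ACA and log-log steps can be sketched on simulated data. This is an
illustration under stated assumptions, not the thesis's FORA regression
procedure: unique noise is taken to be additive Gaussian noise, performance is
measured as area under the ROC curve, and a "FORA increment" is taken to be the
gain in average performance from k to k + 1 replications. All names, parameter
values, and the seed are hypothetical. The sketch stops at the straight-line fit
and does not attempt the extrapolation to an asymptote.

```python
import math
import random
from itertools import combinations
from statistics import mean


def auc(noise_scores, signal_scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = sum(1.0 if s > n else 0.5 if s == n else 0.0
               for s in signal_scores for n in noise_scores)
    return wins / (len(signal_scores) * len(noise_scores))


def simulate(n_per_class=100, n_reps=6, d_prime=1.0, unique_sd=1.5, seed=7):
    """Assumed model: fixed evidence per stimulus plus fresh Gaussian unique noise."""
    rng = random.Random(seed)

    def replicate(common):
        return [[x + rng.gauss(0.0, unique_sd) for x in common]
                for _ in range(n_reps)]

    noise = replicate([rng.gauss(0.0, 1.0) for _ in range(n_per_class)])
    signal = replicate([rng.gauss(d_prime, 1.0) for _ in range(n_per_class)])
    return noise, signal


def goc_auc(noise_reps, signal_reps, subset):
    """Area for the GOC curve built from the replications indexed by `subset`."""
    def averaged(reps):
        return [mean(reps[r][i] for r in subset) for i in range(len(reps[0]))]
    return auc(averaged(noise_reps), averaged(signal_reps))


def fora_by_aca(noise_reps, signal_reps):
    """All combinations analysis: mean GOC area over every subset of size k."""
    n_reps = len(noise_reps)
    return {k: mean(goc_auc(noise_reps, signal_reps, subset)
                    for subset in combinations(range(n_reps), k))
            for k in range(1, n_reps + 1)}


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


def fit_log_increments(fora):
    """Regress log(increment from k to k + 1) on log(k)."""
    ks = sorted(fora)
    pairs = [(k, fora[k + 1] - fora[k]) for k in ks[:-1]]
    pairs = [(k, inc) for k, inc in pairs if inc > 0]  # log needs positive values
    return fit_line([math.log(k) for k, _ in pairs],
                    [math.log(inc) for _, inc in pairs])


noise_reps, signal_reps = simulate()
fora = fora_by_aca(noise_reps, signal_reps)
slope, intercept, r2 = fit_log_increments(fora)

for k in sorted(fora):
    print(f"{k} replication(s): mean GOC area = {fora[k]:.4f}")
print(f"log-log fit: slope = {slope:.3f}, r squared = {r2:.4f}")
```

With only six simulated replications the fit is much noisier than the empirical
results described above. The point of the sketch is the shape of the
computation: averaging over all subsets of each size stabilises the FORA before
any regression is attempted.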
Last updated
08 Nov 2009 04:37 PM