Contrast sensitivity in 1/f noise


CONTRAST SENSITIVITY IN 1/F NOISE

Andrew Morgan Haun

A Dissertation Submitted to the Faculty of the Graduate School at the University of Louisville in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Department of Psychological and Brain Sciences University of Louisville Louisville, KY July 2009


TABLE OF CONTENTS

LIST OF FIGURES
ABSTRACT

1. INTRODUCTION
    a. Spatial Vision
    b. Contrast Sensitivity and Brain Structure
        i. Physiology of Contrast Transduction
        ii. Perceptual Contrast Sensitivity
    c. Channel Theory
    d. Natural Scenes and 1/f Amplitude Spectra
    e. Measuring Contrast Sensitivity
        i. Threshold-versus-Contrast Functions
        ii. The Decision Variable
        iii. Modeling Contrast Sensitivity
    f. Broadband Contrast Sensitivity

2. CONTRAST TRANSDUCTION
    a. Overview
    b. Contrast Transduction in the Brain
    c. Perceptual Transduction
        i. The d’ Function
        ii. Masked d’ Functions
    d. Relating the d’ and Contrast Response Functions
    e. Spatial Models
        i. Spatial Filters
        ii. Tying w to Filter Bandwidth
        iii. Using Spatial Filters in Simulations
    f. Uncertainty Theory
        i. Uncertainty Models and Nonlinear Transducers
    g. Adding External Noise
    h. Summary

3. METHODS
    a. Overview
        i. Assumptions and Stimuli
        ii. Transducer Theory Review
    b. Methods and Equipment
        i. Methods for Threshold Measurement
        ii. Equipment and Setting
    c. Threshold Measurement
        i. Obtaining Threshold Measurements
        ii. Distributed Threshold Measurement
    d. Stimulus Content
        i. Stimulus Domain and Target Stimuli
        ii. Pedestal Stimuli
        iii. Mask Stimuli
        iv. Combining Target and Mask Stimuli
        v. Stimulus Generation, Normalization, and Contrast
    e. TvC Paradigms
    f. Subjects and Procedure

4. RESULTS
    a. Overview
    b. Thresholds
    c. Baseline TvC Functions
        i. Dipper Functions
        ii. Orientation
        iii. Spatial Frequency
    d. Masking Functions
    e. Masked TvC Functions

5. ANALYSIS: THE S-F FUNCTION
    a. Measuring TvC Functions
    b. Parameterizing TvC Functions
        i. Interpreting S-F Parameters
    c. Baseline and Masked TvC Parameters
        i. Simplifying Mask Effects
    d. Fixing Parameters Across Spatial Frequency

6. ANALYSIS: NOISE MASKING
    a. Noise in the Contrast Stream
        i. Adding Noise to a Contrast Transducer
    b. Application of a Model Observer
        i. Modeled Sensitivity to Experimental Stimuli
        ii. Testing Noise-Masking Predictions
    c. Model Fitting Using Maximum Likelihood
    d. Summary

7. DISCUSSION
    a. What Kind of Stimulus is 1/f Noise?
    b. Perimetry
        i. Narrowband (‘Baseline’) Contrast Sensitivity
        ii. 1/f Masked Contrast Sensitivity
        iii. Orientation Effects
    c. Gain Control or Noise Masking?
        i. Effects of Gain Control and Surround Suppression
        ii. Noise Masking
    d. Summary

CONCLUSIONS

REFERENCES

LIST OF FIGURES

1. INTRODUCTION

2. CONTRAST TRANSDUCTION
    2.1 TvC and d’ functions
    2.2 d’ function regimes
    2.3 Contrast gain control
    2.4 Observer simulation
    2.5 Full-rectified noise masking
    2.6 Half-rectified noise masking

3. METHODS
    3.1 Simplified d’ functions
    3.2 Trial schematic
    3.3 Stimulus content
    3.4 Target stimuli
    3.5 1/f noise samples
    3.6 Pedestal and mask stimuli
    3.7 Stimulus content and samples
    3.8 Stimulus conditions

4. RESULTS
    4.1 Threshold measurement
    4.2 Normalized baseline data
    4.3 Oriented detection thresholds
    4.4 Baseline TvC functions 1
    4.5 Baseline TvC functions 2
    4.6 Suprathreshold pedestal masking
    4.7 Masking functions
    4.8 Threshold/mask slopes
    4.9 Masked TvC functions
    4.10 Masked and baseline TvC functions
    4.11 TvC function differences
    4.12 Mean TvC function differences

5. ANALYSIS: THE S-F FUNCTION
    5.1 S-F function schematic
    5.2 S-F function node effects
    5.3 All-free parameter fits
    5.4 Baseline by masked parameters
    5.5 Parameters by frequency
    5.6 Parameters by orientation
    5.7 Chi-squared values by model
    5.8 Vary-z fits
    5.9 k-identity test for masking
    5.10 Fixing parameters across frequency
    5.11 k-identity test for spatial frequency

6. ANALYSIS: NOISE MASKING
    6.1 Filters and simulated TvC functions
    6.2 Simulated masking functions
    6.3 Multiple fixed-mask results
    6.4 Human and simulation comparison
    6.5 Model comparison
    6.6 PGC/PEN/vary r comparison

7. DISCUSSION
    7.1 Uncertainty model

ABSTRACT

Tests of visual ability were carried out to measure contrast detection and discrimination thresholds for narrowband spatial patterns. These thresholds were measured in isolation or against a background of broadband 1/f noise. This type of noise was used because of its similarity to the spectral structure of natural scenes, which typically have a frequency-amplitude relationship of a ∝ 1/f. Several possible effects of 1/f noise on sensitivity were postulated: different forms of gain control might operate, shifting and/or depressing the perceptual response to contrast; the noisiness of the added patterns might increase the variability of the perceptual response, thereby affecting sensitivity; or there might be attentional effects whereby an observer’s ability to perform the task efficiently might be affected. Target stimuli were created at several spatial frequencies and orientations, in order both to measure these potential effects and to determine whether some might be more or less effective at certain stimulus ranges. The main finding is that contrast gain control can be considered the dominant factor in masking by 1/f noise; other factors, including stimulus noise and attentional state, also appear to be present, but seem to depend on stimulus and experimental conditions. In the effort to describe these effects, various methods for interpreting contrast discrimination functions are developed and applied to human data.


1. INTRODUCTION

Spatial Vision

Human psychophysical contrast sensitivity is a topic about which much has been learned in the past forty years. The novel approach taken in the experiments described in this document is a focus on contrast sensitivity during viewing of images with naturalistic amplitude spectra. The distribution of contrast at different spatial scales is relatively regular across even a small sample of real-world images, and this regularity provides a way to reproduce 'naturalistic' visual conditions in simple visual stimuli. An understanding of the importance of this topic requires a review of the basics of spatial vision.

Vision is an extremely complex phenomenon, accounting for a large portion of what a human brain is made to accomplish. Images themselves are impossibly complex signals, of virtually infinite variety [1]. The ability of humans to discriminate amongst this variety is as yet undefined, and it will certainly be many more generations before we can begin to understand just what the visual system is capable of seeing, a question which is parallel with that of how it sees. The study of contrast sensitivity is, like any other line of vision science, an effort at simplification of the visual process. All visual information must pass through a contrast stage, where individual neurons respond monotonically to contrasts of particular shapes and sizes. Contrast sensitivity, then, amounts to a front-end image filter in relation to the rest of the visual system, delivering visual information to diverse areas of the broader visual system tasked with encoding more complex and interesting forms of visual information.

Footnote 1: Images are focused light patterns. The interface between light images and the human nervous system occurs in the photoreceptor layer in the retina, which is composed of around a million (10^6) rods and cones. Let’s assume that each photoreceptor is capable of signaling ten light levels, a gross over-simplification. The photoreceptor array then amounts to a set of dimensions in which images can be represented. In that case, at the interface between light and the nervous system, 10^(10^6) unique images can be represented by those photoreceptors: that is, 10^100 multiplied by itself ten thousand times (10^100 × 10^100 × 10^100 × ...). While not infinite, this is a vast space well beyond comprehension. Field (1994) discusses in similar terms the potential for variation within image space, as well as the capacity of the visual system to handle this variation.

Of course, natural science works through simplification, breaking complex problems down into simpler ones which might afford solutions. Vision science, therefore, has focused on elemental stimuli, simple images defined (explicitly or implicitly) as those which are expected to stimulate a particular neural population and to bypass certain other populations. Since neural structures underlie behavior, such stimuli can be used to simplify visual behavior and make it more amenable to study. The problem here is that such simple images sometimes bear very little resemblance to real-world objects, which in their contrast structure are often equivalent to many dozens or even hundreds or thousands of simultaneously viewed ‘elemental stimuli’.

This project was motivated by a desire to describe this stage of encoding in what could be seen as an ecologically valid context. Of course this effort is bound to fall short of such an objective since, as mentioned above, the variety of images is virtually infinite. An effort to study the visual system must use stimuli which are defined in a way which can be understood and used by the experimenter; such definitions must be simpler than the potential variance of the signals being modeled. The effort is perhaps most difficult when the object of study is early vision: the retino-geniculate-striate pathway must carry all visual information in just a few broad streams, and so must impose a particular ‘shape’ on that information which fits the needs of more narrowly defined visual areas deeper into the brain.
In this process, much information present in the stimulus image is lost, deemed redundant or unimportant by different segments of the filtering aspect of the system. This presents something of a paradox for the study of early vision, in that the ideal stimulus for early vision, or ‘spatial vision’, is virtually undefined by experimenters. So, it is necessary to regress to those elemental stimuli, which, although they bear little resemblance to naturalistic stimulation, can nonetheless be tied directly to processes known to occur in the brain.

One of many important differences between the stimuli typically used in spatial vision experiments and those encountered on a moment-to-moment basis by normal observers is that real-world imagery is broadband, composed of contrasts at many scales. Because of what has been learned of the structure of the visual system, spatial vision stimuli are almost always narrowband, sinusoidal patterns. These stimuli are designed to stimulate narrow subsets of available visual mechanisms, so that these can be studied in isolation. However, since in the real world these visual mechanisms (often referred to as ‘channels’) are usually not operating in isolation, we may question the ecological validity of many of the conclusions arrived at through traditional spatial vision studies. For example, the well-known ‘contrast sensitivity function’ (Schade 1956; Wilson, McFarlane and Phillips 1983) purports to describe the bandwidth of the visual system, but is measured with elemental stimuli like sinewave gratings.

The study described in this document takes a different approach, beginning at the notion of the narrowband contrast stimulus which is de rigueur in spatial vision research, and not without good reason. Real-world contrast stimuli are broadband, the equivalent of many narrowband images with different properties combined together in a single location. A convenient fact of mathematics and nature allows that the broadband quality of real-world images is something of a content-neutral property that can be easily reproduced in a visual stimulus with structural properties (e.g. ‘meaningfulness’) independent of its broadbandness. That is, regardless of its visual structure (i.e. whether it depicts an object, a scene, or otherwise) an image may be constructed with any arbitrary amplitude spectrum (i.e. bandwidth), including one which reproduces the regular broadband contrast distribution encountered in real-world images. In the experiments described in this study, all stimuli have randomly generated broadband visual structure, leaving only two major factors to be considered in an analysis of broadband contrast sensitivity: stimulus noise and stimulus contrast. To be sure, the use of broadband noise is not a panacea for the problem of making the study of spatial vision ecologically valid. What it does afford is a bridge for connecting some of the qualities of broadband vision to much of what has been studied regarding contrast perception over recent decades.
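The point that random structure can be given any chosen amplitude spectrum can be made concrete with a short sketch. The Python below is an illustration only, not the stimulus-generation procedure used in the experiments: it sums cosine gratings with random phases and amplitudes proportional to 1/f^alpha, then rescales the result to a requested RMS contrast. The function name, the image size, the `alpha` and `rms_contrast` parameters, and the direct-summation approach (chosen to avoid dependencies; an inverse FFT would be the more usual choice) are all assumptions made for the example.

```python
import math
import random


def one_over_f_noise(size=16, alpha=1.0, rms_contrast=0.2, seed=0):
    """Return a size x size random-phase noise image (list of rows).

    Each spatial frequency component (u, v), in cycles per image, gets
    amplitude f**-alpha with f = sqrt(u**2 + v**2) and a random phase.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    half = size // 2
    # Use one half of the frequency plane, below the Nyquist limit; the
    # other half holds the mirror-image components of a real-valued image.
    components = []
    for u in range(0, half):
        for v in range(-half + 1, half):
            if u == 0 and v <= 0:
                continue  # skip the mean (DC) term and the mirrored half of the u = 0 column
            f = math.hypot(u, v)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            components.append((u, v, f ** -alpha, phase))

    # Sum the gratings at every pixel.
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            value = sum(
                amp * math.cos(2.0 * math.pi * (u * x + v * y) / size + phase)
                for u, v, amp, phase in components
            )
            row.append(value)
        image.append(row)

    # Rescale so the standard deviation of the pixel values (RMS contrast,
    # expressed relative to the mean luminance) equals the requested value.
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    rms = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [[(p - mean) * rms_contrast / rms for p in row] for row in image]


noise = one_over_f_noise()
```

Changing `seed` changes the spatial structure of the sample while leaving its amplitude spectrum and RMS contrast unchanged, which is the sense in which the spectrum is independent of image content.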

Contrast Sensitivity and Brain Structure

Physiology of Contrast Transduction

An observer’s ability to detect differences in luminance across space constitutes his contrast sensitivity. Since contrasts (in luminance as well as wavelength) are what constitute the spatial variegation to which a visual system is exposed, it is only natural that they should be primary in visual coding. Contrast coding begins in the retina, with ganglion cells which pool photoreceptor responses [2] across space in a concentric pattern which gives them selectivity for, essentially, spots of a certain size (Enroth-Cugell & Robson 1966). ‘Center-surround’ organization is the rule with ganglion cells, where stimulation of the central area of a cell’s receptive field causes a change in response (e.g. excitation), and stimulation of the surround area causes an opposite change (e.g. suppression). Ganglion cell responses are carried along the optic nerve to the lateral geniculate nucleus (LGN) in the thalamus, where new visual input is quickly brought up to speed and integrated with whatever else the brain happens to be doing at the moment. Some LGN neurons acquire a faint orientation selectivity, but this is nothing compared to what happens at the next level, in the first area of visual cortex.

Footnote 2: By way of an intermediary network of neurons whose properties contribute to the selectivity of the ganglion cell, to its and the photoreceptors’ adaptive states, and to other response properties which are very interesting in their own right but which are not relevant here.

In primary visual cortex (called V1 in humans and other primates), neurons pool responses [3] in more complex, collinear patterns which afford them more specific tuning in orientation and spatial frequency (Hubel and Wiesel 1962, 1968). Neurons at this level have also apparently consolidated or begun to consolidate information about motion, color, binocular disparity, depth, and other properties. From primary visual cortex visual information spills into a continually branching network, feeding different sorts of image-derived information to different parts of the brain.

Footnote 3: Patterned level-to-level pooling was originally the means by which cortical neurons were thought to acquire sensitivity, since it is the easier means to visualize (Hubel and Wiesel 1962). However, it is now thought that this pooling actually affords only a rather weak orientation selectivity to a neuron, with much selectivity arising through interneuron suppression and facilitation within the cortex (see Ferster and Miller 2000).

Primary visual cortex, the most significant point of departure for image information in the visual system, is where perceptual contrast sensitivity seems to be determined. Prior levels of visual processing, from retina to thalamus, also involve contrast encoding, with increasing degrees of convergence and signal amplification at each level. It seems that it is at V1 where this encoding ends and the result becomes available to the behavioral observer (Graham 1988), particularly since from a physiological stance all the information necessary for behavioral contrast detection seems to be available there (Geisler and Albrecht 1997; Chirimuuta and Tolhurst 2005; Goris, Wichmann, and Henning 2009). We may see V1 as the end of a chain of amplifier processes (Watson 1990), since individual V1 neurons seem to be more sensitive to contrast than neurons located in prior stages (Sclar, Maunsell and Lennie 1990). Beyond V1, visual areas become more sensitive to the spatial structure of contrast, and less concerned with transduction, so V1 seems to be the peak of contrast encoding per se.

Perceptual Contrast Sensitivity

To measure the subjective aspect of contrast sensitivity, as with any experiment designed to measure some perceptual operation, an observer is given instructions as to how to use his working and long-term memory to perform and remember the task he is given, how to use his fingers to respond by pressing some buttons, and how to search for a target. The abilities measured in such a task are in such close qualitative register with the properties of V1 neurons that the two are normally thought to be largely one and the same, with this thinking derived from various theoretical and empirical bases (Graham 1988; Chirimuuta and Tolhurst [Bayesian]; Ross and Speed 1991 [VEP and psychophysics]; Boynton, Demb, Glover and Heeger 1998 [fMRI and psychophysics]; Kwon, Legge, Fang, Cheong, and He 2009 [fMRI and psychophysics]). Orientation and frequency bandwidths for contrast sensitivity are qualitatively similar (see Channel Theory, below) for human and animal observers and for V1 neurons, and cortical response as a function of stimulus contrast closely follows the d’ function for contrast (Boynton et al 1998; Kwon et al 2008). It therefore seems reasonable to simplify the physiology of contrast sensitivity in this way: having gotten his instructions from the experimenter, the observer in a contrast sensitivity experiment uses his primary visual cortex, in conjunction with his memory and his fingers, to carry out his task.

This may in fact be an overspecified view of what is occurring during a contrast sensitivity experiment, since we still are not certain whether the rest of the brain truly is functionally transparent to what occurs in V1. Nothing about a psychophysical experiment, in itself, can reveal the absolute structure of the brain in any more than an abstract sense. The source of behavior is a black box (the brain), or a set of black boxes (memory, vision, action, etc.), and the subjective experience of behaving is also rather opaque: it is in some ways as difficult, or more so, to describe the structures underlying your own behavior as it is to experimentally probe the behaviors of another person. The interpretive advantage, lending confidence to the statements in the previous paragraph, lies in studies not of behavior but of brain and neuron function. Brain imaging and single-unit recording techniques have both produced reams of evidence suggesting that what V1 appears to do with images is, in many ways, fundamentally similar to what an observer does in a contrast sensitivity experiment. This evidence is outlined in the following sections.

Channel Theory

One important property which has led psychophysicists to this conclusion is the selectivity of visual sensitivity. Physiological studies established early on that V1 neurons possess receptive fields [4] which are best fit by a pattern of specific spatial frequency and specific orientation (i.e. they can be drawn as elongated positive-negative ‘lobes’). This, in combination with the spatially localized nature of the receptive fields, means that the typical V1 neuron actually responds to a range of spatial frequencies and orientations, in an apparently near-ideal tradeoff between spatial and spectral localization (Daugman 1989).

Footnote 4: The spatial pattern of excitatory and inhibitory regions which affect the response of a visual neuron is normally referred to as that neuron's 'receptive field'. Today the prefix 'classical' is often appended, since under various conditions modulatory effects on a neuron's response can be detected which do not fit nicely into the concept of a spatial pattern of excitatory and inhibitory regions.

The typical spatial frequency bandwidth of visual cortical neurons, measured as full width at half height of the tuning function, is about 1½ octaves [5] (in cats: Movshon, Thompson and Tolhurst 1978; and in monkeys: De Valois, Albrecht and Thorell 1982), with a central peak frequency, so that spatial frequencies at up to ±¾ octaves from the peak will stimulate the neuron to differing degrees (spatial frequencies beyond this range will have no effect; in the spatial domain, this means that just over 1 full cycle of the peak frequency is represented by the excitatory-inhibitory pattern of the receptive field). These bandwidths appear to narrow somewhat with increasing spatial frequency (this also means that receptive fields must grow somewhat larger, acquiring more well-defined flanking ‘lobes’ with increasing spatial frequency). Orientation bandwidths tend to be around 45° from side to side, peaked at a central orientation (in cats: Wilson and Sherman 1976; and in monkeys: Parker and Hawken 1988), with these bandwidths also narrowing a bit with increasing frequency (meaning that receptive fields lengthen somewhat with increasing frequency; a V1 receptive field tends to be longer along its preferred orientation than it is wide; Foley 2007 reviews physiological and psychophysical evidence to this effect).

Footnote 5: An octave is a doubling in frequency.

Measuring neuron bandwidths is straightforward enough: a neuron is located and recorded from using an electrode while patterns of light are shown to the animal owning the neuron, until a 'best' pattern is found. Psychophysical measurements are by definition indirect, and the presence of selectivity must be inferred by combining stimuli in clever ways in order to deduce the existence of internal mechanisms. Psychophysical sensitivity bandwidths are known largely through three techniques: contrast masking, subthreshold summation, and contrast adaptation. As it happens, these techniques reveal perceptual mechanisms with bandwidths similar to those of cortical neurons: between 1 and 2 octaves in spatial frequency, and up to 40° in orientation.

Masking refers to the decrease in contrast sensitivity that can be caused by superimposing another pattern on a target pattern (the one to be detected). Early studies described bandwidth by testing sensitivity for a sinewave grating with fixed orientation and frequency, while varying the spatial frequency (Wilson et al 1983) or orientation (Campbell and Kulikowski 1967) of a second, high-contrast grating added to the first. If the second grating was very similar to the first in frequency or orientation, sensitivity would be poorer. Using such a technique, orientation masking bandwidths were first tested (Campbell and Kulikowski 1967), with bandwidth estimated to be around 30°. Later masking studies yielded similar results, and suggested that bandwidth narrows with increasing spatial frequency (Phillips and Wilson 1984; Blake and Holopigian 1985). Spatial frequency bandwidths are well described through similar means: many studies have confirmed that the spatial frequency bandwidth of masking is around 1.4 octaves (Stromeyer and Julesz 1972; Wilson, McFarlane and Phillips 1983), broadening a bit toward lower frequencies and narrowing toward higher frequencies.

Contrast adaptation is a decrease in sensitivity following prolonged viewing of a contrast pattern. Viewing a grating pattern of a single spatial frequency results in adaptation only to that and nearby frequencies; again, the bandwidth of this effect has been shown to be around 1½ octaves (Pantle and Sekuler 1968; Blakemore and Campbell 1969). Orientation bandwidths measured by the same methods are similar to those measured through masking: around 40°, narrowing toward higher frequencies (Movshon and Blakemore 1973; Snowden 1991, 1992).
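Because the bandwidths above are quoted in octaves, it can help to convert them into frequency limits. Since an octave is a doubling in frequency, a mechanism with peak frequency f0 and a full bandwidth of b octaves spans f0·2^(-b/2) to f0·2^(+b/2). The sketch below assumes a tuning function that is symmetric on a log-frequency axis; the function names and the 4 cpd example peak are hypothetical choices made for illustration.

```python
import math


def octave_band_limits(peak_cpd, bandwidth_octaves):
    """Lower and upper frequency (cpd) of a band centered on peak_cpd.

    Assumes the band is symmetric on a log2 frequency axis, so half the
    bandwidth lies below the peak and half above it.
    """
    half = bandwidth_octaves / 2.0
    return peak_cpd * 2.0 ** -half, peak_cpd * 2.0 ** half


def octaves_between(f_low, f_high):
    """Number of octaves (doublings) separating two frequencies."""
    return math.log2(f_high / f_low)


# Hypothetical mechanism peaked at 4 cpd with a 1.5 octave bandwidth.
low, high = octave_band_limits(4.0, 1.5)
print(f"{low:.2f} to {high:.2f} cpd spans {octaves_between(low, high):.2f} octaves")
```

Note that the limits are not equidistant from the peak in linear frequency: the upper limit lies further from the peak than the lower one, which is why bandwidths are usually compared on a logarithmic axis.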
Subthreshold summation involves measuring an observer's sensitivity to a pattern whose contrast is maintained just below the detection threshold, while another below-threshold pattern with different orientation or spatial frequency (or other property) is added. The rationale is that if a mechanism is sensitive to both patterns, the sum will be more detectable than its parts; if the two patterns are sensed by two mechanisms with different selectivities, summation will not occur, and the transition between summation and lack thereof can be taken to describe the bandwidth of a mechanism. Studies of spatial frequency (Sachs, Nachmias and Robson 1971; Kulikowski and King-Smith 1973; Quick and Reichert 1975) and orientation (Kulikowski, Abadi, and King-Smith 1973) bandwidths revealed results similar to (or artifactually narrower than) those of masking and adaptation studies.

Given these findings, vision science has taken up a general multiple-filter model within which subsequent findings in spatial vision can be described (e.g. Wilson, McFarlane and Phillips 1983; Watt and Morgan 1985; Watson and Solomon 1997). Such theories take the position that spatial vision consists of a set of narrowband filters, or channels, consisting of receptive field-like detector patterns distributed across the visual field: this conception of spatial vision is very similar to what many assume to be the job of the primary visual cortex. Although the original conception of channel theory was of a set of independent spatial mechanisms, it is now recognized that channel outputs are subject to attention (Carrasco, Penpeci-Talgar, and Eckstein) and spatial context (Chen and Tyler 2008; Polat and Sagi 1993). The outputs of these variously tuned channels can be combined or adjusted relative to one another by parallel or feedback processes in order to facilitate the emergence of visual features like contours, textures, and surfaces.

Natural Scenes and 1/f Amplitude Spectra

A ubiquitous property of natural images (and natural signals in general) can be found in their (Fourier) amplitude spectra. A typical image will be composed of spatial frequency components with amplitudes which decline in inverse proportion to frequency; so, two components will have amplitudes a whose ratio is similar to the inverse ratio of the component frequencies f: $a_1 / a_2 \propto (f_1 / f_2)^{-1}$, or simply $a \propto f^{-1}$ (Carlson 1978; Burton and Moorhead 1987; Field 1987). The 1/f property of real-world imagery has therefore become a focus of spatial vision researchers (e.g. in Field 1987; Tolhurst and Tadmor 1997; Parraga, Troscianko, and Tolhurst 1999; Hansen and Hess 2006), whose standard model is to a great degree explicated in the frequency domain.

Field (1987) noted that a 1/f amplitude spectrum would contain an equal amount of contrast in each multiplicative frequency interval, which is to say that an image with a 1/f spectrum is scale-invariant. For example, a 1/f image (i.e. a spatial domain image corresponding to a 1/f amplitude spectrum) will have an equal amount of contrast within any octave-wide spatial frequency interval (say, between 1 and 2 cpd and between 7 and 14 cpd). Field's theories have been influential in suggesting that the structure of the visual system, with its more-or-less constant spectral bandwidth across frequency, is indicative of an adaptive fit to the visual environment. However, the vast distribution of image types has made formulation of a 'natural scenes'-based vision science difficult, since no standardized natural scene-based imagery can be created to rival the theoretical simplicity of the sinewave grating stimulus.

A compromise has been found in the use of random noise which has the 1/f property (Brady and Field 1995; Field and Brady 1997; Schofield and Georgeson 2003; Henning, Bird, and Wichmann 2002; Essock, DeFord, Hansen, and Sinai 2003). Noise stimuli retain the spectral quality demanded by the experimenter, and can be varied in their spatial structure (albeit randomly) without introducing early-vision attentional confounds such as recognizable objects. Whether noise is indeed a content-neutral stimulus or whether it in fact represents a special category of imagery which must be explained per se (i.e. does visual noise provoke unique higher level or attentional responses?) is a question too far, which must be mentioned here but will not be addressed again until the final section of this document. The channel theory of spatial vision supposes that whatever the structure of a visual stimulus, provided that the spectral content is the same, the response of the earliest layers of the visual system will be much the same (cf. Grossberg and Mingolla 1985; Graham and Sutter 1998). Interesting or recognizable structure attracts attention, and stimulates pattern formation circuits in visual cortex (Li 2000), but these serve only to modulate sensitivity. The advantage of using stimuli with no meaningful structure is that these naturalistic modulations are largely avoided, and so all we are left with is the broadband spectral force of the spatial image. Again, we return to the notion that contrast sensitivity appears to be arranged in such a way that 1/f structure is effectively processed, and in that sense 1/f noise is an appropriate stimulus with which to investigate how the visual system responds under a broadband load, in the relative absence of higher order response modulations and attentional effects.

Using broadband 1/f noise as a visual stimulus, Essock et al (2003) showed that contrast sensitivity for broadband structure is poorest when the structure is horizontally oriented, and best when it is obliquely oriented (the 'horizontal effect').
This result conflicts with the typical contrast sensitivity finding with oriented stimuli: using narrowband stimuli such as sinewave

12

gratings, sensitivity is normally best for vertical and horizontal stimuli, and poorest for the obliques (the 'oblique effect', Appelle 1972). Essock et al suggested that the reason for both phenomena lay in an intrinsic neural bias, where more neurons exist tuned to horizontal and vertical than to oblique orientations (e.g. see Li, Peterson, and Freeman 2003). This bias was proposed to exist both in neurons underlying detection or transmission of contrast information, and in neurons mediating normalization or gain control processes which affect sensitivity. Essock et al proposed that the oriented normalization effect only occurred in the presence of a broadband stimulus, hence the difference in broadband and narrowband detection tasks. Each cortical neuron (or psychophysical mechanism) might be associated with a local gain control pool (Heeger 1992; Foley 1994; Schwarz and Simoncelli 2001), which inhibits the neuron or mechanism if the pool is sufficiently stimulated by nearby frequencies or orientations. As Essock et al’s account goes, in the case of single grating stimuli, the gain control pool is essentially quiet, since it is presumably rather broad and is not sufficiently stimulated by spectrally sparse stimuli. On the other hand a broadband stimulus simultaneously stimulates many visual neurons or mechanisms, and thereby provides sufficient input to the local gain control pools (Hansen, Essock, Zheng, and DeFord 2003). The functional significance of contrast gain control processes has been widely discussed, and it is commonly assumed that gain control mechanisms are adaptive, working to normalize neural responses to stimulus contrast, preventing saturation to intense stimuli and promoting sensitivity to weak stimuli. Since gain control processes are supposed to affect the neural response level to a stimulus, they must be expected to affect perceptual sensitivity to contrast (a strong link can be shown between behavioral sensitivity and fMRI measures, e.g. 
Kwon et al 2008, Boynton et al 1998; and VEP, Ross and Speed 1991). In fact, many researchers have purported to show gain control


effects in contrast sensitivity and discrimination, by measuring d’ functions for ‘target’ contrast stimuli alone or in the presence of additional ‘masking’ contrast stimuli (Greenlee and Heitger 1988; Wilson and Humanski 1993; Foley 1994; Watson and Solomon 1997; Meese and Holmes 2004). Addition of mask stimuli tends to have the effect of depressing or shifting d’ functions (depending on the experiment design; see Haun and Essock 2009), a result which is in broad agreement with the gain control hypothesis. The experiments detailed in this dissertation aim to describe whether this 'gain control' theory of the horizontal effect is accurate, by linking together Essock et al's broadband 1/f stimuli with the body of methods and knowledge accumulated by the more traditional grating-based approach to spatial vision. They also aim to determine whether the gain control model is appropriate in interpreting contrast detection or discrimination data gathered from 1/f noise stimuli, since other factors, particularly the random aspect of the stimuli, may have qualitatively similar effects on sensitivity.

Measuring Contrast Sensitivity

Threshold-versus-Contrast Functions

The experiments included in this project all involve measurements of threshold-versus-contrast (TvC) functions. The standard paradigm goes as follows: contrast sensitivity is measured for a particular target, while the contrast of a background pattern is methodically varied by the experimenter. The background may be identical to the target, in which case it is usually termed a ‘pedestal’. The background may be different from the target, in which case it is usually termed a ‘mask’ and the measured function will normally be called a ‘masking function’.


The normal finding is that as pedestal contrast is increased from zero, sensitivity to the target stimulus at first increases, until the pedestal contrast is about equal to the target’s detection threshold. At this point, as pedestal contrast is increased sensitivity to the target stimulus steadily decreases. The TvC function produced by a variable-pedestal experiment therefore has a dipper-like shape, with threshold facilitation for low-contrast pedestals, and threshold elevation for high-contrast pedestals. When the background is not similar to the target, an ordinary finding is that threshold elevation begins soon after the background exceeds its own detection threshold, with no preceding facilitatory dip 6 (Ross and Speed 1991; Foley 1994). TvC (and masking) functions are useful empirically because they can be used to reconstruct an observer’s internal ‘sensitivity function’ for a particular type of stimulus, which is then tied directly to the perceptual response to the stimulus (Cannon and Fullenkamp 1991; Levi, Klein and Chen 2005). There are two principal mysteries here, both of which can be addressed through measurement of TvC functions. First, we want to know the difference between ‘isolated channel’ contrast sensitivity and broadband contrast sensitivity. Are perceptual responses to contrast amplified or suppressed while viewing naturalistic spatial structure? Do orientation anisotropies alter significantly? Does the spatial frequency of contrast structure affect how well it is seen in 1/f spatial structure? Measuring TvC functions across the visible spatial spectrum, both with and without a masking pattern of 1/f noise, can go a long ways in answering these questions. Second, TvC functions can be modeled from several different perspectives, some of which may be more or less appropriate in a 1/f noise-masking context. 
Modeling allows us to parameterize data in potentially meaningful ways, so that in this case it can be speculated as to what mechanisms cause changes to sensitivity as a result of adding 1/f noise to a stimulus.

6

Under some circumstances, dipper functions are obtained for masks which appear to be rather different from the target stimuli (e.g. Meese et al 2007). However, in these cases the mask may still have sufficient ‘similarity’ or spectral overlap with the target that the conditions for producing a dipper-like function are still met.


Attentional factors may be affected, as well as contrast gain control mechanisms and visual mechanisms meant, perhaps, for excluding external noise. In the next section, the basic psychophysical underpinnings of detection and discrimination are discussed. The internal, perceptual response to contrast, which is used by observers in a task which measures TvC functions, is a theoretical construct, to which many difficult issues are attached. Though threshold measurement is straightforward, the perceptual quantity being measured has historically been difficult to understand, though some grasp of its nature is necessary if we are to understand what, exactly, is being measured and parameterized in experiments such as the ones described here.

The Decision Variable

In a contrast sensitivity experiment, what is being measured is the observer’s ability to respond to a stimulus. Psychologists and those of us capable of introspection will assume that these voluntary responses are secondary responses, made by the observer according to the form of an involuntary sensory response, the observer's ‘decision variable’. This internal response may be produced by an interaction between the signal and the visual system, or it may be a random fluctuation which is independent of the signal. According to (an absolute application of) signal detection theory, if a signal is present, however small, there is a corresponding response; also, there are always noises present in any system, which if independent of the signal level will combine with the response additively. The relative contribution to perception of these two factors is normally described as the signal-to-noise ratio (the sensitivity statistic d'; Green and


Swets 1966) 7, and this ratio can be measured by measuring an observer’s behavioral responses to stimuli. The ability of an observer to consistently respond to the presence of a stimulus, to detect it, indicates a relatively high d', where the signal-dependent response level regularly exceeds the noise level. Inability to detect the stimulus indicates a low d', where the response is insignificant in comparison with the noise level. As the contrast of a pattern is increased from zero, the relationship between contrast and the decision variable d' is expansive (Nachmias and Sansbury 1974; Stromeyer and Klein 1974; Foley and Legge 1981), corresponding to the near-threshold facilitation described in the previous section. The decision variable increases a little faster than the square of contrast until the threshold is surpassed (i.e. until the signal-dependent response exceeds the noise level). This relationship then inverts, and the internal response becomes compressive (The ubiquitous ‘law of diminishing returns’; Campbell and Kulikowski 1967; Nachmias and Sansbury 1974; Legge 1981), corresponding to the aforementioned increasing threshold elevation with stimulus contrast. A given d' level is associated with a certain stimulus intensity, this relationship forming a d' function (described later in this text; the form of the function was first described by Stromeyer and Klein 1974); and differences in this d' level correspond to differences in stimulus intensity. Near threshold, this is intuitive: d' goes from a value of 0 when stimulus intensity is far below threshold to a value around 1 or 2 as ‘detection’ sets in, and this transition is measurable. Higher intensities are harder to understand in this context, since subjectively they seem ‘perfectly’ visible (i.e. subjectively they seem to have a d' at infinity). 
However, by signal detection theory even high intensities are only probabilistically detectable, so that a given suprathreshold intensity may in theory produce a d' value of, say, 10, although this performance level would be

7

d' is the difference in an observer’s hit rate and false alarm rate in units of the standard deviation of the decision variable noise distribution (z-scores). In the two-alternative forced-choice paradigm, d' corresponds directly to the observer’s hit rate.


impossible to measure in practice (corresponding to a probability-of-seeing higher than 99.999%!), requiring hundreds of thousands of experimental trials. Despite the impracticability (or incomprehensibility, if the absolute application of SDT is rejected) of measuring suprathreshold sensitivities directly, they can be inferred by measuring an observer’s ability to discriminate between two intensities. If for two suprathreshold stimuli an observer’s d's are, respectively, 9 and 10, his ability to discriminate between them will be described by a d' of 1, which is easily measurable. If a constant d' value is sought at many different stimulus intensities, a threshold versus contrast function is obtained, which can then be used to reconstruct the suprathreshold d' function. This method is now the standard means of estimating the relationship between internal response and stimulus intensity (e.g. see Ross and Speed 1991; Foley 1994; Meese and Holmes 2002; Yu, Levi, and Klein 2003; Huang and Dobkins 2005; Ling and Carrasco 2006; Bex, Mareschal and Dakin 2007).
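The arithmetic in this section can be illustrated with a small sketch. It assumes the standard equal-variance Gaussian form of signal detection theory, under which an unbiased observer in a two-alternative forced-choice (2AFC) task is correct with probability Φ(d'/√2); the function names are hypothetical and chosen only for this illustration.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def percent_correct_2afc(d_prime):
    """Proportion correct in 2AFC, assuming equal-variance Gaussian noise
    and an unbiased observer (an assumption of this sketch)."""
    return normal_cdf(d_prime / math.sqrt(2.0))


def discrimination_d_prime(d_low, d_high):
    """d' for telling apart two stimuli whose individual d' values are known."""
    return d_high - d_low


# Detecting a stimulus with d' = 10 is practically error-free ...
p_detect = percent_correct_2afc(10.0)
# ... but discriminating d' = 9 from d' = 10 is a d' = 1 task, well below ceiling.
p_discriminate = percent_correct_2afc(discrimination_d_prime(9.0, 10.0))

print(f"2AFC proportion correct at d' = 10: {p_detect:.12f}")
print(f"2AFC proportion correct for 9 vs 10: {p_discriminate:.3f}")
```

Under these assumptions the detection task sits so close to ceiling that errors would almost never be observed, while the discrimination task yields roughly three correct responses in four, which is why discrimination thresholds are the practical route to the suprathreshold d' function.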

Modeling Contrast Sensitivity

The shape of the d' function is well-described. Its nature is still something of a mystery, as d' is simply a representation of a signal-to-noise ratio. So, it is very difficult to know what different parts are played by the signal and noise components of the observer's internal response. The simplest route is to take the d' function at face value, and assume that there is a constant noise level against which an internal stimulus-dependent signal first increases expansively for low contrasts and then compressively for high contrasts. Although the most convenient forms of contrast sensitivity modeling take this approach (e.g. Meese and Summers 2007; also see Pelli


1991 8), evidence for it has been scarce. Meanwhile, it is taken to be a null hypothesis, and those interested in the question have set out to show cases where the internal noise level changes with stimulus contrast and, therefore, with internal response level (Pelli 1985; Kontsevich, Chen, and Tyler 2002; Gorea and Sagi 2001; Katkov, Tsodyks and Sagi 2007; Dosher and Lu 1999; Levi, Klein and Chen 2005). The different features of the d’ function are accounted for elegantly by the parameters of the following equation, proposed by Legge and Foley (1981) 9:

$$ d' = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} $$

Here p describes the rate of d’ expansion, q the rate of compression, r indicates the magnitude of the function, and z is a contrast value near the threshold. This model is explored in detail in Transducer Theory, but it suffices to say that while it is a convenient means of parameterizing contrast discrimination data, it is still merely descriptive, and was developed independently of other investigations into the neural and psychophysical processing of contrast. It is therefore something of a black box, one that accurately describes detection and discrimination, but which in itself affords little additional information. An attempt is made in this project to impart considerations of external and internal noise, and other structural factors likely to be involved in visual processing, to this model.
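A minimal numerical sketch shows how this function produces a dipper-shaped TvC function. It reads the equation above as d' = r·c^p / (z^(p−q) + c^(p−q)), so that d' grows as c^p well below z and as c^q well above it. The parameter values and function names are illustrative assumptions, not fitted values from this project.

```python
def sf_d_prime(c, r=30.0, p=2.4, q=0.4, z=0.01):
    """d' as a function of contrast c (0..1) under the equation above.
    Default parameter values are illustrative assumptions only."""
    if c <= 0.0:
        return 0.0
    s = p - q
    return r * c ** p / (z ** s + c ** s)


def increment_threshold(pedestal, criterion=1.0, **params):
    """Smallest contrast increment that raises d' by `criterion`
    above its value at the pedestal, found by bisection."""
    base = sf_d_prime(pedestal, **params)
    lo, hi = 0.0, 1e-6
    # Grow the upper bracket until the criterion is exceeded.
    while sf_d_prime(pedestal + hi, **params) - base < criterion:
        hi *= 2.0
    # d' increases monotonically with contrast, so bisection converges.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sf_d_prime(pedestal + mid, **params) - base < criterion:
            lo = mid
        else:
            hi = mid
    return hi


detection = increment_threshold(0.0)       # no pedestal: detection threshold
dip = increment_threshold(detection)       # pedestal near detection threshold
elevated = increment_threshold(0.5)        # high-contrast pedestal

print(f"detection threshold:         {detection:.4f}")
print(f"threshold on weak pedestal:  {dip:.4f}")
print(f"threshold on strong pedestal: {elevated:.4f}")
```

With these assumed parameters the increment threshold falls below the detection threshold for a pedestal near threshold and rises well above it for a strong pedestal, which is the facilitation-then-elevation pattern described for TvC functions.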

8

Pelli reasoned that the main source of noise in the visual system should be photon noise; since photon noise is Poisson, it is dependent on the mean luminance level and so should be relatively independent of stimulus contrast.

9

An earlier, considerably less elegant, but conceptually equivalent set of equations was supplied by Stromeyer and Klein (1974), and so this function in some quarters is known as the Stromeyer-Foley, or S-F function.


Broadband Contrast Sensitivity

The experiments described here were designed to measure contrast transduction in the human visual system as a function of spatial frequency and orientation. ‘Transduction’ in this case refers not to the transformation of luminance contrast into neural response; rather, it refers to the transformation of luminance contrast into a salient, subjectively perceived entity. This psychophysical construct, salience, is identified with the signal-to-noise ratio, commonly called d', used by the observer in making judgments during a psychophysical task. The transduction process is measurable through a number of distinct methods, including the method to be used in these experiments: the contrast discrimination task. The pattern of transduction as a function of spatial frequency (and orientation) is a complex structure, since transduction is not a simple (e.g. linear) process; but it is possible to describe it rather precisely through mathematical parameterization. The unique aspect of these experiments is that they will describe this d' structure during stimulation of the visual system by a field of 1/f spatial noise. 1/f describes the property of a Fourier spectrum containing components whose amplitudes decline with increasing frequency in inverse proportion to frequency. A 1/f amplitude spectrum is characteristic of real-world imagery, which has made it a statistic commonly used by vision scientists wishing to create stimuli which share properties with naturalistic imagery 10. So, to the extent that 1/f spatial noise can be treated as a ‘naturalistic’ spectral stimulus, these experiments aim to describe the transduction space of human vision during spectrally naturalistic viewing conditions. Over

10

I use ‘naturalistic’ to mean ‘not in an experimental setting’, meaning what the human visual system sees from moment to moment in its normal, natural environment. I do not use it to mean ‘images from nature’, i.e. excluding evidence of human presence; the human environment naturally includes evidence of human presence. At any rate, the distinction is not necessarily relevant here. Further specifiers as to ‘image category’ will be introduced when and if necessary.


the past two decades studies attempting to show links between basic spatial vision capacities and the statistics of natural scenes have become more and more common as computing power has increased, augmenting the ability of researchers to both analyze large numbers of naturalistic images (i.e. photographs) and to use such images, or components of them, as stimuli in vision experiments. Much is known about contrast transduction when the stimuli are distinct and simple, easily described patterns based on sinewave gratings. Using gratings as stimuli, contrast sensitivity (in the form of detection thresholds) as a function of spatial frequency follows a well known pattern, commonly called simply the contrast sensitivity function, or CSF. The CSF is obtained by separately measuring an observer’s detection threshold for sinewave gratings of many different spatial frequencies against an otherwise blank field of some constant luminance. If the CSF is measured against a background identical with the target pattern, rather than against a blank field, the shape of the CSF will become flatter and flatter the higher the contrast of the background, though the acuity limit will remain as it represents a physical limit of the visual system. This flattening of the CSF has been linked to another finding, that of ‘contrast constancy’ with frequency. If observers are asked to estimate 11 the apparent contrast of a pattern of some spatial frequency, their estimates will become more and more similar as a function of spatial frequency the higher the contrast of the pattern, despite potentially large differences in their ability to detect the different patterns. An example is in order: say we have two sinewave gratings, one at 4cpd (near the normal peak of photopic contrast sensitivity), and one at 20cpd (between peak sensitivity and the acuity limit, where sensitivity is poorest). If both gratings are

11

Several methods, none ideal, are available for such a task. Luckily they seem to be in close agreement on this finding.


set to a contrast near the 4cpd detection threshold, the 4cpd grating will be faintly visible, and the 20cpd grating will be invisible. If both gratings are set to a contrast near the 20cpd detection threshold, the 4cpd grating will be clearly visible, while the 20cpd grating will appear faint. If the contrast level of the two gratings is increased above this level, the apparent contrast of the 20cpd grating will soon appear to ‘catch up’ with the intensity of the 4cpd grating. So, as contrast is made greater than the detection threshold, apparent contrast is less and less dependent on sensitivity and spatial frequency. This discussion of apparent contrast is not detached from the topic of contrast transduction, because, as I implied in the first paragraph of this section, apparent contrast, or salience, is actually the product of transduction. That is, we can assume that as stimulus intensity is increased, an internal d' level increases, and that this internal d' level (or a nearly linear function of it) is interpreted by the observer as the intensity of the stimulus. This assumption is plausible because of the close agreement between contrast discrimination studies and apparent contrast studies. All indications are that contrast transduction is mostly flat as a function of orientation and spatial frequency, with a CSF-shaped ‘front’ corresponding to the contrast detection threshold. The hypothesis behind these experiments is that transduction is reconfigured into a different shape when the visual system is loaded with a 1/f stimulus. Results obtained in an earlier experiment provide compelling evidence that contrast transduction is significantly altered during viewing of 1/f noise (Haun and Essock 2009). In that study, contrast sensitivity was tested for oriented bands of spatial noise, either alone (against a blank, mean luminance background) or when masked by a 1/f broadband noise pattern. 
They found that in the presence of 1/f noise, masking decreased with target spatial frequency, at frequencies masking was greater


for horizontal than vertical targets, and that at high frequencies it was greatest overall for horizontal targets. Certain features of their results, described in the next chapter, implicated a gain control process in these anisotropic masking effects. This finding was consistent with Essock et al’s earlier supposition that the horizontal effect of broadband contrast perception was due to anisotropic response normalization, through a contrast gain control mechanism. These experiments aim, in part, to determine the extent to which the gain control model can reasonably be applied to 1/f noise stimuli, and to what extent other factors, such as noise masking and attentional uncertainty, must also be considered. It is necessary here to go through a brief introduction to the notion of response normalization and contrast gain control. First, for an understanding of how these processes might be occurring, consider what is known of the visual mechanisms underlying spatial vision. The CSF described above is not a uniform, integrated structure; instead, it appears to be the envelope of a set of much more narrowband contrast detecting mechanisms. Once this discovery was made in the late 1960s and early 1970s, for some time it was widely assumed that these mechanisms were largely independent of one another. That is, it was thought that a target stimulus detectable by one mechanism could not be effectively masked (i.e. could not have its threshold raised) by a stimulus detectable by a ‘far away’ mechanism 12. More recent findings have established that different mechanisms can indeed interact in unpredictable ways. The primary finding has been that of ‘cross-orientation masking’, where two gratings of similar spatial frequencies but orthogonal orientations can interfere significantly with each other’s visibility (Foley 1994; Meese and Holmes 2007), although at the cortical level they must be detected by different mechanisms. 
This is consistent, however, with the discovery of

12

‘Far away’ in spectral and spatial domains, i.e. at a very different frequency, orientation, or spatial location. ‘Far away’ can be useful as a term describing two mechanisms whose ranges of sensitivity do not overlap significantly.


cross-orientation suppression by cortical neurons (Bonds 1991). Many researchers have, on the basis of the physiological findings, concluded that gain control serves a normalizing function, ensuring that neurons do not saturate and thereby lose their ability to signal differences in contrast or spatial detail. The fact that saturation is easily evoked in cortical neurons by simple, artificial stimuli is a good illustration of the idea that ‘correct’ neural functioning, and therefore visual functioning, may not be well-elicited by such stimuli. As part of a study of contrast and contrast modulation sensitivity in noise, Schofield and Georgeson (2003) showed that a CSF measured (using gratings) in the presence of a 1/f noise mask was peaked at a much higher frequency than a ‘standard’ CSF. They suggested that the result was due to broader perceptual channel bandwidths toward the lower frequencies, resulting in greater stimulus variance being pooled towards low frequencies, thereby raising thresholds relative to higher frequencies. They approached the problem from an ‘equivalent noise’ perspective, which treats the masking stimulus as adding to the external noise of the detecting mechanism. An important aspect of the standard equivalent noise paradigm is that noise always masks a target (or has no effect), with an increase in noise amplitude corresponding only to an increase in threshold. In a study with many features in common with the Georgeson and Schofield experiments, Haun and Essock (Submitted) measured oriented CSF functions in the presence of 1/f noise, and found a qualitatively similar result. They found, however, that at the peak of this CSF function, sensitivity was several decibels better than would have been expected with a narrowband mask of the same contrast. That is, they found that inclusion of content beyond the target frequencies and orientations facilitated contrast thresholds, rather than elevating them. 
Haun and Essock measured threshold versus contrast functions for the facilitated targets, using either narrowband


or broadband pedestals, and found a nearly constant degree of facilitation by the broadband pedestals. Their data were consistent with the gain control model of contrast masking as proposed and supported by previous authors (Foley 1994, etc.), but were not amenable to the equivalent noise model used by Schofield and Georgeson. Schofield and Georgeson did not consider their result from the perspective of a contrast gain control model, which is under some conditions redundant with the equivalent noise model (specifically, when the mask does not stimulate the spatial filter which detects the target). Importantly, Schofield and Georgeson’s interpretation of their result did not predict that any threshold facilitation might occur due to the presence of 1/f spatial noise. While facilitation is not precluded within the equivalent noise paradigm, it is not predicted by the basic model. This result suggests that it may be more profitable to consider 1/f noise masking within a gain control model rather than as addition of external noise. The next chapter describes in detail the various alternative means of modeling data collected in contrast and noise masking experiments. It can be shown that the degree of threshold facilitation obtained in Haun and Essock’s experiments is unlikely to be the result of any reasonable permutation of the noise masking theory, and that a shift in the input-response function, usually referred to as contrast gain control, is a necessary part of any explanation. The pattern of response by the visual system during stimulation by 1/f spatial structure is, for the most part, unknown. Many recent studies have used broadband stimuli, including noise and photographs of real world scenes, in vision experiments, without any detailed understanding of how the contrast in these images is being transformed into an internal signal, interpretable by the observer. 
Broadband stimuli are normally treated as excessively complex and not amenable to a narrowband, ‘classical’ mechanism-based approach. However, the results


of Schofield’s and Haun’s respective studies demonstrate that useful data, amenable to classical channel-theory analysis, can indeed be gathered using 1/f stimuli. The specific study described in this document represents an attempt to express, in precise, classical terms, the perception of contrast in broadband, 1/f noise imagery. The next chapter is focused on an explication of what I have termed ‘contrast transducer theory’. It is motivated by physiological evidence that, under controlled conditions, contrasts of different intensities provoke internal responses of monotonically related intensities. Various theories of the structure and nature of this internal response must all be considered and compared for a complete analysis of a set of contrast detection and discrimination data. With this explication provided, subsequent chapters can focus on application of appropriate models to data collected in these experiments.
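The 1/f amplitude spectrum that defines the noise stimuli discussed in this section can be made concrete with a short sketch. It builds a one-dimensional signal whose Fourier amplitudes are exactly 1/f with random phases, then recovers the spectrum to confirm the slope. This is an illustration only: the stimuli in these experiments are two-dimensional images, and the function names, signal length, and seed are assumptions of the sketch rather than the stimulus-generation procedure described in Chapter 3.

```python
import cmath
import math
import random


def one_over_f_noise(n=64, seed=0):
    """Return n real samples whose amplitude spectrum falls as 1/f."""
    rng = random.Random(seed)
    half = n // 2
    spectrum = [0j] * n  # DC and Nyquist terms stay zero (zero-mean signal)
    for f in range(1, half):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        coeff = cmath.rect(1.0 / f, phase)   # amplitude 1/f, random phase
        spectrum[f] = coeff
        spectrum[n - f] = coeff.conjugate()  # Hermitian symmetry gives a real signal
    # Inverse discrete Fourier transform by direct summation (fine for small n).
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def amplitude_spectrum(signal):
    """Fourier amplitudes for frequencies 0 .. n/2 - 1."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


noise = one_over_f_noise()
amps = amplitude_spectrum(noise)

# Amplitude times frequency should be constant for a 1/f spectrum.
for f in (1, 2, 4, 8, 16):
    print(f"f = {f:2d}  amplitude * f = {amps[f] * f:.6f}")
```

Randomizing only the phases while fixing the amplitudes is one common way to obtain noise with a prescribed spectrum; a two-dimensional version would assign amplitude by radial frequency in the same manner.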


2. CONTRAST TRANSDUCTION

Overview

To understand the basis of the experiments described in later chapters, it is necessary to have some familiarity with psychophysical theories of contrast transduction. Much of this chapter involves a review of existing theories; some is a new synthesis of existing ideas, and some of the resulting predictions are novel and relate directly to the use of broadband stimuli in spatial vision experiments. Still, for those wishing to get directly to the experiments, this chapter can be used as a reference for material that comes up later, particularly in Chapter 5, and a ‘contrast transduction primer’ has been provided at the beginning of Chapter 3 to outline the most basic ideas necessary to understand the basic objectives of this project. It is generally easier to discriminate between two very weak contrasts than it is to discriminate between two strong contrasts. This is one of many examples of perceptual behavior which follows Weber’s law (or Stevens’ law), where discrimination becomes more difficult with increasing stimulus intensity. In addition, discrimination between weak contrasts can be easier than discriminating a weak contrast from a blank background (Nachmias and Sansbury 1974). These two phenomena combine to produce the characteristic dipper function for contrast discrimination, as shown in Figure 2.1a (Legge and Foley 1980). The dipper function obtained in contrast discrimination experiments can be explained in a number of ways, and furthermore is similar to discrimination functions in other visual modalities (reviewed by Solomon 2009). The

increase in threshold with suprathreshold pedestals can be seen as the result of a compressive contrast response combined with late additive noise (e.g. Gorea and Sagi 2001 13), as an expansive contrast response combined with signal-dependent multiplicative noise (Kontsevich, Chen, and Tyler 2002) or as any point in between these extremes. Likewise, the facilitatory dip at near-threshold contrasts can be interpreted as an expansive contrast response combined with late additive noise (Foley and Legge 1981; Kontsevich et al 2002), or as a linear (or compressive) response with signal-dependent noise which decreases with increasing contrast (cf. Katkov, Tsodyks, and Sagi 2007; Pelli 1985). While the suprathreshold portion of the TvC function remains ambiguous to modelers along the additive-multiplicative noise continuum, there are distinct models of near-threshold facilitation which take different approaches regarding the problem of signal and noise contributions to sensitivity. Nonlinear transducer theory makes use of a single equation to predict contrast discrimination behavior, referred to here as the Stromeyer-Foley (S-F) function (Stromeyer and Klein 1974; Legge and Foley 1980; Yu, Klein and Levi 2003). This theory is purely operational, making no implicit assumptions about the differential contributions of response or noise to sensitivity, and produces a nonlinear d’ function which can be matched to experimental data (Figure 2.1b). On the other hand, uncertainty theory (Pelli 1985) for example explicitly presumes that near threshold, internal noise declines with increasing signal contrast, after which increasing threshold with suprathreshold pedestals is the result of a compressive d’ function. 
In fact it is doubtful that anything but accurate modeling of the neural interactions underlying contrast perception can resolve these mysteries; there have been attempts at such modeling (Geisler and Albrecht 1997; Chirimuuta et al 2005; Goris, Wichmann and Henning 2009), but at this stage all are simply hypotheses which will gain or lose

13

Additive noise is also presumed to be the dominant factor limiting sensitivity by those who would link physiological response to sensitivity as measured by fMRI (Kwon et al 2008; Boynton et al 1998) or VEP (Ross and Speed 1991).


support in the future, and at present none can be confirmed or disconfirmed by psychophysical evidence.

Figure 2.1 Typical results and theoretical interpretation of a contrast discrimination experiment. Left: threshold versus contrast function. Abscissa axis represents the contrast of a pedestal or standard stimulus, and ordinate axis represents the amount contrast must be increased from the standard for successful discrimination. Right: sensitivity function, in units of d’, for the threshold versus contrast function shown at left. Abscissa is pedestal contrast – ordinate is psychophysical sensitivity to that contrast. In a discrimination experiment, ‘standard sensitivity’ is set by the standard contrast, and an increase in contrast corresponding to a particular increase in d’ – in this case 1.3, 82% correct in a 2AFC experiment – is sought. As represented by the gray areas at right and the corresponding black arrows at left, at different pedestal contrasts the same d’ increase will correspond to different contrast increases, thereby forming the threshold versus contrast function.

The effect on contrast sensitivity of adding a masking stimulus to a contrast discrimination display (a double masking paradigm) can also be explained in a number of different ways. Adding a fixed-contrast mask to a discrimination display raises thresholds for low pedestal contrasts, and normally has little to no effect on thresholds at higher pedestal contrasts (Ross and Speed 1991; Foley 1994; Meese and Holmes 2004). This effect can be described as a shifting of the dipper function to higher contrasts, upward and rightward on log-log contrast axes. A modification of the S-F function has been successful at modeling this


change as a function of mask contrast. Importantly in the context of viewing naturalistic, broadband imagery, this modification can be linked directly with the phenomenon of cortical contrast gain control (Heeger 1992; Ohzawa, Sclar, and Freeman 1985; Schwarz and Simoncelli 2001). Gain control mechanisms have been proposed as a means for normalizing image contrasts to prevent response saturation in the visual system (Carandini, Heeger, and Movshon 1997; Wainwright, Schwarz, and Simoncelli 2002), which has led to suggestions that such normalization might be particularly powerful in the context of complex, anisotropic broadband imagery (Hansen et al 2003). A second option for explaining masking effects can be found in uncertainty theory, which could be modified conceptually to produce a similar effect in a double masking paradigm. This would require invoking changes in observer attention to the stimulus display as a result of introducing a mask. No such model of pattern masking has been described, though it is theoretically feasible, and plausible given what is known of visual processing. The absence of studies using broadband stimuli in contrast masking experiments accounts for the lack of existence of an uncertainty theory for contrast masking. As a final option, the masking stimulus can be described as an added source of external (to the observer) noise, passed through a nonlinear transducer. This also can potentially account for the dipper-shifting visual behavior in a double masking paradigm. Fortunately, these three alternatives make discriminably different predictions regarding the precise form of the masked dipper function. The alternative models of contrast discrimination are detailed below. In subsequent chapters, data can then be interpreted in terms of the model parameters introduced here.
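The dipper-shifting effect of a fixed-contrast mask can be sketched numerically. The form used below, in which the mask contributes a weighted term to the denominator of the S-F function, is an assumption in the spirit of divisive gain control models; it is not necessarily the modification used later in this document, and `mask_weight`, the parameter values, and the function names are all hypothetical.

```python
def masked_d_prime(c, mask=0.0, mask_weight=1.0, r=30.0, p=2.4, q=0.4, z=0.01):
    """S-F style d' with an ASSUMED divisive mask term in the denominator.
    With mask = 0 this reduces to the unmasked function."""
    if c <= 0.0:
        return 0.0
    s = p - q
    return r * c ** p / (z ** s + c ** s + mask_weight * mask ** s)


def increment_threshold(pedestal, mask=0.0, criterion=1.0):
    """Contrast increment that raises d' by `criterion`, found by bisection."""
    base = masked_d_prime(pedestal, mask)
    lo, hi = 0.0, 1e-6
    while masked_d_prime(pedestal + hi, mask) - base < criterion:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if masked_d_prime(pedestal + mid, mask) - base < criterion:
            lo = mid
        else:
            hi = mid
    return hi


mask_contrast = 0.1  # illustrative fixed mask contrast
for pedestal in (0.0, 0.01, 0.03, 0.1, 0.3, 0.5):
    unmasked = increment_threshold(pedestal)
    masked = increment_threshold(pedestal, mask=mask_contrast)
    print(f"pedestal {pedestal:4.2f}: unmasked {unmasked:.4f}, "
          f"masked {masked:.4f}, ratio {masked / unmasked:.2f}")
```

Under these assumed parameters the mask raises thresholds several-fold at low pedestal contrasts and changes them only modestly at high pedestal contrasts, which is the qualitative pattern described for double masking experiments.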

Contrast Transduction in the Brain

Contrast can be defined most simply as a spatial discontinuity in luminance. The first layer of the visual system, including the photoreceptors and the network gathering and modulating their outputs, contains neurons which essentially encode the presence or absence of light. A spot of light focused on the retina will therefore result in a similarly sized spot of neural activity in the same location. A ganglion cell responds to a localized spot of activity in the prior layers, but its response can be suppressed or in some cases eliminated by making the spot larger. So, the best stimulus for a ganglion cell will tend to be a spot of light of a particular size and location: in other words, a spatial discontinuity in luminance. Such discontinuities can be augmented by surrounding these spots of light with ‘opposite’ types of light: a bright spot should be surrounded with darkness and vice-versa, a green spot surrounded by red, and so on. This is the beginning of contrast encoding by the visual system. A ganglion cell’s response to an ideally shaped stimulus increases as the contrast of the stimulus increases. This increase is relatively linear above the neuron’s threshold, followed at higher contrasts by response compression and saturation at a maximum response level (Barlow and Levick 1969). LGN neurons show a similar pattern, both in the shapes of their receptive fields and in their pattern of contrast response. This pattern is referred to as the contrast response function (CRF). V1 neurons acquire orientation selectivity, and again follow a similar CRF pattern. Retino-cortical changes in CRF shape are best described as an increase, from level to level, in their steepness as the response emerges from the threshold (Sclar, Maunsell, and Lennie 1990). It has long been known that visual neuron CRFs are well-described by a hyperbolic ratio function, named the Naka-Rushton function for the good fit to photoreceptor luminance response

functions discovered by its namesakes (Naka and Rushton 1966; Albrecht and Hamilton 1982; Sclar et al 1990). The function takes the following form, with c representing stimulus contrast or, more precisely, the response of a linear operator (such as a receptive field) to contrast:

$$R(c) = R_{max} \cdot \frac{c^{n}}{c_{50}^{n} + c^{n}} \qquad (2.1)$$

The steepening of contrast response in neurons along the retino-cortical pathway is measured by the Naka-Rushton function as an increase in the exponent n, with c50 denoting the contrast region of the relatively linear regime of the CRF 14. For retinal ganglion cells, n takes on values near 1.0 (Guenther and Zrenner 1993), for LGN neurons n tends to values between 1.0 and 2.0 (Derrington and Lennie 1984; Sclar, Maunsell, and Lennie 1990), and in V1 the value of n normally exceeds 2.0 (Sclar, Maunsell, and Lennie 1990; Albrecht and Geisler 1991). In each case, if contrast is viewed in multiplicative (log) units, the CRF will be sigmoidal. Mass recording techniques such as VEP (Ross and Speed 1991), fMRI (Boynton et al 1999; Kwon et al 2008), and optical imaging (Zhan, Ledgeway, and Baker 2005) confirm that cortical neurons en masse behave in the same manner, with an expansive increase in response to low, near-threshold contrasts, a quasi-linear contrast-response above threshold, and a compressive, saturating response at higher contrasts. Ganglion cell and LGN neuron response functions change as a function of local mean luminance, maintaining effective dynamic ranges through contrast and luminance gain control mechanisms (Shapley and Enroth-Cugell 1984; Mante, Frazer, Bonin, Geisler, and Carandini 2005). The CRFs of cortical neurons, in particular, are not fixed. Contrast response in visual cortex is modulated not by local luminance, but by local contrast and also by attentional factors.

14 c50 is named the ‘semi-saturation constant’ in neuroscience literature, because it specifies the contrast at which R(c) reaches half the maximum response level of Rmax. It is therefore sometimes used as a rough measure of the neural response threshold.
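The steepening described by Equation 2.1 can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the function name naka_rushton, the value c50 = 0.2, and the three exponents are assumptions chosen only to fall inside the ranges quoted above for ganglion, LGN, and V1 neurons.

```python
def naka_rushton(c, r_max=1.0, c50=0.2, n=2.0):
    """Hyperbolic ratio of Eq. 2.1: R(c) = Rmax * c^n / (c50^n + c^n)."""
    return r_max * c**n / (c50**n + c**n)

# Illustrative exponents only (assumed values, not fitted to any data).
stages = {"ganglion-like": 1.0, "LGN-like": 1.5, "V1-like": 2.5}

for name, n in stages.items():
    row = [round(naka_rushton(c, n=n), 3) for c in (0.05, 0.1, 0.2, 0.4, 0.8)]
    print(f"{name:14s} n={n}: {row}")
```

Whatever the exponent, the response at c = c50 is half of Rmax; larger exponents lower the response below c50 and raise it above, which is the steepening referred to in the text.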

The first mechanism is known as contrast normalization, and is likely meant to extend the dynamic range of a cortical neuron, whose range is already cut somewhat short in comparison with retinal and LGN neurons by its relatively lower maximum response rate and steeper response function. Contrast gain control appears to work by shifting the value of c50 to rest nearer to the local distribution of contrasts. This ensures that cortical neurons are not constantly in a state of response saturation, as well as preserving their more delicate features such as orientation selectivity (Carandini et al 1997; Heeger 1992; Albrecht and Hamilton 1982). Heeger (1992) showed that response suppression in cortical neurons could be described through a modification of the Naka-Rushton model, incorporating terms to represent the apparently ‘divisive’ effect of suppressor stimuli:

$$R(c_j) = R_{j,max} \cdot \frac{c_j^{n}}{\sigma_j^{n} + \sum_i c_i^{n}} \qquad (2.2)$$

Here cj is the response of the jth linear operator to contrast. The semisaturation constant c50 is replaced with a value σj linked with the jth operator. In the Naka-Rushton function, the term added to the semisaturation constant in the denominator is identical to the numerator term cj^n; here it is replaced by the sum of the terms ci^n across many neurons, including the jth. Heeger demonstrated that this type of model allowed neuron bandwidths to remain constant across contrasts, among other interesting properties, and later authors (Carandini et al 1997, Wainwright et al 2002, Hansen et al 2003) have shown that this type of gain control can be ideal for effecting contrast normalization schemes in complex images like natural scenes.
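The divisive effect of Equation 2.2 can be made concrete with a small sketch. The function name normalized_response and all parameter values (σ = 0.1, n = 2, the two contrasts) are assumptions for illustration; the pool here contains only two operators, whereas the text describes a sum across many neurons.

```python
def normalized_response(contrasts, j, r_max=1.0, sigma=0.1, n=2.0):
    """Eq. 2.2: response of operator j divided by a pooled contrast term.

    `contrasts` lists the linear responses c_i of every operator in the
    normalization pool, including the jth.
    """
    pool = sum(c**n for c in contrasts)
    return r_max * contrasts[j] ** n / (sigma**n + pool)

# Operator 0 is driven at contrast 0.2; operator 1 is a suppressor.
alone = normalized_response([0.2, 0.0], j=0)
masked = normalized_response([0.2, 0.4], j=0)
print(f"target alone: {alone:.3f}, with suppressor: {masked:.3f}")
```

When only the jth operator is active the pool reduces to cj^n and the expression is the Naka-Rushton function with c50 = σj, which is why the suppressor acts like a rightward shift of the semisaturation point.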

Perceptual Transduction

The d’ Function

The most basic model of contrast detection and discrimination is a nonlinear function of contrast (Stromeyer and Klein 1974; Legge and Foley 1981), whose value accelerates for near-threshold contrasts and decelerates for higher contrasts. Such a pattern of perceptual response accounts for the near-threshold facilitation and higher-contrast masking seen in threshold versus contrast functions. This equation is often called simply the contrast response function, though it is perhaps best termed the d’ function for contrast. A mathematical representation of the relationship between d’ and contrast is widely used, and the name ‘Stromeyer-Foley’ or S-F function has been suggested (Yu et al 2003):

$$d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} \qquad (2.3)$$

Here, c is a linear measure of contrast, often modeled as the response of a linear filter tuned to some set of stimulus attributes (namely orientation and spatial frequency; Legge and Foley 1981) 15. The accelerative portion of the function follows a power function with exponent p, the effective exponent decreasing with contrast, with p normally at a value greater than 2.0+q. q describes the power of the suprathreshold portion of the function, and could also be referred to as the Stevens’ power law exponent (Stevens 1957). Since suprathreshold contrast discrimination is increasingly difficult with increasing contrast, the suprathreshold d’ function is compressive, translating to a value for q which is less than 1.0, and normally around 0.4 (Legge 1981). z marks the transition between the expansive and compressive portions of the function, and is often termed a threshold parameter. r determines the absolute gain of the response function. Refer to Figure 2.2 for an illustration.

15 In fact, c can be implemented as the convolution of a linear filter G with the stimulus pattern, represented in the frequency domain as a gain parameter g (e.g. for a vertical filter, g(θ = 0°) might be equal to 1.0, decreasing to 0.0 as θ approaches ±90°), so that c = G*C, where C is the stimulus pattern.

Figure 2.2 Contrast d’ function, on three scales, with p = 2.4, q = .4, z = .05, and r = 25. A) On linear axes, the suprathreshold compression is clear. B) Zoomed in, low-contrast expansion can be seen between c = 0 and z (marked by the vertical dotted line). Near z the function is quasi-linear, then becomes compressive. C) On log-log axes, the difference in the low-contrast and high-contrast regimes is clear, as is the transition point near z.

This function can be used to accurately describe human performance in an increment threshold task. c can be taken as pedestal or background contrast, with Δc set equal to discrimination threshold at c, representing an increase in contrast which results in a certain predetermined increase in d′16:

$$d' = d'(c + \Delta c) - d'(c) \qquad (2.4)$$

16 This requires that we accept the hypothesis of d’ additivity, which is empirically well supported (Nachmias and Sansbury 1974; Foley and Legge 1981; Pelli 1985, 1987).

$$d' = \frac{r \cdot (c + \Delta c)^{p}}{z^{p-q} + (c + \Delta c)^{p-q}} - \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} \qquad (2.5)$$

This equation then can be solved numerically for Δc 17, and this value compared directly with data collected in a contrast discrimination experiment.
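As a rough sketch of the numerical solution, Equation 2.5 can be solved for Δc by bisection, which is used here in place of the iterative scheme mentioned in the footnote because the d’ function is monotonic in contrast. The parameter values are those given in the caption of Figure 2.2; the criterion d’ = 1, the search bound, and the function names are assumptions made for illustration.

```python
def d_prime(c, p=2.4, q=0.4, z=0.05, r=25.0):
    """S-F function of Eq. 2.3, with the Figure 2.2 parameter values."""
    return r * c**p / (z ** (p - q) + c ** (p - q))

def increment_threshold(c_ped, criterion=1.0, upper=10.0, iterations=60):
    """Bisect Eq. 2.5 for the increment that raises d' by `criterion`.

    Assumes d' increases monotonically with contrast and that the
    solution lies between 0 and `upper` (both hold for these parameters).
    """
    base = d_prime(c_ped)
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if d_prime(c_ped + mid) - base < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for c_ped in (0.0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5):
    print(f"pedestal {c_ped:5.2f} -> threshold {increment_threshold(c_ped):.4f}")
```

With these parameters the threshold at a small pedestal falls below the detection threshold (pedestal of zero) and then rises again at high pedestals, which is the dipper shape of the TvC function.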

Masked d’ Functions

The S-F function model is quite successful when applied to simple difference-threshold experiments (refer to Figure 2.1), where the only stimuli are a target stimulus and a background stimulus often called the pedestal since it is structurally similar to the target (Legge and Foley 1980). However, when a second mask which is structurally different from the pedestal is introduced, the model fails completely (Ross and Speed 1991; Ross, Speed, and Morgan 1993; Foley 1994). For example, in an experiment with vertical pedestal and test stimuli, adding a horizontal grating to the stimulus display will result in significant threshold elevations, especially when the pedestal contrast is low. This failure can be remedied by inserting extra terms cj representing the contrasts of additional masks in the denominator of Equation 3 above (Foley 1994):

$$d' = \frac{r \cdot c^{p}}{z^{p-q} + (h_j c_j)^{p-q} + c^{p-q}} \qquad (2.6)$$

17 Rearranging Equation 4, we find that

$$\Delta c = \left[\frac{1}{r}\left(d' + \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}}\right)\left(z^{p-q} + (c + \Delta c)^{p-q}\right)\right]^{1/p} - c$$

Since this makes Δc = f(Δc), Newton’s method can be used to quickly converge the function to the ‘actual’ value of Δc.

This arrangement allows contrast masks to have at low pedestal contrasts a suppressive effect on d’, which is normally what is seen in double masking experiments. It also allows for an improvement of suprathreshold sensitivity (Haun and Essock 2009), since it amounts to shifting the ‘more sensitive’ expansive portion of the d’ function to a higher contrast range. It is also conveniently similar to models of gain control in cortical neurons (Heeger 1992), and can be expressed in terms very similar to Equation 2, with the primary difference being that under whatever circumstances, the d’ function does not saturate, so that p is always greater than p - q. The parameter h has been referred to as a ‘gain control’ coefficient (Meese and Holmes 2004; Essock et al 2009), since this particular modification was inspired by similar models of neural response normalization often characterized as contrast gain control models.

Figure 2.3 TvC functions (left) and d’ functions (right) produced using Equations 4 and 6. The solid line and symbols follow an ‘unmasked’ function. As mask contrast is increased from zero, the TvC function shifts up and to the right on log-log axes (thin gray lines), corresponding to a shift to higher contrasts of the d’ function. Dashed lines and open symbols show that gain control thus implemented can produce lowered thresholds at suprathreshold background contrasts (Figure adapted from Haun and Essock 2009).
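The shift described in the caption can be reproduced qualitatively with a short sketch of Equation 2.6 combined with the difference rule of Equation 2.4. This is an illustration under assumed values: h = 1.0, a mask contrast of 0.2, and a criterion of d’ = 1 are arbitrary choices, the remaining parameters are those of Figure 2.2, and bisection is again used as the solver.

```python
def masked_d_prime(c, c_mask=0.0, h=1.0, p=2.4, q=0.4, z=0.05, r=25.0):
    """Eq. 2.6: S-F function with a mask term (h * c_mask) in the denominator."""
    e = p - q
    return r * c**p / (z**e + (h * c_mask) ** e + c**e)

def masked_threshold(c_ped, c_mask, criterion=1.0, upper=10.0, iterations=60):
    """Bisect Eq. 2.4 for the increment giving a d' difference of `criterion`."""
    base = masked_d_prime(c_ped, c_mask)
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if masked_d_prime(c_ped + mid, c_mask) - base < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for c_ped in (0.0, 0.02, 0.1, 0.5):
    unmasked = masked_threshold(c_ped, c_mask=0.0)
    masked = masked_threshold(c_ped, c_mask=0.2)
    print(f"pedestal {c_ped:4.2f}: unmasked {unmasked:.4f}, masked {masked:.4f}")
```

With these values the mask raises the threshold at a pedestal of zero and lowers it at a pedestal of 0.5, matching the two effects attributed to this form of gain control in the text.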

Since it is possible that an additional mask might overlap with the linear filter producing the target contrast response c, thereby increasing the value of c, another term is necessary to represent this overlap:

$$d' = \frac{r \cdot (c + w \cdot c_j)^{p+q}}{z^{p} + (h_j c_j)^{p} + (c + w \cdot c_j)^{p}} \qquad (2.7)$$

w·cj therefore is equivalent to the response to the added mask of the primary linear filter. Some empirical knowledge can be brought to bear on this parameter. Psychophysical and physiological studies agree that filter bandwidths narrow with increasing spatial frequency (DeValois et al 1982, Snowden 1992); no contrary evidence has been presented to indicate a reverse trend under any circumstances. So, a reasonable constraint on w might be to enforce a monotonic, decreasing relationship with spatial frequency. Similar constraints are simply not available for the other parameters of the d′ function, as the processes by which they arise are still rather mysterious. A potential constraint on p is discussed below, under Uncertainty Theory. The model expressed in Equation 6, applied to the difference experiment described by Equation 4, appears to be sufficient to describe human performance in virtually any contrast masking experiment with minimal modification (see Chen and Tyler 2008, for an important example of a novel and useful modification).

Relating the d’ and Contrast Response Functions

The relationship between the cortical CRF and the psychophysical d’ function has been noted previously. There is one important difference between psychophysical and neural contrast

sensitivity, and that is that neurons clearly saturate at high contrasts, while contrast perception apparently does not, though it does become progressively more compressive. This gap is, in fact, not so difficult to bridge. Watson and Solomon (1997) showed that two or more saturating neuron CRFs, each with unique parameters, could be summed to produce a non-saturating response function virtually indiscriminable from the d’ function. Since neural response variance is dependent on response level, this simple combination is not enough to account for the d’ function, but Chirimuuta and Tolhurst (2005) have shown that Bayesian combination of several CRFs, on the order of 10 neurons, produces a d’ function similar to that of human observers. Goris et al (2009) have shown that on the order of several hundreds of neurons with correlated variances and a broad distribution of c50 values can be summed directly to produce the psychophysical d’ function, and Sit et al (2008) have used results from optical imaging studies to demonstrate that the level of uncorrelated noise in visual cortex is in fact constant with stimulus contrast.

Spatial Models

Spatial Filters

The primary elaboration on the one-dimensional model just described involves the incorporation of spatial elements, including model filters and experimental stimulus images (Watson and Solomon 1997; Meese and Georgeson 2005). A spatial filter G is multiplied point for point against the stimulus image S used in the experiment, and the product is then summed

for a scalar measure of the effective stimulus contrast, called the linear response, denoted here as ri,j of the ith filter to the jth stimulus:

$$r_{i,j} = \sum_{x,y} G_i(x,y) \cdot S_j(x,y) \qquad (2.8)$$

The linear contrast response to the stimulus is then used as the contrast term in something like Equation 5, or alternatively, ri,j can be computed as a function of j and used as a weighting factor, equivalent to c in Equation 3. The standard filter used in spatial vision models is some form of cortical receptive field pattern, occasionally described as the product of a sinusoidal grating pattern and an oriented Gaussian envelope (a Gabor filter; Daugman 1984). The number of grating cycles passed through the Gaussian is typically less than 2. These patterns are used because of their resemblance to primary visual cortex neuron receptive fields, and their resemblance to psychophysically measured sensitivity bandwidths. These empirical bandwidths are observed to have log-symmetric spatial-frequency tuning, and so some researchers have begun to use log-Gabor filters (Meese and Georgeson 2005; Hansen and Hess 2006), which must be created in the frequency domain, to better approximate this property:

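$$G(f,\theta) = e^{-\frac{\left[\log_2\left(f / f_{peak}\right)\right]^{2}}{e \cdot \sigma_f^{2}}} \cdot e^{-\frac{\left(\theta - \theta_{peak}\right)^{2}}{e \cdot \sigma_\theta^{2}}} \qquad (2.9)$$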

Here, f and θ are spatial frequencies and orientations, respectively, represented in the Fourier domain. fpeak and θpeak are the peaks of the frequency and orientation tuning functions for the Gabor. σf corresponds to the width at half-height, in octaves (doublings of frequency), of the

spatial frequency tuning function, and σθ corresponds to the width at half height, in degrees, of the orientation tuning function. These log-frequency filters will be used in all spatial models applied in Analysis. The Fourier spectrum defined by these parameters through G(f,θ) is then transformed into the spatial domain; the real valued portion of the spatial transform then constitutes an even-symmetric (‘cosine phase’) filter, and the imaginary valued portion constitutes an odd-symmetric (‘sine phase’) filter. If responses are allowed to go negative and positive, two filters with phases separated by 90° (a ‘quadrature set’) can completely account for contrast within their spatial range, regardless of phase. The phase insensitivity of complex cells in visual cortex has been explained as being derived from a quadrature set of fixed-phase simple cells; since negative responses do not exist in cortical encoding, four simple cells would be necessary, with phases spaced 90° apart. The question then arises of how these filter responses should be used in the context of the models described here. Two options present themselves. One is to rectify the responses of each filter, and to directly pass these contrast values into a transducer function, combining multiple responses according to a summation rule as described below, treating each transducer as a model simple cell. The other option is to combine the responses of the quadrature set, and pass this combination to the transducer, thereby treating it as something like a model complex cell. This type of measure is often referred to as a measure of contrast amplitude:

$$c_{amp} = \sqrt{r_{real}^{2} + r_{imag}^{2}} = \sqrt{\max(r_{\phi=0},0)^{2} + \max(r_{\phi=90},0)^{2} + \max(r_{\phi=180},0)^{2} + \max(r_{\phi=270},0)^{2}} \qquad (2.10)$$

Here the max function simply denotes half-rectification, the absence of negative responses which necessitates four equally spaced filters; in this arrangement two of the values will always be zero, so camp amounts to a measure of the linear response magnitude of the linear filters. Spatial models typically include a complement of filters, distributed at many spatial locations across the area which contains the experimental stimulus. Sensitivity declines with eccentricity (angular distance from foveal fixation), and this factor can be incorporated based on prior empirical findings (Pointer and Hess 1989; Foley 2007). The relationship between contrast sensitivity and retinal eccentricity can be described as an increase in threshold from foveal sensitivity according to the function cδ = 0.4*(ecc*freq), measured in decibels (20*log10(c)). The multiple measured responses must then be combined according to some pooling rule. Based on measurements of contrast sensitivity with grating stimuli of varying size, it is believed that by some process the pooled sensitivities are raised to a power around 3 or 4, and summed (Watson and Solomon, 1997; Watson and Ahumada 2005; Foley et al 2007). It is therefore reasonable to assume that the nonlinear contrast responses to a stimulus are linearly summed prior to the decision stage:

$$d' = \sum_{i=1}^{n} R_i(c + \Delta c) - \sum_{i=1}^{n} R_i(c) = \sum_{i=1}^{n} \left(R_i(c + \Delta c) - R_i(c)\right) \qquad (2.11)$$

Here i denotes the ith of n responses. This is the pooling rule I have employed in all cases where more than one response is combined for a decision.
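The contrast amplitude of Equation 2.10 and the pooling rule of Equation 2.11 can be sketched together as follows. Several things here are assumptions made only for illustration: the 180° and 270° filters are treated as sign-inverted copies of the 0° and 90° filters, the eccentricity loss of 0.4*(ecc*freq) decibels is applied as an attenuation of effective contrast with eccentricity in degrees and frequency in cycles per degree, and each filter uses the S-F function with the Figure 2.2 parameters as its response function Ri.

```python
import math

def contrast_amplitude(r_real, r_imag):
    """Eq. 2.10: root of the summed squares of four half-rectified responses."""
    # Assumption: the 180 and 270 degree responses are the negated
    # 0 and 90 degree responses, so two of the four terms are always zero.
    responses = (r_real, r_imag, -r_real, -r_imag)
    return math.sqrt(sum(max(r, 0.0) ** 2 for r in responses))

def d_prime(c, p=2.4, q=0.4, z=0.05, r=25.0):
    """S-F function of Eq. 2.3, used here as each filter's response R_i."""
    return r * c**p / (z ** (p - q) + c ** (p - q))

def eccentricity_gain(ecc_deg, freq_cpd):
    """A loss of 0.4 * ecc * freq decibels, expressed as a linear gain."""
    return 10.0 ** (-0.4 * ecc_deg * freq_cpd / 20.0)

def pooled_d_prime(c_ped, dc, eccentricities, freq_cpd):
    """Eq. 2.11: linear sum of the response differences of all filters."""
    total = 0.0
    for ecc in eccentricities:
        g = eccentricity_gain(ecc, freq_cpd)
        total += d_prime(g * (c_ped + dc)) - d_prime(g * c_ped)
    return total

# The amplitude does not depend on the phase of the input.
for phase_deg in (0, 30, 135, 250):
    phi = math.radians(phase_deg)
    print(phase_deg, round(contrast_amplitude(0.3 * math.cos(phi), 0.3 * math.sin(phi)), 6))

print(pooled_d_prime(0.2, 0.05, eccentricities=[0.0, 0.5, 1.0], freq_cpd=4.0))
```

Because each term of the sum is a difference of a monotonic function, adding filters can only increase the pooled d’, though filters at larger eccentricities contribute less.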

Tying w to Filter Bandwidth

In addition to providing a direct measure of stimulus contrast, spatial filters can also provide a measure of the ‘overlap’ parameter w introduced in Equation 7, which can help in keeping it within reasonable bounds (i.e. near values which are plausible given empirical measures of mechanism bandwidths). For the purposes of this project, this is the main advantage of a spatial model. Without this consideration, and without spatial manipulations in the experimental design (e.g. changes in stimulus size and shape), spatial models otherwise will return results equivalent to one-dimensional functional models. It is straightforward to create a lookup-table of w values as a function of the bandwidth parameters σf and σθ, for example:

Table 2.1

This is done by measuring the linear responses of a filter with specific σ values to a background image and to a target image, both with equivalent nominal contrasts, and then taking the ratio of these values. If the ratio is 1, this means that the background excites the filter to the same degree that the target does. The ratio is therefore the same as the w parameter. Most reasonable ‘empirically based’ bandwidth values should be within these bounds, and perhaps slightly larger. It has been suggested (Levi et al 2005; Taylor, Bennett, and Sekuler 2003) that, at least when noise is used as a stimulus, channel bandwidths are adjustable in the spatial frequency dimension, so unusually large or small values cannot be absolutely discounted.
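The ratio procedure can be illustrated with a deliberately simplified, one-dimensional sketch. Everything specific here is an assumption: the filter is an ordinary spatial-domain Gabor rather than the log-Gabor used in Analysis, the stimuli are cosine gratings of equal nominal contrast, and the sample count, frequencies, and envelope width are arbitrary. The linear response follows the multiply-and-sum rule of Equation 2.8.

```python
import math

def sample_points(n=257):
    """Positions spanning one unit of space, centered on zero."""
    return [i / (n - 1) - 0.5 for i in range(n)]

def gabor(xs, freq=4.0, sigma=0.15):
    """Cosine-phase 1-D Gabor: Gaussian envelope times a cosine carrier."""
    return [math.exp(-x * x / (2 * sigma**2)) * math.cos(2 * math.pi * freq * x)
            for x in xs]

def grating(xs, freq, contrast=1.0):
    """Cosine grating with the given nominal contrast."""
    return [contrast * math.cos(2 * math.pi * freq * x) for x in xs]

def linear_response(filt, stim):
    """Eq. 2.8 in one dimension: point-by-point product, then sum."""
    return sum(g * s for g, s in zip(filt, stim))

def overlap_w(mask_freq, target_freq=4.0):
    """Ratio of the filter's response to the mask and to the target."""
    xs = sample_points()
    filt = gabor(xs, freq=target_freq)
    r_target = linear_response(filt, grating(xs, target_freq))
    r_mask = linear_response(filt, grating(xs, mask_freq))
    return abs(r_mask) / abs(r_target)

for mask_freq in (4.0, 6.0, 8.0, 16.0):
    print(f"mask at {mask_freq:4.1f} cycles: w = {overlap_w(mask_freq):.4f}")
```

A mask identical to the target gives w = 1, and w falls toward zero as the mask moves away from the filter's peak frequency; repeating the calculation over a grid of bandwidth values would give a lookup table of the kind described above.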

Under the conditions included in the project, the relative contributions of the spatial frequency and orientation bandwidths to w cannot be directly estimated, which muddles the issue somewhat. In the case of the spatial models, an empirically based ‘aspect ratio’ can be enforced, with something like 30° per octave. In the case of functional models, w(σf ,σθ) contours can be computed. Some rough estimation of the relative contribution of the two bandwidth parameters can be made using these same stimuli by removing mask content at the same orientation or spatial frequency as the target, and remeasuring masked detection thresholds.

Figure 2.4 A basic structure for simulating contrast discrimination experiments. 1) Two spatial stimuli of different contrasts are convolved with 2) a spatial filter, yielding two estimates of stimulus contrast c1 and c2. 3) These estimates are input to a contrast response function to which is added 4) normally distributed noise, yielding two estimates of perceptual response R1 and R2. In 5), the simulated observer ‘chooses’ the stimulus associated with the greater R value as having the greater c value. This process can be treated as a single experimental ‘trial’ and used with a psychophysical method such as the QUEST used in these experiments.

Using Spatial Filters in Simulations

The simplest model spatial observer for a contrast detection and discrimination experiment should include five stages, illustrated in Figure 2.4. The first stage is the stimulus representation, which in the case of this model consists of the same set of numerical matrices which were converted into luminance displays to be viewed by human observers in the actual experiments. The second stage is the filter stage, which is multiplied against the stimulus image and summed according to some rule to give a linear response to image contrast (previous section). Filter parameters used in Analysis are as described in the previous section, namely quadrature log-Gabor functions symmetric in orientation and log spatial frequency tuning (Eq. 2.9). Here, a set of filters, distributed at multiple locations (denoted as x and y in the figure) across the ‘stimulus field’ will be used, located in a radial matrix 18. The sensitivity or linear gain of these filters (represented by their contrast in the illustration) varies with eccentricity by a factor which matches the general decline in sensitivity with eccentricity which occurs in human vision (Meese and Summers 2007; Pointer and Hess 1989). It is well known that sensitivity also depends to a point on the size of a target (Tootle and Berkley 1983), and it is generally agreed that this occurs due to spatial summation across multiple filter responses (Foley 2007; Meese and Georgeson 2005), so use of multiple filters is a natural component of the model.

18 Foley (2007) states that the form of the sampling matrix is relatively unimportant and has minimal effect on results if other factors are held constant (e.g. spacing between filters, filter gain, variation of gain with eccentricity, etc). I have found the same, having experimented with hexagonal and rectangular matrices; a radial (i.e. circular) matrix was chosen here because of the potential for the orientation of the model to be altered. Rotation of either a hexagonal or rectangular matrix can be problematic if an algorithm is set to maximize the number of filters filling a given space, since the number of units making up the matrix will be limited ultimately by the rectangular shape of the image matrix, and will vary with orientation, which is inconvenient. A circular matrix will not vary in its best-fitting number of units with orientation, which is convenient.

The third stage of the model observer is the nonlinear transduction stage, which in this model consists simply of the S-F equation (Eq. 2.6), with the sum of the filter outputs used as input. A choice can be made here as to whether each filter output should cascade with its own S-F function (parallel-nonlinear), or whether the whole set of linear outputs should be combined prior to passing through a single overall S-F function (serial-nonlinear). This issue has been addressed in Transduction Theory, with the conclusion that the nonlinearity of spatial summation (Graham 1989; Foley 2007) supports the former alternative. However, the parallel model is more computationally complex, and logically demands consideration of multiple parameter sets across the multiple S-F function dimensions. Experiments varying stimulus size might be able to make distinctions regarding this stage (e.g. Foley et al 2007, Meese and Summers 2007), but in the context of these experiments it amounts simply to conceptual complication. The behavior of a multiple, parallel nonlinear transducer model is equivalent to the behavior of a single serial transducer model, so it is not difficult to reconcile with the single-function fitting carried out in the previous sections. The serial-nonlinear route is therefore taken in this simulation.

The fourth stage of the model observer involves addition of noise. In this model a single, additive, normally distributed internal noise source is included, since there are no means by which multiple internal sources (perhaps involved at multiple nodes of a process like that described by Yu, Klein and Levi 2003, or Levi, Klein and Chen 2005) can be disambiguated with these data. There is a conceptual problem at this stage, since when the response level is low there will be the possibility of negative response values.
Since the response can be treated directly as a d’ value, this would imply that on some stimulus presentations the observer ‘sees’ the target as being distinctly un-targetlike. Whether this phenomenon occurs in human observers is not known (what would it be like phenomenally if it did occur?), and the question is probably

one of many reasons that a multiplicative (i.e. response-dependent) noise source is much more plausible than an additive one. It is important to remember here that the S-F function is a black box which does its job well, but whose signal and noise components are a mystery. Negative values are odd, but are a logical part of the model. This response-plus-noise value is then taken as the response of the observer to the stimulus and can be used in a final stage to simulate performance in a psychophysical task. For this model, the fifth stage consists of the same 2IFC QUEST procedure as that performed by subjects in the experiments (the details of the procedure are described in the next chapter, Methods). Pairs of stimuli, one containing a target stimulus, are passed through the first four stages of the model. The corresponding pairs of responses are then compared, and the model ‘chooses’ the larger of the two. The SDT interpretation of 2IFC tasks (the very basis of these tasks, in fact) presumes that a human observer does just this: that he compares the responses of his visual system to the two stimuli, and chooses the stimulus that produced a larger response. This model assumes perfect memory, which certainly falls short of the truth (Magnussen and Greenlee 1999), but which might be subsumed into the noise added to the response. With these five stages established, we can test how such a model might respond to the targets and masks used in the experiments described in subsequent chapters (such a simulation is carried out in Chapter 5, Analysis).
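The last three stages of the model observer can be sketched in a few lines. This is a reduced illustration rather than the simulation carried out in Analysis: the stimulus and filter stages are collapsed into a scalar contrast, the transducer is the unmasked S-F function with the Figure 2.2 parameters, the internal noise has an assumed standard deviation of 1, and a fixed increment is tested over many trials instead of running the adaptive QUEST procedure.

```python
import random

def d_prime(c, p=2.4, q=0.4, z=0.05, r=25.0):
    """S-F transducer (Eq. 2.3) with the Figure 2.2 parameter values."""
    return r * c**p / (z ** (p - q) + c ** (p - q))

def simulate_trial(c_ped, dc, rng, noise_sd=1.0):
    """One 2IFC trial; True if the model picks the interval with the increment."""
    # Stages 1-3: a scalar contrast stands in for image and filter,
    # and is passed through the transducer.
    r_standard = d_prime(c_ped)
    r_test = d_prime(c_ped + dc)
    # Stage 4: independent additive Gaussian noise in each interval.
    r_standard += rng.gauss(0.0, noise_sd)
    r_test += rng.gauss(0.0, noise_sd)
    # Stage 5: choose the interval with the larger noisy response.
    return r_test > r_standard

def proportion_correct(c_ped, dc, n_trials=5000, seed=0):
    """Fraction of simulated trials on which the increment interval is chosen."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(c_ped, dc, rng) for _ in range(n_trials))
    return hits / n_trials

for dc in (0.0, 0.02, 0.05, 0.2):
    print(f"pedestal 0.20, increment {dc:4.2f}: {proportion_correct(0.2, dc):.3f}")
```

Because the noise is added after the transducer, the noisy response can fall below zero at low contrasts, which is the oddity noted in the text; it does not affect the comparison rule.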

Uncertainty Theory

Uncertainty theory differs from the functional models described above in that it includes a variable number of transducers which are not particularly sensitive to the target stimulus, but

which are nonetheless attended by the observer, who is therefore relatively ‘uncertain’ as to which transducers are the most relevant to the task. This amounts to introducing an additional source of noise which affects sensitivity (in the standard formulation) only at very low target contrasts. The theory assumes that a maximum-response decision rule is used, where the observer chooses the interval (in a 2AFC task) which contains the single largest response across many transducers. When combined with the presence of transducers insensitive to the target this rule results in a distortion of the response distribution which is, theoretically, measurable. The ins-and-outs of uncertainty theory have been detailed elsewhere (Green and Swets 1966; Pelli 1985; Graham 1989), and this section will describe what is necessary to simulate uncertainty in contrast sensitivity. In most SDT models of contrast detection, it is assumed for the sake of simplicity that the response distribution has constant variance across response levels, and that it is Gaussian. If all transducers used by an observer have independent noises of equal magnitudes, if these are summed linearly by the observer, and if they are all sensitive to the target stimulus, then every stimulus presentation, regardless of contrast, will result in a response distribution of the same variance. This simplifies sensitivity calculations: on a ‘standard’ trial, the transducers all respond according to the pedestal level, while on a ‘test’ trial, the transducers will respond at that level plus some amount, which the experimenter sets at a constant d’ level. Since the variance is not affected by response level, and since the mean response corresponds uniquely to a certain stimulus contrast, the distribution function can be done away with completely, and d′ can be treated as a simple scalar response difference rather than as the distance between two distribution means, as in Equation 2.

When irrelevant transducers are added to the mix along with a maximum response rule, it becomes necessary to consider the shapes of the response distributions. For example, when the stimulus contrast is at zero, if all transducers are responding with mean values of zero, the distribution of max values across many trials will have a mean greater than zero. The distribution functions for each transducer denote the likelihood of the maximum response Rmax being below a given response level. The distribution functions are obtained by integrating across the probability density functions for each response:

$$F(R_{j,max} < R_j \mid c_{stim}) = \int_{R=0}^{\infty} f(R_j \mid c_{stim})\, dR_j \qquad (2.12)$$

There are N+M of these response distribution functions, N sensitive or relevant transducers, and M irrelevant ones. These are multiplied together to obtain the joint distribution functions. The derivative is then taken across this product to obtain a probability density function:

$$f(R_{max} \mid c_{stim}) = \frac{d}{dR_{max}}\left(F(R_{j,max} < R_j \mid c_{stim})^{N}\, F(R_{k,max} < R_k \mid c_{irr})^{M}\right) \qquad (2.13)$$

The result is a probability density function representing the distribution of largest responses across all attended transducers. The difference in the means of test and standard response distributions is then taken, for a direct measure of d′:

$$d' = E\big(f(R_{\max} \mid c_{ped} + \Delta c)\big) - E\big(f(R_{\max} \mid c_{ped})\big) \qquad (2.14)$$

This model presents two or three new variables, depending on whether or not value is given to the ‘irrelevant’ stimulus contrast. N is the number of task-relevant transducers, which can potentially respond to the target stimulus, while M is the number of task-irrelevant ‘distractor’ transducers insensitive to the target. Uncertainty theory has been cited as an alternative explanation of the facilitatory dip seen in contrast discrimination functions. A linear transducer sensitivity function is normally assumed in uncertainty models, for simplicity’s sake. When cstim is very small, large values of N+M will result in a combined max distribution which is much broader than that of a single transducer (eq. 9). This will result in higher thresholds than would be predicted if only a single relevant transducer were being observed. As cstim increases, the max distribution will gradually narrow until it is equal to the max distribution for only the relevant transducers, which will be accompanied by a gradual decrease in threshold as the linear relevant transducers’ response increases. This can be expressed as follows:

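$$\text{for } R_{stim} \gg R_{irr}: \quad \left[ \int_{R=0}^{\infty} f(R_j \mid c_{stim})\, dR_j \right]^{N} \approx \left[ \int_{R=0}^{\infty} f(R_j \mid c_{stim})\, dR_j \right]^{N} \left[ \int_{R=0}^{\infty} f(R_k \mid c_{irr})\, dR_k \right]^{M} \qquad (2.15)$$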

Once cstim is sufficiently high to escape the effects of the irrelevant transducers, facilitation will cease, and threshold will be determined entirely by the form of the transducer function. In the case of a linear sensitivity function, thresholds will remain constant as a function of contrast once the irrelevant transducers have been escaped; since this does not occur in TvC measurements, we know that contrast sensitivity above threshold cannot be linear. There is some recent analytic evidence for a decline in internal noise with increasing contrast below the detection threshold (Katkov et al 2007), but this finding has not yet been corroborated or linked with a particular functional architecture for contrast detection.
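
The max-rule calculation of Equations 2.12 through 2.14 can be sketched numerically. The following is a minimal illustration rather than the simulation code used for this project. It assumes Gaussian response noise of unit standard deviation, a linear transducer whose mean response is gain × contrast, and irrelevant transducers with a mean response of zero. The function names, the response grid, and the parameter defaults are all hypothetical.

```python
import numpy as np

def gaussian_pdf(r, mean, sd):
    """Gaussian response density (assumed noise model)."""
    return np.exp(-0.5 * ((r - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mean_max_response(mean_rel, n_rel, m_irr, sd=1.0, mean_irr=0.0):
    """Mean of the largest response across n_rel relevant and m_irr irrelevant transducers."""
    # Response axis wide enough to hold both distributions (grid limits are an assumption).
    r = np.linspace(-10.0 * sd, max(mean_rel, mean_irr) + 10.0 * sd, 20001)
    dr = r[1] - r[0]
    # Eq. 2.12: distribution functions, by numerically integrating each density.
    cdf_rel = np.cumsum(gaussian_pdf(r, mean_rel, sd)) * dr
    cdf_irr = np.cumsum(gaussian_pdf(r, mean_irr, sd)) * dr
    # Eq. 2.13: multiply the N + M distribution functions, then differentiate.
    pdf_max = np.gradient(cdf_rel ** n_rel * cdf_irr ** m_irr, dr)
    # Expected value of the max-response density.
    return float(np.sum(r * pdf_max) * dr)

def d_prime(c_ped, delta_c, n_rel=1, m_irr=0, gain=1.0):
    """Eq. 2.14 for a linear transducer (mean response = gain * contrast)."""
    test = mean_max_response(gain * (c_ped + delta_c), n_rel, m_irr)
    standard = mean_max_response(gain * c_ped, n_rel, m_irr)
    return test - standard
```

Under these assumptions, `d_prime(0.0, 1.0)` is close to 1 when there are no irrelevant transducers, falls far below 1 with `m_irr=100`, and recovers to nearly 1 once the pedestal lifts the relevant response above the irrelevant maxima, as in `d_prime(5.0, 1.0, m_irr=100)`.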

Uncertainty Models and Nonlinear Transducers

The uncertainty model describes the effects of combining relevant and irrelevant transducers, but does not state the functional form of those transducers. It is often assumed simply that transduction must be linear, and at higher contrasts transducer noise is dependent on response (i.e. the sensitivity function is compressive). In combination with uncertainty, such an arrangement would produce dipper functions in a contrast discrimination experiment (Pelli 1985). Certain findings (Tyler and Chen 2000; also, e.g. cortical neurons produce very little baseline noise, and so would be expected to contribute very little to a near-threshold max distribution) indicate that uncertainty is an unlikely cause of near-threshold facilitation, and that nonlinear transduction is the more likely cause (but see Katkov et al 2007). Even if this is true, uncertainty still remains a theoretical issue, since multiple relevant and irrelevant nonlinear transducers might be used by observers to carry out a contrast detection and discrimination task. Uncertainty theory is general enough to accommodate such a situation, by making R a particular nonlinear function of c, and makes easily quantifiable predictions. Since the effect of uncertainty is to raise thresholds at low stimulus contrasts, and since this effect declines with increasing contrast, uncertainty combined with any transducer function will result in elevated thresholds and a steepening of any near-threshold facilitatory trend. Uncertainty combined with the nonlinear transducer function described previously would therefore produce elevation of thresholds at low contrasts, and a steepening of the facilitatory dip in the TvC function.

The uncertainty model has been widely applied to cases of stimulus detection and near-threshold discrimination, but not to suprathreshold discrimination. Nor has the potential for a masking stimulus to produce spectral uncertainty been explored previously. These topics must necessarily be considered here. On the second point, the potential for a masking stimulus to produce uncertainty, we can imagine how an observer might handle a set of contrast channels, some few of which are to be tested with a target image, and the rest of which are to be ignored. In a contrast discrimination task, we assume that the observer is able to focus his attention on a single, most-relevant channel, and that his ability to discriminate or detect channel responses follows a nonlinear (i.e. ‘Stromeyer-Foley’) function of stimulus contrast. In this case, behavior will be as is normally observed, with a dipper-shaped TvC function. However, it should be noted (as it is an aspect of the main hypothesis of this dissertation) that this situation has no ecological basis or precedent. ‘Single channel stimuli’, i.e. image features which can be expected to stimulate a single spatial channel to the exclusion of all others at the same spatial location, are far from the norm in natural imagery. Most locations in a natural image contain features even the simplest of which can be expected to stimulate several channels simultaneously. An isotropic texture with elements all of the same size, such as a flat surface covered in pebbles, might stimulate only channels tuned to a certain spatial frequency, but multiple oriented channels at that frequency will be simultaneously stimulated. An oriented edge will stimulate channels at several spatial frequencies, all at a single orientation. Either type of feature will normally be found in close spatial context with other orientations and spatial frequencies. 
In short, visual stimulation at any location in the optic array is far more likely to be broadband than to be ‘single channel’, so the fact that observers are able to utilize single channels in viewing spatial stimuli is a testament to the plasticity of the visual system.

Still, in the case of a single-channel stimulus, the task is being made easy for the observer. He may not be tempted to utilize his visual system in a naturalistic manner, since the stimulus is so pronouncedly artificial. If we make the visual stimulus more closely resemble a natural image, by simultaneously activating other channels at different spatial frequencies and orientations, will the observer still be able to localize the relevant channel as easily? There is evidence that when faced with detecting targets in noise, the apparent bandwidth of contrast sensitive mechanisms becomes adjustable (Levi et al 2005; Taylor et al 2003). What is being measured in this case may not be the bandwidth of the channel itself, but rather the tendency of an observer to pool multiple channels together: that is, in using the channel array, a configural ‘stimulus template’ might be applied, particularly when searching for structurally complex stimuli such as letters, faces, or other non-elemental spatial stimuli (this is often assumed within the classification image paradigm: cf. Gold, Sekuler and Bennett 2004, Nandy and Tjan 2007). It is conceivable that broadband stimulation, particularly naturalistic 1/f stimulation, acts as a cue to the visual system to pool adjacent channels in more naturalistic, ecologically probable templates. At the least, such a tendency on the part of the visual system would tend to make it more difficult to localize spectrally simple single-channel stimuli, thereby introducing channel uncertainty into a behavioral situation where it does not otherwise exist. Some studies have considered spatial uncertainty (Foley and Schwarz 1998), positing that observers are unable to completely restrict their attention to the precise location at which a small stimulus (such as a Gabor) is presented; this inability may be by design of the experimenters. 
As a result, the observer inadvertently makes his decision on the basis of a pool of responses including some which are sensitive to regions distant from the stimulus location. In the context of the experiments included in this project, spectral (frequency domain) uncertainty is more

likely. Stimuli used here are always presented at the same visual location, in every condition and on every trial. It may be that observers are for some reason unable to completely localize a specific orientation and spatial frequency within the visual system, requiring that they pool responses across a larger spectral region than necessary. Also, although during one block of trials a single spatial frequency and orientation is used as a target on every trial, across many blocks of the experiment an observer will see targets of many different frequencies and orientations. So, it is possible that some degree of confusion might set in on some trials, particularly on those conditions where target parameters are not clearly signaled by the stimulus on each trial (i.e. broadband masking conditions, and subthreshold pedestal baseline conditions) and, if a ‘max response’ decision rule is used by observers, this confusion would manifest as uncertainty as defined in this section. Addition of an uncertainty parameter M to the nonlinear transducer model described in eq. 4 would not be particularly useful, since the transducer model is expected to fit the data very closely. However, since increasing uncertainty mimics an increase in the transducer exponent p (i.e. it adds to the steepness of the facilitatory dip in the TvC function), an M parameter would afford an opportunity to fix p at a single ‘low’ value of 2 across all conditions. This is convenient for a number of reasons. It essentially allows M to replace p as a free parameter, keeping the total number of free parameters constant and making this a neutral model alteration as far as goodness of fit measures go. More importantly, it allows the transducer model to be tied to an empirical measure of V1 neuron contrast response: Naka-Rushton functions fit to single unit contrast response data have an average exponent of 2, though there is significant variation. 
This value is also of theoretical importance, allowing contrast transducers to be viewed as energy or amplitude mechanisms (Morrone and Burr 1988; Heeger 1992), and has also

been shown to be an optimal value for information transmission given the constraints of a saturating transducer (Gottschalk 2002). If the transducer exponent is fixed at a value of 2, the uncertainty variable will be able to pick up the slack in the TvC function data, and this difference can be attributed to observer uncertainty among nonlinear S-F functions with uniform p values. One interesting hypothesis here is that in the baseline condition, when there is only a gray background, observers are able to become very certain of exactly what orientation and spatial frequency they are looking for. However, in the masking conditions, the mask consists of structure at many different frequencies and orientations. At most mask contrasts used, the mask is obviously visible, so in some sense the observer is certainly aware of these non-target components by virtue of non-zero transducer responses to filters tuned to those components. Whether or not the observer is able to exclude these non-target responses in making discrimination judgments, and just what type of pooling rule the observer uses across the many activated transducers available in the stimulated visual system, would determine whether or not uncertainty effects would manifest in the masking condition.

Adding External Noise

Transducer and uncertainty models implicitly and explicitly consider the contributions of internal noise to sensitivity, internal noise being variation in the observer’s decision variable as the result of internal, observer-dependent processes. The implicit considerations are made as priors for a simple analytical expression for contrast perception (e.g. the S-F function outlined above); additive noise is presumed to be constant under all contrast conditions (Legge and Foley 1980; Foley 1994), and so its properties can essentially be ignored. Internal noise is explicitly

considered when a part of the stimulus display is itself random, providing a source of external noise, particularly in the equivalent noise paradigm (Pelli and Farrell 1999; Lu and Dosher 1999) whereby the amount of external noise equivalent with the amount of internal noise is sought. External noise may also have a significant effect on sensitivity, particularly in experiments such as those presented here where the stimuli themselves have randomly generated structure. Since current psychophysical models of sensitivity all incorporate a single ‘decision stage’ at which signal dependent responses and noises are combined, models of sensitivity incorporating external in addition to internal noise sources must present a means of combining these different sources. Since the masks used in this dissertation’s experiments are a form of spatial noise, the primary consideration here will be how addition of external noise affects sensitivity given a known, nonlinear sensitivity function. First, it must be said that while noise does affect contrast sensitivity (Pelli 1981; Legge, Burgess and Kersten 1987; Lu and Dosher 1999), it does not unambiguously do so by contributing to the internal noise limiting sensitivity. Noise is also a type of contrast, a variation in luminance across space, and is therefore detected by the same types of mechanisms that detect more elemental targets such as sinewave gratings. So, noise masking effects are also amenable to a pattern masking model such as that described earlier in this chapter (Foley 1994). This author is not aware of any attempts to disambiguate the two potential sources of masking – gain control or an increase in internal noise – though in principle it should be possible to tell the difference. Most studies using noise as a masking stimulus do so coming from the perspective that it is the variance of the stimulus that causes masking (e.g. 
Lu and Dosher 2005; Pelli and Farrell 1999), and likewise, most studies using gratings to mask other gratings (e.g. Meese and Holmes 2004; Petrov, Carandini, and McKee) presume that the masking they measure is

produced by ‘gain control’ effects. There have been no attempts to determine the extent to which both effects might affect sensitivity in a single experimental context. Based on the following discussion, Chapter 5 (Analysis) presents some evidence that both phenomena contribute to masking of targets by 1/f noise patterns, under certain conditions. The problem is not as simple as combining external and internal noises directly, since external noise must first be passed through whatever transducer function is used before it can affect the later decision stage. This can be readily expressed in terms of the sensitivity function, but it will help to introduce some new terms. First, a choice must be made regarding our assumption of what happens with responses when part of the noise distribution is negative. Negative contrast is possible, simply as an inversion of positive contrast, while negative response is impossible (recall the discussion of this point a few pages ago in Using Spatial Filters in Simulations). There are two options. The input distribution, noise added to contrast level, can be rectified, with negative values made positive, ‘reflected’ around the zero contrast point. This could be explained if a transducer’s linear contrast-dependent inputs c are the combination of rectified different-phase responses (as with the quadrature filter mentioned previously in Spatial Models). Alternatively, the input distribution can be half-rectified, with negative values set to zero. This would be the result of assuming that transduction was phase-sensitive, with the relevant transducer receiving inputs from a single-phase linear input. The difference in predicted TvC functions is significant; for now, we’ll proceed with the full-rectified option, and return to the half-rectified case later. Let’s say that post-transducer sensitivity d’(|c|) is the ratio of the mean transducer response m’(|c|) and the standard deviation v’(|c|) of the transducer’s intrinsic noise. 
An external noise distribution n, such as a mask stimulus, is passed through the transducer function d’(|c|),

and transformed into the sensitivity term d’(|c+n|). d’(|c+n|) and v’(|c|) will sum algebraically if converted to variances (squared), so that the final output noise b’(c) of the transducer will be

$$b'(|c|) = \sqrt{v'(|c|)^{2} + d'(|c + n|)^{2}} \qquad (2.15)$$

A noise-masked detection and discrimination experiment can therefore be described in this way (cf. eq.2):

$$d' = \frac{m'(|c_{ped} + \Delta c|)}{\sqrt{v'(|c_{ped} + \Delta c|)^{2} + d'(|c_{ped} + \Delta c + n|)^{2}}} - \frac{m'(|c_{ped}|)}{\sqrt{v'(|c_{ped}|)^{2} + d'(|c_{ped} + n|)^{2}}} \qquad (2.16)$$

Since d’ is a nonlinear function of contrast, this approach makes calculation of d’ rather difficult. When external noise is ignored, d’ is at least a unitary function; this manipulation decomposes d’ into three functions of contrast, one of which is also a function of a noise distribution. A more straightforward approach which produces equivalent results is to combine the transducer function and noise distribution through convolution:

$$d'(c, n) = d'(|c|) * n \qquad (2.17)$$

So that in a detection and discrimination experiment measuring a certain d’ level, the combination of nonlinear transducer and noise can be described in this way:

$$d' = d'(|c_{ped} + \Delta c|) * n - d'(|c_{ped}|) * n \qquad (2.18)$$

The convolution can be computed quickly as multiplication in the Fourier domain, making this method most convenient. Adding noise to a S-F transducer in this fashion produces a result which is very similar with the aforementioned ‘gain control’ model of masking, where the S-F denominator is increased as a function of mask contrast (Foley 1994). Low contrast thresholds are elevated, with higher contrast discrimination thresholds converging with the baseline (in this case ‘no external noise’) TvC function. The primary difference with the Foley model is in the form of the low-contrast dipper, which is made shallower by external noise – the ‘sharp’ low contrast nonlinearity is essentially blurred by the noisy input distribution. This shallowness would be difficult to measure, since the shape of the noise-masked function is so similar to the ‘gain control’ masked function (with the higher z value) and any TvC experiment will use only a limited number of pedestal contrasts.

Figure 2.5 Comparison of the effect of introducing a normally distributed random contrast value to the full-rectified S-F transducer with the effect of increasing the z value of the transducer. The dashed line is the ‘baseline’ transducer, a function of pedestal contrast, with a z value of 0.02. The dotted line is the ‘gain control’ masked transducer, with a z value of 0.04. The solid line is the noise-masked transducer, with pedestal contrast normally distributed about the ordinate values with a standard deviation of 0.02, chosen to produce a similar amount of near-threshold masking.

A side effect of this shallow dipper is that there is relatively less suprathreshold crossover between the noise and no-noise functions than with the pure gain control model (Figure 2.5). With the Foley model, the crossover is significant because of the mechanism of masking (note the crossover at every mask contrast level in Figure 2.3): the part of the transducer with the steepest slope, near threshold, is being shifted to higher contrasts. Where the crossover is seen, it occurs because the local slope of the transducer function is now steeper than it was without the gain adjustment. With the noise model, the smaller crossover is the result of a different mechanism. At contrasts where the crossover occurs, the transducer function has become compressive, with the slope decreasing with increasing contrast. Part of the noise distribution will reach into higher contrasts with shallower d’ slopes, tending to increase thresholds, while the other part will reach into lower thresholds, where d’ slopes are not only steeper but where they transform into an expansive power function near threshold (the same effect that produces the ‘kink’ in the rising portion of the TvC function). So, the facilitatory crossover seen with the noise model is a result of the noise distribution occasionally producing low contrasts which are disproportionately more facilitatory than high contrast thresholds are elevated – all assuming a symmetric external noise source.

Figure 2.6 Comparison of the effects of noise masking of a half-rectified S-F transducer with those of gain-control masking. Dashed and dotted lines are the same as in Figure 2.5. The solid line is for the S-F transducer with normally distributed pedestal contrasts. A higher standard deviation was used to produce masking equivalent to that of the z = 0.04 function.

If instead of assuming a full-rectified input we assume a half-rectified input, very different results are obtained in the dipper region (Figure 2.6). At high contrasts, rectification becomes moot, and the TvC function becomes identical with that of the full-rectified case. At low contrasts, however, the dipper is almost abolished, and even for significant noise magnitudes, facilitation is obtained in the absence of a pedestal. This occurs for the same reason as the crossover (which still occurs here) just explained: when there is no contrast pedestal, there are still noise values which function as one, thereby transferring facilitation from the dipper region of the TvC function to the point of absolute threshold. A result resembling this dipper-abolishment phenomenon was obtained by Henning et al (2007), though they interpreted it as the abolishment of cross-channel looking by a notched noise mask.
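
The convolution approach of Equations 2.17 and 2.18 can be sketched for both rectification options as follows. This is an illustration under stated assumptions, not the code used to generate Figures 2.5 and 2.6. The transducer form r·c^p / (z^q + c^q) and its parameter values are stand-ins for the S-F function described earlier, the external noise is Gaussian, the criterion is d′ = 1, and the convolution is computed directly with `numpy.convolve` rather than in the Fourier domain. All function names are hypothetical.

```python
import numpy as np

def sf_transducer(c, r=10.0, p=2.4, q=2.0, z=0.02):
    """Assumed S-F style transducer; the form and parameter values are illustrative only."""
    c = np.asarray(c, dtype=float)
    return r * c ** p / (z ** q + c ** q)

def noisy_transducer(noise_sd, rectify="full", step=0.0005, c_max=1.0):
    """Contrast axis and the transducer convolved with Gaussian contrast noise (eq. 2.17)."""
    n_side = int(round(c_max / step))
    c = np.arange(-n_side, n_side + 1) * step
    # Full rectification reflects negative inputs; half rectification sets them to zero.
    c_in = np.abs(c) if rectify == "full" else np.maximum(c, 0.0)
    d = sf_transducer(c_in)
    if noise_sd > 0:
        half = int(round(5 * noise_sd / step))
        k = np.arange(-half, half + 1) * step
        kernel = np.exp(-0.5 * (k / noise_sd) ** 2)
        kernel /= kernel.sum()  # discrete noise density sums to one
        d = np.convolve(d, kernel, mode="same")
    return c, d

def threshold(c_ped, noise_sd, rectify="full", criterion=1.0):
    """Increment needed to raise the noise-convolved response by `criterion` (eq. 2.18)."""
    c, d = noisy_transducer(noise_sd, rectify)
    keep = (c >= 0) & (c <= 0.8)  # stay clear of convolution edge effects
    c_pos, d_pos = c[keep], d[keep]
    d_ped = np.interp(c_ped, c_pos, d_pos)
    # The convolved function is increasing over this range, so it can be inverted by interpolation.
    return float(np.interp(d_ped + criterion, d_pos, c_pos) - c_ped)
```

With these illustrative values, `threshold(0.0, 0.02, "full")` exceeds the noise-free `threshold(0.0, 0.0)`, `threshold(0.0, 0.02, "half")` lies below the full-rectified value, and at a pedestal of 0.3 the noisy and noise-free thresholds nearly coincide.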

Summary

This chapter has reviewed most of the basic information which will be needed to evaluate data collected in contrast masking experiments using noisy stimuli. Nonlinear transducer theory suggests simply that all masking effects can be described by the parameters of the S-F function. Spatial models allow the stimuli used in an experiment to be involved directly in modeling of data, which can be particularly useful when stimuli are complex and difficult to quantify in all their properties. Uncertainty and noise-masking models suggest that TvC functions should change in characteristic ways under certain theoretical circumstances. All that is left is to introduce the specific stimuli and methods needed to collect data. This data can then be evaluated in the light of the models outlined here.

3. METHODS

Overview

Assumptions and Stimuli

These experiments were intended to show how sensitivity operates in the presence of a broadband amplitude spectrum, with the prediction that sensitivity is significantly altered by 1/f noise in a way that extends beyond noise masking. This involves obtaining masking functions and contrast discrimination functions, and analyzing these psychophysical data in terms of existing models of contrast sensitivity. To do this, two types of stimuli were used: narrowband 'target' stimuli, and broadband 'mask' stimuli. Varying the contrast of the target band (called in this case the 'pedestal') and measuring discrimination thresholds (i.e. how much of a contrast difference is necessary to allow discrimination between two copies of the target) at each level provides us with a contrast discrimination or TvC function. We can make certain conceptually useful statements about what must be occurring within the visual system during such an experiment, by recalling what the underlying motivation is, in the first place, for using simple patterns as visual stimuli. First, assume that contrast detection and discrimination for a ‘channel-specific’ pattern¹⁹ is mediated by a single visual mechanism, or by a set of similar mechanisms. This mechanism can be said to have internal properties which govern how it processes or responds to contrasts to which it is sensitive, and these can be summarized as the within-mechanism effects of contrast for that mechanism. Second, assume that mechanisms can interact with one another, with these interactions broadly falling into the categories of either facilitation (increasing gain or sensitivity) or suppression (decreasing gain or sensitivity). The mechanism under study is thereby able to reflect, under certain conditions, cross-mechanism effects of contrast. The experiments described below can be seen as intending to describe the cross-mechanism effects which occur during viewing of 1/f spatial structure. To measure the within-mechanism properties of a particular spatial vision mechanism, a channel-specific stimulus is used in a contrast discrimination task. In such a task, two stimuli are presented to an observer, who is instructed to identify which of the two is of higher contrast. The contrast of the weaker stimulus is often referred to as the ‘pedestal’ contrast, with the to-be-identified increase in contrast referred to as the ‘target’ contrast. The same task can be carried out in the presence of an overlaid mask stimulus which the observer is instructed to ignore, in which case within-mechanism and cross-mechanism effects are simultaneously measured. Finally, if the 'pedestal' contrast is kept at zero and only the contrast of the mask is varied, then only the cross-mechanism effects are measured, since the mechanism itself is presumed ‘quiet’ except when it is contributing to detection of the target. In the experiments described in this chapter, the target and pedestal stimuli are patches of narrowband, oriented spatial noise, and the mask stimuli are broadband, isotropic 1/f spatial noise. Individual stimulus conditions consist of particular contrast values for the pedestal and mask, while a contrast threshold for the target is sought. Specific details relating to the construction and physical properties of these stimuli are given below in Stimulus Content.

¹⁹ Recall the earlier discussion of channel theory: spatial vision in general is mediated by an array of contrast-sensitive elements with narrow tuning in spatial and temporal frequency, orientation, direction of motion, and other properties. A ‘channel-specific’ stimulus would then be a stimulus designed so that it would be expected to stimulate only certain channels within the known array.

Transducer Theory Review

The details of contrast transduction theory have been introduced (Chapter 2), but for readers who have skipped to this point the reasoning here may still be unclear. A simple example by illustration may be helpful. Refer to Figure 3.1. Three plots A, B, and C are shown, each with the same axes, the abscissa representing the contrast of a target stimulus and the ordinate representing the perceptual response of the observer to the target. On each plot three lines are drawn, representing three hypothetical relationships, essentially linear transducers, between the observer’s perceptual response and the contrast of the target. Plot A illustrates the case where no pedestal stimulus is present, a ‘baseline’ condition, so that when the target contrast is at zero there is no corresponding perceptual response. If we define an increase in perceptual response of ‘1’ as underlying the ability of the observer to consistently ‘see’ the target, then the three lines in Plot A each has a different corresponding contrast threshold; for example, if the observer’s perceptual response follows the course of the solid line, his ‘absolute’ contrast threshold will be 0.02. Shallower and steeper dashed lines illustrate respectively transducers with higher and lower thresholds. Plot B illustrates the case where a pedestal stimulus, a pattern identical with the target, is present which evokes a perceptual response of ‘2’ (i.e. when the target contrast is zero, the perceptual response is still 2.0). If the same observer’s perceptual response follows the course of the solid line in this case, his perceptual response must be increased to ‘3’ for him to see the increase in contrast, and his increment threshold will be 0.05. So, from measuring these two thresholds, we can say that the presence of a pedestal results in a shallower transducer function: that the within-mechanism effect is compressive, or suppressive.

Plot C illustrates the case where another pattern, different from the target, is present in the stimulus display. This would be the ‘mask’ pattern. Here, when target contrast is zero, perceptual response is also zero, as with the baseline condition. However, the transducer functions are now shallower than before, resulting in higher thresholds, a threshold of 0.04 for the solid line. The same is true of the functions obtained from ‘masked’ pedestals (D), which are now shallower than in (B), so that the increment threshold for an observer using the solid line will be 0.06. From this, it would appear that the effect of a mask on transduction is suppressive whether or not there is a pedestal. Of course, only a part of this example is directly available to experimenters: the contrast and increment thresholds. From these thresholds, responses can be inferred, through a process going back to Fechner and his system of limen. Perceptual response, or whatever name it goes by (apparent or perceived contrast, salience, d’), is an aspect of the subjective nature of perception, and cannot be directly measured, at least until we gain a precise understanding of exactly what components of neural activity are equivalent to aspects of subjective consciousness. Ultimately this is what perceptual science strives to measure: the structure and qualities of perceptual awareness. The experiments described below are designed to measure absolute and increment thresholds so that observers’ perceptual response functions can be estimated.
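
The inference from thresholds to responses in this example can be made explicit with a small worked calculation. This is only a sketch of the arithmetic behind Figure 3.1, assuming linear transducers and a criterion response increase of 1. The threshold values are those quoted in the text for the solid lines, while the variable names are hypothetical.

```python
CRITERION = 1.0  # response increase assumed to support 'seeing' the target

# Thresholds for the solid lines in plots A-D, as quoted in the text.
thresholds = {"A": 0.02, "B": 0.05, "C": 0.04, "D": 0.06}

def implied_slope(threshold, criterion=CRITERION):
    """Slope of a linear transducer that reaches the criterion at the given threshold."""
    return criterion / threshold

slopes = {plot: implied_slope(t) for plot, t in thresholds.items()}

# Gain of each condition relative to the unmasked baseline (plot A).
relative_gain = {plot: s / slopes["A"] for plot, s in slopes.items()}
```

A relative gain below 1 corresponds to a shallower transducer than the baseline, which is the sense in which the pedestal and the mask are described above as suppressive.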

Figure 3.1 Simplified contrast response functions. Abscissa is target contrast, ordinate is overall perceptual response. In A and C, a pedestal contrast of zero is presumed, so perceptual response begins at zero. In B and D, a pedestal provoking a perceptual response of 2.0 is presumed. See text for further details.

Methods and Equipment

Methods for Threshold Measurement

Most discussion in this document thus far of the ‘threshold’ has used the term rather generally. Subsequent discussions should make clear that there is no perceptual ‘hard threshold’. The transition from ‘not seeing’ to ‘seeing’ a stimulus is measurably continuous, as illustrated by the sigmoid psychometric function. The dynamic portion of the psychometric function is probably the most correct object of the term ‘threshold’, but it must be quantified by multiple values, some of which are difficult in practice to measure. It is more common in psychophysics to define the threshold as a specific point on the psychometric function, and to assume that the rest of the function takes on a relatively standard form, and at the very least that it monotonically increases with stimulus intensity. One function commonly used as a reliable stand-in for the psychometric function of contrast is the Weibull function (Quick 1974):

$$\Psi = 1 - \exp\left( -(c/t)^{k} \right) \qquad (3.1)$$

Here, c is stimulus contrast, t sets the position of Ψ along the contrast axis (when c = t, Ψ = 1 − 1/e ≈ 0.63), and k sets the steepness of Ψ. In the example of the Weibull function, t can be taken as a general estimate of the threshold, with k suggesting the variance of the estimate. There are several options when it comes to procedures for measuring contrast thresholds. All of these (as far as I am aware) assume that the threshold is stationary during the course of an experiment; that is, that the object of measurement does not change over the course of several


trials. Most methods consist of sequential trials in which the subject is instructed to indicate whether she saw a specified stimulus (e.g. with yes-no or 2AFC methods), how intense was the stimulus she saw, or how certain she was that a stimulus was seen (e.g. Levi, Klein, and Chen 2005; Olzak and Thomas 2003). The method of adjustment, where the subject manually adjusts stimulus contrast until it is at a level subjectively defined as ‘near threshold’ was once widely used (e.g. Campbell, Kulikowski, and Levinson 1967), but its convenience is no longer considered to be justifiable, given its intrinsically subjective nature. The method of constant stimuli, where several selected near-threshold stimuli are repeatedly shown to a subject, with her responses tallied and transformed into hit-rates, is the most accurate means of measuring a threshold, but is rarely used because it can be so time consuming. Instead, adaptive algorithms use a subject’s prior responses to concentrate as many future trials as possible at a predetermined proportion-correct value, which will correspond to a desired threshold contrast. As a further step, the responses thus collected can be fit to a predicted form of the psychometric function, which in the case of contrast detection and discrimination is usually taken to be the Weibull function described above. The tack taken in this study is just this: to use an adaptive algorithm to obtain many trials for a given condition, and to fit the subject’s responses to the Weibull function. Further details, and a description of the intended behavioral contents of each stimulus trial, are given below in Threshold Measurement.

Equipment and Setting

The entire study was carried out using a single system of computer and video equipment. A monochrome P104 white phosphor monitor was used to display stimuli, set at a resolution of


800 by 600 pixels (35.2 by 26.4 cm) refreshed at 200 Hz. Stimuli were delivered to the monitor through an attenuator network (Pelli and Zhang 1991) which combines the three color outputs of the video card to produce a finely graded voltage output, allowing for greater grayscale resolution. Monitor output was measured using the attenuator network, and a lookup table for voltage-to-luminance values was constructed, to be used during stimulus display. By this method, pixel luminance was ensured to follow a linear function of the numerical values specified in a given stimulus matrix. The video hardware handling the lookup table was an nVidia 8600 series video card, operated primarily through Psychophysics Toolbox (Brainard 1998; Pelli 1998) Matlab extensions. Stimuli were all generated as 512 by 512 pixel (22.5 by 22.5 cm) numerical matrices using Matlab 7. These were windowed with a circular-symmetric Gaussian with a width-at-half-height of 256 pixels, and displayed in the manner described above. Subjects viewed the stimuli at a distance of 258 cm, through a 27 cm circular aperture affixed to and obscuring the monitor bezel. At this distance, the aperture had a diameter of 5.9 degrees of visual angle, and the Gaussian-windowed stimuli had a diameter at half-height of 2.5 degrees. Experiments were performed in a darkened room, where the only light was from the stimulus display, whose output was maintained during blocks of trials at a mean luminance of about 30 cd/m².
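The visual angles quoted above follow from the viewing geometry. The short Python sketch below recomputes them from the stated sizes and viewing distance; the function and variable names are illustrative and are not taken from the original study code.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle (degrees) subtended by an object of the given size."""
    return 2.0 * math.degrees(math.atan(size_cm / (2.0 * distance_cm)))

VIEWING_DISTANCE_CM = 258.0
CM_PER_PIXEL = 22.5 / 512  # a 512-pixel stimulus spans 22.5 cm

aperture_deg = visual_angle_deg(27.0, VIEWING_DISTANCE_CM)
half_height_deg = visual_angle_deg(256 * CM_PER_PIXEL, VIEWING_DISTANCE_CM)
stimulus_deg = visual_angle_deg(22.5, VIEWING_DISTANCE_CM)

print(f"aperture: {aperture_deg:.2f} deg")
print(f"Gaussian half-height diameter: {half_height_deg:.2f} deg")
print(f"full stimulus matrix: {stimulus_deg:.2f} deg")
```

The full 512-pixel matrix comes out at close to 5 degrees, so one cycle per picture corresponds to roughly 0.2 cycles per degree.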


Threshold Measurement

Obtaining Threshold Estimates

To measure contrast thresholds, a two-alternative forced choice (2AFC) procedure was used in this study. In this procedure, two stimulus intervals [20] are presented in sequence in the center of the display aperture, each with a duration of 200 ms, separated by a 250 ms inter-stimulus interval (Figure 3.2). The subject’s task is to judge which of the two stimulus intervals contains a specified target which he has been instructed to search for, and to identify that interval by pressing the appropriate button on the computer keyboard (in this case ‘1’ or ‘2’) [21]. On each trial, the interval containing the target is randomly selected; exactly one of the two intervals contains the target, and the target never appears in both. In each block of 50 trials, stimulus conditions (i.e. pedestal and mask contrasts) remained constant, while the contrast of the target stimulus was varied according to the subject’s responses. A correct response would be followed by a decrease in the contrast of the target stimulus on the next trial; likewise, an incorrect response would be followed by an increase. The contrast level on the next trial was selected through use of the QUEST algorithm (Watson and Pelli 1981), which presumes a general form of the psychometric function in order to find a target contrast level corresponding to a specified sensitivity level. In these experiments, a proportion correct level of 81.6% was

[20] In vision science ‘stimulus interval’ is a standard way of referring to a period of time and area of space in which a stimulus might be presented.

[21] Two other buttons were used by observers during the course of the experiment. First, observers were instructed that at the beginning of a series of trials they should be certain of exactly what target they were looking for. If they made a mistake at the very beginning of the series, possibly due to lack of preparation or confusion over what the target was, they were to press ‘7’ to restart the series at the beginning. Second, if observers had a lapse of attention on a particular trial and needed to see it again, they could press ‘4’ to repeat the trial. Observers were specifically instructed not to abuse this option in order to get correct answers.


specified. The QUEST distribution was periodically refreshed over the course of the experiment, to lessen the effects of a given observer’s lapse rate (Wichmann and Hill 2001).
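The adaptive logic can be illustrated with a heavily simplified, QUEST-like procedure: a posterior over log threshold is kept on a grid, each trial is placed at the posterior mode, and the posterior is updated with the likelihood of the observed response. This is only a sketch in Python, not the implementation used in the study; the class name, grid settings, and prior width are assumptions chosen for illustration, and the periodic refreshing of the distribution mentioned above is not modelled.

```python
import math

GAMMA, LAPSE, SLOPE = 0.5, 0.01, 2.5  # guess rate, lapse rate, Weibull steepness k

def weibull_2afc(c, t):
    """Probability correct at contrast c for threshold t (2AFC Weibull)."""
    return GAMMA + (1 - GAMMA - LAPSE) * (1 - math.exp(-((c / t) ** SLOPE)))

class GridQuest:
    """Log-posterior over log10(threshold) on a fixed grid (illustrative only)."""

    def __init__(self, guess, sd_log10=1.0, n=201, span=2.0):
        centre = math.log10(guess)
        self.log_t = [centre - span + 2 * span * i / (n - 1) for i in range(n)]
        # Gaussian prior (unnormalised, in log units) centred on the initial guess
        self.log_post = [-0.5 * ((x - centre) / sd_log10) ** 2 for x in self.log_t]

    def next_contrast(self):
        """Place the next trial at the current posterior mode."""
        i = max(range(len(self.log_post)), key=self.log_post.__getitem__)
        return 10 ** self.log_t[i]

    def update(self, contrast, correct):
        """Add the log-likelihood of the response for every candidate threshold."""
        for i, x in enumerate(self.log_t):
            p = weibull_2afc(contrast, 10 ** x)
            self.log_post[i] += math.log(p if correct else 1 - p)
```

With this parameterisation, a trial placed at the threshold t yields about 81% correct (81.6% when the lapse rate is zero), so placing trials at the posterior mode concentrates them near the intended performance level. A correct response can only move the mode down or leave it unchanged, and an incorrect response can only move it up or leave it unchanged.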

Figure 3.2 Schematic of trial design. Fixation point was followed by stimulus interval 1, followed by fixation, followed by stimulus interval 2, followed by fixation. Target stimulus appeared randomly in either interval 1 or 2.

A block of 50 trials took on average about 3 minutes. Each response made by the subject was saved in a data file in the form of a pair of numbers denoting target contrast and ‘correct’ or ‘incorrect’ response. Over the course of many experimental sessions, a given stimulus condition would be repeated several times (usually 3 times), and the full set of subject responses would then be fit by a maximum likelihood method to a specific form of the Weibull function described above (Wichmann and Hill 2001). In this form of the Weibull function, t is the contrast value corresponding to a correct-response rate of 81.6%, which given the QUEST settings is where the majority of experimental trials should be concentrated:

$$\Psi = \gamma + (1 - \gamma - \lambda)\left(1 - \exp\left(-\left(\frac{c}{t}\right)^{k}\right)\right) \qquad (3.2)$$

Here γ is the guess rate, which in a 2AFC paradigm is presumed to be 50% (since there are 2 intervals per trial, if an observer merely guesses on each trial he should have a 50% chance of giving a correct or incorrect response), and λ is the lapse rate, the rate at which subjects make mistakes regardless of stimulus contrast. λ is normally a small value, and in these experiments was on average on the order of 1%. (The QUEST algorithm is not effective at specifying k, and so k was set at a fixed value of 2.5 for all conditions.) To fit this function, the set of stimulus contrasts c and their associated 2AFC responses y were compiled for each subject and each stimulus condition. Correct-incorrect values y(c) were set equal to 1 or 0 respectively, and the following maximum likelihood parameter was maximized using a downhill simplex algorithm (Press, Teukolsky, Vetterling, and Flannery 1992), with θ representing the set of psychometric function parameters λ, t, and k:

$$l(y(c), \theta) = \log(y(c) + \varepsilon) + y(c) \cdot \log(\Psi(c, \theta)) + (1 - y(c)) \cdot \log(1 - \Psi(c, \theta)) \qquad (3.3)$$
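As a rough illustration of the fitting step, the Python sketch below sums the two response-dependent terms of Eq. 3.3 over trials and searches for the threshold t that maximizes the total. Two simplifications are assumptions of this sketch rather than features of the study: a log-spaced grid search stands in for the downhill simplex algorithm, and γ, λ, and k are held fixed so that only t is estimated. The term log(y(c) + ε) is omitted because it does not depend on θ and so does not affect where the maximum lies.

```python
import math

def weibull_2afc(c, t, k=2.5, gamma=0.5, lam=0.01):
    """Eq. 3.2: probability correct at contrast c for threshold t."""
    return gamma + (1 - gamma - lam) * (1 - math.exp(-((c / t) ** k)))

def log_likelihood(trials, t, **kw):
    """Sum over trials of y*log(psi) + (1 - y)*log(1 - psi).

    trials is a list of (contrast, y) pairs with y = 1 (correct) or 0 (incorrect).
    A lapse rate above zero keeps 1 - psi away from zero.
    """
    total = 0.0
    for c, y in trials:
        p = weibull_2afc(c, t, **kw)
        total += math.log(p) if y else math.log(1 - p)
    return total

def fit_threshold(trials, lo=1e-3, hi=1.0, steps=600, **kw):
    """Grid search over log-spaced candidate thresholds between lo and hi."""
    best_t, best_ll = None, -math.inf
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        ll = log_likelihood(trials, t, **kw)
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t
```

A call such as `fit_threshold([(0.04, 1), (0.04, 0), (0.06, 1)])` returns the grid value of t with the highest likelihood for those three hypothetical trials.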

To estimate the quality of the threshold estimates thus derived, a bootstrapping procedure was used to construct confidence intervals for the function parameter t. With the best-fitting psychometric function given the data for a given condition, new ‘data’ were generated randomly and the maximum likelihood procedure repeated 200 times. With this set of simulated data and associated parameter fits, the 68% confidence intervals for the threshold estimate could be created by taking the standard deviation of the threshold parameters t. This procedure essentially

tests how well the set of contrasts and responses constrain the identification of the threshold; large confidence intervals would indicate that the QUEST did not do a good job of finding the threshold on a given set of trials. Some threshold fitting examples are given at the beginning of Results (Chapter 4).
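The bootstrap described above can be sketched as follows. This is a minimal parametric bootstrap in Python under the same simplifying assumptions as a fixed-parameter fit (γ = 0.5, λ = 0.01, k = 2.5, grid search in place of the simplex algorithm); the function names and the seed argument are illustrative, not part of the original analysis code.

```python
import math
import random
import statistics

GAMMA, LAPSE, SLOPE = 0.5, 0.01, 2.5

def psi(c, t):
    """2AFC Weibull psychometric function (form of Eq. 3.2)."""
    return GAMMA + (1 - GAMMA - LAPSE) * (1 - math.exp(-((c / t) ** SLOPE)))

def fit_threshold(trials, lo=1e-3, hi=1.0, steps=120):
    """Maximum likelihood threshold by grid search over log-spaced candidates."""
    def loglik(t):
        return sum(math.log(psi(c, t)) if y else math.log(1 - psi(c, t))
                   for c, y in trials)
    grid = [lo * (hi / lo) ** (i / steps) for i in range(steps + 1)]
    return max(grid, key=loglik)

def bootstrap_threshold_sd(trials, n_boot=200, seed=0):
    """Refit simulated responses drawn from the best-fitting function.

    Returns the original estimate, the standard deviation of the bootstrap
    estimates (used as the 68% interval half-width), and the estimates.
    """
    rng = random.Random(seed)
    t_hat = fit_threshold(trials)
    estimates = []
    for _ in range(n_boot):
        # Keep the tested contrasts, resample only the responses
        fake = [(c, 1 if rng.random() < psi(c, t_hat) else 0) for c, _y in trials]
        estimates.append(fit_threshold(fake))
    return t_hat, statistics.stdev(estimates), estimates
```

Because the tested contrasts are reused, a run in which the adaptive procedure placed few trials near threshold will tend to produce widely scattered refits, which is the diagnostic role the text assigns to large confidence intervals.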

Distributed Threshold Measurement

This maximum likelihood procedure can be applied to the data set more broadly than in the case described in the previous section. Rather than fitting trials from individual conditions to a Weibull function (i.e. estimating individual thresholds), the set of trials across all pedestal contrasts can be fit to the presumed d’ function underlying detection and discrimination (Chapter 2 outlines the theory underlying this idea). In other words, the same maximum likelihood parameter described in Eq.3.3 could be optimized for a TvC function which best fits all trials over many conditions. This procedure is somewhat more involved, but there is really only one missing step; since the general form of the d’(c) function is already known (the S-F function of Chapter 2), psychometric functions Ψ(d’) corresponding to specific segments of that function must be estimated. There is no analytic expression for the form of the psychometric function relating to individual sections of the S-F function; the Weibull function is a very close fit to psychometric functions at detection, and basically similar to psychometric functions for discrimination (Tyler and Chen 2000; Tyler 1997), but it is not an exact descriptor, and cannot be functionally converted to equate to a local S-F function (or, to do so would be complicated and ad-hoc). It is straightforward enough simply to calculate the Ψ(d’) values by brute force, as


proportion correct for a given 2AFC d’ value will be equal to the following integral (Green and Swets 1966):

$$\Psi(d') = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{d'/\sqrt{2}} e^{-\frac{1}{2}x^{2}} \, dx \qquad (3.4)$$

In words, this means that Ψ(d’) can be found by consulting a table of Z-scores, with Z = d’/√2 (Creelman and MacMillan 1991). A TvC function can then be fit to trial data from a number of different pedestal contrasts, in the following way. For each pedestal contrast condition, the d’ value for the target contrasts on each individual trial must be calculated. In a 2AFC design, when the target contrast is zero, whatever the pedestal contrast, sensitivity to the target should also be zero, so that the d’ value for the pedestal contrast should be subtracted from the total. Here, let d’SF represent sensitivity values calculated through the S-F function from Chapter 2. d’target will then represent sensitivity to each target contrast ctarget within a given pedestal contrast cped condition:

$$d'_{target}(c_{target} + c_{ped}) = d'_{SF}(c_{target} + c_{ped}) - d'_{SF}(c_{ped}) \qquad (3.5)$$

These d’ values can then be plugged into Eq.3.4 to obtain proportion correct values to be used in the maximum likelihood parameter of Eq.3.3. This procedure is successfully applied to data in Chapter 5 Analysis. There, it can be seen that some extra information is gleaned from using all experimental information to fit TvC functions.
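Equations 3.4 and 3.5 can be combined in a few lines of Python. The sketch below evaluates the normal integral with the error function rather than a Z-table. The S-F function itself is defined in Chapter 2 and is not reproduced here, so `placeholder_d_sf` and all of its parameter values are purely hypothetical stand-ins used to make the example runnable.

```python
import math

def pc_2afc(d_prime):
    """Eq. 3.4: standard normal CDF evaluated at d'/sqrt(2)."""
    z = d_prime / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def placeholder_d_sf(c, gain=30.0, p=2.4, q=2.0, z=0.02):
    """Hypothetical accelerating-then-compressive transducer.

    NOT the S-F function of Chapter 2; form and parameters are assumptions.
    """
    return gain * c ** p / (z ** q + c ** q)

def d_target(c_target, c_ped, d_sf=placeholder_d_sf):
    """Eq. 3.5: sensitivity to the increment, relative to the pedestal alone."""
    return d_sf(c_target + c_ped) - d_sf(c_ped)

def predicted_pc(c_target, c_ped, d_sf=placeholder_d_sf):
    """Predicted 2AFC proportion correct for one trial."""
    return pc_2afc(d_target(c_target, c_ped, d_sf))
```

The predicted proportions can then take the place of Ψ(c, θ) in Eq. 3.3, with θ now denoting the parameters of the d’ function. Note that `pc_2afc` returns about 0.816 at d’ ≈ 1.27, matching the proportion-correct level targeted by the adaptive procedure.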


Stimulus Content

Stimulus Domain and Target Stimuli

A common spatial frequency domain can be described which encompasses the full range of content included in all the target and mask stimuli used in these experiments, and can conveniently be treated as a rectangular matrix of points defined by spatial frequency and orientation dimensions (Figure 3.3). The domain is 180° across in the orientation dimension including orientations spaced 3° apart, and includes spatial frequencies ranging from 0.2 to 24 cpd, spaced .3 octaves (a factor of 1.23) apart. For the purposes of stimulus creation, this space is divided into 32 rectangular sub-bands, each 45° across. In each sub-band, the top and bottom frequency are .6 octaves apart (a factor of 1.52). In the diagram shown below, each block in the grid contains three spatial frequency components (at 15 orientations). Each target band contains two adjacent sub-bands of the same orientation, as described below. Each sub-band image is generated and stored separately from all the others; for a specific stimulus condition, the sub-bands are combined as necessary. The target stimuli were oriented, channel-specific bands of spatial noise. These stimuli have a simple appearance, with an obvious orientation and spatial scale/frequency, somewhat resembling ‘scrambled’ sinewave gratings. The orientation of a target stimulus was specified in degrees in polar coordinates, where 0° is horizontal, 90° is vertical, 45° is a right-tilted oblique, and 135° is a left-tilted oblique. The spatial frequency bandwidth of the target stimuli was set at 1.5 octaves (as described above, there are six frequencies per target band, spaced .3 octaves apart), meaning that the highest frequency in a target band is 2^1.5 (about 2.8) times higher than the lowest


frequency. This was intended to allow the targets to ‘cover’ most of the bandwidth of a typical psychophysical contrast mechanism (refer back to Channel Theory: psychophysical and physiological spatial frequency bandwidths range about 1.5 octaves).
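The frequency layout described above can be reconstructed numerically. The Python sketch below assumes 24 components starting at 0.2 cpd in .3 octave steps, grouped into eight three-component sub-bands per orientation, with each band formed from two adjacent sub-bands; this grouping is inferred from the description and is not taken from the original stimulus code.

```python
import math

F_MIN_CPD = 0.2   # lowest component, about 1 cycle per picture
STEP_OCT = 0.3    # spacing between adjacent components, in octaves
N_FREQS = 24      # 8 sub-bands per orientation x 3 components each

freqs = [F_MIN_CPD * 2 ** (STEP_OCT * i) for i in range(N_FREQS)]

# Eight sub-bands of three adjacent frequencies each
sub_bands = [freqs[3 * i:3 * i + 3] for i in range(8)]

# Seven overlapping bands, each the union of two adjacent sub-bands
bands = [sub_bands[i] + sub_bands[i + 1] for i in range(7)]

def geometric_centre(components):
    """Centre frequency on a log axis: geometric mean of the extremes."""
    return math.sqrt(components[0] * components[-1])

centres = [geometric_centre(b) for b in bands]
bandwidth_ratio = bands[0][-1] / bands[0][0]  # 1.5 octaves, i.e. 2 ** 1.5

for number, centre in enumerate(centres, start=1):
    print(f"band {number}: {centre:.2f} cpd")
```

Under these assumptions the computed centres fall within a few percent of the band centre frequencies reported for this study (.34 to 14.2 cpd), and the highest component lands just under 24 cpd.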


Figure 3.3 Schematics representing the content of the stimuli used in these experiments. a) the pregenerated ‘sub-bands’ are represented by letters A through H. Each sub-band was the sum of three spatial frequencies at 15 orientations. ‘Bands’, used as target and pedestal stimuli, each consisted of two adjacent sub-bands, with center frequencies from .34 to 14.2 cpd. b) the overlapping structure of all the band stimuli is illustrated. Each band contained six spatial frequencies, three of which were also included in higher and/or lower bands.

The target stimulus bands were each composed of six sinusoidal spatial frequency components, and each band was centered at one of seven different spatial frequencies. Consecutive bands overlap with one another, as shown in the illustration (Figure 3.3b), with the columns of gray disks representing the spatial frequency content of each target stimulus band. This arrangement allows for use of the entire 7 octave spatial frequency range for a 512 pixel image (from 1 to 128 cpp). Bands 1 and 2, centered at .34 and .63 cpd respectively, can be termed ‘low-frequency’ bands, being below (‘lower than’) the CSF (contrast sensitivity function) peak. Bands 3 through 5, centered at 1.2, 2.2, and 4.1 cpd respectively, can be termed ‘mid-frequency’ or ‘peak frequency’ bands, since they all lie near where the basic CSF peak should

appear under these conditions. Bands 6 and 7, centered at 7.6 and 14.2 cpd respectively, are ‘high-frequency’ bands, well above the peak of the CSF. These 7 points made it possible to obtain a sampling of each of the basic features of the CSF. The target stimuli were made oriented by adding together gratings within a 45° band of orientations centered at the four values mentioned above (0°, 45°, 90°, and 135°). Each target band contained sine wave components at 15 orientations, spaced 3° apart, with 7 components clockwise and counter-clockwise of the center orientation. While components of the same orientation and different spatial frequency were added together with the same amplitude, components of different orientation and same spatial frequency (i.e. within one of the orientation arcs in Figure 3.3a) were weighted with a triangle function. The central orientation was given an amplitude of 1, with amplitude declining linearly with distance from the center orientation to 1/8 of peak amplitude for the outermost components [22]. This peaked weighting function was applied for several reasons. First, it intensifies the oriented appearance of the target stimuli. Second, it resembles the orientation filter applied to broadband stimuli in earlier studies of broadband sensitivity orientation anisotropies (e.g. Essock et al. 2003; Hansen and Essock 2006). Finally, it eliminates an illusion which has been shown to exist with sharp-edged orientation filters (Essock, Hansen, and Haun 2007), where the edges of the filter can be distractingly visible in the spatial domain.

[22] With amplitude A following this function of component orientation x for center orientation y:

$$A = \frac{8 - \frac{|x - y|}{3}}{8}$$
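The triangle weighting in this footnote is easy to tabulate. The following Python sketch evaluates it for the 15 component orientations of one band; the names are illustrative, and signed offsets from the center are used so that no wrap-around at 0°/180° needs to be handled.

```python
def orientation_weight(x_deg, centre_deg):
    """Triangle weight: A = (8 - |x - y| / 3) / 8."""
    return (8 - abs(x_deg - centre_deg) / 3) / 8

CENTRE_DEG = 90                              # e.g. a vertical target band
offsets = [3 * i for i in range(-7, 8)]      # 15 orientations, 3 degrees apart
weights = [orientation_weight(CENTRE_DEG + o, CENTRE_DEG) for o in offsets]

for offset, weight in zip(offsets, weights):
    print(f"{offset:+3d} deg: {weight:.3f}")
```

The center component receives a weight of 1 and the outermost components (21° away) receive 1/8, as stated in the text.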


Figure 3.4 Samples of target stimuli. From top to bottom, low to high spatial frequency samples are shown. From left to right, vertical, 45°, horizontal, and 135° oriented samples.

Pedestal Stimuli

Pedestal stimuli were identical with the target stimuli. During a block of trials, the pedestal contrast would be fixed and the pedestal image presented in both intervals of a trial. Targets were added directly to pedestals with matched phase structure so that the target stimuli were simply increments in pedestal contrast. Pedestal contrast values were relative to absolute thresholds for respective targets, as described below in TvC Paradigms.

Mask Stimuli

Mask stimuli were drawn from the same domain as the target stimuli. They were made up of the same sets of oriented gratings at the same spatial frequencies. There were two crucial differences. The first is the most relevant to this study: the mask stimuli were broadband and mostly isotropic, in that they contained all available spatial frequencies and orientations, except for those present in the relevant target stimulus. In a sense, this made the masks the spectral inverse of the targets, technically a form of notch noise. The second important difference was that no orientation weighting function was used for the content included in the mask stimuli. It was neither necessary nor desirable in the case of the masks, and none of the above concerns applied. It can be shown that a 1/f amplitude spectrum has an equal amount of component amplitude in all multiplicative frequency intervals (Brady and Field 1995). Since these mask


images were constructed in a log-frequency domain, and since component amplitude was not weighted in the frequency dimension, there is logically an equal amount of amplitude in every multiplicative interval. For example, two sub-bands centered at different spatial frequencies each contain three spatial frequencies spaced .3 octaves apart, so that their spatial structure and the resultant contrast will be equivalent. This multiplicative equality is often described as ‘scaling’, since bands with equivalent log-bandwidth will look like scaled versions of each other.

Figure 3.5 Left: 1/f noise generated by inverse FFT of a 1/f amplitude spectrum combined with random phase values. Right: 1/f noise generated through summation of sinewave gratings in log-frequency steps, as described in Methods.

These masks therefore consisted of isotropic, broadband 1/f spatial noise, and are indiscriminable from 1/f noise created in the Fourier domain (Figure 3.5). They were broadband and isotropic in that they contained all available spatial frequencies at all orientations, except for those in the target band. Refer to the illustration in Figure 3.3. The grid represents all of the content of a certain spatial image, with white space within the grid representing a non-zero component amplitude level, and gray space representing zero component amplitude. The image

represented at left has isotropic broadband structure, including all 7 previously defined spatial frequency bands, at three orientations, horizontal (0°), 45°, and 135°. The vertical (90°) band centered at 4.1 cpd is blank; this is the area of the target band. The ‘inverse’ is shown at right: this image contains only the triangle-weighted vertical 4.1 cpd target band, and no other structure is present. So, these two diagrams together can be taken to illustrate variables in the ‘vertical band 5’ stimulus condition.

Combining Target and Mask Stimuli

All experiments involved detection or discrimination of target stimuli as defined here. Masking conditions involved a combination of the broadband and target stimuli, through spatial summation. The procedure for this is simple: target and mask stimuli are stored as matrices with zero mean and normalized contrast. Desired contrast values are multiplied against the mask and target matrices, which are then added together. To this sum is added a value representing mean luminance. The final product can then be shipped to the video hardware. As can be seen in Figure 3.6, the targets in every sense comprised a part of the broadband stimulus. Target and mask shared the same display and visual space, and in the frequency domain their respective boundaries were immediately adjacent with one another. Each condition of these experiments involved choosing a specific spatial frequency and orientation band, whose relationship with the background pattern was then adjusted in various ways to meet particular experimental objectives. Figure 3.7 illustrates a set of several stimulus conditions using the pedestal and mask stimuli shown in Figure 3.6.
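The combination step amounts to a weighted pixel-wise sum plus a constant. A minimal Python sketch is given below, using nested lists in place of Matlab matrices; the function name and the tiny example images are illustrative only, and a normalized mean luminance of 0.5 on a 0 to 1 scale is assumed.

```python
def combine_stimuli(target, mask, target_contrast, mask_contrast,
                    mean_luminance=0.5):
    """Scale zero-mean, unit-contrast images, sum them, and add mean luminance."""
    return [
        [mean_luminance + target_contrast * t + mask_contrast * m
         for t, m in zip(target_row, mask_row)]
        for target_row, mask_row in zip(target, mask)
    ]

# Toy 2 x 2 zero-mean 'images' standing in for 512 x 512 stimulus matrices
toy_target = [[1.0, -1.0], [-1.0, 1.0]]
toy_mask = [[1.0, 1.0], [-1.0, -1.0]]

display = combine_stimuli(toy_target, toy_mask,
                          target_contrast=0.02, mask_contrast=0.10)
```

Because both inputs have zero mean, the mean of the combined image stays at the mean luminance regardless of the two contrast values.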


Figure 3.6 Top: Target/pedestal (left) and mask sample images (right). Bottom: frequency domain plots. Grey represents regions of zero contrast; brighter values represent regions of higher contrast. At left, the plot shows that there is orientation-ramped contrast at the horizontal frequency band centered at 4.1 cpd, which corresponds to a target/pedestal image. At right, the plot shows that the mask contains contrast at all orientations and spatial frequencies, except for the horizontal band centered at 4.1 cpd. These two images are therefore experimental counterparts; a condition which uses this type of target/pedestal image will necessarily use the corresponding mask.



Figure 3.7 Illustration of pedestal, mask, and combined pedestal + mask stimuli at a particular target condition (horizontal, 7.6 cpd). Left and right columns use a frequency-domain schematic to illustrate the spectral content of stimuli: x- and y-axes are spatial frequency and orientation values, and z-axis is component amplitude (contrast). Notice that the pedestal is peaked in the orientation dimension, and that the mask (right column) does not include content at the same values as the pedestal. The arrangement of the images mirrors the paradigm used to measure TvC functions, with an increasing pedestal contrast from top to bottom, and a fixed mask contrast in the mask condition (right columns).

Stimulus Generation, Normalization, and Contrast

Stimuli were generated in Matlab as numerical matrices, by summation of sets of random-phase sinewave gratings. Stimuli were stored in the form of the ‘sub-bands’ described above: one sub-band consisted of 15 orientations, centered at one of the main orientations and separated by 3°, at 3 spatial frequencies separated by .3 octaves as shown in the above illustration. Matched-phase pairs of flat- and triangle-weighted orientation sub-bands were generated and stored together. 10 random-phase copies of each sub-band were generated, and these could then be recombined into any desired target or mask image, as described in the previous section. This first step resulted in a set of matrices whose average (for the flat-orientation sub-bands) standard deviation was a value around 4.67, and whose individual means were close to zero. This held true for sub-bands C through H, but not for the lowest spatial frequency sub-bands A and B. For these bands, the low number of cycles per picture resulted in irregular sampling of the randomly generated spatial structure which was relatively homogeneous in the higher spatial frequency bands. In order to ensure that the all-important contrast variable had a consistent meaning and application throughout the experiments, some selection was performed on these lowest spatial frequency bands. Upon generation, each flat-orientation sub-band had its

mean and standard deviation measured. If either of these values deviated by more than 1% from the average mean (0.0) and standard deviation (4.67) of the higher-frequency bands, the generated band would be discarded and another would be created and tested. Through this process, all generated sub-bands were forced to have the same mean and standard deviation. With this done, 1000 random spatial groupings of all 32 sub-bands were generated, each group being summed together to produce an isotropic broadband noise pattern, with each one’s standard deviation recorded. All sub-bands were then divided by the average standard deviation of the sample broadband images (given the selection procedure described in the previous paragraph, there was very little variance about this estimate of the mean), which came out to about 26.4. This adjustment made it so that any random assembly of all 32 sub-bands would result in a broadband noise pattern with standard deviation of 1.0. The standard deviation of point luminances for a given image is normally called the image’s rms (root-mean-square) contrast, and that term will be used from here on out. The entire stored set of sub-bands could therefore be referred to as having a broadband normalized rms contrast, or rmsb, of 1.0. The normalized range of pixel luminances is from 0 (black) to 1.0 (white), so that an image with mean luminance and with pixel values within the display range should have a mean of 0.5 and an rms value a good deal less than 1.0. For broadband 1/f noise images, rms contrasts beyond .20 or so will be significantly distorted, with high and low luminances ‘clipped’ by the physical limits of luminance display [23]. For a narrowband image, such as the target bands, an rmsb value of .20 will not produce distortion at all, having a lower rms contrast of its own (for the target bands, each image will have an rms contrast of around rmsb/4, around .05 if rmsb = .20),

[23] Random noise created through inverse FFT of a random phase matrix, or by summation of sinewaves in the manner described here, will be made up of normally distributed luminance values. RMS contrast is the standard deviation of pixel luminances, so the amount of clipping (pixels with nominal values greater than 1.0 or less than 0.0) can be estimated by consulting a table of Z-scores. Twice the 1-tailed p value for a Z-score of .5/rms will correspond to the proportion of pixels exceeding the display limits.


and will have an apparent contrast of medium-to-high intensity. In order to adjust a target band or broadband mask to a useful contrast level, an image only need be generated and multiplied by the desired rmsb value. For example, if a broadband mask with rmsb contrast .10 is required, with a target pedestal with rmsb contrast .02, the appropriate sub-bands will be assembled for each stimulus component (mask and pedestal), each component will be multiplied by the appropriate contrast value, and the two components will be added together pixel for pixel. Since each sub-band image is itself a distribution of luminance values, and since the target and mask images are made up of 2 and 30 sub-bands respectively, and since the ‘standard’ broadband isotropic images were made up of 32 sub-bands, the actual rms contrast of either image can be estimated by the following (with rmsn denoting a sub-band image, rmst denoting a target image, rmsm denoting a mask image):

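$$\mathrm{rms}_b^{2} = 32 \cdot \mathrm{rms}_n^{2}, \qquad \mathrm{rms}_n = \frac{\mathrm{rms}_b}{\sqrt{32}}$$

$$\mathrm{rms}_t = \sqrt{\mathrm{rms}_n^{2} + \mathrm{rms}_n^{2}} = \mathrm{rms}_n\sqrt{2} = \mathrm{rms}_b\frac{\sqrt{2}}{\sqrt{32}} = \frac{\mathrm{rms}_b}{4}$$

$$\mathrm{rms}_m = \sqrt{30 \cdot \mathrm{rms}_n^{2}} = \mathrm{rms}_n\sqrt{30} = \mathrm{rms}_b\frac{\sqrt{30}}{\sqrt{32}}$$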
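These relations, together with the clipping estimate from the footnote on display limits, can be checked numerically. The Python sketch below assumes that sub-band images are statistically independent, so that their variances add, and that pixel luminances are normally distributed about a normalized mean of 0.5; the function names are illustrative.

```python
import math

N_SUB_TOTAL = 32    # sub-bands in a full broadband image
N_SUB_TARGET = 2    # sub-bands in a target band
N_SUB_MASK = 30     # sub-bands in a notched mask

def component_rms(rms_b, n_sub_bands):
    """Actual rms contrast of an image built from n sub-bands at a given rms_b."""
    return rms_b * math.sqrt(n_sub_bands / N_SUB_TOTAL)

def clipped_fraction(rms, mean=0.5):
    """Two-tailed proportion of normally distributed pixels outside [0, 1]."""
    z = mean / rms
    return math.erfc(z / math.sqrt(2.0))

# Summing 32 sub-bands of standard deviation 4.67 gives the normalising constant
raw_broadband_sd = 4.67 * math.sqrt(N_SUB_TOTAL)

rms_t = component_rms(0.20, N_SUB_TARGET)
rms_m = component_rms(0.20, N_SUB_MASK)

print(f"raw broadband sd: {raw_broadband_sd:.1f}")
print(f"target rms at rms_b = .20: {rms_t:.3f}")
print(f"mask rms at rms_b = .20: {rms_m:.3f}")
print(f"clipped pixels at rms = .20: {clipped_fraction(0.20):.4f}")
```

The computed normalising constant agrees with the reported value of about 26.4, and at an rms contrast of .20 a little over 1% of pixels would fall outside the display range.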

TvC Paradigms

For the main experiment, three distinct masking conditions were tested for each included target condition: these three conditions are termed baseline, fixed-mask, and variable-mask. These conditions are summarized in Figure 3.8. Baseline threshold-versus-contrast (TvC) functions are obtained by measuring increment thresholds for a target stimulus added to a pedestal whose contrast is varied between blocks of trials. This condition is referred to as the

baseline because the objective of these experiments is to measure the effect of non-target content on sensitivity to a channel-specific target. Masked TvC functions are obtained in the same fashion, except that on all blocks of trials another image, the mask stimulus, is added at a constant contrast. This is referred to as the fixed-mask condition because the contrast of the mask is fixed across all blocks of trials relative to the variable pedestal contrasts. Finally, a masking function is obtained when the contrast of the mask stimulus is varied between blocks of trials, with the target pedestal contrast set to zero. This is referred to as the variable-mask condition.

Figure 3.8 Schematic of general experiment design. Refer to text for details.

The material objective of this project was to measure TvC functions across the visible spatial spectrum, at as many as four orientations and seven spatial frequencies. Pedestal and mask contrasts therefore had to be placed in an economical fashion, in order to guarantee that

desired effects could be measured, and that models would be adequately constrained upon application. Recall that the ordinary TvC function is ‘dipper’ shaped, with increment thresholds against near-threshold-contrast pedestals significantly lower than the absolute threshold (i.e. facilitated), and increment thresholds against high-contrast pedestals significantly higher (i.e. masked). For a single TvC function, it is clear what features must be measured: the facilitatory dip at near-threshold contrasts, and the increasing threshold elevations at higher pedestal contrasts. So, some pedestal values should be placed across the value of the target threshold, and some should be placed at contrasts which are several times greater than the threshold. This means that detection thresholds (i.e. with zero pedestal contrast) should first be measured for each target and masking condition. For each subject a range of spatial frequency and orientation target conditions were determined to be appropriate, and the subject’s baseline sensitivity to each target was determined over several sessions. Target pedestal contrasts, used in both baseline and fixed-mask conditions, were set to fixed ratios of each subject’s detection threshold for a given target stimulus. For most observers, pedestal contrasts included contrast values relative to the detection threshold, with ratios of 0.25, 0.5, 1.0, 1.5, 2.5, 5.0, and 10.0 thresholds. The near-threshold values (ratios up to 1.5) were expected to produce facilitation, detailing the dipper portion of the typical TvC function, while the suprathreshold values (2.5 to 10) were expected to produce progressively greater masking and threshold elevation. For the fixed-mask conditions, although the detection threshold would be expected to be greatly higher than in the baseline case, the same pedestal contrasts were used as in the baseline condition. 
This allowed for direct comparison of measured increment thresholds between the baseline and fixed-mask conditions for particular pedestal levels.


A more difficult decision was the contrast at which the masks should be set in the fixed-mask conditions. Too small a contrast might not produce significant masking at all, while too much contrast would result in excessive masking, making it impossible to measure a full dipper function in the fixed-mask conditions. To explore potential rules for setting mask contrast, the author of this study measured masking functions across all seven spatial frequencies included, at all four orientations. Masking function measurement involves setting the pedestal contrast at zero, and measuring detection thresholds while the contrast of the mask is varied between blocks of trials. The typical finding in such an experiment is that there is little to no facilitation (some was detected in these experiments, a finding to be discussed later) at low mask contrasts, and that masking begins at some mask contrast and increases thereafter. It was determined here that once a broadband mask’s contrast reaches the detection threshold for a given target, masked target thresholds are elevated by at least 100% above baseline detection thresholds. So, it was decided that for the main experiment, mask contrast for each condition would be set at 100% of the detection threshold. This resulted in a clearly visible 1/f noise pattern at all target frequencies, though the pattern was noticeably faint for target frequencies near the peak of baseline sensitivity. Still, even at this spatial frequency, it will be seen that significant masking occurred. Masking conditions added after completion of the main experiment included more mask contrasts for particular target conditions. Finally, masking functions were measured for all observers in the main experiment. As described in the previous paragraph, a masking function is obtained by keeping pedestal contrast at zero, and increasing the contrast of the mask stimulus between blocks of trials. 
A fixed-mask paradigm affords information about how the mask interacts with contrast transduction of a target stimulus, but it only does so at a single mask contrast level. Measurement of a masking function

90

affords extra information, helping to calibrate the effects of changing the contrast of the mask. As it turns out, masking functions are rather robust and easily predictable given the measurement of a masked dipper function. So, in subsequent conditions included after completion of the main experiments, masking functions were not directly measured. In the main study, mask contrast levels were determined in the same manner as the pedestal contrasts, being assigned the same contrast values relative to the target’s detection threshold.

Subjects and Procedure

The author of this study participated in all the experiments described, first completing the main experiments at all four orientations and all seven spatial frequencies, and then completing a subsequent experiment focused mainly on masking at a single orientation (vertical) and two spatial frequencies. Three other subjects also participated in the main experiment, one completing target conditions at vertical for all seven spatial frequencies, one completing target conditions at all four orientations and at the three highest spatial frequency bands, and one completing target conditions at all four orientations at a single spatial frequency. Two more subjects participated in the subsequent experiment, one completing a vertical mid-high frequency target condition against three fixed mask contrasts, and the other completing the same conditions (vertical target, two spatial frequencies) as the author. All subjects were screened for visual defects. Applicants with less than 20/20 visual acuity or measurable astigmatism were turned away. Participants were first trained on a contrast detection task, detecting oriented sinewave gratings, for 2 hours (two ‘sessions’), before being transferred to stimuli similar to those used in the main study. After another session
training with these stimuli, baseline detection threshold measurement began. Once baseline detection thresholds were established, TvC and masking function measurement could begin, with mask contrast and pedestal levels based as described above on the target condition’s respective detection threshold. Trials were blocked in the following manner, with orientation blocks nested within pedestal/mask contrast blocks, which were nested within spatial frequency blocks. 50 trials would be completed for a given target-mask condition. If more than one orientation was a part of the subject’s assignment, four blocks of fifty trials, each block differing only in the orientation of the target, would be completed in random order. This could be said to constitute an ‘orientation block’. During baseline measurements, different spatial frequencies would be completed in consecutive orientation blocks, each orientation block having a different spatial frequency. Once measurement of baseline thresholds was complete, TvC measurement could commence. Measurement of TvC functions would consist of consecutive measurement of orientation blocks with pedestal levels of progressively decreasing contrast 24. Once a full set of pedestal contrasts was completed, subjects would proceed to their next spatial frequency assignment and repeat the process. Procedure for measuring masking functions was similar, except that consecutive blocks had progressively increasing contrast 25.
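The nesting of blocks described above can be sketched as a small schedule generator. This is only an illustration under stated assumptions, not the software used in the study: the function name `block_schedule` and the example contrast ratios are hypothetical, and only the nesting order, the 50-trial block length, the random ordering of orientations, and the direction of the contrast sequence are taken from the text.

```python
import random

def block_schedule(spatial_freqs, contrast_ratios, orientations,
                   descending=True, trials_per_block=50, seed=None):
    """Yield (spatial_freq, contrast_ratio, orientation, n_trials) blocks.

    Orientation blocks are nested within contrast blocks, which are nested
    within spatial frequency blocks. Contrast levels run from high to low
    for TvC measurement (descending=True) and from low to high for masking
    functions (descending=False). Orientation order is reshuffled each time.
    """
    rng = random.Random(seed)
    levels = sorted(contrast_ratios, reverse=descending)
    for sf in spatial_freqs:
        for ratio in levels:
            order = list(orientations)
            rng.shuffle(order)
            for ori in order:
                yield (sf, ratio, ori, trials_per_block)

# Hypothetical pedestal ratios (re detection threshold), for illustration only.
tvc = list(block_schedule([7.6, 14.2], [0.5, 1.0, 2.0, 4.0],
                          [0, 45, 90, 135], descending=True, seed=1))
for block in tvc[:4]:
    print(block)
```

Passing `descending=False` gives the ordering described for masking functions, where mask contrast increases across consecutive blocks.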

24 This rigid organization of pedestal contrasts was included to minimize the potential for subjects’ confusion as to the current target condition. The rationale was thus: since subjects in the first experiment would be tested with many target stimuli at different orientations or spatial frequencies, if they were first tested in a zero-pedestal condition, they may become confused as to the identity of the current target, despite the availability of a by-request ‘target cue’ (by pressing ‘3’ subjects could view a mid-contrast sample of the current target orientation and spatial frequency). The author of this study reasoned that if subjects completed high-contrast conditions first, by the time they arrived at low-contrast or sub-threshold pedestal conditions they would have some recent experience with the spatial forms of the target, and would therefore be less likely to mistakenly search for the wrong spatial frequency or orientation.

25 The reasoning here was similar to that described in the previous note. Subjects would first be allowed to search for the target in the absence of noise, and subsequent blocks of trials would use increasing mask contrasts, thereby again giving observers experience with ideal conditions before the task was made more difficult.

All subjects were compensated for their participation, in accordance with IRB guidelines, and informed consent was obtained before participation began. Subjects were free to take rest breaks whenever they wished, and most were able to complete between 8 and 12 blocks of 50 trials (20 to 30 minutes) before resting, after which they would normally complete another 8 to 12 blocks. At this rate, depending on the conditions to which they were assigned after screening, subjects required from 2 to 10 weeks, participating up to 5 hours per week, to complete the experiments.

4. RESULTS

Overview

Threshold versus contrast (TvC) functions were measured for several observers at various spatial frequencies and orientations. This was done either for targets against a mean-luminance background - the ‘baseline’ condition - or for the same targets against a fixed-contrast 1/f noise patterned background - the ‘masked’ condition. Within both of these conditions, the only change from block to block of trials was in the contrast of the target pedestal. Masking functions were also measured, this involving obtaining thresholds for the targets against a 1/f noise pattern whose contrast was varied from block to block. In measuring masking functions, there was no target pedestal at all. The critical manipulation was the addition of 1/f noise to the targets during measurement of TvC functions and, less informatively, of the masking functions; this was predicted to produce particular effects, defined in detail in Chapter 2. This chapter will describe the basic features of all three conditions, including patterns of sensitivity across the major target dimensions of orientation and spatial frequency. Threshold measurement samples are also provided, as an explicit demonstration of the methods used to obtain these data in all conditions. In some respects, the overall pattern of results was as expected: the band-pass CSF shape was seen across spatial frequency, oblique effects were seen at high spatial frequencies, and dipper functions were measured in both baseline and masking conditions. In other respects, results were not as expected: no horizontal effects were measured in the masking conditions, and there was no
overt evidence of noise masking at any spatial frequency. More information on all these points is drawn out in Chapter 5. Perhaps the major finding here is that there is so little overall change in the shapes of the TvC functions between baseline and masking conditions, this implicating a gain control process at the root of masking in 1/f noise. However, viewing these data through the prism of the models presented in Chapter 2 shows that the matter of sensitivity in 1/f noise is more complex than originally supposed.

Thresholds

Before going over the data sets as whole TvC functions, there should be some discussion of how the individual thresholds were estimated. As described in Chapter 3, a cumulative Weibull distribution was fit to the set of responses made by a subject to the set of contrasts determined by the adaptive QUEST algorithm (Watson and Pelli 1983). An example is shown in Figure 4.1. The first part of the figure (A) shows the course of two blocks of trials (trial number indicated by the horizontal axis), in a single stimulus condition. With correct responses in the 2AFC task, target contrast (vertical axis) decreased, and with incorrect responses target contrast increased. The 102 trials collected (51 per QUEST) were then fit to a Weibull function with two important parameters (detailed in Chapter 3, Threshold Measurement). The final threshold estimate and confidence intervals are shown by the solitary symbol to the right of the QUEST runs. The second part of the figure (B) shows how the threshold estimate and confidence intervals were derived. The circle to the right of the center of the figure represents the best-fitting values of t and k (from Equation 3.2) to the data set of 102 trials, the threshold and slope parameters of the Weibull Ψ function respectively. The scattered blue symbols are bootstrapped
parameters, showing the best fitting parameters for simulated data generated given the psychometric function fit to the data. Since the QUEST algorithm is not designed to specify the slope parameter of the psychometric function, the bootstrap data are spread out along the slope axis more than along the threshold axis, a typical pattern. 68% confidence intervals σt are simply measured as the spread of the bootstrap data along the horizontal axis. Finally, the third part of the figure (C) shows the trial data, coded as correct = 1 or incorrect = 0 responses, plotted along with the best-fitting Weibull psychometric function. The threshold value and its associated confidence interval are shown at the 81.6% correct level on the psychometric function. This procedure was undertaken for every set of trials collected for every stimulus condition for every subject. This example was chosen because it was typical: the threshold ‘error’ estimate of σt was 0.000831 in this case, which was 13% of the threshold estimate of 0.00636 given by t. This error-estimate ratio is typical for contrast detection and discrimination experiments, where measurement error typically falls between 10% and 25%, or 1 and 2 dB, and is measured either in the manner described here (e.g. Meese and Holmes 2007) or by taking the standard deviation of several independent threshold estimates (which are averaged together for a final threshold estimate: e.g. Foley 2007; Essock et al 2009). This ratio was also typical for the subjects in the experiments described below. Average error rates for each subject are shown in the table below (Table 4.1). All of these values fall within a normal range, which is encouraging regarding the consistency of the sensitivity measurement methods used here as compared with previous similar studies. There were individual threshold estimates with error rates higher than 20%, in a few cases significantly higher. 
In these cases, where possible, additional data was gathered to replace the data underlying the poor measurement. When new data could not be collected, or if addition of new data did not improve the picture, data could be left ‘as is’, with the broader confidence intervals causing it to lend less relative weight to model fitting later on. Another feature of the measurement error, which will be discussed in the next section, is that error for detection thresholds was much lower (tending to about half of the overall average) than error for discrimination thresholds.
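The figures quoted in this section can be checked with a short calculation. Equation 3.2 is not reproduced in this chapter, so the sketch below assumes a common 2AFC form of the Weibull function, Ψ(c) = 1 − λ − (0.5 − λ)·exp(−(c/t)^k), which places the threshold t at 81.6% correct when λ = 0. The 20·log10 convention for expressing a proportional error in dB is likewise an assumption, as is the slope value used in the example.

```python
import math

def weibull_2afc(c, t, k, lapse=0.0):
    """Assumed 2AFC Weibull: proportion correct at contrast c."""
    return 1.0 - lapse - (0.5 - lapse) * math.exp(-((c / t) ** k))

def error_ratio_db(ratio):
    """Express a proportional threshold error in dB (assumed 20*log10 convention)."""
    return 20.0 * math.log10(1.0 + ratio)

t, sigma_t = 0.00636, 0.000831               # values quoted in the text
p_at_threshold = weibull_2afc(t, t, k=3.0)   # k = 3.0 is arbitrary; it cancels at c = t
ratio = sigma_t / t

print(f"proportion correct at threshold: {p_at_threshold:.3f}")
print(f"error / threshold ratio:         {ratio:.3f}")
print(f"error in dB:                     {error_ratio_db(ratio):.2f}")
```

Under these assumptions, proportional errors of 10% and 25% correspond to roughly 0.8 dB and 1.9 dB, in line with the approximate 1 to 2 dB range given above.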

Figure 4.1 Representation of the methods used to obtain threshold estimates. A) shows the course of two blocks of trials (more than two blocks were run on most conditions for all observers). In a ‘good’ QUEST run, correct responses to a given target contrast (ordinate) are followed by a trial with lower contrast; incorrect responses by a trial with higher contrast – the QUEST functions as an adaptive staircase, choosing target contrasts according to Bayes’ rule. Responses were coded by correct-incorrect response (1 or 0) and target contrast, and a Weibull function was fit as described in the text. A bootstrapping procedure (200 iterations, represented in B by the blue ‘x’s) was used to obtain confidence intervals for the threshold estimate. C) shows the best-fitting psychometric function against the trial data. Threshold estimate (82% correct point on the function) is shown with the associated confidence interval.
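The bootstrapping procedure summarized in the caption can be sketched as a parametric bootstrap. This is a minimal illustration, not the fitting code used in the study: it assumes a 2AFC Weibull with a guess rate of 0.5 and no lapse term, replaces whatever optimizer was actually used with a coarse grid search, and takes the standard deviation of the bootstrapped thresholds as the ‘spread’. All function names and the simulated observer are hypothetical.

```python
import math
import random
import statistics

def p_correct(c, t, k):
    """Assumed 2AFC Weibull (guess rate 0.5, no lapse term)."""
    return 1.0 - 0.5 * math.exp(-((c / t) ** k))

def make_grid(t_lo, t_hi, n_t=30, slopes=(1.0, 1.5, 2.0, 3.0, 4.0, 6.0)):
    """Log-spaced candidate thresholds crossed with candidate slopes."""
    step = (math.log(t_hi) - math.log(t_lo)) / (n_t - 1)
    return [(math.exp(math.log(t_lo) + i * step), k)
            for i in range(n_t) for k in slopes]

def fit(contrasts, responses, grid):
    """Maximum-likelihood (t, k) by exhaustive grid search."""
    def loglik(params):
        t, k = params
        total = 0.0
        for c, r in zip(contrasts, responses):
            p = min(max(p_correct(c, t, k), 1e-9), 1.0 - 1e-9)
            total += math.log(p if r else 1.0 - p)
        return total
    return max(grid, key=loglik)

def bootstrap_threshold(contrasts, responses, grid, n_boot=200, seed=0):
    """Return the fitted (t, k) and the SD of the bootstrapped thresholds."""
    rng = random.Random(seed)
    t_hat, k_hat = fit(contrasts, responses, grid)
    boot_t = []
    for _ in range(n_boot):
        # Simulate responses from the fitted function at the same contrasts.
        sim = [rng.random() < p_correct(c, t_hat, k_hat) for c in contrasts]
        boot_t.append(fit(contrasts, sim, grid)[0])
    return t_hat, k_hat, statistics.stdev(boot_t)

# Simulated observer (hypothetical): 102 trials placed near threshold.
rng = random.Random(42)
true_t, true_k = 0.006, 3.0
contrasts = [true_t * 2 ** rng.uniform(-1, 1) for _ in range(102)]
responses = [rng.random() < p_correct(c, true_t, true_k) for c in contrasts]
grid = make_grid(0.002, 0.02)
t_hat, k_hat, sigma_t = bootstrap_threshold(contrasts, responses, grid, n_boot=50)
print(f"t = {t_hat:.5f}, k = {k_hat}, sigma_t = {sigma_t:.5f}")
```

The example uses 50 iterations to keep the run short; the procedure described in the caption used 200. Because the simulated contrasts cluster near threshold, the bootstrapped slope estimates would be expected to scatter more widely than the thresholds, as noted in the text.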

The third parameter of Equation 3.2, perhaps relating to the ability of the subjects to concentrate on and correctly perform the psychophysical task, is the lapse rate λ. This value can be thought of as an estimate of the proportion of trials on which a subject responds randomly, regardless of stimulus contrast, perhaps due to lapses in attention or finger errors. Functionally, it is the difference between the estimated peak of the best-fitting psychometric function and the ideal peak performance of 100%. It is arguable as to whether or not this parameter should be fixed for a given subject across all instances of experimental participation, or only within particular psychometric function estimates. Here, it was decided to go with the latter option, allowing the lapse rate to vary with stimulus condition (i.e. every fitted psychometric function allowed an independent estimate of λ). Some portion of the lapse rate may not be stationary, and during at least some blocks of trials a subject is likely to be temporarily distracted or uncomfortable, leading to a few trial responses which are randomly determined; on the other hand some portion of the lapse rate might be constant, an idiosyncratic property of a given subject’s performance. Personal experience and informal observation of subjects suggest that both cases are likely, and moreover that the former case is the factor more likely to be measured in these far-from-ideal conditions. For all subjects, the lapse rate, averaged across all conditions for which a psychometric function was fit to data, was below 2% (Table 4.1).

         AH1     AL      STP     CL      AH2     JX      DC
lapse    1.0%    1.7%    1.8%    1.8%    0.9%    0.5%    0.9%
error    12.5%   14.5%   17.2%   17.8%   13.8%   12.7%   10.3%

error (detection), in source order: 6.1%, 6.6%, 7.0%, 8.4%, 4.8%, 7.7% (six values for seven subjects; column assignment not recoverable)

Table 4.1

Lapse rate values, averaged across all stimulus conditions participated in by a given subject, should be interpreted cautiously. In most cases (i.e. for most stimulus conditions), λ was found to be equal to zero: for all observers the median lapse rate was zero. In effect, the lapse rate is only measurable if enough trials are included well above threshold, where the correct rate should be near 100%, for random errors to be detected. If most trials are concentrated around the 81.6% correct level, as was intended in these experiments, lapses will simply have the effect of elevating thresholds slightly, and if the few high-contrast (relative to threshold) trials are ‘hit’ by the subject, a lapse rate of zero will be estimated. Likewise, if out of 102 or 153 trials a subject experiences a ‘bad start’, missing a couple of early, high-contrast trials, and does not report this to the experimenter, their lapse rate for that set of trials may be rated as higher than is generally appropriate, e.g. a value as high as 10%. These values are presented only to suggest that the lapse rate was small whatever its value, perhaps as low as one in 100 trials. Incidentally, it should be noted that a slight positive correlation appears between lapse rate and measurement error (R² = .382). Whether this correlation reflects a functional link (high lapse rates increase measurement error) or individual factors (observers with higher lapse rates have less stationary thresholds) is unknown, and is probably a question better asked and addressed elsewhere.
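The claim that unmodeled lapses elevate thresholds only slightly can be illustrated with a short calculation. As before, the form of Equation 3.2 is assumed here to be Ψ(c) = 1 − λ − (0.5 − λ)·exp(−(c/t)^k); the slope k = 3 and the function name are hypothetical. The sketch solves for the contrast at which a lapsing observer actually reaches the 81.6% criterion, expressed relative to the underlying threshold t.

```python
import math

P_STAR = 1.0 - 0.5 / math.e   # the 81.6% correct criterion

def contrast_at_criterion(t, k, lapse, p_star=P_STAR):
    """Contrast where the assumed lapsing Weibull reaches p_star correct."""
    frac = (1.0 - lapse - p_star) / (0.5 - lapse)
    if frac <= 0.0:
        raise ValueError("criterion lies above the 1 - lapse asymptote")
    return t * (-math.log(frac)) ** (1.0 / k)

for lapse in (0.0, 0.01, 0.02, 0.10):
    ratio = contrast_at_criterion(1.0, 3.0, lapse)
    print(f"lapse = {lapse:.2f}: apparent threshold = {ratio:.3f} x t")
```

With these assumed values, a lapse rate of 2% shifts the apparent threshold by only a few percent, which is small next to the 10% to 25% measurement error discussed earlier.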

Baseline TvC Functions

Dipper Functions

Looking at all the baseline data on one set of axes, both relative to the detection threshold for each target condition and subject, the strength of the dipper pattern is obvious (Figure 4.2). Thresholds are most facilitated when the pedestal contrast is equal to the detection threshold – when on the abscissa pedestal contrast is equal to 100% of threshold contrast, increment thresholds are facilitated in 100% of cases. When the pedestal contrast is subthreshold or as much as 50% greater than the absolute threshold, more than 90% of measured thresholds, across all subjects and conditions, are facilitated. Facilitation declines rapidly at greater pedestal contrasts, and disappears completely once pedestal contrast exceeds 10 times the detection threshold. The distribution of thresholds seen here is a matter for further analysis, as it implies that the shape of the dipper function, while always resembling the average form represented here by the solid black line, is dependent on target condition and on subject parameters. Individual data are represented by the dotted lines and solid symbols, for four observers. One of these observers is the author of this document; his is the data connected by the red (lowest) dotted line. In Chapter 5, a simple hypothesis based on parametric analysis will be put forward to explain this discrepancy in observers; suffice it to say that differences in the target conditions run by the different observers are not enough to explain the differences in these normalized data.

Figure 4.2 All baseline data collected for four observers, all of whom ran a different combination of target conditions. Each measured threshold and associated pedestal contrast were divided by the detection threshold for the respective target condition, and plotted here as the gray disks. Individual observers’ average data are plotted as the outlined symbols connected with dashed lines. The mean of the observers’ data is plotted as the solid black line and symbols. The mean of the four subjects is shown rather than the mean of all data points. The three naive observers’ mean thresholds (diamonds, triangles, circles) overlap, while the one experienced observer’s thresholds (squares) tend to be much lower at suprathreshold contrasts.
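The normalization used for this figure can be sketched as follows. The numbers below are hypothetical values for a single target condition, not data from the study, and the function names are illustrative; an increment threshold is counted as facilitated when it falls below the detection threshold, i.e. below 1.0 after normalization.

```python
def normalize_tvc(pedestals, thresholds, detection_threshold):
    """Express pedestal contrasts and increment thresholds re detection threshold."""
    return [(p / detection_threshold, th / detection_threshold)
            for p, th in zip(pedestals, thresholds)]

def fraction_facilitated(normalized):
    """Share of increment thresholds that fall below the detection threshold."""
    below = sum(1 for _, th in normalized if th < 1.0)
    return below / len(normalized)

# Hypothetical values for one target condition (not data from the study).
detection = 0.006
pedestals = [0.003, 0.006, 0.012, 0.048, 0.096]
thresholds = [0.004, 0.003, 0.005, 0.012, 0.021]

norm = normalize_tvc(pedestals, thresholds, detection)
for ped, thr in norm:
    print(f"pedestal = {ped:5.2f} x threshold, increment = {thr:4.2f} x threshold")
print("fraction facilitated:", fraction_facilitated(norm))
```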

Orientation

Three subjects were included in stimulus conditions at more than one orientation. The author of this document (subject AH) completed all orientation conditions at all spatial frequencies, including bands 1 to 7 (.34, .63, 1.2, 2.2, 4.1, 7.6, and 14.2 cpd). Another subject (STP) completed orientation conditions at the three highest spatial frequency bands (5 to 7, 4.1, 7.6, and 14.2 cpd), and another subject (CL) only at band 6 (7.6 cpd). The oblique effect of orientation is normally seen only above the peak of the CSF (above 4 cpd or so), with its magnitude increasing with spatial frequency. The horizontal effect has been documented at lower spatial frequencies (at 1cpd in Kim, Haun, and Essock 2009) and perhaps at even lower spatial frequencies (at .63cpd in Haun and Essock 2009), but there is clear evidence that it is present at mid-high spatial frequencies (7.9cpd in Haun and Essock 2009; 8cpd in Essock et al 2009; an 8-16cpd band in Hansen and Essock 2006). Since the orientation conditions were included here in an effort to measure these orientation anisotropies, mid-high spatial frequency targets were used for subjects STP and CL, straddling the ~8cpd point of interest described in the previous studies just mentioned. When detection thresholds were tested for included stimulus conditions, oblique effects were found in the expected places. Subject AH showed an oblique effect increasing in magnitude at the 7.6 and 14.2 cpd bands, with no comparable effects at lower spatial frequencies. STP as well showed an increasing oblique effect from 7.6 to 14.2 cpd. CL showed an oblique effect at the 7.6 cpd band. The pattern of orientation effects shown here is consistent with the expectations described above: oblique effects are obtained reliably at high frequencies, and not at or below the CSF peak. Data are illustrated below (Figure 4.3).

Figure 4.3 Oriented absolute threshold data. Each set of four oriented thresholds is vertically offset from the other sets, to make orientation effects clear; the ordinate scale is arbitrary but accurate within each set of four data points. Three observers were run on multiple orientations at 7.6 cpd, and each showed an oblique effect here. Two observers were run at a higher spatial frequency and both showed increased oblique effects. The same two observers were run at lower spatial frequencies where no orientation effects were expected, and none were found.

Next, pedestal values were set as predetermined ratios of the detection threshold for each condition (for subject CL, pedestal values were set as ratios of the average of his baseline thresholds). TvC functions for these pedestal contrasts formed clear dipper functions for each subject, in each stimulus condition (Figure 4.4). Orientation effects for the TvC functions are generally unclear, and vary across observers. Overall, suprathreshold discrimination thresholds do not appear to vary regularly with orientation, with the stark exception of subject STP at 7.6 cpd. On this condition, this subject had significantly higher thresholds for horizontal targets discriminated from suprathreshold backgrounds. Similar behavior, with discrimination thresholds separated so consistently across pedestal level by orientation, is not seen for the other two subjects.

Figure 4.4 Oriented TvC functions for the same three subjects as in Figure 4.3. Red and blue lines correspond to horizontal and vertical data respectively, while green and cyan correspond to 45° and 135° (right and left tilt) data respectively. The shape of the CSF can be seen in data for subjects AH and STP. The oblique effects seen in Figure 4.3 can be seen here, though they do not seem to propagate through to higher pedestal contrasts.

All three of these subjects were chosen for the orientation conditions because on baseline measurements with a 10 cpd sinewave grating, they showed clear, symmetric oblique effects. Subjects AH and STP completed their multi-frequency conditions first, and when it became apparent that there was more than the expected amount of between-orientation variance in the discrimination conditions, a third subject (CL) was chosen to run at a single frequency. His results were similar to the original subjects, in that the orientation effects were not as symmetric for the discrimination conditions as at the detection conditions. Still, it must be remembered that all of these sensitivity values, detection and discrimination, are presumed to reflect the sensitivity of a single, internal, monotonic transducer function. The objective of these experiments was to seek differences in, and effects on, these transducer functions across target conditions and in the presence of 1/f noise. There is still much more information that can be drawn from these threshold values if they are viewed as sets, and from the trial data from which they are derived.

Spatial Frequency

Three subjects participated in multiple spatial frequency conditions. One of these (AH) completed all seven available spatial frequency conditions at each of four orientations; another (AL) completed all seven frequency conditions for vertical targets only; and another (STP) completed the three high spatial frequency bands (5, 6, and 7) at each of four orientations. Data are shown in Figure 4.5. Data for two of these subjects, AH and STP, were described in the previous section. Three sets of contours are drawn for each observer, connecting data points across spatial frequency. The central contour connects absolute detection thresholds across spatial frequency, showing contrast sensitivity functions (CSFs) for each observer, and showing that baseline data followed the familiar bowed CSF shape. For the two subjects who completed target conditions at all seven spatial frequencies, peak baseline sensitivity was found at the fourth band, at 2.2cpd, with sensitivity declining sharply towards the lowest and highest bands. Subject STP showed a similar decline in overall sensitivity towards the highest spatial frequencies. Two other contours are shown in Figure 4.5, showing cross-frequency sensitivity at the base of the facilitatory dipper (which in most cases was at the fourth point in the TvC function, where the pedestal contrast was equal to the detection threshold), and cross-frequency sensitivity at the end of the high-contrast arm of the dipper ‘handle’. The facilitated CSF is largely parallel to the detection CSF, perhaps somewhat more peaked at the same frequencies, which would imply a greater exponent for the near-threshold transduction nonlinearity at midrange spatial frequencies. The high-contrast CSF is less peaked than the detection CSF, reflecting the well-known phenomenon of contrast constancy (Georgeson and Sullivan 1975; Brady and Field 1995), though it is not a dramatic effect. The latter phenomenon is to be expected, while the former may be a sign of something new, which will be addressed in later analyses. Dipper functions across frequency are largely similar, with the differences noted above in description of CSFs at various pedestal levels. Dippers appear to be deeper at mid-to-high frequencies, and shallowest at the lowest spatial frequency (.34cpd, only measured for two observers) and at the highest frequency (14.2cpd). The suprathreshold-pedestal dipper handles are largely parallel, at least within observers (Figure 4.6), and possibly between them. For all three observers (and for other observers, included in other conditions), a power function of pedestal contrast with an exponent between 0.7 and 0.9 effectively describes the suprathreshold portion of each TvC function, which is consistent with earlier findings (e.g. Legge 1981; Bird, Henning, and Wichmann 2002). In Figure 4.6, the diagonal grid lines follow a slope of 0.8, and seem to describe fairly well the general shape of the data.

Figure 4.5 TvC functions across spatial frequency. Data for subjects AH and STP are as presented in Figure 4.4. Contour lines have been drawn to illustrate the bowed near-threshold CSF, and the relatively flatter suprathreshold CSF. Colors are as for Figure 4.4.

Figure 4.6 Suprathreshold pedestal TvC regimes. Averaged across orientation in observers AH and STP, and at vertical for AL. Error bars are standard error averaged across orientation. Functions are parallel within and between observers. Diagonal grid lines trace a dB/dB slope of 0.8, as discussed in the text.

Masking Functions

For three subjects (AH, STP, and AL), masking functions were measured, primarily to determine a minimum mask contrast which would reliably produce threshold elevations, and which therefore could be used for the masked TvC conditions. Two conditions for a masking contrast were predetermined. First, the mask contrast should not result in too much threshold elevation, since greatly elevated dipper functions would be difficult to model due to a lack of suprathreshold data points (due to physical limitations of contrast display). Three elements of the masked TvC function were important to measure: the rate of low-contrast facilitation (the dipper-depth), the amount of crossover between the masked and baseline TvC functions (indicative of gain-control), and the rate of high-contrast masking. Too much masking might result in a masked TvC function with unmeasurable high-contrast increment thresholds. Second, the mask contrast should be the same for all frequencies relative to their detection thresholds. This condition was set in order that potential stimulation of the target-detecting mechanisms by the mask (which, ideally, was constructed so that it would minimally overlap with target-detecting mechanisms) would be physically equalized. This way, frequency-dependent detector bandwidth would be left as the only remaining factor which might produce a crossover between ‘suppression’ by a mask and ‘excitation’ by a target. The use of a broadband contrast mask for a narrowband target presents numerous methodological problems; it must not be assumed that the primary source of masking is necessarily proximal in the frequency domain, particularly for spatial frequencies far from the peak of the CSF: even if the mask is set to near the detection threshold for these frequencies, faraway frequencies might have a disproportionately large effect on sensitivity. Increasing mask contrast far above the detection threshold might worsen the problem, magnifying the disproportionate contribution, or might hopelessly combine contributions from proximal and distant frequencies. By aiming for the lowest effective mask contrast, it was hoped that the complexity of the masking effects could be minimized, with the ‘most effective’ part of the broadband stimulus doing most of the masking.

Figure 4.7 Masking functions, measured by varying broadband mask contrast (abscissa) while keeping pedestal contrast fixed at zero. Data have been normalized by detection threshold as in Figure 4.2. At top, data are averaged across orientation for observers AH and AL, showing how at lower spatial frequencies masking functions are relatively elevated. At bottom, data are averaged across the three highest spatial frequencies for observers AH and STP, showing how the oblique effect persists even to high mask contrasts. Based on previous findings, a horizontal effect would have been expected here.

For each subject and for each target spatial frequency, five mask contrasts were chosen as values relative to the average detection threshold for that subject at that frequency. Mask contrasts (see Methods) of 0.25, 0.50, 1.0, 2.0, and 4.0 re threshold were used. It can be seen from the plots above, which average masked thresholds across orientation for subjects AH and STP, that a mask contrast of 1.0 re threshold is sufficient to raise target thresholds by at least 30% at all spatial frequencies and for all subjects. For subjects AH and AL, it can be seen that lower spatial frequencies are relatively more masked, reflecting the 1/f spatial frequency CSF described elsewhere (Haun and Essock 2009; Schofield and Georgeson 2003). On conditions where an oblique effect was measured at baseline (bands 6 and 7 for subjects AH and STP), the effect was more or less retained even at high mask contrasts, which was unexpected; given the results of earlier experiments using 1/f noise as a background stimulus, a horizontal effect would have been expected at these frequencies, particularly given the results of Haun and Essock’s 2009 study. The reason for the absence of the effect in these conditions may be related to differences between the stimuli and paradigm used here and those used in the prior studies. This matter will be discussed further in subsequent chapters.

As with the suprathreshold portion of the baseline dipper functions, the slope of the masking functions is a relatively regular feature, both within and between subjects. Within each subject’s data set, the masking functions are parallel on log-log axes. These can be described by power function exponents, as with the suprathreshold TvC functions. The slopes are, on average, similar in magnitude to the TvC slopes, though they hint at a negative trend with spatial frequency (Figure 4.8). The slope of the masking function can be interpreted as the reciprocal increase in masking with increase in mask contrast; the negative relationship between spatial frequency and slope therefore is reminiscent of previous findings indicating that contrast masking is relatively more intense towards lower spatial frequencies (Haun and Essock 2009; Meese and Hess 2004). Between subjects, there was not much difference in the masking function slopes at particular spatial frequencies. Likewise, as a function of orientation for subjects AH and STP, masking function slopes did not appear to vary in any consistent or meaningful way.

Figure 4.8 Slope values for threshold elevation by (left) pedestal masking or by (right) broadband masking. Slopes were calculated by linear regression of the four highest contrast data points.
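The slope calculation named in the caption can be sketched as an ordinary least-squares regression on log-log axes, restricted to the four highest-contrast points. The function name and the synthetic data are hypothetical; the data are constructed so that the upper four points follow a power law with exponent 0.8, which the regression should recover.

```python
import math

def loglog_slope(contrasts, thresholds, n_points=4):
    """Least-squares slope of log threshold against log contrast,
    using the n_points highest-contrast data points."""
    pairs = sorted(zip(contrasts, thresholds))[-n_points:]
    xs = [math.log10(c) for c, _ in pairs]
    ys = [math.log10(t) for _, t in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic check: the four highest-contrast thresholds follow 0.3 * c ** 0.8.
pedestals = [0.5, 1, 2, 4, 8, 16]
thresholds = [0.9, 0.5] + [0.3 * c ** 0.8 for c in (2, 4, 8, 16)]
slope = loglog_slope(pedestals, thresholds)
print(f"fitted log-log slope: {slope:.3f}")
```

Because both axes are logarithmic, the fitted slope is the power function exponent, and it is the same whether thresholds are expressed in contrast units or in dB.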

The masking functions demonstrate that when the broadband mask contrast is equivalent to the threshold target contrast, threshold elevation will reliably occur. So, for the first three subjects in the masked TvC condition, mask contrast was set at 100% of the average threshold for a given spatial frequency. This, by definition, would result in the visibility of content similar to the target and, if the target frequency was different from the CSF peak frequency, the relatively clearer visibility of content at other spatial frequencies. It was hoped that this compromise would allow measurement of complete masked TvC functions, and avoid the potential for luminance range distortion at higher pedestal contrasts. As can be seen in the next section, this procedure brought about the intended result: measurably elevated TvC functions.

Masked TvC Functions

Masked TvC functions were measured by repeating the variable-pedestal procedures of the baseline condition, with an additional fixed-contrast broadband mask present during all trials. For a given target condition, the mask included all available spatial frequencies and orientations except those included in the target stimulus. For the first three subjects (AH, AL, and STP) the mask contrast was set at 100% of the average spatial frequency detection threshold; for the fourth subject (CL), the mask contrast was set at a higher level (0.08) for the single spatial frequency included (band 6, at 7.6cpd), which in his case amounted to a mask equivalent to 180% of his average detection threshold at that frequency. Subject AH and two others (DC and JX) participated in a further condition which allowed the mask contrast to be varied across several levels while TvC functions were measured; these results are described in context, in the next chapter (Analyses). Masked TvC functions are shown here (Figure 4.9) for data collected from the four subjects included in multiple target conditions (AH, AL, STP, and CL), whose baseline data was described in an earlier section. If viewed without direct comparison with the baseline TvC data, the masked functions appear to follow a similar pattern, with deepening dipper functions toward mid-high spatial frequencies. As with the masking functions measured in the prior section, there do not appear to be horizontal effects of contrast sensitivity in the three subjects included in multiple orientation conditions.

Figure 4.9 Masked TvC data for four observers. Colors are as for Figure 4.4.

If we compare the typical (average) confidence interval to threshold ratio between baseline and masking conditions, there is not any overall difference (Figure 4.10). There were idiosyncratic changes within subjects, but on average the error ratio for thresholds in the masked TvC conditions was 14.4%, which is not significantly different from the average ratio of 14.1% for the baseline condition (F(1,12) = .026, p = .875).

| | AH1 | AL | STP | CL | AH2 | JX | DC |
|---|---|---|---|---|---|---|---|
| error (baseline TvC) | 12.5% | 14.5% | 17.2% | 17.8% | 13.8% | 12.7% | 10.3% |
| error (masked TvC) | 21.1% | 11.7% | 18.8% | 14.1% | 9.7% | 14.1% | 11.4% |

Table 4.2
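The averages and the F ratio quoted in the text can be checked against the values in Table 4.2. The sketch below assumes the reported test was a one-way ANOVA treating the two rows as independent groups of seven; the text does not state the design, so this is an assumption, and the helper names are illustrative only.

```python
# Error ratios (%) from Table 4.2, in the order AH1, AL, STP, CL, AH2, JX, DC.
baseline = [12.5, 14.5, 17.2, 17.8, 13.8, 12.7, 10.3]
masked = [21.1, 11.7, 18.8, 14.1, 9.7, 14.1, 11.4]

def mean(values):
    return sum(values) / len(values)

def one_way_f(groups):
    """F ratio and degrees of freedom for a one-way between-groups ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_value, df_between, df_within = one_way_f([baseline, masked])
baseline_mean = mean(baseline)
masked_mean = mean(masked)
```

Under that assumption the two row means and the F ratio with (1, 12) degrees of freedom agree with the values reported in the text to the precision given there.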

So, judging at least from the accuracy of the threshold-seeking algorithm, the presence of a spatial noise mask did not appear to add to the ‘noisiness’ of contrast thresholds. This may be an illusory absence of effect, however: if the noises introduced into the method by the trial-to-trial unpredictability of target contrast and the subject’s own moment-to-moment variations in visual ability are large enough, they might simply ‘mask’ the effects of external noise on outright measures of performance variance. This measure, of the ‘goodness of measure’ for contrast thresholds, does not directly address the potential effects of noise on contrast sensitivity. If we compare increment thresholds for corresponding pedestal contrasts within given spatial frequencies (averaging across orientation in Figure 4.10, for clarity), the effect of 1/f noise on contrast sensitivity is clear: TvC dipper functions are shifted up and to the right along log contrast axes. Here, baseline functions are represented by the gray lines, and masked functions by the black lines. In most cases, the average suprathreshold discrimination threshold (the rising portion of the TvC function) overlaps for the masked and baseline conditions. At lower contrasts, the noise masks produce threshold elevation. Although elevated, it appears as if the facilitatory decrease in threshold is parallel in the baseline and masked conditions: the ‘dip’ is neither severely attenuated nor magnified by the noise mask. Figure 4.11 shows the difference between the (orientation-averaged) masked and baseline TvC functions for each subject, at each test frequency. Here, it can be seen that there is relatively little suprathreshold facilitation, at least of the form predicted by the gain control theory. Figure 4.12 shows that some facilitation occurs for some subjects at the higher spatial frequencies.

Figure 4.10 Baseline (gray lines and round symbols) and noise-masked (black lines and triangle symbols) TvC functions for each observer, averaged across orientation.

Figure 4.11 The effect of broadband masking on TvC functions, averaged across spatial frequency. Error bars are the mean, across conditions noted, of the combination of decibel confidence intervals between baseline and masking conditions: $ci = \sqrt{ci_{baseline}^{2} + ci_{masked}^{2}}$.
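The error-bar rule in the Figure 4.11 caption combines the two confidence intervals in quadrature and then averages across conditions. A minimal sketch, with hypothetical function names and made-up decibel values, might look like this:

```python
import math

def combined_ci_db(ci_baseline_db, ci_masked_db):
    """Combine baseline and masked confidence intervals (dB) in quadrature."""
    return math.hypot(ci_baseline_db, ci_masked_db)

def mean_combined_ci_db(ci_pairs):
    """Average the combined interval over a list of (baseline, masked) pairs."""
    combined = [combined_ci_db(b, m) for b, m in ci_pairs]
    return sum(combined) / len(combined)

# Hypothetical confidence intervals in decibels for two conditions.
ci_pairs = [(3.0, 4.0), (0.6, 0.8)]
error_bar = mean_combined_ci_db(ci_pairs)
```

Combining in quadrature treats the baseline and masked measurement errors as independent, which is the usual reading of a root-sum-of-squares rule.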

Figure 4.12 The effect of the broadband masking stimulus on TvC functions, across observers and target conditions. Abscissa represents the mean pedestal relative to threshold value in decibels, with 0 indicating that the pedestal was equal to the detection threshold. Ordinate represents the difference in masked and baseline thresholds in decibels, with positive values indicating masking, negative values indicating facilitation, and 0 indicating no effect. Data for subjects AH and AL are represented again on the bottom row, across only the highest three spatial frequencies.

So, the effect of 1/f noise on contrast sensitivity is directly comparable with the effect of a cross-oriented grating added to a target grating (Foley 1994; Holmes and Meese 2004; Ross and Speed 1991), in that the dipper shape of the TvC function is preserved, low contrast discrimination is impaired, and high contrast discrimination is relatively unaffected. Such an effect is normally explained as being the result of a dynamic ‘gain control’ network, where the outputs of differently tuned filters interact and suppress or inhibit one another’s responses (Heeger 1992). In most prior experiments, target and mask stimuli were predictable and regular from trial to trial: sinewave gratings or Gabor patches of fixed phase and position. For these measurements, the mask and target stimuli had spatially randomized structure, which differed from trial to trial, and which would be expected to produce significant fluctuations in any linear filter exposed to a particular spatial position within the stimulus display. In short, external noise and attentional factors might play some part in the masking effects seen here. The next chapter deals with this problem by asking whether the effect of external noise is likely to have contributed to these results, or whether their relative contributions might in fact pale in significance in comparison with the action of a general, and in that case rather powerful, contrast gain control mechanism.

5. ANALYSIS: THE S-F FUNCTION

Measuring TvC Functions

One stimulus added to a target display can make the target more difficult to see. In the experiments described in the preceding sections, the measure of ‘difficulty’ is stark, where a target is either seen or unseen, with the transition between these states defined as the threshold. Addition of a mask stimulus to a target elevates thresholds in some cases; in other cases, it has no effect; and potentially, in some cases a mask might lower thresholds. Threshold elevation can be produced through diverse underlying causes, and some of the main theoretical contenders were described in Chapter 2 (Contrast Transduction). Active response normalization might occur, shifting the most sensitive part of the transducer to higher contrasts by a contrast gain control mechanism; external noise might contribute significantly to the signal-noise ratio used by observers in detecting targets; and, attentional factors might be affected by a mask stimulus, factors related to the ability of an observer to use precisely the correct perceptual equipment to accomplish the task at hand. Simple visual inspection of the results obtained from the main experiments suggests that the black-box ‘nonlinear transducer’ approach will be a successful match to the data. This approach should only be a starting point, however; the parameters of the transducer might be relatable to physiological or attentional processes. If a relationship seems to be necessary (e.g. based on implementation of the ‘black box’ parameters in a spatial simulation of the experiments) but is not seen in fits of the transducer to data, additional steps must be taken to reconcile the discrepancy. The first step that will be taken in this chapter will be to apply the four-parameter ‘Stromeyer-Foley’ contrast transduction equation independently to all the TvC functions obtained in the main experiment. From the resulting bank of parameter values, for baseline and masked TvC functions, at various orientations and spatial frequencies, we can begin to speculate as to the perceptual processes underlying detection and discrimination behavior with narrowband and 1/f broadband noise stimuli. For example, results suggest that a consistent effect of noise mask on the threshold parameter z of the S-F function will be measured, strongly implicating the action of a gain control process. The following sections detail the potential interactions that might be found between the presence of a mask stimulus and the structure of a subject’s contrast sensitivity, and how these might be identified.

Parameterizing TvC Functions

It is certain that psychophysical contrast sensitivity must consist of multiple sensitive components and levels of signal transformation. First, logically, for an observer to respond consciously to a stimulus it must be processed through the cerebral cortex, meaning that contrast information must be encoded by the retina, processed in the thalamus, and ultimately encoded in primary visual cortex (V1), beyond which more cortical processing is probably still necessary (e.g. comparison of the seen stimulus to a memorized template, maintenance of attention to peripheral visual processes, etc.), though subsequent encoding of visual stimuli deviates progressively from anything that could be described as ‘luminance contrast’. In the first chapter, the notion was introduced that contrast encoding at V1 might be sufficient to explain the basic properties of contrast sensitivity, including orientation and spatial frequency bandwidths and perhaps the form of the d’ function for contrast. Even with this considerable simplification of matters, the fact must still be confronted that no single V1 neuron is sufficient to explain contrast sensitivity. Contrast sensitivity and discrimination for a single pattern operate over a range of contrasts as great as 3 log units (a factor of 1000), while the average contrast-sensitive neuron has a much narrower dynamic range, closer to 1.5 log units. So, it follows either that multiple neurons with differently placed dynamic ranges are used to both detect and discriminate the contrast of a pattern (cf. Chirimuuta and Tolhurst 2005, or Goris et al 2009), or that a single neuron might be used with its response modified by auxiliary neurons affecting its dynamic range through response modulation. Ultimately, a theory of sensitivity and discrimination must boil down to neural components. For now, more operational descriptions of the various sources of noise and nonlinearity will be discussed, both in theory and in their potential effects which might be measured through parameterization of TvC functions. Independent fits of the S-F function to each set of TvC data collected in these experiments will individually be close to overfit, with most measured TvC functions made up of only 9 data points compared with 4 parameters per d’ function. Some measured thresholds may deviate by random chance from the ‘actual’ thresholds to such an extent that they significantly influence parameter estimates in one direction or another. Still, this initial approach allows for a rough illustration of the changes that contrast sensitivity undergoes in the context of broadband 1/f noise. We can quickly evaluate what effects 1/f noise has on what parameters, and then set about testing alternative explanations for these changes.
The initial approach, of independently fitting S-F parameters to every measured TvC function, will not be sufficient for later model testing and evaluation; it will be necessary to control somehow for the effects of threshold measurement error on those estimates. This will be done by searching for the least number of parameters that must be allowed to vary, across masking condition, across spatial frequency, and across orientation, in order to afford a good fit of model to data. As has been made clear by now, the objective of these experiments was to measure the sensitivity functions underlying performance in a contrast discrimination task. The objective was not to measure contrast thresholds and to analyze these directly. Instead, sets of thresholds (or sets of trials, depending on the fitting method) were to be taken together to represent a subject’s perception of a single target stimulus at various contrasts. The S-F d’ function will be used to parameterize TvC functions, and changes in these parameters due to the introduction of a 1/f noise mask can then be interpreted in terms of more theoretically sophisticated models of performance. To begin, what follows first here is a discussion of each of the four parameters of the S-F d’ function, including what the potential causes might be for changes in those parameters.

Interpreting S-F Parameters

The four parameters of the d’ function for contrast detection and discrimination can be apportioned within the equation itself in different ways, depending on one’s view of the order of operations of the mini-black-boxes that make up the front end of the S-F model. Yu, Klein, and Levi (2003) took the approach of analyzing the function parameters from the point of view of potential changes in the response function itself; that is, they modeled potential modifications of the response function, then noted how the d’ function parameters would need to change in order to afford such modifications. This is a good starting point for our analysis of the effects of 1/f noise on contrast sensitivity, and an illustration derived from their approach is given below. The model is arranged in such a way that it mirrors the form of the S-F d’ function (Eq. 5.1), in terms of order of operations (Figure 5.1).

Figure 5.1 Graphical schematic of Equation 5.1, after Yu et al 2003, showing the various nodes at which d’ response (top boxes, gray background) and threshold (bottom boxes, white background) are affected. Black and red lines are corresponding between the upper and lower boxes, with different values for the parameters in each node heading. Black lines are the same throughout, as a baseline case; red lines are the result of the change in parameter value denoted by the box headings – see Figure 5.2 for more details.

It is generally assumed that the order of physiological and psychophysical operations is preserved in this mathematical model (Meese and Georgeson 2005; Watson and Solomon 1997), but there is considerable distortion of reality. For example, excitatory transmission of contrast information through the visual system involves a series of nonlinear transducers; first through retinal ganglion cells, then through thalamic neurons, then through cortical neurons, and so on. None of these processes is entirely linear with respect to contrast, at least not over such a range of contrasts as is used in a typical contrast discrimination experiment. Along this pathway, selectivity for spatial features such as orientation and spatial frequency develops gradually. Yet, a mathematical model like the one presented here reorganizes all transduction nonlinearities and sensitivity selectivities into two stages: a linear-with-contrast filter, tuned in relevant stimulus dimensions, followed by nonlinear transformation of the filter’s output. So, the structure of the d’ function does not exactly encapsulate the complexities of contrast transduction. In addition, as we shall see shortly, the parameters of the d’ function do not precisely control what appear to be the outwardly fundamental properties of the psychophysical d’ function; namely, its expansive nonlinearity near and below threshold, and its compressive nonlinearity above threshold and at high contrasts. In some cases, to use the S-F function as a measurement device for psychophysical contrast sensitivity, it is necessary to monitor more than one parameter at a time in order to check for what is, seemingly, a simple feature of the sensitivity function. The basic form of the S-F function is re-presented below. Soon it will be necessary to add further terms, already noted in Figure 5.1, to effect a complete description of the various ways in which the function might be adjusted or deformed.

$$d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} \qquad (5.1)$$
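Equation 5.1 is simple to evaluate directly, and doing so makes its two power-law regimes easy to inspect. The sketch below is illustrative only: the values of r, p, and q sit within the typical ranges quoted in this chapter, the value of z is a hypothetical choice, and the helper `local_log_slope` is not part of the original analysis.

```python
import math

def sf_dprime(c, r, p, q, z):
    """Equation 5.1: d' as a function of contrast c."""
    return r * c ** p / (z ** (p - q) + c ** (p - q))

def local_log_slope(c, r, p, q, z, step=1.01):
    """Finite-difference slope of d' on log-log axes near contrast c."""
    rise = math.log(sf_dprime(c * step, r, p, q, z)) - math.log(sf_dprime(c, r, p, q, z))
    return rise / math.log(step)

# Illustrative parameter values; z = 0.01 is an assumption, not a fitted value.
params = dict(r=30.0, p=2.4, q=0.4, z=0.01)

low_slope = local_log_slope(1e-5, **params)   # far below z: slope approaches p
high_slope = local_log_slope(1.0, **params)   # far above z: slope approaches q
dprime_at_unit_contrast = sf_dprime(1.0, **params)  # approaches r when z << 1
```

Well below z the log-log slope approaches p, well above z it approaches q, and at a contrast of 1.0 the output is close to r, which matches the verbal description of the four parameters.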

Broadly speaking, the four parameters of the S-F function do indeed correlate directly with the most basic features of psychophysical contrast transduction. Let’s enumerate those basic four features, and add an obvious fifth: 1) the S-F function has an amplitude r, which is measured in d’ units so that it will always be much greater than 1.0 (with r normally in the range of 15-50, depending on stimulus conditions and subject: Haun and Essock 2009; Holmes and Meese 2004; Foley 1994; Legge and Foley 1980). This value signifies the absolute gain of the transducer function, or the sensitivity of the transducer to a c value of 1.0. 2) Below and near the detection threshold, transduction is expansive, with d’ increasing as a power of p, typically a value between 2 and 3. 3) Above threshold, transduction is compressive, another power function, this one with an exponent of q, which tends to a value around 0.40. 4) The transition point between these two power function regimes is denoted by z, and this term can fairly be referred to as a ‘threshold’ parameter. 5) In addition, the contrast value c itself could be multiplied by a constant, which we’ll call k, amounting to a gain control at the level of the linear filter that provides input to the S-F function.

Figure 5.2 Changes in sensitivity at each of the nine nodes of the S-F function, in both d’ (top boxes, gray background) and threshold (bottom boxes, white background) units. Black lines can be considered ‘baseline’, and are the same across all seven pairs of boxes. Red lines are for sensitivity changed by manipulating the parameters described above each box. In parentheses is the extra parameter (k, s, and h) represented at certain nodes.

These relationships between the S-F function parameters and d’ function features are direct, but are not one-to-one. For example, an increase in the value of q will increase the suprathreshold slope of the d’ function, but will also decrease the absolute height of the function, thus mimicking a decrease in r. If we seek to model simply a change in the slope of the suprathreshold portion of the function, without causing other changes, it will be necessary to adjust the value of r upward in order to keep the low contrast portion of the function from shifting. So, in the context of the data presented in the following section, a positive correlation between q and r between baseline and masking conditions might be taken to indicate a change in the suprathreshold slope of the d’ function. The set of such relationships is described in Figure 5.2. Yu et al showed that the basic properties of the d’ function could be altered in distinct ways, at nine ‘nodes’ along their structural dissection of the S-F function (actually, they made eight explicit, and implied the action of the ninth, represented in Figure 5.1 by Node 1). In Node 1, the gain of the linear operator (the receptive field, or ‘perceptive field’ defining channel selectivity) might be adjusted, this implemented as a multiplier against c, so that effectively c = kc. This would basically amount to changing the contrast scale; the shape of the d’ function and the resulting TvC function would not change, but they would be shifted along input and output contrast axes by the factor k. For example, in Figure 5.2 (Node 1) for the red line k is a value less than 1.0, so the linear gain has been decreased. This results in a rightward shift by the factor k (a constant distance on log contrast axes) of the d’ function. The resulting TvC function is shifted on both pedestal (abscissa) and threshold (ordinate) axes by the same factor. 
Adding an extra parameter k to the S-F function would not constrain parameter solutions, as its value would interact with the values of r and z. We can see how the value of k can be redistributed to r and z by inserting the term into Equation 5.1 and rearranging:

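$$
\begin{aligned}
d'(c) &= \frac{r \cdot (kc)^{p}}{z^{p-q} + (kc)^{p-q}} \\
&= \frac{r \cdot k^{p} \cdot c^{p}}{z^{p-q} + k^{p-q} \cdot c^{p-q}} \cdot \frac{1/k^{p-q}}{1/k^{p-q}} \\
&= \frac{\left( r \cdot k^{p} / k^{p-q} \right) \cdot c^{p}}{(z/k)^{p-q} + c^{p-q}} \\
&= \frac{r \cdot k^{q} \cdot c^{p}}{(z/k)^{p-q} + c^{p-q}}
\end{aligned}
\qquad (5.2)
$$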
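The redistribution of k into r and z in Equation 5.2 can be confirmed numerically. In this sketch the parameter values are hypothetical, and the optional argument `k` is added purely for illustration.

```python
def sf_dprime(c, r, p, q, z, k=1.0):
    """S-F d' function with an optional linear gain k applied to contrast."""
    kc = k * c
    return r * kc ** p / (z ** (p - q) + kc ** (p - q))

# Illustrative parameter values (assumptions, not fitted values).
r, p, q, z = 30.0, 2.4, 0.4, 0.01
k = 0.5  # linear gain reduced by half

# Equation 5.2: fold k into the other parameters.
r_equiv = r * k ** q
z_equiv = z / k

contrasts = [0.001, 0.01, 0.1, 1.0]
with_k = [sf_dprime(c, r, p, q, z, k=k) for c in contrasts]
folded = [sf_dprime(c, r_equiv, p, q, z_equiv) for c in contrasts]

# Largest relative disagreement between the two formulations.
max_rel_diff = max(abs(a - b) / a for a, b in zip(with_k, folded))
```

With k below 1.0, r_equiv falls below r while z_equiv rises above z, so the two parameters move in opposite directions.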

We can see that a change in k is equivalent to a change in r by the factor $k^{q}$ and a change in z by the factor 1/k. An increase in the pre-nonlinearity contrast gain would therefore manifest through the S-F parameters as an increase in r accompanied by a decrease in z by a related factor, and vice versa. Node 2 corresponds to the exponent p, which determines the low-contrast slope of the d’ function. On log-log axes, changing the slope of the low-contrast regime is like swinging one end of a joint while keeping the other part still. A change in this value of p alone will also produce a slight intersection in the region of z if z is not correspondingly adjusted; so, a change in the low-contrast exponent of the d’ function would appear as a shift in p as well as a small (perhaps undetectable) shift of z in the same direction. As described in the example above, the situation is similar when a change in the suprathreshold slope of the d’ function is sought (Nodes 6 and 9), with a corresponding change (in the same direction) of r also necessary in order to keep the near-threshold regime from shifting out of place. Nodes 3 and 4 describe the ‘excitatory’, or numerator, component of the S-F function: Node 3 reflects a change in the overall gain of the d’ function, and it is easily described as an isolated shift in the value of r. As can be seen in Figure 5.2, a change in r shifts the d’ function vertically on log-log axes. Node 4 reflects a change in the numerator exponent only, without a corresponding change in the denominator exponents. To bring about this change, p and q both need to increase, by an amount which cancels any increase in p-q. This means that increases in q will be much greater than increases in p if this change occurs in the d’ function. Since a change in p and q affects both low and high contrast regimes of the d’ function, the entire TvC function is also affected; if the d’ function is fixed so that it is unchanged at the low-high contrast transition point (corresponding to z), the resulting TvC function will include low-contrast masking and high-contrast facilitation (naturally, a combination of the effects of independently increasing p or q). Nodes 5, 6, and 7 describe the ‘inhibitory’, or denominator, component of the S-F function. Node 5 reflects a variability in the weighting of the denominator $c^{p-q}$ term, as by the parameter s in the following reformulation of Equation 5.1:

$$d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + s \cdot c^{p-q}} \qquad (5.3)$$

As with the parameter k introduced above, it should be apparent that the value of s is not constrained by the other parameters of the S-F function. As with k, the value of s is confounded with the values of r and z. So, we can take the same approach as with k, rearranging terms so that s functions as a pair of factors applied to r and z:

$$d'(c) = \frac{s^{-1} \cdot r \cdot c^{p}}{\left( s^{-\frac{1}{p-q}} \, z \right)^{p-q} + c^{p-q}} \qquad (5.4)$$
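As with k, the equivalence between Equations 5.3 and 5.4 can be checked numerically. The sketch below uses hypothetical parameter values, and the optional argument `s` is an illustrative addition.

```python
def sf_dprime(c, r, p, q, z, s=1.0):
    """S-F d' function with an optional weight s on the denominator contrast term."""
    return r * c ** p / (z ** (p - q) + s * c ** (p - q))

# Illustrative parameter values (assumptions, not fitted values).
r, p, q, z = 30.0, 2.4, 0.4, 0.01
s = 2.0  # denominator contrast term weighted more heavily

# Equation 5.4: fold s into r and z.
r_equiv = r / s
z_equiv = z * s ** (-1.0 / (p - q))

contrasts = [0.001, 0.01, 0.1, 1.0]
with_s = [sf_dprime(c, r, p, q, z, s=s) for c in contrasts]
folded = [sf_dprime(c, r_equiv, p, q, z_equiv) for c in contrasts]

# Largest relative disagreement between the two formulations.
max_rel_diff = max(abs(a - b) / a for a, b in zip(with_s, folded))
```

Here both r_equiv and z_equiv fall when s exceeds 1.0, which is the same-direction signature that distinguishes a change in s from a change in k.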

In this way we can see that a change in s is equivalent to a change in r and z in the same direction, with r shifted by a factor of 1/s, and z by a factor of $1/s^{1/(p-q)}$. So, this change in the d’ function would have an ‘even symmetric’ change in r and z, as opposed to the change denoted by k, whose signature is an ‘odd symmetric’ change in the same parameters. Node 6 involves a change in the suprathreshold slope of the d’ function, which is primarily related to a change in the value of q. A change in the suprathreshold slope of the d’ function results directly in a change in the opposite direction of the slope of the suprathreshold discrimination function, as an instance of Stevens’ law (Stevens 1957), with the TvC slope approximately equal to 1 - q. This change was addressed at the beginning of this section: a change in the suprathreshold slope alone, without any shift in the low-contrast regime (i.e. swinging the other end of the d’ joint) will be produced by a change in q accompanied by a change in the same direction of r. So for example, an increase in the suprathreshold slope alone will correspond to an increase in q and an increase in r. Node 7 is the location of the threshold-shifting gain control used in Foley’s (1994) general model of pattern masking. Here, a weight h is multiplied against non-target contrasts of mask components $c_m$ according to their particular characteristics, which are also raised to the exponent p-q, and added to the denominator. At a fixed mask contrast, this is equivalent to an increase in the other fixed parameter in the denominator, z:

$$d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q} + h \cdot c_m^{p-q}} \qquad (5.5)$$
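At a fixed mask contrast, the mask term in Equation 5.5 can be absorbed into an effective threshold parameter. The sketch below illustrates this with hypothetical values for h and the mask contrast; the name `z_effective` is introduced here for illustration and does not appear in the original model.

```python
def sf_dprime_masked(c, r, p, q, z, h, c_mask):
    """Equation 5.5: S-F d' function with a weighted mask term in the denominator."""
    n = p - q
    return r * c ** p / (z ** n + c ** n + h * c_mask ** n)

def sf_dprime(c, r, p, q, z):
    """Equation 5.1: four-parameter S-F d' function."""
    return r * c ** p / (z ** (p - q) + c ** (p - q))

# Illustrative values; h and c_mask are assumptions chosen for the example.
r, p, q, z = 30.0, 2.4, 0.4, 0.01
h, c_mask = 0.5, 0.05

# For fixed c_mask, the two constant denominator terms merge into one.
z_effective = (z ** (p - q) + h * c_mask ** (p - q)) ** (1.0 / (p - q))

contrasts = [0.001, 0.01, 0.1, 1.0]
masked = [sf_dprime_masked(c, r, p, q, z, h, c_mask) for c in contrasts]
folded = [sf_dprime(c, r, p, q, z_effective) for c in contrasts]

# Largest relative disagreement between the two formulations.
max_rel_diff = max(abs(a - b) / a for a, b in zip(masked, folded))
```

A positive h always gives z_effective greater than z, which is why a four-parameter fit to masked data would register this mechanism as an increase in z alone.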

For overlaid contrast masks of fixed or varied contrast, this formulation is very successful at describing the effects of masks on target detection (Foley 1994; Holmes and Meese 2004). It is particularly attractive because it directly incorporates the contrast $c_m$ of the mask stimulus, and predicts the form of masking functions in many cases. Since contrast values are presumed to be rectified before input to the S-F model and are therefore always positive, rightward shifts of the near-threshold d’ regime will correspond with a positive h value (and thus an increase in z in the four-parameter model) and leftward shifts with a negative h value (and thus a decrease in z). As we shall see later, and as should be apparent given the preceding discussion of other nodes, ‘gain control’ is not the only explanation of an increase in z, especially if it is accompanied by changes in other parameters. Given certain assumptions about how observers go about collecting information to use in detecting target stimuli in these tasks, external noise can cause an effect on TvC functions which is very similar to that of the ‘gain control’ hypothesis represented by Node 7 (recall the discussion of noise masking in Chapter 2). The final two Nodes, 8 and 9, are qualitatively the same as Nodes 3 and 6. The overall gain of the d’ function (Nodes 3 and 8, reflected by an increase in r) can be increased either by a multiplicative gain control at the excitatory component of the S-F function, or by a multiplicative gain control at the output of the whole function, after the action of the divisive ‘inhibitory’ components. Likewise, the slope of the discrimination function, reflected by q, could perhaps be due to divisive gain control which is combined with another static parameter z, or it could be determined by multiplicative noise dependent on the overall output level, after the absolute gain has already been set. These ambiguities cannot be resolved by these results, or by any known analyses of contrast discrimination data (Gorea and Sagi 2001 argue differently, though their method is controversial). What can be resolved is the form of the effects of noise on the d’ function. With these basic alternatives laid out, we can proceed to the data to address questions relating to precisely what, in an operational sense, is happening with the subjects included in these experiments.

Baseline and Masked TvC Parameters

As described in Methods, S-F functions were fit independently to each pair of baseline and masked TvC functions measured for each subject. This was done by minimizing the sum of squared differences between observed (data) thresholds x and predicted (model) thresholds µ, each weighted by its confidence interval σ; that is, the chi-squared statistic (Bevington and Robinson 2002):

$$\chi^{2} = \sum_{i} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^{2} \qquad (5.6)$$
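The fitting objective in Equation 5.6 needs model thresholds to compare against the data. The sketch below shows one way these could be generated from the S-F function. It assumes that threshold is the contrast increment raising d’ by a criterion of 1.0 and that thresholds are compared in linear contrast units; neither assumption is stated in this section. The bisection solver, the parameter values, and the ‘observed’ thresholds are all hypothetical.

```python
import math

def sf_dprime(c, r, p, q, z):
    """Equation 5.1: four-parameter S-F d' function."""
    return r * c ** p / (z ** (p - q) + c ** (p - q))

def increment_threshold(pedestal, r, p, q, z, criterion=1.0):
    """Increment dc such that d'(pedestal + dc) - d'(pedestal) = criterion."""
    base = sf_dprime(pedestal, r, p, q, z)

    def gain(dc):
        return sf_dprime(pedestal + dc, r, p, q, z) - base

    hi = 1e-4
    while gain(hi) < criterion:  # expand the bracket until it contains the root
        hi *= 2.0
    lo = 0.0
    for _ in range(60):  # bisection; d' increases monotonically with contrast
        mid = 0.5 * (lo + hi)
        if gain(mid) < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi_squared(observed, predicted, sigma):
    """Equation 5.6: sum of squared, sigma-weighted residuals."""
    return sum(((x - mu) / s) ** 2 for x, mu, s in zip(observed, predicted, sigma))

# Illustrative parameters and pedestal contrasts (assumptions, not fitted values).
params = dict(r=30.0, p=2.4, q=0.4, z=0.01)
pedestals = [0.0, 0.005, 0.1, 0.2]
predicted = [increment_threshold(c, **params) for c in pedestals]

# Made-up 'observed' thresholds: each sits one sigma above the prediction.
observed = [1.1 * t for t in predicted]
sigma = [0.1 * t for t in predicted]
chi2 = chi_squared(observed, predicted, sigma)

# Log-log slope of the predicted TvC function between the two highest pedestals.
tvc_slope = math.log(predicted[3] / predicted[2]) / math.log(pedestals[3] / pedestals[2])
```

In a full fit, an optimizer would adjust r, p, q, and z to minimize `chi2` for each measured TvC function. With these example values the prediction at the small pedestal falls below the detection threshold, giving the ‘dip’, and the slope between the two highest pedestals is close to 1 - q.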

For the initial analysis, all parameters were free to vary, across orientation and spatial frequency conditions as well as between baseline and masking conditions. This resulted in significant variation in parameter values both between and within subjects, but some patterns emerge. Figure 5.3 shows that all data sets were well-fit by the S-F function, and Figures 5.4, 5.5, and 5.6 show the best-fitting parameters p, q, z, and r for each observer (triangular symbols), plotted against one another, by spatial frequency (averaged across orientation where applicable), and by orientation (averaged across frequency where applicable) respectively. Solid and open symbols correspond to baseline and masked data, respectively.

Also shown in Figure 5.5 and Figure 5.6 are parameters derived from the maximum-likelihood procedure described in Chapter 3 (Distributed Threshold Measurement; round symbols). Overall these are not much different from those estimated by the least-squares method. In some cases they appear to be smoother or more regular than the least-squares estimates (particularly with the oriented data), and the maximum-likelihood p values tend to be lower and flatter with frequency than the least-squares values. This at least is understandable: information about the slope of the individually measured thresholds is not used in the least-squares procedure, which would leave that aspect of the underlying d’ function relatively unmeasured. Looking at the baseline parameters (solid symbols in Figure 5.5), for the two observers who completed seven spatial frequency conditions (AH and AL), we can see that z and r follow bowed functions in opposite directions. Observer STP, who completed the three higher spatial frequency conditions, suggests similar trends, with increasing and decreasing functions of z and r respectively with spatial frequency. Effects of spatial frequency are not clearly present for the slope parameters p and q, though there is a hint of an increase in the masked p values at the higher spatial frequencies. 1/f noise does not have an obvious effect on any parameter but z, which is consistently higher in the masking condition than in the baseline (F(1,87) = 46.4, p < .001). On the whole there is no effect on r, even when different spatial frequencies are considered (F(1,87) = .942, p = .336). A convenient way of illustrating these changes is to do away with stimulus axes entirely, and to plot parameters directly, on baseline-versus-masked axes (Figure 5.4; parameters estimated through the maximum-likelihood procedure are shown here).
The variance in parameter values is plainly seen here, as is the large positive effect of noise on z (increasing on average by 140% across subjects and conditions) and the slight negative effect on r (decreasing on average about 13%). p and q seem broadly dispersed and prone to unusually large and small values in a few cases; both increase slightly on average with the addition of the noise masks (F(1,87) = 5.5, p = .021 and F(1,87) = 8.6, p = .004), and increases in neither p nor q seem to be related to the frequency (F(6,87) = 1.58, p = .164 and F(6,87) = 1.22, p = .302) or orientation (F(3,87) = .778, p = .509 and F(3,87) = 2.487, p = .065) of the target.

Figure 5.3 Data and fitted curves for baseline and masked TvC functions. Each panel contains data from one subject. Sets of TvC functions are arranged by frequency from left to right, and by orientation from top to bottom. Ordinate is threshold contrast and abscissa is pedestal contrast. Each subject completed a different subset of target conditions; subject AH completed every target condition.

We can also look at the parameter values as functions of orientation. In Figure 5.6, for subject AH only the highest three spatial frequencies have been averaged, since these are the only frequencies where orientation anisotropies might be expected. There seems to be more variation in parameter estimates here than with the orientation-averaged frequency data, particularly when the maximum-likelihood and least-squares estimates are compared (round and triangular symbols respectively). For the baseline condition (solid symbols and lines), no consistent effects of orientation across three subjects can be seen for any parameter. The clearest effect of 1/f noise (open symbols and dashed lines) is to raise the value of z.

Figure 5.4 ‘Masked versus Baseline’ plots for the four parameters of the S-F function fit to data from the main experiment. Values for p and q seem clustered about the main diagonal, implying no overall effect, though they do significantly deviate from the diagonal (see text). z is clearly higher in the Masked case than in the Baseline. r does not deviate systematically from the diagonal. Different symbols are from different observers, representing individual target conditions.

Figure 5.5 S-F function parameters by spatial frequency for three observers. For AH and STP values are averaged across orientation; for AL values are for vertical targets. Solid symbols/solid lines are for baseline TvCs; open symbols/dashed lines are for masked TvCs. Round symbols are from maximum likelihood estimates, and triangular symbols are from least-squares estimates.

Figure 5.6 S-F function parameters by orientation for three observers. For AH and STP values are averaged across frequency bands 5 through 7; for CL values are for targets at band 6. Symbols are as in Figure 5.5.

Simplifying Mask Effects

In fact, it is not clear that all these parameters should be allowed to vary independently in this manner, and it is certainly troublesome from a statistical viewpoint since the different parameters appear naturally to be correlated with one another (as described in the beginning of this chapter). To simplify matters, the 4-parameter S-F model was refit to the data set, this time allowing different combinations of parameters to have fixed values between baseline and masking conditions. Refer to Figure 5.7. Here, the abscissa lists different sets of parameters which have been allowed to vary between baseline and masking conditions for each target condition. On the ordinate axis, chi-squared values are represented relative to the degrees of freedom df for each combination of data and parameters, with df = N – P (N data points, P fitted parameters), giving the reduced chi-squared statistic rχ2 = χ2/df (Klein 1992; Press et al 1992; Bevington and Robinson 2002; cf Essock, Haun, and Kim 2009). For each subject, the total rχ2 for respective models can be compared directly with the ‘vary none’ rχ2 (denoted by “----”; fixing all parameters between baseline and noise-masked conditions resulted uniformly in a poor fit of model to data), so that we can look at the relative reduction in model-data error afforded by allowing certain parameters to vary. A rχ2 value of 1.0 would indicate an ideal fit to the data, with the model passing on average 1 standard error above or below the measured data; lower or higher rχ2 values indicate progressively ‘overfit’ or ‘underfit’ models, respectively. Models are ordered by complexity, i.e. number of parameters allowed to vary between masked and baseline conditions. There are 2^4 = 16 models, divided into five ‘families’ corresponding to the number of parameters allowed to vary:

Figure 5.7 Reduced χ2 values for the array of model types tested against data sets for four observers. Each observer completed a different set of target conditions. Different levels of model complexity, i.e. number of parameters, are partitioned along the abscissa. Best-fitting one- and two-parameter models are highlighted.
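
The bookkeeping behind Figure 5.7 can be illustrated with a short sketch. This is a minimal Python example and not the code used for the analysis: the function names are hypothetical, and N is read here as the number of measured thresholds and P as the number of fitted parameters. It enumerates the 16 candidate models by family and computes a reduced chi-squared from data, model predictions, and standard errors.

```python
from itertools import combinations

PARAMS = ("p", "q", "z", "r")  # the four S-F parameters named in the text


def model_families(params=PARAMS):
    """Group every subset of `params` by how many parameters are free to vary
    between baseline and masked conditions (0 = 'vary none', 4 = 'vary all')."""
    return {k: [frozenset(c) for c in combinations(params, k)]
            for k in range(len(params) + 1)}


def reduced_chi2(observed, predicted, std_errors, n_fitted_params):
    """Reduced chi-squared: chi2 / df, with df = N - P."""
    chi2 = sum(((o - m) / s) ** 2
               for o, m, s in zip(observed, predicted, std_errors))
    df = len(observed) - n_fitted_params
    if df <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2 / df


families = model_families()
n_models = sum(len(models) for models in families.values())  # 2**4 subsets

# Toy check: a model that misses every datum by exactly one standard error.
toy = reduced_chi2([1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [1.0] * 6, 2)
```

With six data points and two fitted parameters, the toy case gives 6/4 = 1.5, which shows how the statistic penalizes additional free parameters for the same total error.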

It might be nice if a single parameter could account for the effects of 1/f noise on contrast sensitivity, so we can first look at the cases where only one parameter was allowed to vary. We can think of these as the simplest models of masking. z is clearly the winner here with all four subjects’ data sets, with an average rχ2 value of around 2.2. r is least useful on its own, affording a rχ2 value near 8.0, and the slope parameters each afford a rχ2 around 5.0. On average, z is actually more effective at describing the data than any combination of the other three parameters (i.e. models which do not vary z). A fit of the vary-z-only model to the data set is shown in Figure 5.8, and appears to be very good at describing the data for all conditions and subjects. This is, essentially, a fit of the gain control model employed in grating masking studies and described in Chapter 2 (Foley 1994), since an increase in z is equivalent with addition of a

weighted mask-contrast term h·cmask to the denominator of Equation 5.1. This analysis suggests that the gain control model is a most effective means of describing pattern masking by 1/f noise.

Figure 5.8 Data plotted with best-fitting vary-z model parameters.

The most effective pair of parameters is z and r (Figure 5.7), with a rχ2 of about 1.6. Given that these are the two parameters which showed the most clearly patterned distribution as a function of spatial frequency, a closer look at how they vary due to 1/f noise might be informative. Recall that the ‘linear contrast gain’ parameter k, described above as Node 1, was predicted to have a signature effect on the values of z and r, shifting them in opposite directions. Whether or not k is the source of our masking can be tested simply by looking at the values of z and r, converting them into k factors via Equations 5.2. Recall that if k changes, r should change by a factor of k^q, while z should change by a factor of 1/k. So, using the best-fitting values of the S-F parameters, with p and q fixed between masking conditions, the following identity can be arranged, where z1 and r1 are the baseline values and z2 and r2 are the masked values:

\[ \frac{r_2}{r_1} = k^{q} \;\;\&\;\; \frac{z_2}{z_1} = \frac{1}{k} \;\;\Rightarrow\;\; \frac{r_2}{r_1} = \left(\frac{z_1}{z_2}\right)^{q} \tag{5.7} \]

If the linear contrast gain is being altered by the 1/f noise masks in these experiments, there should be an identity, or at least some sign of a positive linear relationship, between the independently calculated values on the left and right sides of Equation 5.7. These values are plotted for each observer, for each stimulus condition, in Figure 5.9: the regression lines plotted for each subject demonstrate that there is no k-like relationship between z and r, at least not one significant enough to be measured in these experiments.
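
The logic of this test can be sketched numerically. The Python below is a minimal illustration under stated assumptions, not the analysis code: the parameter values (p, q, z1, r1, k) are hypothetical, and the second case simply applies the average changes reported earlier in this chapter (z up by 140%, r down by 13%). The first case confirms that scaling the input contrast by k is equivalent to replacing z with z/k and r with r·k^q, so that the two sides of Equation 5.7 agree; the second shows how a masking-like change breaks the identity.

```python
import math


def sf_dprime(c, p, q, z, r):
    """Four-parameter S-F d' function: r*c^p / (z^(p-q) + c^(p-q))."""
    e = p - q
    return r * c ** p / (z ** e + c ** e)


def r_ratio_sides(z1, r1, z2, r2, q):
    """Return (actual, predicted) r ratios, the two sides of Equation 5.7."""
    return r2 / r1, (z1 / z2) ** q


# Hypothetical baseline parameters, chosen only for illustration.
p, q, z1, r1 = 2.5, 0.5, 0.02, 40.0

# Case 1: a pure input-gain change k rescales both z and r.
k = 0.6
z2, r2 = z1 / k, r1 * k ** q
gain_actual, gain_predicted = r_ratio_sides(z1, r1, z2, r2, q)

# Scaling the input contrast by k gives the same d' as the rescaled parameters.
c = 0.03
same_response = math.isclose(sf_dprime(k * c, p, q, z1, r1),
                             sf_dprime(c, p, q, z2, r2))

# Case 2: z rises by 140% while r falls by 13% (a masking-like change).
mask_actual, mask_predicted = r_ratio_sides(z1, r1, 2.4 * z1, 0.87 * r1, q)
```

In the second case the actual r ratio (0.87) sits well above the value a gain change would predict from the z ratio, which is the kind of departure from the identity described in the text.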

Figure 5.9 Test of the k-identity predicted by Equation 5.2. Predicted r ratios on the ordinate do not adhere to actual values on the abscissa.

Fixing Parameters Across Spatial Frequency

As a function of spatial frequency, contrast sensitivity normally varies significantly, with a rapid decline in sensitivity above a spatial frequency that is dependent on pattern luminance and temporal frequency, somewhere in the range of 4cpd (the contrast sensitivity function, described in Chapter 1). Multiple changes in the d’ function might account for variation in absolute sensitivity. Are some changes more significant than others, or even solely responsible for the contrast sensitivity function? Also, how does the effect of 1/f noise masking vary with spatial frequency? With this data set, these questions can be addressed. The previous finding, that only z need be increased to account for the effects of noise on sensitivity, makes a simple model available which can be applied to the entire data set for a given subject. Here, it was determined which parameters could be fixed across spatial frequency and which must be allowed to vary, in order to afford a good fit to the data. In the previous section, it was shown that the vary-z model of masking was sufficient to describe all masking effects found in these experiments. Since this corresponds to an established model of pattern masking with simple stimuli, we may then use this specific model in analyzing the effects of spatial frequency. In this case there are five parameters to the d’ function, following the model of pattern masking set by Foley (1994). The four S-F function parameters are as before, with an additional value h representing the weighting of mask contrast added to the S-F function denominator; essentially, the increase in the value of z:

\[ d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q} + h \cdot c_m^{p-q}} \tag{5.8} \]
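
A small numerical sketch may help make Equation 5.8 concrete. The Python below is illustrative only: the parameter values are hypothetical, the threshold criterion (a d’ difference of 1.0) is an assumption rather than something stated in this section, and the helper names are invented for the example. It shows that the mask term acts as an increase in an ‘effective’ z, and that such an increase raises thresholds at low pedestal contrasts while leaving high-pedestal thresholds nearly unchanged.

```python
import math


def dprime(c, p, q, z, r, h=0.0, c_mask=0.0):
    """Equation 5.8: S-F d' function with a weighted mask term in the denominator."""
    if c <= 0.0:
        return 0.0
    e = p - q  # assumed positive (p > q > 0)
    return r * c ** p / (z ** e + c ** e + h * c_mask ** e)


def effective_z(p, q, z, h, c_mask):
    """The z value that absorbs the mask term: z_eff^(p-q) = z^(p-q) + h*c_mask^(p-q)."""
    e = p - q
    return (z ** e + h * c_mask ** e) ** (1.0 / e)


def threshold(pedestal, criterion=1.0, **params):
    """Increment dc with d'(pedestal + dc) - d'(pedestal) = criterion (bisection)."""
    base = dprime(pedestal, **params)
    lo, hi = 0.0, 1e-4
    while dprime(pedestal + hi, **params) - base < criterion:
        hi *= 2.0  # d' is monotonic for p > q > 0, so this terminates
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if dprime(pedestal + mid, **params) - base < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, for illustration only.
baseline = dict(p=2.5, q=0.5, z=0.02, r=40.0)
masked = dict(baseline, h=0.2, c_mask=0.08)

z_eff = effective_z(baseline["p"], baseline["q"], baseline["z"], 0.2, 0.08)
equivalent = math.isclose(dprime(0.05, **masked),
                          dprime(0.05, **dict(baseline, z=z_eff)))

detect_base, detect_masked = threshold(0.0, **baseline), threshold(0.0, **masked)
dip_base = threshold(0.01, **baseline)  # low pedestal: facilitation ('dipper')
high_base, high_masked = threshold(0.3, **baseline), threshold(0.3, **masked)
```

Under these assumed values the masked detection threshold is clearly elevated, the baseline threshold on a small pedestal falls below the detection threshold, and the two high-pedestal thresholds differ by only a few percent.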

Five parameters free to vary or not vary as a function of spatial frequency corresponds to six levels of complexity and 2^5 = 32 models, shown along the ordinate axis in Figure 5.10a. Again, rχ2 values are informative as to the quality of these different models. This time, no 1-parameter model is close to satisfactory: allowing only z to vary with spatial frequency is the best 1-parameter model, and gives an average rχ2 value just a little lower than 4.0. Allowing both z and h to vary gives an average rχ2 value around 1.5, a good value considering that this allows the exponents and r to remain fixed at all frequencies. It is interesting that the same type of change in the d’ function that accounts for pattern masking also affords a good fit to the regular contrast sensitivity function. Best-fitting values for z and r are plotted in Figure 5.10b. z follows a u-shaped function, thereby producing the familiar CSF shape; h is relatively level towards the higher frequencies, and increases sharply towards the low frequencies. This is similar to the result obtained by Haun and Essock (2009). By allowing one more parameter to vary along with z and h, we can come to an interesting fact about contrast sensitivity as a function of spatial frequency. Most 3-parameter models are no better than zh, but allowing r to vary along with z and h results in an ideal rχ2 value for all three subjects, averaging just under 1.0 (Figure 5.10). z and r vary with frequency (Figure 5.10), and we see that they vary in opposite directions, with z reaching its minimum around 4cpd, and r reaching its maximum in a similar range. Again, this opposite-direction relationship is reminiscent of what would be expected given a change in k, as described in the previous section. The ‘k-like’ identity (Eq.5.7) described in that section did not hold for the masking effects. It does, however, appear to hold for the effects of spatial frequency (Figure 5.11a).

Figure 5.10 A) Reduced chi-squared values for three observers. Ordinate axis lists the parameters allowed to vary with spatial frequency for a given model. Two models are highlighted: zh and zhr. B) z, r, and h values for each subject, by spatial frequency, for the zhr model.

Figure 5.11 Estimated r ratios on the ordinate do seem to increase along with actual values on the abscissa. Based on this correlation, k values (see text) are calculated and plotted at right.

Values for k estimated using this identity are plotted in Figure 5.11b. It appears that variation in sensitivity with spatial frequency can be attributed primarily to differences in the weighting of the contrast response prior to the divisive operation implied by the S-F function. We will return to this point later in the Discussion. With the zhr model selected, we can look at how the free parameters vary with spatial frequency, and note the values of the fixed parameters. p and q values for the three subjects included in the analysis are shown in the table below. r, z, and h values are plotted as functions of spatial frequency, averaged across spatial frequency. k-values estimated from z and r are also plotted. The arc-shaped k-function accounts for the CSF shape, and is perhaps convenient in suggesting a simple relationship between linear contrast attenuation (as by the MTF of the eye toward higher spatial frequencies, or by neural factors toward lower spatial frequencies) and absolute contrast sensitivity.

Table 5.1

The h function is interesting for its shape. At the mid-high to high frequencies (bands 5, 6, and 7), h on average does not clearly increase or decrease. Towards lower spatial frequencies, its value increases dramatically. This is similar to Haun and Essock’s (2009) estimation of h for data collected under comparable conditions using a much sparser data set. They suggested that this shape could be compared with masking functions viewed through an equivalent noise paradigm, with the high-frequency portion representing a constant gain-set property present at all spatial frequencies, and the low-frequency portion representing progressively greater noise masking due to the broadening of filter bandwidths with decreasing spatial frequency. Critical bandwidth of noise masking would intersect with the gain sensitivity at a mid-high spatial frequency. The following chapter, concentrated on the theoretical phenomenon of noise masking, suggests a means for determining whether this is a feasible explanation for the shape of the h-function.

6. ANALYSIS: NOISE MASKING

Noise in the Contrast Stream

Detecting a target involves collecting energy in such a fashion that energy from the target is somehow collected preferentially. During the detection process, energy from other sources, unrelated to the target, inevitably gets collected along with the signal energy. This likely occurs to some degree at every level of visual processing: retinal activity is noisy by fundamental virtue of the random aspect of photons as stimuli (Pelli 1990), the complex behavior of neurons and neural networks in general, and due to the effects of biologically produced heat and mechanical action which can on occasion stimulate the false detection of a photon (Barlow, Levick, and Yoon 1971), and the effects of eye movements also are introduced here (Tyler and Chen 2000); LGN receives information from higher brain areas and uses this information to modulate its responses to signals delivered by the optic nerve (Sherman and Guillery 2003), thereby introducing the notoriously complex and chaotic activity (e.g. Buzsaki 2006) of the cerebral cortex into processing of contrast information at an early stage; at visual cortex, signals are subject to attentional modulation and learning effects, which are known to be measurable in tests of contrast sensitivity (Yu, Klein, and Levi 2004; Huang and Dobkins 2005; Carrasco, Penpeci-Talgar, and Eckstein 2000); deeper into the cortical network, the decision stage is potentially subject to every aspect of psychological processing, which as we all know from personal experience is variable and complex beyond comprehension (an attempt at comprehension constitutes the fields of neuroscience, psychology, the humanities, and beyond). All of these sources of variability might impinge on contrast sensitivity. When simple stimuli are used, they are all subsumed into the denominator of the d’ construct: we may question the sources and their relative contributions, but we see that they are largely dependent on the observer and his neural apparatus. Experiments such as those presented in this document pose a further problem, in that they add variability directly to the target. Noise is explicitly a part of the stimulus array, both in the case of the narrowband oriented targets, and the broadband isotropic masks. The target stimuli in these experiments were themselves ‘noisy’, since their phase structure varied from trial to trial. More importantly, the mask stimuli were made up of broadband noise which did not resemble the target, but which still might be expected to contribute random trial-to-trial variation to the target’s structure and contrast. The effect of random variation on contrast discrimination functions depends on specific details of our model for contrast sensitivity. First is the assumption that as far as contrast sensitive mechanisms are concerned, there is no negative contrast. As discussed in Chapter 2, Contrast Transduction, phase-sensitive mechanisms (for example the ‘simple cells’ of V1) will see a zero-crossing contrast gradient as ‘half-rectified’, consisting of positive or zero values. Phase-insensitive mechanisms (for example the ‘complex cells’ of V1) will see the same gradient as ‘full-rectified’, consisting entirely of positive values. Second is the problem of the number of mechanisms, or contrast samples, used by the observer to determine his overall response to a stimulus. Are many pooled over space simultaneously? Is a single mechanism used several times in a span of 200ms adjusted spatially by eye movements? Is a single ‘best’ sample used? Are multiple samples pooled together, and if so, how is this pooling performed? These issues were introduced to some degree in Chapter 2; now we will put them to some material tests in the context of the data that has been collected.

Adding Noise to a Contrast Transducer

Consider the following abstract example. If a random value is added to a contrast variable from sample to sample (e.g. from trial to trial in a sensitivity experiment), such as the linear response of a single spatial filter convolved with a spatial stimulus, the effect of that variation over many samples will depend on the form of the noise and on the rectification properties of the filter. Let’s assume that the added noise is normally distributed and has a mean of zero, in that the average negative fluctuation has the same magnitude as the average positive fluctuation. It will be shown shortly that this is true of the mask stimuli used in these experiments. Also assume that there is no other stimulus present which can increase the response of the filter. If no rectification is performed on the filter response, the average response will be zero over many trials, and the standard deviation of the response sr will simply be the product of the filter’s gain to the stimulus and the standard deviation of the noise sn. If the response is half-rectified, with negative values set to zero, the average response over many trials will now be positive (about 40% of the input standard deviation times the filter gain), and the standard deviation of the response will be decreased (to about 60% of the input standard deviation times the filter gain). If the response is full-rectified, with negative values made positive, the average response will be double that of the half-rectified case, and the standard

deviation of the response will be approximately equivalent to that of the half-rectified case (see footnote 26). So, the effects of noise added to a rectified transducer would be to raise its mean response level and to increase its output variance. Both of these effects, if large enough, would affect sensitivity, depending on subsequent transduction nonlinearities. The contrast transduction model applied in the previous chapter assumes a linear input. At the end of that chapter it was demonstrated that the gain k of that input (assigned a default value of 1.0) was not affected by the noise masks, though it does seem to be affected by target spatial frequency. However, this finding does not speak to whether or not significant external noise was added to the filter response. Three questions should be addressed now. First, what would be the effects of external noise on sensitivity functions as measured in these experiments? This can be addressed mathematically, as described in Chapter 2 (Equation 2.17). Next, are these effects similar to what occurred in the experiments included in this project? This can also be addressed using Equation 2.17, in conjunction with the fitting methods described in the previous chapter. Finally, is the noise produced by the broadband masks sufficient to produce such effects? This question can be addressed most directly by use of a spatial model of contrast transduction and detection. By constructing a simulated observer with limited assumptions about its internal structure (except for those entailed by the sort of model outlined in the previous chapter), we can estimate whether or not external noise is a plausible source of the masking effects measured in these experiments. In the next section, predicted patterns of sensitivity for a particular simulated observer are shown, which can then be compared with data measured in human observers.

26 The means of the rectified distributions can be calculated directly if we know their probability density functions, since \( \bar{x} = \sum_{i} x_i \cdot p(x_i) \). For the half-rectified distribution \( p(x_i < 0) = 0 \), so \( \bar{x} = \sum_{i=0}^{\infty} x_i \cdot p(x_i) \). For the full-rectified distribution \( p(x_i < 0) = p(x_i > 0) \), so \( \bar{x} = \sum_{i=0}^{\infty} x_i \cdot p(x_i) + \sum_{i=0}^{\infty} x_i \cdot p(x_i) \). The expected standard deviation s is then computed by \( s = \left[ \sum_{i} (x_i - \bar{x})^{2} \cdot p(x_i) \right]^{1/2} \).
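
The rectification figures quoted above (a mean of about 40% and a standard deviation of about 60% of the input standard deviation for the half-rectified case) can be checked with a short numerical sketch. The Python below is an illustration under stated assumptions rather than the procedure used in this project: it discretizes a zero-mean normal distribution on a fine grid, applies each rectification to the grid values, and computes probability-weighted means and standard deviations. The function name, grid size, and grid span are arbitrary choices.

```python
import math


def rectified_stats(sigma=1.0, gain=1.0, n=20001, span=8.0):
    """Mean and SD of a filter response to zero-mean normal noise under
    no rectification, half-rectification, and full-rectification."""
    # Grid of noise values from -span*sigma to +span*sigma.
    xs = [(-span + 2.0 * span * i / (n - 1)) * sigma for i in range(n)]
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]
    total = sum(weights)
    probs = [w / total for w in weights]  # discrete p(x_i), sums to 1

    def stats(rectify):
        ys = [gain * rectify(x) for x in xs]
        mean = sum(y * p for y, p in zip(ys, probs))
        var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
        return mean, math.sqrt(var)

    return {
        "linear": stats(lambda x: x),          # no rectification
        "half": stats(lambda x: max(x, 0.0)),  # negative values set to zero
        "full": stats(abs),                    # negative values made positive
    }


results = rectified_stats()
half_mean, half_sd = results["half"]
full_mean, full_sd = results["full"]
```

For unit input standard deviation and unit gain, the half-rectified mean comes out near 0.40 and its standard deviation near 0.58, while the full-rectified mean is twice the half-rectified mean (near 0.80) with a standard deviation near 0.60.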

Application of a Model Observer

Modeled Sensitivity to Experimental Stimuli

The question to be addressed by this model is whether or not it is plausible that external noise is the cause of masking observed in these experiments. From the prior analysis, it appears that active gain control (Foley 1994) can account for the observed effects. Still, I have mentioned earlier that external noise can have effects on detection and discrimination which are similar to Foley-type gain control. How similar are these effects? As shown in Chapter 2, convolving a noise distribution with a response function can produce results very similar to gain control, with difficult-to-measure identifying characteristics. First, at low pedestal contrasts, external noise will produce elevated thresholds depending on its variance (very low-power noise can produce meager facilitation). Second, at high pedestal contrasts, external noise will have a minimal effect on thresholds. Overall, this low-contrast masking pattern is similar in form with some of the nodal effects described in the previous section, particularly the z increase. There are other effects, potentially measurable in terms of the four S-F parameters, which can be seen particularly for high-contrast masks. Still, the variance of the S-F parameters measured in Chapter 5 suggests that fine distinctions may be hard to make with the primary data set. To get a more precise look at the effects of 1/f noise on TvC functions, new data was collected at multiple noise contrast levels.

The procedure was identical to that described in Chapter 3; three observers took part, two new to the experiment (JX and DC) and the author (AH). For all three observers, TvC functions were measured for a vertical target at the fifth frequency band (4.1 cpd), against a mean luminance background or against three different levels of background 1/f noise. For two observers (AH and JX), the same experiment was repeated at the second frequency band (.63 cpd). Three fixed (not relative to the target threshold) levels of masking noise were used: 0.02, 0.04, and 0.08. Results are discussed below. The rate at which external noise elevates thresholds is determined by the bandwidth of the filters being used by the observer to detect the targets. In these experiments, this is especially true since the masks were devised so that they should minimally intersect with an ‘average’ spatial channel (see footnote 27). Shown in Figure 6.1a are simulated TvC functions, obtained in the presence of variable noise levels, using full-rectified contrast responses from three different bandwidth specifications, all within the range of plausible psychophysical and physiological contrast bandwidths (see DeValois et al 1982, Wilson et al 1983, Phillips et al 1984): ‘narrow’ used a 30°/1-octave (width at half-height in both dimensions) filter, ‘medium’ used a 45°/1.6-octave filter, and ‘broad’ used a 60°/2.0-octave filter. Cosine-phase filters with these specifications are illustrated in Figure 6.1b, along with plots of normalized bandwidths for the respective filters. The figure shows that the broader bandwidth filters were relatively more affected by the masking stimuli. It is important to remember here that no ‘inhibitory’ or gain-control-like processes are involved in this simulation. Performance is limited entirely by the nonlinear combination of additive internal noise and the variance of the stimuli used in the actual experiments.

27 Recall that target bands and the corresponding ‘gaps’ in the mask images were defined as being 45°/1.5-octave wide, though this gap was smoothed somewhat by the localized and Gaussian-windowed nature of the stimuli. Refer back to Methods for details.

Figure 6.1 Results from a simulated observer using three different sensory bandwidth settings, plotted at top: broader bandwidths resulted in better baseline sensitivity but greater influence by masks. Four noise mask contrasts were used, represented by the different symbols and line colors. At bottom left, spatial filter shapes are shown, in cosine and sine phase (see Methods, re. quadrature filters): to emphasize low-amplitude but very broad areas within each filter, the colormap is discontinuous at 0.0. Filters with narrower bandwidths have more extensive and well-defined regions of high sensitivity. At bottom right, the frequency bandwidth of the filters shown is plotted. Orientation bandwidths varied similarly (see text).

Masking function data becomes useful here. Recall that masking functions were obtained as part of the main experiment. These functions describe how absolute threshold increases with mask contrast – since absolute threshold is where the greatest effect of the mask should be seen, we can look here to see how the modeled detection thresholds match with the empirical ones. In the figure below (Figure 6.2), masking functions are illustrated for the three subjects from the main experiment. Mask and threshold contrasts for each condition (different spatial frequencies) have been divided by the respective detection threshold, making them relative values. Masking functions at higher spatial frequencies (bands 4-7) overlap, with lower frequency bands shifted upward. Also illustrated are simulated masking functions, obtained through the model described above. The two broader bandwidths used produce masking functions which closely track the higher spatial frequency data. This suggests that to a point, the empirical masking functions can be explained simply by a combination of specified perceptual bandwidth and stimulus variance. If noise masking is responsible for the effects described in Chapters 4 and 5, filter bandwidths greater than the ‘medium’ setting shown here are implied.

Figure 6.2 Masking functions for three observers plotted as solid lines, with thick dashed lines representing masking functions obtained for the simulated observer at three bandwidths. Medium-broad bandwidths produce masking functions comparable with those obtained with human observers.

However, other features of the empirical and simulated data must be considered in order to come to this conclusion. Recall that in the earlier analysis of S-F function parameters, noise had minimal effect on the exponents of the S-F function, and that if any effect was measured it was of an increase in the value of p (cf. Figures 5.4, 5.5). This is because in the data, the

facilitatory ‘dips’ in the measured TvC functions were parallel between masking and baseline conditions, if anything becoming slightly steeper in a few cases (corresponding to the increase in p). Refer to Figure 6.1, and notice how the TvC functions change as they are elevated above the baseline level. It should be apparent that as mask contrast increases, the dippers get shallower, especially for the broader bandwidth filters. This occurs due to, essentially, a blurring of the sharp low-contrast nonlinearity by the external noise source (refer back to Transducer Theory), but since full-rectified filters were used in this simulation, less dipper-abolishment might have been expected. Other unpredicted nonlinearities might be introduced by the threshold-seeking process, which was still the same as in the human experiments, or by the filter shapes and arrays used in the simulation. It may be that the mask contrasts used to measure masked TvC functions were not sufficiently high to measure significant abolition of the facilitatory dipper (though one subject already described (CL) did run with a relatively high mask contrast and showed no sign of abolished dippers). The second experiment, described above, was carried out where the mask contrast was varied across several levels, in order to determine whether there is any sign of dipper damping at higher noise contrasts.

Testing Noise-Masking Predictions

The procedure and conditions were as described in Methods. TvC functions were measured for vertical targets at the 4.1 cpd band (band 5) for the author and two new subjects (JX and DC). Baseline functions were measured (no masking noise) and masked TvC functions were measured against noise backgrounds of .02, .04, and .08. As can be seen in Figure 6.3, masking noise progressively increased thresholds for low-contrast pedestals, and had little effect at high-contrast pedestals, similar to the effects seen in the original experiment.

Figure 6.3 Results from the multiple-mask experiment. Top row is for observers viewing 4.1 cpd targets; bottom row is for observers viewing .63 cpd targets. Vertical targets were used in every case.

Figure 6.4 Top: Chi-squared values for different observers viewing 4.1 cpd targets. The best and simplest model is one that varies the parameters z and r. Bottom: slope and amplitude parameters obtained for human (left) and simulated (right) observers.

Visual inspection of the data for the 4.1 cpd targets (Figure 6.3, top row) suggests that there is no damping of the TvC dippers, but this can also be demonstrated quantitatively. Both data sets (obtained from three subjects and from the three simulation bandwidths) are well-fit by similar models (Figure 6.4). Again, z is the best 1-parameter model, and zr is the best 2-parameter model. For the simulated data, on average, zr is only marginally better than pz (rχ2 of 1.8 and 2.1 respectively). The best 3-parameter model is pzr, which yields a rχ2 near one for both data sets. These ‘best’ models can be used as descriptor cases for judging the effects of noise on individual parameters. For the simulated data, p and r values tend to decrease with increasing mask contrast, describing the shallower dippers in the simulated data. For the narrowest filter bandwidth, the decrease in p is delayed due to the delayed overall effect of the noise (Figure 6.4). Regardless, the full range of filter bandwidths displays the decreasing parameter values. For the data obtained from human subjects, no overall decline in the value of p is seen (subject AH had inordinately high p values which followed a non-monotonic course with mask contrast). For two of the three subjects (JX and DC) r decreased with mask contrast, perhaps reflecting influence of additional spatial content on attentional ability (e.g. Huang and Dobkins 2005 suggested r might be modulated by attention). As noted earlier, the parameter determining the degree of masking, h, increases dramatically below 4.1 cpd. Haun and Essock (2009) have suggested that this indicates a crossover between the effects of gain control, which are more-or-less constant with spatial frequency, and external noise, whose effect should track with filter bandwidths (cf. Schofield and Georgeson 2003). It was for this reason that the initial follow-up was performed at 4.1 cpd: at a spatial frequency where the gain control hypothesis is predicted to hold, does it? The preceding discussion indicates that it does hold at this spatial frequency. However, below 4.1 cpd, the increase in h has been hypothesized (Haun and Essock 2009) to be due to increasing noise masking. To test this hypothesis, two subjects from the previous condition were run in the same experiment, with the same three mask contrasts, at the second-lowest spatial frequency (.63 cpd, Band 2). If noise masking is indeed the cause of the increased h values measured at the lower spatial frequencies, progressive abolition of facilitation should be measured with increasing mask contrasts. Data are plotted on the bottom row of Figure 6.3.


Here, it appears that there is a difference between the baseline and masked TvC functions, where the baseline functions show more pronounced dippers than the masked functions. The effect is much clearer for JX, but is also seen for AH. So, there does seem to be some sign of noise masking at this lower target frequency. Since the spatial simulation is so computationally intensive, it is not feasible to attempt to fit its parameters to these data sets. Instead, the functional model of noise masking can be applied to the data, and results can be compared directly with the results of the presumed ‘gain control’ model described in Chapter 5. In Chapter 2, Equation 2.17 describes how a sampling distribution of contrast values can be combined with a transducer function through convolution. A simple model of contrast masking would assume simply that some fixed proportion of stimulus noise gets through to the transducer input. Adopting this approach allows us to treat the noise masking model in much the same way as the gain control model: the four S-F parameters are kept constant at all mask contrasts, with an additional parameter, a noise coefficient g multiplied against the mask contrast σ:

$$ d'(c) = n(0, g\sigma) * \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} \tag{6.1} $$
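A numerical sketch of Equation 6.1 may help make the convolution concrete. This is an illustration under stated assumptions, not the fitting code used here: the transducer is extended to negative inputs by odd symmetry (an assumption; the text does not specify this), the Gaussian is discretized on a symmetric grid, and all parameter values are hypothetical.

```python
import math

def sf_dprime(c, p=2.4, q=0.4, z=0.05, r=20.0):
    """S-F transducer r*c^p / (z^(p-q) + c^(p-q)).

    Assumption: odd-symmetric extension to negative inputs.
    """
    a = abs(c)
    value = r * a ** p / (z ** (p - q) + a ** (p - q))
    return math.copysign(value, c)

def pen_dprime(c, g, sigma, n_steps=2001, span=5.0, **sf):
    """Eq. 6.1 evaluated numerically: Gaussian n(0, g*sigma) convolved with the transducer."""
    s = g * sigma
    if s == 0.0:
        return sf_dprime(c, **sf)  # no noise: the transducer is unchanged
    step = 2.0 * span * s / (n_steps - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(n_steps):
        eps = -span * s + i * step
        w = math.exp(-0.5 * (eps / s) ** 2)
        total += w * sf_dprime(c - eps, **sf)
        weight_sum += w
    return total / weight_sum

def loglog_slope(f, c1, c2):
    """Slope of log f against log c between two contrasts."""
    return (math.log(f(c2)) - math.log(f(c1))) / (math.log(c2) - math.log(c1))

base_slope = loglog_slope(sf_dprime, 0.001, 0.002)
noisy_slope = loglog_slope(lambda c: pen_dprime(c, g=0.5, sigma=0.04), 0.001, 0.002)
print(base_slope, noisy_slope)
```

With these hypothetical values the low-contrast slope of the unmasked function stays close to p, while the noise-convolved function is nearly linear at low contrast. That linearization is what flattens the dipper in a pure external noise account.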

Using this model, d’ functions of this type were fitted to the data sets for all three observers, at both spatial frequencies. Results in the form of rχ2 values are shown in Figure 6.5a, labeled as the ‘pure external noise’ (PEN) model. At 4.1 cpd, for observers DC and JX, the PEN model is a good fit to the data, while for AH the fit is much poorer. At .63 cpd, the PEN model is a good fit for both subjects, including AH. Also shown are rχ2 values for three other types of model. The best overall is the ‘vary zr’ model described in Chapter 5, as it has the lowest rχ2 for


each subject and each condition, except for AH at .63 cpd where it is equivalent with the others. PGC is a ‘pure gain control’ model, a fit of Equation 5.5 to these data. The PGC model has a moderately good fit to the different conditions and observers’ data; rχ2 values fall between 2 and 3 in each case. Finally, a ‘hybrid’ model was applied, allowing both gain control (i.e. the h coefficient from Eq. 5.5) and external noise to be incorporated:

$$ d'(c) = n(0, g\sigma) * \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q} + h \cdot \sigma^{p-q}} \tag{6.2} $$

The hybrid model is labeled as GC-EN in Figure 6.5. For the 4.1 cpd data, the GC-EN model fit is comparable with a combination of the PGC and PEN models. For the .63 cpd data, the fit is as good as the PEN model. Note the values for g and h shown in the table at the bottom of Figure 6.5. At 4.1 cpd, g and h are both significant values, indicating that both processes might contribute significantly to masking here. However, at .63 cpd, the value of h is set to nearly 0.0, essentially reducing the hybrid model to the pure external noise model, indicating that at this frequency external noise is the dominant source of masking.
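The nesting of these models can be written out explicitly. This is only a reading of Equation 6.2, not an additional result; δ denotes the Dirac delta, taken here as the limit of n(0, gσ) as g goes to zero.

$$ h = 0: \quad d'(c) = n(0, g\sigma) * \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q}} \quad \text{(the PEN model, Equation 6.1)} $$

$$ g \to 0: \quad n(0, g\sigma) \to \delta, \quad d'(c) = \frac{r \cdot c^{p}}{z^{p-q} + c^{p-q} + h \cdot \sigma^{p-q}} $$

The second case is the gain control form carrying only the h term, which the text identifies with Equation 5.5. A fitted h near zero at .63 cpd therefore places the hybrid model at the first case.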


Figure 6.5 Chi-squared values for four models of sensitivity tested against five data sets (two spatial frequencies, three observers). PEN is the ‘pure external noise’ model with noise-amplitude coefficient g. PGC is the ‘pure gain control’ model, with gain control coefficient h. GC-EN is a combined model with both g and h coefficients. Vary zr is the best-fitting free-parameter model described in earlier sections.


Model Fitting Using Maximum Likelihood

Unfortunately, even these data do not seem to be enough to clearly indicate what the ‘correct’ model might be. One last option is to use the measure of maximum likelihood described earlier to compute the goodness of fit of these different models. Several measures of goodness of fit using likelihood functions have been proposed (e.g. Pitt and Myung 2002; Zucchini 2000), including the Akaike Information Criterion (AIC) which takes into account the number of parameters in a given model, the Bayes Information Criterion (BIC) which takes into account both the number of parameters and the size of the data set, and the Bayes factor which takes into account both parameters and data points, and also considers the prior probabilities of particular parameter values. For our purposes, the BIC was deemed most useful, since it is not clear what form the priors for the individual models should take (Wasserman, 2000, notes that with weak constraints on priors, the Bayes factor tracks roughly with the BIC; also see Kass and Raftery 1995). The BIC is calculated as BIC = −2 ln(L) + k ln(n), with L denoting the maximized likelihood for a given model and data set, k the number of parameters of the fitted model, and n the size of the data set (in this case the number of trials fitted to the model d’ function). Since the value given in Equation 3.3 is the logarithm of the likelihood function, BIC was calculated as BIC = −2 l(y(c), θ) + k ln(n). In either case, a lower BIC value indicates a better fit of model to data. In Figure 6.6, BIC values are plotted relative to the ‘default’ gain control model, referred to as PGC in the previous section. Two models are compared with the PGC model. The first was ‘vary-r’, included in place of ‘vary-zr’ from the previous section. Vary-r consisted of the PGC model with independent r values at each mask contrast, for 8 total parameters (p, q, z, h,


and rs 1 through 4). This gives up very little of the flexibility of the zr model, while reducing the number of parameters from 11 to 8. The second model was the pure external noise (PEN) model. In Figure 6.6, the plot and table show values for BIC_PGC – BIC_vary-r, or for BIC_PGC – BIC_PEN. So, positive values indicate that the denoted model is better than the pure gain control model. At 4.1 cpd, both JX’s and DC’s data seem to be best explained by the vary-r model, with the PEN slightly less preferred but still better than PGC. For AH at 4.1 cpd, PGC is the best model, followed by vary-r, with PEN the clear loser. At .63 cpd, there is a weak preference in both observers’ data for the PEN model. Overall, the results of the BIC analysis are in agreement with the results of the reduced chi-squared analysis of the previous section. At the higher spatial frequency, some form of the gain control model, with or without allowed variation of the absolute gain parameter r, best accounts for the data. At the lower spatial frequency, the external noise model seems to be most explanatory. As it turns out, this is in accord with Haun and Essock’s initial hypothesis as to the sources of masking across spatial frequency in 1/f noise.
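The comparison plotted in Figure 6.6 can be sketched as follows, using the standard definition BIC = −2 ln(L) + k ln(n). The log-likelihoods and trial count below are hypothetical placeholders, not values from these experiments. The parameter counts follow the text: five for PGC (p, q, z, r, h), eight for vary-r, and five for PEN (the four S-F parameters plus g).

```python
import math

def bic(log_likelihood, k, n):
    """Bayes Information Criterion, -2 ln(L) + k ln(n); lower values are better."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical maximized log-likelihoods and trial count, for illustration only.
n_trials = 4000
fits = {
    "PGC": (-2105.0, 5),     # (log-likelihood, number of parameters)
    "vary-r": (-2088.0, 8),
    "PEN": (-2101.0, 5),
}

bics = {name: bic(ll, k, n_trials) for name, (ll, k) in fits.items()}

# Positive values mean the named model has a lower BIC than PGC.
delta = {name: bics["PGC"] - value for name, value in bics.items() if name != "PGC"}
print(delta)
```

Because vary-r carries three more parameters than PGC, its likelihood gain must exceed a penalty of 3 ln(n) before it is preferred, whereas PEN and PGC have equal parameter counts and are compared on likelihood alone.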


Figure 6.6 BIC decreases for the vary-r gain control model and the PEN model relative to the PGC model. Positive values correspond to a decrease from the PGC model BIC value. At 4.1 cpd, JX and DC’s data are better fit by both models, best by the vary r model. AH’s data is best fit by the PGC model, with a slightly poorer fit by the vary r model. At .63 cpd, the PEN model is better for both observers; the vary r model is comparable with the gain control model.

Summary

In the end, it appears that noise masking is a likely explanation for some portion of the masking effects measured in these experiments, particularly at lower spatial frequencies. This was not unexpected, since it appears that a simple simulation of detection and discrimination behavior, as commonly employed by those modeling spatial vision (Watson and Solomon 1997, Meese and Summers 2007), would predict that there should be substantial masking by the noise masks used in these experiments at any spatial frequency, accompanied by a measurable decline in the depth of the TvC dipper. The presence of this decline in human data only under certain conditions suggests a solution somewhere along the following continuum: At one end, masking in these data is due entirely to a gain control process, with human subjects able to significantly


discount the effects of noise, perhaps through an intentional or reflexive narrowing of their perceptual filter bandwidths. At the other end, masking in these data is due to noise masking, with an accompanying blurring of the low-contrast expansive nonlinearity, along with another process such as increased channel uncertainty which has the coincidental effect of ‘correcting’ for the dipper shape. Some combination of these two solutions might also occur, as is likely the case at the lower spatial frequencies. The problem is that the two ends of this continuum are similarly complex, and difficult to disambiguate since they would be predicted to produce very similar data in most conceivable psychophysical tasks. Remaining questions relating to this and other problems raised by these experiments are discussed in the final chapter of this dissertation.


7. DISCUSSION

A running theme in this work has been the difficulty in disentangling certain aspects of an observer’s sensitivity to stimuli. External and internal noise, attention, gain control, uncertainty, and other factors all have effects on sensitivity which are to some extent interchangeable. Some problems seem intractable given psychophysical methods: the d’ puzzle of internal response and internal noise components is probably best left to physiological methods and applied theories (e.g. Sit, Chen, Geisler, Seidemann, and Miikkulainen, 2008). So, there are depths that cannot be reached using the methods of this study. Some questions can be answered, however, and some new questions asked. The fundamental question to be answered by these experiments was what, exactly, is the effect of 1/f noise on contrast sensitivity. It has now been shown that a shift, to higher contrasts, of the nonlinear psychophysical contrast transduction function is sufficient to explain changes in contrast sensitivity due to addition of 1/f noise. At lower spatial frequencies, it appears that masking by external noise is at least as plausible an explanation as gain control, perhaps more so. In addition, these experiments have yielded interesting perimetric[28] results, suggesting certain aspects of contrast sensitivity across different spatial frequencies. Further discussion of these issues will be divided below into

[28] ‘Perimetry’ is most often used in vision science to refer to measurement of the spatial extent of the visual field, i.e. to sensitivity as a function of spatial coordinates. Here the term is used to refer to the measurement of sensitivity as a function of more general parameters: contrast, spatial frequency, and orientation.


two sections, focusing respectively on the implications of the perimetric findings and on the problem of coming to conclusions regarding the source of masking in 1/f noise.

What kind of stimulus is 1/f noise?

Before getting to discussion of experimental findings, a review of what was at stake with these experiments might be in order. As its name implies, 1/f noise entails two important, and very different, stimulus factors. The 1/f property is found in naturalistic stimulation throughout visual experience, in addition to other natural phenomena (Field 1987; Voss and Clarke 1975). This is the reason 1/f noise is so commonly used in studies of visual perception – since the visual system encodes images early on in the form of contrast, before it encodes more complex structural image features, broadband 1/f noise is presumed to impact, or ‘load’, the visual system similarly to naturalistic broadband stimulation. The noise property is normally considered a neutralizing factor; a noise image by definition contains no representational content, being made up of randomly determined patches of light. So, studies which use 1/f noise as a visual stimulus normally cite its neutral, broadband visual nature, and presume that it acts primarily as a content-free contrast stimulus (e.g. in studies of eye movements: Tavassoli et al 2009, 2007; visual search: Geisler et al 2006, Rajashekar et al 2006, Clarke et al 2008; texture perception: Padilla et al 2008, Essock et al 2003). Only in a very few cases has 1/f visual noise been treated explicitly as a noise stimulus (Schofield and Georgeson 2003), and in those cases its properties as a broadband contrast load are ignored. In this project, experiments were carried out which were intended to both describe the overall shape of the impact that broadband 1/f noise has on the visual system, and also to speak


to the relative contributions of its loading aspect and its noise aspect. The experiments were designed to measure contrast thresholds, and therefore these findings were sought in the effects of 1/f noise on contrast thresholds. Since psychophysical sensitivity is normally defined as a signal-to-noise ratio, it is not difficult to imagine why noise would have an effect on sensitivity – adding noise to a target will, if anything, increase the noisiness of the target and make it more difficult to see. On the other hand, without some familiarity with the background of visual psychophysics (provided in Chapters 1 and 2), it is not immediately obvious why adding contrast to a target should make the target harder to see. Yet, this is what normally occurs when two different contrast stimuli are combined – they impair one another’s visibility[29]. Currently, the dominant explanation for these effects is that they reflect contrast gain control in the visual system (Foley 1994; Watson and Solomon 1997), whereby sensitivity to simultaneously viewed stimuli is impaired by a process of neural inhibition (Heeger 1992). It has been suggested that the function of neural contrast gain control is to normalize stimulus contrasts, to maximize the efficiency of the visual system’s processing of contrast in imagery (Hansen and Essock 2003; Carandini, Heeger, and Movshon 1997). It has also been suggested that perimetric anisotropies in contrast sensitivity might be accounted for by anisotropies in these gain control mechanisms, as in the case of the oriented oblique and horizontal effects (Essock et al 2003, 2009; Hansen et al 2003). Measurement of the threshold versus contrast function reveals the shape of an observer’s sensitivity function for a particular type of stimulus (Nachmias and Sansbury 1974). It is

[29] As made clear in Chapters 1 and 2, adding together two identical, or similar, low-contrast stimuli will tend to facilitate sensitivity. Also, sensitivity to low-contrast targets which are surrounded by annular or flanking contrast stimuli and viewed foveally will tend to be facilitated (Chen and Tyler 2008; Yu et al 2003). However, these effects occur only over very narrow ranges of contrasts near the absolute detection threshold. When broader ranges of contrasts, above threshold, are explored, masking tends to be the rule when it comes to interactions between contrast stimuli.


because these sensitivity functions can be linked with neural response functions (Kwon et al 2009; Boynton et al 1999; Ross and Speed 1991; Campbell and Kulikowski 1972) that the link between neural processes such as gain control and perceptual phenomena such as contrast masking can be made. Below, drawing on this logic, two questions are addressed. First, having measured these TvC functions across the frequency domain, with and without 1/f noise interfering with sensitivity, what can now be said about the spectral layout of contrast sensitivity and of contrast gain control? Second, is gain control the best explanation of the effects of 1/f noise on sensitivity, and are there other factors that may be equally or more important?

Perimetry

Narrowband (‘Baseline’) Contrast Sensitivity

As a function of spatial frequency and orientation, contrast sensitivity is known to vary in predictable ways, given that temporal properties and mean luminance of the stimuli are fixed (DeValois, Morgan, and Snodderly 1974; Watson and Solomon 1997; Watson and Ahumada 2005). For this discussion, since a single stimulus duration and mean luminance were used in all experiments presented here, assume that these factors are fixed. With spatial frequency, sensitivity is progressively more attenuated at spatial frequencies above 4 cpd or so, depending on the particular observer, until sensitivity becomes nonexistent at a high frequency of around 60 cpd. This drop-off in sensitivity is partially attributable to blurring induced by the optics of the eye (Campbell and Green 1965), but is primarily due to neural and sampling factors which begin in the retina (DeValois and DeValois 1990). At lower frequencies than the CSF peak, sensitivity


drops off at a similar rate (if the CSF is viewed as a function of log frequency); here, the drop-off is attributable entirely to neural factors, perhaps most significantly the center-surround interactions which occur in retinal processing. These facts about the origins of the CSF are not gleaned through psychophysical study, but rather through study of physical and physiological factors. Yet, psychophysical measurements across spatial frequency and orientation have been suggestive of some of the complexities underlying contrast sensitivity. In particular, measurement of TvC functions across spatial frequency indicates that some factors in contrast perception are more or less constant, while others vary considerably (Bradley and Ohzawa 1986; Boynton et al 1999; Bird et al 2002). Absolute threshold is what normally defines the CSF; if increment thresholds are measured instead, the sensitivity function is much flatter ('contrast constancy'; Georgeson and Sullivan 1975). This has been taken to mean that the factors which determine absolute threshold are relatively ineffective once a pattern becomes visible, i.e. that different mechanisms come into play at suprathreshold contrasts. The data collected in these experiments confirm this notion, and add a more specific interpretation. Recall the nonlinear d' function described at multiple points throughout this dissertation (Equations 2.3, 5.1), referred to as the S-F function. This function takes an input which is linearly related with stimulus contrast, and converts this input into a quasi-sigmoid, non-saturating sensitivity function. There is some evidence (Kwon et al 2009; Boynton et al 1999) and much opinion (Chirimuuta and Tolhurst 2005) suggesting that the nonlinearities of the S-F function are tied to cortical processes. Constrained fits of the S-F model to threshold data in Chapter 5 show that the contrast sensitivity function can be reduced to a CSF-like function of the S-F parameters z and r (Figure 5.10b).
Furthermore, it was shown that together, these z and r


functions are essentially equivalent to a dependency on the parameter k, the linear input to the nonlinearities of the S-F function (Figure 5.11). As far as these data are concerned, the nonlinearities of contrast sensitivity, namely the transduction exponents p and q, can be held constant across all frequencies. We can therefore interpret this finding of the k function as suggesting that, if the methodologies of Kwon et al and Boynton et al are justified (i.e. tying the nonlinearities of transduction to visual cortex), then the shape of the CSF arises primarily due to pre-cortical factors, and that the shape of the d’ function for contrast is due primarily to cortical factors. Conversely, if the k-like relationship shown in Chapter 5 (Figure 5.11) is merely coincidence, the bowed functions of z and r with spatial frequency (Figure 5.10b) would suggest that some fundamental processes in visual cortex vary with spatial frequency. These two parameters have been referred to as setting, respectively, the contrast gain (z influences at what contrast the transducer begins to operate; ‘higher’ contrast gain would therefore correspond with a lower value of z) and the response gain (r influences the rate at which the transducer response increases with contrast) of the transducer function. There are two related parameters to the Naka-Rushton function commonly used to describe similar properties of neural contrast response functions (i.e. their position on the contrast axis and magnitude on the response axis): c50 (the ‘semi-saturation constant’) and Rmax (Equation 2.1). Though there is a wealth of information on these parameters and their distribution (as attached to individual neurons) in visual cortex, there is relatively little on how they correlate with one another. The psychophysical findings reported here imply a rather stark negative correlation between r and z: when z is small, r is large (i.e. contrast and response gain are positively correlated). 
However, physiological findings seem to go in the opposite direction, if any correlation at all can be detected: Carandini and Sengpiel’s


(2005) optical imaging study indicated a relatively strong (r = .49) positive correlation between Rmax and c50 after the fit of a (rather overfitted) model to their data. Whether or not these psychophysical results can be tied to physiological data is therefore unclear. The robustness of the suprathreshold ‘response compression’ (or Stevens’ law) exponent mentioned in Chapters 1 and 2 was confirmed with these results, falling at a value near 0.40, especially when fixed across multiple target conditions. If considered with respect to physiological CRFs, q is the ‘odd’ parameter of the S-F model of contrast sensitivity. Whereas the low-contrast exponent p, the contrast gain parameter z, and the response gain parameter r can all be tied directly to components of the typical neural CRF, deriving q through a physiological model of contrast perception requires that responses from multiple neurons be combined according to some relatively more-central pooling rule (e.g. by Bayesian combination of responses: Chirimuuta and Tolhurst 2005, Geisler and Albrecht 1997; or by summation of responses: Goris, Wichmann, and Henning 2009, Watson and Solomon 1997). The fact that this aspect of sensitivity appears to be constant across stimulus conditions suggests that its value is determined by processes which are relatively more central to the observer (i.e. occurring at a cortical stage distinctly later than neural transduction) than the neurons which are directly responsible for contrast transduction. The S-F exponent p, responsible for the facilitation of sensitivity by low-contrast pedestals, was observed in most cases to fall at a value just above 2 + q. For baseline data, median values for p fell around 2.48, with the distribution skewed towards higher values – only 20% of values fell below 2.3. 
Since q is normally near 0.4, this means that p – q usually fell at a value just above 2.0, which would allow us to see contrast transduction as making use of energy mechanisms, where contrast is encoded as signal power, the square of contrast, rather than


amplitude. It has been discussed elsewhere (Gottschalk 2002; Solomon 2009; Heeger 1992) how such an arrangement would be computationally efficient and even ideal in some cases (recall how in Chapter 2 it was described how squaring and summing of fixed-phase receptive fields could afford a phase-independent measure of contrast). If the visual system does carry out something similar to a squaring of signal contrast, it would explain why baseline measurements of d’ functions rarely afford p values less than 2 + q, and why if anything values will be higher than this. Higher values could be obtained if there is some degree of transducer uncertainty (Chapter 2), and if observers use something like a ‘max response’ rule across a pool of relevant and irrelevant transducers to make decisions about whether or not they believe they have detected a target. Values of p higher than 2 + q obtained in the baseline conditions in this study might reflect a small amount of uncertainty on the part of the observers, which might be reduced simply to an inability to constrain attention precisely to only target-sensitive transducers.
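The phase-independence afforded by squaring and summing can be illustrated with a quadrature pair of one-dimensional Gabor filters. This is a sketch under simplifying assumptions (a single spatial dimension, a Gaussian envelope of half a cycle in standard deviation, arbitrary sampling), not a model taken from this dissertation; the filter and function names are hypothetical.

```python
import math

def gabor_pair(freq=1.0, sd=0.5, half_width=3.0, step=0.01):
    """Even (cosine-phase) and odd (sine-phase) 1-D Gabor receptive fields."""
    n = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(n)]
    env = [math.exp(-0.5 * (x / sd) ** 2) for x in xs]
    even = [e * math.cos(2 * math.pi * freq * x) for e, x in zip(env, xs)]
    odd = [e * math.sin(2 * math.pi * freq * x) for e, x in zip(env, xs)]
    return xs, even, odd

def responses(contrast, phase, freq=1.0):
    """Linear responses of the pair to a grating, plus their summed squares (energy)."""
    xs, even, odd = gabor_pair(freq)
    stim = [contrast * math.cos(2 * math.pi * freq * x + phase) for x in xs]
    r_even = sum(s * w for s, w in zip(stim, even))
    r_odd = sum(s * w for s, w in zip(stim, odd))
    return r_even, r_odd, r_even ** 2 + r_odd ** 2

phases = [k * math.pi / 8 for k in range(16)]
results = [responses(0.1, ph) for ph in phases]
evens = [r[0] for r in results]
energies = [r[2] for r in results]
print(min(evens), max(evens), min(energies), max(energies))
```

The even filter's response swings from positive to negative as grating phase changes, while the summed squares stay essentially constant and scale with the square of contrast. This is the sense in which an exponent near 2 corresponds to an energy measure.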

1/f Masked Contrast Sensitivity

The only major systematic effect of adding 1/f noise to a target is to increase the value of the gain control constant z. As seen in Chapter 5 (Figure 5.10), this effect is progressively stronger at spatial frequencies below 4 cpd. Furthermore, this increase in z takes the form of z + h·(c_mask)^(p−q), which means that the standard model of overlay pattern masking (Foley 1994) is adequate to describe the effects of 1/f noise on contrast sensitivity. Complications related to this assertion of the gain control model’s adequacy are addressed in the next section. For now, two


other potential effects will be discussed, the effect of 1/f noise on the low-contrast transduction exponent, and its effect on the amplitude of the d’ function. In the previous section, it was noted that an increase in p could be explained as the result of two factors: an observer’s being unable to pool together precisely the correct perceptual equipment for the task, and the observer’s use of a ‘max response’ rule for choosing the likeliest location of the target (Chapter 2; Pelli 1985). Such transducer uncertainty could occur in the baseline condition, when there is no stimulus to be seen but the target, thereby raising the value of p above the ‘true’ transducer value. It could also occur in the masked condition, in one of two broadly defined conditions. First, it could be true that there is uncertainty in the baseline case, and that this uncertainty becomes active again once the contrast of ‘distracter’ components is elevated to the target’s pedestal contrast level. Second, it may be that there is no, or relatively little, uncertainty in the baseline condition, but that the complexity of the broadband mask is, for lack of a better word, confusing, causing the observer to become more uncertain of how to use his transducer array than he was in the baseline condition. In the baseline condition, p values (across all subjects and target conditions from the main experiment) were tightly clustered around a mean value of about 2.68; in the masking condition, the mean was higher, around 3.03. It did not appear that this difference was linked with spatial frequency (Chapter 5). This increase in the value of p with introduction of a noise mask is precisely the opposite of what was observed in the second set of experiments (Chapter 6), where the value of p was observed to decrease with introduction of 1/f noise (with the .63 cpd targets) or not to change significantly in either direction (with the 4.1cpd targets). 
Whether or not significant transducer uncertainty occurred as a function of whether or not a broadband mask was imposed may have depended on the conditions run by a given observer.
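How a ‘max response’ rule can steepen the low-contrast d’ function may be easier to see in a small calculation. The sketch below is in the spirit of Pelli (1985) but is not his model or the simulation used in this dissertation. It assumes a two-interval forced-choice task, m independent channels with unit-variance Gaussian noise of which only one carries the signal, and an observer who picks the interval containing the largest response. The function names and the choice of m = 100 are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # unit-variance Gaussian channel noise (assumption)

def pc_max_rule(signal, m, lo=-8.0, hi=12.0, n=4001):
    """Proportion correct in 2AFC when the interval with the largest of m responses is chosen."""
    step = (hi - lo) / (n - 1)
    p_wrong = 0.0
    for i in range(n):
        x = lo + i * step
        fx = STD.cdf(x)
        cdf_signal_max = STD.cdf(x - signal) * fx ** (m - 1)  # P(max in signal interval < x)
        pdf_blank_max = m * fx ** (m - 1) * STD.pdf(x)        # density of max in blank interval
        weight = 0.5 if i in (0, n - 1) else 1.0              # trapezoid rule end points
        p_wrong += weight * cdf_signal_max * pdf_blank_max * step
    return 1.0 - p_wrong

def dprime_2afc(pc):
    """Equivalent d' for a 2AFC proportion correct."""
    return math.sqrt(2.0) * STD.inv_cdf(pc)

def loglog_slope(m, s1=0.5, s2=1.0):
    """Log-log slope of d' against signal strength between s1 and s2."""
    d1 = dprime_2afc(pc_max_rule(s1, m))
    d2 = dprime_2afc(pc_max_rule(s2, m))
    return math.log(d2 / d1) / math.log(s2 / s1)

slope_certain = loglog_slope(1)      # observer monitors only the relevant channel
slope_uncertain = loglog_slope(100)  # observer also monitors 99 irrelevant channels
print(slope_certain, slope_uncertain)
```

With one monitored channel the slope is 1; with many irrelevant channels it rises well above 1. Applied to the output of a transducer with exponent near 2 + q, the same mechanism would push the measured p above that value.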


In the main experiment, each observer was assigned a particular set of orientations and spatial frequencies to complete as target conditions. From day to day, and within each day, these conditions were cycled through by an observer (Chapter 3), so that on a given day she might be tasked with detecting targets at all four orientations at a single spatial frequency, or at 4 spatial frequencies at a single orientation. Conversely, in the second experiment, observers were assigned to complete a single target condition, and on every session they were tasked with detecting this single orientation and spatial frequency. It may be that the interleaving of different target conditions, in the main experiment, interfered with the observers’ ability to precisely target the most relevant perceptual mechanisms for detecting those targets, and that this difficulty was more pronounced during viewing of the complex, broadband noise masks. This problem was anticipated to some extent in the design of these experiments: as described in Methods (Chapter 3), TvC functions were measured in a fixed series of blocks, starting with high-contrast pedestals and ending with low-contrast pedestals. It was hoped that this would help to prevent observers in the main experiment from, essentially, forgetting what target they were looking for when the



Figure 7.1 An illustration of hypothetical uncertainty effects in the frequency domain. Four situations are shown: at left, the stimulus/channel array in a baseline condition, with a target present at 7 cpd and 45° (the dark gray, colored region), and no content at other values; at right is a broadband mask condition, where the target is the same but now there is content (the light gray coloring) within the range of other receptive fields. The thin black circle represents the ideal, most relevant receptive field for the job of detecting the target; thin grey circles are less relevant receptive fields since they overlap only slightly with the target region; dashed grey circles are irrelevant receptive fields, since they have no overlap with the target. The thick dotted oval represents the observer’s attentional window, defining the range of receptive fields pooled together in order to make a detection judgment. In the baseline condition, the window is well-constrained to the target area. In the masking condition, the window is more spread out, pooling together more irrelevant receptive fields.

pedestal was invisible. This technique may have been less effective during the masking conditions than during the baseline conditions. Figure 7.1 illustrates how uncertainty, shown here as a dotted-line attentional ‘pooling window’, could vary between conditions and experiments. Unfortunately, the low-contrast slope of the d’ function is the only piece of information available to use, given these data, regarding whether or not transducer uncertainty occurs. If it does occur, it would seem that it does so under circumstances similar to those described here. If it does not, some other process, as yet undescribed in the psychophysical or physiological literature, may be responsible for the adjustment of the low-contrast d’ function slope.

Orientation Effects

So far, discussion of perimetric results has focused on the spatial frequency dimension. The reason for this is clear: this is where interesting effects were observed. Based on prior studies, in particular one which used near-identical stimuli (Haun and Essock 2009), it was

expected that in the masking conditions, there should have been an anisotropy of masking, with thresholds most elevated at horizontal and less so at other orientations – in fact this was the primary reason for including targets and masks at multiple orientations. The horizontal effect anisotropy, with more masking by broadband contrast at horizontal than at other orientations, has been described multiple times, under multiple conditions (Essock et al 2003, 2009; Hansen and Essock 2005; Kim et al 2009; Haun and Essock 2009). Failure to obtain the effect in these experiments is curious indeed, and suggests that perhaps there are other qualities to a broadband stimulus besides its bandwidth and phase structure that determine the evocation of the horizontal effect. Perhaps the most crucial difference between conditions in this experiment and in those in other studies mentioned above is in the experimental design. All of the prior studies, except for one experiment, used broadband masks which included contrast structure at the same frequencies and orientations as the target – that is, as far as the target content was concerned, the masks were locally ‘smooth’. The one exception was in Essock et al 2009, where the orientation bandwidth of masking was measured with masks that did not include content at the same orientation as the target. In that experiment, horizontal effects were still obtained, though in most cases they were idiosyncratic and noisy. In the experiments reported in this dissertation, the broadband mask pointedly excluded content which overlapped with the target domain, so that observers had to search for and detect the target through something like a spectral hole. This situation may bring with it some difficulties which are not predicted by theories of spatial vision or pattern masking, and may require application of paradigms along the lines of spectral search tasks to resolve. 
This is ironic, since the design demonstrated in this dissertation was intended to mimic the simplest
form of pattern masking studies as carried out with more ‘traditional’ grating stimuli in the past (e.g. Foley 1994; Petrov, Carandini and McKee 2005; Meese and Holmes 2007). Another possibility as to why horizontal effects were not obtained could lie with the observers themselves; the structure of each individual’s visual system is different, impacted by learning and experience over a lifetime. Orientation anisotropies vary in their magnitude between individuals (Essock et al 2009 provide an illustration of horizontal effects across many different observers, and the effects are indeed very variable). One might argue that, perhaps, the observers selected for these experiments simply are not likely to obtain a horizontal effect in any stimulus scenario. However, all three (AH before the current study, CL and STP since) have been subjects in other experiments using stimuli more like those in previous studies, and all three have reliably produced horizontal effects.

It seems most likely that there is some stimulus-specific factor which negated the horizontal effect with these stimuli. Discovering just what that factor is will require some exploration of stimulus parameters, which was well beyond the scope of these experiments.

Gain Control or Noise Masking?

Effects of Gain Control and Surround Suppression

The findings of Haun and Essock (2009) were generally supported by the experiments presented here, in that the gain control model is a good fit to the effects of a fixed-contrast 1/f noise mask on TvC functions. The best single-parameter fit of the d’ function for contrast to the data is a shift of the ‘threshold’ parameter z to higher contrasts, which is equivalent to the
prevailing model of contrast gain control (Chapter 2; Foley 1994). One problem with this account is the relative lack of measured ‘dipper crossing’ (cf. Figures 4.12, 4.13), which was observed in some cases but not in others where it would have been expected. In the main data set, there was very little crossing, which manifested in the parameter analyses as changes in other parameters that raised thresholds overall. In the second experiment, some observers’ data hinted at the reason why there might be both an effect of gain control and a lack of suprathreshold facilitation. For one of these observers (AH), significant crossing was observed, lending support to the gain control hypothesis. For the other two observers (JX and DC), less or no crossing occurred; fits of the S-F parameters to their data described this lack of crossing as a decrease with mask contrast of the amplitude parameter r. This change can be interpreted in numerous ways, but one is perhaps most pertinent: surround masking. Multiple studies in recent years have focused on the effects of a surround mask (such as an annulus surrounding a target, or multiple gratings flanking a target) on sensitivity to a target. Accumulated evidence indicates that the dominant effect is response suppression, i.e. a decrease in the amplitude of the response function (with fMRI and psychophysics: Zenger-Landolt and Heeger 2003; with psychophysics: Chen and Tyler 2008; in single neurons: Cavanaugh, Bair, and Movshon 2002). For foveated targets, surround interactions are most measurable when there is a target pedestal (Chen and Tyler 2008; Zenger-Landolt and Heeger 2003), since surround interactions impact the entire d’ function. To date, there is no widely accepted model of how r changes with surround contrast, except that it decreases.
The type of change effected by surround interactions is generally thought to be a form of response suppression, different from the ‘contrast gain control’ (Heeger 1992) commonly associated with overlay masking. At any rate, a combination of r suppression and contrast gain control (i.e. the
‘vary zr’ model of Chapter 5, or the ‘vary r’ gain control model of Chapter 6) would be sufficient to explain the bulk of the effects of 1/f noise on sensitivity. Adding to this hypothesis is the fact that stimuli used in these experiments were large, extending especially at the higher spatial frequencies to several target cycles beyond the fixation point. Given such a stimulus display, previous studies of surround masking would undoubtedly predict that surround masking should occur to some degree with the stimuli used here. In the main experiment, decreased r values were seen for some observers in some conditions, but there was no systematic effect (i.e. no overall effect, or interaction between spatial frequency and masking condition). However, mask contrasts used in the main experiment were rather low, and the only measurable effect of surround masking may have been to eliminate some suprathreshold facilitation which would have resulted from the dominant gain control mechanism. In the second experiment, however, progressively higher mask contrasts were used, and fits of the S-F parameters to observers’ data make clear that the value of r decreased with mask contrast. There are other potential explanations of this change in sensitivity, though none are as convenient as the surround masking hypothesis. It may be that adding 1/f noise to the target display makes discrimination more difficult through a higher-level mechanism than an external noise paradigm (Chapters 2 and 6 make clear that noise added to a compressive transducer will be reduced to insignificance at high input contrasts).
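As a rough illustration of how these two parameter changes differ, the sketch below uses a generic sigmoidal d’ function, d’(c) = r·c^p / (z^q + c^q). This functional form, the exponent names p and q, and all numerical values are assumptions chosen for illustration; they are not the fitted S-F parameters reported in this dissertation. Under these assumptions, raising z alone shifts the function to higher contrasts and produces dipper crossing, whereas also lowering r raises thresholds at every pedestal and can remove the crossing.

```python
def d_prime(c, r=30.0, z=0.02, p=2.4, q=2.0):
    """Hypothetical d' function: accelerating below z, compressive above it."""
    return r * c**p / (z**q + c**q)


def threshold(pedestal, r=30.0, z=0.02, p=2.4, q=2.0, criterion=1.0):
    """Contrast increment that raises d' by `criterion` above the pedestal response.

    Solved by bisection, which is valid because d_prime is monotonic for p >= q.
    """
    base = d_prime(pedestal, r, z, p, q)
    lo, hi = 0.0, 10.0  # hi is far above any plausible threshold
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if d_prime(pedestal + mid, r, z, p, q) - base < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative conditions only: unmasked, gain control (z doubled),
    # and gain control combined with r suppression.
    conditions = {
        "unmasked": dict(r=30.0, z=0.02),
        "vary z": dict(r=30.0, z=0.04),
        "vary z and r": dict(r=20.0, z=0.04),
    }
    for pedestal in (0.0, 0.01, 0.04, 0.16):
        row = ", ".join(
            f"{name}: {threshold(pedestal, **kw):.4f}"
            for name, kw in conditions.items()
        )
        print(f"pedestal {pedestal:.2f} -> {row}")
```

With these illustrative values, the ‘vary z’ thresholds are higher than the unmasked thresholds at detection but lower at a pedestal of 0.04, which is the crossing pattern; in the ‘vary z and r’ condition the threshold at that pedestal is above the unmasked one.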

Noise Masking

The analyses in Chapter 6 should make clear that noise masking is a strong theoretical candidate when it comes to mechanisms of masking by 1/f noise. The principal piece of
evidence is in the damping of the facilitatory dips in the masked TvC functions. That this damping occurs most detectably at the lower spatial frequency tested is further support for the idea that the ‘noisiness’ of the 1/f masks was itself a contributor to their masking power. This is because it is well-established that the bandwidth of contrast sensitivity increases towards lower spatial frequencies (DeValois et al 1982). The increase is relatively slight, so that it is fair to make statements such as those made in Chapter 1 of this dissertation: contrast sensitivity bandwidths are a little larger than one octave at all spatial frequencies. This is a convenient simplification, since it makes it seem that the structure of sensitivity bandwidths is directly consistent with the 1/f structure of natural scenes (Field 1987). Still, the broadening of bandwidths towards lower frequencies is likely to directly underlie the effects of masking seen here (and also in Haun and Essock 2009 and Schofield and Georgeson 2003, both of whom used 1/f noise masks with narrowband targets). The surprising thing about the results of analyses shown in Chapter 6 was that noise masking was not such a strong candidate, as a masking mechanism, at the higher frequencies. A simulated observer showed that even with relatively narrow filter bandwidths, the stimuli used in these experiments should still have had a significant effect on sensitivity even in the absence of a gain control mechanism. Reduction or exclusion of external noise effects has been described in attentional spatial vision literature (Smith and Ratcliff 2009; Lu and Dosher 2005; Dosher and Lu 2000), although the context for these hypotheses has consistently been the spatial rather than the frequency domain. 
Dosher and Lu’s studies have repeatedly shown that when attention is directed to a particular spatial location where noise is added to a target, sensitivity improves in such a way that it appears that the masking noise is somehow being removed somewhere along the visual stream. Goris et al (2009) have suggested that a neural response pooling rule which is
able to account for correlated noise across multiple detectors can effectively reduce the effects of external noise. In the case of the experiments described in this dissertation, it may be that noise exclusion is relatively more effective at higher spatial frequencies than at low – in fact, this would be consistent with Goris et al’s account, since with these stimuli higher spatial frequency stimuli contained more cycles, and would thereby stimulate more relevant neurons than lower spatial frequency stimuli. Unfortunately, there is very little perimetric data on either noise masking or the effects of attention on spatial vision abilities, so we do not know whether noise exclusion (or Goris et al’s sensitivity model) would be applicable at all spatial frequencies, or whether it would be more effective at some frequencies than at others. Resolving this question would require paradigms which are able to manipulate attention, and to specifically relate the bandwidth of sensitivity to the capacity of a noise stimulus to mask a target. This would require measurement of ‘perceptive field’ bandwidths (cf. Wilson et al 1983; Essock et al 2009), implementation of detection models using these bandwidths, and comparison of simulated results with human behavior. Such work would not necessarily be difficult, but it is well beyond the scope of the experiments included here.
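The earlier remark that noise added to a compressive transducer is reduced to insignificance at high input contrasts can be sketched numerically. The example below is only a first-order (delta method) approximation under assumed values: external noise with standard deviation sigma_ext (in contrast units) is passed through the local slope of a hypothetical transducer and combined with unit-variance internal noise. The transducer form, its parameters, and the function names are assumptions for illustration, not the simulated observer of Chapter 6.

```python
import math


def transducer_slope(c, r=30.0, z=0.02, p=2.4, q=2.0):
    """Analytic derivative of the hypothetical response r*c**p / (z**q + c**q)."""
    return r * c ** (p - 1) * (p * z**q + (p - q) * c**q) / (z**q + c**q) ** 2


def noise_elevation(pedestal, sigma_ext, **params):
    """Approximate factor by which external noise inflates the total noise SD.

    Internal noise is taken to have unit SD after the transducer; external
    noise is scaled by the local slope (valid only for small sigma_ext).
    For small increments, thresholds rise by roughly this same factor.
    """
    transmitted = transducer_slope(pedestal, **params) * sigma_ext
    return math.sqrt(1.0 + transmitted**2)


if __name__ == "__main__":
    for pedestal in (0.02, 0.04, 0.08, 0.16, 0.5):
        factor = noise_elevation(pedestal, sigma_ext=0.02)
        print(f"pedestal {pedestal:.2f}: elevation factor {factor:.2f}")
```

Under these assumptions the elevation factor is largest near the steepest part of the transducer, which is where the facilitatory dip sits, and falls toward 1 at high pedestal contrasts, which is one way to read the damping of the dip described above.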

Summary

The overarching goal of this project was to describe the effects of 1/f noise on contrast sensitivity. The dominant effect, across the frequency domain, seems to be a shift of the contrast d’ function to higher stimulus values, which is commonly described as contrast gain control. Since there is already a widely used model for describing such effects, this is convenient. Another
effect of 1/f noise seems to be to increase the slopes (low and high contrast) of the sensitivity function. The reasons for this may be attentional and described in terms of uncertainty theory, though this is a difficult interpretation to pin down. Still, this does emphasize that gain control cannot be the sole effect of broadband structure on tests of contrast sensitivity. Furthermore, the strength of the putative gain control effect seems to increase towards lower spatial frequencies. This presents another element to the problem, since this increase is in some ways consistent with the effects of external noise on contrast sensitivity. The effects of noise can be simplified, as in Chapter 6, and fit to data; doing just this adds to the likelihood that noise, rather than gain control, is responsible for raising thresholds at low frequencies. As something of a side note, interesting perimetric data were obtained, in particular the finding that the CSF can be explained as a change in the weighting of contrast before exposure to the nonlinearities of the d’ function. As a wholly new finding with implications for how contrast is processed by the visual system, this warrants further exploration.

Conclusions

Conclusions can be summarized as follows: First, differences in unmasked sensitivity across spatial frequency can, through the model of sensitivity used in detail in Chapter 5, be attributed primarily to differences in the linear weighting of contrast passed through the contrast d’ function. Second, when stimuli are detected within a field of 1/f noise, differences in sensitivity can be attributed to a combination of the first factor and differences in the strength of a contrast gain control coefficient. Third, on the basis of analyses of data collected, it is hypothesized that when many stimulus conditions are included for a single observer, there are attentional effects which can be described through implementation of uncertainty theory. Fourth, at lower spatial frequencies, a noise masking model is at least as good a fit to 1/f masked contrast discrimination data as a gain control model, indicating that further study is warranted in this area.

References:

Albrecht DG and Geisler WS (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience 7, 531-546.
Albrecht DG and Hamilton DB (1982). Striate cortex of monkey and cat – contrast response function. Journal of Neurophysiology 48, 217-237.
Appelle S (1972). Perception and discrimination as a function of stimulus orientation: The oblique effect in man and animals. Psychological Bulletin 78, 266-278.
Barlow HB and Levick WR (1969). Three factors limiting the reliable detection of light by retinal ganglion cells of the cat. Journal of Physiology 200, 1-24.
Barlow HB, Levick WR, and Yoon M (1971). Responses to single quanta of light in retinal ganglion cells of the cat. Vision Research Supplement No. 3, 87-101.
Bevington PR and Robinson DK (2002). Data reduction and error analysis for the physical sciences. Columbus, McGraw-Hill.
Bex PJ, Mareschal I, and Dakin SC (2007). Contrast gain control in natural scenes. Journal of Vision 7, 1-12.
Bird CM, Henning GB, and Wichmann FA (2002). Contrast discrimination with sinusoidal gratings of different spatial frequency. Journal of the Optical Society of America-A 19, 1267-1273.
Blake R and Holopigian K (1985). Orientation selectivity in cats and humans assessed by masking. Vision Research 25, 1459-1468.
Blakemore C and Campbell FW (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology 203, 237-260.
Boynton GM, Demb JB, Glover GH, and Heeger DJ (1999). Neuronal basis of contrast discrimination. Vision Research 39, 257-269.
Bradley A and Ohzawa I (1986). A comparison of contrast detection and discrimination. Vision Research 26, 991-997.
Brady N and Field DJ (1995). What’s constant in contrast constancy? The effects of scaling on the perceived contrast of band-pass patterns. Vision Research 35, 739-756.
Brainard DH (1997). The psychophysics toolbox. Spatial Vision 10, 433-436.
Burton GJ and Moorhead IR (1987). Color and spatial structure in natural scenes. Applied Optics 26, 157-170.
Buzsaki G (2006). Rhythms of the brain. Oxford: Oxford University Press.
Campbell FW and Green DG (1965). Optical and retinal factors affecting visual resolution. Journal of Physiology 181, 576-593.
Campbell FW and Kulikowski JJ (1967). Orientational selectivity of the human visual system. Journal of Physiology 187, 437-445.
Campbell FW and Kulikowski JJ (1972). The visual evoked potential as a function of contrast of a grating pattern. Journal of Physiology 222, 345-356.
Campbell FW, Kulikowski JJ, and Levinson JZ (1967). The effect of orientation on the visual resolution of gratings. Journal of Physiology 187, 427-436.
Cannon MW and Fullenkamp SC (1991). A transducer model for contrast perception. Vision Research 31, 983-998.
Carandini M and Sengpiel F (2005). Contrast invariance of functional maps in cat primary visual cortex. Journal of Vision 4, 130-143.
Carandini M, Heeger DJ, and Movshon JA (1997). Linearity and normalization in simple cells of the macaque primary visual cortex. Journal of Neuroscience 17, 8621-8644.
Carlson CR (1978). Thresholds for perceived image sharpness. Photographic Science and Engineering 22, 69-71.
Carrasco M, Penpeci-Talgar C, and Eckstein M (2000). Spatial covert attention increases contrast sensitivity across the CSF: support for signal enhancement. Vision Research 40, 1203-1215.
Cavanaugh JR, Bair W, and Movshon JA (2002). Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. Journal of Neurophysiology 88, 2530-2546.
Chen CC and Tyler CW (2008). Excitatory and inhibitory interaction fields of flankers revealed by contrast-masking functions. Journal of Vision 8.
Chen CC and Tyler CW (2001). Lateral sensitivity modulation explains the flanker effect in contrast discrimination. Proceedings of the Royal Society of London: Biological Sciences 268, 509-516.
Chirimuuta M and Tolhurst DJ (2005). Does a Bayesian model of V1 contrast coding offer a neurophysiological account of human contrast discrimination? Vision Research 45, 2943-2959.
Clarke ADF, Green PR, Chantler MJ, and Emrith K (2008). Visual search for a target against a 1/f (beta) continuous textured background. Vision Research 48, 2193-2203.
Daugman JG (1984). Spatial visual channels in the Fourier plane. Vision Research 24, 891-910.
Daugman JG (1989). Entropy reduction and decorrelation in visual coding by oriented neural receptive-fields. IEEE Transactions on Biomedical Engineering 36, 107-114.
DeValois RL, Albrecht DG, and Thorell LG (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research 22, 545-559.
Derrington AM and Lennie P (1984). Spatial and temporal contrast sensitivities of neurones in lateral geniculate nucleus of macaque. Journal of Physiology 357, 219-240.
DeValois RL and DeValois K (1990). Spatial vision. New York: Oxford University Press.
DeValois RL, Morgan H, and Snodderly DM (1974). Psychophysical studies of monkey vision: 3. Spatial luminance contrast sensitivity tests of macaque and human observers. Vision Research 14, 75-81.
Dosher BA and Lu ZL (1999). Characterizing human perceptual inefficiencies with equivalent internal noise. Journal of the Optical Society of America A – Optics, Image Science, and Vision 16, 764-778.
Dosher BA and Lu ZL (2000). Noise exclusion in spatial attention. Psychological Science 11, 139-146.
Enroth-Cugell C and Robson JG (1966). The contrast sensitivity of retinal ganglion cells of the cat. Journal of Physiology 187, 517-552.
Essock EA, DeFord JK, Hansen BC, and Sinai MJ (2003). Oblique stimuli are seen best (not worst!) in naturalistic broad-band stimuli: A horizontal effect. Vision Research 43, 1329-1335.
Essock EA, Hansen BC, and Haun AM (2007). Illusory bands in orientation and spatial frequency: a cortical analog to Mach bands. Perception 36, 639-649.
Essock EA, Haun AM, and Kim YJ (2009). An anisotropy of orientation-tuned suppression that matches the anisotropy of typical natural scenes. Journal of Vision 9(1):35, 1-15.
Field DJ (1994). What is the goal of sensory coding? Neural Computation 6, 559-601.
Field DJ (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America-A 4, 2379-2394.
Field DJ and Brady N (1997). Visual sensitivity, blur, and the sources of variability in the amplitude spectra of natural scenes. Vision Research 37, 3367-3383.
Foley JM (1994). Human luminance pattern-vision mechanisms – masking experiments require a new model. Journal of the Optical Society of America-A 11, 1710-1719.
Foley JM and Legge GE (1981). Contrast detection and near-threshold discrimination in human vision. Vision Research 21, 1041-1053.
Foley JM and Schwarz W (1998). Spatial attention: effect of position uncertainty and number of distractor patterns on the threshold-versus-contrast function for contrast discrimination. Journal of the Optical Society of America-A 15, 1036-1047.
Foley JM, Varadharajan S, Koh CC, and Farias MCQ (2007). Detection of Gabor patterns of different sizes, shapes, phases, and eccentricities. Vision Research 47, 85-107.
Geisler WS and Albrecht DG (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience 14, 897-919.
Geisler WS, Perry JS, and Najemnik J (2006). Visual search: the role of peripheral information measured using gaze-contingent displays. Journal of Vision 6, 858-873.
Georgeson MA and Sullivan GD (1975). Contrast constancy – deblurring in human vision by spatial frequency channels. Journal of Physiology 252, 627-656.
Gold JM, Sekuler AB, and Bennett PJ (2004). Characterizing perceptual learning with external noise. Cognitive Science 28, 167-207.
Gorea A and Sagi D (2001). Disentangling signal from noise in visual contrast discrimination. Nature Neuroscience 4, 1146-1150.
Goris, Wichmann, and Henning (2009). A neurophysiologically plausible population-code model for human contrast discrimination. (In Preparation).
Gottschalk A (2002). Derivation of the visual contrast response function by maximizing information rate. Neural Computation 14, 527-542.
Graham N (1989). Visual Pattern Analyzers. New York: Oxford University Press.
Graham N and Sutter A (1998). Spatial summation in simple (Fourier) and complex (non-Fourier) texture channels. Vision Research 38, 231-257.
Green DM and Swets JA (1966). Signal Detection Theory and Psychophysics. Huntington NY, Krieger.
Greenlee MW and Heitger F (1988). The functional role of contrast adaptation. Vision Research 28, 791-797.
Grossberg S and Mingolla E (1985). Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading. Psychological Review 92, 173-211.
Guenther E and Zrenner E (1993). The spectral sensitivity of dark- and light-adapted cat retinal ganglion cells. Journal of Neuroscience 13, 1543-1550.
Hansen BC and Essock EA (2005). Influence of scale and orientation on the visual perception of natural scenes 12, 1199-1234.
Hansen BC and Essock EA (2006). Anisotropic local contrast normalization: The role of stimulus orientation and spatial frequency bandwidths in the oblique and horizontal effect perceptual anisotropies. Vision Research 46, 4398-4415.
Hansen BC and Hess RF (2006). The role of spatial phase in texture segmentation and contour integration. Journal of Vision 6, 594-615.
Hansen BC, Essock EA, Zheng Y, and DeFord JK (2003). Perceptual anisotropies in visual processing and their relation to natural image statistics. Network: Computation in Neural Systems 14, 501-526.
Haun AM and Essock EA (2009). Contrast sensitivity for oriented patterns in 1/f noise. Journal of Vision, submitted.
Heeger DJ (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience 9, 181-197.
Henning GB and Wichmann FA (2007). Some observations on the pedestal effect. Journal of Vision 7, 1-15.
Henning GB, Bird CM, and Wichmann FA (2002). Contrast discrimination with pulse trains in pink noise. Journal of the Optical Society of America-A 19, 1259-1266.
Holmes DJ and Meese TS (2004). Grating and plaid masks indicate linear summation in a contrast gain pool. Journal of Vision 4, 1080-1089.
Huang LQ and Dobkins KR (2005). Attentional effects on contrast discrimination in humans: evidence for both contrast gain and response gain. Vision Research 45, 1201-1212.
Hubel DH and Wiesel TN (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. Journal of Physiology 160, 106-154.
Hubel DH and Wiesel TN (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195, 215-243.
Kass RE and Raftery AE (1995). Bayes factors. Journal of the American Statistical Association 90, 773-795.
Katkov M, Tsodyks M, and Sagi D (2007). Inverse modeling of human contrast response. Vision Research 47, 2855-2867.
Kim Y-J, Haun AM, and Essock EA (2009). Suppression is Local in the Spatial Fourier Plane and Anisotropic: “Sustained” and “Transient” Spatiotemporal Characteristics of Masking. Vision Research, submitted.
Klein SA (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, and Computers 24, 90-96.
Kontsevich LL, Chen CC, and Tyler CW (2002). Separating the effects of response nonlinearity and internal noise psychophysically. Vision Research 42, 1771-1784.
Kulikowski JJ and King-Smith PE (1973). Spatial arrangement of line, edge, and grating detectors revealed by subthreshold summation. Vision Research 13, 1455-1478.
Kulikowski JJ, Abadi R, and King-Smith PE (1973). Orientational selectivity of grating and line detectors in human vision. Vision Research 13, 1479-1486.
Kwon MY, Legge GE, Fang F, Cheong AMY, and He S (2009). Adaptive changes in visual cortex following prolonged contrast reduction. Journal of Vision 9, 1-16.
Legge GE and Foley JM (1980). Contrast masking in human vision. Journal of the Optical Society of America 70, 1458-1471.
Legge GE, Kersten D, and Burgess AE (1987). Contrast discrimination in noise. Journal of the Optical Society of America-A 4, 391-404.
Levi DM, Klein SA, and Chen I (2005). What is the signal in noise? Vision Research 45, 1835-1846.
Li BW, Peterson MR, and Freeman RD (2003). Oblique effect: A neural basis in the visual cortex. Journal of Neurophysiology 90, 204-217.
Ling S and Carrasco M (2006). Sustained and transient covert attention enhance the signal via different contrast response functions. Vision Research 46, 1210-1220.
Lu ZL and Dosher BA (1999). Characterizing human perceptual inefficiencies with equivalent internal noise. Journal of the Optical Society of America-A 16, 764-778.
Lu ZL and Dosher BA (2004). Spatial attention excludes external noise without changing the spatial frequency tuning of the perceptual template. Journal of Vision 4, 955-966.
MacMillan NA and Creelman CD (1991). Detection theory: A user’s guide. Cambridge: Cambridge University Press.
Magnussen S and Greenlee MW (1999). The psychophysics of perceptual memory. Psychologische Forschung 62, 81-92.
Mante V, Frazor RA, Bonin V, Geisler WS, and Carandini M (2005). Independence of luminance and contrast in natural scenes and in the early visual system. Nature Neuroscience 8, 1690-1697.
Meese TS and Georgeson MA (2005). Carving up the patchwise transform: towards a filter combination model for spatial vision. In Advances in Psychology Research, vol 34, pp 51-88. New York: Nova Science Publishers.
Meese TS and Hess RF (2004). Low spatial frequencies are suppressively masked across spatial scale, orientation, field position, and eye of origin. Journal of Vision 4, 843-859.
Meese TS and Holmes DJ (2002). Adaptation and gain pool summation: alternative models and masking data. Vision Research 42, 1113-1125.
Meese TS and Holmes DJ (2007). Spatial and temporal dependencies of cross-orientation suppression in human vision. Proceedings of the Royal Society B: Biological Sciences 274, 127-136.
Meese TS and Summers RJ (2007). Area summation in human vision at and above detection threshold. Proceedings of the Royal Society B: Biological Sciences 274, 2891-2900.
Morrone MC and Burr DC (1988). Feature detection in human vision: a phase-dependent energy model. Proceedings of the Royal Society of London B 235, 221-245.
Movshon JA and Blakemore C (1973). Orientation specificity and spatial selectivity in human vision. Perception 2, 53-60.
Movshon JA, Thompson ID, and Tolhurst DJ (1978). Spatial and temporal contrast sensitivity of neurones in areas 17 and 18 of the cat’s visual cortex. 283, 101-120.
Nachmias J and Sansbury (1974). Grating contrast – discrimination may be better than detection. Vision Research 14, 1039-1042.
Naka KI and Rushton WAH (1966). S-potentials from colour units in the retina of fish (cyprinidae). Journal of Physiology 185, 536-555.
Nandy AS and Tjan BS (2007). The nature of letter crowding as revealed by first- and second-order classification images. Journal of Vision 7, 1-26.
Ohzawa I, Sclar G, and Freeman RD (1985). Contrast gain control in the cat’s visual system. Journal of Neurophysiology 54, 651-667.
Olzak LA and Thomas JP (2003). Dual nonlinearities regulate contrast sensitivity in pattern discrimination tasks. Vision Research 43, 1433-1442.
Padilla S, Drbohlav O, Green PR, Spence A, and Chantler MJ (2008). Perceived roughness of 1/f(beta) noise surfaces. Vision Research 48, 1791-1797.
Pantle A and Sekuler R (1968). Size-detecting mechanisms in human vision. Science 162, 1146-1148.
Parker AJ and Hawken MJ (1988). Two-dimensional spatial structure of receptive fields in monkey striate cortex. Journal of the Optical Society of America-A, 598-605.
Parraga CA, Troscianko CA, and Tolhurst DJ (2000). The human visual system is optimised for processing the spatial information in natural visual images. Current Biology 10, 35-38.
Peli E (1990). Contrast in complex images. Journal of the Optical Society of America-A 7, 2032-2040.
Pelli DG (1981). The effects of visual noise. Ph.D dissertation (Department of Physiology, Cambridge University, Cambridge UK, 1981).
Pelli DG (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America-A 2, 1508-1532.
Pelli DG (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision 10, 437-442.
Pelli DG and Farell B (1999). Why use noise? Journal of the Optical Society of America-A 16, 647-653.
Pelli DG and Zhang L (1991). Accurate control of contrast on microcomputer displays. Vision Research 31, 1337-1350.
Petrov Y, Verghese P, and McKee SP (2006). Collinear facilitation is largely uncertainty reduction. Journal of Vision 6, 180-178.
Phillips GC and Wilson HR (1984). Orientation bandwidths of spatial mechanisms measured by masking. Journal of the Optical Society of America-A 1, 226-232.
Pitt MA and Myung J (2002). When a good fit can be bad. Trends in Cognitive Sciences 6, 421-425.
Pointer JS and Hess RF (1989). The contrast sensitivity gradient across the human visual-field with emphasis on the low spatial-frequency range. Vision Research 29, 1133-1151.
Press WH, Teukolsky SA, Vetterling WT, and Flannery BP (1992). Numerical Recipes in C, 2nd ed. Cambridge: University Press.
Quick RF (1974). A vector-magnitude model of contrast detection. Kybernetik 16, 65-67.
Quick RF and Reichert TA (1975). Spatial-frequency selectivity in contrast detection. Vision Research 15, 637-643.
Rajashekar U, Bovik AC, and Cormack LK (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision 6, 379-386.
Ross J and Speed HD (1991). Contrast adaptation and contrast masking in human vision. Proceedings of the Royal Society B 246, 61-69.
Ross J, Speed HD, and Morgan MJ (1993). The effects of adaptation and masking on incremental thresholds for contrast. Vision Research 33, 2051-2056.
Sachs MB, Nachmias J, and Robson JG (1971). Spatial-frequency channels in human vision. Journal of the Optical Society of America 61, 1176-1186.
Schade OH (1956). Optical and photoelectric analog of the eye. Journal of the Optical Society of America 46, 721-739.
Schofield A and Georgeson MA (2003). Sensitivity to contrast modulation: The spatial frequency dependence of second-order vision. Vision Research 43, 243-259.
Schwartz O and Simoncelli EP (2001). Natural signal statistics and sensory gain control. Nature Neuroscience 4, 819-825.
Sclar G, Maunsell JHR, and Lennie P (1990). Coding of image contrast in central visual pathways of the macaque monkey. Vision Research 30, 1-10.
Shapley RM and Enroth-Cugell C (1984). Visual adaptation and retinal gain controls. Progress in Retinal Research 3, 263-346.
Sherman SM and Guillery RW (2003). The role of the thalamus in the flow of information to the cortex. Philosophical Transactions of the Royal Society of London B 357, 1695-1708.
Sit YF, Chen Y, Geisler WS, Seidemann E, and Miikkulainen R (2008). A population gain control model for responses in the primary visual cortex. In Society for Neuroscience Abstracts. Washington DC, Society for Neuroscience.

Smith PL and Ratcliff R (2009). An integrated theory of attention and decision making in visual signal detection. Psychological Review 116, 283-317.

Snowden RJ (1991). Measurement of visual channels by contrast adaptation. Proc. R. Soc. Lond. B 246, 53-59.

Snowden RJ (1992). Orientation bandwidth: the effect of spatial and temporal frequency. Vision Research 32, 1965-1974.

Solomon JA (2009). The history of dipper functions. Attention, Perception, and Psychophysics 71, 435-443.

Stevens SS (1957). On the psychophysical law. Psychological Review 64, 153-181.

Stromeyer CF and Julesz B (1972). Spatial-frequency masking in vision: critical bands and spread of masking. Journal of the Optical Society of America 62, 1221-1232.

Stromeyer CF and Klein SA (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research 14, 1409-1420.

Tavassoli A, van der Linde I, Bovik AC, and Cormack LK (2009). Eye movements selective for spatial frequency and orientation during active visual search. Vision Research 49, 173-181.

Tavassoli A, van der Linde I, Bovik AC, and Cormack LK (2007). Orientation anisotropies in visual search revealed by noise. Journal of Vision 7, 1-8.

Taylor CP, Bennett PJ, and Sekuler AB (2003). Noise detection: Bandwidth uncertainty and adjustable channels. VSS abstract.

Tolhurst DJ and Tadmor Y (1997). Band-limited contrast in natural images explains the detectability of changes in the amplitude spectra. Vision Research 37, 3203-3215.

Tootle JS and Berkley MA (1983). Contrast sensitivity for vertically and obliquely oriented gratings as a function of grating area. Vision Research 23, 907-910.

Tyler CW (1997). Why we need to pay attention to psychometric function slopes. Visual Science and Its Applications: Optical Society of America Topical Meeting 1997, 240-244.

Tyler CW and Chen CC (2000). Signal detection theory in the 2AFC paradigm: attention, channel uncertainty and probability summation. Vision Research 40, 3121-3144.

Voss RF and Clarke J (1975). 1/f noise in music and speech. Nature 258, 317-318.

Wainwright MJ, Schwartz O, and Simoncelli EP (2001). Natural image statistics and divisive normalization: Modeling nonlinearities and adaptation in cortical neurons. In R Rao, B Olshausen and M Lewicki (Eds.), Probabilistic models of the brain: Perception and neural function. Cambridge, MA: MIT Press.

Wasserman L (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology 44, 92-107.

Watson AB (1990). Gain, noise, and contrast sensitivity of linear visual neurons. Visual Neuroscience 4, 147-157.

Watson AB and Ahumada AJ (2005). A standard model for foveal detection of spatial contrast. Journal of Vision 5, 717-740.

Watson AB and Pelli DG (1983). QUEST: A Bayesian adaptive psychometric method. Perception and Psychophysics 33, 113-120.

Watson AB and Solomon JA (1997). A model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A 14, 2379-2391.

Watt RJ and Morgan MJ (1985). A theory of the primitive spatial code in human vision. Vision Research 25, 1661-1674.

Wichmann FA and Hill NJ (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception and Psychophysics 63, 1293-1313.

Wilson HR and Humanski R (1993). Spatial-frequency adaptation and contrast gain control. Vision Research 33, 1133-1149.

Wilson JR and Sherman SM (1976). Receptive-field characteristics of neurons in cat striate cortex: changes with visual field eccentricity. Journal of Neurophysiology 39, 512-533.

Wilson HR, McFarlane DK, and Phillips GC (1983). Spatial-frequency tuning of orientation selective units estimated by oblique masking. Vision Research 23, 873-882.

Yu C, Klein SA, and Levi DM (2004). Perceptual learning in contrast discrimination and the (minimal) role of context. Journal of Vision 4, 169-182.

Yu C, Klein SA, and Levi DM (2003). Cross- and iso-oriented surrounds modulate the contrast response function: The effect of surround contrast. Journal of Vision 3, 527-540.

Zenger-Landolt B and Heeger DJ (2003). Response suppression in V1 agrees with psychophysics of surround masking. Journal of Neuroscience 23, 6884-6893.

Zhan C, Ledgeway T, and Baker CL (2005). Contrast response in visual cortex: quantitative assessment with intrinsic optical signal imaging and neural firing. NeuroImage 26, 330-346.

Zucchini W (2000). An introduction to model selection. Journal of Mathematical Psychology 44, 41-61.
