Human Pattern Recognition

Perception, 1994, volume 23, pages 411-427

Human pattern recognition: parallel processing and perceptual learning

Manfred Fahle
Section for Visual Science, Department of Neurophthalmology, University Eye Clinic, Röntgenweg 11, D-72076 Tübingen, Germany
Received 12 July 1993, in revised form 23 December 1993

Abstract. A new theory of visual object recognition by Poggio et al that is based on multidimensional interpolation between stored templates requires fast, stimulus-specific learning in the visual cortex. Indeed, performance in a number of perceptual tasks improves as a result of practice. We distinguish between two phases of learning a vernier-acuity task, a fast one that takes place within less than 20 min and a slow phase that continues over 10 h of training and probably beyond. The improvement is specific for relatively 'simple' features, such as the orientation of the stimulus presented during training, for the position in the visual field, and for the eye through which learning occurred. Some of these results are simulated by means of a computer model that relies on object recognition by multidimensional interpolation between stored templates. Orientation specificity of learning is also found in a jump-displacement task. In a manner parallel to the improvement in performance, cortical potentials evoked by the jump displacement tend to decrease in latency and to increase in amplitude as a result of training. The distribution of potentials over the brain changes significantly as a result of repeated exposure to the same stimulus. The results both of psychophysical and of electrophysiological experiments indicate that some form of perceptual learning might occur very early during cortical information processing. The hypothesis that vernier breaks are detected 'early' during pattern recognition is supported by the fact that reaction times for the detection of verniers depend hardly at all on the number of stimuli presented simultaneously. Hence, vernier breaks can be detected in parallel at different locations in the visual field, indicating that deviation from straightness is an elementary feature for visual pattern recognition in humans that is detected at an early stage of pattern recognition.
Several results obtained during the last few years are reviewed, some new results are presented, and all these results are discussed with regard to their implications for models of pattern recognition.

1 Introduction

Visual object recognition relies on the comparison between an actual retinal image of an object and stored examples of previously experienced (or described) visual objects. Neuronal mechanisms in the brain then try to identify the actually presented object by matching it with stored object descriptions (cf eg Boucart et al 1994). The description most similar to the actually presented object will be regarded as the object (eg a zebra) corresponding to the actual patch of black and white stripes on the retina. The task of correlating the actual with the stored images is far from trivial, as was experienced by researchers in artificial intelligence trying to teach computers how to see. Objects will rarely reappear at exactly the same distance, hence the size of their image varies. They might be at a different visual field position, though foveation, ie looking towards the object, will often compensate for this problem. Objects might appear under very varying illumination, as regards both luminance and spectral composition, and, most important of all, they might be rotated relative to the previous appearance, or might have changed their form, as in the case of a running versus a lying zebra. Somehow, our brain is able to cope with these problems much better than computer programs are up to now, and to extract invariances from the visual image (cf Van Gool et al 1994).

One possible explanation is that our brain makes use of a powerful strategy to identify objects even from novel views: multidimensional interpolation between stored examples (Poggio 1990). This hypothesis assumes that templates or views of an object are stored on first appearance, and that the brain is able to interpolate between these templates in order to recognise the object even after rotation, translation, or slight changes of shape or form. Pattern recognition would then rely not on the formation of a complex three-dimensional model of the object, requiring quite high-level computational processes, but on a heavily memory-based strategy with far less computation required. Learning of relatively simple feature combinations plays a key role here, and the model requires that fast and specific perceptual learning occurs even in adults.

It has been known for some time that recognition of visually presented objects indeed improves through practice. For instance, to all students of histology most sections look quite similar, be they taken from liver, lung, or kidney. But after some time, shorter for some than for others, the more advanced observer wonders why he or she was ever able to miss the difference. It has also been known for a couple of decades that performance improves as a function of training in much less complex tasks, such as vernier discrimination, line-orientation discrimination, or stereoscopic depth perception (Bennett and Westheimer 1991; Fendick and Westheimer 1983; McKee and Westheimer 1978; Shiu and Pashler 1991; Vogels and Orban 1985). Most researchers even take precautions against learning effects contaminating their results, usually by starting experiments with a lengthy training phase for observers to arrive at their 'baseline' performance, and/or by counterbalancing the order of measurements between observers. Discrimination of spatial phase in complex luminance-modulated gratings improves relatively fast but 'returns' to baseline after rotation of the stimulus by 90° (Fiorentini and Berardi 1980, 1981). This was a first indication that learning in visual perception might be quite specific for the exact features of the stimulus presented. Learning does transfer from one eye to the partner eye in monocular learning experiments but is specific for visual field position (Fiorentini and Berardi 1981; Ramachandran and Braddick 1973), as well as for orientation in a motion detection task (Ball and Sekuler 1987). Even in flies, learning of visual patterns is specific for visual field position (Dill et al 1993). Learning also plays an important role in the detection and discrimination of textures. Discrimination between figure and ground is specific for the eye and the orientation of the stimulus elements, but not of the elements of the surround (Karni and Sagi 1991; cf also Sagi and Polat 1992), and learning depends on which feature of the stimulus is attended to (Ahissar and Hochstein 1993).

In the following, I will review contributions to the investigation of 'early' perceptual learning. Learning in this context is defined as an observable modification of behaviour, namely improvement in perceptual performance, as a result of training. The first part of the paper (section 2) will deal with computational considerations and models for fast perceptual learning. Section 3 will be devoted to the fast and the slow phase of perceptual learning in vernier and stereoscopic depth discrimination and a specific computer simulation; in sections 4 and 5 I will present further evidence for the specificity of the learning, and then present correlations between psychophysical and electrophysiological results in a jump-displacement task in humans. In section 6 I will review the finding that vernier breaks are detected in parallel over the visual field, indicating a relatively early stage of pattern recognition and of perceptual learning.

2 A model of perceptual learning

Visual object recognition can be considered as establishing more or less unique relations between retinal images and cerebral representations or concepts of objects. One might speculate that a number of views of an object are stored in visual memory for all objects that can be recognised, and at the time of presentation of a visual object the memory bank is searched through for the most similar view stored. If the view, or a very similar one, has been stored, the corresponding 'basis function' will provide a large output. If no sufficiently similar view is found, no 'basis function' or 'receptive field' will provide a significant output, and the new view would be added to the views already stored. Such a procedure requires relatively little computational power, but a large amount of (visual) storage capacity. Therefore, the new model of object recognition requires relatively fast perceptual learning to store new views upon request in the part of the brain that deals with visual perception.

It is important to note that the model does not require the actual view of an object to be identical with a stored view, in the way a look-up table would. On the contrary, the model assumes hyperradial basis functions (HBF) to be spread out through an n-dimensional vector space. The basis functions serve as a kind of 'fuzzy' templates that identify classes of features, rather than individual features. In this sense, the HBF model postulates a form of a blurred look-up table as the basis for visual pattern recognition. In our simplistic model, we feed the input of idealised photoreceptors into an HBF network. When the orientation of the stimulus changes by 90°, a different subset of receptors is excited and the input to the HBF network changes. Hence, the model would predict that learning does not generalise across an orientation change of 90°. Details of the model can be found in Poggio (1990), Poggio and Girosi (1990), Poggio et al (1992a), and Weiss et al (1993).
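The 'blurred look-up table' idea can be illustrated with a toy example. The following is a minimal sketch and not the network used by Poggio et al: the 9 x 9 receptor grid, the Gaussian width `sigma`, the novelty threshold, and all function and class names are assumptions chosen for illustration. Stored views act as centres of Gaussian basis functions; a new view is stored only when no existing unit responds strongly, and a rotated stimulus excites a different set of receptors and therefore barely activates the stored units.

```python
import math

GRID = 9  # hypothetical 9x9 array of idealised photoreceptors


def vernier(offset, vertical=True):
    """Receptor activations for a two-segment line whose second half is
    displaced by `offset` receptors (the sign gives the direction)."""
    mid = GRID // 2
    image = [[0.0] * GRID for _ in range(GRID)]
    for i in range(GRID):
        shift = offset if i >= mid else 0
        row, col = (i, mid + shift) if vertical else (mid + shift, i)
        image[row][col] = 1.0
    return [v for line in image for v in line]


def activation(x, centre, sigma=1.5):
    """Gaussian basis function: 1.0 for an exact match, falling off with
    squared distance in receptor space (sigma is an assumed width)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-d2 / (2.0 * sigma ** 2))


class TemplateNet:
    def __init__(self, novelty_threshold=0.5):
        self.centres, self.labels = [], []
        self.novelty_threshold = novelty_threshold

    def best_match(self, x):
        return max((activation(x, c) for c in self.centres), default=0.0)

    def learn(self, x, label):
        # Store a new view only if no existing unit responds strongly.
        if self.best_match(x) < self.novelty_threshold:
            self.centres.append(x)
            self.labels.append(label)

    def decide(self, x):
        # Activation-weighted vote of stored labels (+1 / -1 = offset direction).
        return sum(l * activation(x, c)
                   for c, l in zip(self.centres, self.labels))


net = TemplateNet()
for off in (-1, +1):
    net.learn(vernier(off, vertical=True), label=off)

trained = vernier(+1, vertical=True)    # orientation seen during training
rotated = vernier(+1, vertical=False)   # same offset, rotated by 90 degrees
print(net.best_match(trained), net.decide(trained))
print(net.best_match(rotated), net.decide(rotated))
```

In this toy setting the trained orientation is matched exactly, whereas the rotated stimulus overlaps each stored view in only one receptor, so no unit responds strongly and the vote carries no information about the offset direction.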

3 Orientation specificity of the fast and slow phase of learning

We were especially interested in the time course of perceptual learning, and in the question of whether a fast phase of perceptual learning for vernier discrimination exists in addition to the slow phase described in the literature (McKee and Westheimer 1978). We indeed found such a fast phase of vernier learning that takes place within some tens of minutes whereas a second, slow phase requires hours to weeks. Both phases are quite specific for the stimulus used for learning: there is hardly any transfer of learning when the stimulus is rotated by 90°. This orientation specificity indicates that improvement is not mainly caused by the fact that observers concentrate better on the task or develop a better general strategy to solve hyperacuity experiments in general.

3.1 Methods

Stimuli were presented on a CRT screen (Tektronix 608 or HP 1336) under computer control. A vernier target was presented for 100 ms. It was 10 min arc long and 2 min arc wide, with a luminance around 400 cd m⁻² on a surround of 25 cd m⁻². Viewing distance was 2.0 m. Observers were paid students of Tübingen University. They had normal or corrected-to-normal visual acuity, were naive as to the aim of the study, and had not previously participated in psychophysical experiments. Stimuli were oriented either horizontally or vertically. Observers had to decide, in a modified two-alternative forced-choice task, whether the lower segment of the stimulus (or the right-hand one) was offset to the left or to the right (or up or down) relative to its partner segment and to indicate their decision by pressing the appropriate one of two push buttons. The computer provided auditory feedback on the correctness of their response.
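To give a feel for the stimulus dimensions, the sketch below converts visual angles into distances on the screen at the stated viewing distance of 2.0 m. The function name is an assumption, and the 15 s arc value is the fixed offset used for the fast-phase measurements described in the next paragraph.

```python
import math


def size_on_screen_mm(angle_arcsec, distance_m=2.0):
    """Extent on the screen (mm) that subtends `angle_arcsec` seconds of arc
    at the given viewing distance."""
    angle_rad = math.radians(angle_arcsec / 3600.0)
    return 2.0 * distance_m * 1000.0 * math.tan(angle_rad / 2.0)


length_mm = size_on_screen_mm(10 * 60)  # 10 min arc long
width_mm = size_on_screen_mm(2 * 60)    # 2 min arc wide
offset_mm = size_on_screen_mm(15)       # 15 s arc vernier offset
print(round(length_mm, 2), round(width_mm, 2), round(offset_mm, 3))
```

At 2.0 m the target is thus roughly 5.8 mm long and 1.2 mm wide, and a 15 s arc offset corresponds to about 0.15 mm on the screen.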

In the experiments on the slow phase of learning, an adaptive-staircase procedure was used to measure thresholds (PEST; Taylor and Creelman 1967), but the higher temporal resolution required to measure the fast phase did not allow the reliable calculation of thresholds. Here, percentages of incorrect responses were measured for a fixed vernier offset, usually of 15 s arc.
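The general logic of an adaptive staircase can be sketched as follows. This is not PEST itself, which decides when to change the stimulus level by means of a sequential statistical test on the responses; the sketch only illustrates the principle of moving towards threshold and halving the step size at each reversal. The noise-free observer, the starting level, the step sizes, and all names are assumptions.

```python
def run_staircase(observer, start=60.0, step=16.0, min_step=1.0, n_trials=30):
    """Track a threshold: reduce the offset after a correct response,
    enlarge it after an error, and halve the step at every reversal."""
    level, last_direction, history = start, 0, []
    for _ in range(n_trials):
        correct = observer(level)
        direction = -1 if correct else +1
        if last_direction and direction != last_direction:
            step = max(step / 2.0, min_step)   # reversal: refine the step
        level = max(level + direction * step, 0.0)
        last_direction = direction
        history.append(level)
    return history


def ideal_observer(offset_arcsec, threshold_arcsec=15.0):
    # Hypothetical noise-free observer: correct at or above its threshold.
    return offset_arcsec >= threshold_arcsec


levels = run_staircase(ideal_observer)
print(levels[:6], levels[-1])
```

With a real, noisy observer in a two-alternative task, a rule that steps down after every single correct response would converge towards chance performance, so practical procedures require several correct responses (or a statistical criterion, as in PEST) before the offset is reduced.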

3.2 Results

Observers quickly improved performance as a result of practice. Mean performance of twelve inexperienced observers improved from 26% to around 15% incorrect responses within less than 30 min (figure 1). Half of the observers were trained with vertical stimulus orientation, the other half with horizontally oriented verniers. After 1 h, stimulus orientation was rotated by 90°. Performance deteriorated dramatically as a result of stimulus rotation, to pretraining levels or below.

[Figure 1: plot not reproduced; x axis: Time/h]

Figure 1. Fast perceptual learning of vernier acuity. Performance [means and standard errors (vertical bars) of percentage of incorrect responses] of twelve observers. Six observers started with vertical orientation of the verniers, six started with horizontal orientation. After 1 h of training, stimulus orientation was changed by 90° (broken vertical line), and the experiment continued for another hour. In spite of large interindividual variation, there was a highly significant improvement of performance as a result of training that was highly specific for stimulus orientation, ie that did not transfer between orientations (after Poggio et al 1992b).

Similar results were obtained with another twelve inexperienced observers in a long-term learning experiment. Here, thresholds rather than percentages of correct responses were measured, and the total training time per observer was 10 h, rather than 2 h as in the previous experiment. All average thresholds were clearly below 30 s arc, ie in the hyperacuity range, below the diameter of foveal photoreceptors (figure 2).

The fast phase of learning is partly masked in this latter experiment; since each data point represents 240 presentations, rather than 60 presentations as in the previous experiment, temporal resolution is much poorer. As is evident from figure 2, learning of the vernier-discrimination task continues over at least 5 h, and if stimulus orientation changes from vertical to horizontal or vice versa after 5 h thresholds increase to and beyond pretraining levels. The improvement is significant both for the first and for the second part of the curve (p < 0.01). In additional experiments with constant stimulus orientation, improvement of thresholds continued throughout the complete experiment, which lasted for 10 h, split into 10 daily 1 h sessions (Fahle and Edelman 1993).

3.3 Discussion

The results on orientation specificity of vernier learning show that there are two distinct phases of perceptual learning, a fast one that improves performance within a few tens of minutes, and a slow one that continues to improve thresholds over hours.


[Figure 2: plot not reproduced; x axis: Time/h, 0.0 to 10.0]

Figure 2. Long-term perceptual learning of vernier acuity. In contrast to figure 1, thresholds rather than percentages of incorrect responses were measured in another twelve inexperienced observers. Each data point represents mean values for 240 presentations per observer, ie almost 3000 presentations; vertical bars indicate standard errors. After 5 h of training, stimulus orientation was rotated by 90° (broken vertical line). Again, there were large interindividual variations, but a clear improvement of thresholds as a result of practice. Improvement did not transfer after rotation of the stimulus. Quite to the contrary, mean thresholds for the new orientation were higher than for completely inexperienced observers (after Fahle and Edelman 1993).

The fast phase of perceptual learning immediately improves performance, as in the case of grating discrimination (Fiorentini and Berardi 1980, 1981) but unlike in texture-discrimination tasks (Karni and Sagi 1991). Of course, the different slopes of the learning curve might be distinct phases of the same process, eg one that can be described by an exponential function. The results on learning in grating discrimination (Fiorentini and Berardi 1980, 1981) and on texture segregation (Karni and Sagi 1991; Sagi and Karni 1993) indicate that learning may take place at quite different levels of visual pattern recognition, with different time constants (not just 'fast' and 'slow'), with varying degree of eye specificity (transfer versus no transfer between the eyes), and with or without the need of a phase of consolidation or rest. Informal experiments indicate that the slow phase of learning might continue, at a very low speed, for weeks or even months. These findings indicate that several levels of perceptual learning might exist, not just one. Moreover, more cognitive factors, such as insight into and adaptation to the test, might play a certain role as well.

Both the slow and the fast learning process are astonishingly specific: improvement does not transfer to an identical stimulus rotated by 90°. This result argues against the assumption that observers learn mostly how to perform optimally in psychophysical experiments in general, ie to concentrate, fixate constantly, and push the correct button. Quite to the contrary, observers seem to learn specific features of the stimulus, features that are so specific that they are of no use (or might even be disadvantageous) when the stimulus is rotated by 90°. The learning of stimulus-specific features is a prerequisite for the model of pattern recognition based on radial basis functions, as discussed above. Computer simulations, based on a model that uses radial basis functions, were in excellent agreement with the experimental data on the orientation specificity of perceptual learning (Poggio et al 1992b).
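The remark above, that a steep early slope and a shallow late slope need not imply two separate mechanisms, can be made concrete with a small numerical sketch. It compares a single exponential with the sum of a fast and a slow exponential. Only the start and end values of 26% and 15% incorrect responses are taken from the fast-phase experiment; all time constants, the lower floor, and the share of the fast process are hypothetical values chosen for illustration, and nothing here is fitted to the data.

```python
import math


def single_exponential(t_h, start=26.0, floor=15.0, tau_h=0.25):
    """Error rate (%) decaying towards `floor` with one time constant."""
    return floor + (start - floor) * math.exp(-t_h / tau_h)


def two_processes(t_h, start=26.0, floor=10.0, fast_share=0.7,
                  tau_fast_h=0.25, tau_slow_h=5.0):
    """Error rate (%) as the sum of a fast and a slow exponential decay."""
    gain = start - floor
    fast = fast_share * gain * math.exp(-t_h / tau_fast_h)
    slow = (1.0 - fast_share) * gain * math.exp(-t_h / tau_slow_h)
    return floor + fast + slow


for t in (0.0, 0.5, 1.0, 5.0, 10.0):
    print(t, round(single_exponential(t), 2), round(two_processes(t), 2))
```

Under these assumed parameters both curves drop steeply during the first half hour, but only the two-process curve still improves noticeably between 5 h and 10 h. Continued improvement over many hours is therefore the kind of observation that would favour more than one time constant.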

4 Specificity for visual field position and for the eye used for learning

The results of the first experiment indicate that perceptual learning of vernier acuity operates on a level in the visual system where neurons are orientation specific, since, otherwise, transfer of training effects would be expected between orientations. The next question is whether the learning is also specific for the visual field position where the stimulus was learned and for the eye used during monocular learning.

4.1 Methods

As in the previous experiment, stimuli appeared on a CRT screen under computer control. Presentation time, stimulus size, luminance, contrast, and viewing distance were as before. A new group of eight students participated in the first part of the experiment, in which specificity for visual field position was tested. The main difference from the previous experiment was that fixation was not on the stimulus but on a fixation point located at different positions around the monitor, such that stimuli were presented for 150 ms (too short for voluntary saccades to the stimulus) at an eccentricity of 10 deg. A video camera and display monitored observers' stability of fixation. In eight observers, eight visual field positions of equal eccentricity, as indicated in figure 3, were tested in fixed order. During the first four changes of visual field position, stimulus orientation changed simultaneously with position, ie when a new visual field position was tested, the stimulus not only moved to this new position but also changed orientation by 90°. The later transitions, however, were pure changes of position with constant orientation. Vernier offsets varied between 50 and 90 s arc for the peripheral tests, but were constant for each individual observer. Percentages of incorrect responses, rather than thresholds, were measured for a fixed vernier offset that was slightly below the initial threshold for the individual observer and visual field location.

In the second part of the experiment, fixation was again to the stimulus, ie central. To investigate whether learning of a standard line vernier in the centre of gaze was monocular, four observers were trained on vernier discrimination with the right eye for 5 h and were then tested with the left eye; the sequence was reversed for the remaining four observers. Here, thresholds rather than percentages of correct responses were measured.

4.2 Results

When the fixation point was above the monitor, the lower visual field was tested (positions 1, 2, and 8 in figure 3a), whereas the upper visual field was tested during fixation on positions 4-6. At most visual field positions, mean performance of all observers improved during the 1 h periods during which this eccentric visual field position was used for training (figure 3b). Performance improved, on average, by 7.09% (standard error 1.4%; p = 0.0002, paired t-test), but decreased by 7.1% (standard error 3.4%) at the transition to a new visual field position. This decrease of performance was significant both for all transitions (p = 0.03, paired t-test) and for the last three transitions in figure 3, where stimulus orientation was constant. Further experiments with a pseudorandom order of testing at constant stimulus orientation (Fahle et al 1994) fully confirmed the specificity of learning for visual field position. The results shown in figure 3, moreover, clearly indicate that performance is better in the lower than in the upper visual hemifield.

When testing was monocular, with another four observers being trained with the right eye and four with the left eye, thresholds improved as a result of training (figure 4). But when testing was with the opposite eye, performance returned to pretraining levels or even above. The results show the same overshoot that was present in long-term learning after change of orientation. Learning in the second eye did not interfere with learning in the first eye: a retest through the first eye after learning through the second eye showed neither significant decrease nor increase of performance.

[Figure 3: plots not reproduced; panel (a) x axis: positions of fixation point; panel (b) groups: position constant, position change]

Figure 3. (a) Performance of eight inexperienced observers. Means and standard errors (vertical bars) of percentage of incorrect responses for eight different positions in the visual field. The fixation point moved to a new position after every hour of training. The positions of the fixation points in relation to the screen bearing the stimulus are shown at the top of the figure, and the corresponding numbers are indicated below the data points. For nearly all positions, performance improved during the training, but deteriorated after the change of position. For the first four changes of positions, as marked by the heavy dashed line, orientation changed simultaneously with position. Positions in the upper visual field (fixation below stimulus, ie positions 4, 5, and 6) tended to yield poorer results than positions in the lower visual field. The first position was retested at the end of the experiment. (b) Average change of performance for individual observers and mean change and standard error (vertical bars) during training at each of the eight positions in the visual field ('position constant'), and at the transition between visual field positions ('position change'). Mean improvement within positions was 7.1% ± 1.4%, and mean increase of errors at change of position was 7.1% ± 3.4%.

[Figure 4: plot not reproduced; x axis: Time/h, 0.0 to 10.0; curve segments labelled by eye (left, right)]

Figure 4. Transfer of learning between the two eyes. Monocular thresholds were tested in another eight observers for 10 h per observer. Four observers started training with the left eye and four with the right eye. Testing was with the opposite eye; its start is indicated by the broken vertical line. Data points are means for eight subjects, and vertical bars indicate standard errors. The results do not show any transfer of learning between the eyes, but an overshoot of thresholds similar to that after the transition between orientations.

4.3 Discussion

The results indicate that perceptual learning of hyperacuity is specific for the visual field position and for the eye used during learning. These findings, together with the orientation specificity of learning as present in the first experiment, effectively constrain the possible localisation, in the visual system, of perceptual learning. The orientation specificity we found requires that the neurons that learn are orientation specific, since neurons that are not orientation specific would be trained by all orientations and learning would transfer between orientations. On the other hand, the fact that learning is mostly eye specific suggests that the neurons that learn are mostly monocular. (This result is by no means trivial, since most other learning results, which probably concern more complex functions, show transfer between the eyes, as outlined in the introduction.) Last, the position specificity of learning indicates that the underlying neuronal processes occur in a cortical area where position invariance has not yet been achieved. These results suggest area V1 as the most probable candidate for learning of visual hyperacuity: neurons there are orientation specific, retinotopically organised, and, at least in layer 4, mostly monocular.

5 Learning in motion perception and a physiological correlate

Training of vernier acuity considerably improves perceptual thresholds, and we have just speculated about a possible location in the brain where this learning might occur. One possible way to find out more about the neuronal mechanisms underlying perceptual learning would be to use electrophysiological methods, ie, in humans, visually evoked cortical potentials. It is indeed possible to evoke cortical potentials by introducing vernier breaks in previously straight lines (Steinman et al 1985). We used similar stimuli to investigate the neuronal mechanisms of perceptual learning in humans.


5.1 Methods

The stimulus was a straight line consisting of three elements replicated five times, as indicated in figure 6c. The middle portions of these lines were displaced in one step either to the left or to the right. Thresholds for the discrimination between jump displacements to the right and those to the left were measured by means of the usual adaptive-staircase procedure, and percentages of incorrect responses were subsequently measured as a function of training for a constant jump displacement roughly corresponding to the detection threshold. Five observers started with this vertical stimulus orientation and five started with a similar stimulus, but in a horizontal orientation, ie jump displacements either up or down. Block size was 100 presentations of 150 ms each. After around 30 min stimuli were rotated by 90°. Therefore each data point in the graph relies upon identical numbers of horizontal and vertical stimulus presentations. Stimulus luminance and contrast corresponded to the ones in the preceding experiments. No feedback about the observer's response was provided.

In order to evoke cortical potentials sufficiently large to be clearly identified and compared before and after training, the same stimulus was used for the electrophysiological experiments. We recorded scalp potentials at sixteen positions over the occipital skull in another ten naive observers. Observers fixated the monitor while their brain potentials were recorded. The middle portions of the five parallel vernier targets jumped to the right (or up) and back at a frequency of 0.8 Hz. The responses evoked both by the jump displacement and by the jump back were averaged over blocks of 600 cycles each. Short breaks were made after each 100 cycles. After 1200 cycles, stimuli were rotated by 90°, either from vertical to horizontal or vice versa. Jump size was constant at 45. We analysed the evoked potentials of all sixteen positions with regard to amplitude and latency of the P100 component as well as to the spatial distribution of the potentials over time. Means of the first 600 responses were compared with the means of the second 600 responses for both stimulus orientations.

5.2 Results

The psychophysical experiment yields results that agree well with those on orientation dependence of perceptual learning in vernier acuity. The number of incorrect responses decreased on average by almost 10% within less than 30 min. After rotation of the stimulus by 90°, the number of incorrect responses increased to more than pretraining levels, and decreased continuously by more than 10% as a function of training thereafter. Training to the second orientation did not interfere with performance in the stimulus orientation trained first, as is obvious from the rightmost data point of figure 5. The improvement of performance during both the first and the second half of the experiment is significant (p = 0.05, analysis of variance). Also, the increase in incorrect responses after the change of orientation is highly significant (p = 0.005, paired t-test; p = 0.02, Mann-Whitney U-test).

The electrophysiological experiment showed a significant decrease in the latencies of the so-called P100 component of the visually evoked response from 117 to 104 ms for the vertical stimuli and from 125 to 115 ms for the horizontal stimuli. These differences are significant at the level p = 0.025 and p = 0.02, respectively. At the same time, mean amplitudes of the P100 increased as a function of learning (figure 6a). However, the most significant change caused by the training concerns the spatial distribution of potentials over the occipital pole. There were highly significant differences in this distribution between the first and second 600 stimulus presentations (figure 6b) for several latencies, especially at around 80 ms after stimulus onset above V1, as well as at 250 ms over more temporal and parietal areas (p = 0.01, t-test; cf Fahle and Skrandies 1994).
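The kind of analysis described here, averaging many stimulus-locked sweeps and then reading off the latency and amplitude of a positive component, can be sketched as follows. The sweeps are purely synthetic: the Gaussian waveform, its assumed peak at 110 ms, the amplitude, the noise level, the 1 ms sampling, and the 80 to 140 ms search window are all assumptions for illustration, not recorded data or the authors' analysis code.

```python
import math
import random


def synthetic_sweep(rng, peak_ms=110.0, amp_uv=5.0, noise_uv=2.0, n_ms=300):
    """One simulated sweep sampled every 1 ms: a Gaussian-shaped positive
    deflection plus independent Gaussian noise."""
    return [amp_uv * math.exp(-((t - peak_ms) ** 2) / (2.0 * 15.0 ** 2))
            + rng.gauss(0.0, noise_uv) for t in range(n_ms)]


def average(sweeps):
    """Point-by-point mean across sweeps (stimulus-locked averaging)."""
    n = len(sweeps)
    return [sum(samples) / n for samples in zip(*sweeps)]


def peak(avg, window=(80, 140)):
    """Latency (ms) and amplitude of the largest value inside the window."""
    lo, hi = window
    latency = max(range(lo, hi + 1), key=lambda t: avg[t])
    return latency, avg[latency]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
block = average([synthetic_sweep(rng) for _ in range(600)])
latency_ms, amplitude_uv = peak(block)
print(latency_ms, round(amplitude_uv, 2))
```

Averaging N sweeps reduces uncorrelated noise by a factor of about the square root of N, roughly 24 for a block of 600. At 0.8 Hz one cycle lasts 1.25 s, so a block of 600 cycles corresponds to 12.5 min of recording.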

[Figure 5: plot not reproduced; x axis: Time/h, 0.00 to 1.25]

Figure 5. Lack of transfer of learning between different stimulus orientations in a jump-displacement task. Five observers were trained for 30 min with a vertical stimulus orientation, another four observers with a horizontal orientation. Stimuli were rotated by 90° (as indicated by the broken vertical lines) after 30 min, and testing and learning of the new orientation followed. Performance decreased significantly after the transition between orientations. The rightmost data point represents a return to the original stimulus orientation (after Fahle and Skrandies 1994).

5.3 Discussion

The orientation specificity of perceptual learning found previously for vernier stimuli occurs also in a jump-displacement task. The same orientation specificity is also found for stereoscopic depth perception both with random-dot stereograms (Ramachandran and Braddick 1973) and for two-dot stimuli (Fahle et al 1994). All these results corroborate the hypothesis that perceptual learning can indeed occur at a relatively 'early' level of visual information processing, where rotation invariance has not yet been achieved. It is reassuring that learning of the stimulus in a new orientation (or with the partner eye, see above), does not interfere with performance in the previously learned orientation (or eye). Therefore, the improvement in performance cannot be caused by short-term allocation of 'neuronal resources' (pools of cells) to one task or the other. Quite to the contrary, the specific task seems to be learned. Unsystematic pilot studies on the long-term behaviour of perceptual learning completely agree with this view: performance of three observers who were retested more than 1 year after the experiment proper achieved a performance close to the level they had reached at the end of the experiment, much better than the pretraining levels.

The electrophysiological experiment, to my knowledge, demonstrates for the first time a direct, objective correlate of perceptual learning in humans. The cortical potentials are evoked by a displacement close to the hyperacuity range, below the size corresponding to a Snellen acuity of 20/20. As is to be expected, such a small displacement evokes rather small cortical potentials. Nevertheless, there was a significant decrease in latencies and an increase in amplitudes of the evoked responses. These changes are opposite to the changes one would expect as a result of habituation and fatigue: those processes would increase latencies and decrease amplitudes.

The change of distribution of potential over the occipital pole is another, highly significant correlate of perceptual learning. It is reassuring that the only significant difference between the distribution of potential resulting from the first versus the second half of presentations is localised, for latencies below 100 ms, over the primary visual cortex (V1), and that differences over other brain areas occur only after longer latencies. Hence, the electrophysiological results are compatible with the hypothesis that specific perceptual learning might occur in the primary visual cortex, but indicate that there might be additional changes in other, 'higher' brain areas, though analysis of evoked potentials through the skull is not a safe way to localise activity in the brain.

[Figure 6 plots not reproduced: (a) potential traces, horizontal axis Latency/ms; (b) potential distributions, after training.]

Figure 6. (a) Amplitudes of cortical potentials evoked by a jump-displacement stimulus are higher after training (lower trace) than before training (upper trace) whereas latencies decrease. (b) The distribution of potentials over the occipital pole changes significantly between before and after training. The numbers at the top left of each distribution refer to the latency, in ms. Black areas indicate negative potentials, white areas positive potentials. Isopotential lines indicate steps of 0.1 μV each. (c) Schematic view of the stimulus configuration: left, lines straight; right, lines offset.

In the model of visual object recognition and perceptual learning based on radial basis functions that was outlined above, no assumptions are made regarding the exact nature of the neuronal mechanisms at work during learning. Several theories aim to explain these mechanisms; they are based, for example, on a basic assumption put forward by Hebb (1949; cf von der Malsburg and Singer 1988; Palm 1982). Hebb postulated that the effectiveness of a synapse is increased every time it is able to activate the postsynaptic neuron. Neurons in the visual cortex receive inputs not only from the eye (via the lateral geniculate body) but mostly from other cortical and subcortical neurons. Therefore, it is possible to increase the probability that an input activates a neuron by increasing the simultaneous input to this neuron from other parts of the brain. The repeated presentation of the stimulus (while it is attended to) might increase the effectiveness of the synapses for this specific visual input. The increased effectiveness of synapses as a result of visual training might lead to an increase in response amplitude and a decrease in spiking latency, but, of course, other mechanisms are conceivable.
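The Hebbian idea described above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the model used in this study: a single threshold neuron with two synapses, where the threshold, the learning rate, the initial weights, and all names are hypothetical values chosen for illustration.

```python
import numpy as np

def hebbian_step(weights, inputs, threshold=1.0, rate=0.1):
    """One Hebbian update for a single threshold neuron (hypothetical toy model).

    The neuron fires when the weighted input sum reaches the threshold.
    Only when it fires are the synapses that carried input strengthened.
    """
    fired = float(np.dot(weights, inputs)) >= threshold
    if fired:
        weights = weights + rate * inputs
    return weights, fired

# Synapse 0 carries the visual input (initially too weak to fire the neuron);
# synapse 1 carries simultaneous input from other parts of the brain.
weights = np.array([0.4, 0.8])
visual_only = np.array([1.0, 0.0])
visual_plus_context = np.array([1.0, 1.0])

# Visual input alone stays below threshold: no firing, no weight change.
w_alone, fired_alone = hebbian_step(weights, visual_only)

# With simultaneous input the neuron fires, so the visual synapse is strengthened.
w_paired = weights.copy()
for _ in range(10):
    w_paired, _ = hebbian_step(w_paired, visual_plus_context)

# After repeated paired presentations, visual input alone reaches threshold.
_, fired_after = hebbian_step(w_paired, visual_only)
print(fired_alone, fired_after, w_paired)
```

In this sketch the additional input plays the role that the text ascribes to attention-related input from other brain areas: it raises the probability that the visual input activates the neuron, which in turn allows the visual synapse to be strengthened.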

6 Parallel processing of vernier acuity
To test the hypothesis that vernier breaks are detected 'early' during pattern recognition I measured reaction times for the detection of a vernier target as a function of the number of straight stimuli presented simultaneously. It is generally assumed that those features that are detected at a constant reaction time irrespective of the number of distractors presented simultaneously can be processed in parallel over the visual field. Examples of such elementary features are colour, brightness, line orientation and length, as well as line terminators (Julesz 1981, 1984; Treisman and Gormican 1988). Obviously, it requires a high number of cortical neurons to process a given feature in parallel over the visual field. While the parallel versus serial discrimination of relations between figures might be far less straightforward than originally thought (Humphreys et al 1994), the concept of parallel versus serial processing seems to be still valid for single features as opposed to relations between features. Hence, one might speculate that only those features can be processed in parallel that are of prominent importance for visual pattern recognition and that represent the elementary building blocks of visual perception, extracted at the early stage of visual pattern processing in the human brain. If vernier breaks could indeed be detected in parallel at different locations in the visual field, this would indicate that deviation from straightness is one of these elementary features for visual pattern recognition, probably detected at one of the first stages of cortical pattern analysis.

6.1 Methods
Vernier stimuli were presented on the same experimental setup as before. Between 2 and 16 stimuli were presented simultaneously at an eccentricity of 4.5 deg. A central cross served as a fixation aid. Each of the stimuli was 2.5 min wide and 85 min high, except for observer UK (41 min), and vernier offset was 5 min, slightly above two-point resolution at 4.5 deg (Levi et al 1985; Westheimer 1982). Stimulus luminance was 450 cd m⁻², background luminance was 20 cd m⁻², and observation distance was 0.5 m. The presentation of each stimulus ended when the observer responded. Acoustic feedback followed after incorrect responses. In part of the experiment, an eye tracker monitored eye position to discriminate whether the subjects fixated or scanned the stimuli.
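To give a feeling for the physical size of these stimuli, the angular values quoted above can be converted to extents on the screen at the stated observation distance of 0.5 m. The following is a rough sketch only: the function name is hypothetical, and the conversion treats each extent as centred on the line of sight, ignoring the small correction for a flat screen at 4.5 deg eccentricity.

```python
import math

def angle_to_size_mm(arcmin, distance_mm):
    """Extent (mm) that subtends `arcmin` minutes of arc at `distance_mm`."""
    theta = math.radians(arcmin / 60.0)
    return 2.0 * distance_mm * math.tan(theta / 2.0)

DISTANCE_MM = 500.0  # observation distance of 0.5 m, as stated in the text

width_mm = angle_to_size_mm(2.5, DISTANCE_MM)    # stimulus width, 2.5 min arc
height_mm = angle_to_size_mm(85.0, DISTANCE_MM)  # stimulus height, 85 min arc
offset_mm = angle_to_size_mm(5.0, DISTANCE_MM)   # vernier offset, 5 min arc

print(f"width {width_mm:.2f} mm, height {height_mm:.2f} mm, offset {offset_mm:.2f} mm")
```

Under these assumptions the vernier offset corresponds to less than a millimetre on the screen, while each stimulus is roughly 12 mm high.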

In a two-alternative forced-choice task, three basic conditions were tested: (a) identification of the offset target among straight distractors; (b) identification of the straight target among offset distractors; (c) identification of the target offset in the direction opposite to that of the distractors. Only half of the presentations contained a target. The observers had to indicate whether or not a target was present in the display and to push the appropriate one of two push-buttons. Reaction times represent the average both of positive (target present) and of negative (target absent) presentations and are based on at least 120 responses per condition and observer.

Before the experiments proper, the four observers underwent an ophthalmological examination as well as a training period with more than 1000 stimulus presentations. Three of the observers had previously participated in similar experiments. All observers had normal or corrected-to-normal visual acuity, and, with the exception of the author, were unaware of the purpose of the experiment.

6.2 Results
Detection of a single offset vernier was almost independent of the number of distractors: reaction times were nearly constant for up to at least eight stimulus elements presented simultaneously (figure 7a; slope 9.5 3.5 ms per distractor). Even if the size of the vertical gap of the verniers varied randomly by up to 5 min arc, precluding the detection of the offset target on the basis of its larger gap (figure 7b), reaction times increased hardly at all with the number of distractors, and the same was true for the detection of a vernier offset to the left among distractors offset to the right, at least for constant orientation of the stimuli (figure 7d). On the other hand, reaction times increased sharply if observers had to find a straight line among offset distractors (figure 7c). Reaction times tended to be shorter for presentations with a target than for those without target.

We repeated the experiments, varying the orientation of the stimuli independently and at random by up to 20°, to investigate a possible influence of the implicit orientation information that is present in an offset vernier (right-hand column of figure 7). Detection of a single vernier among distractors was almost as fast as at fixed orientation. Detecting a target offset in a direction opposite to that of the distractors, on the other hand, was almost independent of the number of distractors at fixed vertical orientation (figure 7d), but reaction times increased dramatically with the number of distractors if orientation varied.

6.3 Discussion
Reaction times for the detection of a single vernier target among straight distractors increased by between 1.5 and 9.5 ms per distractor. This is far less than the 30 to 50 ms required for serial search (Jonides 1983; Krose and Julesz 1989; Treisman and Gormican 1988), and lies well within the range accepted for parallel search. Moreover, a vernier target was detected in a 150 ms presentation among straight distractors, even if the vernier offset was smaller than the diameter of the foveal photoreceptors (Fahle 1990, 1991). Detection of a vernier offset in a 150 ms flash at 0.2 deg eccentricity often required a minimal displacement between 20 and 30 s arc even if a mask followed immediately after the stimulus presentation (Fahle 1991). To obtain such performance, observers had to test whether each of the stimulus elements was straight or offset. The correct decision could be taken even after presentation times that were too short for a serial search. Instead, all stimuli seemed to be probed in parallel.
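The comparison of search slopes made above can be sketched as a short calculation: fit a straight line to reaction time as a function of the number of elements and compare the slope with the lower bound of about 30 ms per element quoted for serial search. This is a minimal illustration, not the analysis used in the study; the function names are hypothetical and the reaction times are invented for the example, not measured data.

```python
def search_slope(set_sizes, reaction_times_ms):
    """Least-squares slope (ms per element) of reaction time against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx

def classify(slope_ms, serial_min_ms=30.0):
    """Label a slope using the 30 ms lower bound for serial search quoted in the text."""
    return "serial" if slope_ms >= serial_min_ms else "parallel"

set_sizes = [2, 3, 4, 6, 8]

# Hypothetical reaction times in ms, invented for illustration only.
flat_rts = [520, 525, 530, 540, 550]   # rises by 5 ms per element
steep_rts = [520, 560, 600, 680, 760]  # rises by 40 ms per element

for label, rts in (("flat", flat_rts), ("steep", steep_rts)):
    slope = search_slope(set_sizes, rts)
    print(label, round(slope, 1), classify(slope))
```

The simple two-way threshold is a deliberate simplification: slopes between the values reported here and the serial range would need a more careful treatment.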

[Figure 7 plots not reproduced: reaction times for observers MF, HW, AH, and UK with 2, 3, 4, 6, 8, 12, and 16 elements; horizontal axis labelled 'Number of targets'; T target, D distractor.]

Figure 7. Parallel processing of vernier stimuli: reaction times for four subjects as a function of the number of elements (distractors or target plus distractors) presented simultaneously. (a) One offset target may be hidden among straight distractors. (b) One offset target may be present among straight distractors, as in (a), and vertical gap size varies by up to 5 min. (c) One straight target may be hidden among offset distractors. (d) The target is offset in a direction opposite to that of the distractors. Left-hand column: reaction times for vertical orientation of stimuli. Right-hand column: results for variable orientation of stimuli (after Fahle 1991). T target, D distractor.


It has been previously reported that an orientation cue is used in the detection of vernier offsets (Andrews et al 1973; Watt et al 1983; Watt and Campbell 1985). But detection was parallel in our experiment even if no absolute orientation cues were available. With these stimuli of variable orientation, the underlying neuronal mechanism could not use the implicit orientation difference between a straight stimulus and an offset one since absolute orientations varied. Moreover, the gap sizes of all stimuli varied independently, hence the terminators present in an offset vernier could not be used as the discriminating feature. I conclude that the feature which discriminates between offset stimuli and straight ones must rely on deviation from straightness, not on an absolute orientation cue (cf Treisman and Gormican 1988). The claim that deviation from straightness is an elementary feature of visual perception was further supported by the finding that a figure can immediately be discriminated from its surround if the elements of the figure are bent whereas the elements of the surround are straight, ie the figure 'pops out' (Wolfe et al 1992).

The detection of a vernier target offset in one direction among distractors offset in the opposite direction can be made in parallel only as long as all stimuli share a common orientation. When orientation varies, reaction times for this task, which might seem as easy as the detection of an offset, increase steeply with the number of distractors. The results show that if the orientation cue is masked by variable orientations of all stimulus elements, detection of the target with the opposite offset requires serial search.

7 Conclusions
I found that vernier offsets are detected in parallel at different positions of the visual field and that deviation from straightness probably represents an elementary feature of vision. Improvement in detecting a vernier offset is stimulus specific: performance for vernier acuity, three-dot acuity, and a jump-displacement task improves as a function of training. Since this improvement depends on personal experience, it is called learning. Two phases of learning can be discriminated, a fast one in the minute range and a slow phase that continues at least over 10 h. Learning is specific for the orientation of the stimulus, for its position in the visual field, as well as for the eye used during the training phase. This early perceptual learning is a prerequisite for a recent model of human visual pattern recognition. In this model it is assumed that pattern recognition might be achieved by some form of 'fuzzy' template matching. This is to say that object recognition is considered as a process much more based on memory and less on computation than previously thought. If the model were true, perceptual learning would be a central factor for pattern recognition even in adults: only those objects can be recognised that are similar to objects that have been seen previously from various angles and whose views have been stored in memory. It is important to realise in this context that the perceptual task tested here, namely to detect a break in a straight line, is one that humans are faced with in everyday situations and that might be a first step in the process of analysing visual patterns. Therefore, it is not surprising that even 'naive' observers will have undergone some training during their lives and that the effects of learning during the experiment are both slow and relatively moderate in extent.
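The notion of 'fuzzy' template matching can be made concrete with a toy radial-basis-function example. What follows is a minimal sketch under stated assumptions, not the model used in this study: two stored templates in a hypothetical two-dimensional feature space, Gaussian similarity with an arbitrary width `sigma`, and fixed output weights of +1 and -1.

```python
import numpy as np

def rbf_response(x, templates, weights, sigma=1.0):
    """Weighted sum of Gaussian similarities between input x and stored templates."""
    dists_sq = np.sum((templates - x) ** 2, axis=1)
    activations = np.exp(-dists_sq / (2.0 * sigma ** 2))
    return float(np.dot(weights, activations))

# Two stored 'views' (hypothetical feature vectors), labelled +1 and -1.
templates = np.array([[1.0, 0.0], [-1.0, 0.0]])
weights = np.array([1.0, -1.0])

# Inputs similar to a stored template give a response close to its label;
# an input equally far from both templates gives no preference.
near_first = rbf_response(np.array([0.9, 0.1]), templates, weights)
near_second = rbf_response(np.array([-0.8, 0.0]), templates, weights)
midway = rbf_response(np.array([0.0, 0.0]), templates, weights)
print(near_first, near_second, midway)
```

The response falls off smoothly with distance from the stored templates, which is the sense in which recognition in such a scheme depends on previously stored examples and interpolates between them.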
Informal pilot studies, as well as eg the results of Fiorentini and Berardi (1980, 1981), suggest that the extent and speed of perceptual learning increase for unfamiliar tasks such as the discrimination of phase relations in complex grating patterns. Moreover, many studies show that visual performance of children improves considerably with age (eg Zanker et al 1992) and that this improvement requires visual experience, as is evident from deprivation studies. The results reviewed here, on the other hand, show that learning is possible even for relatively familiar features. It is surprising that learning occurs for a feature that is detected in parallel, hence is an elementary feature of vision, and that is extracted during the first steps of visual pattern recognition, while cortical areas such as V1 have often been considered to be relatively 'hard wired' in adults. The present results suggest that area V1 might be more modifiable even in adults than was previously assumed.

Cortical potentials evoked by the jump displacement of parallel vernier lines had significantly shorter latencies and larger amplitudes after training than before training. At the same time, the distribution of potentials over the posterior pole of the brain changed in a highly significant way. The results both of psychophysical and of electrophysiological experiments can be taken as evidence that some form of perceptual learning might occur very early during cortical processing. A next step will be to repeat the behavioural and especially the electrophysiological experiments in monkeys, where a direct recording from the surface of the cortex is possible, with a much better signal-to-noise ratio than is possible in recordings from the skull of humans. The final hope is to detect the neuronal basis of perceptual learning on a cellular level in the visual cortex of mammals.

Acknowledgements. This research was supported by the Deutsche Forschungsgemeinschaft (Fal 19/5-2; SFB 307, TP A6), the von Humboldt Society, and the Max-Planck Society. I wish to thank all the observers for their participation, Mrs H Weller for technical and secretarial help, and Dipl. Ing. M Repnow for writing the computer programs. Part of this work has been published as part of the four publications indicated in the captions of figures 1, 2, 5, and 7.

References

Ahissar M, Hochstein S, 1993 "Attentional control of early perceptual learning" Proceedings of the National Academy of Sciences of the United States of America 90 5718-5722

Andrews D P, Butcher A K, Buckley B R, 1973 "Acuities for spatial arrangement in line figures: human and ideal observer compared" Vision Research 13 599-620

Ball K, Sekuler R, 1987 "Direction-specific improvement in motion discrimination" Vision Research 27 953-965

Bennett R G, Westheimer G, 1991 "The effect of training on visual alignment discrimination and grating resolution" Perception & Psychophysics 49 541-546

Boucart M, Delord S, Giersch A, 1994 "The computation of contour information in complex objects" Perception 23 399-409

Dill M, Wolf R, Heisenberg M, 1993 "Visual pattern recognition in Drosophila involves retinotopic matching" Nature (London) 365 751-753

Fahle M, 1990 "Parallel, semi-parallel, and serial processing of visual hyperacuity" in Human Vision and Electronic Imaging: Models, Methods, and Applications, SPIE 1249 147-159

Fahle M, 1991 "Parallel perception of vernier offsets, curvature, and chevrons in humans" Vision Research 31 2149-2184

Fahle M, Edelman S, 1993 "Long term learning in vernier acuity: Effects of stimulus orientation, range and of feedback" Vision Research 33 397-412

Fahle M, Skrandies W, 1994 "An electrophysiological correlate of learning in motion perception" German Journal of Ophthalmology in the press

Fahle M, Edelman S, Poggio T, 1994 "Short-term learning in vernier acuity" Vision Research (submitted)

Fendick M, Westheimer G, 1983 "Effects of practice and the separation of test targets on foveal and peripheral stereoacuity" Vision Research 23 145-150

Fiorentini A, Berardi N, 1980 "Perceptual learning specific for orientation and spatial frequency" Nature (London) 287 43-44

Fiorentini A, Berardi N, 1981 "Learning in grating waveform discrimination: Specificity for orientation and spatial frequency" Vision Research 21 1149-1158

Hebb D O, 1949 Organization of Behavior (New York: John Wiley)

Humphreys G W, Keulers N, Donnelly N, 1994 "Parallel visual coding in three dimensions" Perception 23 453-470

Jonides J, 1983 "Further toward a model of the mind's eye's movement" Bulletin of the Psychonomic Society 21 247-250

Julesz B, 1981 "Textons, the elements of texture perception, and their interactions" Nature (London) 290 91-97

Julesz B, 1984 "A brief outline of the texton theory of human vision" Trends in Neuroscience 7 41-45


Karni A, Sagi D, 1991 "Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity" Proceedings of the National Academy of Sciences of the United States of America 88 4966-4970

Krose B J A, Julesz B, 1989 "The control and speed of shifts of attention" Vision Research 29 1607-1619

Levi D M, Klein S A, Aitsebaomo P, 1985 "Vernier acuity, crowding and cortical magnification" Vision Research 25 963-977

McKee S P, Westheimer G, 1978 "Improvement in vernier acuity with practice" Perception & Psychophysics 24 258-262

Malsburg C von der, Singer W, 1988 "Principles of cortical network organization" in Neurobiology of Neocortex; Dahlem Konferenzen Eds P Rakic, W Singer (New York: John Wiley) pp 69-99

Palm G, 1982 Neural Assemblies (Berlin: Springer)

Poggio T, 1990 "A theory of how the brain might work" in Cold Spring Harbor Symposia on Quantitative Biology LV 899-910

Poggio T, Girosi F, 1990 "Regularization algorithms for learning that are equivalent to multilayer networks" Science 247 978-982

Poggio T, Edelman S, Fahle M, 1992a "Learning of visual modules from examples: A framework for understanding adaptive visual performance" Computer Vision, Graphics & Image Processing: Image Understanding 56 22-30

Poggio T, Fahle M, Edelman S, 1992b "Fast perceptual learning in visual hyperacuity" Science 256 1018-1021

Ramachandran V S, Braddick O, 1973 "Orientation-specific learning in stereopsis" Perception 2 371-376

Sagi D, Karni A, 1993 "The time course of learning a visual skill" Nature (London) 365 250-252

Sagi D, Polat U, 1992 "Perceptual learning increases the range of inhibitory connections between spatial filters" Perception 21 Supplement 2, 69

Shiu L-P, Pashler H, 1991 "Improvement in line orientation discrimination is retinally local but dependent on cognitive set" Investigative Ophthalmology and Visual Science 32 1041

Steinman S B, Levi D M, Klein S A, Manny R E, 1985 "Selectivity of the evoked potential for vernier offsets" Vision Research 25 951-961

Taylor M M, Creelman C D, 1967 "PEST: Efficient estimates on probability functions" The Journal of the Acoustical Society of America 41 782-787

Treisman A, Gormican S, 1988 "Feature analysis in early vision: Evidence from search asymmetries" Psychological Review 95 15-48

Van Gool L J, Moons T, Pauwels E, Wagemans J, 1994 "Invariance from the Euclidean geometer's perspective" Perception 23 547-561

Vogels R, Orban G A, 1985 "The effect of practice on the oblique effect in line orientation judgments" Vision Research 25 1679-1687

Watt R J, Campbell F W, 1985 "Vernier acuity: interactions between length effects and gaps when orientation cues are eliminated" Spatial Vision 1 31-38

Watt R J, Morgan M J, Ward R M, 1983 "The use of different cues in vernier acuity" Vision Research 23 991-995

Weiss Y, Edelman S, Fahle M, 1993 "Models of perceptual learning in vernier hyperacuity" Neural Computation 5 695-718

Westheimer G, 1982 "The spatial grain of the perifoveal field" Vision Research 22 157-162

Wolfe J M, Yee A, Friedman-Hill S R, 1992 "Curvature is a basic feature for visual search tasks" Perception 21 465-480

Zanker J, Mohn G, Weber U, Zeitler-Driess K, Fahle M, 1992 "The development of vernier acuity in human infants" Vision Research 32 1557-1564
