SPEECH PERCEPTION AND PRODUCTION IN THE BRAIN

June 5, 2009, Universiteit Leiden




9.30 Welcome
9.55 Opening
10.00 Hemispheric lateralisation of prosodic processing
J. Witteman
10.30 Speaking rate context affects online word segmentation: Evidence from eye-tracking
E. Reinisch, A. Jesse & J. McQueen
11.00 Coffee break
11.30 Self-monitoring and feedback in speech production: fMRI studies using overt speech
I. Christoffels
12.00 Internal speech monitoring investigated with ERPs
E. Severens
12.30-12.45 Algemene Ledenvergadering (General Members' Meeting)
12.45-14.00 Lunch
14.00 Accents in the brain: using functional Magnetic Resonance Imaging to untangle the brain's response to accent variation in the speech signal
P. Adank
14.30 The role of language background in processing foreign-accented speech: a magicoo effect
A. Weber
15.00 Coffee break
15.30 Reduced speech: Seeing both the big picture and the detail
N. Warner
16.00 The bounds on perceptual retuning of native phoneme categories
M. Sjerps
16.30 Drinks




Abstracts:


Accents in the brain: using functional Magnetic Resonance Imaging to untangle the brain's response to accent variation in the speech signal

Patti Adank

School of Psychological Sciences, University of Manchester, UK
Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University Nijmegen, The Netherlands

This project explores the neural bases of processing accent variation in the acoustic signal in two fMRI experiments. The first experiment used a repetition suppression fMRI paradigm, as this type of design allows identification of areas specifically involved in processing phonological/phonetic variation during sentence comprehension. Sentences were produced in two accents: Standard Dutch and an artificial accent of Dutch. In the scanner, participants listened to two sentences presented in quick succession. The second sentence was spoken either by the same speaker in the same accent, by the same speaker in a different accent, by a different speaker in the same accent, or by a different speaker in a different accent. This design thus allowed us to study neural responses to a change in speaker only, a change in accent only, and a change in both accent and speaker. The results showed accent-related activation in left pars opercularis (PO), an area associated with speech production, and in left posterior superior temporal gyrus (STG), an area associated with speech perception.
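
The 2x2 condition structure and the contrast logic of such a repetition-suppression design can be sketched as follows; this is a minimal illustration in Python, and the condition labels and contrast weights are assumptions, not the study's actual design or analysis code.

from itertools import product

# The second sentence of each pair repeats or changes the speaker and/or the
# accent of the first sentence, giving four conditions (illustrative labels).
conditions = [
    {"speaker_change": s, "accent_change": a}
    for s, a in product([False, True], repeat=2)
]

def accent_contrast(cond):
    # Repetition suppression predicts a reduced response when a feature
    # repeats, so areas coding accent should respond more whenever the
    # accent changes, regardless of whether the speaker also changes.
    return 1 if cond["accent_change"] else -1

for cond in conditions:
    print(cond, "accent contrast weight:", accent_contrast(cond))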

The second fMRI experiment aimed to take one step further in establishing how the speech comprehension system extracts meaning from the inherently variable acoustic signal, by identifying whether the neural bases for processing speech in noise are identical to those for processing accented speech. Noise perturbs speech comprehension at a lower level than the phonological/phonetic variation in accented speech. Sentences were presented in Standard Dutch in quiet, in an unfamiliar accent, or in Standard Dutch embedded in noise, using a sparse-scanning fMRI design. The processing cost of the accented and the noisy sentences was equated. The neural bases for the two types of variation were expected to differ: noise was expected to be processed primarily at a lower level, involving primary and association auditory cortices bilaterally, while accented speech was expected to be processed primarily in areas involved in processing higher-level (phonological) variation, including left PO and left STG.



Self-monitoring and feedback in speech production: fMRI studies using overt speech

Ingrid Christoffels

Leiden Institute for Psychological Research - Cognitive Psychology Unit & Leiden Institute for Brain and Cognition (LIBC)

Speakers usually hear themselves while speaking and use this external auditory feedback to monitor their speech. In a first fMRI study, we compared conditions of masked and normal auditory feedback in word production. We found that the bilateral insula and the cingulate cortex in particular were more activated in the presence of normal verbal feedback than with masked feedback. These findings suggest that the anterior cingulate cortex (ACC), which is often implicated in error processing and conflict monitoring, is also engaged in ongoing speech monitoring. Moreover, we found that the cortical response in the bilateral superior temporal gyrus was attenuated when speaking with normal feedback. The forward-model framework helps to explain how self-produced input may result in attenuated sensitivity of the auditory cortex: this feedback mechanism compares the expected sensory consequence of an action with the actual sensory feedback, dampening the sensory response.

In a second study we tested the prediction of this framework that the overlap between expected and actual auditory feedback determines the amount of attenuation. To do so, we parametrically manipulated the quality of verbal feedback during speaking. The superior temporal gyrus response was attenuated less with more masking during speaking, but not during listening to similar input. These results indicate that a very specific prediction of self-generated speech input takes place, resulting in feedback-modulated cancellation in human auditory cortex. This feedback mechanism may be very important in speech monitoring, since it can help to distinguish internally from externally generated speech.
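
A minimal sketch of this comparator logic, with all signals, numbers, and the mismatch rule assumed purely for exposition (they are not the model or data of these studies):

import numpy as np

rng = np.random.default_rng(0)

def auditory_response(feedback, predicted):
    # Toy comparator: the auditory-cortex response scales with the mismatch
    # between actual feedback and the predicted sensory consequence of speaking.
    return float(np.mean(np.abs(feedback - predicted)))

spoken = rng.normal(size=1000)   # self-produced speech signal (toy)
prediction = spoken              # efference copy predicts one's own speech

# Parametric masking (cf. the second study): more noise on the feedback means
# less overlap with the prediction, hence less cancellation (larger response).
for noise_level in (0.0, 0.5, 1.0):
    feedback = spoken + noise_level * rng.normal(size=1000)
    print("masking", noise_level, "->", auditory_response(feedback, prediction))

# External speech is not predicted at all, so it evokes the largest response.
print("external ->", auditory_response(rng.normal(size=1000), prediction))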



Speaking rate context affects online word segmentation: Evidence from eye-tracking

Eva Reinisch, Alexandra Jesse & James M. McQueen

Max Planck Institute for Psycholinguistics, Nijmegen

Durational cues to word boundaries are perceived relative to speaking rate (e.g., Repp et al., 1978). Studies on the effect of speaking rate, however, have mainly used offline categorization or goodness-judgment tasks. Since in these tasks listeners give their responses only after they have processed the whole stimulus, these studies are not conclusive about whether speaking-rate context is indeed used during word recognition. Durational cues to word segmentation, however, are used online. When hearing a temporarily ambiguous word sequence such as Dutch een(s) (s)peer ("on(c)e (s)pear") with the intended target peer, listeners considered an [s]-initial competitor word spuit ("syringe") a viable candidate for longer if the [s] was long (Shatzman & McQueen, 2006). Given these results, it was predicted that the duration of critical sounds is perceived online in relation to speaking rate; that is, speaking rate modulates online word segmentation. Moreover, we asked whether the amount of rate context (i.e., long vs. short carrier sentences) or its location relative to a target word (close vs. distal) influences this effect.

In a series of eye-tracking experiments, listeners' fixations on four printed words were monitored while they listened to sentences such as Ze heeft wel een(s) (s)peer gezegd ("She said on(c)e (s)pear"). Listeners were instructed to click with the computer mouse on the target word mentioned in the sentence (e.g., peer). Critically, besides the target, one of the words on the screen was an [s]-initial competitor (speen; "pacifier"). If listeners use speaking-rate information to modulate lexical competition, they should perceive the [s] as longer, and therefore as more likely to be word-initial, if the sentence preceding the [s] was presented at a fast rather than a slow rate. Indeed, following a fast rate context listeners considered the [s]-initial competitor a viable candidate for longer than following a slow rate context. The opposite was true for [s]-initial targets and non-[s]-initial competitors (e.g., sneeuw - neef; "snow"-"nephew").

Rate context immediately adjacent to the target (i.e., wel eens) had a greater influence on lexical competition than more distal context. The amount of context (i.e., a long vs. a short sentence), however, appears to matter in offline categorization tasks but not in eye-tracking. Phonetic speaking-rate context is thus used in online word recognition. But unlike in offline tasks, where all available context is used for listeners' decisions, in online word recognition the incoming information is updated continuously, so immediate context has more influence online.
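
The rate-normalization logic at work here can be sketched as follows; the durations and the decision threshold are illustrative assumptions, not the experiments' actual stimulus values:

def perceived_s_length(s_duration_ms, context_syllable_ms):
    # Rate normalization: the same absolute [s] duration sounds longer after
    # a fast context (short syllables) than after a slow one.
    return s_duration_ms / context_syllable_ms

def parse(s_duration_ms, context_syllable_ms, threshold=0.5):
    # A relatively long [s] favours the [s]-initial parse (een speer), keeping
    # the [s]-initial competitor viable; a relatively short [s] favours the
    # word-final parse (eens peer). The threshold is a free parameter here.
    if perceived_s_length(s_duration_ms, context_syllable_ms) > threshold:
        return "[s]-initial parse: competitor (e.g., speen) stays viable"
    return "word-final [s] parse: competitor drops out early"

print(parse(90, context_syllable_ms=150))  # fast context: [s] sounds long
print(parse(90, context_syllable_ms=250))  # slow context: [s] sounds short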



Internal speech monitoring investigated with ERPs

Els Severens

Universiteit Gent

People make very few speech errors, and when they do make one, they are often able to correct it very quickly. This has been attributed to an internal speech monitor which inspects and corrects internal speech (Levelt, Roelofs, & Meyer, 1999). Even though the internal speech monitor is used to explain several speech-error patterns (e.g., the lexical bias effect: the finding that speech errors are more often real words than nonwords), there is only one study that directly shows the existence of an internal speech monitor (Motley, Camden, & Baars, 1982). This study measured the Galvanic Skin Response (GSR) to show that people detect and correct taboo-word errors internally even though they do not make these errors overtly. Since the GSR is not a reliable measure of cognitive processes, we attempted to replicate these findings using Event-Related Potentials (ERPs). The findings that will be discussed in this talk provide support for an internal monitor that checks internal speech for errors.

An important suggestion about the internal self-monitor is that it is situated in the comprehension system (Levelt et al., 1999). If this suggestion is true, one would expect that speech-error patterns that have been ascribed to the internal self-monitor can also be found in comprehension monitoring: since both monitoring systems rely on the comprehension system, it is reasonable to assume that both use the same criteria. In a second ERP study we show that the lexical bias effect, which has been attributed to the internal self-monitor, can also be found in comprehension monitoring. Therefore, it is reasonable to assume that production monitoring relies on the comprehension system.

References:
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22(1), 1-75.
Motley, M. T., Camden, C. T., & Baars, B. J. (1982). Covert formulation and editing of anomalies in speech production – Evidence from experimentally elicited slips of the tongue. Journal of Verbal Learning and Verbal Behavior, 21(5), 578-594.



The bounds on perceptual retuning of native phoneme categories

Matthias Sjerps

Max Planck Institute for Psycholinguistics, Nijmegen

Four experiments examined flexibility in speech perception. In Experiment 1, two groups of Dutch listeners were exposed to the English sound theta (as in bath), which replaced /f/ in 20 /f/-final Dutch words (Group 1) or /s/ in 20 /s/-final Dutch words (Group 2). During a subsequent test phase, theta replaced the final sound in minimal pairs such as doof/doos (deaf/box). Listeners heard these critical primes and made visual lexical decisions to, e.g., doof and doos. Group 1 was faster on doof decisions after critical primes than after unrelated primes; Group 2 was faster on doos decisions. The groups had thus learned that theta was, respectively, /f/ or /s/. This learning was thorough, as the effects were of the same nature, and just as large, when the theta was replaced by an ambiguous /fs/-mixture (Experiment 2), and even when the critical primes contained unambiguous /f/ or /s/ instead of theta (Experiment 3). In Experiment 4, a nonspeech noise replaced the theta. Listeners interpreted this sound as an /f/, irrespective of their training, which shows that learning about a new sound strongly depends on its spectral characteristics. Perceptual learning in one's native language is thus thorough, and can happen in spite of years of phonetic learning in a second language that actually distinguishes between the training sound (here theta) and the categories it is learned to represent (here /f/ and /s/).
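
The priming logic of Experiments 1-3 reduces to comparing lexical-decision times after critical versus unrelated primes; a minimal sketch with invented reaction times (only the direction of the effects mirrors the abstract):

# Hypothetical reaction times (ms); none of these numbers are from the study.
rts = {
    ("Group 1", "doof", "critical"): 560, ("Group 1", "doof", "unrelated"): 610,
    ("Group 2", "doos", "critical"): 565, ("Group 2", "doos", "unrelated"): 615,
}

def priming_effect(group, target):
    # Unrelated minus critical prime: a positive value means the theta-final
    # prime was mapped onto the trained category and speeded the decision.
    return rts[(group, target, "unrelated")] - rts[(group, target, "critical")]

print("Group 1 (theta trained as /f/), doof:", priming_effect("Group 1", "doof"))
print("Group 2 (theta trained as /s/), doos:", priming_effect("Group 2", "doos"))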



Reduced speech: Seeing both the big picture and the detail

Natasha Warner

University of Arizona

TBA



The role of language background in processing foreign-accented speech: a magicoo effect

Andrea Weber

Max Planck Institute for Psycholinguistics, Nijmegen

Foreign-accented speech deviates quite noticeably from the pronunciation norms of a target language. The deviations are, however, not random. Because the features of foreign-accented speech arise primarily from an interaction between the phonological structures of the speaker's native language and the target language, its phonetic characteristics are systematic and quite consistent across speakers. A direct consequence of the influence of a speaker's native language is that different language backgrounds can lead to different deviations from target-language norms: for example, while Dutch speakers pronounce English groove as groof, Japanese speakers pronounce it as groobo. The question is then whether listeners can recognize these pronunciation variants correctly, and whether familiarity with variant forms determines the amount of interference in spoken-word recognition. That is, are variant forms that are typical of the listeners' own accent easier to process than untypical forms? In this talk, I will present evidence from a number of cross-modal priming studies with native and non-native listeners, showing that the processing of foreign-accented words is specific to the listeners' language background and is in part robust to fine phonetic differences.
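
The core question can be illustrated with a toy variant lexicon; apart from the groof/groobo examples above, all mappings and names here are invented for exposition:

# Whether a variant maps onto the intended word depends on which accent's
# variants the listener is familiar with (toy data, hypothetical structure).
familiar_variants = {
    "dutch_listener": {"groof": "groove"},
    "japanese_listener": {"groobo": "groove"},
}

def recognize(listener, heard):
    # A variant is recognized only if it is typical for an accent the listener
    # knows; an unfamiliar variant interferes with spoken-word recognition.
    return familiar_variants[listener].get(
        heard, "no match: interference in word recognition")

print(recognize("dutch_listener", "groof"))   # recognized as groove
print(recognize("dutch_listener", "groobo"))  # unfamiliar variant: interference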



Hemispheric lateralisation of prosodic processing

Jurriaan Witteman

LUCL/LIBC, Universiteit Leiden

How suprasegmental (prosodic) information is processed in the brain is still a matter of debate. A central question within this debate is to what extent prosodic processing is lateralized to one of the two cerebral hemispheres. There are three non-mutually exclusive hypotheses regarding the lateralization of prosodic processing:
(1) Acoustic lateralization hypotheses state that lateralization of prosodic processing depends on the acoustic properties that are needed to analyze the prosodic information. For instance, there is some evidence that the left hemisphere is better at high-temporal-resolution analyses, while the right hemisphere is superior in pitch processing.
(2) The functional lateralization hypothesis posits that lateralization of prosodic processing depends on the function of the prosodic material: the right hemisphere would be specialized in processing emotional prosodic information, while the left hemisphere is more involved when prosody is processed linguistically.
(3) The attraction hypothesis extends the functional hypothesis by adding that the size of the prosodic unit matters for the lateralization of prosodic processing: for larger prosodic units the right hemisphere would be increasingly involved.
In the current experiments the dichotic listening technique was used in an attempt to test the differential explanatory power of the three lateralization hypotheses. This technique allows one to look at the relative contribution of each hemisphere to the processing of an auditory signal by presenting two different sounds simultaneously, one to each ear. During dichotic stimulation the ipsilateral projections from the ears to the cerebral hemispheres are inhibited, so that better performance of one ear at the task indicates specialization of the contralateral hemisphere. No evidence was found for function- or unit-dependent lateralization of prosodic processing in the current experiments. It is concluded that the results support the idea of continuous close cooperation between the two hemispheres in the processing of prosodic information.
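
Ear advantages in dichotic listening are commonly summarized with a laterality index of the form (R - L) / (R + L); the sketch below uses this standard index with invented scores, and is not necessarily the exact measure used in these experiments:

def laterality_index(right_ear_correct, left_ear_correct):
    # Positive values indicate a right-ear (left-hemisphere) advantage,
    # negative values a left-ear (right-hemisphere) advantage, 0 no advantage.
    return ((right_ear_correct - left_ear_correct)
            / (right_ear_correct + left_ear_correct))

# Near-equal ear performance, consistent with the reported close cooperation
# between the two hemispheres (scores invented).
print(laterality_index(right_ear_correct=41, left_ear_correct=39))  # ~0.025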



Location:

Lipsiusgebouw, Cleveringaplaats 1, Leiden, room 0.05

Directions:

From the front of the station, go straight ahead and follow Stationsweg and its continuation (Steenstraat). Just past the square with the row of small trees (on your left), turn left across the Blauwpoortsbrug, then turn right and continue to the beginning of the Rapenburg. Take the right-hand side of the canal, then the second street on the right (Doelensteeg), which leads to the Cleveringaplaats; a sign reading "LAK Theater" hangs on the facade of the building.