Sound and Music Science: Speech Production
Learning Objectives
How the human vocal organ makes speech sounds
How speech sounds are the product of the source, the filter and the radiation efficiency
Speech articulation by different parts of the vocal tract
Formants as resonances of the vocal tract
How the glottis and the vocal tract are studied
The Vocal Organs
The human vocal organ spans the oral and nasal cavities and extends down to the lungs and the diaphragm.
The lungs serve as a reservoir of air and a source of energy.
In speaking, air is forced from the lungs through the larynx into the three main cavities: the pharynx, the nasal cavity and the oral cavity.
The Vocal Organs continued
The Vocal Organs continued
Air exits through the nose and mouth.
Air can be inhaled and exhaled without much sound.
To produce speech sounds, the flow of air is interrupted by the vocal cords or by constrictions in the vocal tract (made by the tongue or lips).
The Larynx and the Vocal Folds
The most important sound source is the larynx, which contains the vocal folds (vocal cords).
The larynx is constructed mainly of cartilages.
One of these, the thyroid cartilage, forms the Adam's apple.
The Larynx
Larynx and Vocal Folds continued
The vocal folds are folds of ligament extending from the thyroid cartilage at the front to the arytenoid cartilages at the back.
The arytenoid cartilages are movable and control the size of the V-shaped opening between the vocal folds, called the glottis.
The glottis is open for breathing and closed for sound production.
Control of the glottal Opening
Glottal Openings
A sudden opening of the vocal folds produces a light cough or a glottal stop (a harsh h).
The folds are completely open for unvoiced consonants such as s, sh and f.
An intermediate opening produces a regular h sound.
Glottal Openings continued
Rapidly opening and closing the folds modulates the airflow; this rapid vibration produces a buzzing sound from which vowels and voiced consonants are created.
The lips perform analogous functions, as in the production of the p and f sounds.
Vibration of the Folds
The rate of vibration is determined mainly by the mass and tension of the folds.
The pressure and velocity of the air also contribute, but to a smaller degree.
The folds are typically longer and heavier in an adult male than in a female, so they vibrate at a lower frequency.
The typical speaking range spans about one octave; the singing range spans about two octaves.
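To make the mass and tension dependence concrete, the sketch below borrows the ideal vibrating-string formula, f = sqrt(T/μ)/(2L). Treating a vocal fold as an ideal string is a simplifying assumption (real folds are layered, non-uniform tissue), and the helper names and parameter values are hypothetical, chosen only to show how frequency scales and what one and two octaves above a base pitch look like.

```python
import math

def string_frequency(length_m: float, tension_n: float, mass_per_length_kg_m: float) -> float:
    """Fundamental of an ideal string fixed at both ends: f = sqrt(T / mu) / (2 L).

    Used here only as a hypothetical analogue of a vocal fold.
    """
    wave_speed = math.sqrt(tension_n / mass_per_length_kg_m)  # transverse wave speed in m/s
    return wave_speed / (2.0 * length_m)

def octave_range(low_hz: float, octaves: float) -> tuple[float, float]:
    """Return (low, high) for a span of `octaves`; each octave doubles the frequency."""
    return low_hz, low_hz * 2.0 ** octaves

# Hypothetical parameters chosen only to land near a low speaking pitch.
base = string_frequency(0.016, 0.015, 0.001)     # about 121 Hz
heavier = string_frequency(0.016, 0.015, 0.002)  # twice the mass: lower by a factor of sqrt(2)
longer = string_frequency(0.032, 0.015, 0.001)   # twice the length: one octave lower
tighter = string_frequency(0.016, 0.060, 0.001)  # four times the tension: one octave higher

speaking = octave_range(base, 1)  # roughly (121, 242) Hz
singing = octave_range(base, 2)   # roughly (121, 484) Hz
```

Longer and heavier folds both lower the frequency in this model, which matches the adult male versus female comparison above.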
Phases of a vocal fold vibration
Vibration Modes of the Folds
In the normal mode, the folds open and close completely during each cycle and generate puffs of air that are roughly triangular in shape.
In the open-phase mode, the folds do not close completely over their entire length, so the airflow never drops to zero. This produces a breathy voice.
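A minimal sketch of the two modes, assuming an idealized symmetric triangular pulse. The open quotient of 0.6, the leak level and the name `glottal_flow` are hypothetical choices for illustration. The breathy case simply adds a constant leak, so the flow never reaches zero.

```python
def glottal_flow(phase: float, open_quotient: float = 0.6, leak: float = 0.0) -> float:
    """Idealized glottal airflow (arbitrary units) at `phase` in [0, 1) of one cycle.

    The pulse rises linearly for the first half of the open phase, falls for the
    second half, and is zero while the folds are closed. `leak` adds a constant
    flow that never shuts off, as in the open-phase (breathy) mode.
    """
    phase %= 1.0
    half_open = open_quotient / 2.0
    if phase < half_open:
        pulse = phase / half_open
    elif phase < open_quotient:
        pulse = (open_quotient - phase) / half_open
    else:
        pulse = 0.0  # folds closed
    return leak + pulse

def one_cycle(n_samples: int = 100, **kwargs) -> list[float]:
    """Sample one period of the flow waveform."""
    return [glottal_flow(i / n_samples, **kwargs) for i in range(n_samples)]

normal = one_cycle()
breathy = one_cycle(leak=0.2)
print(min(normal), min(breathy))  # the normal mode reaches zero; the breathy mode does not
```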
Vibration Modes of the Folds continued
In the third mode, very little air passes, in short puffs, giving rise to a creaky voice.
The fourth mode (head voice or falsetto) is normally not used in speech.
Opening of the Vocal Folds
The vocal folds are opened by air pressure in the trachea, which blows them upward and outward.
As the air velocity between the folds increases, the pressure there decreases, and the folds are pulled back together by the Bernoulli force.
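The Bernoulli effect can be put into numbers. The sketch assumes steady flow through the glottis, an air density of 1.2 kg/m³, and hypothetical flow and area values; real fold motion is unsteady, so treat this only as an order-of-magnitude illustration.

```python
AIR_DENSITY = 1.2  # kg/m^3, room-temperature air (assumed value)

def glottal_velocity(volume_flow_m3_s: float, area_m2: float) -> float:
    """Mean particle velocity through the glottis from continuity: v = U / A."""
    return volume_flow_m3_s / area_m2

def bernoulli_pressure_drop(velocity_m_s: float, density: float = AIR_DENSITY) -> float:
    """Drop in static pressure (Pa) relative to still air: 0.5 * rho * v^2."""
    return 0.5 * density * velocity_m_s ** 2

flow = 2e-4  # hypothetical steady airflow, 200 cm^3/s
for area in (2e-5, 1e-5):  # hypothetical glottal areas: 0.2 cm^2, then 0.1 cm^2
    v = glottal_velocity(flow, area)
    print(f"area {area * 1e4:.1f} cm^2 -> v = {v:.0f} m/s, "
          f"pressure drop = {bernoulli_pressure_drop(v):.0f} Pa")
```

With the flow held fixed, halving the opening doubles the velocity and quadruples the pressure drop, so the suction grows as the folds approach each other.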
Miscellaneous Facts about the Folds
The folds are also essential in producing a whisper: they are held close together without vibrating, so air rushing through the narrow opening becomes turbulent.
Loudness is determined mostly by how rapidly the glottis closes: a faster closure produces stronger high harmonics in the glottal airflow spectrum, and these harmonics excite resonances of the vocal tract, raising the sound level.
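The link between closure speed and high harmonics can be checked numerically. This sketch compares two triangular pulses with the same peak and the same open time (so the same average flow) but different closing ramps, and sums the energy in harmonics 10 to 20 with a direct DFT over one period. The pulse shape, the open quotient of 0.6, the harmonic band and the function names are assumptions for illustration only.

```python
import cmath
import math

def glottal_pulse(phase: float, open_quotient: float, closing_fraction: float) -> float:
    """Asymmetric triangular flow pulse with peak 1 (hypothetical shape).

    The folds are open for `open_quotient` of the cycle; the last
    `closing_fraction` of that open phase is the closing ramp.
    """
    phase %= 1.0
    rise_end = open_quotient * (1.0 - closing_fraction)
    if phase < rise_end:
        return phase / rise_end
    if phase < open_quotient:
        return (open_quotient - phase) / (open_quotient - rise_end)
    return 0.0  # closed phase

def harmonic_magnitudes(samples: list[float], n_harmonics: int) -> list[float]:
    """Magnitudes of harmonics 1..n_harmonics of one sampled period (direct DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))) / n
        for k in range(1, n_harmonics + 1)
    ]

def high_harmonic_energy(closing_fraction: float, lo: int = 10, hi: int = 20, n: int = 400) -> float:
    """Summed squared magnitude of harmonics lo..hi for one pulse shape."""
    cycle = [glottal_pulse(i / n, 0.6, closing_fraction) for i in range(n)]
    mags = harmonic_magnitudes(cycle, hi)
    return sum(m * m for m in mags[lo - 1:hi])

abrupt = high_harmonic_energy(0.1)   # fast closure
gradual = high_harmonic_energy(0.5)  # symmetric pulse, slow closure
print(f"abrupt closure: {10 * math.log10(abrupt / gradual):.1f} dB more energy in harmonics 10-20")
```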
The Vocal Tract
The vocal tract transforms the buzzes and whooshes of the vocal folds and other sources into the intricate, subtle sounds of speech.
It can be thought of as a tube extending from the vocal folds to the lips, with a side branch leading to the nasal cavity.
Its typical length is about 17 cm.
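Treating the vocal tract as a uniform tube closed at the glottis and open at the lips (a common first approximation for a neutral vowel) gives resonances at odd multiples of c/4L. The sketch assumes c = 343 m/s (warmer air in the tract would shift the values slightly) and uses the 17 cm length from the slide; `tube_resonances` is an illustrative name.

```python
SPEED_OF_SOUND = 343.0  # m/s in room-temperature air (assumed)

def tube_resonances(length_m: float, count: int = 3, c: float = SPEED_OF_SOUND) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    Only odd numbers of quarter wavelengths fit: f_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

print([round(f) for f in tube_resonances(0.17)])  # [504, 1513, 2522]
```

A shorter tract raises every resonance in proportion, one reason resonances differ between speakers.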
The Vocal Tract continued
The Pharynx
The pharynx connects the larynx with the oral cavity.
Its shape is not easily varied, though its length can be adjusted slightly by raising or lowering the larynx at one end and the soft palate at the other.
The soft palate acts as a valve that either isolates the nasal cavity from the pharynx or connects the two.
The Epiglottis
Since food also passes through the pharynx on its way to the esophagus, the epiglottis serves as a valve that keeps food out of the trachea.
It also serves to isolate the esophagus acoustically from the larynx.
The epiglottis and the false vocal cords appear to play no significant role in speech production.
Nasal Cavity
Because its dimensions are fixed, the nasal cavity is virtually untunable.
The soft palate controls the airflow from the pharynx into the nasal cavity.
If the soft palate is lowered, air and sound waves flow into the nasal cavity, and resonance within it gives the sound a nasal quality.
Oral Cavity
Because its size and shape can be varied, the oral cavity is probably the most important single part of the vocal tract.
The flexible tongue, together with movements of the lips, cheeks and teeth, changes the size, shape and acoustics of the oral cavity.
The Oral Cavity continued
The lips control the size and shape of the mouth opening through which sound is radiated.
The mouth radiates more efficiently at higher frequencies, where the wavelength approaches the size of the opening.
This appears as a rise in radiation efficiency of about 6 dB per octave.
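A rise of 6 dB per octave means the radiated amplitude grows in proportion to frequency, since 20·log10(2) ≈ 6.02 dB. The sketch below assumes that idealization holds (it applies while the mouth opening is small compared with the wavelength) and shows why decibels are convenient: the product of source, filter and radiation becomes a sum. The function names are hypothetical.

```python
import math

def radiation_gain_db(freq_hz: float, ref_hz: float) -> float:
    """Idealized radiation gain relative to ref_hz: amplitude proportional to frequency,
    which is a rise of about 6 dB per octave."""
    return 20.0 * math.log10(freq_hz / ref_hz)

def output_level_db(source_db: float, filter_db: float, radiation_db: float) -> float:
    """Source x filter x radiation becomes a simple sum when every factor is in dB."""
    return source_db + filter_db + radiation_db

for f in (250, 500, 1000, 2000):
    print(f, round(radiation_gain_db(f, 250), 1))
```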
Articulation of Speech
Each syllable is made of one or more phonemes.
Phonemes are either vowels or consonants.
Vowels are always voiced (produced with vibration of the vocal folds).
Consonants are either voiced or unvoiced.
Articulation of Speech continued
There are 12 to 21 vowel sounds in English, depending on which speech scientist you talk to.
The count varies partly because opinions differ on whether a given sound is a pure vowel or a diphthong (a combination of two or more vowel sounds into one phoneme).
Vowels of American English
Articulation of Speech continued
Consonants are classified according to their manner of articulation:
Plosive or stop consonants (p, b, t, etc.) are produced by blocking the flow of air somewhere in the vocal tract (usually the mouth) and then releasing the pressure rather suddenly.
Fricatives (f, s, sh, etc.) are made by constricting the airflow to produce turbulence.
Articulation of Speech continued
Nasals (m, n, ng) are made by lowering the soft palate to connect the nasal cavity to the pharynx and then blocking the mouth cavity at some point along its length.
Liquids (r, l) are produced by raising the tip of the tongue while the oral cavity is somewhat constricted.
Semivowels or glides (w, y) are produced by holding the vocal tract briefly in a vowel position and then changing it rapidly to the vowel sound that follows.
Articulation of Speech continued
Consonants are further classified by their place of articulation, primarily the lips (labial), lips and teeth (labiodental), teeth (dental), gums (alveolar), palate (palatal) and glottis (glottal).
There are 24 consonant sounds in English.
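The manner and place categories form a natural lookup table. The sketch below covers only a hypothetical subset of English consonants, written with the letter spellings the slides use (sh, and th as in "thin"). Velar sounds such as k, g and ng are omitted because velar is not among the places listed above, and labels such as "palatal" for sh follow the slide's coarse scheme rather than a full phonetic description. `CONSONANTS` and `consonants_with` are illustrative names.

```python
# Hypothetical subset of English consonants: symbol -> (manner, place, voiced).
CONSONANTS = {
    "p": ("plosive", "labial", False),
    "b": ("plosive", "labial", True),
    "t": ("plosive", "alveolar", False),
    "d": ("plosive", "alveolar", True),
    "f": ("fricative", "labiodental", False),
    "v": ("fricative", "labiodental", True),
    "th": ("fricative", "dental", False),  # as in "thin"
    "s": ("fricative", "alveolar", False),
    "z": ("fricative", "alveolar", True),
    "sh": ("fricative", "palatal", False),  # coarse label, per the slide's scheme
    "h": ("fricative", "glottal", False),
    "m": ("nasal", "labial", True),
    "n": ("nasal", "alveolar", True),
    "l": ("liquid", "alveolar", True),
    "w": ("glide", "labial", True),
    "y": ("glide", "palatal", True),
}

def consonants_with(manner=None, place=None, voiced=None):
    """Return consonants matching every criterion that is not None, in table order."""
    return [
        symbol
        for symbol, (m, p, v) in CONSONANTS.items()
        if (manner is None or m == manner)
        and (place is None or p == place)
        and (voiced is None or v == voiced)
    ]

print(consonants_with(manner="nasal"))                 # ['m', 'n']
print(consonants_with(place="labiodental"))            # ['f', 'v']
print(consonants_with(manner="plosive", voiced=True))  # ['b', 'd']
```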
Consonants
Formants: Resonances of the Vocal Tract
Formants are peaks in the sound spectra of vowels that do not depend on the pitch.
They act as an envelope that modifies the amplitudes of the various harmonics of the source sound.
Each formant corresponds to one or more resonances of the vocal tract.
Formants continued
The formant frequencies are virtually independent of the source spectrum.
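The independence of formants from pitch can be illustrated by passing the harmonics of different fundamentals through the same resonance. The sketch assumes a flat source spectrum and a single second-order resonance at a hypothetical 500 Hz with an 80 Hz bandwidth; real vowels have several formants (F1, F2, F3) and a sloping source spectrum. The function names are illustrative.

```python
import math

def resonance_gain(freq_hz: float, formant_hz: float, bandwidth_hz: float) -> float:
    """Magnitude response of a simple second-order resonance (peak gain Q = F / B at f = F)."""
    r = freq_hz / formant_hz
    q = formant_hz / bandwidth_hz
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)

def harmonic_spectrum(f0_hz, formant_hz=500.0, bandwidth_hz=80.0, fmax_hz=4000.0):
    """(frequency, amplitude) of each harmonic after the formant filter, for a flat source."""
    n_harm = int(fmax_hz // f0_hz)
    return [(k * f0_hz, resonance_gain(k * f0_hz, formant_hz, bandwidth_hz))
            for k in range(1, n_harm + 1)]

def strongest_harmonic(f0_hz, **kwargs):
    """Frequency of the harmonic that the formant envelope boosts the most."""
    freq, _ = max(harmonic_spectrum(f0_hz, **kwargs), key=lambda pair: pair[1])
    return freq

for f0 in (100, 125, 130):
    print(f0, strongest_harmonic(f0))
```

For each fundamental, the strongest harmonic is the one closest to 500 Hz: the envelope stays put while the harmonics slide underneath it.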
Effect of Formants on Sound
Formant Frequencies (F1, F2, F3)
Prosodic Features of Speech
Prosodic features are characteristics that convey meaning, emphasis and emotion without changing the phonemes themselves.
They include pitch, rhythm and accent.
In English, prosodic features play a secondary role to the phonemes.
In Chinese, however, prosodic features such as pitch can change the meaning of a word.
Prosodic Features of Speech continued
Prosodic features tend to indicate the emotional state of the speaker.
There have been attempts to use them in lie detection, by analyzing recorded speech for evidence of stress.
Speech Analysis
Speech analysis requires measuring frequency and sound level as functions of time.
To do this effectively, three-dimensional representations are used.
A real-time spectrum analyzer rapidly computes the spectrum of a sound using the fast Fourier transform (FFT).
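A time-frequency-level representation can be built from repeated FFTs over short, overlapping frames (a short-time Fourier transform). The sketch uses a plain recursive radix-2 FFT, a Hann window, and hypothetical frame and hop sizes of 256 and 128 samples; it is an illustration of the idea, not how any particular analyzer is implemented.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(signal, frame_size=256, hop=128):
    """Return frames[t][bin] = level in dB for overlapping Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        spectrum = fft([signal[start + i] * window[i] for i in range(frame_size)])
        # Keep bins 0..N/2 (real input); a tiny floor keeps log10 away from zero.
        frames.append([20 * math.log10(abs(c) + 1e-12) for c in spectrum[: frame_size // 2 + 1]])
    return frames

# Demo: 1000 Hz for the first half, 2000 Hz for the second half, sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (1000 if n < 1024 else 2000) * n / sample_rate) for n in range(2048)]
frames = spectrogram(signal)
bin_hz = sample_rate / 256
for t in (0, len(frames) - 1):
    peak = max(range(len(frames[t])), key=frames[t].__getitem__)
    print(f"frame {t}: peak near {peak * bin_hz:.0f} Hz")
```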
Speech Analysis continued
The sound spectrograph was developed at Bell Labs in 1945 specifically to analyze speech.
It records a plot of sound level against frequency and time for a brief sample of speech.
Sound level is represented by the degree of darkness on a two-dimensional time-frequency graph.
Speech Analysis continued
The modern digital version uses filters to divide the incoming speech signal into many frequency bands.
The power passing through each filter is measured as a function of time.
The resulting spectrogram is printed in grayscale.
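The filter-bank approach and the grayscale display can be sketched together. Here each "filter" is a single-frequency Goertzel detector, which stands in for a real band-pass filter (a swapped-in technique, chosen because it is short). Power is mapped to gray levels over an assumed 60 dB range, with black for the loudest value. All function names are hypothetical.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate):
    """Power of `samples` in the DFT bin nearest freq_hz, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def filter_bank_frames(signal, band_centers_hz, sample_rate, frame_size=256):
    """frames[t][b] = power near band_centers_hz[b] in consecutive, non-overlapping frames."""
    return [
        [goertzel_power(signal[start:start + frame_size], f, sample_rate) for f in band_centers_hz]
        for start in range(0, len(signal) - frame_size + 1, frame_size)
    ]

def grayscale_image(frames, dynamic_range_db=60.0):
    """Map power to gray levels: 0 (black) for the loudest value, 255 (white) at or below -range dB."""
    max_power = max(max(row) for row in frames)
    if max_power <= 0.0:
        return [[255 for _ in row] for row in frames]  # silence: all white
    image = []
    for row in frames:
        gray_row = []
        for power in row:
            level_db = 10.0 * math.log10(max(power / max_power, 1e-30))  # floor avoids log10(0)
            level_db = max(level_db, -dynamic_range_db)  # clamp to the display range
            gray_row.append(round(255 * -level_db / dynamic_range_db))
        image.append(gray_row)
    return image

sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
image = grayscale_image(filter_bank_frames(tone, [500, 1000, 2000], sample_rate))
print(image)  # the 1000 Hz column should be black (0); the other bands near white
```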
Schematic of a Sound Spectrograph