Sound and Music Science. Speech Production


Learning Objectives
How the human vocal organs make speech sounds; how speech sounds are the product of the source, the filter, and the radiation efficiency; how different parts of the vocal tract articulate speech; formants as resonances of the vocal tract; how the glottis and the vocal tract are studied.

The Vocal Organs
The vocal organs span the oral and nasal cavities and stretch down to the lungs and the diaphragm. The lungs serve as a reservoir of air and a source of energy. In speaking, air is forced from the lungs through the larynx into the three main cavities: the pharynx, the nasal cavity, and the oral cavity.

The Vocal Organs continued

The Vocal Organs continued
Air exits through the nose and mouth. Air can be inhaled and exhaled without much sound. To produce speech sounds, the flow of air is interrupted by the vocal cords or by constrictions in the vocal tract (made by the tongue or lips).

The Larynx and the Vocal Folds
The most important sound source is the larynx, which contains the vocal folds (or vocal cords). The larynx is constructed mainly of cartilages. One of these, the thyroid cartilage, forms the Adam's apple.

The Larynx

Larynx and Vocal Folds continued
The vocal folds are folds of ligament extending from the thyroid cartilage at the front to the arytenoid cartilages at the back. The arytenoid cartilages are movable and control the size of the V-shaped opening between the vocal cords (the glottis). The glottis is open for breathing and closed for sound production.

Control of the Glottal Opening

Glottal Openings
A sudden opening of the vocal folds produces a light cough or a glottal stop (a harsh "h"). The folds are completely open for unvoiced consonants such as "s", "sh", and "f". An intermediate opening produces a regular "h" sound.

Glottal Openings continued
When the folds open and close rapidly, they modulate the air flow, and the rapid vibration produces a buzzing sound from which vowels and voiced consonants are created. The vocal folds and the lips have analogous functions, as in the production of the "p" and "f" sounds.

Vibration of the Folds
The rate of vibration is determined mainly by the mass and tension of the folds; the pressure and velocity of the air contribute to a smaller degree. The folds are typically longer and heavier in an adult male than in a female and vibrate at a lower frequency. The typical speech range is one octave, and the singing range is two octaves.
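Since one octave is a doubling of frequency, the speech and singing ranges can be turned into frequency limits. The following is a minimal sketch; the 100 Hz lowest pitch is a hypothetical value chosen for illustration, not a figure from these slides.

```python
def range_top(lowest_hz: float, octaves: float) -> float:
    """Upper frequency of a range spanning `octaves` octaves above `lowest_hz`."""
    return lowest_hz * 2.0 ** octaves


lowest = 100.0  # hypothetical lowest pitch in Hz (assumption, not from the slides)
speech_top = range_top(lowest, 1)   # one octave: typical speech range
singing_top = range_top(lowest, 2)  # two octaves: typical singing range
print(f"speech:  {lowest:.0f} Hz to {speech_top:.0f} Hz")
print(f"singing: {lowest:.0f} Hz to {singing_top:.0f} Hz")
```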

Phases of a vocal fold vibration

Vibration Modes of the Folds
In the normal mode, the folds open and close completely during each cycle and generate puffs of air that are roughly triangular in shape. In the open-phase mode, the folds do not close completely over their entire length, so the air flow does not drop to zero. This produces a breathy voice.

Vibration Modes of the Folds continued
In the third mode, very little air passes, in short puffs, giving rise to a creaky voice. A fourth mode (head voice or falsetto) is normally not used in speech.

Opening of the Vocal Folds
The vocal folds are opened by air pressure in the trachea, which blows them upward and outward. As the air velocity between them increases, the pressure between them decreases, and they are pulled back together by the Bernoulli force.
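The link between air velocity and pressure can be illustrated with Bernoulli's relation for steady, frictionless flow, in which the pressure falls by half the air density times the velocity squared. This is a rough sketch only: the air density and the velocities are assumed values, and real glottal flow is not steady.

```python
AIR_DENSITY = 1.2  # kg/m^3, assumed value for air near room temperature


def bernoulli_pressure_drop(velocity_m_s: float, density: float = AIR_DENSITY) -> float:
    """Pressure drop in pascals for air moving at the given speed (0.5 * rho * v^2)."""
    return 0.5 * density * velocity_m_s ** 2


# Hypothetical air velocities between the folds, for illustration only.
for velocity in (5.0, 10.0, 20.0):
    drop = bernoulli_pressure_drop(velocity)
    print(f"{velocity:5.1f} m/s -> pressure lower by {drop:6.1f} Pa")
```

Doubling the velocity quadruples the pressure drop, which is why the pull on the folds grows quickly as the gap narrows and the air speeds up.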

Miscellaneous Facts about the Folds
The folds are essential in the production of a whisper. Loudness in speech is determined mostly by the rate of glottal closure: faster closure produces higher harmonics in the glottal airflow spectrum, and these harmonics excite resonances of the vocal tract, leading to a buildup in the sound level.
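The effect of closure rate on the harmonic content can be checked numerically. The sketch below models one period of glottal airflow as a triangular puff (linear rise, linear fall, then zero) and compares how much of the harmonic power lies in the 5th harmonic and above for a slow and a fast closure. The pulse shape, the sample counts, and the split at the 5th harmonic are all assumptions made for illustration.

```python
import cmath
import math


def glottal_pulse(n_period, n_open, n_close):
    """One period of a triangular airflow pulse: linear rise, linear fall, then zero."""
    pulse = []
    for n in range(n_period):
        if n < n_open:
            pulse.append(n / n_open)
        elif n < n_open + n_close:
            pulse.append(1.0 - (n - n_open) / n_close)
        else:
            pulse.append(0.0)
    return pulse


def harmonic_power(period, k):
    """Power of the k-th harmonic, computed by a direct DFT over one period."""
    n_period = len(period)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_period)
                for n, x in enumerate(period))
    return abs(coeff) ** 2


def high_harmonic_fraction(period, split=5, top=20):
    """Share of the power in harmonics 1..top that lies in harmonics split..top."""
    total = sum(harmonic_power(period, k) for k in range(1, top + 1))
    high = sum(harmonic_power(period, k) for k in range(split, top + 1))
    return high / total


# Hypothetical pulses: same opening time, different closing times (in samples).
slow = high_harmonic_fraction(glottal_pulse(200, 80, 60))
fast = high_harmonic_fraction(glottal_pulse(200, 80, 10))
print(f"slow closure: {slow:.4f} of harmonic power at or above the 5th harmonic")
print(f"fast closure: {fast:.4f} of harmonic power at or above the 5th harmonic")
```

The fast-closing pulse puts a larger share of its power into the upper harmonics, which are the ones that can excite the vocal tract resonances.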

The Vocal Tract
The vocal tract is responsible for transforming the buzzes and whooshes of the vocal folds and other sources into the intricate, subtle sounds of speech. It can be thought of as a tube extending from the vocal folds to the lips, with a side branch leading to the nasal cavity. Its typical length is 17 cm.
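The tube picture gives a quick estimate of where the vocal tract resonates. The sketch below assumes a uniform tube that is closed at the vocal folds and open at the lips, so that it resonates at odd multiples of a quarter wavelength, and it ignores the nasal side branch. These simplifications and the speed of sound used are assumptions, not statements from the slides.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air


def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


# 17 cm is the typical vocal tract length quoted above.
for number, freq in enumerate(tube_resonances(0.17), start=1):
    print(f"resonance {number}: about {freq:.0f} Hz")
```

For a 17 cm tube this gives resonances near 500, 1500, and 2500 Hz. A real vocal tract is not uniform, so changing its shape moves these resonances.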

The Vocal Tract continued

The Pharynx
The pharynx connects the larynx with the oral cavity. Its shape is not easily varied, though its length can be adjusted slightly by raising or lowering the larynx at one end and the soft palate at the other. The soft palate acts as a valve that isolates the nasal cavity from the pharynx or connects the two.

The Epiglottis
Since food also passes through the pharynx on its way to the esophagus, the epiglottis serves as a valve to prevent food from going into the trachea. It also serves to isolate the esophagus acoustically from the larynx. The epiglottis and the false vocal cords appear to play no significant role in speech production.

Nasal Cavity
Because of its fixed dimensions, the nasal cavity is virtually untunable. The soft palate controls the air flow from the pharynx to the nasal cavity. If the soft palate is lowered, air and sound waves flow into the nasal cavity, and a nasal effect results from resonance within it.

Oral Cavity
Because its size and shape can be varied, the oral cavity is probably the most important single part of the vocal tract. The flexibility of the tongue, along with the movement of the lips, cheeks, and teeth, changes the size, shape, and acoustics of the oral cavity.

The Oral Cavity continued
The lips control the size and shape of the mouth opening through which sound is radiated. The mouth radiates more efficiently at higher frequencies, where the wavelength approaches the size of the opening. This shows up as a 6 dB per octave rise in radiation efficiency.
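The 6 dB per octave rise can be turned into a simple calculation of gain relative to a reference frequency. The sketch below treats the slope as constant over the whole range, which is a simplification; the 100 Hz reference and the listed frequencies are hypothetical choices.

```python
import math

SLOPE_DB_PER_OCTAVE = 6.0  # radiation efficiency rise quoted above


def radiation_gain_db(freq_hz, ref_hz=100.0):
    """Gain in dB relative to ref_hz, assuming a constant 6 dB per octave slope."""
    octaves_above_ref = math.log2(freq_hz / ref_hz)
    return SLOPE_DB_PER_OCTAVE * octaves_above_ref


# Hypothetical frequencies, each one octave above the previous one.
for freq in (100.0, 200.0, 400.0, 800.0, 1600.0):
    print(f"{freq:6.0f} Hz: {radiation_gain_db(freq):+5.1f} dB relative to 100 Hz")
```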

Articulation of Speech
Each syllable is made of one or more phonemes. Phonemes are either vowels or consonants. Vowels are always voiced (produced with vibration of the vocal folds). Consonants are either voiced or unvoiced.

Articulation of Speech continued
There are 12 to 21 vowel sounds in English, depending on which speech scientist you talk to. Opinions vary as to whether a given sound is a pure vowel or a diphthong (a combination of two or more vowel sounds into one phoneme).

Vowels of American English

Articulation of Speech continued
Consonants are classified according to their manner of articulation. Plosive or stop consonants (p, b, t, etc.) are produced by blocking the flow of air somewhere in the vocal tract (usually the mouth) and releasing the pressure rather suddenly. Fricatives (f, s, sh, etc.) are made by constricting the air flow to produce turbulence.

Articulation of Speech continued
Nasals (m, n, ng) are made by lowering the soft palate to connect the nasal cavity to the pharynx and then blocking the mouth cavity at some point along its length. Liquids (r, l) are produced by raising the tip of the tongue while the oral cavity is somewhat constricted. Semivowel or glide consonants (w, y) are produced by keeping the vocal tract briefly in a vowel position and then changing it rapidly to the vowel sound that follows.

Articulation of Speech continued
Consonants are further classified according to their place of articulation: primarily the lips (labial), the teeth (dental), the gums (alveolar), the palate (palatal), the glottis (glottal), and the lips and teeth together (labiodental). There are 24 consonant sounds in English.

Consonants

Formants: Resonances of the Vocal Tract
Formants are the peaks in the sound spectra of vowels that are independent of pitch. They appear as envelopes that modify the amplitudes of the various harmonics of the source sound. Each formant corresponds to one or more resonances of the vocal tract.

Formants continued
The frequencies of the formants are virtually independent of the source spectrum.
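The idea that formants act as a fixed envelope over the harmonics can be sketched in a few lines. The code below assumes a flat source spectrum (every harmonic starts with amplitude 1, which real voices do not have), three hypothetical formants at 500, 1500, and 2500 Hz, and a simple bell-shaped resonance curve with an 80 Hz bandwidth. None of these values come from the slides.

```python
import math


def envelope(freq_hz, formants_hz, bandwidth_hz=80.0):
    """Formant envelope: the sum of one bell-shaped resonance peak per formant."""
    half_width = bandwidth_hz / 2.0
    return sum(1.0 / math.sqrt(1.0 + ((freq_hz - f) / half_width) ** 2)
               for f in formants_hz)


def harmonic_amplitudes(f0_hz, formants_hz, f_max_hz=3000.0):
    """(frequency, amplitude) for each harmonic of f0 up to f_max, flat source assumed."""
    harmonics = []
    k = 1
    while k * f0_hz <= f_max_hz:
        freq = k * f0_hz
        harmonics.append((freq, envelope(freq, formants_hz)))
        k += 1
    return harmonics


FORMANTS = [500.0, 1500.0, 2500.0]  # hypothetical formant frequencies in Hz

for f0 in (100.0, 130.0, 150.0):  # hypothetical pitches
    freq, amp = max(harmonic_amplitudes(f0, FORMANTS), key=lambda h: h[1])
    print(f"pitch {f0:5.1f} Hz: strongest harmonic at {freq:6.1f} Hz")
```

Changing the pitch moves the harmonics, but the strongest harmonic always lands next to one of the formants, because the envelope itself does not move.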

Effect of Formants on Sound

Formant Frequencies (F1, F2, F3)

Prosodic Features of Speech
Prosodic features are characteristics that convey meaning, emphasis, and emotion without actually changing the phonemes. They include pitch, rhythm, and accent. In English, prosodic features play a secondary role to the phonemes. In Chinese, however, prosodic features change the meaning of a phoneme.

Prosodic Features of Speech continued
Prosodic features tend to indicate the emotional state of the speaker. There have been attempts to use them in lie detection, analyzing recorded speech for evidence of stress.

Speech Analysis
Speech analysis requires that we analyze frequency and sound level as functions of time. To do this effectively, three-dimensional representations are used. A real-time spectrum analyzer rapidly analyzes the spectrum of a sound using the fast Fourier transform (FFT).

Speech Analysis continued
The sound spectrograph was developed by Bell Labs in 1945 specifically to analyze speech. It records a plot of sound level against frequency and time for a brief sample of speech. Sound level is represented by the degree of blackness in a 2-D time-frequency graph.

Speech Analysis continued
The modern digital version uses filters to divide the incoming speech signal into many frequency bands. The power that comes through each filter is measured as a function of time. The speech spectrogram is printed in grayscale.
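The band-by-band measurement of power over time can be sketched with a short-time discrete Fourier transform, in which each DFT bin plays the role of one band-pass filter. This is a stand-in for a real filter bank, not a description of any particular spectrograph, and the sample rate, frame length, and two-tone test signal are hypothetical.

```python
import cmath
import math


def band_powers(frame, n_bands):
    """Power in the first n_bands DFT bins of one frame (one bin per frequency band)."""
    n = len(frame)
    powers = []
    for k in range(n_bands):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        powers.append(abs(coeff) ** 2)
    return powers


def spectrogram(signal, frame_len, n_bands):
    """Rows are consecutive time frames; columns are frequency bands."""
    return [band_powers(signal[start:start + frame_len], n_bands)
            for start in range(0, len(signal) - frame_len + 1, frame_len)]


SAMPLE_RATE = 8000  # Hz, hypothetical
FRAME_LEN = 80      # 10 ms frames, so neighboring bands are 100 Hz apart

# Hypothetical test signal: 0.1 s of a 500 Hz tone, then 0.1 s of a 1500 Hz tone.
signal = [math.sin(2 * math.pi * (500 if i < 800 else 1500) * i / SAMPLE_RATE)
          for i in range(1600)]

rows = spectrogram(signal, FRAME_LEN, n_bands=40)
loudest = [max(range(len(row)), key=row.__getitem__) * SAMPLE_RATE // FRAME_LEN
           for row in rows]
print(loudest)
```

Each row of `rows` is one time slice of the spectrogram. For this test signal the loudest band sits at 500 Hz for the first ten frames and at 1500 Hz for the last ten; mapping each power value to a shade of gray would give the familiar printed picture.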

Schematic of a Sound Spectrograph