Research

I am interested in how speakers formulate and produce multi-phrase utterances and how listeners recognize them. During my Ph.D., I conducted research on speech planning and production, examining how articulatory movements and acoustic signals vary in relation to syntactic and prosodic structure. In my ongoing postdoctoral training, I am investigating the intelligibility of connected speech, with a particular focus on bilingual speech. I use both experimental and computational methods in my research.

Below is a summary of some of my past and current projects. Clicking on a figure will take you to the most relevant publication (except for work that is under review). Please see my CV for a full list of publications and presentations, and feel free to email me for any PDFs, slides, or posters.


Automatic recognition of L2 speech-in-noise

In this study, we compared four state-of-the-art Automatic Speech Recognition (ASR) systems (Google, HuBERT, wav2vec 2.0, Whisper) and human listeners on word recognition accuracy for second language (L2) speech embedded in noise. We found that one system, Whisper, performed at levels similar to (or in some cases better than) human listeners. However, the content of its responses diverged substantially from that of human responses. This suggests that ASR could be used to predict human intelligibility, but should be applied with caution.
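For illustration only, here is a minimal Python sketch of how word recognition accuracy might be scored from ASR output, assuming keyword-based scoring with the openai-whisper package; the file name, model size, and keyword list are placeholders rather than materials from the study.

```python
# Minimal sketch: scoring word recognition accuracy of a Whisper transcript
# against target keywords, as one might do for speech-in-noise stimuli.
# Assumes the openai-whisper package; paths and keywords are placeholders.
import re
import whisper

def keyword_accuracy(transcript: str, keywords: list[str]) -> float:
    """Proportion of target keywords that appear in the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords)

model = whisper.load_model("small")           # hypothetical model size
result = model.transcribe("stimulus_01.wav")  # hypothetical noisy L2 stimulus
print(keyword_accuracy(result["text"], ["dog", "chased", "ball"]))  # placeholder keywords
```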

Proactive and reactive F0 adjustments in speech

A production experiment was conducted to investigate speakers' (i) pre-planned and (ii) adaptive F0 control. In particular, the experiment examined whether speakers vary F0 parameters (i) according to the initially planned utterance length and (ii) in response to unanticipated changes in that length. An experimental paradigm was developed in which the visual stimuli that cue parts of the utterance are delayed until after participants have initiated the utterance. Analyses of F0 trajectories found strong evidence for both pre-planned and adaptive F0 control.

F0 control: pitch targets vs. pitch register

The present study examined what speakers control most directly to produce variation in F0, by evaluating the target-control and register-control hypotheses. Under the target-control hypothesis, speakers mainly control individual pitch targets to produce variation in F0, whereas under the register-control hypothesis, F0 variation arises from control of the pitch register (the F0 space in which targets are realized). These hypotheses were assessed by examining correlations between F0 peaks and valleys in empirical F0 trajectories and through computational modeling. The results suggest that pitch register may be a more important control parameter than previous models have assumed.
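As an illustration of the kind of peak-valley correlation analysis described above, here is a minimal Python sketch run on synthetic F0 trajectories; it is not the study's modeling, and all parameters are placeholders.

```python
# Minimal sketch: correlating mean F0 peaks with mean F0 valleys across utterances,
# one way to probe whether targets move together (register control) or independently.
# The trajectories are synthetic placeholders, not the study's data.
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
peak_vals, valley_vals = [], []
for _ in range(50):                                    # 50 fake utterances
    t = np.linspace(0, 2, 200)
    register = rng.normal(200, 20)                     # utterance-level register shift (Hz)
    f0 = register + 30 * np.sin(2 * np.pi * 2 * t) + rng.normal(0, 2, t.size)
    peaks, _ = find_peaks(f0)
    valleys, _ = find_peaks(-f0)
    peak_vals.append(f0[peaks].mean())
    valley_vals.append(f0[valleys].mean())

r, p = pearsonr(peak_vals, valley_vals)
print(f"peak-valley correlation: r = {r:.2f}, p = {p:.3f}")
```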

Functional relations between speech rate and phonetic variables

This study examined how phonetic measures covary with speech rate, specifically assessing whether there is evidence for linear and/or non-linear relations with rate, and how those relations may differ across phrase boundaries. Productions of English non-restrictive relative clauses (NRRCs) and restrictive relative clauses (RRCs) were collected using a method in which variation in speech rate was cued by the speed of motion of a visual stimulus. Analyses of articulatory and acoustic variables showed that variables associated with the phrase boundary that follows the RC were more susceptible to rate variation than those at the boundary that precedes the RC. Phonetic variables at the post-RC boundary also showed evidence of non-linear relations with rate, which suggests floor or ceiling attenuation effects at extreme rates.
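One simple way to test for non-linear relations with rate is to compare linear and quadratic fits with an information criterion. The sketch below illustrates this on synthetic data; it is not the analysis used in the study, and the variable names are placeholders.

```python
# Minimal sketch: comparing linear and quadratic fits of a phonetic measure
# (e.g., a boundary-related duration) against speech rate, using AIC.
# Synthetic data stand in for the study's measurements.
import numpy as np

def aic(y, y_hat, k):
    """AIC for a least-squares fit with k parameters (Gaussian errors assumed)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(1)
rate = rng.uniform(2, 8, 120)                                            # syllables/s (fake)
dur = 400 - 40 * rate + 2.5 * rate**2 + rng.normal(0, 15, rate.size)     # ms, with curvature

lin = np.polyval(np.polyfit(rate, dur, 1), rate)
quad = np.polyval(np.polyfit(rate, dur, 2), rate)
print(f"AIC linear: {aic(dur, lin, 2):.1f}, AIC quadratic: {aic(dur, quad, 3):.1f}")
```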

Temporal localization of syntactic-prosodic information

This study used a novel neural network-based analysis method for temporally localizing prosodic information associated with syntactic contrast in acoustic and articulatory signals. Neural networks were trained on multi-dimensional acoustic and articulatory data to classify the two types of relative clauses (RRCs vs. NRRCs), and the networks' accuracies on test data were analyzed. The results revealed two different patterns: syntactically conditioned prosodic information was either (i) widely distributed around the boundaries or (ii) narrowly concentrated at specific locations. The findings suggest that prosodic expression of syntactic contrasts does not occur in a uniform way or at a fixed location.
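The sketch below conveys the general idea of temporal localization with classifiers: train on successive time windows of the signal and observe where accuracy rises above chance. It uses synthetic data and a generic scikit-learn classifier, not the study's networks, signals, or analysis pipeline.

```python
# Minimal sketch: asking where in time a classifier can distinguish two
# syntactic conditions (RRC vs. NRRC) from windows of a multi-dimensional signal.
# Data are synthetic placeholders; only the region around t = 50 is informative.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_time, n_feat = 200, 100, 6
X = rng.normal(0, 1, (n_trials, n_time, n_feat))
y = rng.integers(0, 2, n_trials)
X[y == 1, 45:55, :] += 1.0                       # class difference only around t = 50

window = 10
for start in range(0, n_time, window):
    Xw = X[:, start:start + window, :].reshape(n_trials, -1)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    acc = cross_val_score(clf, Xw, y, cv=5).mean()
    print(f"window {start:3d}-{start + window:3d}: accuracy = {acc:.2f}")
```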

Phonetic evidence for hierarchical prosodic phrases

This study shows that the existing phonetic evidence for hierarchical organization of prosodic phrases is ambiguous, and that a non-hierarchical organization of phrases is also consistent with the data. To compare hierarchical and non-hierarchical organization models, the study analyzed productions of English NRRCs and RRCs at varying speech rates. We examined whether articulatory and acoustic variables at phrase boundaries exhibit evidence of speech-rate-dependent mixtures of categories using regression mixture models. Overall, the evidence for multiple levels of prosodic phrase categories was not very compelling; the measures most supportive of hierarchical phrase structure were boundary-related slowing and gestural overlap at boundaries.
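For readers less familiar with regression mixture models, below is a minimal sketch of a two-component mixture of linear regressions fit by EM on synthetic data; the study's actual models, predictors, and data differ.

```python
# Minimal sketch: a two-component mixture of linear regressions fit by EM,
# the kind of model that can test whether a boundary measure falls into
# rate-dependent mixtures of categories. Data are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(2, 8, n)                             # speech rate (fake)
z = rng.integers(0, 2, n)                            # latent boundary category
y = np.where(z == 0, 100 - 5 * x, 180 - 5 * x) + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), x])
betas = [np.array([90.0, -4.0]), np.array([170.0, -6.0])]   # rough initial guesses
sigma, weights = 10.0, np.array([0.5, 0.5])

for _ in range(50):                                  # EM iterations
    # E-step: responsibilities of each regression component for each point
    dens = np.column_stack([
        weights[k] * np.exp(-0.5 * ((y - X @ betas[k]) / sigma) ** 2)
        for k in range(2)
    ])
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per component, then weights and sigma
    for k in range(2):
        w = resp[:, k]
        betas[k] = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    weights = resp.mean(axis=0)
    preds = np.column_stack([X @ b for b in betas])
    sigma = np.sqrt(np.sum(resp * (y[:, None] - preds) ** 2) / n)

print("component coefficients:", [b.round(2) for b in betas])
print("mixing weights:", weights.round(2))
```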

Teaching

Introduction to Phonetics and Phonology

[Spring 2019] Department of Linguistics, Cornell University
Instructor: Draga Zec

Elementary Korean I

[Fall 2018, 2019, 2020] Department of Asian Studies, Cornell University

Elementary Korean II

[Spring 2022] Department of Asian Studies, Cornell University