Seung-Eun Kim

I am a Postdoctoral Fellow in the Department of Linguistics at Northwestern University. I am working with Dr. Matt Goldrick and Dr. Ann Bradlow on the NSF projects titled "Enhancing research on speech and deep learning through holistic acoustic analysis" and "Talker-specific, language-specific, and language-general characteristics of first-language and second-language speech."

I received my Ph.D. from the Department of Linguistics at Cornell University in December 2022 and was advised by Dr. Sam Tilsen. I received my B.A. and M.A. from the Department of English Language & Literature at Yonsei University, South Korea.

Contact: seungeun.kim@northwestern.edu

Research

I am interested in how speakers formulate and produce multi-phrase utterances and how listeners recognize them. During my Ph.D., I conducted research on speech planning and production, examining how articulatory movements and acoustic signals vary in relation to syntactic and prosodic structure. In my ongoing postdoctoral research, I am investigating the intelligibility of connected speech, especially speech produced by bilingual speakers. I use both experimental and computational methods in my research.

Below is a summary of some of my past and current projects. Clicking on a figure will take you to the most relevant publication (except for work that is under review). Please check out my CV for a full list of publications and presentations, and feel free to email me for PDFs or slides/posters.

Predicting relative intelligibility of second-language (L2) talkers using self-supervised representations

In this study, we modeled the relative speech intelligibility of over 100 L2 English talkers using a self-supervised learning model. For each recording, we extracted representations from the self-supervised model, which formed a trajectory in a perceptual similarity space. We found that the average distance between an L2 talker's trajectories and those of the L1 talker group correlates with that talker's intelligibility as assessed by human L1 listeners. Notably, the distance measure predicted the relative intelligibility of L2 talkers more robustly than traditional acoustic-phonetic cues (e.g., speech rate, mean pitch). This study emphasizes the importance of holistic assessment of speech in modeling speech intelligibility.
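As a rough illustration of the distance measure above, here is a minimal sketch. It assumes each recording has already been turned into a (frames × dimensions) matrix of self-supervised features, and it uses dynamic time warping (DTW) with Euclidean frame costs as a stand-in for whatever trajectory distance the study actually used. The function names are hypothetical, and the data are random placeholders, not study data.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two trajectories of shape (frames, dims).

    Assumption: Euclidean frame-to-frame cost, normalized by n + m so that
    longer recordings are not penalized simply for having more frames.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between every frame of a and every frame of b
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cheapest way to reach (i, j): insertion, deletion, or match
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m] / (n + m))


def mean_distance_to_group(talker_trajs, group_trajs) -> float:
    """Average DTW distance from each of a talker's recordings to every
    recording in a reference group (e.g., the L1 talkers)."""
    dists = [dtw_distance(t, g) for t in talker_trajs for g in group_trajs]
    return float(np.mean(dists))


# Toy usage with random "features": one talker drawn from the same
# distribution as the reference group, one shifted away from it.
rng = np.random.default_rng(0)
l1_group = [rng.normal(0.0, 1.0, size=(20, 8)) for _ in range(5)]
near_talker = [rng.normal(0.0, 1.0, size=(18, 8)) for _ in range(3)]
far_talker = [rng.normal(1.5, 1.0, size=(22, 8)) for _ in range(3)]
near_score = mean_distance_to_group(near_talker, l1_group)
far_score = mean_distance_to_group(far_talker, l1_group)
print(f"near: {near_score:.2f}, far: {far_score:.2f}")
```

Under this sketch, a larger average distance from the L1 group would be read as lower predicted intelligibility. The correlation with listener scores would then be computed across talkers.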

Automatic recognition of L2 speech-in-noise

In this study, we compared four state-of-the-art Automatic Speech Recognition (ASR) systems (Google, HuBERT, wav2vec 2.0, Whisper) with human listeners on word recognition accuracy for second-language (L2) speech embedded in noise. We found that one system, Whisper, performed at levels similar to (and in some cases better than) human listeners. However, when speech was embedded in high levels of noise, the content of its responses diverged substantially from human responses. This suggests that ASR could be used to predict human intelligibility but should be applied with caution.

Proactive and reactive F0 adjustments in speech

A production experiment was conducted to investigate speakers' (i) proactive and (ii) reactive F0 control. Specifically, the experiment examined whether speakers vary F0 parameters (i) according to the initially planned utterance length and (ii) in response to unanticipated changes in that length. An experimental paradigm was developed in which the visual stimuli cueing parts of the utterance were delayed until after participants had initiated the utterance. Analyses of F0 trajectories found evidence for both proactive and reactive F0 control.

The Gesture-Field-Register framework for modeling F0 control

This study proposed a modeling framework for speakers' pitch (F0) control, building on Articulatory Phonology and Task Dynamics. The framework treats F0 gestures as the fundamental control units. F0 gestures have normalized targets; at each time point, the targets of the currently active gestures, together with the forces of the neutral system, determine the (normalized) dynamic target of the F0 tract variable. These dynamic targets are then transformed into F0 values in Hz via F0 register parameters. By fitting empirical F0 data spanning multiple phrases (with several F0 peaks and valleys), we found evidence supporting a control mechanism in which speakers have invariant F0 targets but a variable F0 register. This suggests that, to produce F0 variation within an utterance, speakers may primarily adjust their F0 space while keeping F0 targets relatively fixed.
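A toy sketch can make the two-stage mapping above (gestures to normalized dynamic target, then register to Hz) more concrete. Treat it only as an illustration. It assumes that the dynamic target is a strength-weighted average of the active gestures' targets and a neutral target (in the spirit of gestural blending in Task Dynamics), and that the register is a floor and range in Hz. The function names, parameter names, and values are hypothetical and are not the published model's equations.

```python
def dynamic_target(t, gestures, neutral_target=0.0, neutral_strength=1.0):
    """Normalized dynamic target of the F0 tract variable at time t.

    Hypothetical blending rule: a strength-weighted average of the targets
    of all gestures active at t, plus a neutral target that is always active.
    """
    num = neutral_strength * neutral_target
    den = neutral_strength
    for g in gestures:
        if g["onset"] <= t < g["offset"]:
            num += g["strength"] * g["target"]
            den += g["strength"]
    return num / den


def to_hz(x, floor_hz, range_hz):
    """Map a normalized target (0 = bottom, 1 = top of the register) to Hz
    using hypothetical register parameters (floor and range)."""
    return floor_hz + x * range_hz


# Two peak gestures with the SAME normalized target in two phrases.
gestures = [
    {"onset": 0.1, "offset": 0.3, "target": 1.0, "strength": 4.0},
    {"onset": 0.6, "offset": 0.8, "target": 1.0, "strength": 4.0},
]
# Only the register changes between phrases (lowered floor and range).
peak1 = to_hz(dynamic_target(0.2, gestures), floor_hz=100.0, range_hz=100.0)
peak2 = to_hz(dynamic_target(0.7, gestures), floor_hz=90.0, range_hz=80.0)
print(f"phrase 1 peak: {peak1:.1f} Hz, phrase 2 peak: {peak2:.1f} Hz")
```

In this toy setup, both peaks share a normalized target of 0.8, yet the second peak comes out lower (154 Hz vs. 180 Hz) purely because of the register change. That is the kind of "invariant target, variable register" pattern the study describes.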

Functional relations between speech rate and phonetic variables

This study examined how phonetic measures covary with speech rate, specifically assessing whether there is evidence for linear and/or non-linear relations with rate, and how those relations may differ across phrase boundaries. Productions of English non-restrictive (NRRC) and restrictive (RRC) relative clauses were collected using a method in which variation in speech rate was cued by the speed of motion of a visual stimulus. Analyses of articulatory and acoustic variables showed that variables associated with the phrase boundary following the RC were more susceptible to rate variation than those at the boundary preceding the RC. Phonetic variables at the post-RC boundary also showed evidence for non-linear relations with rate, suggesting floor or ceiling attenuation effects at extreme rates.
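One generic way to ask whether a phonetic variable relates to rate linearly or non-linearly is to compare a linear fit with a quadratic one using an information criterion. The sketch below does this with ordinary least squares and AIC on simulated data. This is a swapped-in, textbook technique, not necessarily the analysis used in the study. The variable names and the simulated "floor effect" are hypothetical.

```python
import numpy as np


def aic_polyfit(x, y, degree):
    """Least-squares polynomial fit; returns AIC under a Gaussian likelihood.

    k = degree + 1 coefficients + 1 error variance. Constant terms shared by
    all models are dropped, so only differences between AICs are meaningful.
    """
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    n = len(y)
    rss = float(np.sum(resid ** 2))
    k = degree + 2
    return n * np.log(rss / n) + 2 * k


rng = np.random.default_rng(1)
rate = rng.uniform(0.5, 2.0, size=200)  # hypothetical relative speech rate
# Simulated boundary duration (ms) that shrinks with rate but levels off
# at fast rates, i.e. a floor-like attenuation effect, plus noise.
duration = 80 + 120 * np.exp(-1.5 * (rate - 0.5)) + rng.normal(0, 5, size=200)

aic_linear = aic_polyfit(rate, duration, 1)
aic_quadratic = aic_polyfit(rate, duration, 2)
print(f"linear AIC = {aic_linear:.1f}, quadratic AIC = {aic_quadratic:.1f}")
```

A clearly lower AIC for the quadratic model would count as evidence for a non-linear relation with rate. In practice, repeated-measures data like these would more likely call for mixed-effects models or GAMs than for plain least squares.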

Temporal localization of syntactic-prosodic information

This study used a novel neural network-based analysis method to temporally localize prosodic information associated with syntactic contrasts in acoustic and articulatory signals. Neural networks were trained on multi-dimensional acoustic and articulatory data to classify the two types of relative clauses (RRCs vs. NRRCs), and the networks' accuracies on test data were analyzed. The results revealed two patterns: syntactically conditioned prosodic information was either (i) widely distributed around the boundaries or (ii) narrowly distributed at specific locations. The findings suggest that the prosodic expression of syntactic contrasts does not occur in a uniform way or at a fixed location.

Teaching

Introduction to Phonetics and Phonology

[Spring 2019] Department of Linguistics, Cornell University
Instructor: Draga Zec

Elementary Korean I

[Fall 2018, 2019, 2020] Department of Asian Studies, Cornell University

Elementary Korean II

[Spring 2022] Department of Asian Studies, Cornell University