publications | Taehun Cha

2025

Curse of Smoothness in Functional Neural Networks

Taehun Cha, and Donghun Lee

IEEE Signal Processing Letters, 2025

Functional neural networks (FNNs) have emerged as powerful tools for modeling complex relationships in functional data, leveraging the flexibility of deep learning to capture non-linear patterns. However, most components of FNNs, such as element-wise non-linear activation functions and gradient-based optimization strategies, are borrowed directly from standard deep neural networks. In this study, we investigate how the functional nature of FNNs affects gradient-based optimization. Analogous to the well-known vanishing gradient problem, we theoretically show that the smoothness of the hidden state function bounds the weight gradient norm, a phenomenon we call the *curse of smoothness*. Empirically, we demonstrate that FNN optimization becomes significantly more difficult than that of conventional deep neural networks as model depth increases. In particular, we verify that gradients in FNNs vanish in deeper layers as the hidden state functions become smoother. These findings suggest that applying standard deep learning techniques to functional data without accounting for its unique properties can lead to misleading or suboptimal results.
Feature Learning as a Virtual Covariance Learning

Taehun Cha, and Donghun Lee

In OPT 2025: Optimization for Machine Learning, 2025

Feature learning is central to the success of neural networks but remains poorly understood. Recent work proposed the Neural Feature Ansatz, which highlights the alignment between learned features and $\nabla_x f$, but does not explicitly explain why and how feature learning dynamics occur. To address this, we introduce a novel concept, the **virtual update**: a stochastic gradient descent (SGD) step applied to inputs and hidden states rather than parameters, i.e., $x - \gamma \nabla_x \mathcal{L}$ and $h - \gamma \nabla_h \mathcal{L}$. We theoretically show that SGD aligns network weights with the covariance structure of the virtual update. This does not conflict with the actual update, as the actually updated input does not deviate far from the virtually updated input. Building on this insight, we propose the **virtual covariance learning** algorithm, which directly obtains the weight matrix that achieves the desired covariance structure. This algorithm efficiently learns effective weights within one or two epochs (whereas SGD requires 10–20 epochs), with low variance and no overfitting.
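To make the virtual update concrete, here is a minimal NumPy sketch of a simple analogue: a single linear layer $h = Wx$ with an assumed squared loss $\mathcal{L} = \tfrac{1}{2}\lVert Wx - y \rVert^2$. The function name `virtual_vs_actual` and the loss are illustrative assumptions, not the paper's setup. The sketch compares the virtually updated hidden state $h - \gamma \nabla_h \mathcal{L}$ with the hidden state produced by one actual SGD step on $W$. Because $\nabla_W \mathcal{L} = (\nabla_h \mathcal{L})\, x^\top$, the actual step moves $h$ by $-\gamma \lVert x \rVert^2 \nabla_h \mathcal{L}$. That is the same direction as the virtual update, rescaled by $\lVert x \rVert^2$.

```python
import numpy as np

def virtual_vs_actual(W, x, y, lr):
    """Compare a virtual update of h = W x with the effect of a real SGD step on W.

    Assumes a single linear layer and squared loss L = 0.5 * ||W x - y||^2
    (illustrative choices, not taken from the paper).
    """
    h = W @ x
    grad_h = h - y                         # dL/dh for the squared loss
    grad_x = W.T @ grad_h                  # dL/dx by the chain rule
    virtual_h = h - lr * grad_h            # virtual update of the hidden state
    virtual_x = x - lr * grad_x            # virtual update of the input
    W_new = W - lr * np.outer(grad_h, x)   # actual SGD step: dL/dW = (dL/dh) x^T
    actual_h = W_new @ x                   # hidden state produced by the updated weights
    return h, virtual_h, actual_h, virtual_x

rng = np.random.default_rng(0)
W, x, y = rng.normal(size=(3, 4)), rng.normal(size=4), rng.normal(size=3)
h, virtual_h, actual_h, virtual_x = virtual_vs_actual(W, x, y, lr=0.01)
# The actual parameter step equals the virtual step on h, scaled by ||x||^2.
print(np.allclose(actual_h - h, (x @ x) * (virtual_h - h)))  # True
```

In deeper networks the relationship is only approximate. The sketch covers just the exact single-layer case.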
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

Chandler Smith, Marwa Abdulhai, Manfred Diaz, and 83 more authors

In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025

Large language model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. This work introduces an approach to measuring human-appropriate cooperative intelligence, emphasizing an agent’s ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
Emergent Linear Separability of Unseen Data Points in High-dimensional Last-Layer Feature Space

Taehun Cha, and Donghun Lee

In High-dimensional Learning Dynamics 2025, 2025

In this work, we investigate the emergence of linear separability for unseen data points in the high-dimensional last-layer feature space of deep neural networks. Through empirical analysis, we observe that, after training, in-distribution and out-of-distribution samples become linearly separable in the last-layer feature space when the hidden dimension is sufficiently high, even in regimes where the input data itself is not. We leverage these observations for the task of uncertainty quantification. By connecting our findings to classical support vector machine margin theory, we theoretically show that the separating hyperplane exhibits a higher weight norm when facing in-distribution data points. This work highlights linear separability as a fundamental and analyzable property of the representations of trained deep neural networks, offering a geometric perspective on uncertainty quantification in neural networks.
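One way to read the margin argument is to fit a (near) hard-margin linear SVM that separates a single query point from the training points in feature space, then use the norm of its weight vector as a score. For separable data, the hard-margin SVM's margin $2/\lVert w \rVert$ equals the distance between the two classes' convex hulls. A query lying close to the training features (in-distribution) therefore yields a larger $\lVert w \rVert$. This is a sketch built on several assumptions, not the paper's procedure:

- the function `separating_norm` is hypothetical;
- scikit-learn's `SVC` with a large `C` is used to approximate a hard margin;
- the random Gaussian "features" stand in for a trained network's last layer.

```python
import numpy as np
from sklearn.svm import SVC

def separating_norm(train_feats, query, C=1e6):
    """Weight norm of a near hard-margin linear SVM separating `query` from `train_feats`.

    A large C approximates the hard-margin SVM, whose margin 2/||w|| equals the
    distance between the convex hulls of the two classes.
    """
    X = np.vstack([train_feats, query[None, :]])
    labels = np.r_[np.zeros(len(train_feats)), 1.0]
    svm = SVC(kernel="linear", C=C).fit(X, labels)
    return np.linalg.norm(svm.coef_)

rng = np.random.default_rng(0)
d = 200                                  # high-dimensional feature space (d > number of points)
train = rng.normal(size=(50, d))         # stand-in for training-set last-layer features
in_dist = rng.normal(size=d)             # query drawn from the same distribution
out_dist = rng.normal(size=d) + 10.0     # query shifted far from the training cloud
print(separating_norm(train, in_dist), separating_norm(train, out_dist))
```

With $d$ larger than the number of points, even the in-distribution query is separable from the training set. Only the size of the margin differs between the two queries.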
ABC3: Active Bayesian Causal Inference with Cohn Criteria in Randomized Experiments

Taehun Cha, and Donghun Lee

In The 39th Annual AAAI Conference on Artificial Intelligence, Feb 2025

In causal inference, a randomized experiment is the de facto method for overcoming various theoretical issues in observational studies. However, experiments are expensive, so an efficient experimental design is necessary. We propose ABC3, a Bayesian active learning policy for causal inference. We show that a policy minimizing the estimation error on the conditional average treatment effect is equivalent to minimizing the integrated posterior variance, similar to the Cohn criteria. We theoretically prove that ABC3 also minimizes the imbalance between the treatment and control groups and the type 1 error probability. The imbalance-minimizing property is especially notable, as several works have emphasized the importance of achieving balance. In extensive experiments on real-world data sets, ABC3 achieves the highest efficiency while empirically confirming that the theoretical results hold.
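The link between CATE estimation error and integrated posterior variance suggests a greedy, Cohn-style selection rule. The sketch below is not ABC3 itself. It assumes independent Bayesian linear regression models for the treated and control outcome surfaces, with Gaussian prior and noise; the helper names `posterior_cov`, `integrated_variance`, and `select_next` are hypothetical. Under that assumption, the CATE posterior variance at $x$ is the sum of the two arms' predictive variances. Each step picks the pool unit and arm whose addition minimizes that variance averaged over the pool.

```python
import numpy as np

def posterior_cov(X, noise_var=1.0, prior_var=1.0):
    """Posterior covariance of Bayesian linear regression weights (Gaussian prior and noise)."""
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    return np.linalg.inv(precision)

def integrated_variance(cov, X_pool):
    """Predictive variance of the mean function, x^T cov x, averaged over the pool."""
    return np.einsum("ij,jk,ik->i", X_pool, cov, X_pool).mean()

def select_next(X_pool, X_treated, X_control, noise_var=1.0):
    """Return (pool index, arm) minimizing the integrated posterior variance of the CATE."""
    arms = {1: X_treated, 0: X_control}
    base = {a: integrated_variance(posterior_cov(X_a, noise_var), X_pool) for a, X_a in arms.items()}
    best_score, best = np.inf, None
    for i, x in enumerate(X_pool):
        for arm in (1, 0):
            X_new = np.vstack([arms[arm], x[None, :]])
            # Only the arm receiving the unit changes; the other arm keeps its variance.
            score = integrated_variance(posterior_cov(X_new, noise_var), X_pool) + base[1 - arm]
            if score < best_score:
                best_score, best = score, (i, arm)
    return best

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(20, 3))
X_control = rng.normal(size=(10, 3))   # ten units already assigned to control
X_treated = np.empty((0, 3))           # no treated units yet
i, arm = select_next(X_pool, X_treated, X_control)
print(i, arm)
```

In this linear-Gaussian setting the posterior covariance does not depend on the observed outcomes, so the rule can be evaluated before outcomes arrive. With ten control units and no treated units, the treated arm carries most of the variance. The rule should therefore assign the next unit to treatment, which mirrors the balancing behavior described above.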

2024

Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

Taehun Cha, and Donghun Lee

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of their size and structure. By examining 24 models on 6 data sets, we find that 88-98% of cases return statistically significantly distinguishable generation probability and uncertainty distributions. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines by achieving higher faithfulness metrics while maintaining sound general text quality measures.
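Here is a sketch of how such distributions could be compared, assuming you already have a model's next-token logits for each text. Each text is summarized by its mean token log-probability and its mean predictive entropy, and a test then checks whether the faithful and hallucinated groups differ. Several pieces are assumptions for illustration and may not match the paper's exact statistics or test:

- the choice of these two summaries;
- the function names `sequence_stats` and `distinguishable`;
- the two-sided Mann-Whitney U test via SciPy.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def sequence_stats(logits, token_ids):
    """Mean token log-probability and mean predictive entropy for one text.

    logits: (T, V) next-token logits; token_ids: (T,) tokens actually in the text.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    token_lp = log_probs[np.arange(len(token_ids)), token_ids]
    entropy = -(np.exp(log_probs) * log_probs).sum(axis=1)
    return token_lp.mean(), entropy.mean()

def distinguishable(faithful, hallucinated, alpha=0.05):
    """Two-sided Mann-Whitney U test on per-text statistics of the two groups."""
    return mannwhitneyu(faithful, hallucinated, alternative="two-sided").pvalue < alpha

# Uniform logits over 5 tokens: mean log-probability is -log 5, mean entropy is log 5.
lp, ent = sequence_stats(np.zeros((4, 5)), np.array([0, 1, 2, 3]))
```

In practice, `sequence_stats` would be applied to every text in each group, and `distinguishable` to the resulting lists of log-probabilities (and, separately, entropies).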
Evaluating Extrapolation Ability of Large Language Model in Chemical Domain

Taehun Cha, and Donghun Lee

In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), Aug 2024

Solving a problem outside the training space, i.e., extrapolation, has been a long-standing problem in the machine learning community. The recent success of large language models (LLMs) demonstrates their extrapolation ability on several unseen tasks. In line with these works, we evaluate the LLM's extrapolation ability in the chemical domain. We construct a data set measuring the material properties of epoxy polymers depending on various raw materials and curing processes. The LLM must predict the material properties when a novel raw material is introduced, utilizing its chemical knowledge. In our experiments, the LLM tends to choose the right direction of adjustment but fails to determine the exact degree, resulting in poor MAE on some properties. However, the LLM can successfully adjust the degree with only a one-shot example. The results show that the LLM can extrapolate to new, unseen materials by utilizing the chemical knowledge learned through massive pre-training.
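A model can get the direction right and still score a poor MAE. To see how, score the predictions against a reference recipe's property value. The metric names, the reference value, and the numbers below are hypothetical and for illustration only; they are not data from the paper.

```python
import numpy as np

def direction_and_mae(base, true_vals, pred_vals):
    """Direction accuracy and MAE of predicted property values.

    base: property value of a reference recipe before the new raw material (assumed known).
    """
    true_vals, pred_vals = np.asarray(true_vals, float), np.asarray(pred_vals, float)
    same_direction = np.sign(pred_vals - base) == np.sign(true_vals - base)
    return same_direction.mean(), np.abs(pred_vals - true_vals).mean()

# Hypothetical numbers: every predicted change has the right sign but the wrong size.
acc, mae = direction_and_mae(10.0, [12.0, 8.0, 15.0], [11.0, 9.0, 11.0])
print(acc, mae)  # 1.0 2.0
```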
SentenceLDA: Discriminative and Robust Document Representation with Sentence Level Topic Model

Taehun Cha, and Donghun Lee

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Mar 2024

A subtle difference in context results in totally different nuances, even for lexically identical words. On the other hand, two different words can convey similar meanings given a homogeneous context. As a result, considering only word spelling information is not sufficient to obtain a quality text representation. We propose SentenceLDA, a sentence-level topic model. We combine modern SentenceBERT and classical LDA to extend the semantic unit from word to sentence. By extending the semantic unit, we verify that SentenceLDA returns more discriminative document representations than other topic models, while maintaining LDA's elegant probabilistic interpretability. We also verify the robustness of SentenceLDA by comparing the inference results on original and paraphrased texts. Additionally, we demonstrate one possible application of SentenceLDA, corpus-level key opinion mining, by applying it to an argumentative corpus, DebateSum.
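The abstract does not spell out how SentenceBERT and LDA are combined. One plausible construction, assumed here purely for illustration and not necessarily the paper's, has two steps. First, quantize sentence embeddings into a discrete vocabulary of "sentence types" with k-means. Second, run standard LDA on each document's bag of sentence types. The embeddings are taken as given (e.g. from a SentenceBERT encoder), and `sentence_topic_model` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def sentence_topic_model(doc_sentence_embs, n_sentence_types=8, n_topics=3, seed=0):
    """Doc-topic distributions from LDA over quantized sentence embeddings.

    doc_sentence_embs: list of (n_sentences_i, dim) arrays, e.g. sentence-encoder outputs.
    """
    all_embs = np.vstack(doc_sentence_embs)
    # Quantize sentence embeddings into discrete "sentence types" (a hypothetical vocabulary).
    km = KMeans(n_clusters=n_sentence_types, n_init=10, random_state=seed).fit(all_embs)
    # Bag-of-sentence-types count matrix: one row per document.
    counts = np.zeros((len(doc_sentence_embs), n_sentence_types), dtype=int)
    for d, embs in enumerate(doc_sentence_embs):
        counts[d] = np.bincount(km.predict(embs), minlength=n_sentence_types)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed).fit(counts)
    return lda.transform(counts)  # rows are normalized document-topic distributions

rng = np.random.default_rng(0)
docs = [rng.normal(size=(6, 16)) for _ in range(5)]  # random stand-ins for sentence embeddings
theta = sentence_topic_model(docs)
```

Because paraphrases tend to map to nearby embeddings, they would often fall into the same sentence type. That is one intuition for the robustness claim, though the sketch does not test it.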

2022

Noun-MWP: Math Word Problems Meet Noun Answers

Taehun Cha, Jaeheun Jung, and Donghun Lee

In Proceedings of the 29th International Conference on Computational Linguistics, Oct 2022

We introduce a new type of problem for math word problem (MWP) solvers, named Noun-MWPs, whose answer is a non-numerical string containing a noun from the problem text. We present a novel method that enables existing MWP solvers to handle Noun-MWPs, and apply it to the Expression-Pointer Transformer (EPT). Our model, N-EPT, solves Noun-MWPs significantly better than other models and, at the same time, solves conventional MWPs as well. Solving Noun-MWPs may help bridge MWP solvers and traditional question-answering NLP models.