Mathilde Fort,
Researcher at LPNC/Université Grenoble Alpes (teaching activities at INSPE, Université Lyon 1),
Research Director at IRD, Mathilde.Fort@univ-grenoble-alpes.fr
The DevAI&Speech project aims to enhance conversational AI by drawing inspiration from human language acquisition. It focuses on SpeechLMs, generative speech models that learn without text labels, mirroring how children acquire speech before learning to read and write. The project will explore how sensorimotor, physical, and social interactions shape language learning and how they can improve the efficiency of SpeechLMs. Key goals include integrating speech biomechanics, enabling multimodal input processing, embedding a SpeechLM in humanoid robots, and training it through natural interactions with humans. The model’s design and training data will draw on experimental results from interaction studies between parents, children, and robots conducted in a Babylab. By bridging AI and developmental science, the project aims to improve conversational AI and deepen our understanding of language acquisition.
ACTIVITIES
Open PhD position: Grounding a Multimodal Speech Language Model Through Physical and Social Interaction – More info here: PhD Positions - MIAI Cluster IA
CHAIR EVENTS
Kickoff meeting – 2025, July 16, GIPSA-lab / LPNC (Grenoble)
SELECTED LIST OF PUBLICATIONS
Georges, M.-A., Lavechin, M., Schwartz, J.-L., Hueber, T. (2024). Decode, move and speak! Self-supervised learning of speech units, gestures, and sounds relationships using vocal imitation. Computational Linguistics. https://doi.org/10.1162/coli_a_00532
Ortiz, A., Schatz, T., Hueber, T., Dupoux, E. (2024). Simulating articulatory trajectories with phonological feature interpolation. Proc. Interspeech 2024, pp. 3595-3599.
Lin, X., Girin, L., Alameda-Pineda, X. (2023). Mixture of dynamical variational autoencoders for multi-source trajectory modeling and separation. Transactions on Machine Learning Research. Published online at https://jmlr.org/tmlr/papers
Birulés, J., Goupil, L., Josse, J., Fort, M. (2023). The role of talking faces in infant language learning: Mind the gap between screen-based settings and real-life communicative interactions. Brain Sciences, 13(8), 1167. https://doi.org/10.3390/brainsci13081167
Fort, M., Lammertink, I., Guevara-Rukoz, A., Peperkamp, S., Fikkert, P., Tsuji, S. (2018). Symbouki: a meta-analysis on the emergence of sound symbolism in early language acquisition. Developmental Science. https://doi.org/10.1111/desc.12659
Lenglet, M., Perrotin, O., Bailly, G. (2024). FastLips: an end-to-end audiovisual text-to-speech system with lip features prediction for virtual avatars. Proc. Interspeech 2024, Kos, Greece, September 1-5, pp. 3450-3454.
Jacquelin, M., Garnier, M., Girin, L., Vincent, R., Perrotin, O. (2024). Exploring the multidimensional representation of unidimensional speech acoustic parameters extracted by deep unsupervised models. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW), Seoul, Korea, April 15, pp. 858-862.
Rakotomalala, T., Baraduc, P., Perrier, P. (2022). Trajectories predicted by optimal speech motor control using LSTM networks. Proc. Interspeech 2022, pp. 630-634. https://doi.org/10.21437/Interspeech.2022-10604
CHAIR PRESENTATION
License:
Unless otherwise stated, all documents are shared under the Creative Commons BY-NC-ND 4.0 license.
You may view and share them for non-commercial purposes, without modification, and with appropriate credit to the authors.
Published on August 20, 2025. Updated on August 21, 2025.
Core members
Thomas Hueber
Mathilde Fort
Laurent Girin
Olivier Perrotin (GIPSA-lab)
Pierre Baraduc (GIPSA-lab)
Prof. Okko Räsänen (Univ. Tampere, Finland)
Associated members
Maxime Calka (GIPSA-lab)
Pascal Perrier (GIPSA-lab)
Maëva Garnier (GIPSA-lab)
Leticia Schiavon Kolberg (GIPSA-lab)
Martin Lenglet (ATOS Inno’Labs)
Brice Varini (ATOS Inno’Labs)
Lea Haefflingher (ATOS Inno’Labs)
Stéphane Lathuilière (Centre INRIA UGA)
Xavier Alameda-Pineda (Centre INRIA UGA)
Emmanuel Dupoux (EHESS/ENS/Meta)
Angelo Ortiz (CoML team, ENS)
Research topics
Sensorimotor and social grounding of conversational AI, Multimodal LLM, AI for studying speech and language acquisition in children, Social robotics