Developmental AI for Speech and Language Learning 

Thomas HUEBER
CNRS Research Director, researcher at GIPSA-lab
thomas.hueber@grenoble-inp.fr




Mathilde FORT
Researcher at LPNC / Université Grenoble Alpes, teaching activities at INSPE Université Lyon 1
Research Director at IRD
Mathilde.Fort@univ-grenoble-alpes.fr


Laurent GIRIN
Professor at Grenoble-INP UGA, researcher at GIPSA-lab
Laurent.Girin@gipsa-lab.grenoble-inp.fr




DESCRIPTION

The DevAI&Speech project aims to enhance conversational AI by drawing inspiration from human language acquisition. It focuses on speech language models (SpeechLMs): generative models of speech that learn without text labels, mirroring how children acquire spoken language before learning to read and write. The project will explore how sensorimotor, physical, and social interactions shape language learning and can improve the efficiency of SpeechLMs. Key goals include integrating speech biomechanics, enabling multimodal input processing, embedding a SpeechLM in humanoid robots, and training it through natural interactions with humans. The model's design and training data will draw on experimental results from interaction studies between parents, children, and robots conducted in a Babylab. By bridging AI and developmental science, the project aims to improve conversational AI and deepen our understanding of language acquisition.
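
To make the "textless" learning idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of the core mechanism behind SpeechLMs: an autoregressive model trained to predict the next discrete speech unit rather than the next word. In existing pipelines such as GSLM, these units are typically obtained by quantizing self-supervised audio representations (e.g., HuBERT features clustered with k-means); here, random sequences stand in for such units, and all names and hyperparameters are illustrative placeholders, not the project's actual model.

    # Minimal "unit language model": next-unit prediction over discrete speech units.
    # (Illustrative sketch only; vocabulary and model sizes are placeholders.)
    import torch
    import torch.nn as nn

    VOCAB = 100   # number of discrete speech units (e.g., k-means clusters of HuBERT features)
    CTX = 128     # context length in units (roughly 2.5 s of speech at 50 units/s)

    class UnitLM(nn.Module):
        def __init__(self, vocab=VOCAB, d=256, n_layers=4, n_heads=4):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)       # unit embeddings (analogous to word embeddings)
            self.pos = nn.Embedding(CTX, d)         # learned positional embeddings
            layer = nn.TransformerEncoderLayer(d, n_heads, dim_feedforward=4 * d, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d, vocab)         # predicts a distribution over the next unit

        def forward(self, units):                   # units: (batch, time) integer unit IDs
            t = units.size(1)
            x = self.emb(units) + self.pos(torch.arange(t, device=units.device))
            causal = nn.Transformer.generate_square_subsequent_mask(t)  # no peeking at future units
            return self.head(self.backbone(x, mask=causal))             # (batch, time, vocab)

    model = UnitLM()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Random sequences stand in for unit-encoded speech; a real corpus would be
    # audio passed through a quantized self-supervised encoder.
    units = torch.randint(0, VOCAB, (8, CTX))
    logits = model(units[:, :-1])                   # predict unit t+1 from units 1..t
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), units[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"next-unit prediction loss: {loss.item():.3f}")

Trained on real unit sequences, such a model can generate speech continuations once the units are resynthesized into audio with a vocoder; the project's focus is on grounding this purely auditory loop in sensorimotor, physical, and social interaction.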

ACTIVITIES

Open PhD position: Grounding a Multimodal Speech Language Model Through Physical and Social Interaction – More info here: PhD Positions - MIAI Cluster IA  

CHAIR EVENTS

Kickoff meeting – 2025, July 16, GIPSA-lab / LPNC (Grenoble)

SELECTED LIST OF PUBLICATIONS

  • Georges, M.-A., Lavechin, M., Schwartz, J.-L., Hueber, T. (2024). "Decode, move and speak! Self-supervised learning of speech units, gestures, and sounds relationships using vocal imitation". Computational Linguistics. https://doi.org/10.1162/coli_a_00532
  • Ortiz, A., Schatz, T., Hueber, T., Dupoux, E. (2024). "Simulating articulatory trajectories with phonological feature interpolation". Proc. of Interspeech, pp. 3595-3599.
  • Girin, L., Leglaive, S., Bie, X., Diard, J., Hueber, T., Alameda-Pineda, X. (2021). "Dynamical Variational Autoencoders: A Comprehensive Review". Foundations and Trends in Machine Learning, Vol. 15, No. 1-2, pp. 1-175.
  • Lin, X., Girin, L., Alameda-Pineda, X. (2023). "Mixture of dynamical variational autoencoders for multi-source trajectory modeling and separation". Transactions on Machine Learning Research. Published online at https://jmlr.org/tmlr/papers
  • Birulés, J., Goupil, L., Josse, J., Fort, M. (2023). "The role of talking faces in infant language learning: Mind the gap between screen-based settings and real-life communicative interactions". Brain Sciences, 13(8), 1167. https://doi.org/10.3390/brainsci13081167
  • Fort, M., Lammertink, I., Guevara-Rukoz, A., Peperkamp, S., Fikkert, P., Tsuji, S. (2018). "Symbouki: a meta-analysis on the emergence of sound symbolism in early language acquisition". Developmental Science. https://doi.org/10.1111/desc.12659
  • Lenglet, M., Perrotin, O., Bailly, G. (2024). "FastLips: an End-to-End Audiovisual Text-to-Speech System with Lip Features Prediction for Virtual Avatars". Proc. of Interspeech, Kos, Greece, pp. 3450-3454.
  • Jacquelin, M., Garnier, M., Girin, L., Vincent, R., Perrotin, O. (2024). "Exploring the Multidimensional Representation of Unidimensional Speech Acoustic Parameters Extracted by Deep Unsupervised Models". Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW), Seoul, Korea, pp. 858-862.
  • Rakotomalala, T., Baraduc, P., Perrier, P. (2022). "Trajectories predicted by optimal speech motor control using LSTM networks". Proc. of Interspeech, pp. 630-634. https://doi.org/10.21437/Interspeech.2022-10604

CHAIR PRESENTATION

Chair Presentation – DevAI&Speech

License:
Unless otherwise stated, all documents are shared under the Creative Commons BY-NC-ND 4.0 license.
You may view and share them for non-commercial purposes, without modification, and with appropriate credit to the authors.

Published on August 20, 2025
Updated on August 21, 2025