Mathilde Fort,
Researcher at LPNC/Université Grenoble Alpes (teaching activities at INSPE, Université Lyon 1),
Research Director at IRD, Mathilde.Fort@univ-grenoble-alpes.fr
The DevAI&Speech project aims to enhance conversational AI by drawing inspiration from human language acquisition. It focuses on SpeechLMs, generative speech models that learn without text labels, mirroring how children acquire speech before learning to read and write. The project will explore how sensorimotor, physical, and social interactions shape language learning and how they can improve the efficiency of SpeechLMs. Key goals include integrating speech biomechanics, enabling multimodal input processing, embedding a SpeechLM in humanoid robots, and training it through natural interactions with humans. The model’s design and training data will draw on experimental results from interaction studies between parents, children, and robots conducted in a Babylab. By bridging AI and developmental science, the project aims to improve conversational AI and deepen our understanding of language acquisition.
ACTIVITIES
Open PhD position: Grounding a Multimodal Speech Language Model Through Physical and Social Interaction – More info here: PhD Positions - MIAI Cluster IA
CHAIR EVENTS
Kickoff meeting – 2025, July 16, GIPSA-lab / LPNC (Grenoble)
SELECTED LIST OF PUBLICATIONS
Georges, M.-A., Lavechin, M., Schwartz, J.-L., Hueber, T. (2024). Decode, move and speak! Self-supervised learning of speech units, gestures, and sounds relationships using vocal imitation. Computational Linguistics. https://doi.org/10.1162/coli_a_00532
Ortiz, A., Schatz, T., Hueber, T., Dupoux, E. (2024). Simulating articulatory trajectories with phonological feature interpolation. Proc. Interspeech 2024, pp. 3595-3599.
Lin, X., Girin, L., Alameda-Pineda, X. (2023). Mixture of dynamical variational autoencoders for multi-source trajectory modeling and separation. Transactions on Machine Learning Research. Published online at https://jmlr.org/tmlr/papers
Birulés, J., Goupil, L., Josse, J., Fort, M. (2023). The role of talking faces in infant language learning: Mind the gap between screen-based settings and real-life communicative interactions. Brain Sciences, 13(8), 1167. https://doi.org/10.3390/brainsci13081167
Fort, M., Lammertink, I., Guevara-Rukoz, A., Peperkamp, S., Fikkert, P., Tsuji, S. (2018). Symbouki: a meta-analysis on the emergence of sound symbolism in early language acquisition. Developmental Science. https://doi.org/10.1111/desc.12659
Lenglet, M., Perrotin, O., Bailly, G. (2024). FastLips: an end-to-end audiovisual text-to-speech system with lip features prediction for virtual avatars. Proc. Interspeech 2024, Kos, Greece, September 1-5, pp. 3450-3454.
Jacquelin, M., Garnier, M., Girin, L., Vincent, R., Perrotin, O. (2024). Exploring the multidimensional representation of unidimensional speech acoustic parameters extracted by deep unsupervised models. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW), Seoul, Korea, April 15, pp. 858-862.
Rakotomalala, T., Baraduc, P., Perrier, P. (2022). Trajectories predicted by optimal speech motor control using LSTM networks. Proc. Interspeech 2022, pp. 630-634. https://doi.org/10.21437/Interspeech.2022-10604
CHAIR PRESENTATION
License:
Unless otherwise stated, all documents are shared under the Creative Commons BY-NC-ND 4.0 license.
You may view and share them for non-commercial purposes, without modification, and with appropriate credit to the authors.
Published on August 20, 2025. Updated on August 21, 2025.
Core members
Thomas Hueber
Mathilde Fort
Laurent Girin
Olivier Perrotin (GIPSA-lab)
Pierre Baraduc (GIPSA-lab)
Prof. Okko Räsänen (Univ. Tampere, Finland)
Associated members
Maxime Calka (GIPSA-lab)
Pascal Perrier (GIPSA-lab)
Maëva Garnier (GIPSA-lab)
Leticia Schiavon Kolberg (GIPSA-lab)
Martin Lenglet (ATOS Inno’Labs)
Brice Varini (ATOS Inno’Labs)
Lea Haefflingher (ATOS Inno’Labs)
Stéphane Lathuilière (Centre INRIA UGA)
Xavier Alameda-Pineda (Centre INRIA UGA)
Emmanuel Dupoux (EHESS/ENS/Meta)
Angelo Ortiz (CoML team, ENS)
Research topics
Sensorimotor and social grounding of conversational AI, Multimodal LLM, AI for studying speech and language acquisition in children, Social robotics