Thomas Bouvier

Project HF

My name is Thomas Bouvier, and I am based in Paris. I obtained an engineering degree in electrical engineering & computer science in 2019. My hobby has always been creating computer projects, ranging from developing small video games from scratch (Herr Speck, Demo Video) to building an AI-powered robot that plays Flappy Bird (Floppy Bird, Demo Video).

After completing my engineering studies (INSA), I worked at Snips, a Parisian startup that pioneered privacy-preserving AI technology at the edge. This experience allowed me to contribute to open-source libraries, much like HF does today. Following Snips’ acquisition, I wanted to delve deeper into Machine Learning, which led me to join Inria as a research engineer. I really enjoyed creating and optimizing scalable, data-intensive pipelines powering streaming applications.

I then pursued a PhD at Inria, working on a topic at the intersection of Machine Learning and High-Performance Computing (HPC). I focused on training ML models at supercomputer scale, leveraging parallelization techniques for efficiency. I had the opportunity to work at Argonne National Laboratory in Chicago and to benefit from their compute infrastructure. I have maintained excellent connections with the people there, who are very open to collaboration. I defended my PhD two months ago.

To sum up, I am applying to Hugging Face with the goal of working on HPC topics such as parallelization and large-scale pre-training. I am keen to contribute to open-source software and internal projects, and I would also like to build collaborations with leading institutions in HPC.

Please detail the reasons why you are applying to work at Hugging Face and how you think you can make an impact on our team (self-authored only 🤗)

I am applying for the following reasons:

Please detail the project you would be most excited to work on in your first 3 months of joining

I identified Nanotron⚡️ 🤗 as the project that would best suit my interests and skills in the short term. I have spotted a few HPC-oriented improvements that would strengthen the robustness of large training runs. The following could realistically be integrated into Nanotron⚡️ within ~4 months:

With a few more months of work, zero-bubble pipeline parallelism could be integrated to maximize GPU utilization at scale, encouraging the adoption of Nanotron⚡️ by more teams running on tight training budgets. I suspect this feature will be more challenging than those above.
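To illustrate the core mechanism this feature relies on (a rough, generic sketch, not Nanotron⚡️'s implementation), the snippet below uses a single linear layer standing in for one pipeline stage to show how the backward pass can be split into an input-gradient step (B), needed immediately by the upstream stage, and a weight-gradient step (W) that can be deferred to fill pipeline bubbles:

```python
import torch

# Illustrative assumption: one nn.Linear stands in for a pipeline stage.
layer = torch.nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)
y = layer(x).sum()

# B step: compute only the input gradient, so it can be sent upstream
# right away instead of waiting for the full backward pass.
(grad_x,) = torch.autograd.grad(y, (x,), retain_graph=True)

# W step: compute the weight/bias gradients later, when the stage would
# otherwise sit idle in a pipeline bubble.
grad_w, grad_b = torch.autograd.grad(y, (layer.weight, layer.bias))
```

Scheduling these deferred W steps across micro-batches is where most of the engineering effort would lie.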

Finally, I also have some research ideas for continual fine-tuning, involving techniques that could be integrated directly into Nanotron⚡️ to mitigate catastrophic forgetting. This aligns with the type of research I conducted during my PhD, where I developed Neomem, an open-source system for learning from evolving datasets. Published results are available at https://arxiv.org/abs/2406.03285.
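To make this concrete, here is a minimal, generic sketch of rehearsal-based replay, the family of techniques Neomem builds on; the ReplayBuffer class, the reservoir-sampling policy, and the 1:1 mixing of current and replayed samples are illustrative assumptions of mine, not Neomem's actual API or anything currently in Nanotron⚡️:

```python
import random
import torch
from torch import nn

class ReplayBuffer:
    """Fixed-size buffer filled by reservoir sampling over the data stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, x, y):
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.samples) < self.capacity:
                self.samples.append((xi, yi))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.samples[j] = (xi, yi)

    def sample(self, k):
        batch = random.sample(self.samples, min(k, len(self.samples)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
buffer = ReplayBuffer(capacity=256)

# Toy stream of batches standing in for successive fine-tuning tasks.
for step in range(100):
    x = torch.randn(32, 16)
    y = torch.randint(0, 4, (32,))

    # Mix current data with replayed samples to mitigate forgetting.
    if buffer.samples:
        rx, ry = buffer.sample(32)
        x_train, y_train = torch.cat([x, rx]), torch.cat([y, ry])
    else:
        x_train, y_train = x, y

    loss = loss_fn(model(x_train), y_train)
    opt.zero_grad()
    loss.backward()
    opt.step()

    buffer.add(x, y)
```

At scale, populating and sampling such a buffer efficiently becomes a distributed data-management problem, which is exactly where my PhD work applies.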

Best,

Thomas