9 Mar 2022
We are excited to announce the publication of a new version of our massive multilingual corpus OSCAR, namely version 22.02.
Main changes: document-oriented corpus with annotations you can filter on, document-level language identification, a new multilingual subcorpus for multilingual documents, and more!
1 Mar 2022
🤝 Wissam Antoun just joined ALMAnaCH as a research engineer
He will work with Djamé Seddah and Benoît Sagot on language models for languages displaying high variability, in particular on Arabic dialects as found in user-generated content on social media.
1 Feb 2022
🤝 Jesujoba Alabi just joined ALMAnaCH as a research engineer
He will be working under the supervision of Rachel Bawden on domain adaptation for neural machine translation in the context of the DadaNMT project.
17 Jan 2022
🤝 Rua Ismail just joined ALMAnaCH as a research engineer
She will be working under the supervision of Benoît Sagot on the OSCAR corpus, in particular on language identification, and on the description of two Nubian languages.
1 Dec 2021
🤝 Nathan Godey just joined ALMAnaCH as a PhD student
He will be working under the supervision of Benoît Sagot and Éric de la Clergerie on improving language models, in particular by using approaches derived from optimal transport.
27 Oct 2021
🎓 Louis Martin's PhD defence
Louis Martin defended his PhD thesis, supervised by Benoît Sagot, Éric de la Clergerie and Antoine Bordes (FAIR Paris), on sentence simplification using controllable and unsupervised methods.
1 Oct 2021
🤝 Lydia Nishimwe just joined ALMAnaCH as a PhD student
Within the framework of Rachel Bawden's PRAIRIE chair, she will be working on robust neural machine translation for user-generated content.
1 Oct 2021
🤝 You Zuo just joined ALMAnaCH as a research engineer
She will be working on fine-grained patent classification in collaboration with INPI, the French intellectual property office.
1 Oct 2021
🤝 Roman Castagné is now an ALMAnaCH PhD student
He will be working under the supervision of Benoît Sagot and Éric de la Clergerie on improving language models by better understanding what they learn and how they learn it.
20 Sep 2021
🤝 Camille Rey just joined ALMAnaCH as a Master 2 intern
She will be studying the errors produced by neural machine translation systems.
15 May 2021
🤝 Paul-Ambroise Duquenne just joined ALMAnaCH as a PhD student
He will carry out his research on LASER-like sentence representation spaces under the joint supervision of Benoît Sagot, for ALMAnaCH, and Holger Schwenk, for FAIR (Facebook's AI research lab in Paris) in the context of an industrial (“CIFRE”) PhD.
4 May 2021
We are thrilled to announce PAGnol, a new addition to our language model family.
PAGnol is a free, GPT-3-like generative LM for French, developed in collaboration with LightOn.
3 May 2021
🤝 Matthieu Futeral-Peter just joined ALMAnaCH as a Master 2 intern
His work is in collaboration with the Willow project-team at Inria, with the aim of constructing better multilingual and multimodal word embeddings.
19 Apr 2021
🤝 Tú Anh Nguyễn just joined ALMAnaCH as a PhD student
He will carry out his research on the unsupervised learning of linguistic representations from speech (audio) data under the joint supervision of Benoît Sagot, for ALMAnaCH, and Emmanuel Dupoux, for FAIR (Facebook's AI research lab in Paris) in the context of an industrial (“CIFRE”) PhD.
5 Apr 2021
🤝 Hugo Scheithauer just joined ALMAnaCH as a Master 2 intern
He will work on adding NER technologies to the open-source eScriptorium environment for automatic transcription, with the LECTAUREP project as the use case.
1 Apr 2021
🤝 Syrielle Montariol just joined ALMAnaCH as a post-doc
She will work within the H2020 CounteR project under the main supervision of Djamé Seddah on the detection of semantic changes in social media posts at an individual level, in order to contribute to detecting and analysing multiple types of radicalisation processes.
1 Apr 2021
🤝 Thomas Wang just joined ALMAnaCH as a research engineer
Within the framework of Benoît Sagot's PRAIRIE chair, he will work on novel neural language modelling architectures that require less computing power, less memory and/or less data for training. He will notably work on reducing the computational and memory impact of attention mechanisms, especially when long inputs must be processed at once.
1 Apr 2021
🤝 Roman Castagné just joined ALMAnaCH as a Master 2 intern
Within the framework of Benoît Sagot's PRAIRIE chair, he will work on multi-level neural language modelling architectures in order to lower the impact of input noise on the performance of such models.
1 Apr 2021
🤝 Julien Abadji just joined ALMAnaCH as a research engineer
Within the framework of Benoît Sagot's PRAIRIE chair, he will work on the quantitative (volume, number of languages) and qualitative (language classification accuracy, offensive content filtering) improvement of our Common-Crawl-based large multilingual corpus OSCAR. He will also work on the production of new versions of OSCAR on a regular basis.
8 Mar 2021
🤝 Manon Ovide just joined ALMAnaCH as a Master 2 intern
She will work on the digital scientific publishing pipeline set up for the DAHN project, and in particular on the publication step, in compliance with TEI guidelines.
11 Feb 2021
Inria Paris, Unapei and Facebook Artificial Intelligence Research present Cap'FALC, a project aiming to improve information accessibility for people with intellectual disabilities by developing a new digital tool to help produce more content in FALC (“Facile à Lire et à Comprendre”, i.e. Easy to Read and Understand). [The event will be held in French]
1 Feb 2021
🤝 Thibault Charmet just joined ALMAnaCH as a research engineer
He will work in collaboration with the Cour de Cassation on tools for improving jurisprudence consistency, as part of the IA Lab, an initiative within DINUM (the Direction of Digital Affairs attached to the French Prime Minister) whose goal is to help the State's public administrations to benefit from the recent advances in AI.
13 Jan 2021
New ALMAnaCH website launched!
19 Nov 2020
Article on the collaboration between ALMAnaCH and the Winespace start-up on Inria's website
16 Nov 2020
📣 Benoît Sagot at "France Is AI"
Benoît Sagot was invited as a panellist with François Yvon at "France Is AI", France's biggest event in Artificial Intelligence.
1 Nov 2020
🥂 Benoît Sagot promoted to "Directeur de Recherches"
1 Nov 2020
🤝 Rachel Bawden just joined ALMAnaCH as an Inria “Chargée de Recherches”
She will be working on machine translation and multilingual NLP.
1 Nov 2020
🤝 Arij Riabi just joined ALMAnaCH as a research engineer
Within the framework of Benoît Sagot's PRAIRIE chair, she will be working on NLP for low-resource, non-standardised language varieties, especially North-African dialectal Arabic written using the Latin script (Arabizi).
1 Nov 2020
🤝 Lucas Terriel just joined ALMAnaCH as a research engineer
Within the EHRI, DAHN and NER4archives projects, he will work at the interface between NLP and Digital Humanities for archival documents, with a focus on named entity recognition in finding aids.
8 Oct 2020
🎓 Jack Bowers's PhD defence
Jack Bowers defended his PhD thesis, supervised by Laurent Romary, on language documentation and standards in digital humanities, and more precisely on the use of the TEI to document Mixtepec-Mixtec.
1 Oct 2020
🎓 Mohamed Khemakhem's PhD defence
Mohamed Khemakhem defended his PhD thesis, supervised by Laurent Romary, on standard-based lexical models for the automatic structuring of electronic dictionaries.
1 Sep 2020
🤝 Yves Tadjo just joined ALMAnaCH as a research engineer
Within the DAHN project, he will develop tools for digital humanities for archival documents.
15 Jul 2020
🎓 Loïc Grobol's PhD defence
Loïc Grobol defended his PhD thesis, supervised by Isabelle Tellier†, Frédéric Landragin, Marco Dinarelli and Éric de la Clergerie, on coreference resolution for French.
25 May 2020
Article on the Cap'FALC initiative on Inria's website
Cap'FALC is an initiative involving FAIR (Facebook) and UNAPEI. Its goal is to develop a text simplification algorithm and an accessible tool to aid the production of FALC (the French equivalent of “Easy read”) for people with mental disabilities.
4 May 2020
Laurent Romary interviewed on Inria's website
19 Nov 2019
📰 French national radio station France Culture speaks about CamemBERT
18 Nov 2019
📰 The French newspaper Le Monde publishes an article on CamemBERT
1 Jul 2019
ALMAnaCH is now an Inria project-team
1 Jan 2017
Creation of ALMAnaCH as an Inria team