Research projects

EU projects

EHRI “European Holocaust Research Infrastructure”

Digitalisation of archival research on the Holocaust.

DARIAH

Digital Research Infrastructure for the Arts and Humanities.

CounteR

Privacy-first situational awareness platform for violent terrorism and crime prediction, counter-radicalisation and citizen protection.

ANR projects

CulturIA

A Cultural History of Artificial Intelligence.

MaTOS

The MaTOS (Machine Translation for Open Science) project aims to develop new methods for the machine translation (MT) of complete scientific documents, as well as automatic metrics to evaluate the quality of these translations.

Other national projects

PRAIRIE

An institute for interdisciplinary research and education in Artificial Intelligence, founded by 5 academic and 16 industrial members.

Cap'FALC

Development of a text simplification algorithm and an accessible tool to ease the production of FALC (the French equivalent of “Easy read”).

EFL

Empirical foundations of linguistics, including computational linguistics and natural language processing.

LiLT

Linguistic issues in language technology.

Huma-Num

Very large research infrastructure (TGIR) aimed at facilitating the digitalisation of humanities and social sciences.

Patrimoines matériels – innovation, expérimentation et résilience

Nénufar

Digitalisation and exploitation of the early editions of the Petit Larousse dictionary.

OncoLab

Standardisation and structuring of cancer-related health data.

HTRomance

The HTRomance project aims to improve the generalization of handwritten text recognition (HTR) models, in particular for manuscripts from the 11th to the 19th century preserved at the Bibliothèque nationale de France.

COLaF

Resources and tools for languages of France.

TIERED

Transforming Interdisciplinary Education and Research for Evolving Democracies.

International projects

Universal Dependencies Project

The Universal Dependencies project is an open community effort with over 300 contributors producing nearly 200 treebanks in over 100 languages.

Past projects

EU projects
  • enCollect (COST, 2017-2020): Combining language learning and crowdsourcing for developing language teaching materials and more generic language resources for NLP.
  • DESIR (H2020, 2017-2019): The DESIR project aims to contribute to the sustainability of the DARIAH infrastructure along all its dimensions: dissemination, growth, technology, robustness, trust and education. Inria is responsible for providing a portfolio of text analytics services based on GROBID and entity-fishing.
  • HIRMEOS (H2020, 2017-2019): Integration of Research Monographs in the European Open Science infrastructure.
  • Parthenos (H2020, 2015-2019): Strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields through a thematic cluster of European Research Infrastructures, integrating initiatives, e-infrastructures and other world-class infrastructures, and building bridges between different, although tightly interrelated, fields.
  • Iperion CH (H2020, 2015-2019): Coordinating infrastructural activities in the cultural heritage domain.
ANR projects
  • BASNUM (ANR, 2018-2023): Digitalisation and computational annotation and exploitation of Henri Basnage de Beauval’s encyclopedic dictionary (1701).
  • Profiterole (ANR, 2017-2021): Modelling and analysis of Medieval French.
  • ParSiTi (ANR, 2016-2022): Context-aware parsing and machine translation of user-generated content.
  • TIME-US (ANR, 2016-2021): Digital study of remuneration and time budgets in the textile trades in 18th- and 19th-century France.
  • SoSweet (ANR, 2015-2020): Studying sociolinguistic variability on Twitter, comparing linguistic and graph-based views on tweets.
  • PARSE-ME (ANR, 2015-2021): Multi-word expressions in parsing.
  • VerDI (ANR RAPID, 2015-2018): Automatic identification of information concealment on the internet.
Other national projects
  • DAdaNMT (Sorbonne Emergence, 2022-2023): The aim of this project is to investigate domain adaptation for neural machine translation. We will explore the adaptation of models to specific, low-resource domains as well as the training of models for multiple domains.
  • Gallic(orpor)a (BNF Datalab, 2021-2021): Consolidating and applying a processing chain for ancient Gallica documents over a long diachronic span, from the first French manuscripts to revolutionary prints.
  • DataCatalogue (Convention (MIC), 2021-2022): The project aims to contribute to the transition from a basic digitalisation of cultural heritage content to the actual use of that content within a "collection as data" perspective. To achieve this, we experiment with new methods for extracting the logical structure of scanned (and OCRed) catalogues and standardise their content for publication to curators, researchers, or wider users.
  • NER4archives (Convention (MIC, Archives Nationales), 2020-2021): Named entity recognition for finding aids in XML-EAD, a standard for encoding descriptive information regarding archival records.
  • DAHN (Convention (MIC, Archives Nationales), 2019-2022): Digitalisation and computational exploitation of archives of historical interest.
  • LECTAUREP (Convention (MIC, Archives Nationales), 2018-2021): Development of a platform for the transcription, reading and automatic analysis of notarial deeds present in the National Archives.
  • OPALINe (PIA, 2017-2020): Development of tools for the accessibility of digital books for visually impaired people.
  • Matériaux Anciens et Patrimoniaux (DIM, 2017-2021): The DIM « Matériaux anciens et patrimoniaux » (MAP) is a region-wide research network. Its distinctive feature is a close collaboration between human sciences, experimental sciences such as physics and chemistry, scientific ecology and information sciences, together with socio-economic partners from the cultural heritage sector. Given its potential for research, development and valorisation, this interdisciplinary network is expected to bring the Île-de-France region to a world-leading position in heritage sciences and research on ancient materials.
International projects
  • BigScience (Informal initiative, 2021-2022): This collaboration aims to foster discussion and reflection on the research questions surrounding large language models (capabilities, limitations, potential improvements, bias, ethics, environmental impact, role in the general AI/cognitive research landscape), as well as on the challenges of creating and sharing such models and datasets for research purposes and within the research community. The collaborative tasks involve creating, sharing and evaluating a large multilingual dataset and a large multilingual generative language model. An uncommonly large compute budget was allocated for these collaborative tasks (several million GPU hours on several thousand GPUs, in particular on the French public cluster Jean Zay).
  • NLP Resources for Analyzing Reactions to Major Events in Hebrew and French Social Media (PHC Maïmonide, 2018-2019): Building NLP resources for analyzing reactions to major events in Hebrew and French social media.
  • MCM-NL (ANR-NSF, 2016-2020): Exploring correlations between neuroimaging data (fMRI, EEG) and data from NLP tools (mostly parsers). The data come from “Le Petit Prince” read in French and English, and parsed with different parsers.