AI Risks: Optical Character Recognition and Named Entity Recognition

27 June 2024

The AI Risks project assesses the data protection risks of using AI for Optical Character Recognition (OCR) and Named Entity Recognition (NER).

OCR is a technology used to convert images or scanned documents containing text into machine-readable text.
NER is used to identify named entities within a document, such as the names of people, organizations and locations, and to classify them into predefined categories.

The European Data Protection Board (EDPB) launched this project under its Support Pool of Experts programme at the request of the European Data Protection Supervisor (EDPS).

The project was completed by the external expert Isabel Barbera in September 2023.

Objective

This project helps data controllers who use AI for these purposes to assess the resulting data protection risks, and helps data protection authorities to evaluate the validity and effectiveness of such assessments in the course of their investigations.
For each technology, the external expert identified specific data protection and privacy risks arising from its procurement, development and use.

The AI Risks project includes the following deliverables:

Named Entity Recognition (NER) - Possible Risks & Mitigations
Optical Character Recognition (OCR) - Possible Risks & Mitigations