Fair ML models and Cross-Attentional Neural Network for Image and Text

May 24, 2022

“Novel multimodal learning solution for bank automation”

On 17 May 2022, Data Science Milan organized a Meetup at the wonderful Unicredit Tower, with Riccardo Monetti and Dario Saccavino as speakers.

“Towards fair ML models”, by Riccardo Monetti, Data Scientist at Unicredit

Riccardo spoke about fairness in machine learning applications.

Fairness is hard to define. In short, it refers to the various attempts to correct bias in decision processes based on machine learning models.

Riccardo presented fairness as a construct space where everything is balanced: an ideal world. In the real world, many other factors, such as bias, errors and prejudice, turn it into an imbalanced space. Fairness is therefore closely linked to bias.

Bias enters at each stage of the data lifecycle. Moving to the observed space introduces social bias, for instance through subjectivity. The sampling strategy then produces a raw data space with representation bias, for instance by losing some specific data points. Finally, the data preparation strategy produces a prepared data space and introduces a further data preparation bias, which comes from data scientists and machine learning engineers.

In machine learning modeling, the questions are: which types of bias occur in the measurement of the construct space, which classes have to be protected, which type of target will be evaluated in the task and, finally, how much unfairness can be tolerated, so that actions to mitigate the bias can be selected.
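As a minimal sketch of the last question, the snippet below measures how differently a model treats two protected groups and compares the result with a tolerance. The metric (a demographic parity gap), the toy data, the group labels and the `TOLERANCE` value are all assumptions chosen for illustration; the talk did not prescribe a specific metric.

```python
from collections import defaultdict

def positive_rates(predictions, groups):
    """Share of favourable predictions (1) within each protected group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(predictions, groups):
    """Largest difference in favourable-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 1 = favourable decision, "A"/"B" = protected groups.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

TOLERANCE = 0.2  # assumed value: how much unfairness is tolerated
gap = demographic_parity_gap(predictions, groups)
needs_mitigation = gap > TOLERANCE

print(f"gap={gap:.2f}, needs mitigation: {needs_mitigation}")
```

Other fairness metrics would fit the same pattern: compute a per-group quantity, compare the groups, and trigger mitigation when the difference exceeds the agreed tolerance.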

“Cross-Attentional Neural Network to Compare Image and Text”, by Dario Marino Saccavino, Data Scientist at Unicredit

Dario presented a machine learning model developed for a business use case: automating cheque deposits by customers at the ATM. Usually, a client who deposits a cheque at the ATM is asked to manually enter some of the text fields written on the cheque (amount and date). Later, a back-office operator manually checks that the fields were filled in correctly. The goal of the automated process is to examine the scanned image of the cheque and verify that the fields written on it match the values typed into the ATM.

The first step of the cheque verification workflow relies on an object detector (a YOLO model); the second step consists of extracting the text from the cropped image and comparing it with the typed text.

To overcome the difficulty of text extraction, the team used the TextMatcher architecture, a neural network built from an encoder block followed by a sequence-to-sequence decoder.

The idea for a better comparison was to exploit the information in the encoder, because it holds a compressed representation of the inputs. The assessment generates an image embedding and a text embedding and uses a cross-attentional block to discover alignments between the two sequences of elements. This makes it possible to compute local similarity scores and then a global similarity score, which is compared with a similarity threshold to decide whether image and text match.

The innovative cross-attentional neural network is inspired by the self-attention layer of the Transformer architecture, well known in NLP.

The model was evaluated by raising the similarity threshold to obtain a lower error rate, at the cost of a higher rework rate. The motivation is that a rework corresponds to a false negative (the algorithm rejects a correct value): the cheque is passed to the back office, where the mistake is fixed. An error, instead, corresponds to a false positive (the algorithm accepts an incorrect value): the cheque is likely to pass through the workflow undetected and cause issues in the bank's processes.
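The flow from embeddings to a match decision can be sketched in a few lines of plain Python. This is not the TextMatcher implementation: it assumes generic scaled dot-product attention, cosine similarity for the local scores, a simple mean for the global score, and made-up two-dimensional embeddings and threshold.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_similarity(text_emb, image_emb):
    """Return (local_scores, global_score) for two sequences of vectors.

    Each text element attends over all image elements; the local score is
    the cosine between the text element and its attended image vector, and
    the global score is the mean of the local scores (assumed aggregation).
    """
    dim = len(text_emb[0])
    scale = math.sqrt(dim)
    local_scores = []
    for t in text_emb:
        weights = softmax([dot(t, i) / scale for i in image_emb])
        attended = [sum(w * i[d] for w, i in zip(weights, image_emb))
                    for d in range(dim)]
        local_scores.append(cosine(t, attended))
    return local_scores, sum(local_scores) / len(local_scores)

# Hypothetical embeddings: two text elements, two image elements each.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
image_matching = [[4.0, 0.0], [0.0, 4.0]]
image_mismatching = [[0.0, 4.0], [0.0, 4.0]]

SIMILARITY_THRESHOLD = 0.8  # assumed value
_, score_match = cross_attention_similarity(text_emb, image_matching)
_, score_mismatch = cross_attention_similarity(text_emb, image_mismatching)
is_match = score_match >= SIMILARITY_THRESHOLD
is_mismatch_accepted = score_mismatch >= SIMILARITY_THRESHOLD
```

In a real model the embeddings would come from trained image and text encoders, and the attention block would have learned projections; the sketch only shows how alignment weights turn two sequences into local and global scores.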
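The trade-off between the two rates can be illustrated with a small threshold sweep. The scores and labels below are invented, and the denominators (errors over truly mismatching pairs, reworks over truly matching pairs) are an assumption about how the rates are normalised.

```python
def rates_at_threshold(scores, is_match, threshold):
    """Return (error_rate, rework_rate) for a given similarity threshold.

    error rate  = mismatching pairs accepted / all mismatching pairs
    rework rate = matching pairs rejected / all matching pairs
    """
    accepted = [s >= threshold for s in scores]
    false_pos = sum(a and not m for a, m in zip(accepted, is_match))
    false_neg = sum((not a) and m for a, m in zip(accepted, is_match))
    n_mismatch = sum(not m for m in is_match)
    n_match = sum(is_match)
    return false_pos / n_mismatch, false_neg / n_match

# Hypothetical global similarity scores and ground-truth labels.
scores = [0.95, 0.90, 0.70, 0.60, 0.75, 0.40, 0.30, 0.20]
is_match = [True, True, True, True, False, False, False, False]

for threshold in (0.5, 0.8):
    error_rate, rework_rate = rates_at_threshold(scores, is_match, threshold)
    print(f"threshold={threshold}: error={error_rate:.2f}, rework={rework_rate:.2f}")
```

On this toy data, moving the threshold from 0.5 to 0.8 removes the single accepted mismatch but sends two correct cheques to the back office, which is the kind of exchange described above.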

Recording & Slides:

video

slides

References:

[2205.05507] TextMatcher: Cross-Attentional Neural Network to Compare Image and Text (arxiv.org)

Written by Claudio G. Giancaterino

Written by Data Science Milan