
Our Path to Safer AI: Sharing Data Without Compromising Privacy

Eder Labs
5 min read · Feb 22, 2021


Smoothly sharing data and preserving privacy — does it need to be such a difficult choice?

Friction is a beautiful phenomenon: in the right amount, it gives you control. Too much of it, however, leaves you immobile and makes work frustrating.

A big source of friction in data-led innovation is sharing data, and, for the people on the other side of it, accessing that data. The problem gets tougher still when sensitive data is involved.

Some of this friction is good, and rightly so. Laws and compliance regimes (such as GDPR in Europe, HIPAA in the USA, and the PDP Bill in India) mandate the ethical and responsible use of data by enterprises and technology teams. Depending on the context of your business, however, this friction can also prove counter-productive or limiting, even though you are ultimately creating value for the same customer.

A map of sensitive data used in AI, and the various data-residency and compliance regimes that act on it

What if you never had to choose between speed and safety?

Smoother, safer sharing of data has multiple benefits:

  1. Going from idea to AI experiment faster
  2. Being able to extract value from untapped data sources

Changing mindsets: Future-proofing vs. When-it-happens

Sharing data within a company or with external data-science vendors is not easy, and it comes with big responsibilities:

  • Preserving privacy of Personally Identifiable Information (PII)
  • Protecting sensitive data from exposure

Privacy-by-default: Going beyond compliance and regulations

Road and traffic systems started being developed in the 1800s. Cars arrived in the 1900s, with limited features at first. But better traffic systems alone did not stop accidents. Vehicles themselves had to develop safety mechanisms (braking, airbags, and so on), making the journey far safer and more reliable than traffic rules alone could.

Compliance and regulations create the necessary friction that helps companies stay customer-first. Applying and upholding them while still learning from data to benefit those same customers, however, is easier said than done.

Source: Twitter

Jokes aside, depending on compliance and regulations alone is not enough; they only set a foundation. The world's leading data science teams don't just want to pass compliance checks, they want to be one step ahead and design privacy-by-default into their workflows.
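As a concrete illustration, here is a minimal sketch of what a privacy-by-default step in a data-sharing workflow could look like: direct identifiers are dropped or pseudonymized before a dataset ever reaches the analytics or vendor team. The column names, the salt handling, and the split between PII and shareable fields are hypothetical, not taken from any specific product or regulation.

```python
import hashlib
import pandas as pd

# Hypothetical column classification: which fields are direct identifiers (PII)
# and which are safe to share for modelling. These names are illustrative only.
PII_COLUMNS = ["name", "email", "phone"]
SHAREABLE_COLUMNS = ["customer_id", "age_band", "transaction_amount", "merchant_category"]

def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()[:16]

def prepare_for_sharing(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Return a copy of the dataset with PII removed or pseudonymized
    before it leaves the data-owning team."""
    shared = df.copy()
    # Pseudonymize the join key so records can still be linked across extracts
    # without exposing the raw identifier.
    shared["customer_id"] = shared["customer_id"].apply(lambda v: pseudonymize(v, salt))
    # Drop direct identifiers entirely; the modelling team never sees them.
    shared = shared.drop(columns=[c for c in PII_COLUMNS if c in shared.columns])
    return shared[[c for c in SHAREABLE_COLUMNS if c in shared.columns]]
```

The point of the sketch is the ordering: minimization happens at the moment of sharing by default, rather than being bolted on after a compliance review flags a problem.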

What’s happening in the real world?

Unlocking personalization — safely and privately

In finance, an entire ecosystem of personalization products and services is built on the data generated by our financial records and transactions.

However, doing personalization at scale is not a trivial challenge. It needs:

  • iterative learning
  • a single view of the customer, and
  • a contextual ‘curriculum’ (sequence of interventions for the customer)

These personalization services collect data points about their audience from multiple sources. What if most, if not all, of these sources contain sensitive, personal data about customers? What if each of these sources is governed by a different regulation? This very quickly turns into friction that puts roadblocks and speed bumps in your process.
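For illustration only, here is one way such a single view of the customer could be assembled while respecting per-source restrictions: each source passes through its own minimization step before the join, so only the columns its governing regulation permits ever reach the combined view. The source names, fields, and policy table are hypothetical.

```python
import pandas as pd

# Hypothetical per-source policy: the columns each source is permitted to
# contribute to the shared customer view. Names are illustrative only.
SOURCE_POLICIES = {
    "transactions": ["customer_id", "avg_monthly_spend", "merchant_category"],
    "support_tickets": ["customer_id", "tickets_last_90d"],
    "app_events": ["customer_id", "sessions_last_30d"],
}

def minimize(source_name: str, df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns this source is allowed to share."""
    allowed = SOURCE_POLICIES[source_name]
    return df[[c for c in allowed if c in df.columns]]

def single_customer_view(sources: dict) -> pd.DataFrame:
    """Join the minimized extracts from every source into one row per customer."""
    view = None
    for name, df in sources.items():
        extract = minimize(name, df)
        view = extract if view is None else view.merge(extract, on="customer_id", how="outer")
    return view
```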

When it comes to this tradeoff, here is what Sri Shivananda, SVP and CTO at PayPal, has to say:

Your business is glocal, why not your data science?

Risk-prevention algorithms need to connect both global and local data sets to deliver accuracy. However, with the rise of local data-residency requirements, it is becoming increasingly tricky and tedious to keep track of local regulations and to merge local data with the global insights that most MNCs are built on.

Brighterion, a San Francisco-based AI company, was acquired by Mastercard in 2017. Before thousands of people converged on London for the 2012 Summer Olympics, Worldpay turned to Brighterion for help with anti-money laundering (AML), counter-terrorism financing (CTF), financial fraud, and compliance.

To achieve truly effective, dynamic, and scalable AML, while also ensuring compliance with UK government regulations, Worldpay and Brighterion worked together to move beyond existing rules-based systems and implement highly adaptive, real-time, AI-based fraud prevention.

However, the future is not making it easier for such technologies to flourish smoothly. Companies such as Brighterion and Simility (acquired by PayPal) face a common pressure: merging their historical data sets with local customer data. China's data protection laws (for instance, the PBOC's data guidelines) prohibit incoming external data from being purged; at the same time, they also prohibit any local data from leaving local data centers. How do the leading risk-analysis technologies navigate this stalemate?
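One pattern teams reach for in this situation (and we are not claiming it is what Brighterion or Simility actually do) is federated learning: each region trains on data that never leaves its own data center, and only model parameters are exchanged and averaged. A minimal sketch, assuming a simple linear risk model and uniform averaging across regions:

```python
import numpy as np

def local_update(global_weights: np.ndarray, local_X: np.ndarray,
                 local_y: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One gradient step of a linear risk model, computed inside the region."""
    preds = local_X @ global_weights
    grad = local_X.T @ (preds - local_y) / len(local_y)
    return global_weights - lr * grad

def federated_round(global_weights: np.ndarray, regions: dict) -> np.ndarray:
    """Average the regional weight updates; raw records are never transmitted."""
    updates = [local_update(global_weights, X, y) for X, y in regions.values()]
    return np.mean(updates, axis=0)

# Example with two regions whose raw data stays local (synthetic data).
rng = np.random.default_rng(0)
regions = {
    "eu": (rng.normal(size=(100, 5)), rng.normal(size=100)),
    "cn": (rng.normal(size=(100, 5)), rng.normal(size=100)),
}
weights = np.zeros(5)
for _ in range(10):
    weights = federated_round(weights, regions)
```

Only the weight vectors cross borders in this sketch; the regional records themselves never do, which is the property data-residency rules care about.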

AI is built better together. What if we didn't have to choose between collaboration and efficiency?

An increasing number of banks and FIs (financial institutions) have started collaborating with external AI teams and startups on niche, rewarding challenges such as risk assessment, credit assessment, and personalization.

However, operationalizing these relationships comes with its own pressures:
1. Long negotiation cycles
2. Constantly second-guessing trust in each other, given the liabilities and risks of data sharing

The future — What if you could prevent rather than correct?

Still from Minority Report

This is the future for AI that we believe in. If your team is designing for better privacy and safety, we'd love to help.

How are AI teams the world over solving for privacy? Fill out this survey to find out.
