Cyber Security

Luke Hally

Data lakes

August 2, 2021

A data lake is a large store of data, often raw and collected without a specific purpose in mind. It holds lots of individually useless pieces of information, all floating around, made useful by proximity and manipulation. As security people, and as people who live in a digital world and care about privacy, we want it deleted. But data governance people, the people within businesses who are charged with its care, want to keep it. They want to keep it all, grow it and use it for things. They don't know what the things are, but they know the things will be good, and will make them sound smart in meetings. So what's the problem with data lakes? They have two main issues:

  1. They are attractive to attackers, and they are hard to maintain and defend.
  2. They are prone to scope creep. Once you have the data, why not use the data? We saw this with the WA Police accessing COVID-19 QR code check-in data for criminal investigations. And once you are using the data, why not share the data?

Scope creep

I've seen point 2 play out in my own experience, as a tendency amongst tech types to want to gather and analyse data for one of two reasons:

  • Devs who see themselves as benevolent, and as more knowledgeable than others, and who want to use data to make things easier, more intuitive or lower-friction for 'users'.
  • Managers who don't know what they want to do with the data, but know that once they have it, they will be able to do all sorts of things with it to increase engagement, profits or ad revenue.

Reflection

We need to remember that just because we can doesn't mean we should. We need to consider the increased risk of leakage from a data lake, as well as the increased risks to privacy that come with scope creep.

We also need to remember that the water doesn't stay in the lake forever; eventually it makes its way to the ocean. Or, in data parlance, all the data eventually makes its way to the internet.
