Hashes and Cryptographic Hashes

By Luke Hally

September 14, 2021

Categories: Security engineering

Tags: Encryption

Encryption is the glamorous part of confidentiality, but today we will look at hashes. We’ll cover regular and cryptographic hashes and a bit of history, then look at ways to attack them.

Hashing

In the last podcast we talked about tamper evidence with wax seals, magic numbers and MACs. It turns out the magic number is called a hash. A hash function takes a message of any length and returns a fixed-size number. But what could it be used for? We typically use hashes to check files for change or corruption by comparing the hash we compute with a known-good one – taking the hash is known as fingerprinting the file.
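As a rough illustration of fingerprinting, here is a minimal Python sketch that hashes a file and compares the result with a known-good value. The function names fingerprint and is_unchanged, and the choice of SHA-256 from Python’s hashlib, are assumptions made for this example rather than anything from the podcast.

```python
import hashlib
import hmac


def fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, known_good_hex: str) -> bool:
    """True if the file's fingerprint matches the known-good fingerprint."""
    return hmac.compare_digest(fingerprint(path), known_good_hex.lower())
```

If a single byte of the file changes, is_unchanged should return False, which is the whole point of keeping the known-good fingerprint somewhere safe.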

We have two types of hash, a regular hash and a cryptographic hash.

Regular Hash

A regular hash has these properties:

It summarises something
Things that are likely to change will give a different fingerprint
Chance of repeats is statistically low
Inputs that are nearby (similar) should have very different hashes – known as the avalanche effect: small change in, big change out.
It has a fixed size output no matter the input size, so there will inevitably be collisions (different inputs with the same hash)
Collisions are evenly distributed, so the inputs that collide are unrelated to each other
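Two of these properties, the fixed size output and the avalanche effect, are easy to poke at in Python. This is only a sketch: SHA-256 is used because it ships with hashlib, and the helper name bit_difference is made up for the example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two hashes."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")


# Fixed size output: one byte in or a million bytes in, 32 bytes out.
small = hashlib.sha256(b"a").digest()
large = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(small), len(large))

# Avalanche effect: the inputs differ in one character,
# yet roughly half of the output bits are expected to flip.
flipped = bit_difference(b"hello world", b"hello worle")
print(flipped, "of 256 bits differ")
```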

Search engines also use hashes for indexing, but with the opposite property: they want hashes of similar things to be close to each other so that similar results come up together.

Cryptographic Hash

We use them for integrity, to know whether or not a file has been tampered with or corrupted. A cryptographic hash has these properties:

Same properties as a regular hash
They are one way (this is done through iteration – lots of rounds)
Pre-image resistant – given only a hash, it is infeasible to reverse it and find the original message
Collision resistant – it is infeasible to find any two messages with the same hash
Second preimage resistant (specific collision resistant) – given a message, it is infeasible to find another message with the same hash

History

MD

It all started with MD (message digest) by Rivest (of RSA fame). It was 128 bit and there were various versions. Weaknesses were found along the way, so it was further developed. Note that a weakness is not a vulnerability but a concern that there may be one, and a sign that it is time to do better. It culminated in MD5, but by 1996 the weaknesses were bad enough that it was recommended not to be used, yet it was still in use in 2006. By 2005, collisions could be found on a laptop.

SHA

The NSA donated SHA in 1993. Soon afterwards they realised that it had theoretical problems and released SHA1, which was 160 bits because 128 bits was too small. Like AES, it relied on lots of rounds. It was vulnerable from 2005 (to state-funded adversaries, NSA etc.), but web browsers accepted it until 2017 and Windows accepted it until 2020! There was too much hardcoded dependency on it. This is a good example of invisible risk – people going about their business while carrying this risk in their security. SHA2 was released in 2001 and is widely used – do not use MD5 or SHA1. SHA2 comes in different sizes and each variant is named after its output size, e.g. SHA256, SHA512. It hasn’t been broken, but it shares its underlying construction with MD5 and SHA1 (see the note below), so it is treated as a bit shaky. SHA3 is recommended.
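To see the “named after the size” convention in practice, here is a small sketch using Python’s hashlib. The dictionary name sizes is just for the example.

```python
import hashlib

# Output size in bits for a few SHA2 and SHA3 variants.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha256", "sha512", "sha3_256", "sha3_512")
}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```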

A note on Merkle–Damgård

No, it is not a guard to stop things spilling over; it is named after its contributors, Merkle and Damgård. The construction was used in MD5, SHA1 and SHA2 as a method for creating the hash with collision resistance.

Break the input into blocks.
Hash the first block, then add that hash to the next block,
then hash the next block, and so on.
This ensures that the hash relies on all contents of the file (making sure any change in the input causes a change in the output).
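The steps above can be sketched as a toy chained hash. This is not the real MD5 or SHA code: the name chained_hash, the tiny 16-byte block size and the use of SHA-256 as the per-block function are all assumptions made to keep the example short.

```python
import hashlib

BLOCK_SIZE = 16  # bytes per block, deliberately tiny for the example


def chained_hash(message: bytes) -> bytes:
    """Toy chained hash: each block is hashed together with the previous result."""
    state = bytes(32)  # fixed starting value (all zeros)
    for start in range(0, len(message), BLOCK_SIZE):
        block = message[start:start + BLOCK_SIZE]
        # The previous hash is fed in with the next block,
        # so the final value depends on every block before it.
        state = hashlib.sha256(state + block).digest()
    return state


original = chained_hash(b"first block here then a second block and more")
tampered = chained_hash(b"First block here then a second block and more")
print(original.hex())
print(tampered.hex())
```

Changing one letter in the first block changes the final hash, even though every later block is identical.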

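But there is a problem: it is vulnerable to a length extension attack (also called a message extension attack). The method is public, so if I know the hash of a message I can treat that hash as the running state, add my own new block to it and keep hashing. The result is a valid hash for the original message plus my block, even though I never saw the original message.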
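Here is a hedged sketch of that attack against a toy hash built in this chained style. Everything in it is hypothetical: toy_md, compress, padding, the 16-byte block and the 8-byte state are made up for the demonstration, and the “server” that prefixes a secret to the message is an assumed setup.

```python
import hashlib

BLOCK = 16     # toy block size in bytes
IV = bytes(8)  # toy starting state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy per-block function: mix the running state with one block."""
    return hashlib.sha256(state + block).digest()[:8]


def padding(total_len: int) -> bytes:
    """Marker byte, zeros, then the message length, filling the last block."""
    zeros = (-(total_len + 9)) % BLOCK
    return b"\x80" + bytes(zeros) + total_len.to_bytes(8, "big")


def toy_md(data: bytes, state: bytes = IV, already_hashed: int = 0) -> bytes:
    """Hash data block by block, optionally resuming from a known state."""
    data += padding(already_hashed + len(data))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


# The server hashes secret + message and publishes the result.
secret = b"server-secret"
message = b"amount=10"
tag = toy_md(secret + message)

# The attacker knows message, tag and the secret's length, not the secret.
known_len = len(secret) + len(message)
glue = padding(known_len)  # the padding the server added internally
extra = b"&amount=1000000"
forged_message = message + glue + extra
forged_tag = toy_md(extra, state=tag, already_hashed=known_len + len(glue))

# The server recomputes the hash with its secret and accepts the forgery.
forgery_accepted = toy_md(secret + forged_message) == forged_tag
print(forgery_accepted)
```

The attacker never touches the secret: resuming from the published hash is enough, because that hash is exactly the internal state after the original blocks.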

Attacks

Collision attack

Finding two inputs that produce the same hash. Note that we don’t care what the messages are, just that we can find one that gives us the same hash as another. But why? We might be trying to crack a password and have the hash from a hacked database – anything that matches the hash will get us in, and if multiple ‘passwords’ produce the matching hash we increase our chance of success.
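Hunting for any two inputs that share a hash can be tried out on a deliberately weakened hash. In this sketch, short_hash keeps only the first 3 bytes (24 bits) of SHA-256 so that a collision turns up quickly; both function names and the truncation are assumptions for the example, and a full-size hash would not fall this easily.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, nbytes: int = 3) -> bytes:
    """A weakened hash: only the first few bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int = 3):
    """Hash message after message until two of them share a short hash."""
    seen = {}  # short hash -> first message that produced it
    for i in count():
        message = f"message-{i}".encode()
        h = short_hash(message, nbytes)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


first, second, tries = find_collision()
print(first, second, "collide after", tries, "tries")
```

With a 24-bit output there are about 16.7 million possible hashes, yet a collision is expected after only a few thousand tries, because we accept any matching pair rather than a match for one particular message.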

Preimage attack

Reversing a hash: given only the hash, find a message that produces it. The reason for wanting to do this is self-explanatory.

Second preimage attack (specific collision)

Given message ‘x’ and its hash, can we find a second message ‘x2’ that will generate the same hash? Someone may want to swap a document or file without anyone realising – if the hash matches, how will they know? Richard’s example of the will was a good one. I create a will and take the hash, then I publicly release the hash and say “when I die, anyone can check the integrity of the will by taking the hash and seeing if it matches this hash”. But what if my lawyer is dodgy and wants my money? Knowing the original message and the hash, they could undertake a second preimage attack.

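Imagine I have used MD5, only 128 bits. The lawyer could create a new will with 128 lines, then add or remove a space at the end of each line until the correct hash is generated. That gives 2^128 versions of the forged will, about enough to expect one match, but trying them all is too much work for a computer, so a birthday attack can be used instead. If the lawyer has access to both the legitimate and illegitimate wills before the hash is published (for example, because they drafted the original), they can keep modifying both until a matching pair is found, which takes only around 2^64 versions of each.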
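The trailing-space trick can be sketched on a toy scale. In this example the hash is cut down to 16 bits (tiny_hash, an assumption so the search finishes instantly) and each will has 12 lines, giving 4096 versions of each. The will text, the names and the helper functions are all invented for the illustration.

```python
import hashlib
from itertools import product


def tiny_hash(text: str) -> str:
    """Weakened hash: first 4 hex characters (16 bits) of SHA-256."""
    return hashlib.sha256(text.encode()).hexdigest()[:4]


def variants(lines):
    """Every version of a document with or without a space ending each line."""
    for endings in product(("", " "), repeat=len(lines)):
        yield "\n".join(line + end for line, end in zip(lines, endings))


def find_matching_pair(honest_lines, forged_lines):
    """Return (honest, forged) versions with the same tiny hash, or None."""
    seen = {}
    for version in variants(honest_lines):
        seen.setdefault(tiny_hash(version), version)
    for version in variants(forged_lines):
        match = seen.get(tiny_hash(version))
        if match is not None:
            return match, version
    return None


honest_will = [
    "Last will and testament", "I am of sound mind.",
    "I leave my house to my daughter.", "I leave my car to my son.",
    "I leave my books to the library.", "I leave my savings to charity.",
    "My lawyer is to be paid the agreed fee.", "No other gifts are made.",
    "This replaces all earlier wills.", "Signed in front of two witnesses.",
    "Witness one has signed below.", "Witness two has signed below.",
]
# The forgery differs only in who receives the house, car and savings.
forged_will = list(honest_will)
forged_will[2] = "I leave my house to my lawyer."
forged_will[3] = "I leave my car to my lawyer."
forged_will[5] = "I leave my savings to my lawyer."

pair = find_matching_pair(honest_will, forged_will)
```

The two documents in pair read exactly like the honest and forged wills, differ only in invisible trailing spaces, and share a hash. Against a full 128-bit hash the same idea needs around 2^64 versions of each will instead of 2^12.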

Reflection

Hashing is a way of fingerprinting an input, say a file, to check its integrity. Someone can take the hash of a file and compare it to the published hash for that version; if it matches, they can be confident that they have the correct version and that it is not corrupted. Cryptographic hashes have the same properties as a regular hash, with some extras:

They are one way (this is done through iteration – lots of rounds)
Pre-image resistant – given only a hash, it is infeasible to reverse it and find the original message
Collision resistant – it is infeasible to find any two messages with the same hash
Second preimage resistant (specific collision resistant) – given a message, it is infeasible to find another message with the same hash

The even distribution of collisions is a great way to deal with them. It is a bit like when someone is hiding something in a room and they look away from the thing they are hiding – giving a clue to its location. But with a hash, even if we looked in the correct part of the keyspace, it doesn’t tell us anything.
