Experiment 5: hashing data

Law enforcement projects are underway to build databases of contraband files, such as child pornography or pirated commercial software. These databases store information about the files rather than the files themselves, both to make queries faster and to avoid holding the material. The information is stored as a hash, or fingerprint, of each file. Because any change to a file's contents changes its hash, the simplest way to thwart such an automated database is to modify the illicit files in some way.
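As a rough illustration of the lookup such a database performs, the sketch below computes a file's SHA-256 fingerprint and checks it against a set of known digests. The function names sha256_of_file and is_known, and the idea of holding the database as an in-memory set, are assumptions made for this example; the text does not say how any real database is implemented.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known(path, known_hashes):
    """Return True if the file's digest is in the set of known digests.

    known_hashes is a hypothetical stand-in for the database: a set of
    lowercase hex digest strings.
    """
    return sha256_of_file(path) in known_hashes
```

A match requires the digests to be identical, so a file that differs from a catalogued one by even a single byte is reported as unknown.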

Method

Fig. 5.1: Sample JPG file used.

An image (Fig. 5.1) was changed in ways that alter its hash value. The original file is called coffee1.jpg; the others are copies of it, each changed in one subtle way that the eye won’t notice. The second file had its bottommost row of pixels cropped. The third file had its red channel color balance increased by 1. The fourth was re-saved with the JPEG quality set to 99%.

Results

As shown below, all four files have different hash values but contain essentially the same image, so an automated comparison that relies on exact hash matches fails to link them. These changes are simple to script, which makes applying them to thousands of images trivial. The same thing could be done to illegally copied music files. Even pirated software can be disguised by adding a small text file to its ISO.

nate@huygens:~$ sha256sum coffee*.jpg
a347eaee0c2cb98c9bf7cb1e47425a7e9efa152299f74c4989554ab898c40d67  coffee1.jpg
f8512bf5fd93c37e0933b748afcd3913a8ab77dc23d65c4a5cf18a8390dab354  coffee2.jpg
75f0dab516996b1a8b6ae3621b079f9040fa0de103c69c73ebd79ea106c30b75  coffee3.jpg
f08a8713b4f439e759422cf76e27c02218efc0731449cf7cffbb317dbda38c32  coffee4.jpg
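The digests above share no visible pattern even though the images are nearly identical. This is a general property of cryptographic hashes rather than anything specific to JPEG files. The sketch below shows it on stand-in data (not the coffee images): it flips a single bit in a byte string and counts how many of the 256 digest bits change. The helper name bit_difference and the sample data are assumptions made for this illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(da ^ db).count("1")


# Stand-in data for illustration only; this is not a real image file.
original = bytes(range(256)) * 16

modified = bytearray(original)
modified[-1] ^= 0x01  # flip the lowest bit of the last byte

print(bit_difference(original, bytes(modified)), "of 256 bits differ")
```

For a well-behaved hash function the count is typically close to half of the 256 bits, which is why comparing digests cannot reveal that two files are near-duplicates.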