They often say the internet has no delete button.
It's a useful analogy for explaining to new internet users how seriously they should take their personal computer and information security: maybe think twice before uploading semi-sensitive information, and three times for anything more secret. Once it's on the internet it can be hard to delete.
How to delete files from the internet:
One way is to create a "hash" of the file you want gone from the internet. Then you can recognise the file you don't want to hold, without holding it! Amazing. Example time:
Assume the file is named horrific-revenge-porn.jpg
The following hashes (short strings of random-looking text derived deterministically from the file's bytes, in this case the picture of an idiot) can be used instead of the actual offensive / suppressed original file data. That makes sense if you are trying to delete the file, not hold it! They can be given to administrators so they can check that a system does not have a copy of the file:
For the most extreme cases, involving criminal contraband information, such as the unfortunate case of the Kiwi man from Hastings sentenced to 4 years for secretly filming his female Airbnb guests, CERT and the NZ Police should offer to provide file hashes to the victims of criminal data breaches like revenge porn and so forth.
This would enable the following desirable privacy benefits:
- ensure the banned files are not held on owned computer systems
- securely provide the means for others to also do so
- nothing about the file's contents can be learned from looking at the random-looking letters*
- a registry of illegal files would help large ISPs to keep their disks clean
- an arbitrarily large file - even a 50 GB file - is reduced to a short piece of text
- it's not encryption - it's an irreversible scrambling of any file into a fixed-size chunk of gibberish
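To make the "short piece of text" point concrete, here is a minimal Python sketch that hashes a file of any size with SHA256. The function name `sha256_of_file` and the chunk size are my own choices for illustration; the file is read in pieces so even a 50 GB file never has to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA256 hex digest of the file at `path`.

    The file is read in chunks (1 MiB by default, an arbitrary choice),
    so memory use stays small no matter how big the file is.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Whatever the input size, the result is always 64 hexadecimal characters, and there is no practical way to work backwards from those characters to the file.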
For example, let's say the picture above is some revenge porn you once made that was then posted by your evil ex-partner or stalker, and now you'd like it gone from the net. In theory, if you put the file hashes into a government registry, one day ISPs could run regular scans and wipe any matching files.
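A scan like that could look roughly like the sketch below. It is only an illustration: the registry is assumed to be a plain set of SHA256 hex strings, and the names `file_sha256` and `find_banned_files` are hypothetical. It only reports matches and leaves the deleting to a human.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_banned_files(root, banned_hashes):
    """Yield every file path under `root` whose hash is in `banned_hashes`.

    `banned_hashes` is assumed to be a set of lowercase SHA256 hex strings,
    standing in for the registry described above.
    """
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                if file_sha256(path) in banned_hashes:
                    yield path
            except OSError:
                # Unreadable file (permissions, vanished mid-scan): skip it.
                continue
```

Note the limit of this approach: only byte-for-byte identical copies match. A copy that has been resized, cropped or re-saved has different bytes and therefore a different hash.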
China likely does this to laser-target and delete entire sections of the internet from its citizens. They probably have scanners running 24/7 to find old shots from the Tiananmen Square Massacre - and perhaps even this new shot - this time the guy is flat as a pancake after being literally "rolled" by a tank:
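SHA256 is the current widely used standard. You can get SHA512 also, but its output is twice the length.
MD5 was huge for a very long time and was popular for verifying big .iso files after downloading. It still catches accidental corruption, but deliberate MD5 collisions can now be manufactured, so it shouldn't be relied on by itself.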
CRC32 is not a secure hash, but it's a checksum (the same kind that zip archives store for each file) that can be used in a similar way. It will detect a single bit change, but unlike a true secure hash, you can easily pad an altered file to get the same CRC32 for a different file. It is only 32 bits long and is essentially the remainder of a simple polynomial division rather than a one-way scramble, so a few well-chosen bytes added to the file can force whatever CRC32 value you like.
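To show how weak CRC32 is by itself, the sketch below goes hunting for two different random inputs with the same CRC32. This is a brute-force search rather than the padding trick, and the function name, the fixed seed and the 16-byte input size are all arbitrary choices of mine. With only 32 bits of output, a clash typically turns up after tens of thousands of tries, while the SHA256 values of the same two inputs still differ.

```python
import hashlib
import random
import zlib


def find_crc32_collision(limit=1_000_000, seed=0):
    """Search random 16-byte inputs for two that share a CRC32.

    Returns (first_input, second_input, checksum), or None if no clash
    was found within `limit` tries.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    seen = {}
    for _ in range(limit):
        data = rng.getrandbits(128).to_bytes(16, "big")
        checksum = zlib.crc32(data)
        if checksum in seen and seen[checksum] != data:
            return seen[checksum], data, checksum
        seen[checksum] = data
    return None


result = find_crc32_collision()
if result is not None:
    first, second, checksum = result
    print("same crc32:", format(checksum, "08x"))
    print("sha256 a  :", hashlib.sha256(first).hexdigest())
    print("sha256 b  :", hashlib.sha256(second).hexdigest())
```

This is why CRC32 is only useful here as one extra check alongside a real hash, never as the sole identifier.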
While it's theoretically possible for two different input files to create the same hash - a hash collision - if you use two or more different hash types like above, or even just also include the file size in bytes (50,323 bytes in this case), you reduce the false positive potential to practically nil.
Also, any large ISP isn't going to want to automatically delete files based on just one parameter. For use by a sovereign national police force I'd recommend using all four: bytes, crc32, md5, sha256, plus a category, e.g. kid-porn, espionage, credentials, financial, medical/health, military, government, personal privacy, education, entertainment (here we hit a snag: the copyright industry).
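A registry entry along those lines could be sketched like this. The `Fingerprint` record, its field names and the `matches` helper are hypothetical, made up for illustration; the point is simply that a file only counts as a match when the size and all three digests agree.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """One hypothetical registry entry: four measurements plus a label."""
    size_bytes: int
    crc32: str
    md5: str
    sha256: str
    category: str


def fingerprint(data: bytes, category: str) -> Fingerprint:
    """Measure `data` and label it with `category`."""
    return Fingerprint(
        size_bytes=len(data),
        crc32=format(zlib.crc32(data), "08x"),
        md5=hashlib.md5(data).hexdigest(),
        sha256=hashlib.sha256(data).hexdigest(),
        category=category,
    )


def matches(data: bytes, entry: Fingerprint) -> bool:
    """True only if size, crc32, md5 and sha256 all agree with `entry`."""
    if len(data) != entry.size_bytes:
        return False  # cheapest check first
    return fingerprint(data, entry.category) == entry
```

The category takes no part in the matching; it just travels with the entry so whoever receives a hit knows what kind of file they are dealing with.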
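Because a hash changes completely with even a tiny single-bit alteration to the input file, a near-identical file will not collide by accident. Any accidental collision is far more likely to be with a completely unrelated file of a wildly different size, say 50 KB versus 50 TB, which is exactly what the file size check catches!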
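You can watch that "completely changes" behaviour with a few lines of Python. This is a small sketch with made-up helper names (`flip_one_bit`, `differing_bits`) and a stand-in byte string instead of a real picture: it flips a single bit of the input and counts how many of the 256 output bits change.

```python
import hashlib


def flip_one_bit(data: bytes, bit_index: int = 0) -> bytes:
    """Return a copy of `data` with exactly one bit flipped."""
    altered = bytearray(data)
    altered[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(altered)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"pretend these bytes are a jpg"  # stand-in for real file data
altered = flip_one_bit(original)

before = hashlib.sha256(original).digest()
after = hashlib.sha256(altered).digest()

print("input bits changed :", differing_bits(original, altered))
print("output bits changed:", differing_bits(before, after), "of 256")
```

For a good hash, roughly half of the output bits flip, so the two digests look entirely unrelated even though the inputs are the same size and differ by one bit.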
The commands to get these on my Mac were:
shasum -a 256 [bad-file.jpg]
md5 [bad-file.jpg]
crc32 [bad-file.jpg]
ls -la [bad-file.jpg]
What's nice is that you can double check your hash using a different program, openssl:
openssl dgst -sha256 [bad-file.jpg]
Now you can quickly compare huge files without transferring them, and detect tiny alterations to big files.