Using AI to hide secret messages via Steganography
Interesting stuff. I especially enjoyed the bits talking about older methods of hiding messages in plain sight, like marking words in print with invisible ink.
Steganography is an interesting art. It's not cryptography, as technically the text is plainly available - if you know how to read it. One method of steganography was encoding messages in photographs and then posting them online. There are lots of wasted bits in photos, so you alter those bits, which doesn't really alter the image, and post the photo; the recipient knows how to decode the bits, and the message is passed. But the technique is detectable because the altered image doesn't compress as well as an unaltered photo.
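To make the "alter the wasted bits" idea concrete, here is a minimal sketch in Python. It assumes the photo has already been reduced to a plain sequence of byte values (one per colour channel), and the function names `embed_lsb` and `extract_lsb` are mine, not from the article. Real tools also have to cope with file formats and lossy compression, which this ignores.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the lowest bit of each pixel byte (8 pixel bytes per message byte)."""
    # Break the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for this message")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[idx] = (out[idx] & 0xFE) | bit
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read back `n_bytes` of hidden message from the lowest bits."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for pixel in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (pixel & 1)
        out.append(value)
    return bytes(out)
```

Each pixel byte changes by at most 1, which is why the eye can't see the difference. The recipient has to know the message length (or some agreed end marker) to know when to stop reading.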
Detecting textual steganography requires that you analyze the message text and develop a word probability distribution. The word 'the' is one of the most commonly occurring words used in written and spoken communications, 'analysis' less so. By comparing normal text to steganographic text, you can make assumptions as to whether or not text contains a hidden message.
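A rough sketch of that comparison, in Python. This is my own illustration, not the detector from the article: it builds a word-frequency table for each text and measures how far apart the two tables are using total variation distance, which I picked only because it is simple. The names `word_distribution` and `total_variation` are made up for the example.

```python
import re
from collections import Counter


def word_distribution(text: str) -> dict:
    """Map each word to its share of all the words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Distance between two word distributions: 0.0 means identical, 1.0 means no overlap."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)
```

You would build the "normal" distribution from a large sample of ordinary writing, then score a suspect message against it. A high score doesn't prove anything is hidden; it only says the text is unusual. Short messages will also score high just because they are short, so a real detector would need to account for that.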
The text that the message is hidden IN is called the cover text. It might be something like a visit to a local museum, and then the AI will alter that text to inject your secret message. You can then send the altered message and the recipient can re-process it and extract your secret message.
Now, here's the interesting bit. By using AI, the difference in probability distributions can be reduced to zero. So an enemy - a censor, a hostile state actor, whatever - cannot accurately say that any given message contains steganographic text!
Word probability doesn't tell you what the hidden message is, just the likelihood of whether or not there is a hidden message there, which may mean an increased likelihood of a person or group coming under tighter scrutiny.
The problem that I see with this is they're saying that a "plug-in for an app like WhatsApp or Signal would do the heavy algorithmic lifting". I'm a little confused at this point. If they need to match the probability distribution of the cover text with the PD of the secret message, and it's done by an AI which runs on a supercomputer or a computer cluster, will you be able to do that with just a plugin on a smartphone? I'd like to see some more solid proof of concept here rather than 'our math models demonstrate' sort of stuff before human rights workers in bad places put themselves at risk with stuff like this.
https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/
Re: Well ...
I agree, that's an important point.
>> One of the most secure forms of crypto is one-time pads. Sender and receiver each have matching OTPs to encrypt and decrypt messages, and each page is used once then destroyed. Never reused. Because the key is only used once, there's never a repeating pattern. As long as the algorithm that generates the OTPs isn't horribly flawed and your adversary hasn't got ahold of it, it's pretty darn perfect. Of course, if your adversary gets ahold of your messages and your used OTPs, you're probably screwed.<<
You can create something like that using books. There's a code that relies on looking up letters or words in a book that both people have. If you use the same book, there's always a chance your enemy will figure out which it is, get a copy, and decode the message. But if you both have access to a shelf of books, which you move through using an agreed pattern, then the chance of an enemy cracking that approaches nil. It's hard enough if you're just using them from left to right. But you could just as well use them in the alphabetical order of Norse runes or Celtic ogham. If you're doing this in a section of library where nothing's been checked out in 30 years, and there are thousands of books, then you're right back to a code that's effective because it can't be cracked fast enough.
And if you're a complete bastard about it, you might even do this in such a way that using a particular wrong book to crack the code will yield a coherent but misleading message. Of course, that really takes a crypto genius.
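For anyone who wants to play with the book idea, here's a toy sketch in Python. It is an assumption-laden simplification: each "book" is just a string, every word of the message is replaced by the position of its first appearance in the book, and the shelf rotation is a simple agreed list of shelf positions. The names `encode`, `decode` and `pick_book` are made up for this example.

```python
def encode(message: str, book: str) -> list:
    """Replace each message word with the index of its first occurrence in the book."""
    positions = {}
    for index, word in enumerate(book.lower().split()):
        positions.setdefault(word, index)  # keep the first occurrence only
    indices = []
    for word in message.lower().split():
        if word not in positions:
            raise ValueError("word not in book: " + word)
        indices.append(positions[word])
    return indices


def decode(indices: list, book: str) -> str:
    """Look each index up in the book to recover the words."""
    words = book.lower().split()
    return " ".join(words[i] for i in indices)


def pick_book(shelf: list, order: list, message_number: int) -> str:
    """Choose the book for the nth message by cycling through the agreed order."""
    return shelf[order[message_number % len(order)]]
```

Decoding with the wrong book still produces words, just not the right ones. Getting the wrong book to produce a coherent decoy is the hard part, and this sketch makes no attempt at it.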