Steganography
By: Joe Jupin
Supervised by: Dr. Longin Jan Latecki
Overview
Introduction
  Clandestine Communication
  Digital Applications of Steganography
Background
  Uncompressed Images
  Compressed Images
  Steganalysis
  The Images Used
Finding and Extracting Messages from Bitmaps
Detecting Messages in jpegs
Future Work
Introduction
Clandestine Communication
Cryptography
Scrambles the message into cipher
Steganography
Hides the message in unexpected places
Digital Applications of Steganography
Can be hidden in digital data
MS Word (doc)
Web pages (htm)
Executables (exe)
Sound files (mp3, wav, cda)
Video files (mpeg, avi)
Digital images (bmp, gif, jpg)
Background
Character   Integer   Binary
Space       32        00100000
0-9         48-57     00110000-00111001
A-Z         65-90     01000001-01011010
a-z         97-122    01100001-01111010
Message = Hello Stego!
Length = 12
Uncompressed Images
Grayscale Bitmap images (bmp)
256 shades of intensity from black to white
Can be obtained from color images
Arranged into a 2-D matrix
Messages are hidden in the least significant bits (lsb)
Matrix values change slightly
Interested in patterns that form messages
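The lsb idea above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function names (`message_to_bits`, `embed_lsb`, `extract_lsb`) are hypothetical, the image is a flat list of gray values in row-major order, and the message is assumed to start at the first pixel.

```python
def message_to_bits(message):
    """Return the 8-bit character codes of `message` as a flat list of 0/1 ints."""
    bits = []
    for ch in message:
        code = ord(ch)
        bits.extend((code >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def embed_lsb(pixels, message):
    """Return a copy of `pixels` whose first 8 * len(message) values carry
    the message bits in their least significant bit."""
    bits = message_to_bits(message)
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the lsb, then set it to the bit
    return stego


def extract_lsb(pixels, length):
    """Read `length` characters back from the lsbs of `pixels`."""
    chars = []
    for i in range(length):
        code = 0
        for value in pixels[8 * i: 8 * i + 8]:
            code = (code << 1) | (value & 1)
        chars.append(chr(code))
    return "".join(chars)


# A flat 16 x 8 "image" of mid-gray pixels stands in for a real bitmap.
cover = [128] * (16 * 8)
stego = embed_lsb(cover, "Hello Stego!")
recovered = extract_lsb(stego, 12)
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

No pixel moves by more than one gray level, which is why the change is invisible to the eye while the 12 characters remain recoverable.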
Compressed Images
Grayscale jpeg images (jpg)
Joint Photographic Experts Group (jpeg)
Converts image to YCbCr colorspace
Divides into 8x8 blocks
Uses Discrete Cosine Transform (DCT)
Obtain frequency coefficients
Scaled by quantization to remove some frequencies
At a high quality setting the loss will not be noticed
Huffman Coding
Affects the image's statistical properties
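The block transform and quantization steps can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: real jpeg encoders use a per-frequency quantization table and level-shift the pixels first, while this example uses a single hypothetical `step` for every coefficient. The names `dct_2d` and `quantize` are illustrative only.

```python
import math

N = 8  # jpeg works on 8x8 blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Divide every coefficient by `step` and round (uniform step: an assumption)."""
    return [[round(c / step) for c in row] for row in coeffs]


def count_nonzero(matrix):
    return sum(1 for row in matrix for value in row if value != 0)


# A flat block has energy only in the DC coefficient.
flat = [[10.0] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)

# A horizontal ramp spreads energy over several frequencies.
ramp = [[float(8 * y - 28) for y in range(N)] for _ in range(N)]
ramp_coeffs = dct_2d(ramp)
fine = quantize(ramp_coeffs, 4)     # small step: more coefficients survive
coarse = quantize(ramp_coeffs, 32)  # large step: more coefficients become zero
```

Comparing `count_nonzero(fine)` with `count_nonzero(coarse)` shows how a coarser step removes frequencies, which is the lossy part of the compression.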
Steganalysis
The Images Used
From Star Trek Website
1,000 color jpeg images
320x240 or 240x320
www.startrek.com
There will be Klingons
Finding and Extracting Messages from Bitmaps
Problem
Messages can be hidden in lsbs
May be anywhere in image
Cannot see message in image
Would take far too long for a human to process
"Steganography is the art and science of communicating in a way which hides the existence of the communication. In contrast to cryptography, where the "enemy" is allowed to detect, intercept and modify messages without being able to violate certain security premises guaranteed by a cryptosystem, the goal of steganography is to hide messages inside other "harmless" messages in a way that does not allow any "enemy" to even detect that there is a second secret message present" [Markus Kuhn 1995-07-03].
Procedure
Inject messages into images
Take a Boolean snapshot of even and odd pixels
Construct a string of all possible characters
An n-pixel image has n-7 individual character enumerations (320 x 240 - 7 = 76,793)
Use any character properties to match a message pattern in the enumerated string
Define a message (pattern of message characters)
Define message characters (used in messages)
Use stego stems (patterns)
A test can be performed faster by using tiled samples
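A rough Python sketch of the enumeration procedure follows. It is not the TestMsg code: `lsb_snapshot`, `enumerate_chars`, `find_runs` and the `min_len` threshold are hypothetical names and values, and the "pattern" here is simply a run of characters from the message character space (space, 0-9, A-Z, a-z) rather than the stego stems used in the project.

```python
import string

# Character space from the slides: space, 0-9, A-Z, a-z.
MESSAGE_CHARS = set(string.ascii_letters + string.digits + " ")


def lsb_snapshot(pixels):
    """Boolean snapshot: 1 for odd pixel values, 0 for even ones."""
    return [p & 1 for p in pixels]


def enumerate_chars(bits):
    """Return the character read from every 8-bit window of `bits`.

    An n-bit snapshot yields n - 7 characters, one per starting pixel.
    """
    chars = []
    code = 0
    for i, bit in enumerate(bits):
        code = ((code << 1) | bit) & 0xFF  # slide the 8-bit window by one pixel
        if i >= 7:
            chars.append(chr(code))
    return chars


def find_runs(chars, min_len=6):
    """Return (pixel_offset, text) for runs of message characters.

    Consecutive characters of a linear message start 8 pixels apart, so
    each of the 8 possible alignments is walked separately.
    """
    runs = []
    for phase in range(8):
        start, text = None, []
        for pos in range(phase, len(chars), 8):
            if chars[pos] in MESSAGE_CHARS:
                if start is None:
                    start = pos
                text.append(chars[pos])
                continue
            if len(text) >= min_len:
                runs.append((start, "".join(text)))
            start, text = None, []
        if len(text) >= min_len:
            runs.append((start, "".join(text)))
    return runs


# Demo: a flat 320 x 240 cover with a message injected starting at pixel 37.
pixels = [128] * (320 * 240)
for i, ch in enumerate("Hello Stego!"):
    for j in range(8):
        bit = (ord(ch) >> (7 - j)) & 1
        pixels[37 + 8 * i + j] = (pixels[37 + 8 * i + j] & ~1) | bit

chars = enumerate_chars(lsb_snapshot(pixels))
runs = find_runs(chars)
```

Here `len(chars)` is 76,793, matching the count on the slide, and `runs` contains the entry `(37, "Hello Stego")`. The trailing "!" is dropped only because it lies outside the character space assumed in this sketch. Running the same functions on a slice of `pixels` is one way to test a tiled sample instead of the whole image.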
Observation
Only considered linear unencrypted messages
Trial performed on 100 grayscale bitmaps
97 clean, 3 stego
Took an average of 9 seconds per image to find with 100% accuracy (no training -- cold)
Occasionally some garbage text at head or tail
Took an average of 3 seconds per image to test with 100% accuracy
Clean images had pattern scores of less than 10
Stego images had pattern scores of 31 or more
Conclusion
Messages are detectable and extractable from non-encrypted uncompressed images
Linear messages can be found in any direction with more computation
This method can be foiled by hashing the message into the image
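One way to picture the last point is a sketch in which the message bits are scattered to key-dependent pixel positions instead of being written in a line. This is only an assumed reading of "hashing the message into the image". The names `scatter_embed`, `scatter_extract` and `linear_read` are hypothetical, and Python's `random` module stands in for a keyed generator (it is not cryptographically secure).

```python
import random


def message_bits(message):
    """8 bits per character, most significant bit first."""
    return [(ord(ch) >> s) & 1 for ch in message for s in range(7, -1, -1)]


def bits_to_text(bits):
    chars = []
    for i in range(0, len(bits), 8):
        code = 0
        for bit in bits[i:i + 8]:
            code = (code << 1) | bit
        chars.append(chr(code))
    return "".join(chars)


def scatter_embed(pixels, message, key):
    """Write each message bit into the lsb of a key-dependent pixel position."""
    bits = message_bits(message)
    positions = random.Random(key).sample(range(len(pixels)), len(bits))
    stego = list(pixels)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def scatter_extract(pixels, length, key):
    """Regenerate the same positions from `key` and read the bits back."""
    positions = random.Random(key).sample(range(len(pixels)), 8 * length)
    return bits_to_text([pixels[pos] & 1 for pos in positions])


def linear_read(pixels, length):
    """What a linear scan starting at pixel 0 would read."""
    return bits_to_text([p & 1 for p in pixels[:8 * length]])


cover = [128] * (64 * 64)
stego = scatter_embed(cover, "Hello Stego!", key=2024)
with_key = scatter_extract(stego, 12, key=2024)
without_key = linear_read(stego, 12)
```

With the key the text comes back intact, while a linear read sees no run of message characters, so an enumeration that assumes consecutive pixels has nothing to match.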
Detecting Messages in jpegs
Problem
Cannot use an enumeration scheme to detect or find a message
May only be able to detect (not extract) a message because of encoding schemes and encryption
Cannot see message in image
Statistical properties of an image change when a message is injected
Procedure
Obtain the 4-level 2-D wavelet decomposition of the images
Obtain the orientation decomposition of frequency space statistics
72 features plus the class (0 = clean, 1 = stego)
Includes: mean, variance, skewness and kurtosis of coefficients and error for prediction in subband
[Figure: sample feature values in three groups labeled 12, 23 and 34, each listing meanV ... krtD (coefficients) and meanEv ... krtEd (prediction errors)]
Normalize the data to the 0-1 range (min-max)
Train a Fisher Linear Discriminant (FLD)
Test the FLD threshold
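The statistical part of the procedure can be sketched with NumPy. This is an illustration under several assumptions, not the project code: the wavelet decomposition is left out, the feature rows are synthetic stand-ins with 4 columns instead of 72, kurtosis is taken as the non-excess fourth standardized moment, and the FLD threshold is placed midway between the projected class means. All function names are hypothetical.

```python
import numpy as np


def four_moments(values):
    """Mean, variance, skewness and kurtosis (population formulas)."""
    v = np.asarray(values, dtype=float).ravel()
    mean = v.mean()
    centered = v - mean
    var = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / var ** 1.5
    kurt = np.mean(centered ** 4) / var ** 2
    return mean, var, skew, kurt


def min_max_normalize(X):
    """Rescale every feature column to the range 0-1."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns stay at 0 instead of dividing by 0
    return (X - lo) / span


def fit_fld(X, y):
    """Fisher linear discriminant: w = Sw^-1 (m1 - m0), midpoint threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold


def predict_fld(X, w, threshold):
    """Class 1 (stego) when the projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)


# Synthetic stand-in data: 200 "clean" and 200 "stego" feature rows.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 4))
stego = rng.normal(5.0, 1.0, size=(200, 4))
X = min_max_normalize(np.vstack([clean, stego]))
y = np.array([0] * 200 + [1] * 200)

w, threshold = fit_fld(X, y)
accuracy = np.mean(predict_fld(X, w, threshold) == y)
```

In the real pipeline `four_moments` would be applied to each wavelet subband and to its prediction error to build the 72-column rows, and accuracy would be measured on held-out images rather than on the training rows as done here.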
Observation
Trials performed on 2000 images
1000 clean and 1000 stego
Random selection of 1000 instances without replacement (500 each class)
Messages in stego had sufficient size
Results show overwhelming accuracy
Bior3.1: True Neg 100%, True Pos 98.6%
Rbio5.5: True Neg 99.8%, True Pos 98.8%
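The sampling and scoring described above can be sketched as follows. The helper names (`stratified_sample`, `split`, `rates`) and the fixed seed are hypothetical, and the 80/20 split is an assumption carried over from the train/test figures quoted elsewhere in the slides.

```python
import random


def stratified_sample(labels, per_class, seed=0):
    """Pick `per_class` indices of each class, without replacement."""
    rng = random.Random(seed)
    chosen = []
    for cls in (0, 1):
        indices = [i for i, lab in enumerate(labels) if lab == cls]
        chosen.extend(rng.sample(indices, per_class))
    rng.shuffle(chosen)
    return chosen


def split(indices, train_percent=80):
    """Split a list of indices into train and test parts."""
    cut = len(indices) * train_percent // 100
    return indices[:cut], indices[cut:]


def rates(truth, predicted):
    """Return (true negative rate, true positive rate)."""
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    return tn / truth.count(0), tp / truth.count(1)


# 1000 clean (0) and 1000 stego (1) images, as in the trial.
labels = [0] * 1000 + [1] * 1000
sample = stratified_sample(labels, per_class=500)
train, test = split(sample)
```

`rates` gives the two figures reported per wavelet: the fraction of clean images classified as clean and the fraction of stego images classified as stego.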
Conclusion
Messages of sufficient size can be detected in stego images with great accuracy
Improved accuracy may be due to a large training set
1000 (800/200)
500 (400/100)
Restricted domain
Many similar images
Problems
Authors did not handle log of zero problem
Replaced with small value
Differing jpeg sizes need differing message sizes
Dynamic message injection
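The log of zero problem listed above can be pictured with a short sketch. It assumes the features use the difference of base-2 logs of an actual and a predicted coefficient magnitude, and that "replaced with small value" means clamping zero magnitudes before taking the log. The constant `EPSILON` and the function names are hypothetical.

```python
import math

EPSILON = 1e-6  # hypothetical small value substituted for zero magnitudes


def safe_log2_magnitude(x, eps=EPSILON):
    """log2(|x|), with magnitudes below `eps` clamped so the log is defined."""
    return math.log2(max(abs(x), eps))


def log_error(actual, predicted):
    """Difference of log magnitudes between a coefficient and its prediction."""
    return safe_log2_magnitude(actual) - safe_log2_magnitude(predicted)
```

For example, `log_error(8, 2)` is 2.0, and `log_error(0, 0)` is 0.0 rather than an exception.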
Other Classifiers
Tests were run on J4.8, SMO, Logistic and Naive Bayes for bior3.1 and rbio5.5 with an 80/20 split and default settings
Results
Future Work
Would like to find optimal stems
Pattern matching
Text mining
Cryptanalysis
Would like to optimize TestMsg code
C/assembly code
References
Petitcolas, F.A.P., Anderson, R., Kuhn, M.G., "Information Hiding - A Survey", July 1999, URL: http://www.cl.cam.ac.uk/~fapp2/publications/ieee99-infohiding.pdf (11/26/01 17:00)
Farid, Hany, "Detecting Steganographic Messages in Digital Images", Department of Computer Science, Dartmouth College, Hanover, NH 03755
Moby Words II, Copyright (c) 1988-93, Grady Ward. All Rights Reserved.
Lyu, Siwei and Farid, Hany, "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines", Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
Farid, Hany, "Detecting Hidden Messages Using Higher Order Statistical Models", Department of Computer Science, Dartmouth College, Hanover, NH 03755
Spy Vs. Spy
by Antonio Prohias from MAD Magazine
Have a good Winter Break!