Steganography

Steganography has caught the eye of the privacy craving public. Corporations including IBM, Kodak, and NEC have identified it as a new market worth investing in. The amount of data that can be hidden is limited to the amount of insignificant bits in the file.

Copyright
© Attribution Non-Commercial (BY-NC)
Principles of Steganography

Max Weiss

Math 187: Introduction to Cryptography


Professor Kevin O’Bryant

1 Introduction

Although steganography has been a topic of discussion since before 1995, only since the turn of the millennium has this information-hiding technique caught the eye of the privacy-craving public. Established businesses have adopted steganography for covert communication; artists have done the same to protect their intellectual property from consumers and advertising agencies. Several large corporations, including IBM, Kodak, and NEC, have identified steganography as a new market worth investing in.

2 Overview

Steganography: literally “hidden writing.” Nowadays steganography is most often associated with embedding data in some form of electronic media. The difference between steganography and the more commonly used cryptography is that cryptography scrambles and obfuscates data that can then be accessed publicly (without consequence), whereas steganography conceals the data altogether. Data from a “covert,” or source, file is hidden by altering insignificant bits of information in an “overt,” or host, file. For example, an algorithm designed to embed data in an audio file might replace information describing frequencies inaudible to the human ear.

3 Modern Algorithms

Modern steganography identifies two main classification schemes for sorting algorithms. The first distinguishes algorithms based on file type. The second, more widely used scheme categorizes them based on embedding method.

3.1 Injection (Insertion), Substitution, and Generation Classifications

Insertion-based techniques hide data in sections of a file that are ignored by the processing application and do not modify the bits that determine the content relevant to an end user. For example, an insertion algorithm may write data in the comment blocks of an HTML file. Several file types and programs also establish an EOF marker to signify the end of a file. Data written after this marker does not exist as far as meaningful content is concerned. However, from a steganography standpoint the EOF marker can be used to mark the beginning of hidden data. Because an insertion technique increases file size in proportion to the amount of data hidden, file size can betray the presence of hidden information. Coming across a 5 MB HTML file would arouse suspicion, for instance.

In a substitution-based algorithm, the least significant bits of information that determine the meaningful content of the original file are replaced with new data in a way that causes the least amount of distortion. Although file size does not change during execution of the algorithm, the amount of data that can be hidden is limited to the number of insignificant bits in the file. Higher “quality” files (where applicable) tend to contain more insignificant bits of information.

The insertion and substitution algorithms both require a “covert” file that contains the information to be hidden, and an “overt” file that acts as the host. The generation technique requires only a covert file, as it is used to create the overt file. For instance, the covert file can be used to create a fractal image with unique colors, angles, and line lengths. A main flaw of the insertion and substitution techniques is that a given file can be compared with another instance of the supposedly “same” file. If the file size, MD5 checksum, or anything else is different, it can be assumed that data has been embedded in the file in question. Since the result of a generation algorithm is an “original” file, the technique is immune to comparison tests.

3.2 Embedding Data in a JPEG Image

Because the JPEG file format is compact and does not significantly degrade the quality of an image, it is in frequent use on the internet. The JPEG format uses a discrete cosine transform (DCT) to compute 64 DCT coefficients for each successive 8x8 pixel block. The least significant bits of these quantized coefficients are used to embed data. Because modifications to these bits affect frequency-domain coefficients rather than spatial structure (unlike in GIF images, where image structure information is present at every bit layer), no obvious distortion is present.

4 Shortcomings of Steganography

Because steganography has gained popularity only in the past decade, there are many flaws and vulnerabilities that still need to be addressed. Consequently, new steganography technologies are being released with increased frequency.

4.1 Revealing the Existence of Hidden Data

Because steganography modifies an existing file that is most likely in circulation on the internet, a bitwise comparison of the original file with the “same” file suspected of containing hidden information can reveal the use of steganography. Additionally, two communicating parties can be easily identified as communicating covertly if files that normally would not be exchanged suddenly are. For example, two business executives frequently exchanging photographs of cars over a period of time could arouse suspicion.

4.2 Rendering Hidden Data Useless

Once a file is identified as possibly containing hidden data, one can either attempt to recover the information if the algorithm is known, or destroy the data without affecting the quality of the original file. Converting an altered bitmap to JPEG compresses the file and discards unnecessary bits of information, thereby removing any hidden data. Converting to any other format may not necessarily cause the image to lose information, but would change the bit composition of the data, making any hidden data unreadable.

5 Practical Steganography: Digital Watermarking

Now that the majority of information takes on a digital form, it has become increasingly necessary to provide a means by which such information can be easily identified as being under the ownership of an entity. Digital watermarking is a means by which an image is marked such that the owner of a file can rightfully identify any instance of that file as their own. For example, companies that sell photographs for use in websites or advertisements can embed watermarks in sample pictures to identify whether a photograph in use has been paid for. There has also been significant recent research into “fingerprinting” (hidden serial numbers or a set of characteristics that tend to distinguish an object from other similar objects). In general, fingerprints can be used to detect copyright violators while watermarks can be used to prosecute them.

5.1 Invisible Watermarks

There are two forms of digital watermarks: visible and invisible. A visible watermark simply overlays a copyright notice on the original image. An invisible watermark is an application of steganography that embeds copyright information into the file itself without altering its visual representation. Steganography can be used either to embed text information into an image, or to alter a pattern of bits to form a uniformly distributed pattern in the image pixels that is imperceptible to the human eye.

5.2 Steganography with a Slightly Different Goal

Watermarks do not conform entirely to the paradigms of steganography. While conventional steganography is based on the idea of hiding as much data as possible, digital watermarks tend to be small. Conventional steganography also emphasizes the secrecy of the data to be hidden and transmitted, whereas a watermark need not be secret: even if an invisible watermark cannot be visually identified, the knowledge that one exists is enough to discourage potential copyright violators.

5.3 Defeating Digital Watermarking

As with other files carrying steganographically embedded data, images containing digital watermarks can be made “clean” by simply converting the file to another file format, and back to the original format if desired. One publicly available tool, StirMark, written by Fabien Petitcolas (University of Cambridge, Microsoft Research), was designed to crack several watermarking schemes including PictureMarc, SysCoP, JK_PGS, SureSign, EIKONA-mark, Echo Hiding, and the NEC method. StirMark can apply a uniformly distributed jitter pattern to an image, which confuses most watermark-detecting software. A more sophisticated attack performed by StirMark introduces a slight yet significant distortion in the image, emulating the digital-to-analog process on printers followed by the analog-to-digital process on scanners. Another test performed by StirMark to evaluate the strength of a watermarking system calculates errors in the file relative to the original. The PSNR (peak signal-to-noise ratio) test uses the formula

$$\mathrm{PSNR} = 20 \log_{10}\left(\frac{255}{\text{RMS Error}}\right)$$

6 Steganalysis

Checking for file sizes and suspicious situations may work in detecting the use of steganography, but does not provide any solid evidence. Steganalysis compares the properties of an unaltered file to one that contains embedded information.

6.1 Statistical Analysis of JPEG Images

When one introduces random uniformly distributed noise to any kind of file, the entropy of that file increases. Because embedded data is essentially uniform noise as far as the image is concerned (encrypted data to be embedded has a higher entropy than English text, for example), steganography leaves its fingerprint as increased entropy.

When a JPEG image is modified as a result of steganography, certain colors will convert to another color according to the image color table. If a given color A occurs more frequently than B, A will be converted to B more often than B to A. Therefore the difference in color frequencies will decrease and an analysis of color frequency would not yield much information. Instead, an analysis of DCT coefficients should prove more fruitful. A $\chi^2$ test on the image should show distortion from embedded data. An image with hidden data should have a similar frequency for adjacent DCT coefficients. Therefore, one can use the formula

$$y_i^* = \frac{n_{2i} + n_{2i+1}}{2}$$

to compute the expected distribution, where $n_i$ is the frequency of DCT coefficient $i$. Comparing this expected distribution to the observed distribution

$$y_i = n_{2i}$$

allows one to calculate the chi-square value

$$\chi^2 = \sum_{i=1}^{\nu+1} \frac{(y_i - y_i^*)^2}{y_i^*}$$

where $\nu$ is the number of degrees of freedom.

6.2 Dictionary Attacks

In order to verify any assertions one can make from a $\chi^2$ test, it is necessary to perform a dictionary attack on the suspected file. (The $\chi^2$ test should be performed first because, when scanning a large number of files for hidden information, it runs far faster than a dictionary attack.) Because commercial software embeds data based on a user-supplied password, a brute-force attack can be used to prove that hidden information exists. The dictionary attack will cycle through a known set of around 1,800,000 words, phrases, and PINs until the correct one is found.
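The chi-square computation of Section 6.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quantized DCT coefficients are assumed to be already available as a flat list of integers (extracting them from a JPEG file is not shown), and the function name chi_square_statistic is hypothetical rather than taken from any existing tool.

```python
from collections import Counter


def chi_square_statistic(coefficients):
    """Return (chi_square, degrees_of_freedom) for a list of integer
    DCT coefficients, pairing the values 2i and 2i+1 as in Section 6.1.

    Assumption: `coefficients` is a flat list of quantized coefficients.
    """
    counts = Counter(coefficients)  # n_i: frequency of coefficient i
    # Every value c belongs to the pair (2i, 2i+1) with i = c // 2.
    pair_indices = sorted({value // 2 for value in counts})

    chi_square = 0.0
    for i in pair_indices:
        observed = counts[2 * i]                             # y_i
        expected = (counts[2 * i] + counts[2 * i + 1]) / 2   # y_i*
        chi_square += (observed - expected) ** 2 / expected

    # The sum runs over nu + 1 pairs, so nu is one less than the pair count.
    return chi_square, len(pair_indices) - 1


if __name__ == "__main__":
    unbalanced = [0] * 40 + [1] * 10 + [2] * 30 + [3] * 6
    balanced = [0] * 25 + [1] * 25 + [2] * 18 + [3] * 18
    print(chi_square_statistic(unbalanced))
    print(chi_square_statistic(balanced))
```

In this toy input the unbalanced histogram gives a statistic of 17 with one degree of freedom, while the balanced histogram gives 0. Following the reasoning of Section 6.1, a value near zero means adjacent coefficients occur with similar frequency, which is what one would expect from a file carrying embedded data.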

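The substitution technique of Sections 3.1 and 3.2 can be illustrated with a minimal Python sketch. It assumes the overt file has already been decoded into a list of integers (for example pixel bytes or quantized DCT coefficients) and hides one bit of the covert data in the least significant bit of each value. All function names here are hypothetical, and real tools typically add a password-driven selection of positions that is not shown.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def embed_bits(carrier, bits):
    """Return a copy of `carrier` whose first len(bits) values have
    their least significant bit replaced by the message bits."""
    if len(bits) > len(carrier):
        raise ValueError("covert data exceeds the capacity of the overt data")
    stego = list(carrier)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit  # clear the LSB, then set it
    return stego


def extract_bits(stego, count):
    """Read back the least significant bit of the first `count` values."""
    return [value & 1 for value in stego[:count]]


if __name__ == "__main__":
    overt = list(range(100, 116))   # 16 carrier values: capacity of 16 bits
    covert = b"Hi"                  # 2 bytes = 16 bits
    stego = embed_bits(overt, bytes_to_bits(covert))
    print(bits_to_bytes(extract_bits(stego, 16)))
    print(max(abs(a - b) for a, b in zip(overt, stego)))
```

The sketch makes two properties from the text concrete: the number of values is unchanged, so file size does not grow, and capacity is capped at one bit per carrier value. Each value changes by at most 1, which is why the distortion is small.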