ZFS Deduplication
Jeff Bonwick's Blog
http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
Monday Nov 02, 2009

You knew this day was coming: ZFS now has built-in deduplication.
If you already know what dedup is and why you want it, you can skip the next couple of sections. For everyone else, let's start with a little background.
What is it?
Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data -- files, blocks, or byte ranges -- are checksummed using some hash function that uniquely identifies data with very high probability. When using a secure hash like SHA256, the probability of a hash collision is about 2^-256 = 10^-77 or, in more familiar notation, 0.0000000000000000000000000000000000000000000000000000000000000000000000000001. For reference, this is 50 orders of magnitude less likely than an undetected, uncorrected ECC memory error on the most reliable hardware you can buy.
Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples.
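To make the bookkeeping concrete, here is a toy sketch of such a table -- not ZFS code, just an illustration of the checksum-to-(location, refcount) idea, with an optional verify step like the one discussed further below:

    import hashlib

    class DedupTable:
        """Toy in-memory dedup table: checksum -> (storage location, reference count)."""

        def __init__(self, verify=False):
            self.entries = {}      # checksum -> [location, refcount]
            self.blocks = {}       # location -> block contents (stands in for the disk)
            self.verify = verify
            self.next_loc = 0

        def write_block(self, data):
            key = hashlib.sha256(data).digest()
            entry = self.entries.get(key)
            if entry is not None:
                loc, refcount = entry
                # Optional 'verify': compare contents before trusting the hash.
                if not self.verify or self.blocks[loc] == data:
                    entry[1] = refcount + 1   # duplicate: just bump the refcount
                    return loc
            # New (or, with verify, colliding) data: allocate space for it.
            loc = self.next_loc
            self.next_loc += 1
            self.blocks[loc] = data
            if entry is None:
                self.entries[key] = [loc, 1]
            return loc

    # Two identical writes share one location; the refcount goes from 1 to 2.
    ddt = DedupTable(verify=True)
    block = b"x" * 128 * 1024
    first = ddt.write_block(block)
    second = ddt.write_block(block)
    assert first == second
    assert ddt.entries[hashlib.sha256(block).digest()][1] == 2

The real dedup table in ZFS lives in the pool and is cached (memory, then L2ARC, then disk, as described below), but the reference-counting idea is the same.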
File-level dedup assigns a hash signature to an entire file. File-level dedup has the lowest overhead when the natural granularity of data duplication is whole files, but it also has significant limitations: any change to any block in the file requires recomputing the checksum of the whole file, which means that if even one block changes, any space savings is lost because the two versions of the file are no longer identical. This is fine when the expected workload is something like JPEG or MPEG files, but is completely ineffective when managing things like virtual machine images, which are mostly identical but differ in a few blocks.
Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated,
but unlike file-level dedup, it handles block-level data such as virtual machine images extremely well.
Most of a VM image is duplicated data -- namely, a copy of the guest operating system -- but some blocks
are unique to each VM. With block-level dedup, only the blocks that are unique to each VM consume
additional storage space. All other blocks are shared.
Byte-level dedup is in principle the most general, but it is also the most costly because the dedup code
must compute 'anchor points' to determine where the regions of duplicated vs. unique data begin and
end. Nevertheless, this approach is ideal for certain mail servers, in which an attachment may appear
many times but not necessarily be block-aligned in each user's inbox. This type of deduplication is
generally best left to the application (e.g. Exchange server), because the application understands the data
it's managing and can easily eliminate duplicates internally rather than relying on the storage system to
find them after the fact.
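For the curious, the 'anchor points' mentioned above are usually found with content-defined chunking: slide a rolling hash over the byte stream and cut a chunk wherever the hash matches a chosen pattern, so identical content chunks the same way no matter what byte offset it lands at. A toy sketch, with arbitrary parameters that are not taken from any real implementation:

    def chunk_boundaries(data, window=48, mask=(1 << 12) - 1, base=257, mod=1 << 32):
        """Yield (start, end) chunks of `data`. A rolling hash over the last
        `window` bytes picks the anchor points: a chunk ends wherever the hash
        matches the mask pattern, so the average chunk is roughly mask+1 bytes."""
        if len(data) < window:
            if data:
                yield (0, len(data))
            return
        pow_w = pow(base, window - 1, mod)   # weight of the byte leaving the window
        h = 0
        for i in range(window):              # hash of the first full window
            h = (h * base + data[i]) % mod
        start = 0
        for i in range(window, len(data) + 1):
            if (h & mask) == mask and i - start >= window:
                yield (start, i)             # anchor point: cut the chunk here
                start = i
            if i == len(data):
                break
            # Slide the window one byte: drop data[i - window], add data[i].
            h = (h - data[i - window] * pow_w) % mod
            h = (h * base + data[i]) % mod
        if start < len(data):
            yield (start, len(data))

    # Identical content chunked at two different offsets yields mostly the same chunks.
    import random
    random.seed(0)
    payload = bytes(random.getrandbits(8) for _ in range(200_000))
    original = {payload[s:e] for s, e in chunk_boundaries(payload)}
    shifted_stream = b"some unrelated header bytes " + payload
    shifted = {shifted_stream[s:e] for s, e in chunk_boundaries(shifted_stream)}
    print(len(original & shifted), "of", len(original), "chunks shared")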
ZFS provides block-level deduplication because this is the finest granularity that makes sense for a
general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums,
which provide unique block signatures for all blocks in a storage pool as long as the checksum function is
cryptographically strong (e.g. SHA256).
ZFS deduplication is synchronous. ZFS assumes a highly multithreaded operating system (Solaris) and a
hardware environment in which CPU cycles (GHz times cores times sockets) are proliferating much faster
than I/O. This has been the general trend for the last twenty years, and the underlying physics suggests
that it will continue.
If you have a storage pool named 'tank' and you want to use dedup, just type this:

    zfs set dedup=on tank

That's it.
Like all zfs properties, the 'dedup' property follows the usual rules for ZFS dataset property inheritance.
Thus, even though deduplication has pool-wide scope, you can opt in or opt out on a per-dataset basis.
If your data doesn't contain any duplicates, enabling dedup will add overhead (a more CPU-intensive
checksum and on-disk dedup table entries) without providing any benefit. If your data does contain
duplicates, enabling dedup will both save space and increase performance. The space savings are obvious;
the performance improvement is due to the elimination of disk writes when storing duplicate data, plus
the reduced memory footprint due to many applications sharing the same pages of memory.
Most storage environments contain a mix of data that is mostly unique and data that is mostly replicated.
ZFS deduplication is per-dataset, which means you can selectively enable dedup only where it is likely to
help. For example, suppose you have a storage pool containing home directories, virtual machine images,
and source code repositories. You might choose to enable dedup as follows (the dataset names here are illustrative):

    zfs set dedup=on tank/vm
    zfs set dedup=on tank/src
Trust or verify?
If you accept the mathematical claim that a secure hash like SHA256 has only a 2^-256 probability of
producing the same output given two different inputs, then it is reasonable to assume that when two
blocks have the same checksum, they are in fact the same block. You can trust the hash. An enormous
amount of the world's commerce operates on this assumption, including your daily credit card
transactions. However, if this makes you uneasy, that's OK: ZFS provides a 'verify' option that performs a full comparison of every incoming block with any alleged duplicate to ensure that they really are the same, and ZFS resolves the conflict if not. To enable this variant of dedup, just specify 'verify' instead of 'on':

    zfs set dedup=verify tank
Selecting a checksum
Given the ability to detect hash collisions as described above, it is possible to use much weaker (but
faster) hash functions in combination with the 'verify' option to provide faster dedup. ZFS offers this
option for the fletcher4 checksum, which is quite fast:

    zfs set dedup=fletcher4,verify tank
The tradeoff is that unlike SHA256, fletcher4 is not a pseudo-random hash function, and therefore
cannot be trusted not to collide. It is therefore only suitable for dedup when combined with the 'verify'
option, which detects and resolves hash collisions. On systems with a very high data ingest rate of largely
duplicate data, this may provide better overall performance than a secure hash without collision
verification.
Unfortunately, because there are so many variables that affect performance, I cannot offer any absolute
guidance on which is better. However, if you are willing to make the investment to experiment with
different checksum/verify options on your data, the payoff may be substantial. Otherwise, just stick with
the default provided by setting dedup=on; it's cryptographically strong and it's still pretty fast.
Most dedup solutions only work on a limited amount of data -- a handful of terabytes -- because they
require their dedup tables to be resident in memory.
ZFS places no restrictions on your ability to dedup. You can dedup a petabyte if you're so inclined. The
performance of ZFS dedup will follow the obvious trajectory: it will be fastest when the DDTs (dedup tables)
fit in memory, a little slower when they spill over into the L2ARC, and much slower when they have to be
read from disk. The topic of dedup performance could easily fill many blog entries -- and it will over time --
but the point I want to emphasize here is that there are no limits in ZFS dedup. ZFS dedup scales to any
capacity on any platform, even a laptop; it just goes faster as you give it more hardware.
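For a back-of-the-envelope feel for when the DDT will fit in memory, you can estimate one table entry per unique block; the per-entry byte cost below is an assumed placeholder for illustration, not a number from the ZFS code:

    def ddt_size_estimate(logical_bytes, avg_block_size=128 * 1024,
                          dedup_ratio=1.0, bytes_per_entry=300):
        """One DDT entry per *unique* block; `bytes_per_entry` is an assumed
        in-core cost for illustration, not a figure from the ZFS source."""
        unique_blocks = logical_bytes / avg_block_size / dedup_ratio
        return unique_blocks * bytes_per_entry

    # 10 TB of data in 128K blocks, with the assumed 300 bytes per entry:
    print(ddt_size_estimate(10 * 2**40) / 2**30)                   # ~23 GiB, all blocks unique
    print(ddt_size_estimate(10 * 2**40, dedup_ratio=2) / 2**30)    # ~12 GiB at 2x duplication

Plug in your own numbers; the point is just that DDT size scales with the number of unique blocks, not with total data.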
Acknowledgements
Bill Moore and I developed the first dedup prototype in two very intense days in December 2008. Mark
Maybee and Matt Ahrens helped us navigate the interactions of this mostly-SPA code change with the
ARC and DMU. Our initial prototype was quite primitive: it didn't support gang blocks, ditto blocks, out-of-
space, and various other real-world conditions. However, it confirmed that the basic approach we'd been
planning for several years was sound: namely, to use the 256-bit block checksums in ZFS as hash
signatures for dedup.
Over the next several months Bill and I tag-teamed the work so that at least one of us could make
forward progress while the other dealt with some random interrupt of the day.
As we approached the end game, Matt Ahrens and Adam Leventhal developed several optimizations for
the ZAP to minimize DDT space consumption both on disk and in memory, key factors in dedup
performance. George Wilson stepped in to help with, well, just about everything, as he always does.
For final code review George and I flew to Colorado where many folks generously lent their time and
expertise: Mark Maybee, Neil Perrin, Lori Alt, Eric Taylor, and Tim Haley.
Our test team, led by Robin Guo, pounded on the code and made a couple of great finds -- which were
actually latent bugs exposed by some new, tighter ASSERTs in the dedup code.
My family (Cathy, Andrew, David, and Galen) demonstrated enormous patience as the project became
all-consuming for the last few months. On more than one occasion one of the kids has asked whether we
can do something and then immediately followed their own question with, "Let me guess: after dedup is
done."
Well, kids, dedup is done. We're going to have some fun now.
Comments:
Really great news. While other filesystems try to get to the point where ZFS was yesterday, ZFS moves ahead.
"When using a secure hash like SHA256, the probability of a hash collision is about 2^-256"
Will dedup speed up copying or moving files from one dataset to another? If yes, will it result in read-only
activity on the disks when moving the file or, even better, in only increasing the reference count of the blocks?
Well, now you just need to dedup your time schedule, and you'll have a lot of spare time blocks laying
around that you can use for your kids!
Birthday collisions, having only 365 values, fully utilize less than 15 bits. 256 bit offers substantially more
values. Feel free to recalculate the birthday formulas with 2^256 instead of 365 and post the result if you
like.
Does Dedup also apply to data in the L2ARC? If so... wow o_0
You mentioned that probably everyone wants to have the DDTs in memory. So a little formula like
zfs_size/blocksize * N ~ DDT size would be helpful as well...
Bahamat: did you read the link? It includes probabilities for 256 bit hashes. With a 256 bit hash, there's a
1-in-a-billion chance of a collision with 1.5 × 10^34 blocks. How many blocks might there be in a ZFS
array?
Excellent news; for starters this should give us the space benefits of a sparse zone with the flexibility of a full
root one.
@max, @bastien -
There is a discussion of the overall chance of hash collision when factoring in the total number of blocks in
the ARC thread for a related, but orthogonal case:
http://arc.opensolaris.org/caselog/PSARC/2009/557/mail
http://en.wikipedia.org/wiki/Birthday_paradox#Probability_table
To have a collision probability of 10^-18 (already more reliable than almost anything else in the system), this
would require approximately 2^98 unique blocks (2^115 bytes @128k) to be written, well beyond the limits
for any foreseeable storage platform.
The problem with using a hash function is that attackers control a lot of data on your filesystem on a modern
OS. Consider the web cache on a machine where a user browses the web. This allows an attacker a platform
to intentionally try to cause collisions that can cause the filesystem to malfunction. These sorts of attacks are still hard, but there has been a lot of progress in attacking several popular hash functions lately. A solution to
prevent this is to use a keyed hash function and keep the key secret.
Do you have any advice or observations on how dedup interacts with ZFS compression?
If the implementation of the SHA256 (or possibly SHA512 at some point) algorithm is well threaded then
one would be able to leverage those massively multi-core Niagara T2 servers. The SHA256 hash is based on
six 32-bit functions whereas SHA512 is based on six 64-bit functions. The CMT Niagara T2 can easily
process those 64-bit hash functions and the multi-core CMT trend is well established. So long as context
switch times are very low one would think that IO with a SHA512 based de-dupe implementation would be
possible and even realistic. That would solve the hash collision concern I would think.
How long until we can actually use this feature in the development builds? Canʼt wait to try it out!
"Excellent news; for starters this should give us the space benefits of a sparse zone with the flexibility of a full
root one."
Unless the deduplication also spills over into the memory management, it doesn't: Running two deduplicated
full root zones still requires twice as much RAM as running only one of them, while running two sparse zones only requires twice the read/write pages, with the read-only pages being shared.
So the questions are: How often does Solaris load a couple of identical pages to memory, when they're
deduplicated on disk? Are there plans to get that to "once", if it isn't already?
Real-life scenario: We run 60 sparse zones vs. the 16 full zones that the system could manage before we're
out of RAM (4GB installed).
And the difference only grows with more RAM.
Does the deduping apply to the ARC/L2ARC as well (ie, only pointers to duplicate blocks reside, rather than
the whole block)?
Any ideas when this will show up in releases of OpenSolaris or Solaris 10?
One question I thought of while reading is do different pools recognize the data in the other pool for
deduplication. For example, if I have pools A and B (deduplication is enabled in each) and I have an identical
block X in both pools, will it be deduplicated? There are pros and cons to deduplicating across pools and
not deduplicating across pools. Which did you choose and will there be options in the future to do the
opposite?
@Don - The ARC work (so that we deduplicate in-core) is forthcoming. Mark Maybee is making good
progress on that.
And yes, it works perfectly well with compression - just as you would imagine. :)
Congrats Jeff,
I'm looking forward to snv_128 on IPS. Is there any way you can hurry the binary images over to IPS?
My Apologies,
newsham: as far as the world's open cryptographic community knows, it is impossible for anyone to generate
collisions in SHA-256 even if they are deliberately trying to do so.
Dennis: Hm, can't ZFS use the hardware implementation of SHA-256 in the T2? Anyway, "the hash collision
concern" is already solved by SHA-256 -- see Eric Shrock's comment.
all: it seems like there are some funny interactions between dedupe and crypto: http://mail.opensolaris.org/pipermail/zfs-crypto-discuss/2009-November/002947.html. I'm glad to learn from this blog post that
Nicolas Williams was wrong to say that dedupe will always require full block comparison.
So, is this going to be included in the next update of solaris or are we going to have to download some kind
of a kernel patch soon?
Thanks.
This is something I always wished. Some people rant about differential copies (copy on write of files) or
sparse files, but the problem of these schemes is that they tend to not survive the file being copied on a
different disk, or at any rate they work only if the filesystem has knowledge of the original creation of the
redundancy. That is not the case here, and so this is much more general. Imagine, one can now write a file of one
petabyte of zeroes - without the application telling the filesystem (other than by writing said zeroes).
Fun question: would it be possible for a malicious user to try and write blocks that he suspects are the same
as blocks from data he's not supposed to be able to read, and figure out if a deduplication occurred by
timing the process? Would be an interesting attack. (Note: this has nothing to do with finding a collision with whatever hash function is used, though these attacks are interesting as well; as far as I am aware, so far ZFS was only dependent on the hash function's resistance to pre-image attacks, if of course ZFS was supposed to guarantee
cryptographic integrity; but now the hash function must also resist collision attacks or fun things might
occur...)
One drawback of the current support, however: on a huge array where the dedup table has to spill to disk, and which is used in bursts for, say, backup, browsing the table (and adding entries to it) has no locality whatsoever (otherwise there would be a problem with the hash function, if I'm not mistaken), so it will have to hit the disk in proportion to the fraction of the table that is not in memory (in other words: you cannot efficiently cache that table in memory). An offline option would have been useful for that use case.
I'm curious about the claim of increased performance across the board -- doesn't read performance suffer
from the transformation of what was once a contiguous read into many short reads in different areas of
the disk? Or perhaps you have a strategy to deal with that, or don't find this fragmentation to be a problem
in practice?
Thanks.
Well done Jeff. You seem to be involved with all the best work ;)
Re:"One question I thought of while reading is do different pools recognize the data in the other pool for
deduplication. "
I doubt that dedup has the ability to be done across pools. Reason a is that it's a ZFS flag to turn it on rather
than a zpool one. Reason b is that you can't be sure that the second pool will not be removed or even
exported. I prefer it this way.
Birthday paradox arguments aside, cryptographic hashes have a long history of being broken. If this ever
happens to SHA-256, the Mother of All Remote privilege escalations will immediately apply. E.g., a collision
with a block of the Windows kernel would let *a web site* run privileged code on however-many-hundreds of
VMs hang off the corrupted physical block (assuming web caches get written to virtual disk).
The verify flag provides an out here, but it is not the default, and most users will take that hint. Unfortunately,
there is no way to know whether you should have used verify until it is too late; if SHA-256 is ever broken,
then a non-verify pool is defective.
I realize you all are smart cookies, and have thought much more deeply about this than I have just reading
this blog post. So what's the punchline? Are we that much more confident about SHA-256 than its
predecessors? Is the performance hit to verify enough to make the system useless for important domains?
Some combination of the two?
@Keith Adams:
It's not _that_ bad. For your exploit to work, you'd need to have the faulty file around _first_.
Deduplication doesn't overwrite existing files with a duplicate, but avoids writing duplicates in the first place.
(at least as far as I understand it)
So, the attack vector would require you to know beforehand of a new component that's normally run with elevated rights. Then push a file with the same hash to the system, and then wait for deduplication to kick
in.
That's a whole lot harder than "hey, I can write a file with the same content as the kernel".
It's possible to scan the hash space: Write all kinds of files, hash with a different hash in RAM and on disk, if
they differ, you got a collision with another file.
And then hope for it to be something truly secret ;-)
But given the size of the hash space, that's not a productive use of your time, and it would be too easy to
figure out that there's something fishy going on (who's creating and deleting billions of files all the time?)
RT
www.complete-privacy.at.tc
@zooko: indeed, ZFS dedup does have the option of not verifying block contents when hashes match -- I
spoke too quickly. A minor error, I think.
@{Zooko, various others}: The point is that if you don't want to trust the hash function, well, you don't have to.
@Zooko: Back to the ZFS crypto issue that Darren was asking for advice on: by MACing every block in
addition to hashing it we don't depend on the hash function's collision resistance for security, though, clearly,
for dedup you'll want to enable block contents verification if you do fear attackers that can create hash
collisions. IMO, not depending on a hash function's collision resistance is a good thing.
A quick & simple suggestion wrt "Trust or verify?" and, specifically, using the fast fletcher hash with subsequent verification of positives by byte-comparison. There is a third option: always look up by fletcher hash (assuming blocks are indexed by both this AND sha-256), and use sha to verify positives.
This has a slight advantage over verification by byte-comparison, especially with multiple positives, and a huge win -- fletcher over sha -- in negative cases. The latter yields a nice property: the less duplicated the data, the lower dedup's overhead.
That's nice and all... but for my purposes, it would be far more of a "win" to allow for cross-zfs (but same pool) hardlinks. And/or mv's.
Not to mention some kind of zfs-aware rsync, for fast, efficient remote replication (or restorals, for that matter!) that does not require keeping a matching "full-filesystem snapshot" around.
Maybe with this dup-detection stuff, you will be closer to having that happen now?
What a lot of others asked: Where and when are we, the general public, going to be able to see it and test it
ourselves? Your blog speaks as if it's readily available, yet I don't see it in Update 8 of Sol10, nor in recent
snapshots of OpenSolaris. Am I missing something?
Thanks
I stand corrected:
http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html
The probability of a collision for an equiprobable hash function is ~ 2^(-n/2), where "n" is the size of the output hash in bits. The probability is _not_ 2^-n (that's the probability of getting a single output, not the probability of two input documents producing the same output, which is exactly a collision).
For more information about collisions for cryptographic functions please read about the Birthday attack. For example: http://en.wikipedia.org/wiki/Birthday_attack.
Did you consider side-channel attacks? Let us assume that an attacker that is a user on a machine knows
that somewhere on disk there is a block that contains just a username and a password, say "john:abc"
Now he could write his own combinations "john:aaa", "john:aab", .. ,"john:zzz" each to a unique block in a file
on disk and notice that the timing of writing "john:abc" is different to all the others, because the block
"john:abc" is deduped?
@Wrex: You should see it in OpenSolaris dev builds (i.e. http://pkg.opensolaris.org/dev) in roughly a month.
The current "build" (build 128) closes for code changes on 9 November, then it gets some QA, and then is
published. We just pushed build 126 publicly last week.
More thoughts: How about using dedup processing, to then enable synthetic snapshot creation across
systems?
Example: two separate solaris systems, both running zfs filesystems that have "mostly similar" content, and
need to be kept near-synced in future.
Rather than having to completely blow away and rebuild one, to then have a shared full snapshot for
incremental zsends... how about some kind of tool that would create one?
Reasons this would be worthwhile, could be large datasets, and bandwidth-constrained interconnects.
and/or: user error. They were previously kept in sync with zsync, but some admin accidentally deletes the
"wrong" snapshot. or all snapshots. That admin is going to have some very very unhappy users, unless
there's a nice neat way to regen the common snapshot without long downtimes for rebuild.
For all the people that will climb down the hole of hash function probability it would be interesting to contrast
that with just bit rot on modern drives.
@Klaus:
Timing it may be unreliable, especially on a busy system, but someone could just dtrace it to figure it out. It might make sense to have a nodedup option applied at file level so developers have a chance to address such concerns for sensitive files.
We have no idea on the suitability of our data for dedup. Is there any method available that can scan the
data and report on the suitability of switching dedup on?
Super news!! :D
@Felipe Alfaro Solana: The "probability of a collision" is NOT about 2^(-n/2). Rather, 2^(n/2) is the approximate number of items one needs to hash with an n-bit hash function for the probability of a collision to be about 50%.
The critical issue, which a few people have touched upon, is that the probability of "a collision" depends on how much data is being hashed. If we have a zettabyte of data and a 128K block size we can end up needing to hash up to 2^53 (about 10^16) blocks. The probability of getting any collisions on a 256-bit hash function with this many tries is about 1 in 2^151 (less than 1 in 10^46). This is dozens of orders of magnitude safer than the underlying disc drives.
Re side channel attacks - seems to me they are plausible both by measuring system response time and by
measuring available space on the volume(s) after writing the suspect block(s). Even on a busy system, it will be
possible to identify trends. (e.g., if we write block "X" 1000 times, it is always much faster than if we write
block "Y" 1000 times; same for space usage.) These attacks could be mitigated by making the actual
response time equal to the expected worst-case scenario, and by limiting "space available" responses to
"available under quota" versus "absolute space available", but both approaches would create other side
effects such as slower response time(s) and forcing the use of quotas.
It could also be used to identify media files (music, video, whatever) if there's a typical/canonical encoding
that's likely to be used.
Might be useful for identifying executable/library/data files installed on a machine (perhaps at the version
level to identify vulnerabilities) if the attacker can work with a known example; or for iteratively determining
the contents of a sensitive file such as a *nix password file (discussed above by Klaus Borchers) or a file
containing a database access password, passphrase, .htaccess file (or password file pointed to by .htaccess
file), etc.
I am a fan of the deduplication idea, but I think it has significant security implications that may be tough to
identify early.
Jeff,
Congratulations to you and your team! You have reenergized my enthusiasm for Solaris and you have given my business a strong reason to go w/ Sun Solaris in lieu of the competition.
Joe
I have a large ZFS pool (on a Storage 7000 system) containing virtual machine images. I'm sure the dedup
stuff will be very useful for this kind of data. I'm wondering, though, will there be some way of forcing existing
data through the algorithm, or will only data that was written after dedup was turned on benefit?
1. What happens when the DDTs cannot fit in memory? How many extra I/Os will be needed in this case for
each application I/O?
2. Given the common case that two blocks are very similar but not identical, how does ZFS handle this?
So what happens when you lose a block on the disk that happened to be the reference block for 12 others?
First thing I'd want to know before letting this option anywhere near my production filesystems is how much
redundancy is there and how easy is it to configure?
Does this open a backup poisoning attack? If I write the same blocks in sequence in one big or several small files, with a sequence just long enough to fool the compression system of the backup, would this enable an attacker to exhaust backup space?
If the backup space is another deduped ZFS system, would this enable an attacker to exhaust the communications capability? If there is a quota on the accounts, this is not a real problem, but with unlimited accounts, it may be.
Of course, most of the time it is not an attacker, but an idiot running untested software, without having a look at it, for extended periods of time.
Can ZFS trigger a warning if a block happens to be referenced more than a settable number of times?
Sophed,
There is a mechanism for that. If a block is referenced by 1000 other blocks you have a severe problem if that block corrupts. Therefore you will be able to specify how many references are allowed to a block -- saying something like, "a block can not have more than 10 references" or something similar.
But ZFS is very secure, and for redundancy you use raidz2 or raidz3, of course. With raidz2 (raid-6) you will get lots of redundancy.
Jeff,
Good work! BTW, jeff, did you know that Solaris is the best OS out there? :o) Truly.
Dreaming about a BitTorrent client that uses dedup to find chunks on disk before trying to download them.
Does the p value for collisions hold true for blocks that only differ in a single bit?
and
Do you/have you empirically tested your implementation in some way to verify collision frequency...?
--
David Strom
Awesome work! Congratulations! Can't wait to try this out. By the way, does this mean that data deduplication software is going to be pushed aside?
"You should see it in OpenSolaris dev builds (i.e. http://pkg.opensolaris.org/dev) in roughly a month. The
current "build" (build 128) closes for code changes on 9 November, then it gets some QA, and then is
published. We just pushed build 126 publicly last week."
That's great, but OpenSolaris is so GNU/Linux bastardized and changing so fast that it's both illogical and impractical to use it in production.
So I have to ask... Why use an expensive hashing algorithm at all? Why not use a cheaper hash, but use trust
+ verify before actually performing a deduplication? This will reduce processor load and eliminate the issue of collisions in the event two pieces of data hash the same but are in fact different.
On my system, MD5 costs 1/3 of what SHA256 costs in CPU time, and while it may be more likely to cause
collisions, ZFS should always do a byte-to-byte comparison on any block before deduping one to make sure
they are, in fact, identical.
For some real world hash collision examples you may want to try using backuppc on your data. It includes
deduplication, but runs at a file level.
I was rather surprised to find that in a small system (a few TB), I'm running into quite a few files that have the
same hash but different contents.
# Pool is 2698.23GB comprising 1232264 files and 4369 directories (as of 11/3 10:16),
# Pool hashing gives 2651 repeated files with longest chain 51,
So, I have 3TB of data, and one of the checksums has 51 collisions that were detected (51 different files that
had the same hash but different contents).
At first I was surprised that there were any collisions, but then I remembered the birthday problem...
In any case, it seems like "verify" is the most conservative setting, and I'm surprised that the file-system that
basically promises to never corrupt your data defaults to a setting in which this could happen.
Sean
By default, fletcher4 is used when creating new zfs file-systems. Many users like myself have already filled
up a bunch of zfs file-systems using fletcher4. What happens if a user tries to enable dedup but doesn't use
verify for an existing fletcher4 filesystem? Is there a warning/error?
For the paranoid and performance conscious, I could see wanting to do dedup with verification in a batch
job, run at least nightly.
@Sean Reifschneider
What hash function is being used by software you mentioned?
@Greg: Come to think of it, the easiest exploit would probably be to use the "readahead" features of the
system, where not single blocks (say 4K) are read from disk, but larger sectors (say 64K).
Reading sixteen 4k-blocks on an even sector boundary, the first block, if not in cache, will take at least
several msecs (disk access time), while the 15 subsequent blocks will be available in usecs.
In order to find out if a block with a certain content exists anywhere on disk, from within, say, a VM, you just
write a sector with 15 blocks of random garbage onto the disk, but one block somewhere in the middle, e.g.
#9, contains the contents you want to check. After a few minutes to hours, when the data you have written
has been evicted from cache in ram, you read back the first block, which takes a long time, and then the
others, which follow almost immediately - except for #9, which was deduplicated, and must be fetched from
somewhere else on disk.
In practice, I guess, with all the optimisations and layers in the system, it may be far more complicated, but
securing the system against this kind of attack will be even more difficult.
If this is turned on for an existing ZFS file system, are pre-existing segments able to be deduplicated? If ZFS only supports synchronous dedupe, does the data have to be pulled off and repopulated?
Hi
As we rely heavily on ZFS for more than 2 billion files in different filesystems, my question is: does dedup work across filesystems?
And is there a way to get the non-dedup ratio compared to the dedup ratio on a zfs filesystem?
Could we get the "not deduped" size of a filesystem with df, or do we receive the deduped size?
Michael
First of all, thank you all for this feature, and I hope you and your families will get some quality time together
And I also thank the commenters for raising some interesting questions and ideas.
Concerning the matter of deduplicating data that's already on our disks, in our existing systems, I'm afraid we'd have to make do for a while with a trick I use to compress previously uncompressed files (say, local zones). We shut down the zone, move its files to a subdirectory, and use Sun cpio (keeping ZFS ACLs) to copy the files back to their expected location. Upon write, they are compressed (and nowadays they'd also be deduped). This can be tweaked to do per-file copies/removals to minimize the free-space pressure when remaking existing systems.
Needless to say, some supported utility which allows one to (un-)compress and (un-)dedup existing files in place (like setting the Compressed flag on NTFS objects) is very welcome and long-awaited :) The already de-facto working capability of using different compression algorithms within the same ZFS dataset is also a bonus versus NTFS, and I'd love to see that in said utility. (In the example of our local zones, the fresh install of binary files can be done with gzip-9, then new files like data and logs are written with a faster lzjb.)
Another question arises: what if we have same files (blocks) compressed with different algorithms? On-disk
blocks apparently contain different (compressed) data and have different checksums for the same original
data, and different amount of on-disk blocks for the same original files.
These would probably not dedup ultimately to one block, but to at least as many as there are different
compression algorithms?
And for compressed blocks inside different original files (including VM images) the block-alignment would
make it even less probable that we have dedupable whole blocks? Even a one-byte offset would not let us
save space from otherwise same original data?
In short - does it mean we would (probably) save more space by not compressing certain filesystems (i.e. VM
image containers) but rather only deduping them?
> Well, now you just need to dedup your time schedule, and you'll have a lot of spare time blocks laying
around that you can use for your kids!
Apparently, this strikes the family time too. Instead of going with kids to a zoo 10 times, "Jeff" only goes once
and tells the family that they should remember it as 10 different trips ;)
I know about checksums and how they work for RAID-Z. But if ZFS can use sha256 for duplicate searching, maybe it can use it for self-healing instead of deduplicating?
@QuAzI
Sha256 in ZFS IS used for self healing if you set it as the checksum algorithm. It is used, instead of fletcher, to check the integrity of every block in datasets, not only in RAIDZ. It has been like that since the creation of ZFS, and now this same checksum field can be used for two purposes, integrity and deduplication.
Thanks. I didn't find that in the overviews and documents, just examples of self-healing of mirrors. That's good.
Out of curiosity, how does dedup interact with userquota/groupquota? Will the full size of the deduped block
count against the quota? I'm guessing that's the case as it's more efficient, though really the user/group isn't
using all that space.
Dedup sounds great. I'm really looking forward to that and comstar making their way into the 7000 series.
Thanks for your work.