09.12.2017 Bit Twiddling Hacks

Bit Twiddling Hacks
By Sean Eron Anderson
[email protected]
Individually, the code snippets here are in the public domain (unless otherwise noted); feel free to use them however you
please. The aggregate collection and descriptions are © 1997-2005 Sean Eron Anderson. The code and descriptions are distributed
in the hope that they will be useful, but WITHOUT ANY WARRANTY and without even the implied warranty of
merchantability or fitness for a particular purpose. As of May 5, 2005, all the code has been tested thoroughly. Thousands of
people have read it. Moreover, Professor Randal Bryant, the Dean of Computer Science at Carnegie Mellon University, has
personally tested almost everything with his Uclid code verification system. What he hasn't tested, I have checked against all
possible inputs on a 32-bit machine. To the first person to inform me of a legitimate bug in the code, I'll pay a bounty of
US$10 (by check or Paypal). If directed to a charity, I'll pay US$20.

Contents

About the operation counting methodology

Compute the sign of an integer
Detect if two integers have opposite signs
Compute the integer absolute value (abs) without branching
Compute the minimum (min) or maximum (max) of two integers without branching
Determining if an integer is a power of 2
Sign extending
Sign extending from a constant bit-width
Sign extending from a variable bit-width
Sign extending from a variable bit-width in 3 operations
Conditionally set or clear bits without branching
Conditionally negate a value without branching
Merge bits from two values according to a mask
Counting bits set
Counting bits set, naive way
Counting bits set by lookup table
Counting bits set, Brian Kernighan's way
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
Counting bits set, in parallel
Count bits set (rank) from the most-significant bit up to a given position
Select the bit position (from the most-significant bit) with the given count (rank)
Computing parity (1 if an odd number of bits set, 0 otherwise)
Compute parity of a word the naive way
Compute parity by lookup table
Compute parity of a byte using 64-bit multiply and modulus division
Compute parity of word with a multiply
Compute parity in parallel
Swapping Values
Swapping values with subtraction and addition
Swapping values with XOR
Swapping individual bits with XOR
Reversing bit sequences
Reverse bits the obvious way
Reverse bits in word by lookup table
Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)
Reverse the bits in a byte with 4 operations (64-bit multiply, no division)
Reverse the bits in a byte with 7 operations (no 64-bit, only 32)
Reverse an N-bit quantity in parallel with 5 * lg(N) operations
Modulus division (aka computing remainders)
Computing modulus division by 1 << s without a division operation (obvious)
Computing modulus division by (1 << s) - 1 without a division operation
https://graphics.stanford.edu/~seander/bithacks.html

Computing modulus division by (1 << s) - 1 in parallel without a division operation
Finding integer log base 2 of an integer (aka the position of the highest bit set)
Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
Find the integer log base 2 of an integer with a 64-bit IEEE float
Find the log base 2 of an integer with a lookup table
Find the log base 2 of an N-bit integer in O(lg(N)) operations
Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup
Find integer log base 10 of an integer
Find integer log base 10 of an integer the obvious way
Find integer log base 2 of a 32-bit IEEE float
Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)
Counting consecutive trailing zero bits (or finding bit indices)
Count the consecutive zero bits (trailing) on the right linearly
Count the consecutive zero bits (trailing) on the right in parallel
Count the consecutive zero bits (trailing) on the right by binary search
Count the consecutive zero bits (trailing) on the right by casting to a float
Count the consecutive zero bits (trailing) on the right with modulus division and lookup
Count the consecutive zero bits (trailing) on the right with multiply and lookup
Round up to the next highest power of 2 by float casting
Round up to the next highest power of 2
Interleaving bits (aka computing Morton Numbers)
Interleave bits the obvious way
Interleave bits by table lookup
Interleave bits with 64-bit multiply
Interleave bits by Binary Magic Numbers
Testing for ranges of bytes in a word (and counting occurrences found)
Determine if a word has a zero byte
Determine if a word has a byte equal to n
Determine if a word has byte less than n
Determine if a word has a byte greater than n
Determine if a word has a byte between m and n
Compute the lexicographically next bit permutation

About the operation counting methodology

When totaling the number of operations for algorithms here, any C operator is counted as one operation.
Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation
counting approach only serves as an approximation of the actual number of machine instructions and CPU
time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have
been heading increasingly in this direction over time. There are many nuances that determine how fast a
system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the
end, benchmarking is the best way to determine whether one method is really faster than another, so consider
the techniques below as possibilities to test on your target architecture.

Compute the sign of an integer

int v; // we want to find the sign of v
int sign; // the result goes here

// CHAR_BIT is the number of bits per byte (normally 8).

sign = -(v < 0); // if v < 0 then -1, else 0.
// or, to avoid branching on CPUs with flag registers (IA32):
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// or, for one less instruction (but not portable):
sign = v >> (sizeof(int) * CHAR_BIT - 1);

The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the
obvious way, sign = -(v < 0). This trick works because when signed integers are shifted right, the value of the
far left bit is copied to the other bits. The far left bit is 1 when the value is negative and 0 otherwise; all 1 bits
gives -1. Unfortunately, this behavior is architecture-specific.

Alternatively, if you prefer the result be either -1 or +1, then use:
sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then -1, else +1

On the other hand, if you prefer the result be either -1, 0, or +1, then use:

sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));

// Or, for more speed but less portability:
sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1, 0, or +1
// Or, for portability, brevity, and (perhaps) speed:
sign = (v > 0) - (v < 0); // -1, 0, or +1

If instead you want to know if something is non-negative, resulting in +1 or else 0, then use:
sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1

Caveat: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result
of signed right-shift implementation-defined, so on some systems this hack might not work. For greater
portability, Toby Speight suggested on September 28, 2005 that CHAR_BIT be used here and throughout
rather than assuming bytes were 8 bits long. Angus recommended the more portable versions above,
involving casting on March 4, 2006. Rohit Garg suggested the version for non-negative integers on
September 12, 2009.
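
As a minimal sketch of the portable variants discussed above, the helpers below wrap two of the expressions in functions. The names sign_neg_or_zero and sign_three_way are hypothetical, chosen only for illustration, and the first one assumes int has no padding bits, so that its width equals sizeof(int) * CHAR_BIT.

```c
#include <limits.h>

/* Hypothetical helper: -1 if v is negative, 0 otherwise.
   The shift is done on an unsigned value, so it does not rely on
   implementation-defined signed right-shift behavior. */
static inline int sign_neg_or_zero(int v)
{
    return -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}

/* Hypothetical helper: -1, 0, or +1, using only comparisons. */
static inline int sign_three_way(int v)
{
    return (v > 0) - (v < 0);
}
```

Both helpers avoid shifting a negative signed value, which is the portability concern raised in the caveat.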

Detect if two integers have opposite signs

int x, y; // input values to compare signs

bool f = ((x ^ y) < 0); // true iff x and y have opposite signs

Manfred Weis suggested I add this entry on November 26, 2009.
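
The test works because the sign bit of x ^ y is set exactly when the sign bits of x and y differ. A small sketch, with the function name opposite_signs being an assumption made for illustration; note that zero is treated as non-negative.

```c
#include <stdbool.h>

/* Hypothetical helper: true iff exactly one of x and y is negative. */
static inline bool opposite_signs(int x, int y)
{
    return (x ^ y) < 0;
}
```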

Compute the integer absolute value (abs) without branching

int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Patented variation:

r = (v ^ mask) - mask;

Some CPUs don't have an integer absolute value instruction (or the compiler fails to use them). On machines
where branching is expensive, the above expression can be faster than the obvious approach,
r = (v < 0) ? -(unsigned)v : v, even though the number of operations is the same.

On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed
right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does
not require values to be represented as two's complement, so it may not work for that reason as well (on a
diminishingly small number of old machines that still use one's complement). On March 14, 2004, Keith H.
Duggar sent me the patented variation above; it is superior to the one I initially came up with,
r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v, because a multiply is not used. Unfortunately, this method has been patented
in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. On August 13,
2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well
before the patent was even filed, such as in How to Optimize for the Pentium Processor by Agner Fog, dated
November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which
Vladimir could have read. Moreover, the Internet Archive also has an old link to it. On January 30, 2007,
Peter Kankowski shared with me an abs version he discovered that was inspired by Microsoft's Visual C++
compiler output. It is featured here as the primary solution. On December 6, 2007, Hai Jin complained that
the result was signed, so when computing the abs of the most negative value, it was still negative. On April
15, 2008 Andrew Shapira pointed out that the obvious approach could overflow, as it lacked an (unsigned)
cast then; for maximum portability he suggested (v < 0) ? (1 + ((unsigned)(-1-v))) : (unsigned)v. But
citing the ISO C99 spec on July 9, 2008, Vincent Lefèvre convinced me to remove it because even on
non-2s-complement machines -(unsigned)v will do the right thing. The evaluation of -(unsigned)v first converts
the negative value of v to an unsigned by adding 2**N, yielding a 2s complement representation of v's value
that I'll call U. Then, U is negated, giving the desired result, -U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v
= abs(v).
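
The following sketch wraps the primary expression and the obvious approach in functions so they can be compared. The names abs_branchless and abs_portable are hypothetical. Two things here are assumptions rather than part of the listing above: the shift is assumed to be arithmetic (implementation-defined in C), and the add and XOR are carried out on unsigned values so that the most negative int does not trigger signed overflow.

```c
#include <limits.h>

/* Hypothetical helper: branchless absolute value.
   Assumes >> on a negative int is an arithmetic shift. */
static inline unsigned int abs_branchless(int v)
{
    int const mask = v >> (sizeof(int) * CHAR_BIT - 1);   /* 0 or -1 */
    /* Unsigned arithmetic wraps, so INT_MIN is handled without overflow. */
    return ((unsigned int)v + (unsigned int)mask) ^ (unsigned int)mask;
}

/* Hypothetical helper: the obvious approach with the (unsigned) cast. */
static inline unsigned int abs_portable(int v)
{
    return (v < 0) ? -(unsigned int)v : (unsigned int)v;
}
```

Returning unsigned int lets both helpers represent the magnitude of INT_MIN, which does not fit in an int.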

Compute the minimum (min) or maximum (max) of two integers without branching
int x; // we want to find the minimum of x and y
int y;
int r; // the result goes here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)

On some rare machines where branching is very expensive and no conditional move instructions exist, the
above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two
more instructions. (Typically, the obvious approach is best, though.) It works because if x < y, then -(x < y)
will be all ones, so r = y ^ ((x ^ y) & ~0) = y ^ x ^ y = x. Otherwise, if x >= y, then -(x < y) will be all zeros, so
r = y ^ ((x ^ y) & 0) = y. On some machines, evaluating (x < y) as 0 or 1 requires a branch instruction, so
there may be no advantage.

To find the maximum, use:

r = x ^ ((x ^ y) & -(x < y)); // max(x, y)

Quick and dirty versions:

If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because
(x - y) only needs to be evaluated once.
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)

Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't
portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to
unsigned for the subtractions to avoid unnecessarily throwing an exception; however, the right-shift needs a
signed operand to produce all one bits when negative, so cast to signed there.

On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E.
Bryant alerted me to the need for the precondition, INT_MIN <= x - y <= INT_MAX, and suggested the
non-quick and dirty version as a fix. Both of these issues concern only the quick and dirty version. Nigel
Horspoon observed on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution
because of how it evaluates (x < y). On July 9, 2008 Vincent Lefèvre pointed out the potential for overflow
exceptions with subtractions in r = y + ((x - y) & -(x < y)), which was the previous version. Timothy B.
Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows on June
2, 2009.

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2

bool f; // the result goes here

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = v && !(v & (v - 1));
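
The expression relies on the fact that v - 1 flips the lowest set bit of v and every bit below it, so v & (v - 1) is zero only when v has at most one bit set. A minimal sketch of the zero-safe form, with the name is_power_of_two assumed for illustration:

```c
#include <stdbool.h>

/* Hypothetical helper: true iff v has exactly one bit set. */
static inline bool is_power_of_two(unsigned int v)
{
    return v && !(v & (v - 1));
}
```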

Sign extending from a constant bit-width

Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's
complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int,
which has more than b bits. A simple copy will work if x is positive, but if negative, the sign must be
extended. For example, if we have only 4 bits to store a number, then -3 is represented as 1101 in binary. If
we have 8 bits, then -3 is 11111101. The most-significant bit of the 4-bit representation is replicated
sinistrally to fill in the destination when we convert to a representation with more bits; this is sign extending.
In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions.
For example, to convert from 5 bits to a full integer:
int x; // convert this from using 5 bits to a full int
int r; // resulting sign extended number goes here
struct {signed int x:5;} s;
r = s.x = x;

The following is a C++ template function that uses the same language feature to convert from B bits in one
operation (though the compiler is generating more, of course).
template <typename T, unsigned B>
inline T signextend(const T x)
{
struct {T x:B;} s;
return s.x = x;
}

int r = signextend<signed int,5>(x); // sign extend 5 bit number x to r

John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat
Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed;
otherwise, the sign is undefined.

Sign extending from a variable bit-width

Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which
it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed

x = x & ((1U << b) - 1); // (Skip this if bits in x above position b are already zero.)
r = (x ^ m) - m;

The code above requires four operations, but when the bitwidth is a constant rather than variable, it requires
only two fast operations, assuming the upper bits are already zeroes.

A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:

int const m = CHAR_BIT * sizeof(x) - b;

r = (x << m) >> m;

Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided m
= (1 << (b - 1)) - 1; r = -(x & ~m) | x; as a starting point from which I optimized to get m = 1U << (b -
1); r = -(x & m) | x. But then on May 11, 2007, Shay Green suggested the version above, which requires one
less operation than mine. Vipin Sharma suggested I add a step to deal with situations where x had possible
ones in bits other than the b bits we wanted to sign-extend on Oct. 15, 2008. On December 31, 2009 Chris
Pirazzi suggested I add the faster version, which requires two operations for constant bit-widths and three for
variable widths.
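
As a sketch of the XOR-and-subtract method, the function below sign-extends a b-bit field held in an int. The name sign_extend and the precondition check are assumptions added for illustration; the function is restricted to 1 <= b < width of int so that every intermediate value fits in an int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: sign-extend the low b bits of x to a full int.
   Requires 1 <= b < number of bits in an int. */
static inline int sign_extend(int x, unsigned int b)
{
    assert(b >= 1 && b < sizeof(int) * CHAR_BIT);
    int const m = (int)(1U << (b - 1));   /* sign bit of the b-bit field */
    x &= (int)((1U << b) - 1U);           /* clear any bits above the field */
    return (x ^ m) - m;                   /* flip the sign bit, then subtract it back */
}
```

For the 4-bit example 1101 from earlier in the text, x ^ m gives 0101 (5), and subtracting m (8) yields -3.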

Sign extending from a variable bit-width in 3 operations

The following may be slow on some machines, due to the effort required for multiplication and division.
This version is 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this
type of sign extension in 3 operations by using r = (x * multipliers[b]) / multipliers[b], which requires only
one array lookup.
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
#define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte
static int const multipliers[] =
{
0, M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more if using more than 64 bits)
static int const divisors[] =
{
1, ~M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more for 64 bits)
#undef M
r = (x * multipliers[b]) / divisors[b];

The following variation is not portable, but on architectures that employ an arithmetic right-shift,
maintaining the sign, it should be fast.
const int s = -b; // OR: sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;

Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used multipliers[] for
divisors[]), where it failed on the case of x=1 and b=1.

Conditionally set or clear bits without branching

bool f; // conditional flag
unsigned int m; // the bit mask
unsigned int w; // the word to modify: if (f) w |= m; else w &= ~m;

w ^= (-f ^ w) & m;

// OR, for superscalar CPUs:

w = (w & ~m) | (-f & m);

On some architectures, the lack of branching can more than make up for what appears to be twice as many
operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster.
An Intel Core 2 Duo ran the superscalar version about 16% faster than the first. Glenn Slayden informed me
of the first expression on December 11, 2003. Marco Yu shared the superscalar version with me on April 3,
2007 and alerted me to a typo 2 days later.
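
Both expressions can be sketched as functions that return the modified word. The names set_or_clear and set_or_clear_superscalar are hypothetical, and the flag is cast to unsigned before negation, an adjustment made here so that the all-ones mask is formed with unsigned arithmetic.

```c
#include <stdbool.h>

/* Hypothetical helper: if (f) set the bits of m in w, else clear them. */
static inline unsigned int set_or_clear(unsigned int w, unsigned int m, bool f)
{
    /* -(unsigned)f is all ones when f is true and zero when f is false. */
    return w ^ ((-(unsigned int)f ^ w) & m);
}

/* Hypothetical helper: same result, written as two independent halves. */
static inline unsigned int set_or_clear_superscalar(unsigned int w, unsigned int m, bool f)
{
    return (w & ~m) | (-(unsigned int)f & m);
}
```

In the second form the two AND operations do not depend on each other, which is what gives a superscalar CPU the chance to execute them in parallel.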

Conditionally negate a value without branching

If you need to negate only when a flag is false, then use the following to avoid branching:
bool fDontNegate; // Flag indicating we should not negate v.
int v; // Input value to negate if fDontNegate is false.
int r; // result = fDontNegate ? v : -v;

r = (fDontNegate ^ (fDontNegate - 1)) * v;

If you need to negate only when a flag is true, then use this:

bool fNegate; // Flag indicating if we should negate v.
int v; // Input value to negate if fNegate is true.
int r; // result = fNegate ? -v : v;

r = (v ^ -fNegate) + fNegate;

Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I
came up with the second version on June 8, 2009. Alfonso De Gregorio pointed out that some parens were
missing on November 26, 2009, and received a bug bounty.
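
A sketch of both versions as functions, under the hypothetical names negate_unless and negate_if. The second relies on two's complement: XOR with -1 inverts every bit, and adding 1 completes the negation. As with plain negation, INT_MIN cannot be negated without overflow.

```c
#include <stdbool.h>

/* Hypothetical helper: returns v when fDontNegate is true, -v otherwise.
   The factor is 1 ^ 0 = 1 for true and 0 ^ -1 = -1 for false. */
static inline int negate_unless(int v, bool fDontNegate)
{
    return (fDontNegate ^ (fDontNegate - 1)) * v;
}

/* Hypothetical helper: returns -v when fNegate is true, v otherwise. */
static inline int negate_if(int v, bool fNegate)
{
    return (v ^ -fNegate) + fNegate;
}
```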

Merge bits from two values according to a mask

unsigned int a; // value to merge in non-masked bits
unsigned int b; // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r; // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask);

This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the
mask is a constant, then there may be no advantage.

Ron Jeffery sent this to me on February 9, 2006.
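
The sketch below places the three-operation merge next to the obvious four-operation form so the two can be checked against each other. The function names merge_bits and merge_bits_obvious are assumptions made for illustration.

```c
/* Hypothetical helper: take bits from b where mask is 1, from a elsewhere. */
static inline unsigned int merge_bits(unsigned int a, unsigned int b, unsigned int mask)
{
    /* (a ^ b) & mask marks the masked bits where a and b differ;
       XORing that into a flips exactly those bits to match b. */
    return a ^ ((a ^ b) & mask);
}

/* Hypothetical helper: the obvious form, used here as a reference. */
static inline unsigned int merge_bits_obvious(unsigned int a, unsigned int b, unsigned int mask)
{
    return (a & ~mask) | (b & mask);
}
```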

Counting bits set (naive way)

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

for (c = 0; v; v >>= 1)
{
c += v & 1;
}

The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only
the high bit set, it will go through 32 iterations.

Counting bits set by lookup table

static const unsigned char BitsSetTable256[256] =
{
# define B2(n) n, n+1, n+1, n+2
# define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
# define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
B6(0), B6(1), B6(1), B6(2)

};

unsigned int v; // count the number of bits set in 32-bit value v

unsigned int c; // c is the total bits set in v

// Option 1:
c = BitsSetTable256[v & 0xff] +
BitsSetTable256[(v >> 8) & 0xff] +
BitsSetTable256[(v >> 16) & 0xff] +
BitsSetTable256[v >> 24];

// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
BitsSetTable256[p[1]] +
BitsSetTable256[p[2]] +
BitsSetTable256[p[3]];

// To initially generate the table algorithmically:

BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}

On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Counting bits set, Brian Kernighan's way

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}

Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word
with only the high bit set, then it will only go once through the loop.

Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie)
mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first
published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and
published in 1964 in a book edited by Beckenbach.)"
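
To make the iteration count concrete, here is a sketch that wraps the loop in a function; the name popcount_kernighan is hypothetical. Each pass clears the lowest set bit, so the loop body runs once per set bit.

```c
/* Hypothetical helper: count set bits, one loop iteration per set bit. */
static inline unsigned int popcount_kernighan(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; c++)
    {
        v &= v - 1;   /* clear the least significant bit set */
    }
    return c;
}
```

For a word with only the high bit set the loop runs once, while the naive shift-based loop would need one pass per bit position.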

Counting bits set in 14, 24, or 32-bit words using 64-bit instructions

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

// option 1, for at most 14-bit values in v:

c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

// option 2, for at most 24-bit values in v:

c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL)
% 0x1f;

// option 3, for at most 32-bit values in v:

c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) %
0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3
operations; the second option takes 10; and the third option takes 15.

Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section
of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method
was the inspiration for the variants above, devised by Sean Anderson. Randal E. Bryant offered a couple bug
fixes on May 3, 2005. Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits
using the same number of operations on February 1, 2007.

Counting bits set, in parallel

unsigned int v; // count bits set in this (32-bit value)
unsigned int c; // store the total here
static const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
static const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};

c = v - ((v >> 1) & B[0]);

c = ((c >> S[1]) & B[1]) + (c & B[1]);
c = ((c >> S[2]) + c) & B[2];
c = ((c >> S[3]) + c) & B[3];
c = ((c >> S[4]) + c) & B[4];

The B array, expressed as binary, is:

B[0] = 0x55555555 = 01010101 01010101 01010101 01010101
B[1] = 0x33333333 = 00110011 00110011 00110011 00110011
B[2] = 0x0F0F0F0F = 00001111 00001111 00001111 00001111
B[3] = 0x00FF00FF = 00000000 11111111 00000000 11111111
B[4] = 0x0000FFFF = 00000000 00000000 11111111 11111111

We can adjust the method for larger integer sizes by continuing with the patterns for the Binary Magic
Numbers, B and S. If there are k bits, then we need the arrays S and B to be ceil(lg(k)) elements long, and we
must compute the same number of expressions for c as S or B are long. For a 32-bit v, 16 operations are
used.

The best method for counting bits in a 32-bit integer v is the following:

v = v - ((v >> 1) & 0x55555555); // reuse input as temporary

v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but
avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method
above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions),
though it doesn't use 64-bit instructions. The counts of bits set in the bytes are computed in parallel, and the sum
total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits.

A generalization of the best bit counting method to integers of bit-widths up to 128 (parameterized by type T)
is this:

v = v - ((v >> 1) & (T)~(T)0/3); // temp

v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3); // temp
v = (v + (v >> 4)) & (T)~(T)0/255*15; // temp
c = (T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * CHAR_BIT; // count

See Ian Ashdown's nice newsgroup post for more information on counting the number of bits set (also
known as sideways addition). The best bit counting method was brought to my attention on October 5, 2005
by Andrew Shapira; he found it in pages 187-188 of Software Optimization Guide for AMD Athlon™ 64
and Opteron™ Processors. Charlie Gordon suggested a way to shave off one operation from the purely
parallel version on December 14, 2005, and Don Clugston trimmed three more from it on December 30,
2005. I made a typo with Don's suggestion that Eric Cole spotted on January 8, 2006. Eric later suggested the
arbitrary bit-width generalization to the best method on November 17, 2006. On April 5, 2007, Al Williams
observed that I had a line of dead code at the top of the first method.
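
The 12-operation method for 32-bit values can be sketched as a function. The name popcount32 and the use of uint32_t are assumptions made for illustration; the listing in this section operates on unsigned int.

```c
#include <stdint.h>

/* Hypothetical helper: parallel bit count of a 32-bit value. */
static inline uint32_t popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                    /* 2-bit field sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);    /* 4-bit field sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                    /* per-byte sums */
    return (v * 0x01010101u) >> 24;                      /* add the four bytes into the top byte */
}
```

The multiply by 0x01010101 adds the four byte counts into the most significant byte, which the final shift then extracts.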

Count bits set (rank) from the most-significant bit up to a given position
The following finds the rank of a bit, meaning it returns the sum of bits that are set to 1 from the
most-significant bit down to the bit at the given position.
uint64_t v; // Compute the rank (bits set) in v from the MSB to pos.
unsigned int pos; // Bit position to count bits upto.
uint64_t r; // Resulting rank of bit at pos goes here.

// Shift out bits after given position.

r = v >> (sizeof(v) * CHAR_BIT - pos);
// Count set bits in parallel.
// r = (r & 0x5555...) + ((r >> 1) & 0x5555...);
r = r - ((r >> 1) & ~0UL/3);
// r = (r & 0x3333...) + ((r >> 2) & 0x3333...);
r = (r & ~0UL/5) + ((r >> 2) & ~0UL/5);
// r = (r & 0x0f0f...) + ((r >> 4) & 0x0f0f...);
r = (r + (r >> 4)) & ~0UL/17;
// r = r % 255;
r = (r * (~0UL/255)) >> ((sizeof(v) - 1) * CHAR_BIT);

Juha Järvi sent this to me on November 21, 2009 as an inverse operation to computing the bit position
with the given rank, which follows.
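
A sketch of the rank computation as a function, under the hypothetical name rank_from_msb. It builds the masks from ~UINT64_C(0) where the listing uses ~0UL, an adjustment made here so that the masks are 64 bits wide even on platforms where unsigned long is 32 bits. The position is assumed to lie in the range 1 to 64.

```c
#include <stdint.h>

/* Hypothetical helper: number of set bits among the pos most-significant
   bits of v. Requires 1 <= pos <= 64. */
static inline uint64_t rank_from_msb(uint64_t v, unsigned int pos)
{
    uint64_t const ones = ~UINT64_C(0);
    uint64_t r = v >> (64 - pos);                        /* drop bits after pos */
    r = r - ((r >> 1) & (ones / 3));                     /* 2-bit sums, mask 0x5555... */
    r = (r & (ones / 5)) + ((r >> 2) & (ones / 5));      /* 4-bit sums, mask 0x3333... */
    r = (r + (r >> 4)) & (ones / 17);                    /* byte sums, mask 0x0f0f... */
    return (r * (ones / 255)) >> 56;                     /* total ends up in the top byte */
}
```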

Select the bit position (from the most-significant bit) with the given count (rank)

The following 64-bit code selects the position of the rth 1 bit when counting from the left. In other words if
we start at the most significant bit and proceed to the right, counting the number of bits set to 1 until we
reach the desired rank, r, then the position where we stop is returned. If the rank requested exceeds the count
of bits set, then 64 is returned. The code may be modified for 32-bit or counting from the right.
uint64_t v; // Input value to find position with rank r.
unsigned int r; // Input: bit's desired rank [1-64].
unsigned int s; // Output: Resulting position of bit with rank r [1-64]
uint64_t a, b, c, d; // Intermediate temporaries for bit count.
unsigned int t; // Bit count temporary.

// Do a normal parallel bit count for a 64-bit integer,

// but store all intermediate steps.
// a = (v & 0x5555...) + ((v >> 1) & 0x5555...);
a = v - ((v >> 1) & ~0UL/3);
// b = (a & 0x3333...) + ((a >> 2) & 0x3333...);
b = (a & ~0UL/5) + ((a >> 2) & ~0UL/5);
// c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...);
c = (b + (b >> 4)) & ~0UL/0x11;
// d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...);
d = (c + (c >> 8)) & ~0UL/0x101;
t = (d >> 32) + (d >> 48);
// Now do branchless select!
s = 64;
// if (r > t) {s -= 32; r -= t;}
s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
t = (d >> (s - 16)) & 0xff;
// if (r > t) {s -= 16; r -= t;}
s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
t = (c >> (s - 8)) & 0xf;
// if (r > t) {s -= 8; r -= t;}
s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
t = (b >> (s - 4)) & 0x7;
// if (r > t) {s -= 4; r -= t;}
s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));

t = (a >> (s - 2)) & 0x3;
// if (r > t) {s -= 2; r -= t;}
s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
t = (v >> (s - 1)) & 0x1;
// if (r > t) s--;
s -= ((t - r) & 256) >> 8;
s = 65 - s;

If branching is fast on your target CPU, consider uncommenting the if-statements and commenting the lines
that follow them.

Juha Järvi sent this to me on November 21, 2009.
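
For experimentation, the branchless select can be wrapped in a function as sketched below. The name select_from_msb, the explicit casts, and the use of ~UINT64_C(0) in place of ~0UL are assumptions made for illustration; the sequence of steps follows the listing above. The rank r is assumed to lie in the range 1 to 64.

```c
#include <stdint.h>

/* Hypothetical wrapper: position [1-64], counted from the most-significant
   bit, of the r-th set bit of v. Yields 64 when r exceeds the bit count. */
static inline unsigned int select_from_msb(uint64_t v, unsigned int r)
{
    uint64_t const ones = ~UINT64_C(0);
    uint64_t a, b, c, d;
    unsigned int s, t;

    /* Parallel bit count, keeping every intermediate level. */
    a = v - ((v >> 1) & (ones / 3));                   /* 2-bit sums  */
    b = (a & (ones / 5)) + ((a >> 2) & (ones / 5));    /* 4-bit sums  */
    c = (b + (b >> 4)) & (ones / 0x11);                /* 8-bit sums  */
    d = (c + (c >> 8)) & (ones / 0x101);               /* 16-bit sums */
    t = (unsigned int)((d >> 32) + (d >> 48));         /* low bits: count in upper half */

    /* Branchless binary search: each step halves the window and, when the
       wanted bit lies in the lower half, subtracts the upper half's count. */
    s = 64;
    s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
    t = (unsigned int)(d >> (s - 16)) & 0xff;
    s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
    t = (unsigned int)(c >> (s - 8)) & 0xf;
    s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
    t = (unsigned int)(b >> (s - 4)) & 0x7;
    s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));
    t = (unsigned int)(a >> (s - 2)) & 0x3;
    s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
    t = (unsigned int)(v >> (s - 1)) & 0x1;
    s -= ((t - r) & 256) >> 8;
    return 65 - s;
}
```

Each (t - r) & 256 test detects r > t through the borrow that propagates into bit 8 of the unsigned difference, which is how the commented if-statements are replaced.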

Computing parity the naive way

unsigned int v; // word value to compute the parity of
bool parity = false; // parity will be the parity of v

while (v)
{
parity = !parity;
v = v & (v - 1);
}

The above code uses an approach like Brian Kernighan's bit counting, above. The time it takes is proportional
to the number of bits set.

Compute parity by lookup table

static const bool ParityTable256[256] =
{
# define P2(n) n, n^1, n^1, n
# define P4(n) P2(n), P2(n^1), P2(n^1), P2(n)
# define P6(n) P4(n), P4(n^1), P4(n^1), P4(n)
P6(0), P6(1), P6(1), P6(0)
};

unsigned char b; // byte value to compute the parity of

bool parity = ParityTable256[b];

// OR, for 32-bit words:

unsigned int v;
v ^= v >> 16;
v ^= v >> 8;
bool parity = ParityTable256[v & 0xff];

// Variation:
unsigned char * p = (unsigned char *) &v;
parity = ParityTable256[p[0] ^ p[1] ^ p[2] ^ p[3]];

Randal E. Bryant encouraged the addition of the (admittedly) obvious last variation with variable p on May
3, 2005. Bruce Rawles found a typo in an instance of the table variable's name on September 27, 2005, and
he received a $10 bug bounty. On October 9, 2006, Fabrice Bellard suggested the 32-bit variations above,
which require only one table lookup; the previous version had four lookups (one per byte) and was slower.
On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Compute parity of a byte using 64-bit multiply and modulus division

unsigned char b; // byte value to compute the parity of

bool parity =
(((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;

The method above takes around 4 operations, but only works on bytes.
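One way to see why this works: the multiply copies the byte into all eight byte lanes, the AND keeps a different bit from each copy at bit positions 0, 9, 18, ..., 63, and because 2^9 is congruent to 1 modulo 0x1FF, the modulus adds those eight bits together. The low bit of that sum is the parity. A sketch, assuming a 64-bit unsigned long long and the hypothetical name parity_byte_mul:

```c
#include <stdbool.h>

/* Parity of one byte via multiply, mask, and modulus by 0x1FF (511).
   parity_byte_mul is a hypothetical name used for illustration. */
bool parity_byte_mul(unsigned char b)
{
    /* After the mask, bit 9*k of the product holds bit k of b. */
    unsigned long long spread =
        (b * 0x0101010101010101ULL) & 0x8040201008040201ULL;

    /* spread % 0x1FF equals the number of set bits in b. */
    return (spread % 0x1FF) & 1;
}
```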

Compute parity of word with a multiply

The following method computes the parity of the 32-bit value in only 8 operations using a multiply.

unsigned int v; // 32-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x11111111U) * 0x11111111U;
return (v >> 28) & 1;

For 64-bit values, 8 operations are still enough.

unsigned long long v; // 64-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x1111111111111111UL) * 0x1111111111111111UL;
return (v >> 60) & 1;

Andrew Shapira came up with this and sent it to me on Sept. 2, 2007.

Compute parity in parallel

unsigned int v; // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;

The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just
on bytes in 5 operations by removing the two lines immediately following "unsigned int v;". The method first
shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v.
Next, the binary number 0110 1001 1001 0110 (0x6996 in hex) is shifted to the right by the value
represented in the lowest nibble of v. This number is like a miniature 16-bit parity-table indexed by the low
four bits in v. The result has the parity of v in its lowest bit, which is masked and returned.
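The folding and the 0x6996 lookup described above can be sketched as a function. The name parity32_fold and the use of uint32_t are assumptions made for illustration.

```c
#include <stdint.h>

/* Parity of a 32-bit word: XOR-fold down to one nibble, then use
   0x6996 as a 16-entry table of 1-bit parities.
   parity32_fold is a hypothetical name. */
unsigned int parity32_fold(uint32_t v)
{
    v ^= v >> 16;   /* fold upper half onto lower half */
    v ^= v >> 8;    /* fold the two remaining bytes together */
    v ^= v >> 4;    /* fold the two remaining nibbles together */
    v &= 0xf;       /* keep the nibble that now carries the parity */
    return (0x6996u >> v) & 1u;   /* bit n of 0x6996 is the parity of n */
}
```

Bit n of 0x6996 is 1 exactly when n has an odd number of set bits, which is why the constant can stand in for a table.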

Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That
optimization shaves two operations off using only shifting and XORing to find the parity.

Swapping values with subtraction and addition

#define SWAP(a, b) ((&(a) == &(b)) || \
(((a) -= (b)), ((b) += (a)), ((a) = (b) - (a))))

This swaps the values of a and b without using a temporary variable. The initial check for a and b being the
same location in memory may be omitted when you know this can't happen. (The compiler may omit it
anyway as an optimization.) If you enable overflow exceptions, then pass unsigned values so an exception
isn't thrown. The XOR method that follows may be slightly faster on some machines. Don't use this with
floating-point numbers (unless you operate on their raw integer representations).

Sanjeev Sivasankaran suggested I add this on June 12, 2007. Vincent Lefèvre pointed out the potential for
overflow exceptions on July 9, 2008.

Swapping values with XOR

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary
variable.

On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with
the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider defining the
macro as (((a) == (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))). On July 14, 2009, Hallvard Furuseth
suggested that on some machines, (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b))) might be faster, since the (a)
^ (b) expression is reused.
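The aliasing problem is easy to reproduce: if both operands are the same object, the first XOR zeroes it and the value is lost. Here is a small sketch using a function instead of a macro. The helper name xor_swap is hypothetical, and unlike the macros above it compares addresses rather than values.

```c
/* XOR swap through pointers, guarded against both pointers naming the
   same object. xor_swap is a hypothetical helper, not from the text. */
void xor_swap(unsigned int *a, unsigned int *b)
{
    if (a == b)
        return;     /* without this guard, *a ^= *a would store 0 */

    *a ^= *b;
    *b ^= *a;       /* *b now holds the original *a */
    *a ^= *b;       /* *a now holds the original *b */
}
```

Calling xor_swap(&x, &x) leaves x unchanged, whereas the unguarded macro would set it to zero.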

Swapping individual bits with XOR

unsigned int i, j; // positions of bit sequences to swap
unsigned int n; // number of consecutive bits in each sequence
unsigned int b; // bits to swap reside in b
unsigned int r; // bit-swapped result goes here

unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
r = b ^ ((x << i) | (x << j));

As an example of swapping ranges of bits, suppose we have b = 00101111 (expressed in binary) and we
want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive
bits starting at j = 5; the result would be r = 11100011 (binary).

This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on
individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then
the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences
overlap.
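The worked example above (b = 00101111, i = 1, j = 5, n = 3) can be checked with a small function. The name swap_bit_ranges is an assumption made for illustration; the body is the two-line method shown earlier.

```c
/* Swap the n-bit field starting at bit i with the n-bit field starting
   at bit j. Assumes the fields do not overlap and that n is smaller
   than the width of unsigned int. swap_bit_ranges is a hypothetical name. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i,
                             unsigned int j, unsigned int n)
{
    /* x has a 1 wherever the two fields differ */
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1);

    /* flipping the differing bits in both fields exchanges them */
    return b ^ ((x << i) | (x << j));
}
```

With these inputs, x is 110 (binary), so the mask applied to b is 11001100, and 00101111 XOR 11001100 gives 11100011. Applying the same swap a second time restores the original value.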

On July 14, 2009 Hallvard Furuseth suggested that I change the 1 << n to 1U << n because the value was
being assigned to an unsigned and to avoid shifting into a sign bit.

Reverse bits the obvious way

unsigned int v; // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{
r <<= 1;
r |= v & 1;
s--;
}
r <<= s; // shift when v's highest bits are zero

On October 15, 2004, Michael Hoisie pointed out a bug in the original version. Randal E. Bryant suggested
removing an extra operation on May 3, 2005. Behdad Esfahbod suggested a slight change that eliminated one
iteration of the loop on May 18, 2005. Then, on February 6, 2007, Liyong Zhou suggested a better version
that loops while v is not 0, so rather than iterating over all bits it stops early.

Reverse bits in word by lookup table

static const unsigned char BitReverseTable256[256] =
{
# define R2(n) n, n + 2*64, n + 1*64, n + 3*64
# define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
# define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
R6(0), R6(2), R6(1), R6(3)
};

unsigned int v; // reverse 32-bit value, 8 bits at time

unsigned int c; // c will get v reversed

// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) |
(BitReverseTable256[(v >> 8) & 0xff] << 16) |
(BitReverseTable256[(v >> 16) & 0xff] << 8) |
(BitReverseTable256[(v >> 24) & 0xff]);

// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]];
q[2] = BitReverseTable256[p[1]];
q[1] = BitReverseTable256[p[2]];
q[0] = BitReverseTable256[p[3]];

The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and
store bytes easily.

On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division):

unsigned char b; // reverse this (8-bit) byte

b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value.
The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit group
of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only
one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative
positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of
merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not
overlap, so the addition steps underlying the modulus division behave like or operations.
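To make the steps concrete, take b = 0x01. The multiply places copies of that bit at positions 1, 9, 17, 25 and 33; the mask keeps only position 17; and since 2^10 is congruent to 1 modulo 1023, 2^17 reduces to 2^7 = 0x80. A sketch of the method as a function, assuming a 64-bit unsigned long long and the hypothetical name reverse_byte_mod:

```c
/* Reverse the bits of one byte with a 64-bit multiply, a mask, and a
   modulus by 1023 (2^10 - 1). reverse_byte_mod is a hypothetical name. */
unsigned char reverse_byte_mod(unsigned char b)
{
    /* fan out five copies of b, then keep one bit per reversed position */
    unsigned long long picked = (b * 0x0202020202ULL) & 0x010884422010ULL;

    /* merge the 10-bit groups; no carries occur, so this acts like OR */
    return (unsigned char)(picked % 1023);
}
```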

This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R.
W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972.

Reverse the bits in a byte with 4 operations (64-bit multiply, no division):

unsigned char b; // reverse this byte

b = ((b * 0x80200802ULL) & 0x0884422110ULL) * 0x0101010101ULL >> 32;

The following shows the flow of the bit values with the boolean variables a, b, c, d, e, f, g, and h,
which comprise an 8-bit byte. Notice how the first multiply fans out the bit pattern to multiple copies, while
the last multiply combines them in the fifth byte from the right.

                                                                                        abcd efgh (-> hgfe dcba)
*                                                      1000 0000  0010 0000  0000 1000  0000 0010 (0x80200802)
-------------------------------------------------------------------------------------------------
                                            0abc defg  h00a bcde  fgh0 0abc  defg h00a  bcde fgh0
&                                           0000 1000  1000 0100  0100 0010  0010 0001  0001 0000 (0x0884422110)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
*                                           0000 0001  0000 0001  0000 0001  0000 0001  0000 0001 (0x0101010101)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                                 0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                      0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
           0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
-------------------------------------------------------------------------------------------------
0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba  hgfe 0cba  0gfe 00ba  00fe 000a  000e 0000
>> 32
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba
&                                                                                       1111 1111
-------------------------------------------------------------------------------------------------
                                                                                        hgfe dcba

Note that the last two steps can be combined on some processors because the registers can be accessed as
bytes; just multiply so that a register stores the upper 32 bits of the result and then take the low byte. Thus, it
may take only 6 operations.

Devised by Sean Anderson, July 13, 2001.

Reverse the bits in a byte with 7 operations (no 64-bit):

b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

Make sure you assign or cast the result to an unsigned char to remove garbage in the higher bits. Devised by
Sean Anderson, July 13, 2001. Typo spotted and correction supplied by Mike Keith, January 3, 2002.

Reverse an N-bit quantity in parallel in 5 * lg(N) operations:

unsigned int v; // 32-bit word to reverse bit order

// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16 ) | ( v << 16);

The following variation is also O(lg(N)); however, it requires more operations to reverse v. Its virtue is in
taking slightly less memory by computing the constants on the fly.

unsigned int s = sizeof(v) * CHAR_BIT; // bit size; must be power of 2
unsigned int mask = ~0;
while ((s >>= 1) > 0)
{
mask ^= (mask << s);
v = ((v >> s) & mask) | ((v << s) & ~mask);
}

These methods above are best suited to situations where N is large. If you use the above with 64-bit ints (or
larger), then you need to add more lines (following the pattern); otherwise only the lower 32 bits will be
reversed and the result will be in the lower 32 bits.
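As a sketch of what "following the pattern" might look like for 64 bits: each mask is widened to 64 bits, the 16-bit swap gets its own mask, and the final step exchanges the 32-bit halves. The name reverse64 and the use of uint64_t are assumptions made for illustration.

```c
#include <stdint.h>

/* 64-bit extension of the parallel reversal; one more swap level than
   the 32-bit version. reverse64 is a hypothetical name. */
uint64_t reverse64(uint64_t v)
{
    /* swap odd and even bits */
    v = ((v >> 1) & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
    /* swap consecutive pairs */
    v = ((v >> 2) & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
    /* swap nibbles */
    v = ((v >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
    /* swap bytes */
    v = ((v >> 8) & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
    /* swap 2-byte pairs */
    v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    /* swap 4-byte halves; no mask is needed on the last level */
    v = (v >> 32) | (v << 32);
    return v;
}
```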

See Dr. Dobb's Journal 1983, Edwin Freed's article on Binary Magic Numbers for more information. The
second variation was suggested by Ken Raeburn on September 13, 2005. Veldmeijer mentioned that the first
version could do without ANDS in the last line on March 19, 2006.

Compute modulus division by 1 << s without a division operator

const unsigned int n; // numerator

const unsigned int s;
const unsigned int d = 1U << s; // So d will be one of: 1, 2, 4, 8, 16, 32, ...
unsigned int m; // m will be n % d
m = n & (d - 1);

Most programmers learn this trick early, but it was included for the sake of completeness.

Compute modulus division by (1 << s) - 1 without a division operator

unsigned int n; // numerator
const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...
unsigned int m; // n % d goes here.

for (m = n; n > d; n = m)
{
for (m = 0; n; n >>= s)
{
m += n & d;
}
}
// Now m is a value from 0 to d, but since with modulus division
// we want m to be 0 when it is d.
m = m == d ? 0 : m;

This method of modulus division by an integer that is one less than a power of 2 takes at most 5 + (4 + 5 *
ceil(N / s)) * ceil(lg(N / s)) operations, where N is the number of bits in the numerator. In other words, it
takes at most O(N * lg(N)) time.
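The loop sums the base-(1 << s) digits of n, much like casting out nines in decimal, and repeats until the sum fits in s bits. A sketch that wraps it in a function so it can be compared against the % operator follows. The name mod_pow2_minus1 is hypothetical, and s is assumed to be greater than 0 and less than the width of unsigned int.

```c
/* n % ((1 << s) - 1) without a division operator, by repeatedly summing
   the s-bit digits of n. mod_pow2_minus1 is a hypothetical name.
   Assumes 0 < s < number of bits in unsigned int. */
unsigned int mod_pow2_minus1(unsigned int n, unsigned int s)
{
    const unsigned int d = (1U << s) - 1;
    unsigned int m;

    for (m = n; n > d; n = m)
    {
        for (m = 0; n; n >>= s)
        {
            m += n & d;     /* add the next s-bit digit */
        }
    }
    /* m is now in 0..d; a digit sum equal to d means the remainder is 0 */
    return m == d ? 0 : m;
}
```

For instance, with s = 3 (d = 7) and n = 100, the digits are 4, 4 and 1, which sum to 9; a second pass gives 1 + 1 = 2, and 100 % 7 is indeed 2.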

Devised by Sean Anderson, August 15, 2001. Before Sean A. Irvine corrected me on June 17, 2004, I
mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end. Michael Miller
spotted a typo in the code April 25, 2005.

Compute modulus division by (1 << s) - 1 in parallel without a division operator

// The following is for a word size of 32 bits!

static const unsigned int M[] =

{
0x00000000, 0x55555555, 0x33333333, 0xc71c71c7,
0x0f0f0f0f, 0xc1f07c1f, 0x3f03f03f, 0xf01fc07f,
0x00ff00ff, 0x07fc01ff, 0x3ff003ff, 0xffc007ff,
0xff000fff, 0xfc001fff, 0xf0003fff, 0xc0007fff,
0x0000ffff, 0x0001ffff, 0x0003ffff, 0x0007ffff,
0x000fffff, 0x001fffff, 0x003fffff, 0x007fffff,
0x00ffffff, 0x01ffffff, 0x03ffffff, 0x07ffffff,
0x0fffffff, 0x1fffffff, 0x3fffffff, 0x7fffffff
};

static const unsigned int Q[][6] =

{
{ 0, 0, 0, 0, 0, 0}, {16, 8, 4, 2, 1, 1}, {16, 8, 4, 2, 2, 2},
{15, 6, 3, 3, 3, 3}, {16, 8, 4, 4, 4, 4}, {15, 5, 5, 5, 5, 5},
{12, 6, 6, 6 , 6, 6}, {14, 7, 7, 7, 7, 7}, {16, 8, 8, 8, 8, 8},
{ 9, 9, 9, 9, 9, 9}, {10, 10, 10, 10, 10, 10}, {11, 11, 11, 11, 11, 11},
{12, 12, 12, 12, 12, 12}, {13, 13, 13, 13, 13, 13}, {14, 14, 14, 14, 14, 14},
{15, 15, 15, 15, 15, 15}, {16, 16, 16, 16, 16, 16}, {17, 17, 17, 17, 17, 17},
{18, 18, 18, 18, 18, 18}, {19, 19, 19, 19, 19, 19}, {20, 20, 20, 20, 20, 20},
{21, 21, 21, 21, 21, 21}, {22, 22, 22, 22, 22, 22}, {23, 23, 23, 23, 23, 23},
{24, 24, 24, 24, 24, 24}, {25, 25, 25, 25, 25, 25}, {26, 26, 26, 26, 26, 26},
{27, 27, 27, 27, 27, 27}, {28, 28, 28, 28, 28, 28}, {29, 29, 29, 29, 29, 29},
{30, 30, 30, 30, 30, 30}, {31, 31, 31, 31, 31, 31}
};

static const unsigned int R[][6] =

{
{0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000001, 0x00000001},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000003, 0x00000003},
{0x00007fff, 0x0000003f, 0x00000007, 0x00000007, 0x00000007, 0x00000007},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x0000000f, 0x0000000f, 0x0000000f},
{0x00007fff, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f},
{0x00000fff, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f},
{0x00003fff, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f},
{0x0000ffff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff},
{0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff},
{0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff},
{0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff},
{0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff},
{0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff},
{0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff},
{0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff},
{0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff},
{0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff},
{0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff},
{0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff},
{0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff},
{0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff},
{0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff},
{0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff},
{0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff},
{0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff},
{0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff},
{0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff},
{0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff},
{0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff},
{0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff},
{0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff}
};

unsigned int n; // numerator

const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...
unsigned int m; // n % d goes here.

m = (n & M[s]) + ((n >> s) & M[s]);

for (const unsigned int * q = &Q[s][0], * r = &R[s][0]; m > d; q++, r++)

{
m = (m >> *q) + (m & *r);
}
m = m == d ? 0 : m; // OR, less portably: m = m & -((signed)(m - d) >> s);

This method of finding modulus division by an integer that is one less than a power of 2 takes at most
O(lg(N)) time, where N is the number of bits in the numerator (32 bits, for the code above). The number of
operations is at most 12 + 9 * ceil(lg(N)). The tables may be removed if you know the denominator at
compile time; just extract the few relevant entries and unroll the loop. It may be easily extended to more bits.

It finds the result by summing the values in base (1 << s) in parallel. First every other base (1 << s) value is
added to the previous one. Imagine that the result is written on a piece of paper. Cut the paper in half, so that
half the values are on each cut piece. Align the values and sum them onto a new piece of paper. Repeat by
cutting this paper in half (which will be a quarter of the size of the previous one) and summing, until you
cannot cut further. After performing lg(N/s/2) cuts, we cut no more; just continue to add the values and put
the result onto a new piece of paper as before, while there are at least two s-bit values.

Devised by Sean Anderson, August 20, 2001. A typo was spotted by Randy E. Bryant on May 3, 2005 (after
pasting the code, I had later added "unsinged" to a variable declaration). As in the previous hack, I
mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end, and Don
Knuth corrected me on April 19, 2006 and suggested m = m & -((signed)(m - d) >> s). On June 18, 2009
Sean Irvine proposed a change that used ((n >> s) & M[s]) instead of ((n & ~M[s]) >> s), which typically
requires fewer operations because the M[s] constant is already loaded.

Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)

unsigned int v; // 32-bit word to find the log base 2 of
unsigned int r = 0; // r will be lg(v)

while (v >>= 1) // unroll for more speed...

{
r++;
}

The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set,
MSB). The following log base 2 methods are faster than this one.
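Since the later hacks are all measured against this one, it can be handy to keep the obvious version around as a reference implementation. A minimal sketch, with the hypothetical name ilog2_loop:

```c
/* Position of the highest set bit, found by shifting until nothing is
   left. Returns 0 for both v == 0 and v == 1.
   ilog2_loop is a hypothetical name. */
unsigned int ilog2_loop(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
    {
        r++;    /* one more bit position below the MSB */
    }
    return r;
}
```

Note that an input of 0 is not distinguished from an input of 1 here; callers that care need to test for zero themselves.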

Find the integer log base 2 of an integer with a 64-bit IEEE float

int v; // 32-bit integer to find the log base 2 of
int r; // result of log_2(v) goes here
union { unsigned int u[2]; double d; } t; // temp

t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;

The code above loads a 64-bit (IEEE-754 floating-point) double with a 32-bit integer (with no padding bits)
by storing the integer in the mantissa while the exponent is set to 2^52. From this newly minted double, 2^52
(expressed as a double) is subtracted, which sets the resulting exponent to the log base 2 of the input value, v.
All that is left is shifting the exponent bits into position (20 bits right) and subtracting the bias, 0x3FF (which
is 1023 decimal). This technique only takes 5 operations, but many CPUs are slow at manipulating doubles,
and the endianness of the architecture must be accommodated.
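The same idea can be sketched without the word-order macro by building the double's bit pattern in a 64-bit integer and copying it with memcpy. This is a variation, not the code above: it assumes IEEE-754 binary64 doubles that share their byte order with uint64_t, and the name ilog2_via_double is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Integer log base 2 via the exponent field of a double.
   Assumes IEEE-754 binary64 and that double and uint64_t use the same
   byte order. ilog2_via_double is a hypothetical name. */
int ilog2_via_double(uint32_t v)
{
    /* exponent field 0x433 means 2^52; v fills the low mantissa bits,
       so the double's value is exactly 2^52 + v */
    uint64_t bits = 0x4330000000000000ULL | v;
    double d;

    memcpy(&d, &bits, sizeof d);
    d -= 4503599627370496.0;            /* subtract 2^52, leaving v */
    memcpy(&bits, &d, sizeof bits);

    /* the sign bit is 0 here, so the top 12 bits are the biased exponent */
    return (int)(bits >> 52) - 0x3FF;
}
```

Under these assumptions an input of 0 yields -1023, because the exponent field of 0.0 is zero.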

Eric Cole sent me this on January 15, 2006. Evan Felix pointed out a typo on April 4, 2006. Vincent Lefèvre
told me on July 9, 2008 to change the endian check to use the float's endian, which could differ from the
integer's endian.

Find the log base 2 of an integer with a lookup table

static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};

unsigned int v; // 32-bit word to find the log of

unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries

if (tt = v >> 16)

{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}

The lookup table method takes only about 7 operations to find the log of a 32-bit value. If extended for 64-bit
quantities, it would take roughly 9 operations. Another operation can be trimmed off by using four tables,
with the possible additions incorporated into each. Using int table elements may be faster, depending on your
architecture.

The code above is tuned to uniformly distributed output values. If your inputs are evenly distributed across
all 32-bit values, then consider using the following:

if (tt = v >> 24)
{
r = 24 + LogTable256[tt];
}
else if (tt = v >> 16)
{
r = 16 + LogTable256[tt];
}
else if (tt = v >> 8)
{
r = 8 + LogTable256[tt];
}
else
{
r = LogTable256[v];
}

To initially generate the log table algorithmically:

LogTable256[0] = LogTable256[1] = 0;
for (int i = 2; i < 256; i++)
{
LogTable256[i] = 1 + LogTable256[i / 2];
}
LogTable256[0] = -1; // if you want log(0) to return -1

Behdad Esfahbod and I shaved off a fraction of an operation (on average) on May 18, 2005. Yet another
fraction of an operation was removed on November 14, 2006 by Emanuel Hoogeveen. The variation that is
tuned to evenly distributed input values was suggested by David A. Butterfield on September 19, 2008.
Venkat Reddy told me on January 5, 2009 that log(0) should return -1 to indicate an error, so I changed the
first entry in the table to that.

Find the log base 2 of an N-bit integer in O(lg(N)) operations

unsigned int v; // 32-bit value to find the log2 of
const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
const unsigned int S[] = {1, 2, 4, 8, 16};
int i;

register unsigned int r = 0; // result of log2(v) will go here

for (i = 4; i >= 0; i--) // unroll for speed...
{
if (v & b[i])
{
v >>= S[i];
r |= S[i];
}
}

// OR (IF YOUR CPU BRANCHES SLOWLY):

unsigned int v; // 32-bit value to find the log2 of

register unsigned int r; // result of log2(v) will go here
register unsigned int shift;

r = (v > 0xFFFF) << 4; v >>= r;

shift = (v > 0xFF ) << 3; v >>= shift; r |= shift;
shift = (v > 0xF ) << 2; v >>= shift; r |= shift;
shift = (v > 0x3 ) << 1; v >>= shift; r |= shift;
r |= (v >> 1);

// OR (IF YOU KNOW v IS A POWER OF 2):

unsigned int v; // 32-bit value to find the log2 of

static const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0,
0xFF00FF00, 0xFFFF0000};
register unsigned int r = (v & b[0]) != 0;
for (i = 4; i > 0; i--) // unroll for speed...
{
r |= ((v & b[i]) != 0) << i;
}

Of course, to extend the code to find the log of a 33- to 64-bit number, we would append another element,
0xFFFFFFFF00000000, to b, append 32 to S, and loop from 5 to 0. This method is much slower than the
earlier table-lookup version, but if you don't want a big table or your architecture is slow to access memory, it's
a good choice. The second variation involves slightly more operations, but it may be faster on machines with
high branch costs (e.g. PowerPC).

The second version was sent to me by Eric Cole on January 7, 2006. Andrew Shapira subsequently trimmed
a few operations off of it and sent me his variation (above) on Sept. 1, 2007. The third variation was
suggested to me by John Owens on April 24, 2002; it's faster, but it is only suitable when the input is known
to be a power of 2. On May 25, 2003, Ken Raeburn suggested improving the general case by using smaller
numbers for b[], which load faster on some architectures (for instance if the word size is 16 bits, then only
one load instruction may be needed). These values work for the general version, but not for the special-case
version below it, where v is a power of 2; Glenn Slayden brought this oversight to my attention on December
12, 2003.

Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

uint32_t v; // find the log base 2 of 32-bit v
int r; // result goes here

static const int MultiplyDeBruijnBitPosition[32] =

{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1; // first round down to one less than a power of 2

v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];

The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It
requires only 13 operations, compared to (up to) 20 for the previous method. The purely table-based method
requires the fewest operations, but this offers a reasonable compromise between table size and speed.
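Packaged as a function, the method can be spot-checked on every power of 2 and on every value of the form 2^k - 1. The table and constant are copied from the code above; the function name ilog2_debruijn is an assumption made for illustration.

```c
#include <stdint.h>

/* Log base 2 of a 32-bit value using the multiply-and-lookup method.
   Returns 0 for v == 0. ilog2_debruijn is a hypothetical name. */
int ilog2_debruijn(uint32_t v)
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    /* smear the highest set bit into every lower position */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    /* the top 5 bits of the product are unique for each possible v */
    return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}
```

For example, 1000 smears to 1023, the product's top five bits are 00001, and entry 1 of the table is 9.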

If you know that v is a power of 2, then you only need the following:

static const int MultiplyDeBruijnBitPosition2[32] =

{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];

Eric Cole devised this January 8, 2006 after reading about the entry below to round up to a power of 2 and
the method below for computing the number of trailing bits with a multiply and lookup using a DeBruijn
sequence. On December 10, 2009, Mark Dickinson shaved off a couple operations by requiring v be rounded
up to one less than the next power of 2 rather than the power of 2.

Find integer log base 10 of an integer

unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here
int t; // temporary

static unsigned int const PowersOf10[] =

{1, 10, 100, 1000, 10000, 100000,
1000000, 10000000, 100000000, 1000000000};

t = (IntegerLogBase2(v) + 1) * 1233 >> 12; // (use a lg2 method from above)

r = t - (v < PowersOf10[t]);

The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By
the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately
1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2
rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is
found by subtracting the result of v < PowersOf10[t].
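A self-contained sketch of the computation follows. To keep it short, the obvious shift loop stands in for IntegerLogBase2; any of the log base 2 methods above could be substituted. The name ilog10_u32 is hypothetical, and v is assumed to be non-zero.

```c
/* Integer log base 10 of a non-zero 32-bit value.
   ilog10_u32 is a hypothetical name; the inner loop is a stand-in for
   whichever IntegerLogBase2 method is preferred. */
int ilog10_u32(unsigned int v)
{
    static unsigned int const PowersOf10[] =
        {1, 10, 100, 1000, 10000, 100000,
         1000000, 10000000, 100000000, 1000000000};
    unsigned int lg2 = 0;
    unsigned int w = v;
    int t;

    while (w >>= 1)     /* lg2 = floor(log2(v)) */
    {
        lg2++;
    }

    t = (int)((lg2 + 1) * 1233 >> 12);  /* approximately (lg2 + 1) / log2(10) */
    return t - (v < PowersOf10[t]);     /* correct the possible off-by-one */
}
```

For v = 99, lg2 is 6, so t = 7 * 1233 >> 12 = 2; because 99 < 100, the result is corrected down to 1.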

This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast
memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is
computed for t (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to
find the log base 10, assuming 4 tables were used (one for each byte of v).

Eric Cole suggested I add a version of this on January 7, 2006.

Find integer log base 10 of an integer the obvious way

unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here

r = (v >= 1000000000) ? 9 : (v >= 100000000) ? 8 : (v >= 10000000) ? 7 :
(v >= 1000000) ? 6 : (v >= 100000) ? 5 : (v >= 10000) ? 4 :
(v >= 1000) ? 3 : (v >= 100) ? 2 : (v >= 10) ? 1 : 0;

This method works well when the input is uniformly distributed over 32-bit values because 76% of the inputs
are caught by the first compare, 21% are caught by the second compare, 2% are caught by the third, and so
on (chopping the remaining down by 90% with each comparison). As a result, fewer than 2.6 operations are
needed on average.

On April 18, 2007, Emanuel Hoogeveen suggested a variation on this where the conditions used divisions,
which were not as fast as simple comparisons.

Find integer log base 2 of a 32-bit IEEE float

const float v; // find int(log2(v)), where v > 0.0 && finite(v) && isnormal(v)
int c; // 32-bit int c gets the result;

c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = (c >> 23) - 127;

The above is fast, but IEEE 754-compliant architectures utilize subnormal (also called denormal) floating
point numbers. These have the exponent bits set to zero (signifying pow(2,-127)), and the mantissa is not
normalized, so it contains leading zeros and thus the log2 must be computed from the mantissa. To
accommodate subnormal numbers, use the following:

const float v; // find int(log2(v)), where v > 0.0 && finite(v)
int c; // 32-bit int c gets the result;
int x = *(const int *) &v; // OR, for portability: memcpy(&x, &v, sizeof x);

c = x >> 23;

if (c)
{
c -= 127;
}
else
{ // subnormal, so recompute using mantissa: c = intlog2(x) - 149;
register unsigned int t; // temporary
// Note that LogTable256 was defined earlier
if (t = x >> 16)
{
c = LogTable256[t] - 133;
}
else
{
c = (t = x >> 8) ? LogTable256[t] - 141 : LogTable256[x] - 149;
}
}

On June 20, 2004, Sean A. Irvine suggested that I include code to handle subnormal numbers. On June 11,
2005, Falk Hüffner pointed out that ISO C99 6.5/7 specified undefined behavior for the common type
punning idiom *(int *)&, though it has worked on 99.9% of C compilers. He proposed using memcpy for
maximum portability or a union with a float and an int for better code generation than memcpy on some
compilers.
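A sketch of the memcpy form for the normal-number case might look like the following. It assumes float is a 32-bit IEEE-754 binary32 value and that v is positive, finite and normal; the name ilog2_float is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* int(log2(v)) for a positive, normal float, using memcpy instead of a
   pointer cast to read the bits. Assumes float is IEEE-754 binary32.
   ilog2_float is a hypothetical name. */
int ilog2_float(float v)
{
    int32_t c;

    memcpy(&c, &v, sizeof c);   /* well-defined way to read the bit pattern */
    return (c >> 23) - 127;     /* biased exponent minus the bias */
}
```

Because the sign bit is 0 for positive inputs, the shift leaves exactly the 8-bit exponent field; 10.0f, for example, is 1.25 * 2^3 and yields 3.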

Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)

const int r;
const float v; // find int(log2(pow((double) v, 1. / pow(2, r)))),
// where isnormal(v) and v > 0
int c; // 32-bit int c gets the result;

c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;

So, if r is 0, for example, we have c = int(log2((double) v)). If r is 1, then we have c = int(log2(sqrt((double)
v))). If r is 2, then we have c = int(log2(pow((double) v, 1./4))).

On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 left the type punning idiom *(int *)&
undefined, and he suggested using memcpy.

Count the consecutive zero bits (trailing) on the right linearly

unsigned int v;  // input to count trailing zero bits
int c;           // output: c will count v's trailing zero bits,
                 // so if v is 1101000 (base 2), then c will be 3
if (v)
{
  v = (v ^ (v - 1)) >> 1;  // Set v's trailing 0s to 1s and zero rest
  for (c = 0; v; c++)
  {
    v >>= 1;
  }
}
else
{
  c = CHAR_BIT * sizeof(v);
}

The average number of trailing zero bits in a (uniformly distributed) random binary number is one, so this
O(trailing zeros) solution isn't that bad compared to the faster methods below.

Jim Cole suggested I add a linear-time method for counting the trailing zeros on August 15, 2007. On
October 22, 2007, Jason Cunningham pointed out that I had neglected to paste the unsigned modifier for v.

Count the consecutive zero bits (trailing) on the right in parallel

unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -signed(v);
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

Here, we are basically doing the same operations as finding the log base 2 in parallel, but we first isolate the
lowest 1 bit, and then proceed with c starting at the maximum and decreasing. The number of operations is at
most 3 * lg(N) + 4, roughly, for N bit words.

Bill Burdick suggested an optimization, reducing the time from 4 * lg(N) on February 4, 2011.
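
A minimal sketch of the same steps as a plain C function, assuming a 32-bit word. The name ctz32_parallel is illustrative, and the C++-style -signed(v) is replaced by unsigned negation (0u - v), which isolates the lowest set bit in the same way.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): number of trailing
   zero bits in v, or 32 when v == 0. */
static unsigned ctz32_parallel(uint32_t v)
{
    unsigned c = 32;
    v &= 0u - v;                    /* keep only the lowest set bit */
    if (v) c--;
    if (v & 0x0000FFFFu) c -= 16;   /* bit lies in the low half-word */
    if (v & 0x00FF00FFu) c -= 8;    /* bit lies in a low byte */
    if (v & 0x0F0F0F0Fu) c -= 4;    /* bit lies in a low nibble */
    if (v & 0x33333333u) c -= 2;
    if (v & 0x55555555u) c -= 1;
    return c;
}
```

Each mask answers one binary digit of the bit position, so the five tests together subtract exactly 31 minus the position from the starting count.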

Count the consecutive zero bits (trailing) on the right by binary search
unsigned int v;     // 32-bit word input to count zero bits on right
unsigned int c;     // c will be the number of zero bits on the right,
                    // so if v is 1101000 (base 2), then c will be 3
// NOTE: if 0 == v, then c = 31.
if (v & 0x1)
{
  // special case for odd v (assumed to happen half of the time)
  c = 0;
}
else
{
  c = 1;
  if ((v & 0xffff) == 0)
  {
    v >>= 16;
    c += 16;
  }
  if ((v & 0xff) == 0)
  {
    v >>= 8;
    c += 8;
  }
  if ((v & 0xf) == 0)
  {
    v >>= 4;
    c += 4;
  }
  if ((v & 0x3) == 0)
  {
    v >>= 2;
    c += 2;
  }
  c -= v & 0x1;
}

The code above is similar to the previous method, but it computes the number of trailing zeros by
accumulating c in a manner akin to binary search. In the first step, it checks if the bottom 16 bits of v are
zeros, and if so, shifts v right 16 bits and adds 16 to c, which reduces the number of bits in v to consider by
half. Each of the subsequent conditional steps likewise halves the number of bits until there is only 1. This
method is faster than the last one (by about 33%) because the bodies of the if statements are executed less
often.

Matt Whitlock suggested this on January 25, 2006. Andrew Shapira shaved a couple operations off on Sept.
5, 2007 (by setting c=1 and unconditionally subtracting at the end).

Count the consecutive zero bits (trailing) on the right by casting to a float
unsigned int v; // find the number of trailing zeros in v
int r; // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;

Although this only takes about 6 operations, the time to convert an integer to a float can be high on some
machines. The exponent of the 32-bit IEEE floating point representation is shifted down, and the bias is
subtracted to give the position of the least significant 1 bit set in v. If v is zero, then the result is -127.
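
The pointer cast in the snippet has the same type-punning issue noted for the float-based log2 methods. A hedged sketch that reads the bits with memcpy instead (the function name ctz32_float is an assumption for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Illustrative function (name is an assumption): trailing zero count
   via the float exponent; yields -127 when v == 0. */
static int ctz32_float(uint32_t v)
{
    /* v & -v is a single bit (a power of 2), so the conversion is exact. */
    float f = (float)(v & (0u - v));
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* read the IEEE 754 encoding */
    return (int)(bits >> 23) - 0x7f;    /* exponent field minus the bias */
}
```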

Count the consecutive zero bits (trailing) on the right with modulus division and lookup
unsigned int v; // find the number of trailing zeros in v
int r; // put the result in r
static const int Mod37BitPosition[] = // map a bit value mod 37 to its position
{
32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
20, 8, 19, 18
};
r = Mod37BitPosition[(-v & v) % 37];

The code above finds the number of zeros that are trailing on the right, so binary 0100 would produce 2. It
makes use of the fact that the first 32 bit position values are relatively prime with 37, so performing a
modulus division with 37 gives a unique number from 0 to 36 for each. These numbers may then be mapped
to the number of zeros using a small lookup table. It uses only 4 operations, however indexing into a table
and performing modulus division may make it unsuitable for some situations. I came up with this
independently and then searched for a subsequence of the table values, and found it was invented earlier by
Reiser, according to Hacker's Delight.
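
To make the uniqueness claim concrete, the table can be generated rather than typed in. The sketch below is an illustration with made-up names (build_mod37_table, ctz32_mod37): it stores i at index (1 << i) % 37, puts 32 at index 0 for the v == 0 case, and leaves the unused slots at 0, which reproduces Mod37BitPosition.

```c
#include <stdint.h>

/* Illustrative generator (name is an assumption): after the call,
   table[(1u << i) % 37] == i for i = 0..31. Index 0 is reached only
   when v == 0 and is set to 32; the four unused slots stay 0. */
static void build_mod37_table(int table[37])
{
    for (int k = 0; k < 37; k++)
        table[k] = 0;
    table[0] = 32;
    for (int i = 0; i < 32; i++)
        table[(UINT32_C(1) << i) % 37] = i;
}

/* Trailing-zero count using a table filled in by build_mod37_table(). */
static int ctz32_mod37(uint32_t v, const int table[37])
{
    return table[(v & (0u - v)) % 37];
}
```

If two bit positions shared a remainder, the second store would overwrite the first, so comparing the generated table against the hard-coded one is a quick way to check both.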

Count the consecutive zero bits (trailing) on the right with multiply and lookup
unsigned int v; // find the number of trailing zeros in 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];

Converting bit vectors to indices of set bits is an example use for this. It requires one more operation than the
earlier one involving modulus division, but the multiply may be faster. The expression (v & -v) extracts the
least significant 1 bit from v. The constant 0x077CB531UL is a de Bruijn sequence, which produces a unique
pattern of bits into the high 5 bits for each possible bit position that it is multiplied against. When there are
no bits set, it returns 0. More information can be found by reading the paper Using de Bruijn Sequences to
Index a 1 in a Computer Word by Charles E. Leiserson, Harald Prokop, and Keith H. Randall.

On October 8, 2005 Andrew Shapira suggested I add this. Dustin Spicuzza asked me on April 14, 2009 to
cast the result of the multiply to a 32-bit type so it would work when compiled with 64-bit ints.
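
The example use mentioned above, turning a bit vector into the indices of its set bits, might look like the following sketch. The helper names ctz32_debruijn and set_bit_indices are hypothetical; the table and constant are the ones given in this section.

```c
#include <stdint.h>

static const int DeBruijnBitPosition[32] =
{
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Illustrative wrapper (name is an assumption): trailing zeros of nonzero v. */
static int ctz32_debruijn(uint32_t v)
{
    return DeBruijnBitPosition[(uint32_t)((v & (0u - v)) * 0x077CB531u) >> 27];
}

/* Hypothetical helper: write the index of every set bit of v into out[],
   lowest first, and return how many were written. */
static int set_bit_indices(uint32_t v, int out[32])
{
    int n = 0;
    while (v)
    {
        out[n++] = ctz32_debruijn(v);
        v &= v - 1;                 /* clear the lowest set bit */
    }
    return n;
}
```

The loop runs once per set bit, so sparse vectors are cheap to scan.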

Round up to the next highest power of 2 by float casting

unsigned int const v; // Round this 32-bit value to the next highest power of 2
unsigned int r; // Put the result here. (So v=3 -> r=4; v=8 -> r=8)

if (v > 1)
{
float f = (float)v;
unsigned int const t = 1U << ((*(unsigned int *)&f >> 23) - 0x7f);
r = t << (t < v);
}
else
{
r = 1;
}

The code above uses 8 operations, but works on all v <= (1<<31).

Quick and dirty version, for domain of 1 < v < (1<<25):

float f = (float)(v - 1);

r = 1U << ((*(unsigned int*)(&f) >> 23) - 126);

Although the quick and dirty version only uses around 6 operations, it is roughly three times slower than the
technique below (which involves 12 operations) when benchmarked on an Athlon™ XP 2100+ CPU. Some
CPUs will fare better with it, though.

On September 27, 2005 Andi Smithers suggested I include a technique for casting to floats to find the lg of a
number for rounding up to a power of 2. Similar to the quick and dirty version here, his version worked with
values less than (1<<25), due to mantissa rounding, but it used one more operation.

Round up to the next highest power of 2

unsigned int v; // compute the next highest power of 2 of 32-bit v

v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;

In 12 operations, this code computes the next highest power of 2 for a 32-bit integer. The result may be
expressed by the formula 1U << (lg(v - 1) + 1). Note that in the edge case where v is 0, it returns 0, which
isn't a power of 2; you might append the expression v += (v == 0) to remedy this if it matters. It would be
faster by 2 operations to use the formula and the log base 2 method that uses a lookup table, but in some
situations, lookup tables are not suitable, so the above code may be best. (On an Athlon™ XP 2100+ I've
found the above shift-right and then OR code is as fast as using a single BSR assembly language instruction,
which scans in reverse to find the highest set bit.) It works by copying the highest set bit to all of the lower
bits, and then adding one, which results in carries that set all of the lower bits to 0 and one bit beyond the
highest set bit to 1. If the original number was a power of 2, then the decrement will reduce it to one less, so
that we round up to the same original value.

You might alternatively compute the next higher power of 2 in only 8 or 9 operations using a lookup table for
floor(lg(v)) and then evaluating 1<<(1+floor(lg(v))); Atul Divekar suggested I mention this on September 5,
2010.

Devised by Sean Anderson, September 14, 2001. Pete Hart pointed me to a couple newsgroup posts by him
and William Lewis in February of 1997, where they arrive at the same algorithm.
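
A minimal sketch of the routine as a function, with the v += (v == 0) remedy applied. The name next_pow2 is illustrative. One caveat that follows from 32-bit wraparound: inputs above 1<<31 have no representable answer, and with the remedy in place they also come back as 1.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): smallest power of 2
   that is >= v, for 1 <= v <= 2^31. Returns 1 for v == 0. */
static uint32_t next_pow2(uint32_t v)
{
    v--;                /* so that exact powers of 2 map to themselves */
    v |= v >> 1;        /* smear the highest set bit downward ... */
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;       /* ... until every lower bit is 1 */
    v++;                /* carry ripples up to the next power of 2 */
    v += (v == 0);      /* edge case: 0 (or overflow) becomes 1 */
    return v;
}
```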

Interleave bits the obvious way

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z = 0; // z gets the resulting Morton Number.

for (int i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
{
z |= (x & 1U << i) << i | (y & 1U << i) << (i + 1);
}

Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so x and y are
combined into a single number that can be compared easily and has the property that a number is usually
close to another if their x and y values are close.
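
As an illustration of the linearization, the loop can be wrapped in a function that maps a 2D coordinate to its Morton number. The name morton2d is an assumption, and the loop bound is fixed at 16 bits per coordinate rather than derived from sizeof.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): interleave the bits of
   x (even positions) and y (odd positions) into one 32-bit value. */
static uint32_t morton2d(uint16_t x, uint16_t y)
{
    uint32_t z = 0;
    for (unsigned i = 0; i < 16; i++)
    {
        z |= ((uint32_t)x & (1u << i)) << i          /* bit i of x -> bit 2i   */
          |  ((uint32_t)y & (1u << i)) << (i + 1);   /* bit i of y -> bit 2i+1 */
    }
    return z;
}
```

For instance, x = 5 (binary 101) and y = 3 (binary 011) interleave to binary 011011.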

Interleave bits by table lookup

static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z;   // z gets the resulting 32-bit Morton Number.

z = MortonTable256[y >> 8]   << 17 |
    MortonTable256[x >> 8]   << 16 |
    MortonTable256[y & 0xFF] <<  1 |
    MortonTable256[x & 0xFF];

For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left.
This second table could then be used for the y lookups, thus reducing the operations by two, but almost
doubling the memory required. Extending this same idea, four tables could be used, with two of them
pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.

Interleave bits with 64-bit multiply

In 11 operations, this version interleaves bits of two bytes (rather than shorts, as in the other versions), but
many of the operations are 64-bit multiplies so it isn't appropriate for all machines. The input parameters, x
and y, should be less than 256.
unsigned char x;  // Interleave bits of (8-bit) x and y, so that all of the
unsigned char y;  // bits of x are in the even positions and y in the odd;
unsigned short z; // z gets the resulting 16-bit Morton Number.

z = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) *
     0x0102040810204081ULL >> 49) & 0x5555 |
    ((y * 0x0101010101010101ULL & 0x8040201008040201ULL) *
     0x0102040810204081ULL >> 48) & 0xAAAA;

Holger Bettag was inspired to suggest this technique on October 10, 2004 after reading the multiply-based
bit reversals here.

Interleave bits by Binary Magic Numbers

static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
                // x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];

y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];

z = x | (y << 1);
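
The four steps do the same thing to x and to y, so a sketch can factor them into one helper with the constants from B and S written inline. The names spread16 and morton2d_magic are hypothetical, and the helper masks its input to 16 bits itself rather than relying on the caller. Each step doubles the gap between groups of bits: 16 contiguous bits become two groups of 8, then four of 4, eight of 2, and finally sixteen single bits in the even positions.

```c
#include <stdint.h>

/* Illustrative helper (name is an assumption): move bit i of the low
   16 bits of x to bit 2*i, leaving zeros in the odd positions. */
static uint32_t spread16(uint32_t x)
{
    x &= 0xFFFFu;                        /* enforce the "< 65536" precondition */
    x = (x | (x << 8)) & 0x00FF00FFu;    /* 2 groups of 8 bits */
    x = (x | (x << 4)) & 0x0F0F0F0Fu;    /* 4 groups of 4 bits */
    x = (x | (x << 2)) & 0x33333333u;    /* 8 groups of 2 bits */
    x = (x | (x << 1)) & 0x55555555u;    /* 16 single bits */
    return x;
}

/* Morton number with x in the even bit positions and y in the odd ones. */
static uint32_t morton2d_magic(uint32_t x, uint32_t y)
{
    return spread16(x) | (spread16(y) << 1);
}
```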

Determine if a word has a zero byte

// Fewer operations:
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

The code above may be useful when doing a fast string copy in which a word is copied at a time; it uses 5
operations. On the other hand, testing for a null byte in the obvious ways (which follow) takes at least 7
operations (when counted in the most sparing way), and at most 12.
// More operations:
bool hasNoZeroByte = ((v & 0xff) && (v & 0xff00) && (v & 0xff0000) && (v & 0xff000000));
// OR:
unsigned char * p = (unsigned char *) &v;
bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);

The code at the beginning of this section (labeled "Fewer operations") works by first zeroing the high bits of
the 4 bytes in the word. Subsequently, it adds a number that will result in an overflow to the high bit of a byte
if any of the low bits were initially set. Next the high bits of the original word are ORed with these values;
thus, the high bit of a byte is set iff any bit in the byte was set. Finally, we determine if any of these high bits
are zero by ORing with ones everywhere except the high bits and inverting the result. Extending to 64 bits is
trivial; simply increase the constants to be 0x7F7F7F7F7F7F7F7F.

For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine
if the word may have a zero byte. The test also returns true if the high byte is 0x80, so there are occasional
false positives, but the slower and more reliable version above may then be used on candidates for an overall
increase in speed with correct output.

bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;

if (hasZeroByte) // or may just have 0x80 in the high byte
{
hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
}

There is yet a faster method: use hasless(v, 1), which is defined below; it works in 4 operations and
requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

The sub-expression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding
byte in v is zero or greater than 0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set in
bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing
these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set
due to a value greater than 0x80 in the first sub-expression are masked off by the second.

Paul Messmer suggested the fast pretest improvement on October 2, 2004. Juha Järvi later suggested
hasless(v, 1) on April 6, 2005, which he found on Paul Hsieh's Assembly Lab; previously it was written in
a newsgroup post on April 27, 1987 by Alan Mycroft.
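
As a hedged illustration of word-at-a-time scanning, the sketch below applies a 32-bit form of haszero to a buffer of known length. The names haszero32 and buffer_has_zero_byte are made up for this example. The buffer length is passed in explicitly so that no read goes past the end, and memcpy is used for the word load so that alignment does not matter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit form of the haszero macro; the name is illustrative only. */
static int haszero32(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Hypothetical helper: does p[0..n-1] contain a zero byte? */
static int buffer_has_zero_byte(const unsigned char *p, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);    /* alignment-safe 4-byte load */
        if (haszero32(w))
            return 1;
    }
    for (; i < n; i++)                  /* leftover tail, one byte at a time */
        if (p[i] == 0)
            return 1;
    return 0;
}
```

Because the test only asks whether some byte is zero, byte order within the word is irrelevant here.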

Determine if a word has a byte equal to n

We may want to know if any byte in a word has a specific value. To do so, we can XOR the value to test with
a word that has been filled with the byte values in which we're interested. Because XORing a value with
itself results in a zero byte and nonzero otherwise, we can pass the result to haszero.
#define hasvalue(x,n) \
(haszero((x) ^ (~0UL/255 * (n))))

Stephen M Bennet suggested this on December 13, 2009 after reading the entry for haszero.
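
A small sketch of the XOR-then-haszero idea for 32-bit words follows. The function name hasvalue32 is an assumption, and the constant 0x01010101 stands in for ~0UL/255 so that the width does not depend on the size of unsigned long.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): nonzero if any of the
   four bytes of x equals n. */
static int hasvalue32(uint32_t x, uint8_t n)
{
    uint32_t y = x ^ (0x01010101u * n);     /* bytes equal to n become 0 */
    return ((y - 0x01010101u) & ~y & 0x80808080u) != 0;   /* haszero(y) */
}
```

For example, hasvalue32(w, '\n') tests whether a 4-character chunk w contains a newline.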

Determine if a word has a byte less than n

Test if a word x contains an unsigned byte with value < n. Specifically for n=1, it can be used to find a 0-byte
by examining one long at a time, or any byte by XORing x with a mask first. Uses 4 arithmetic/logical
operations when n is constant.

Requirements: x>=0; 0<=n<=128

#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)

To count the number of bytes in x that are less than n in 7 operations, use:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)

Juha Järvi sent this clever technique to me on April 6, 2005. The countless macro was added by Sean
Anderson on April 10, 2005, inspired by Juha's countmore, below.
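
The two macros can be sketched as 32-bit functions, with 0x01010101 written out in place of ~0UL/255. The names hasless32 and countless32 are hypothetical. The final /128 % 255 step works because dividing by 128 moves each surviving high bit to the bottom of its byte, and a number taken mod 255 equals the sum of its bytes when that sum is small.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): nonzero if some byte
   of x is < n, for 0 <= n <= 128. */
static uint32_t hasless32(uint32_t x, uint32_t n)
{
    const uint32_t ones = 0x01010101u;
    return (x - ones * n) & ~x & (ones * 128u);
}

/* Illustrative function (name is an assumption): how many bytes of x
   are < n, for 0 <= n <= 128. */
static unsigned countless32(uint32_t x, uint32_t n)
{
    const uint32_t ones = 0x01010101u;
    /* Per byte: (127 + n) - (low 7 bits) keeps its high bit set exactly
       when the low 7 bits are below n; ~x drops bytes that are >= 128. */
    uint32_t flags = (ones * (127u + n) - (x & ones * 127u)) & ~x & (ones * 128u);
    return flags / 128u % 255u;
}
```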

Determine if a word has a byte greater than n

Test if a word x contains an unsigned byte with value > n. Uses 3 arithmetic/logical operations when n is
constant.

Requirements: x>=0; 0<=n<=127

#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)

To count the number of bytes in x that are more than n in 6 operations, use:
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)

The macro hasmore was suggested by Juha Järvi on April 6, 2005, and he added countmore on April 8, 2005.
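
A matching 32-bit sketch for the greater-than tests, again with 0x01010101 in place of ~0UL/255 and with hypothetical names hasmore32 and countmore32. Adding 127 - n to a byte sets its high bit exactly when the byte exceeds n; the counting form first masks each byte to 7 bits so that no carry leaks into the neighboring byte.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): nonzero if some byte
   of x is > n, for 0 <= n <= 127. */
static uint32_t hasmore32(uint32_t x, uint32_t n)
{
    const uint32_t ones = 0x01010101u;
    return ((x + ones * (127u - n)) | x) & (ones * 128u);
}

/* Illustrative function (name is an assumption): how many bytes of x
   are > n, for 0 <= n <= 127. */
static unsigned countmore32(uint32_t x, uint32_t n)
{
    const uint32_t ones = 0x01010101u;
    uint32_t flags = (((x & ones * 127u) + ones * (127u - n)) | x) & (ones * 128u);
    return flags / 128u % 255u;
}
```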

Determine if a word has a byte between m and n

When m < n, this technique tests if a word x contains an unsigned byte value, such that m < value < n. It uses
7 arithmetic/logical operations when n and m are constant.

Note: Bytes that equal n can be reported by likelyhasbetween as false positives, so this should be checked by
character if a certain result is needed.

Requirements: x>=0; 0<=m<=127; 0<=n<=128

#define likelyhasbetween(x,m,n) \
((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)

This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for
constant m and n) but provides the exact answer is:
#define hasbetween(x,m,n) \
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)

To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)

Juha Järvi suggested likelyhasbetween on April 6, 2005. From there, Sean Anderson created hasbetween and
countbetween on April 10, 2005.
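
As one possible use, the exact test can check whether a 4-byte chunk contains an ASCII digit by choosing m = '0' - 1 and n = '9' + 1. The sketch below uses 32-bit words and hypothetical names (hasbetween32, countbetween32), with 0x01010101 in place of ~0UL/255.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): nonzero if some byte b
   of x satisfies m < b < n, for 0 <= m <= 127 and 0 <= n <= 128. */
static uint32_t hasbetween32(uint32_t x, uint32_t m, uint32_t n)
{
    const uint32_t ones = 0x01010101u;
    uint32_t low7 = x & (ones * 127u);
    return ((ones * (127u + n) - low7)      /* high bit set where byte < n */
            & ~x                            /* drop bytes that are >= 128  */
            & (low7 + ones * (127u - m)))   /* high bit set where byte > m */
           & (ones * 128u);
}

/* Illustrative function (name is an assumption): number of bytes of x
   strictly between m and n. */
static unsigned countbetween32(uint32_t x, uint32_t m, uint32_t n)
{
    return hasbetween32(x, m, n) / 128u % 255u;
}
```

With these, hasbetween32(w, '0' - 1, '9' + 1) is nonzero when w holds at least one digit character.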

Compute the lexicographically next bit permutation

Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a
lexicographical sense. For example, if N is 3 and the bit pattern is 00010011, the next patterns would be
00010101, 00010110, 00011001, 00011010, 00011100, 00100011, and so forth. The following is a fast way to
compute the next permutation.
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits

unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1

// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));

The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are
using Microsoft compilers for x86, the intrinsic is _BitScanForward. These both emit a bsf instruction, but
equivalents may be available for other architectures. If not, then consider using one of the methods for
counting the consecutive zero bits mentioned earlier.

Here is another version that tends to be slower because of its division operator, but it does not require
counting the trailing zeros.
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);

Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.
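
A sketch of the division-based version as a function, plus a loop that walks every pattern with a fixed number of set bits. Both names (next_bit_permutation, count_k_subsets) are hypothetical. The input must be nonzero, since v & -v is used as a divisor, and unsigned negation is written as 0u - v.

```c
#include <stdint.h>

/* Illustrative function (name is an assumption): next larger value with
   the same number of 1 bits as v. Requires v != 0. */
static uint32_t next_bit_permutation(uint32_t v)
{
    uint32_t t = (v | (v - 1)) + 1;
    return t | ((((t & (0u - t)) / (v & (0u - v))) >> 1) - 1);
}

/* Hypothetical helper: count the k-bit patterns that fit in n bits by
   stepping through them in order (1 <= k <= n < 32). */
static unsigned count_k_subsets(unsigned n, unsigned k)
{
    unsigned count = 0;
    for (uint32_t v = (1u << k) - 1; v < (1u << n); v = next_bit_permutation(v))
        count++;
    return count;
}
```

Starting from the lowest k bits set, the loop visits each k-element subset of n items exactly once, which makes this a compact way to enumerate combinations.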

A Belorussian translation (provided by Webhostingrating) is available.
