Bit Twiddling Hacks
By Sean Eron Anderson
Individually, the code snippets here are in the public domain (unless otherwise noted); feel free to use them however you
please. The aggregate collection and descriptions are © 1997-2005 Sean Eron Anderson. The code and descriptions are distributed
in the hope that they will be useful, but WITHOUT ANY WARRANTY and without even the implied warranty of
merchantability or fitness for a particular purpose. As of May 5, 2005, all the code has been tested thoroughly. Thousands of
people have read it. Moreover, Professor Randal Bryant, the Dean of Computer Science at Carnegie Mellon University, has
personally tested almost everything with his Uclid code verification system. What he hasn't tested, I have checked against all
possible inputs on a 32-bit machine. To the first person to inform me of a legitimate bug in the code, I'll pay a bounty of
US$10 (by check or Paypal). If directed to a charity, I'll pay US$20.

Contents

About the operation counting methodology

Compute the sign of an integer
Detect if two integers have opposite signs
Compute the integer absolute value (abs) without branching
Compute the minimum (min) or maximum (max) of two integers without branching
Determining if an integer is a power of 2
Sign extending
Sign extending from a constant bit-width
Sign extending from a variable bit-width
Sign extending from a variable bit-width in 3 operations
Conditionally set or clear bits without branching
Conditionally negate a value without branching
Merge bits from two values according to a mask
Counting bits set
Counting bits set, naive way
Counting bits set by lookup table
Counting bits set, Brian Kernighan's way
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
Counting bits set, in parallel
Count bits set (rank) from the most-significant bit up to a given position
Select the bit position (from the most-significant bit) with the given count (rank)
Computing parity (1 if an odd number of bits set, 0 otherwise)
Compute parity of a word the naive way
Compute parity by lookup table
Compute parity of a byte using 64-bit multiply and modulus division
Compute parity of word with a multiply
Compute parity in parallel
Swapping Values
Swapping values with subtraction and addition
Swapping values with XOR
Swapping individual bits with XOR
Reversing bit sequences
Reverse bits the obvious way
Reverse bits in word by lookup table
Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)
Reverse the bits in a byte with 4 operations (64-bit multiply, no division)
Reverse the bits in a byte with 7 operations (no 64-bit, only 32)
Reverse an N-bit quantity in parallel with 5 * lg(N) operations
Modulus division (aka computing remainders)
Computing modulus division by 1 << s without a division operation (obvious)
Computing modulus division by (1 << s) - 1 without a division operation
Computing modulus division by (1 << s) - 1 in parallel without a division operation
Finding integer log base 2 of an integer (aka the position of the highest bit set)
Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
Find the integer log base 2 of an integer with a 64-bit IEEE float
Find the log base 2 of an integer with a lookup table
Find the log base 2 of an N-bit integer in O(lg(N)) operations
Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup
Find integer log base 10 of an integer
Find integer log base 10 of an integer the obvious way
Find integer log base 2 of a 32-bit IEEE float
Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)
Counting consecutive trailing zero bits (or finding bit indices)
Count the consecutive zero bits (trailing) on the right linearly
Count the consecutive zero bits (trailing) on the right in parallel
Count the consecutive zero bits (trailing) on the right by binary search
Count the consecutive zero bits (trailing) on the right by casting to a float
Count the consecutive zero bits (trailing) on the right with modulus division and lookup
Count the consecutive zero bits (trailing) on the right with multiply and lookup
Round up to the next highest power of 2 by float casting
Round up to the next highest power of 2
Interleaving bits (aka computing Morton Numbers)
Interleave bits the obvious way
Interleave bits by table lookup
Interleave bits with 64-bit multiply
Interleave bits by Binary Magic Numbers
Testing for ranges of bytes in a word (and counting occurrences found)
Determine if a word has a zero byte
Determine if a word has a byte equal to n
Determine if a word has a byte less than n
Determine if a word has a byte greater than n
Determine if a word has a byte between m and n
Compute the lexicographically next bit permutation

About the operation counting methodology

When totaling the number of operations for algorithms here, any C operator is counted as one operation.
Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation
counting approach only serves as an approximation of the actual number of machine instructions and CPU
time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have
been heading increasingly in this direction over time. There are many nuances that determine how fast a
system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the
end, benchmarking is the best way to determine whether one method is really faster than another, so consider
the techniques below as possibilities to test on your target architecture.

Compute the sign of an integer


int v; // we want to find the sign of v
int sign; // the result goes here

// CHAR_BIT is the number of bits per byte (normally 8).


sign = -(v < 0); // if v < 0 then -1, else 0.
// or, to avoid branching on CPUs with flag registers (IA32):
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// or, for one less instruction (but not portable):
sign = v >> (sizeof(int) * CHAR_BIT - 1);

The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the
obvious way, sign = -(v < 0). This trick works because when signed integers are shifted right, the value of the
far left bit is copied to the other bits. The far left bit is 1 when the value is negative and 0 otherwise; all 1 bits
give -1. Unfortunately, this behavior is architecture-specific.

Alternatively, if you prefer the result be either -1 or +1, then use:
sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then -1, else +1

On the other hand, if you prefer the result be either -1, 0, or +1, then use:

sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));


// Or, for more speed but less portability:
sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1, 0, or +1
// Or, for portability, brevity, and (perhaps) speed:
sign = (v > 0) - (v < 0); // -1, 0, or +1

If instead you want to know if something is non-negative, resulting in +1 or else 0, then use:
sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1

Caveat: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result
of signed right-shift implementation-defined, so on some systems this hack might not work. For greater
portability, Toby Speight suggested on September 28, 2005 that CHAR_BIT be used here and throughout
rather than assuming bytes were 8 bits long. Angus recommended the more portable versions above,
involving casting, on March 4, 2006. Rohit Garg suggested the version for non-negative integers on
September 12, 2009.
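As a minimal sketch, the portable variants above can be wrapped in small functions. The names sign_neg, sign3 and is_nonneg are illustrative and not part of the original page; the sketch assumes unsigned int has no padding bits, so that its top bit mirrors the sign bit of int.

```c
#include <limits.h>

/* Returns -1 if v is negative, 0 otherwise. Uses only a comparison,
   so it does not depend on how signed right shift behaves. */
int sign_neg(int v)
{
    return -(v < 0);
}

/* Returns -1, 0, or +1 (the portable three-way form). */
int sign3(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if v >= 0, else 0. The shift is performed on an unsigned
   value, so it is well defined; the top bit is then inverted. */
int is_nonneg(int v)
{
    return (int)(1u ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)));
}
```

For example, sign3(-42) gives -1, sign3(0) gives 0, and is_nonneg(0) gives 1.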

Detect if two integers have opposite signs


int x, y; // input values to compare signs

bool f = ((x ^ y) < 0); // true iff x and y have opposite signs

Manfred Weis suggested I add this entry on November 26, 2009.
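The test works because XOR sets the sign bit of the result exactly when the sign bits of the two inputs differ. A small sketch, assuming two's complement integers (the function name opposite_signs is illustrative):

```c
#include <stdbool.h>

/* True iff exactly one of x and y is negative. Zero counts as
   non-negative, so (0, -1) is reported as opposite signs. */
bool opposite_signs(int x, int y)
{
    return (x ^ y) < 0;
}
```

Note that zero is treated as having a positive sign here, which may or may not match what a caller expects.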

Compute the integer absolute value (abs) without branching


int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Patented variation:


r = (v ^ mask) - mask;

Some CPUs don't have an integer absolute value instruction (or the compiler fails to use them). On machines
where branching is expensive, the above expression can be faster than the obvious approach,
r = (v < 0) ? -(unsigned)v : v, even though the number of operations is the same.

On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed
right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does
not require values to be represented as two's complement, so it may not work for that reason as well (on a
diminishingly small number of old machines that still use one's complement). On March 14, 2004, Keith H.
Duggar sent me the patented variation above; it is superior to the one I initially came up with,
r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v, because a multiply is not used. Unfortunately, this method has been patented
in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. On August 13,
2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well
before the patent was even filed, such as in How to Optimize for the Pentium Processor by Agner Fog, dated
November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which
Vladimir could have read. Moreover, the Internet Archive also has an old link to it. On January 30, 2007,
Peter Kankowski shared with me an abs version he discovered that was inspired by Microsoft's Visual C++
compiler output. It is featured here as the primary solution. On December 6, 2007, Hai Jin complained that
the result was signed, so when computing the abs of the most negative value, it was still negative. On April
15, 2008 Andrew Shapira pointed out that the obvious approach could overflow, as it lacked an (unsigned)
cast then; for maximum portability he suggested (v < 0) ? (1 + ((unsigned)(-1-v))) : (unsigned)v. But
citing the ISO C99 spec on July 9, 2008, Vincent Lefèvre convinced me to remove it because even on
non-2s-complement machines -(unsigned)v will do the right thing. The evaluation of -(unsigned)v first converts
the negative value of v to an unsigned by adding 2**N, yielding a 2s complement representation of v's value
that I'll call U. Then, U is negated, giving the desired result, -U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v
= abs(v).
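The following sketch contrasts the branchless form with the -(unsigned)v form discussed above. The function names are illustrative. The branchless version assumes an arithmetic (sign-propagating) right shift of negative values, which is implementation-defined, and it does the addition in unsigned arithmetic so that the most negative value does not trigger signed overflow.

```c
#include <limits.h>

/* Branchless abs. Assumes >> on a negative int replicates the sign bit,
   so mask is all ones for negative v and zero otherwise. */
unsigned int abs_branchless(int v)
{
    int const mask = v >> (sizeof(int) * CHAR_BIT - 1);
    /* (v + mask) ^ mask, carried out on unsigned values to avoid overflow */
    return ((unsigned int)v + (unsigned int)mask) ^ (unsigned int)mask;
}

/* The obvious approach; -(unsigned)v is well defined for every int. */
unsigned int abs_obvious(int v)
{
    return (v < 0) ? -(unsigned int)v : (unsigned int)v;
}
```

Both functions return an unsigned result, so the absolute value of INT_MIN is representable.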

Compute the minimum (min) or maximum (max) of two integers without branching
int x; // we want to find the minimum of x and y
int y;
int r; // the result goes here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)

On some rare machines where branching is very expensive and no conditional move instructions exist, the
above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two
more instructions. (Typically, the obvious approach is best, though.) It works because if x < y, then -(x < y)
will be all ones, so r = y ^ ((x ^ y) & ~0) = y ^ x ^ y = x. Otherwise, if x >= y, then -(x < y) will be all zeros, so
r = y ^ ((x ^ y) & 0) = y. On some machines, evaluating (x < y) as 0 or 1 requires a branch instruction, so
there may be no advantage.

To find the maximum, use:


r = x ^ ((x ^ y) & -(x < y)); // max(x, y)

Quick and dirty versions:

If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because
(x - y) only needs to be evaluated once.
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)

Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't
portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to
unsigned for the subtractions to avoid unnecessarily throwing an exception; however, the right-shift needs a
signed operand to produce all one bits when negative, so cast to signed there.

On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E.
Bryant alerted me to the need for the precondition, INT_MIN <= x - y <= INT_MAX, and suggested the
non-quick and dirty version as a fix. Both of these issues concern only the quick and dirty version. Nigel
Horspoon observed on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution
because of how it evaluates (x < y). On July 9, 2008 Vincent Lefèvre pointed out the potential for overflow
exceptions with subtractions in r = y + ((x - y) & -(x < y)), which was the previous version. Timothy B.
Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows on June
2, 2009.

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2


bool f; // the result goes here

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:


f = v && !(v & (v - 1));
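To see why the test works: a power of 2 has exactly one bit set, for example 01000. Subtracting 1 clears that bit and sets every bit below it (00111), so the AND of the two is zero. Any value with two or more bits set keeps its higher bits after the subtraction, so the AND is non-zero. A minimal sketch with an illustrative function name:

```c
#include <stdbool.h>

/* True iff v has exactly one bit set; the leading "v &&" rejects 0. */
bool is_power_of_two(unsigned int v)
{
    return v && !(v & (v - 1));
}
```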

Sign extending from a constant bit-width

Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's
complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int,
which has more than b bits. A simple copy will work if x is positive, but if negative, the sign must be
extended. For example, if we have only 4 bits to store a number, then -3 is represented as 1101 in binary. If
we have 8 bits, then -3 is 11111101. The most-significant bit of the 4-bit representation is replicated
sinistrally to fill in the destination when we convert to a representation with more bits; this is sign extending.
In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions.
For example, to convert from 5 bits to a full integer:
int x; // convert this from using 5 bits to a full int
int r; // resulting sign extended number goes here
struct {signed int x:5;} s;
r = s.x = x;

The following is a C++ template function that uses the same language feature to convert from B bits in one
operation (though the compiler is generating more, of course).
template <typename T, unsigned B>
inline T signextend(const T x)
{
struct {T x:B;} s;
return s.x = x;
}

int r = signextend<signed int,5>(x); // sign extend 5 bit number x to r

John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat
Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed;
otherwise, the sign is undefined.

Sign extending from a variable bit-width

Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which
it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed

x = x & ((1U << b) - 1); // (Skip this if bits in x above position b are already zero.)
r = (x ^ m) - m;

The code above requires four operations, but when the bitwidth is a constant rather than variable, it requires
only two fast operations, assuming the upper bits are already zeroes.

A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:

int const m = CHAR_BIT * sizeof(x) - b;


r = (x << m) >> m;

Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided m
= (1 << (b - 1)) - 1; r = -(x & ~m) | x; as a starting point from which I optimized to get m = 1U << (b -
1); r = -(x & m) | x. But then on May 11, 2007, Shay Green suggested the version above, which requires one
less operation than mine. Vipin Sharma suggested I add a step to deal with situations where x had possible
ones in bits other than the b bits we wanted to sign-extend on Oct. 15, 2008. On December 31, 2009 Chris
Pirazzi suggested I add the faster version, which requires two operations for constant bit-widths and three for
variable widths.
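The (x ^ m) - m step can be read as follows: XOR with m = 1 << (b - 1) flips the b-bit sign bit, and subtracting m then maps a set sign bit to a negative offset. For the 4-bit pattern 1101, m is 1000, so 1101 ^ 1000 = 0101 = 5 and 5 - 8 = -3. The sketch below wraps the masked version in a function; the name sign_extend and the restriction of b to less than the width of int are assumptions made to keep every intermediate value within range.

```c
/* Sign-extends the low b bits of x to a full int.
   Assumes 1 <= b < (number of bits in int). Bits of x above
   position b are masked off first, so they may hold anything. */
int sign_extend(unsigned int x, unsigned int b)
{
    unsigned int const m = 1u << (b - 1);  /* sign bit of the b-bit field */

    x &= (m << 1) - 1;                     /* keep only the low b bits */
    return (int)(x ^ m) - (int)m;          /* flip the sign bit, then re-bias */
}
```

For instance, sign_extend(0xD, 4) yields -3, matching the 4-bit example in the section on constant bit-widths.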

Sign extending from a variable bit-width in 3 operations

The following may be slow on some machines, due to the effort required for multiplication and division.
This version is 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this
type of sign extension in 3 operations by using r = (x * multipliers[b]) / multipliers[b], which requires only
one array lookup.
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
#define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte
static int const multipliers[] =
{
0, M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more if using more than 64 bits)
static int const divisors[] =
{
1, ~M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more for 64 bits)
#undef M
r = (x * multipliers[b]) / divisors[b];

The following variation is not portable, but on architectures that employ an arithmetic right-shift,
maintaining the sign, it should be fast.
const int s = -b; // OR: sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;

Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used multipliers[] for
divisors[]), where it failed on the case of x=1 and b=1.

Conditionally set or clear bits without branching


bool f; // conditional flag
unsigned int m; // the bit mask
unsigned int w; // the word to modify: if (f) w |= m; else w &= ~m;

w ^= (-f ^ w) & m;

// OR, for superscalar CPUs:


w = (w & ~m) | (-f & m);

On some architectures, the lack of branching can more than make up for what appears to be twice as many
operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster.
An Intel Core 2 Duo ran the superscalar version about 16% faster than the first. Glenn Slayden informed me
of the first expression on December 11, 2003. Marco Yu shared the superscalar version with me on April 3,
2007 and alerted me to a typo 2 days later.
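Both expressions can be checked against the branching form if (f) w |= m; else w &= ~m;. The sketch below uses illustrative function names and casts the flag to unsigned before negating, so that -f is all ones when the flag is true and zero otherwise, without relying on signed arithmetic.

```c
#include <stdbool.h>

/* Sets the bits of w selected by m when f is true, clears them otherwise. */
unsigned int set_or_clear(unsigned int w, unsigned int m, bool f)
{
    return w ^ ((-(unsigned int)f ^ w) & m);
}

/* Variant with two independent sub-expressions, which a superscalar
   CPU may be able to evaluate in parallel. */
unsigned int set_or_clear_superscalar(unsigned int w, unsigned int m, bool f)
{
    return (w & ~m) | (-(unsigned int)f & m);
}
```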

Conditionally negate a value without branching

If you need to negate only when a flag is false, then use the following to avoid branching:
bool fDontNegate; // Flag indicating we should not negate v.
int v; // Input value to negate if fDontNegate is false.
int r; // result = fDontNegate ? v : -v;

r = (fDontNegate ^ (fDontNegate - 1)) * v;

If you need to negate only when a flag is true, then use this:


bool fNegate; // Flag indicating if we should negate v.
int v; // Input value to negate if fNegate is true.
int r; // result = fNegate ? -v : v;

r = (v ^ -fNegate) + fNegate;

Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I
came up with the second version on June 8, 2009. Alfonso De Gregorio pointed out that some parens were
missing on November 26, 2009, and received a bug bounty.
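As a sketch of why the second version works on two's complement machines: when fNegate is 1, -fNegate is all ones, so v ^ -fNegate is ~v, and ~v + 1 equals -v. When fNegate is 0, both the XOR and the addition leave v unchanged. The function names below are illustrative, and the most negative int is excluded because -v is not representable for it.

```c
#include <stdbool.h>

/* Returns -v when fNegate is true, v otherwise (no multiply). */
int negate_if(int v, bool fNegate)
{
    return (v ^ -fNegate) + fNegate;
}

/* Returns v when fDontNegate is true, -v otherwise.
   (1 ^ 0) is +1 and (0 ^ -1) is -1, which then scales v. */
int negate_unless(int v, bool fDontNegate)
{
    return (fDontNegate ^ (fDontNegate - 1)) * v;
}
```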

Merge bits from two values according to a mask


unsigned int a; // value to merge in non-masked bits
unsigned int b; // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r; // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask);

This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the
mask is a constant, then there may be no advantage.

Ron Jeffery sent this to me on February 9, 2006.
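A brief sketch comparing the XOR form with the obvious (a & ~mask) | (b & mask). Where a mask bit is 0, the term (a ^ b) & mask is 0 and the bit of a passes through; where it is 1, a ^ (a ^ b) yields the bit of b. The function name is illustrative.

```c
/* Takes bits from b where mask is 1 and from a where mask is 0. */
unsigned int merge_bits(unsigned int a, unsigned int b, unsigned int mask)
{
    return a ^ ((a ^ b) & mask);
}
```

For example, merging 0xAAAA and 0x5555 with mask 0x00FF keeps the high byte of the first and the low byte of the second, giving 0xAA55.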

Counting bits set (naive way)


unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

for (c = 0; v; v >>= 1)
{
c += v & 1;
}

The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only
the high bit set, it will go through 32 iterations.

Counting bits set by lookup table


static const unsigned char BitsSetTable256[256] =
{
# define B2(n) n, n+1, n+1, n+2
# define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
# define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
B6(0), B6(1), B6(1), B6(2)


};

unsigned int v; // count the number of bits set in 32-bit value v


unsigned int c; // c is the total bits set in v

// Option 1:
c = BitsSetTable256[v & 0xff] +
BitsSetTable256[(v >> 8) & 0xff] +
BitsSetTable256[(v >> 16) & 0xff] +
BitsSetTable256[v >> 24];

// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
BitsSetTable256[p[1]] +
BitsSetTable256[p[2]] +
BitsSetTable256[p[3]];

// To initially generate the table algorithmically:


BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}

On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Counting bits set, Brian Kernighan's way


unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}

Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word
with only the high bit set, then it will only go once through the loop.

Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie)
mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first
published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and
published in 1964 in a book edited by Beckenbach.)"
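The loop relies on v & (v - 1) clearing the lowest set bit of v: subtracting 1 turns that bit off and turns on every bit below it, and the AND then discards those lower bits. A minimal sketch as a function (the name is illustrative):

```c
/* Counts set bits; the loop body runs once per set bit. */
unsigned int popcount_kernighan(unsigned int v)
{
    unsigned int c;

    for (c = 0; v; c++)
    {
        v &= v - 1; /* clear the least significant bit set */
    }
    return c;
}
```

With only the high bit set (0x80000000), the loop body runs once, whereas the naive method would iterate 32 times.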

Counting bits set in 14, 24, or 32-bit words using 64-bit instructions


unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

// option 1, for at most 14-bit values in v:


c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

// option 2, for at most 24-bit values in v:


c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL)
% 0x1f;

// option 3, for at most 32-bit values in v:


c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) %
0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3
operations; the second option takes 10; and the third option takes 15.

Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section
of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method
was the inspiration for the variants above, devised by Sean Anderson. Randal E. Bryant offered a couple bug
fixes on May 3, 2005. Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits
using the same number of operations on February 1, 2007.

Counting bits set, in parallel


unsigned int v; // count bits set in this (32-bit value)
unsigned int c; // store the total here
static const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
static const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};

c = v - ((v >> 1) & B[0]);


c = ((c >> S[1]) & B[1]) + (c & B[1]);
c = ((c >> S[2]) + c) & B[2];
c = ((c >> S[3]) + c) & B[3];
c = ((c >> S[4]) + c) & B[4];

The B array, expressed as binary, is:


B[0] = 0x55555555 = 01010101 01010101 01010101 01010101
B[1] = 0x33333333 = 00110011 00110011 00110011 00110011
B[2] = 0x0F0F0F0F = 00001111 00001111 00001111 00001111
B[3] = 0x00FF00FF = 00000000 11111111 00000000 11111111
B[4] = 0x0000FFFF = 00000000 00000000 11111111 11111111

We can adjust the method for larger integer sizes by continuing with the patterns for the Binary Magic
Numbers, B and S. If there are k bits, then we need the arrays S and B to be ceil(lg(k)) elements long, and we
must compute the same number of expressions for c as S or B are long. For a 32-bit v, 16 operations are
used.

The best method for counting bits in a 32-bit integer v is the following:

v = v - ((v >> 1) & 0x55555555); // reuse input as temporary


v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but
avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method
above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions),
though it doesn't use 64-bit instructions. The counts of bits set in the bytes are computed in parallel, and the sum
total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits.

A generalization of the best bit counting method to integers of bit-widths up to 128 (parameterized by type T)
is this:

v = v - ((v >> 1) & (T)~(T)0/3); // temp


v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3); // temp
v = (v + (v >> 4)) & (T)~(T)0/255*15; // temp
c = (T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * CHAR_BIT; // count

See Ian Ashdown's nice newsgroup post for more information on counting the number of bits set (also
known as sideways addition). The best bit counting method was brought to my attention on October 5, 2005
by Andrew Shapira; he found it in pages 187-188 of Software Optimization Guide for AMD Athlon™ 64
and Opteron™ Processors. Charlie Gordon suggested a way to shave off one operation from the purely
parallel version on December 14, 2005, and Don Clugston trimmed three more from it on December 30,
2005. I made a typo with Don's suggestion that Eric Cole spotted on January 8, 2006. Eric later suggested the
arbitrary bit-width generalization to the best method on November 17, 2006. On April 5, 2007, Al Williams
observed that I had a line of dead code at the top of the first method.
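A sketch of the 12-operation method as a function, assuming a 32-bit unsigned int (the function name is illustrative). The first step leaves a 2-bit count in each bit pair, the second a 4-bit count in each nibble, and the third a count per byte. The multiply by 0x01010101 then adds the four byte counts into the top byte.

```c
#include <stdint.h>

/* Parallel bit count for a 32-bit value. */
unsigned int popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* counts per 2 bits */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* counts per 4 bits */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* counts per byte */
    return (uint32_t)(v * 0x01010101u) >> 24;         /* sum of the 4 bytes */
}
```

For v = 0xFFFFFFFF the intermediate values are 0xAAAAAAAA, 0x44444444 and 0x08080808, and the final multiply and shift yield 32.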

Count bits set (rank) from the most-significant bit up to a given position
The following finds the rank of a bit, meaning it returns the sum of bits that are set to 1 from the
most-significant bit down to the bit at the given position.
uint64_t v; // Compute the rank (bits set) in v from the MSB to pos.
unsigned int pos; // Bit position to count bits up to [1-64].
uint64_t r; // Resulting rank of bit at pos goes here.

// Shift out bits after given position.
r = v >> (sizeof(v) * CHAR_BIT - pos);
// Count set bits in parallel.
// r = (r & 0x5555...) + ((r >> 1) & 0x5555...);
r = r - ((r >> 1) & ~0ULL/3);
// r = (r & 0x3333...) + ((r >> 2) & 0x3333...);
r = (r & ~0ULL/5) + ((r >> 2) & ~0ULL/5);
// r = (r & 0x0f0f...) + ((r >> 4) & 0x0f0f...);
r = (r + (r >> 4)) & ~0ULL/17;
// r = r % 255;
r = (r * (~0ULL/255)) >> ((sizeof(v) - 1) * CHAR_BIT);

Juha Järvi sent this to me on November 21, 2009 as an inverse operation to computing the bit position
with the given rank, which follows.
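A sketch of the rank computation as a function. The name rank_from_msb and the use of UINT64_MAX for the masks are choices made for this example: UINT64_MAX / 3 is 0x5555..., / 5 is 0x3333..., / 17 is 0x0f0f... and / 255 is 0x0101..., all as 64-bit values regardless of the width of long. The sketch assumes 1 <= pos <= 64, since pos = 0 would shift by the full width of the type.

```c
#include <limits.h>
#include <stdint.h>

/* Number of set bits among the pos most-significant bits of v.
   Assumes 1 <= pos <= 64. */
unsigned int rank_from_msb(uint64_t v, unsigned int pos)
{
    /* Shift out the bits after the given position. */
    uint64_t r = v >> (sizeof(v) * CHAR_BIT - pos);

    /* Count the remaining set bits in parallel. */
    r = r - ((r >> 1) & (UINT64_MAX / 3));
    r = (r & (UINT64_MAX / 5)) + ((r >> 2) & (UINT64_MAX / 5));
    r = (r + (r >> 4)) & (UINT64_MAX / 17);
    return (unsigned int)((r * (UINT64_MAX / 255)) >> ((sizeof(v) - 1) * CHAR_BIT));
}
```

For example, with v = 0x0F00000000000000 the top six bits are 000011, so a pos of 6 gives a rank of 2 and a pos of 4 gives 0.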

Select the bit position (from the most-significant bit) with the given count (rank)

The following 64-bit code selects the position of the rth 1 bit when counting from the left. In other words if
we start at the most significant bit and proceed to the right, counting the number of bits set to 1 until we
reach the desired rank, r, then the position where we stop is returned. If the rank requested exceeds the count
of bits set, then 64 is returned. The code may be modified for 32-bit or counting from the right.
uint64_t v; // Input value to find position with rank r.
unsigned int r; // Input: bit's desired rank [1-64].
unsigned int s; // Output: Resulting position of bit with rank r [1-64]
uint64_t a, b, c, d; // Intermediate temporaries for bit count.
unsigned int t; // Bit count temporary.

// Do a normal parallel bit count for a 64-bit integer,
// but store all intermediate steps.
// a = (v & 0x5555...) + ((v >> 1) & 0x5555...);
a = v - ((v >> 1) & ~0ULL/3);
// b = (a & 0x3333...) + ((a >> 2) & 0x3333...);
b = (a & ~0ULL/5) + ((a >> 2) & ~0ULL/5);
// c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...);
c = (b + (b >> 4)) & ~0ULL/0x11;
// d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...);
d = (c + (c >> 8)) & ~0ULL/0x101;
t = (d >> 32) + (d >> 48);
// Now do branchless select!
s = 64;
// if (r > t) {s -= 32; r -= t;}
s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
t = (d >> (s - 16)) & 0xff;
// if (r > t) {s -= 16; r -= t;}
s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
t = (c >> (s - 8)) & 0xf;
// if (r > t) {s -= 8; r -= t;}
s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
t = (b >> (s - 4)) & 0x7;
// if (r > t) {s -= 4; r -= t;}
s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));


t = (a >> (s - 2)) & 0x3;
// if (r > t) {s -= 2; r -= t;}
s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
t = (v >> (s - 1)) & 0x1;
// if (r > t) s--;
s -= ((t - r) & 256) >> 8;
s = 65 - s;

If branching is fast on your target CPU, consider uncommenting the if-statements and commenting the lines
that follow them.

Juha Järvi sent this to me on November 21, 2009.

Computing parity the naive way


unsigned int v; // word value to compute the parity of
bool parity = false; // parity will be the parity of v

while (v)
{
parity = !parity;
v = v & (v - 1);
}

The above code uses an approach like Brian Kernighan's bit counting, above. The time it takes is proportional
to the number of bits set.

Compute parity by lookup table


static const bool ParityTable256[256] =
{
# define P2(n) n, n^1, n^1, n
# define P4(n) P2(n), P2(n^1), P2(n^1), P2(n)
# define P6(n) P4(n), P4(n^1), P4(n^1), P4(n)
P6(0), P6(1), P6(1), P6(0)
};

unsigned char b; // byte value to compute the parity of


bool parity = ParityTable256[b];

// OR, for 32-bit words:


unsigned int v;
v ^= v >> 16;
v ^= v >> 8;
bool parity = ParityTable256[v & 0xff];

// Variation:
unsigned char * p = (unsigned char *) &v;
parity = ParityTable256[p[0] ^ p[1] ^ p[2] ^ p[3]];

Randal E. Bryant encouraged the addition of the (admittedly) obvious last variation with variable p on May
3, 2005. Bruce Rawles found a typo in an instance of the table variable's name on September 27, 2005, and
he received a $10 bug bounty. On October 9, 2006, Fabrice Bellard suggested the 32-bit variations above,
which require only one table lookup; the previous version had four lookups (one per byte) and was slower.
On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Compute parity of a byte using 64-bit multiply and modulus division

unsigned char b; // byte value to compute the parity of


bool parity =
(((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;

The method above takes around 4 operations, but only works on bytes.
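The one-line expression above can be unpacked into named steps. This is only a sketch: the function name parity_byte, the intermediate variable names, and the use of the <stdint.h> and <stdbool.h> types are illustrative assumptions, not part of the original snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parity of one byte: true when an odd number of bits are set. */
bool parity_byte(uint8_t b)
{
    /* Multiply: copy b into each of the eight bytes of a 64-bit word. */
    uint64_t copies = b * 0x0101010101010101ULL;
    /* AND: keep bit k of b from copy k; the kept bits sit at
     * positions 0, 9, 18, ..., 63. */
    uint64_t picked = copies & 0x8040201008040201ULL;
    /* Modulus: 2^9 is congruent to 1 mod 0x1FF, so this adds the kept
     * bits together, giving the number of set bits in b (at most 8). */
    uint64_t count = picked % 0x1FF;
    return (count & 1) != 0;
}
```

For example, parity_byte(0x07) is true (three bits set) and parity_byte(0xA5) is false (four bits set).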

Compute parity of word with a multiply

The following method computes the parity of the 32-bit value in only 8 operations using a multiply.
unsigned int v; // 32-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x11111111U) * 0x11111111U;
return (v >> 28) & 1;

Also for 64-bits, 8 operations are still enough.


unsigned long long v; // 64-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x1111111111111111UL) * 0x1111111111111111UL;
return (v >> 60) & 1;

Andrew Shapira came up with this and sent it to me on Sept. 2, 2007.

Compute parity in parallel


unsigned int v; // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;

The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just
on bytes in 5 operations by removing the two lines immediately following "unsigned int v;". The method first
shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v.
Next, the binary number 0110 1001 1001 0110 (0x6996 in hex) is shifted to the right by the value
represented in the lowest nibble of v. This number is like a miniature 16-bit parity-table indexed by the low
four bits in v. The result has the parity of v in its least significant bit, which is masked and returned.
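As a sketch of the steps just described, the snippet can be wrapped in a function with each fold annotated. The name parity32 and the fixed-width types are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parity of a 32-bit word: true when an odd number of bits are set. */
bool parity32(uint32_t v)
{
    v ^= v >> 16;  /* fold the upper half onto the lower half */
    v ^= v >> 8;   /* low byte is now the XOR of all four bytes */
    v ^= v >> 4;   /* low nibble is now the XOR of all eight nibbles */
    v &= 0xf;      /* keep only that nibble */
    /* Bit i of 0x6996 holds the parity of the 4-bit value i. */
    return (0x6996 >> v) & 1;
}
```

Reading 0x6996 from its low bit upward gives 0, 1, 1, 0, 1, 0, 0, 1, ..., which are the parities of 0, 1, 2, 3, 4, 5, 6, 7, and so on.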

Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That
optimization shaves two operations off using only shifting and XORing to find the parity.

Swapping values with subtraction and addition


#define SWAP(a, b) ((&(a) == &(b)) || \
(((a) -= (b)), ((b) += (a)), ((a) = (b) - (a))))

This swaps the values of a and b without using a temporary variable. The initial check for a and b being the
same location in memory may be omitted when you know this can't happen. (The compiler may omit it
anyway as an optimization.) If you enable overflow exceptions, then pass unsigned values so an exception
isn't thrown. The XOR method that follows may be slightly faster on some machines. Don't use this with
floating-point numbers (unless you operate on their raw integer representations).

Sanjeev Sivasankaran suggested I add this on June 12, 2007. Vincent Lefèvre pointed out the potential for
overflow exceptions on July 9, 2008.

Swapping values with XOR


#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary
variable.

On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with
the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider defining the
macro as (((a) == (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))). On July 14, 2009, Hallvard Furuseth
suggested that on some machines, (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b))) might be faster, since the (a)
^ (b) expression is reused.

Swapping individual bits with XOR


unsigned int i, j; // positions of bit sequences to swap
unsigned int n; // number of consecutive bits in each sequence
unsigned int b; // bits to swap reside in b
unsigned int r; // bit-swapped result goes here

unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
r = b ^ ((x << i) | (x << j));

As an example of swapping ranges of bits, suppose we have b = 00101111 (expressed in binary) and we
want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive
bits starting at j = 5; the result would be r = 11100011 (binary).

This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on
individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then
the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences
overlap.
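The worked example above can be checked with a small function. This is a sketch; the name swap_bit_ranges and the precondition asserts are illustrative additions, and a 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the n bits starting at position i with the n bits starting at
 * position j. The two ranges must not overlap and must fit in 32 bits. */
uint32_t swap_bit_ranges(uint32_t b, unsigned i, unsigned j, unsigned n)
{
    assert(n > 0 && n < 32 && i + n <= 32 && j + n <= 32);
    assert(i + n <= j || j + n <= i);  /* ranges must not overlap */
    /* x has a 1 wherever the two ranges differ. */
    uint32_t x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1);
    /* Flipping those differing bits in both ranges exchanges them. */
    return b ^ ((x << i) | (x << j));
}
```

With b = 00101111 (0x2F), i = 1, j = 5 and n = 3, x is 110 and the result is 11100011 (0xE3). Applying the same swap again restores 0x2F.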

On July 14, 2009 Hallvard Furuseth suggested that I change the 1 << n to 1U << n because the value was
being assigned to an unsigned and to avoid shifting into a sign bit.

Reverse bits the obvious way


unsigned int v; // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)


{
r <<= 1;
r |= v & 1;
s--;
}
r <<= s; // shift when v's highest bits are zero

On October 15, 2004, Michael Hoisie pointed out a bug in the original version. Randal E. Bryant suggested
removing an extra operation on May 3, 2005. Behdad Esfahbod suggested a slight change that eliminated one
iteration of the loop on May 18, 2005. Then, on February 6, 2007, Liyong Zhou suggested a better version
that loops while v is not 0, so rather than iterating over all bits it stops early.

Reverse bits in word by lookup table


static const unsigned char BitReverseTable256[256] =
{
# define R2(n) n, n + 2*64, n + 1*64, n + 3*64
# define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
# define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
R6(0), R6(2), R6(1), R6(3)
};

unsigned int v; // reverse 32-bit value, 8 bits at time


unsigned int c; // c will get v reversed

// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) |
(BitReverseTable256[(v >> 8) & 0xff] << 16) |
(BitReverseTable256[(v >> 16) & 0xff] << 8) |
(BitReverseTable256[(v >> 24) & 0xff]);

// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]];
q[2] = BitReverseTable256[p[1]];
q[1] = BitReverseTable256[p[2]];
q[0] = BitReverseTable256[p[3]];

The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and
store bytes easily.

On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division):
unsigned char b; // reverse this (8-bit) byte

b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value.
The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit group
of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only
one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative
positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of
merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not
overlap, so the addition steps underlying the modulus division behave like or operations.
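The three steps described above can be separated out as follows. This is a sketch only; the function name reverse_byte_mod and the intermediate variable names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of one byte. */
uint8_t reverse_byte_mod(uint8_t b)
{
    /* Multiply: five copies of b, at bit offsets 1, 9, 17, 25 and 33. */
    uint64_t copies = b * 0x0202020202ULL;
    /* AND: keep each bit of b exactly once, at its mirrored position
     * within one of the 10-bit groups. */
    uint64_t picked = copies & 0x010884422010ULL;
    /* Modulus by 2^10 - 1: adds the 10-bit groups together. No two kept
     * bits share a position within a group, so nothing carries. */
    return (uint8_t)(picked % 1023);
}
```

For b = 0x01, the only surviving bit is bit 17; since 2^10 is congruent to 1 mod 1023, 2^17 reduces to 2^7 = 0x80, which is the reversed byte.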

This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R.
W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972.

Reverse the bits in a byte with 4 operations (64-bit multiply, no division):


unsigned char b; // reverse this byte

b = ((b * 0x80200802ULL) & 0x0884422110ULL) * 0x0101010101ULL >> 32;

The following shows the flow of the bit values with the boolean variables a, b, c, d, e, f, g, and h,
which comprise an 8-bit byte. Notice how the first multiply fans out the bit pattern to multiple copies, while
the last multiply combines them in the fifth byte from the right.
                                                                                  abcd efgh (-> hgfe dcba)
*                                                   1000 0000 0010 0000 0000 1000 0000 0010 (0x80200802)
-------------------------------------------------------------------------------------------------
                                          0abc defg h00a bcde fgh0 0abc defg h00a bcde fgh0
&                                         0000 1000 1000 0100 0100 0010 0010 0001 0001 0000 (0x0884422110)
-------------------------------------------------------------------------------------------------
                                          0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
*                                         0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 (0x0101010101)
-------------------------------------------------------------------------------------------------
                                          0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
                                0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
                      0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
            0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
  0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
-------------------------------------------------------------------------------------------------
  0000 d000 h000 dc00 hg00 dcb0 hgf0 dcba hgfe dcba hgfe 0cba 0gfe 00ba 00fe 000a 000e 0000
>> 32
-------------------------------------------------------------------------------------------------
                                          0000 d000 h000 dc00 hg00 dcb0 hgf0 dcba hgfe dcba
&                                                                                 1111 1111
-------------------------------------------------------------------------------------------------
                                                                                  hgfe dcba

Note that the last two steps can be combined on some processors because the registers can be accessed as
bytes; just multiply so that a register stores the upper 32 bits of the result and then take the low byte. Thus, it
may take only 6 operations.

Devised by Sean Anderson, July 13, 2001.

Reverse the bits in a byte with 7 operations (no 64-bit):


b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

Make sure you assign or cast the result to an unsigned char to remove garbage in the higher bits. Devised by
Sean Anderson, July 13, 2001. Typo spotted and correction supplied by Mike Keith, January 3, 2002.

Reverse an N-bit quantity in parallel in 5 * lg(N) operations:


unsigned int v; // 32-bit word to reverse bit order

// swap odd and even bits


v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16 ) | ( v << 16);

The following variation is also O(lg(N)), however it requires more operations to reverse v. Its virtue is in
taking slightly less memory by computing the constants on the fly.
unsigned int s = sizeof(v) * CHAR_BIT; // bit size; must be power of 2
unsigned int mask = ~0;
while ((s >>= 1) > 0)
{
mask ^= (mask << s);
v = ((v >> s) & mask) | ((v << s) & ~mask);
}

These methods above are best suited to situations where N is large. If you use the above with 64-bit ints (or
larger), then you need to add more lines (following the pattern); otherwise only the lower 32 bits will be
reversed and the result will be in the lower 32 bits.
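Following the pattern for 64 bits means widening every mask and adding one more swap level. A minimal sketch, assuming uint64_t is available; the function name reverse64 is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 64-bit word in lg(64) = 6 swap levels. */
uint64_t reverse64(uint64_t v)
{
    /* swap odd and even bits */
    v = ((v >> 1)  & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
    /* swap consecutive pairs */
    v = ((v >> 2)  & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
    /* swap nibbles */
    v = ((v >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
    /* swap bytes */
    v = ((v >> 8)  & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
    /* swap 2-byte pairs (this level needs masks now) */
    v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    /* swap 4-byte halves (the new last level needs no masks) */
    v = (v >> 32) | (v << 32);
    return v;
}
```

Reversal is its own inverse, so reverse64(reverse64(x)) == x makes a convenient sanity check.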

See Dr. Dobb's Journal 1983, Edwin Freed's article on Binary Magic Numbers for more information. The
second variation was suggested by Ken Raeburn on September 13, 2005. Veldmeijer mentioned that the first
version could do without ANDS in the last line on March 19, 2006.

Compute modulus division by 1 << s without a division operator

const unsigned int n; // numerator


const unsigned int s;
const unsigned int d = 1U << s; // So d will be one of: 1, 2, 4, 8, 16, 32, ...
unsigned int m; // m will be n % d
m = n & (d - 1);

Most programmers learn this trick early, but it was included for the sake of completeness.

Compute modulus division by (1 << s) - 1 without a division operator


unsigned int n; // numerator
const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...).
unsigned int m; // n % d goes here.

for (m = n; n > d; n = m)
{
for (m = 0; n; n >>= s)
{
m += n & d;
}
}
// Now m is a value from 0 to d, but since with modulus division
// we want m to be 0 when it is d.
m = m == d ? 0 : m;

This method of modulus division by an integer that is one less than a power of 2 takes at most 5 + (4 + 5 *
ceil(N / s)) * ceil(lg(N / s)) operations, where N is the number of bits in the numerator. In other words, it
takes at most O(N * lg(N)) time.
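The loop works because 1 << s is congruent to 1 modulo (1 << s) - 1, so summing the base-(1 << s) digits of n preserves the remainder. A sketch of the snippet as a function follows; the name mod_pow2_minus1 and the range check on s are assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* n % ((1 << s) - 1) without a division, for 0 < s < 32. */
uint32_t mod_pow2_minus1(uint32_t n, unsigned int s)
{
    assert(s > 0 && s < 32);
    const uint32_t d = (1U << s) - 1;
    uint32_t m;
    /* Repeat the digit sum until the value fits in a single digit. */
    for (m = n; n > d; n = m) {
        /* Sum the s-bit digits of n. */
        for (m = 0; n; n >>= s) {
            m += n & d;
        }
    }
    /* m is now in 0..d; d itself is congruent to 0. */
    return m == d ? 0 : m;
}
```

For n = 100 and s = 3 (d = 7): 100 is 144 in octal, the digits sum to 9, 9 is 11 in octal, and those digits sum to 2, which matches 100 % 7.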

Devised by Sean Anderson, August 15, 2001. Before Sean A. Irvine corrected me on June 17, 2004, I
mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end. Michael Miller
spotted a typo in the code April 25, 2005.

Compute modulus division by (1 << s) - 1 in parallel without a division operator

// The following is for a word size of 32 bits!

static const unsigned int M[] =


{
0x00000000, 0x55555555, 0x33333333, 0xc71c71c7,
0x0f0f0f0f, 0xc1f07c1f, 0x3f03f03f, 0xf01fc07f,
0x00ff00ff, 0x07fc01ff, 0x3ff003ff, 0xffc007ff,
0xff000fff, 0xfc001fff, 0xf0003fff, 0xc0007fff,
0x0000ffff, 0x0001ffff, 0x0003ffff, 0x0007ffff,
0x000fffff, 0x001fffff, 0x003fffff, 0x007fffff,
0x00ffffff, 0x01ffffff, 0x03ffffff, 0x07ffffff,
0x0fffffff, 0x1fffffff, 0x3fffffff, 0x7fffffff
};

static const unsigned int Q[][6] =


{
{ 0, 0, 0, 0, 0, 0}, {16, 8, 4, 2, 1, 1}, {16, 8, 4, 2, 2, 2},
{15, 6, 3, 3, 3, 3}, {16, 8, 4, 4, 4, 4}, {15, 5, 5, 5, 5, 5},
{12, 6, 6, 6 , 6, 6}, {14, 7, 7, 7, 7, 7}, {16, 8, 8, 8, 8, 8},
{ 9, 9, 9, 9, 9, 9}, {10, 10, 10, 10, 10, 10}, {11, 11, 11, 11, 11, 11},
{12, 12, 12, 12, 12, 12}, {13, 13, 13, 13, 13, 13}, {14, 14, 14, 14, 14, 14},
{15, 15, 15, 15, 15, 15}, {16, 16, 16, 16, 16, 16}, {17, 17, 17, 17, 17, 17},
{18, 18, 18, 18, 18, 18}, {19, 19, 19, 19, 19, 19}, {20, 20, 20, 20, 20, 20},
{21, 21, 21, 21, 21, 21}, {22, 22, 22, 22, 22, 22}, {23, 23, 23, 23, 23, 23},
{24, 24, 24, 24, 24, 24}, {25, 25, 25, 25, 25, 25}, {26, 26, 26, 26, 26, 26},
{27, 27, 27, 27, 27, 27}, {28, 28, 28, 28, 28, 28}, {29, 29, 29, 29, 29, 29},
{30, 30, 30, 30, 30, 30}, {31, 31, 31, 31, 31, 31}
};

static const unsigned int R[][6] =


{
{0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000001, 0x00000001},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000003, 0x00000003},
{0x00007fff, 0x0000003f, 0x00000007, 0x00000007, 0x00000007, 0x00000007},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x0000000f, 0x0000000f, 0x0000000f},
{0x00007fff, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f},
{0x00000fff, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f},
{0x00003fff, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f},
{0x0000ffff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff},
{0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff},
{0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff},
{0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff},
{0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff},
{0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff},
{0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff},
{0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff},
{0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff},
{0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff},
{0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff},
{0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff},
{0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff},
{0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff},
{0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff},
{0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff},
{0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff},
{0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff},
{0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff},
{0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff},
{0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff},
{0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff},
{0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff},
{0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff}
};

unsigned int n; // numerator


const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...).
unsigned int m; // n % d goes here.

m = (n & M[s]) + ((n >> s) & M[s]);

for (const unsigned int * q = &Q[s][0], * r = &R[s][0]; m > d; q++, r++)


{
m = (m >> *q) + (m & *r);
}
m = m == d ? 0 : m; // OR, less portably: m = m & -((signed)(m - d) >> s);

This method of finding modulus division by an integer that is one less than a power of 2 takes at most
O(lg(N)) time, where N is the number of bits in the numerator (32 bits, for the code above). The number of
operations is at most 12 + 9 * ceil(lg(N)). The tables may be removed if you know the denominator at
compile time; just extract the few relevant entries and unroll the loop. It may be easily extended to more bits.

It finds the result by summing the values in base (1 << s) in parallel. First every other base (1 << s) value is
added to the previous one. Imagine that the result is written on a piece of paper. Cut the paper in half, so that
half the values are on each cut piece. Align the values and sum them onto a new piece of paper. Repeat by
cutting this paper in half (which will be a quarter of the size of the previous one) and summing, until you
cannot cut further. After performing lg(N/s/2) cuts, we cut no more; just continue to add the values and put
the result onto a new piece of paper as before, while there are at least two s-bit values.

Devised by Sean Anderson, August 20, 2001. A typo was spotted by Randy E. Bryant on May 3, 2005 (after
pasting the code, I had later added "unsinged" to a variable declaration). As in the previous hack, I
mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end, and Don
Knuth corrected me on April 19, 2006 and suggested m = m & -((signed)(m - d) >> s). On June 18, 2009
Sean Irvine proposed a change that used ((n >> s) & M[s]) instead of ((n & ~M[s]) >> s), which typically
requires fewer operations because the M[s] constant is already loaded.

Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
unsigned int v; // 32-bit word to find the log base 2 of
unsigned int r = 0; // r will be lg(v)

while (v >>= 1) // unroll for more speed...


{
r++;
}

The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set,
MSB). The following log base 2 methods are faster than this one.

Find the integer log base 2 of an integer with a 64-bit IEEE float


int v; // 32-bit integer to find the log base 2 of
int r; // result of log_2(v) goes here
union { unsigned int u[2]; double d; } t; // temp

t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;

The code above loads a 64-bit (IEEE-754 floating-point) double with a 32-bit integer (with no padding bits)
by storing the integer in the mantissa while the exponent is set to 2^52. From this newly minted double, 2^52
(expressed as a double) is subtracted, which sets the resulting exponent to the log base 2 of the input value, v.
All that is left is shifting the exponent bits into position (20 bits right) and subtracting the bias, 0x3FF (which
is 1023 decimal). This technique only takes 5 operations, but many CPUs are slow at manipulating doubles,
and the endianness of the architecture must be accommodated.
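One way to sidestep the word-order check is to build the whole 64-bit pattern in a uint64_t and copy it with memcpy. This is a swapped-in variation, not the snippet above: it assumes IEEE-754 doubles and that a double and a uint64_t share the same byte order, which holds on mainstream platforms but is not guaranteed by the C standard. The name ilog2_via_double is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer log base 2 of a non-zero 32-bit value, via a double's exponent. */
int ilog2_via_double(uint32_t v)
{
    assert(v != 0);
    /* Exponent field 0x433 encodes 2^52; v fills the low mantissa bits,
     * so the double's value is exactly 2^52 + v. */
    uint64_t bits = 0x4330000000000000ULL | v;
    double d;
    memcpy(&d, &bits, sizeof d);
    d -= 4503599627370496.0;           /* subtract 2^52; d is now exactly v */
    memcpy(&bits, &d, sizeof bits);
    /* The sign bit is 0, so the top 12 bits are just the biased exponent. */
    return (int)(bits >> 52) - 0x3FF;
}
```

Because the exponent now sits in the top bits of one 64-bit word, the shift is 52 rather than the 20 used on the high 32-bit half.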

Eric Cole sent me this on January 15, 2006. Evan Felix pointed out a typo on April 4, 2006. Vincent Lefèvre
told me on July 9, 2008 to change the endian check to use the float's endian, which could differ from the
integer's endian.

Find the log base 2 of an integer with a lookup table


static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};

unsigned int v; // 32-bit word to find the log of


unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries

if (tt = v >> 16)


{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else

{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}

The lookup table method takes only about 7 operations to find the log of a 32-bit value. If extended for 64-bit
quantities, it would take roughly 9 operations. Another operation can be trimmed off by using four tables,
with the possible additions incorporated into each. Using int table elements may be faster, depending on your
architecture.

The code above is tuned to uniformly distributed output values. If your inputs are evenly distributed across
all 32-bit values, then consider using the following:
if (tt = v >> 24)
{
r = 24 + LogTable256[tt];
}
else if (tt = v >> 16)
{
r = 16 + LogTable256[tt];
}
else if (tt = v >> 8)
{
r = 8 + LogTable256[tt];
}
else
{
r = LogTable256[v];
}

To initially generate the log table algorithmically:


LogTable256[0] = LogTable256[1] = 0;
for (int i = 2; i < 256; i++)
{
LogTable256[i] = 1 + LogTable256[i / 2];
}
LogTable256[0] = -1; // if you want log(0) to return -1

Behdad Esfahbod and I shaved off a fraction of an operation (on average) on May 18, 2005. Yet another
fraction of an operation was removed on November 14, 2006 by Emanuel Hoogeveen. The variation that is
tuned to evenly distributed input values was suggested by David A. Butterfield on September 19, 2008.
Venkat Reddy told me on January 5, 2009 that log(0) should return -1 to indicate an error, so I changed the
first entry in the table to that.

Find the log base 2 of an N-bit integer in O(lg(N)) operations


unsigned int v; // 32-bit value to find the log2 of
const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
const unsigned int S[] = {1, 2, 4, 8, 16};
int i;

register unsigned int r = 0; // result of log2(v) will go here


for (i = 4; i >= 0; i--) // unroll for speed...
{
if (v & b[i])
{
v >>= S[i];
r |= S[i];
}
}

// OR (IF YOUR CPU BRANCHES SLOWLY):

unsigned int v; // 32-bit value to find the log2 of


register unsigned int r; // result of log2(v) will go here
register unsigned int shift;

r = (v > 0xFFFF) << 4; v >>= r;


shift = (v > 0xFF ) << 3; v >>= shift; r |= shift;
shift = (v > 0xF ) << 2; v >>= shift; r |= shift;
shift = (v > 0x3 ) << 1; v >>= shift; r |= shift;
r |= (v >> 1);

// OR (IF YOU KNOW v IS A POWER OF 2):

unsigned int v; // 32-bit value to find the log2 of


static const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0,
0xFF00FF00, 0xFFFF0000};
register unsigned int r = (v & b[0]) != 0;
for (i = 4; i > 0; i--) // unroll for speed...
{
r |= ((v & b[i]) != 0) << i;
}

Of course, to extend the code to find the log of a 33- to 64-bit number, we would append another element,
0xFFFFFFFF00000000, to b, append 32 to S, and loop from 5 to 0. This method is much slower than the
earlier table-lookup version, but if you don't want a big table or your architecture is slow to access memory, it's
a good choice. The second variation involves slightly more operations, but it may be faster on machines with
high branch costs (e.g. PowerPC).

The second version was sent to me by Eric Cole on January 7, 2006. Andrew Shapira subsequently trimmed
a few operations off of it and sent me his variation (above) on Sept. 1, 2007. The third variation was
suggested to me by John Owens on April 24, 2002; it's faster, but it is only suitable when the input is known
to be a power of 2. On May 25, 2003, Ken Raeburn suggested improving the general case by using smaller
numbers for b[], which load faster on some architectures (for instance if the word size is 16 bits, then only
one load instruction may be needed). These values work for the general version, but not for the special-case
version below it, where v is a power of 2; Glenn Slayden brought this oversight to my attention on December
12, 2003.
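The 64-bit extension of the first (looping) version described above, with 0xFFFFFFFF00000000 appended to b, 32 appended to S, and the loop running from 5 to 0, might look like the following sketch. The function name ilog2_u64 and the use of uint64_t are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log base 2 of a 64-bit value (returns 0 for v == 0 or v == 1). */
unsigned int ilog2_u64(uint64_t v)
{
    static const uint64_t b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000,
                                 0xFFFFFFFF00000000ULL};
    static const unsigned int S[] = {1, 2, 4, 8, 16, 32};
    unsigned int r = 0;

    for (int i = 5; i >= 0; i--) {
        if (v & b[i]) {      /* is anything set in the upper part? */
            v >>= S[i];      /* yes: discard the lower part ... */
            r |= S[i];       /* ... and record how far we shifted */
        }
    }
    return r;
}
```

For v = 1 << 40 the loop shifts by 32 and then by 8, so r ends up as 32 | 8 = 40.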

Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup
uint32_t v; // find the log base 2 of 32-bit v
int r; // result goes here

static const int MultiplyDeBruijnBitPosition[32] =


{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1; // first round down to one less than a power of 2


v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];

The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It
requires only 13 operations, compared to (up to) 20 for the previous method. The purely table-based method
requires the fewest operations, but this offers a reasonable compromise between table size and speed.
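For reference, here is the general-case code packaged as a function, with comments on what each stage does. The table and multiplier are copied from the snippet above; the function name ilog2_debruijn is an illustrative assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log base 2 of a 32-bit value using a multiply and a 32-entry table. */
int ilog2_debruijn(uint32_t v)
{
    static const int MultiplyDeBruijnBitPosition[32] = {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    /* Smear the highest set bit downward, so v becomes 2^(k+1) - 1
     * where k is the position of that bit. */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    /* Each of the 32 possible smeared values yields a distinct top
     * 5 bits after the multiply; those bits index the table. */
    return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}
```

For v = 1000 the smear gives 1023, the multiply leaves 1 in the top five bits, and entry 1 of the table is 9.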

If you know that v is a power of 2, then you only need the following:

static const int MultiplyDeBruijnBitPosition2[32] =


{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];

Eric Cole devised this January 8, 2006 after reading about the entry below to round up to a power of 2 and
the method below for computing the number of trailing bits with a multiply and lookup using a DeBruijn
sequence. On December 10, 2009, Mark Dickinson shaved off a couple operations by requiring v be rounded
up to one less than the next power of 2 rather than the power of 2.

Find integer log base 10 of an integer


unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here
int t; // temporary

static unsigned int const PowersOf10[] =


{1, 10, 100, 1000, 10000, 100000,
1000000, 10000000, 100000000, 1000000000};

t = (IntegerLogBase2(v) + 1) * 1233 >> 12; // (use a lg2 method from above)


r = t - (v < PowersOf10[t]);

The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By
the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately
1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2
rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is
found by subtracting the result of v < PowersOf10[t].
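A self-contained sketch of this computation is shown below. IntegerLogBase2 is left unspecified in the snippet above, so the simple shift loop stands in for it here as an assumption; any of the log base 2 methods could be substituted. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for IntegerLogBase2: the obvious shift loop. */
int ilog2_u32(uint32_t v)
{
    int r = 0;
    while (v >>= 1) {
        r++;
    }
    return r;
}

/* Integer log base 10 of a non-zero 32-bit value. */
int ilog10_u32(uint32_t v)
{
    static const uint32_t PowersOf10[] = {
        1, 10, 100, 1000, 10000, 100000,
        1000000, 10000000, 100000000, 1000000000
    };
    assert(v != 0);
    /* 1233/4096 approximates 1/log2(10); t is either the answer or one too big. */
    int t = (ilog2_u32(v) + 1) * 1233 >> 12;
    /* Correct the estimate by comparing against the exact power of 10. */
    return t - (v < PowersOf10[t]);
}
```

For v = 99, the log base 2 is 6, so t = 7 * 1233 >> 12 = 2; since 99 < 100 the correction subtracts one and the result is 1.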

This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast
memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is
computed for t (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to
find the log base 10, assuming 4 tables were used (one for each byte of v).

Eric Cole suggested I add a version of this on January 7, 2006.

Find integer log base 10 of an integer the obvious way


unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here

r = (v >= 1000000000) ? 9 : (v >= 100000000) ? 8 : (v >= 10000000) ? 7 :


(v >= 1000000) ? 6 : (v >= 100000) ? 5 : (v >= 10000) ? 4 :
(v >= 1000) ? 3 : (v >= 100) ? 2 : (v >= 10) ? 1 : 0;

This method works well when the input is uniformly distributed over 32-bit values because 76% of the inputs
are caught by the first compare, 21% are caught by the second compare, 2% are caught by the third, and so
on (chopping the remaining down by 90% with each comparison). As a result, less than 2.6 operations are
needed on average.

On April 18, 2007, Emanuel Hoogeveen suggested a variation on this where the conditions used divisions,
which were not as fast as simple comparisons.

Find integer log base 2 of a 32-bit IEEE float


const float v; // find int(log2(v)), where v > 0.0 && finite(v) && isnormal(v)
int c; // 32-bit int c gets the result;

c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = (c >> 23) - 127;

The above is fast, but IEEE 754-compliant architectures utilize subnormal (also called denormal) floating
point numbers. These have the exponent bits set to zero (signifying pow(2,-127)), and the mantissa is not
normalized, so it contains leading zeros and thus the log2 must be computed from the mantissa. To
accommodate subnormal numbers, use the following:
const float v; // find int(log2(v)), where v > 0.0 && finite(v)
int c; // 32-bit int c gets the result;
int x = *(const int *) &v; // OR, for portability: memcpy(&x, &v, sizeof x);

c = x >> 23;

if (c)
{
c -= 127;
}
else
{ // subnormal, so recompute using mantissa: c = intlog2(x) - 149;
register unsigned int t; // temporary
// Note that LogTable256 was defined earlier
if (t = x >> 16)
{
c = LogTable256[t] - 133;
}
else
{
c = (t = x >> 8) ? LogTable256[t] - 141 : LogTable256[x] - 149;
}
}

On June 20, 2004, Sean A. Irvine suggested that I include code to handle subnormal numbers. On June 11,
2005, Falk Hüffner pointed out that ISO C99 6.5/7 specified undefined behavior for the common type
punning idiom *(int *)&, though it has worked on 99.9% of C compilers. He proposed using memcpy for
maximum portability or a union with a float and an int for better code generation than memcpy on some
compilers.
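The memcpy approach, applied to the simple normal-number case, can be sketched as follows. The function name ilog2_float and the use of an unsigned 32-bit temporary are assumptions; the sketch also assumes float is the 32-bit IEEE-754 format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* int(log2(v)) for a positive, finite, normal float. */
int ilog2_float(float v)
{
    uint32_t bits;
    assert(sizeof bits == sizeof v);
    assert(v > 0.0f);
    memcpy(&bits, &v, sizeof bits);  /* copy the representation; no pointer cast */
    /* The sign bit is 0 for positive v, so bits >> 23 is the biased exponent. */
    return (int)(bits >> 23) - 127;
}
```

Compilers commonly turn a fixed-size memcpy like this into a single register move, so the portable form often costs nothing.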

Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)
const int r;
const float v; // find int(log2(pow((double) v, 1. / pow(2, r)))),
// where isnormal(v) and v > 0
int c; // 32-bit int c gets the result;

c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;

So, if r is 0, for example, we have c = int(log2((double) v)). If r is 1, then we have c = int(log2(sqrt((double)
v))). If r is 2, then we have c = int(log2(pow((double) v, 1./4))).
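These cases can be checked with a small function. This sketch restricts v to values of at least 1.0 so that the subtraction never goes negative and the right shift stays well defined in portable C; the original expression has no such restriction but relies on an arithmetic right shift for v below 1. The name ilog2_root_float is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* int(log2(pow(v, 1 / pow(2, r)))) for a finite float v >= 1. */
int ilog2_root_float(float v, int r)
{
    int32_t c;
    assert(sizeof c == sizeof v);
    assert(v >= 1.0f && r >= 0 && r < 31);
    memcpy(&c, &v, sizeof c);
    /* 0x3f800000 is the bit pattern of 1.0f. Removing it leaves the
     * exponent (and mantissa) relative to 1.0; shifting right by r
     * divides that exponent by 2^r; adding it back restores the bias. */
    return ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;
}
```

For v = 16.0f the results are 4 for r = 0, 2 for r = 1 (the square root is 4), and 1 for r = 2 (the fourth root is 2).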

On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 left the type punning idiom *(int *)&
undefined, and he suggested using memcpy.

Count the consecutive zero bits (trailing) on the right linearly


unsigned int v;  // input to count trailing zero bits
int c;           // output: c will count v's trailing zero bits,
                 // so if v is 1101000 (base 2), then c will be 3
if (v)
{
  v = (v ^ (v - 1)) >> 1;  // Set v's trailing 0s to 1s and zero rest
  for (c = 0; v; c++)
  {
    v >>= 1;
  }
}
else
{
  c = CHAR_BIT * sizeof(v);
}

The average number of trailing zero bits in a (uniformly distributed) random binary number is one, so this
O(trailing zeros) solution isn't that bad compared to the faster methods below.

Jim Cole suggested I add a linear-time method for counting the trailing zeros on August 15, 2007. On
October 22, 2007, Jason Cunningham pointed out that I had neglected to paste the unsigned modifier for v.
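
As a rough check on the "average of one" remark: a random word has exactly k trailing zeros with probability about 1/2^(k+1), and the sum of k/2^(k+1) over k >= 1 is 1. The linear method itself can be wrapped as below; the function name ctz_linear is an assumption made for this sketch.

```c
#include <limits.h>

/* Hypothetical wrapper around the linear trailing-zero count. */
static int ctz_linear(unsigned int v)
{
    int c;
    if (v) {
        v = (v ^ (v - 1)) >> 1;   /* trailing 0s become 1s, all other bits 0 */
        for (c = 0; v; c++)
            v >>= 1;              /* one iteration per trailing zero */
    } else {
        c = (int)(CHAR_BIT * sizeof v);   /* every bit of a zero word is a trailing zero */
    }
    return c;
}
```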

Count the consecutive zero bits (trailing) on the right in parallel


unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= 0U - v; // isolate the lowest set bit; same effect as -signed(v), but also valid in plain C
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

Here, we are basically doing the same operations as finding the log base 2 in parallel, but we first isolate the
lowest 1 bit, and then proceed with c starting at the maximum and decreasing. The number of operations is at
most 3 * lg(N) + 4, roughly, for N bit words.

Bill Burdick suggested an optimization, reducing the time from 4 * lg(N) on February 4, 2011.
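
The parallel method can be sketched as a function for 32-bit words. The name ctz_parallel is hypothetical, and the lowest bit is isolated with 0u - v, which is well defined for unsigned values.

```c
#include <stdint.h>

/* Hypothetical wrapper: trailing zero count of a 32-bit word, 32 for v == 0. */
static unsigned int ctz_parallel(uint32_t v)
{
    unsigned int c = 32;
    v &= 0u - v;                      /* keep only the lowest set bit */
    if (v) c--;
    if (v & 0x0000FFFFu) c -= 16;     /* bit lies in the low half-word */
    if (v & 0x00FF00FFu) c -= 8;      /* bit lies in a low byte of its half-word */
    if (v & 0x0F0F0F0Fu) c -= 4;
    if (v & 0x33333333u) c -= 2;
    if (v & 0x55555555u) c -= 1;
    return c;
}
```

Each mask answers one bit of the position of the isolated bit, so the five tests together subtract 31 minus that position from the starting value.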

Count the consecutive zero bits (trailing) on the right by binary search
unsigned int v;     // 32-bit word input to count zero bits on right
unsigned int c;     // c will be the number of zero bits on the right,
                    // so if v is 1101000 (base 2), then c will be 3
// NOTE: if 0 == v, then c = 31.
if (v & 0x1)
{
  // special case for odd v (assumed to happen half of the time)
  c = 0;
}
else
{
  c = 1;
  if ((v & 0xffff) == 0)
  {
    v >>= 16;
    c += 16;
  }
  if ((v & 0xff) == 0)
  {
    v >>= 8;
    c += 8;
  }
  if ((v & 0xf) == 0)
  {
    v >>= 4;
    c += 4;
  }
  if ((v & 0x3) == 0)
  {
    v >>= 2;
    c += 2;
  }
  c -= v & 0x1;
}

The code above is similar to the previous method, but it computes the number of trailing zeros by
accumulating c in a manner akin to binary search. In the first step, it checks if the bottom 16 bits of v are
zeros, and if so, shifts v right 16 bits and adds 16 to c, which reduces the number of bits in v to consider by
half. Each of the subsequent conditional steps likewise halves the number of bits until there is only 1. This
method is faster than the last one (by about 33%) because the bodies of the if statements are executed less
often.

Matt Whitlock suggested this on January 25, 2006. Andrew Shapira shaved a couple operations off on Sept.
5, 2007 (by setting c=1 and unconditionally subtracting at the end).
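
A function form of the binary search might look like the sketch below. The name ctz_binsearch is an assumption; as the NOTE in the snippet says, a zero input yields 31 rather than 32.

```c
#include <stdint.h>

/* Hypothetical wrapper: trailing zero count by binary search (returns 31 for v == 0). */
static unsigned int ctz_binsearch(uint32_t v)
{
    unsigned int c;
    if (v & 0x1u)
        return 0;                                  /* odd: no trailing zeros */
    c = 1;                                         /* start at 1, corrected at the end */
    if ((v & 0xffffu) == 0) { v >>= 16; c += 16; }
    if ((v & 0xffu)   == 0) { v >>= 8;  c += 8;  }
    if ((v & 0xfu)    == 0) { v >>= 4;  c += 4;  }
    if ((v & 0x3u)    == 0) { v >>= 2;  c += 2;  }
    c -= v & 0x1u;                                 /* undo the initial 1 if bit 0 is now set */
    return c;
}
```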

Count the consecutive zero bits (trailing) on the right by casting to a float
unsigned int v; // find the number of trailing zeros in v
int r; // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;

Although this only takes about 6 operations, the time to convert an integer to a float can be high on some
machines. The exponent of the 32-bit IEEE floating point representation is shifted down, and the bias is
subtracted to give the position of the least significant 1 bit set in v. If v is zero, then the result is -127.
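
The float-cast method can be sketched with memcpy in place of the pointer cast. The name ctz_float is hypothetical and a 32-bit IEEE 754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: trailing zero count via the float exponent (-127 for v == 0). */
static int ctz_float(uint32_t v)
{
    /* The isolated low bit is a power of two, so the conversion to float is exact. */
    float f = (float)(v & (0u - v));
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 23) - 0x7f;   /* biased exponent minus the bias of 127 */
}
```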

Count the consecutive zero bits (trailing) on the right with modulus division and lookup
unsigned int v; // find the number of trailing zeros in v
int r; // put the result in r
static const int Mod37BitPosition[] = // map a bit value mod 37 to its position
{
32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
20, 8, 19, 18
};
r = Mod37BitPosition[(-v & v) % 37];

The code above finds the number of zeros that are trailing on the right, so binary 0100 would produce 2. It
makes use of the fact that the first 32 bit position values are relatively prime with 37 and leave distinct
remainders when divided by it, so performing a modulus division with 37 gives a unique number from 0 to 36
for each. These numbers may then be mapped to the number of zeros using a small lookup table. It uses only
4 operations; however, indexing into a table and performing modulus division may make it unsuitable for
some situations. I came up with this independently and then searched for a subsequence of the table values,
and found it was invented earlier by Reiser, according to Hacker's Delight.
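
Wrapped as a function, the lookup might read as follows. The name ctz_mod37 is an assumption; the table is the one given above.

```c
#include <stdint.h>

/* Hypothetical wrapper: trailing zero count via (lowest bit) mod 37, 32 for v == 0. */
static int ctz_mod37(uint32_t v)
{
    static const int Mod37BitPosition[] =   /* map a bit value mod 37 to its position */
    {
        32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
        7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
        20, 8, 19, 18
    };
    return Mod37BitPosition[(v & (0u - v)) % 37];
}
```

For instance, 2^20 leaves remainder 33 when divided by 37, and entry 33 of the table is 20. The five 0 entries at indices 7, 14, 19 and 28 (and the 0 at index 1, which is a real answer) show that only 33 of the 37 slots are ever reached.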

Count the consecutive zero bits (trailing) on the right with multiply and lookup
unsigned int v; // find the number of trailing zeros in 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];

Converting bit vectors to indices of set bits is an example use for this. It requires one more operation than the
earlier one involving modulus division, but the multiply may be faster. The expression (v & -v) extracts the
least significant 1 bit from v. The constant 0x077CB531UL is a de Bruijn sequence, which produces a unique
pattern of bits into the high 5 bits for each possible bit position that it is multiplied against. When there are
no bits set, it returns 0. More information can be found by reading the paper Using de Bruijn Sequences to
Index 1 in a Computer Word by Charles E. Leiserson, Harald Prokop, and Keith H. Randall.

On October 8, 2005 Andrew Shapira suggested I add this. Dustin Spicuzza asked me on April 14, 2009 to
cast the result of the multiply to a 32-bit type so it would work when compiled with 64-bit ints.
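
A function form of the de Bruijn lookup is sketched below. The name ctz_debruijn is hypothetical; the table and constant are the ones shown above.

```c
#include <stdint.h>

/* Hypothetical wrapper: trailing zero count via de Bruijn multiply (0 for v == 0). */
static int ctz_debruijn(uint32_t v)
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    /* Multiplying by the isolated bit 2^k shifts the constant left by k,
       so the top 5 bits of the product are a distinct window for each k. */
    return MultiplyDeBruijnBitPosition[(uint32_t)((v & (0u - v)) * 0x077CB531u) >> 27];
}
```

Note that 0 and 1 both map to 0, so a caller that needs to tell them apart has to test for zero separately.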

Round up to the next highest power of 2 by float casting


unsigned int const v; // Round this 32-bit value to the next highest power of 2
unsigned int r; // Put the result here. (So v=3 -> r=4; v=8 -> r=8)

if (v > 1)
{
  float f = (float)v;
  unsigned int const t = 1U << ((*(unsigned int *)&f >> 23) - 0x7f);
  r = t << (t < v);
}
else
{
  r = 1;
}

The code above uses 8 operations, but works on all v <= (1<<31).

Quick and dirty version, for domain of 1 < v < (1<<25):

float f = (float)(v - 1);
r = 1U << ((*(unsigned int*)(&f) >> 23) - 126);

Although the quick and dirty version only uses around 6 operations, it is roughly three times slower than the
technique below (which involves 12 operations) when benchmarked on an Athlon™ XP 2100+ CPU. Some
CPUs will fare better with it, though.

On September 27, 2005 Andi Smithers suggested I include a technique for casting to floats to find the lg of a
number for rounding up to a power of 2. Similar to the quick and dirty version here, his version worked with
values less than (1<<25), due to mantissa rounding, but it used one more operation.
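
The full-range version can be sketched with memcpy instead of the pointer cast. The function name round_up_pow2_float is an assumption, a 32-bit IEEE 754 float with round-to-nearest conversion is assumed, and inputs above 1<<31 are outside the stated domain.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: smallest power of 2 >= v, for v <= (1 << 31). */
static uint32_t round_up_pow2_float(uint32_t v)
{
    if (v > 1) {
        float f = (float)v;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t t = 1u << ((bits >> 23) - 0x7f);   /* power of 2 taken from the exponent */
        return t << (t < v);                        /* double it if it fell short of v */
    }
    return 1;
}
```

The (t < v) correction matters because the conversion may round: 17 becomes t = 16 and is doubled to 32, while 0x7FFFFFFF rounds up to 2^31 during the conversion and needs no doubling.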

Round up to the next highest power of 2


unsigned int v; // compute the next highest power of 2 of 32-bit v

v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;

In 12 operations, this code computes the next highest power of 2 for a 32-bit integer. The result may be
expressed by the formula 1U << (lg(v - 1) + 1). Note that in the edge case where v is 0, it returns 0, which
isn't a power of 2; you might append the expression v += (v == 0) to remedy this if it matters. It would be
faster by 2 operations to use the formula and the log base 2 method that uses a lookup table, but in some
situations, lookup tables are not suitable, so the above code may be best. (On an Athlon™ XP 2100+ I've
found the above shift-right and then OR code is as fast as using a single BSR assembly language instruction,
which scans in reverse to find the highest set bit.) It works by copying the highest set bit to all of the lower
bits, and then adding one, which results in carries that set all of the lower bits to 0 and one bit beyond the
highest set bit to 1. If the original number was a power of 2, then the decrement will reduce it to one less, so
that we round up to the same original value.

You might alternatively compute the next higher power of 2 in only 8 or 9 operations using a lookup table for
floor(lg(v)) and then evaluating 1<<(1+floor(lg(v))); Atul Divekar suggested I mention this on September 5,
2010.

Devised by Sean Anderson, September 14, 2001. Pete Hart pointed me to a couple newsgroup posts by him
and William Lewis in February of 1997, where they arrive at the same algorithm.
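
As a function, the bit-smearing version might be written as below. The name next_pow2 is hypothetical, and the zero fix-up is applied before the decrement here (one possible placement, chosen for this sketch so that 0 maps to 1 while overflowing inputs still wrap to 0).

```c
#include <stdint.h>

/* Hypothetical wrapper: smallest power of 2 >= v for 32-bit v; 0 is treated as 1. */
static uint32_t next_pow2(uint32_t v)
{
    v += (v == 0);   /* assumption of this sketch: map 0 to 1 up front */
    v--;             /* so that exact powers of 2 map to themselves */
    v |= v >> 1;     /* copy the highest set bit into every lower position */
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;             /* carry ripples up to one bit above the old highest bit */
    return v;
}
```

For v = 9: the decrement gives 1000 (binary), the ORs fill it to 1111, and the increment yields 10000, that is 16.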

Interleave bits the obvious way


unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z = 0; // z gets the resulting Morton Number.

for (int i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
{
  z |= (x & 1U << i) << i | (y & 1U << i) << (i + 1);
}

Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so x and y are
combined into a single number that can be compared easily and has the property that a number is usually
close to another if their x and y values are close.
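
A small function makes the bit layout concrete. The name interleave_obvious is an assumption of this sketch; 16-bit inputs and a 32-bit result are used, as in the snippet above.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical wrapper: bit i of x goes to bit 2i of z, bit i of y to bit 2i+1. */
static uint32_t interleave_obvious(uint16_t x, uint16_t y)
{
    uint32_t z = 0;
    for (unsigned int i = 0; i < sizeof(x) * CHAR_BIT; i++)
        z |= ((uint32_t)x & (1u << i)) << i | ((uint32_t)y & (1u << i)) << (i + 1);
    return z;
}
```

With x = 5 (binary 101) and y = 3 (binary 011), the bits of x land at positions 0 and 4 and the bits of y at positions 1 and 3, giving 0x1B.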

Interleave bits by table lookup


static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z;   // z gets the resulting 32-bit Morton Number.

z = MortonTable256[y >> 8]   << 17 |
    MortonTable256[x >> 8]   << 16 |
    MortonTable256[y & 0xFF] <<  1 |
    MortonTable256[x & 0xFF];

For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left.
This second table could then be used for the y lookups, thus reducing the operations by two, but almost
doubling the memory required. Extending this same idea, four tables could be used, with two of them
pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.
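
To see where the table values come from, the sketch below builds an equivalent table at run time and then performs the four lookups. The names morton_table, morton_table_init and interleave_table are hypothetical; the casts to uint32_t are an addition of this sketch so that the shift by 17 is done in an unsigned type.

```c
#include <stdint.h>

static uint16_t morton_table[256];   /* hypothetical stand-in for MortonTable256 */

/* Fill entry b with the bits of b spread out to the even positions. */
static void morton_table_init(void)
{
    for (unsigned int b = 0; b < 256; b++) {
        uint16_t spread = 0;
        for (unsigned int i = 0; i < 8; i++)
            spread |= (uint16_t)(((b >> i) & 1u) << (2 * i));
        morton_table[b] = spread;
    }
}

/* Interleave two 16-bit values with four table lookups. */
static uint32_t interleave_table(uint16_t x, uint16_t y)
{
    return (uint32_t)morton_table[y >> 8]   << 17 |
           (uint32_t)morton_table[x >> 8]   << 16 |
           (uint32_t)morton_table[y & 0xFF] <<  1 |
           (uint32_t)morton_table[x & 0xFF];
}
```

morton_table_init() must be called once before interleave_table() is used; entry 0xFF then holds 0x5555, matching the last value of the static table above.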

Interleave bits with 64-bit multiply


In 11 operations, this version interleaves bits of two bytes (rather than shorts, as in the other versions), but
many of the operations are 64-bit multiplies so it isn't appropriate for all machines. The input parameters, x
and y, should be less than 256.
unsigned char x; // Interleave bits of (8-bit) x and y, so that all of the
unsigned char y; // bits of x are in the even positions and y in the odd;
unsigned short z; // z gets the resulting 16-bit Morton Number.

z = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) *
     0x0102040810204081ULL >> 49) & 0x5555 |
    ((y * 0x0101010101010101ULL & 0x8040201008040201ULL) *
     0x0102040810204081ULL >> 48) & 0xAAAA;

Holger Bettag was inspired to suggest this technique on October 10, 2004 after reading the multiply-based
bit reversals here.
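
The multiply trick can be read in three steps, sketched below with a hypothetical function name interleave_bytes. The first multiply copies the byte into all eight byte lanes, the mask keeps bit k only in lane k (bit position 9k), and the second multiply adds shifted copies so that bit k also appears at position 49 + 2k, where the final shift and mask pick it up.

```c
#include <stdint.h>

/* Hypothetical wrapper: interleave two bytes into a 16-bit Morton number. */
static uint16_t interleave_bytes(uint8_t x, uint8_t y)
{
    uint64_t ex = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) *
                   0x0102040810204081ULL >> 49) & 0x5555;   /* x in even positions */
    uint64_t ey = ((y * 0x0101010101010101ULL & 0x8040201008040201ULL) *
                   0x0102040810204081ULL >> 48) & 0xAAAA;   /* y in odd positions */
    return (uint16_t)(ex | ey);
}
```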

Interleave bits by Binary Magic Numbers


static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
                // x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];

y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];

z = x | (y << 1);
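
The four steps spread a 16-bit value so that a zero sits between each pair of neighbouring bits. A sketch with the constants written inline follows; the helper names spread16 and interleave_magic are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: move bit i of a 16-bit value to bit 2i. */
static uint32_t spread16(uint32_t x)
{
    x = (x | (x << 8)) & 0x00FF00FFu;   /* split into two bytes, 8 bits apart */
    x = (x | (x << 4)) & 0x0F0F0F0Fu;   /* split each byte into nibbles */
    x = (x | (x << 2)) & 0x33333333u;   /* split each nibble into bit pairs */
    x = (x | (x << 1)) & 0x55555555u;   /* split each pair into single bits */
    return x;
}

/* Interleave the low 16 bits of x (even positions) and y (odd positions). */
static uint32_t interleave_magic(uint32_t x, uint32_t y)
{
    return spread16(x) | (spread16(y) << 1);
}
```

Tracing x = 0xFFFF through spread16 gives 0x00FF00FF, 0x0F0F0F0F, 0x33333333 and finally 0x55555555.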

Determine if a word has a zero byte


// Fewer operations:
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

The code above may be useful when doing a fast string copy in which a word is copied at a time; it uses 5
operations. On the other hand, testing for a null byte in the obvious ways (which follow) has at least 7
operations (when counted in the most sparing way), and at most 12.
// More operations:
bool hasNoZeroByte = ((v & 0xff) && (v & 0xff00) && (v & 0xff0000) && (v & 0xff000000));
// OR:
unsigned char * p = (unsigned char *) &v;
bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);

The code at the beginning of this section (labeled "Fewer operations") works by first zeroing the high bits of
the 4 bytes in the word. Subsequently, it adds a number that will result in an overflow to the high bit of a byte
if any of the low bits were initially set. Next the high bits of the original word are ORed with these values;
thus, the high bit of a byte is set iff any bit in the byte was set. Finally, we determine if any of these high bits
are zero by ORing with ones everywhere except the high bits and inverting the result. Extending to 64 bits is
trivial; simply increase the constants to be 0x7F7F7F7F7F7F7F7F.

For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine
if the word may have a zero byte. The test also returns true if the high byte is 0x80, so there are occasional
false positives, but the slower and more reliable version above may then be used on candidates for an overall
increase in speed with correct output.

bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
if (hasZeroByte) // or may just have 0x80 in the high byte
{
  hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
}

There is yet a faster method: use hasless(v, 1), which is defined below; it works in 4 operations and
requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

The subexpression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding
byte in v is zero or greater than 0x80. The subexpression ~v & 0x80808080UL evaluates to high bits set in
bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing
these two subexpressions the result is the high bits set where the bytes in v were zero, since the high bits set
due to a value greater than 0x80 in the first subexpression are masked off by the second.

Paul Messmer suggested the fast pretest improvement on October 2, 2004. Juha Järvi later suggested
hasless(v, 1) on April 6, 2005, which he found on Paul Hsieh's Assembly Lab; previously it was written in
a newsgroup post on April 27, 1987 by Alan Mycroft.
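
Both tests can be written as functions on a 32-bit word so they are easy to compare. The names has_zero_byte_exact and has_zero_byte_fast are assumptions of this sketch, and each returns 0 or 1 rather than the raw flag word.

```c
#include <stdint.h>

/* Hypothetical wrapper for the "Fewer operations" expression. */
static int has_zero_byte_exact(uint32_t v)
{
    return ~((((v & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | v) | 0x7F7F7F7Fu) != 0;
}

/* Hypothetical wrapper for the haszero macro. */
static int has_zero_byte_fast(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

One caveat worth knowing when reusing the raw value of haszero: a borrow out of a zero byte can also flag a 0x01 byte directly above it (0x01010100 yields 0x80808080), so the result is reliable as a yes/no answer but not as a per-byte map.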

Determine if a word has a byte equal to n


We may want to know if any byte in a word has a specific value. To do so, we can XOR the value to test with
a word that has been filled with the byte values in which we're interested. Because XORing a value with
itself results in a zero byte and nonzero otherwise, we can pass the result to haszero.
#define hasvalue(x,n) \
(haszero((x) ^ (~0UL/255 * (n))))

Stephen M Bennet suggested this on December 13, 2009 after reading the entry for haszero.
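
For a 32-bit word, the expression ~0UL/255 * (n) amounts to 0x01010101 * n, that is, n copied into every byte. A sketch under that assumption, with a hypothetical function name has_value:

```c
#include <stdint.h>

/* Hypothetical wrapper: 1 if any byte of x equals n, else 0 (32-bit words). */
static int has_value(uint32_t x, uint8_t n)
{
    uint32_t v = x ^ (0x01010101u * n);                  /* matching bytes become zero */
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;  /* the haszero test */
}
```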

Determine if a word has a byte less than n

Test if a word x contains an unsigned byte with value < n. Specifically for n=1, it can be used to find a 0-byte
by examining one long at a time, or any byte by XORing x with a mask first. Uses 4 arithmetic/logical
operations when n is constant.

Requirements: x>=0; 0<=n<=128

#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)

To count the number of bytes in x that are less than n in 7 operations, use
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)

Juha Järvi sent this clever technique to me on April 6, 2005. The countless macro was added by Sean
Anderson on April 10, 2005, inspired by Juha's countmore, below.
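
Specialised to 32-bit words, the two macros can be sketched as functions. The names has_less and count_less are assumptions, and 0x01010101u stands in for ~0UL/255.

```c
#include <stdint.h>

/* Hypothetical wrapper for hasless: 1 if some byte of x is < n (0 <= n <= 128). */
static int has_less(uint32_t x, unsigned int n)
{
    return ((x - 0x01010101u * n) & ~x & 0x80808080u) != 0;
}

/* Hypothetical wrapper for countless: number of bytes of x that are < n. */
static unsigned int count_less(uint32_t x, unsigned int n)
{
    /* Per byte, (127 + n) - (low 7 bits) has its high bit set exactly when
       the low 7 bits are < n; & ~x drops bytes that are >= 128. */
    uint32_t flags = (0x01010101u * (127 + n) - (x & 0x7F7F7F7Fu)) & ~x & 0x80808080u;
    return flags / 128 % 255;   /* /128 moves each flag to bit 0 of its byte; %255 sums the bytes */
}
```

For x = 0x10203040 and n = 0x21, the flags word is 0x80800000, which counts to 2 (the bytes 0x10 and 0x20).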

Determine if a word has a byte greater than n


Test if a word x contains an unsigned byte with value > n. Uses 3 arithmetic/logical operations when n is
constant.

Requirements: x>=0; 0<=n<=127

#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)

To count the number of bytes in x that are more than n in 6 operations, use:
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)

The macro hasmore was suggested by Juha Järvi on April 6, 2005, and he added countmore on April 8, 2005.
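
The same two macros for 32-bit words, as a sketch with hypothetical names has_more and count_more:

```c
#include <stdint.h>

/* Hypothetical wrapper for hasmore: 1 if some byte of x is > n (0 <= n <= 127). */
static int has_more(uint32_t x, unsigned int n)
{
    return (((x + 0x01010101u * (127 - n)) | x) & 0x80808080u) != 0;
}

/* Hypothetical wrapper for countmore: number of bytes of x that are > n. */
static unsigned int count_more(uint32_t x, unsigned int n)
{
    /* Masking to the low 7 bits first keeps carries from crossing byte
       boundaries; ORing x back in flags bytes that are >= 128. */
    uint32_t flags = (((x & 0x7F7F7F7Fu) + 0x01010101u * (127 - n)) | x) & 0x80808080u;
    return flags / 128 % 255;
}
```

The masking step is what separates the two: has_more tolerates a carry into a neighbouring byte because it only needs a yes/no answer, while count_more needs each flag to be exact.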

Determine if a word has a byte between m and n

When m < n, this technique tests if a word x contains an unsigned byte value, such that m < value < n. It uses
7 arithmetic/logical operations when n and m are constant.

Note: Bytes that equal n can be reported by likelyhasbetween as false positives, so this should be checked by
character if a certain result is needed.

Requirements: x>=0; 0<=m<=127; 0<=n<=128

#define likelyhasbetween(x,m,n) \
((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)

This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for
constant m and n) but provides the exact answer is:
#define hasbetween(x,m,n) \
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)

To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)

Juha Järvi suggested likelyhasbetween on April 6, 2005. From there, Sean Anderson created hasbetween and
countbetween on April 10, 2005.
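
The exact test combines the "less than n" and "greater than m" flags from the two previous sections. A 32-bit sketch follows; the names between_flags and count_between are assumptions.

```c
#include <stdint.h>

/* Hypothetical wrapper for hasbetween: high bit set in each byte with m < byte < n. */
static uint32_t between_flags(uint32_t x, unsigned int m, unsigned int n)
{
    uint32_t low7 = x & 0x7F7F7F7Fu;
    return (0x01010101u * (127 + n) - low7)     /* high bit set where low7 < n */
         & ~x                                   /* byte must be below 128 */
         & (low7 + 0x01010101u * (127 - m))     /* high bit set where low7 > m */
         & 0x80808080u;
}

/* Hypothetical wrapper for countbetween. */
static unsigned int count_between(uint32_t x, unsigned int m, unsigned int n)
{
    return between_flags(x, m, n) / 128 % 255;
}
```

For x = 0x10203040 with m = 0x20 and n = 0x40, only the byte 0x30 lies strictly between the bounds, so the flags word is 0x00008000 and the count is 1.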

Compute the lexicographically next bit permutation


Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a
lexicographical sense. For example, if N is 3 and the bit pattern is 00010011, the next patterns would be
00010101, 00010110, 00011001, 00011010, 00011100, 00100011, and so forth. The following is a fast way to
compute the next permutation.
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits

unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));

The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are
using Microsoft compilers for x86, the intrinsic is _BitScanForward. These both emit a bsf instruction, but
equivalents may be available for other architectures. If not, then consider using one of the methods for
counting the consecutive zero bits mentioned earlier.

Here is another version that tends to be slower because of its division operator, but it does not require
counting the trailing zeros.
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);

Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.
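
The division-based version needs no compiler intrinsic, so it is easy to try out. In the sketch below the function name next_bit_permutation is hypothetical, v must be nonzero (otherwise the division is by zero), and 0u - t replaces -t to keep the arithmetic visibly unsigned.

```c
/* Hypothetical wrapper: next larger value with the same number of 1 bits (v != 0). */
static unsigned int next_bit_permutation(unsigned int v)
{
    unsigned int t = (v | (v - 1)) + 1;   /* clear the lowest run of 1s, set the bit above it */
    /* (t & -t) / (v & -v) is 2^(length of that run); halving it and
       subtracting 1 gives the remaining 1 bits, packed at the bottom. */
    return t | ((((t & (0u - t)) / (v & (0u - v))) >> 1) - 1);
}
```

Starting from 00010011 (0x13), repeated calls yield 0x15, 0x16, 0x19, 0x1A, 0x1C and 0x23, which is the sequence listed at the start of this section.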

A Belorussian translation (provided by Webhostingrating) is available.
