Linear Time Construction of Suffix Tree: Presented by Dr. Shazzad Hosain Asst. Prof. EECS, NSU
Suffix tree
S = xabxac
1: xabxac
2: abxac
3: bxac
4: xac
5: ac
6: c
Suffix tree
S = xabxa
1: xabxa
2: abxa
3: bxa
4: xa
5: a
[Figure: tree for xabxa]
Suffix tree (Example)
Let s = abab. A suffix tree of s contains all the suffixes of s$ = abab$:
{ $, b$, ab$, bab$, abab$ }
[Figure: suffix tree of abab$]
Trivial algorithm to build a Suffix tree
s = abab$
1. Put the longest suffix, abab$, in the tree.
2. Put the suffix bab$ in the tree.
[Figure: the tree after inserting abab$ and bab$]
We will also label each leaf with the starting position of the corresponding suffix.
[Figure: suffix tree of abab$ with leaves labeled 1 (abab$), 2 (bab$), 3 (ab$), 4 (b$), 5 ($)]
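The trivial algorithm above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the names Node, build_naive_suffix_tree and leaves are hypothetical, edge labels are stored as plain strings, and the input is assumed to end in a terminator that occurs nowhere else.

```python
class Node:
    def __init__(self):
        self.children = {}  # first character of edge label -> (edge label, child node)
        self.leaf = None    # 1-based start position of the suffix (leaves only)


def build_naive_suffix_tree(s):
    """Insert the suffixes of s one by one, longest first.

    Assumes s ends in a unique terminator such as '$'.
    """
    root = Node()
    for start in range(len(s)):
        node, rest = root, s[start:]
        while True:
            first = rest[0]
            if first not in node.children:
                # No edge starts with this character: hang a new leaf here.
                leaf = Node()
                leaf.leaf = start + 1
                node.children[first] = (rest, leaf)
                break
            label, child = node.children[first]
            # Length of the common prefix of the edge label and the remaining suffix.
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                # Whole edge matched: continue from the child node.
                node, rest = child, rest[k:]
                continue
            # Mismatch inside the edge: split it at position k.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[first] = (label[:k], mid)
            node, rest = mid, rest[k:]
    return root


def leaves(node, prefix=""):
    """Map each root-to-leaf path label to the leaf's suffix start position."""
    if node.leaf is not None:
        return {prefix: node.leaf}
    out = {}
    for label, child in node.children.values():
        out.update(leaves(child, prefix + label))
    return out
```

For s = abab$, leaves(build_naive_suffix_tree("abab$")) maps abab$, bab$, ab$, b$ and $ to 1, 2, 3, 4 and 5, the same leaf labels as on the slide.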
Naive Construction – More Example
abbcbab#
[Figure: suffix tree of abbcbab# with leaves labeled by suffix start position]
Analysis
Takes O(n^2) time to build.
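An implicit suffix tree is obtained from the suffix tree of S$ as follows:
1. Remove the terminal symbols $ from the edge labels of the tree
2. Then remove any edge that has no label
3. Then remove any node that does not have at least two children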
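As a rough sketch of these steps, the function below converts a suffix tree into an implicit suffix tree. The representation is an assumption made for illustration: a tree is a nested dict mapping each edge label to the subtree below it, and to_implicit is a hypothetical name. The sketch also merges away any node left with only one child, which is part of the usual definition of an implicit suffix tree.

```python
def to_implicit(tree, terminal="$"):
    """Turn a suffix tree (nested dict: edge label -> subtree) into an implicit one."""
    out = {}
    for label, child in tree.items():
        label = label.replace(terminal, "")  # remove the terminal symbol from the label
        if not label:
            continue                         # drop an edge that now has no label
        sub = to_implicit(child, terminal)
        if len(sub) == 1:
            # A node with a single child is removed: join the two edge labels.
            (next_label, next_sub), = sub.items()
            label, sub = label + next_label, next_sub
        out[label] = sub
    return out


# Suffix tree of abab$, written out by hand in the assumed representation.
suffix_tree_abab = {
    "ab": {"ab$": {}, "$": {}},
    "b": {"ab$": {}, "$": {}},
    "$": {},
}
implicit_abab = to_implicit(suffix_tree_abab)
```

Here implicit_abab has only the two leaf edges abab and bab; the suffixes ab and b no longer end at leaves and are present only implicitly.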
Implicit Suffix Tree – More Example
Suffixes of abab$: { abab$, bab$, ab$, b$, $ }
[Figure: suffix tree of abab$ with leaves labeled 1 to 5, shown on two consecutive slides]
High-level of Ukkonen’s Algorithm
• Ukkonen’s algorithm is divided into m phases, where m = |S|. In phase i+1,
the implicit suffix tree for S[1… i+1] is constructed from the one for S[1… i].
• Each phase i+1 is further divided into i+1 extensions, one for
each of the i+1 suffixes of S[1… i+1].
Example (first three phases):
Phase 1: S[1…1], extensions {a}
Phase 2: S[1…2], extensions {ab, b}
Phase 3: S[1…3], extensions {aba, ba, a}
[Figure: the tree after each phase]
Implemented naively, this takes O(m^3) time.
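The phase and extension structure can be enumerated directly. The sketch below only lists the work units, it does not build any tree; phases and count_extensions are hypothetical names. There are m(m+1)/2 extensions in total, and locating the end of each suffix by walking up to m characters from the root is what gives the naive O(m^3) bound.

```python
def phases(s):
    """Yield (phase number i, the suffixes of S[1..i] handled in that phase)."""
    for i in range(1, len(s) + 1):
        # Extension j handles the suffix S[j..i], for j = 1..i.
        yield i, [s[j:i] for j in range(i)]


def count_extensions(s):
    """Total number of extensions over all phases: m(m+1)/2."""
    return sum(len(suffixes) for _, suffixes in phases(s))
```

For the string aba, list(phases("aba")) gives the three phases shown on the slide: [a], [ab, b] and [aba, ba, a].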
Suffix Extension Rules
Assume the tree for S[1 … i] is already built and we want to extend it with S(i+1).
Rule 1: Let β = S[j … i] be a suffix of S[1 … i]. If path β ends at a leaf, character
S(i+1) is added to the end of the label of that leaf edge.
Rule 2: Some path from the end of string β starts with character S(i+1). In
this case the string β S(i+1) is already in the tree, so do nothing.
Example (first four phases):
Phase 1: S[1…1] {a}
Phase 2: S[1…2] {ab, b}
Phase 3: S[1…3] {aba, ba, a}
Phase 4: S[1…4] {abab, bab, ab, b}
[Figure: the tree after each phase, with β and S(i+1) marked on one path]
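Suffix Extension Rules
Assume the tree for S[1 … i] is already built and we want to extend it with S(i+1).
Example: S = axabxb (positions 1 to 6). Suppose the tree for phase 5, S[1 … 5] = axabx,
is already drawn. Phase 6 appends b:
axabxb  Rule 1
xabxb   Rule 1
abxb    Rule 1
bxb     Rule 1
xb      Rule 3
b       Rule 2
Rule 3: No path from the end of string β starts with character S(i+1), but at
least one labeled path continues from the end of β. Add a new leaf edge labeled
S(i+1), creating a new internal node if β ends inside an edge.
Implemented naively, this takes O(m^3) time.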
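To see which rule applies in a given extension without building the tree, one can reason on the string alone. The sketch below rests on one assumption: in the implicit tree for S[1…i], the path β ends at a leaf exactly when no occurrence of β in S[1…i] is followed by another character. The function extension_rule is a hypothetical helper that uses the rule numbering of these slides.

```python
def extension_rule(s, i, j):
    """Rule used in phase i+1, extension j (1-based indices, slide numbering).

    beta = S[j..i] is the suffix being extended with c = S(i+1);
    j == i+1 means beta is empty and the path ends at the root.
    """
    prefix = s[:i]        # S[1..i]
    beta = s[j - 1:i]     # S[j..i]
    c = s[i]              # S(i+1)
    # Characters that follow some occurrence of beta inside S[1..i].
    followers = {
        prefix[p + len(beta)]
        for p in range(len(prefix) - len(beta))
        if prefix.startswith(beta, p)
    }
    if beta and not followers:
        return 1          # beta ends at a leaf: extend the leaf edge
    if c in followers:
        return 2          # beta followed by c is already in the tree: do nothing
    return 3              # some path continues from beta, none with c: add a leaf


rules_axabxb = [extension_rule("axabxb", 5, j) for j in range(1, 7)]
```

For axabxb this gives 1, 1, 1, 1, 3, 2 for the suffixes axabxb, xabxb, abxb, bxb, xb and b, in that order.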
Implementation and Speedup, Suffix Links
Definition: Let xα denote an arbitrary string, where x is a single
character and α a substring (possibly empty). For an internal
node v with path-label xα, if there is another node s(v) with
path-label α, then a pointer from v to s(v) is called a suffix link.
Does the root have a suffix link? No, because the root is not considered an internal node.
Every internal node has a suffix link.
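The definition can be checked on small strings without building the tree. The sketch below assumes that the internal nodes of a suffix tree are exactly the non-empty substrings that are followed by at least two different characters, and it identifies each node with its path-label. The names internal_labels and suffix_links are hypothetical, and the empty string stands for the root.

```python
def internal_labels(s):
    """Path-labels of the internal nodes: substrings with >= 2 distinct next characters."""
    followers = {}
    for a in range(len(s)):
        for b in range(a + 1, len(s)):
            # Substring s[a:b] is followed by the character s[b].
            followers.setdefault(s[a:b], set()).add(s[b])
    return {w for w, chars in followers.items() if len(chars) >= 2}


def suffix_links(s):
    """Map the path-label x+alpha of each internal node v to alpha, the label of s(v)."""
    labels = internal_labels(s)
    links = {}
    for w in labels:
        target = w[1:]
        # alpha is either empty (the root) or the label of another internal node.
        assert target == "" or target in labels
        links[w] = target
    return links
```

For abbcbab# the internal nodes are ab and b, and suffix_links("abbcbab#") links ab to b and b to the root.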
Suffix Links – More Example
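String: abbcbab#
v is the internal node with path-label ab; s(v) is the internal node with path-label b.
The suffix link points from v to s(v).
[Figure: suffix tree of abbcbab# with the suffix link from v to s(v)]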
[Figure: string zabcdefghy; edge lengths 2, 2, 3, 3; annotated pairs v=2, s(v)=1; v=3, s(v)=3; v=4, s(v)=5]
Lemma 6.1.2: Let (v, s(v)) be any suffix link traversed during
Ukkonen’s algorithm. At that moment, the node-depth of v is
at most one greater than the node-depth of s(v).
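The inequality can be sanity-checked on small strings. The sketch below examines only the finished tree, not the intermediate trees the lemma is stated for. It assumes that internal nodes correspond to substrings followed by at least two different characters, and that node-depth counts the nodes below the root on the path to a node. All function names are hypothetical.

```python
def internal_labels(s):
    """Path-labels of internal nodes: substrings with >= 2 distinct next characters."""
    followers = {}
    for a in range(len(s)):
        for b in range(a + 1, len(s)):
            followers.setdefault(s[a:b], set()).add(s[b])
    return {w for w, chars in followers.items() if len(chars) >= 2}


def node_depth(w, internal):
    """Number of internal nodes on the path spelling w, the root (depth 0) excluded."""
    return sum(1 for k in range(1, len(w) + 1) if w[:k] in internal)


def depth_bound_holds(s):
    """Check node_depth(v) <= node_depth(s(v)) + 1 for every internal node v."""
    internal = internal_labels(s)
    return all(
        node_depth(v, internal) <= node_depth(v[1:], internal) + 1
        for v in internal
    )
```

For abbcbab#, the node ab has node-depth 1 and its link target b also has node-depth 1, so the bound holds with room to spare.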