ELLIS HOROWITZ   SARTAJ SAHNI   SANGUTHEVAR RAJASEKARAN

COMPUTER ALGORITHMS

COMPUTER SCIENCE PRESS

Other books in the Computer Science Press series:

Alfred V. Aho (Columbia University) and Jeffrey D. Ullman (Stanford University), Foundations of Computer Science: Pascal Edition; Foundations of Computer Science: C Edition
Michael J. Clancy and Marcia C. Linn (University of California at Berkeley), Designing Pascal Solutions: A Case Study Approach; Designing Pascal Solutions: Case Studies Using Data Structures
A. K. Dewdney (University of Western Ontario), The New Turing Omnibus: 66 Excursions in Computer Science; Introductory Computer Science: Bits of Theory, Bytes of Practice
Robert Floyd (Stanford University) and Richard Beigel (Yale University), The Language of Machines: An Introduction to Computability and Formal Languages
Michael R. Garey and David S. Johnson (Bell Laboratories), Computers and Intractability: A Guide to the Theory of NP-Completeness
Judith L. Gersting (University of Hawaii at Hilo), Mathematical Structures for Computer Science, Third Edition; Visual Basic® Programming: A Laboratory Approach
Ellis Horowitz (University of Southern California) and Sartaj Sahni (University of Florida), Fundamentals of Data Structures in Pascal, Fourth Edition
Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed (Illinois Wesleyan University), Fundamentals of Data Structures in C
Ellis Horowitz, Sartaj Sahni, and Dinesh Mehta (University of Tennessee Space Institute), Fundamentals of Data Structures in C++
Ellis Horowitz, Sartaj Sahni, and Sanguthevar Rajasekaran (University of Florida), Computer Algorithms
Ellis Horowitz, Sartaj Sahni, and Sanguthevar Rajasekaran, Computer Algorithms/C++
Thomas W. Parsons (Hofstra University), Introduction to Compiler Construction
Gregory J. E. Rawlins (Indiana University), Compared to What?: An Introduction to the Analysis of Algorithms
Wei-Min Shen (Microelectronics and Computer Technology Corporation), Autonomous Learning from the Environment
James A. Storer (Brandeis University), Data Compression: Methods and Theory
Steven Tanimoto (University of Washington), Elements of Artificial Intelligence Using Common Lisp, Second Edition
Kim W. Tracy (Bell Labs/Lucent Technologies, Inc.) and Peter Bouthoorn (Groningen University), Object-Oriented Artificial Intelligence Using C++
Jeffrey D. Ullman (Stanford University), Principles of Database and Knowledge-Base Systems, Vol. I: Classical Database Systems; Vol. II: The New Technologies

COMPUTER ALGORITHMS

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Sanguthevar Rajasekaran, University of Florida

Computer Science Press, an imprint of W. H. Freeman and Company, New York

Acquisitions Editor: Richard Bonacci. Project Editor: Penelope Hull. Text Designer: The Authors. Text Illustrations: The Authors. Cover Designer: A Good Thing. Cover Illustration: Tomek Olbinski. Production Coordinator: Sheila Anderson. Composition: The Authors. Manufacturing: R. R. Donnelley & Sons Company.

Library of Congress Cataloging-in-Publication Data
Horowitz, Ellis. Computer algorithms / Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran. p. cm. Includes bibliographical references and index. ISBN 0-7167-8316-9. 1. Computer algorithms. 2. Pseudocode (Computer program language). I. Sahni, Sartaj. II. Rajasekaran, Sanguthevar. III. Title. QA76.9.A43H67 1998   005.1 dc21   97-20318   CIP

© 1998 by W. H. Freeman and Company. All rights reserved. No part of this book may be reproduced by any mechanical, photographic, or electronic process, or in the form of a phonographic recording, nor may it be stored in a retrieval system, transmitted, or otherwise copied for public or private use, without written permission from the publisher. Printed in the United States of America. First printing, 1997.

Computer Science Press, an imprint of W. H. Freeman and Company, 41 Madison Avenue, New York, New York 10010; Houndmills, Basingstoke RG21 6XS, England

To my nuclear family, MARYANNE, PIPL, CHAN OCH, and TRA — Ellis Horowitz
To NEETA, AGAM, NEHA, and PARAM — Sartaj Sahni
To KEERAN, KRISHNA, PANDT, and PONNUTHAL — Sanguthevar Rajasekaran

CONTENTS

PREFACE

1 INTRODUCTION: What Is an Algorithm?; Algorithm Specification; Performance Analysis; Randomized Algorithms; References and Readings
2 ELEMENTARY DATA STRUCTURES: Stacks and Queues; Trees; Dictionaries; Priority Queues; Sets and Disjoint Set Union; Graphs; References and Readings
3 DIVIDE-AND-CONQUER: General Method; Binary Search; Finding the Maximum and Minimum; Merge Sort; Quicksort; Selection; Strassen's Matrix Multiplication; Convex Hull; References and Readings; Additional Exercises
4 THE GREEDY METHOD: The General Method; Knapsack Problem; Tree Vertex Splitting; Job Sequencing with Deadlines; Minimum-Cost Spanning Trees; Optimal Storage on Tapes; Optimal Merge Patterns; Single-Source Shortest Paths; References and Readings; Additional Exercises
5 DYNAMIC PROGRAMMING: The General Method; Multistage Graphs; All Pairs Shortest Paths; Single-Source Shortest Paths: General Weights; Optimal Binary Search Trees (*); String Editing; 0/1-Knapsack; Reliability Design; The Traveling Salesperson Problem; Flow Shop Scheduling; References and Readings; Additional Exercises
6 BASIC TRAVERSAL AND SEARCH TECHNIQUES: Techniques for Binary Trees; Techniques for Graphs; Connected Components and Spanning Trees; Biconnected Components and DFS; References and Readings
7 BACKTRACKING: The General Method; The 8-Queens Problem; Sum of Subsets; Graph Coloring; Hamiltonian Cycles; Knapsack Problem; References and Readings; Additional Exercises
8 BRANCH-AND-BOUND: The Method; 0/1 Knapsack Problem; Traveling Salesperson (*); Efficiency Considerations; References and Readings
9 ALGEBRAIC PROBLEMS: The General Method; Evaluation and Interpolation; The Fast Fourier Transform; Modular Arithmetic; Even Faster Evaluation and Interpolation; References and Readings
10 LOWER BOUND THEORY: Comparison Trees; Oracles and Adversary Arguments; Lower Bounds through Reductions; Techniques for Algebraic Problems (*); References and Readings
11 NP-HARD AND NP-COMPLETE PROBLEMS: Basic Concepts; Cook's Theorem (*); NP-Hard Graph Problems; NP-Hard Scheduling Problems; NP-Hard Code Generation Problems; Some Simplified NP-Hard Problems; References and Readings; Additional Exercises
12 APPROXIMATION ALGORITHMS: Introduction; Absolute Approximations; ε-Approximations; Polynomial Time Approximation Schemes; Fully Polynomial Time Approximation Schemes; Probabilistically Good Algorithms (*); References and Readings; Additional Exercises
13 PRAM ALGORITHMS: Introduction; Computational Model; Fundamental Techniques and Algorithms; Selection; Merging; Sorting; Graph Problems; Computing the Convex Hull; Lower Bounds; References and Readings; Additional Exercises
14 MESH ALGORITHMS: Computational Model; Packet Routing; Fundamental Algorithms; Selection; Merging; Sorting; Graph Problems; Computing the Convex Hull; References and Readings; Additional Exercises
15 HYPERCUBE ALGORITHMS: Computational Model; PPR Routing; Fundamental Algorithms; Selection; Merging; Sorting; Graph Problems; Computing the Convex Hull; References and Readings; Additional Exercises

INDEX

PREFACE

If we try to identify those contributions of computer science which will be long lasting, surely one of these will be the refinement of the concept called algorithm. Ever since man invented the idea of a machine which could perform basic mathematical operations, the study of what can be computed and how it can be done well was launched. This study, inspired by the computer, has led to the discovery of many important algorithms and design methods. The discipline called computer science has embraced the study of algorithms as its own.
It is the purpose of this book to organize what is known about them in a coherent fashion so that students and practitioners can learn to devise and analyze new algorithms for themselves.

A book which contains every algorithm ever invented would be exceedingly large. Traditionally, algorithms books proceeded by examining only a small number of problem areas in depth. For each specific problem the most efficient algorithm for its solution is usually presented and analyzed. This approach has one major flaw. Though the student sees many fast algorithms and may master the tools of analysis, she/he remains unconfident about how to devise good algorithms in the first place.

The missing ingredient is a lack of emphasis on design techniques. A knowledge of design will certainly help one to create good algorithms, yet without the tools of analysis there is no way to determine the quality of the result. This observation that design should be taught on a par with analysis led us to a more promising line of approach: namely, to organize this book around some fundamental strategies of algorithm design. The number of basic design strategies is reasonably small. Moreover, all of the algorithms one would typically wish to study can easily be fit into these categories; for example, mergesort and quicksort are perfect examples of the divide-and-conquer strategy, while Kruskal's minimum spanning tree algorithm and Dijkstra's single-source shortest path algorithm are straightforward examples of the greedy strategy. An understanding of these strategies is an essential first step toward acquiring the skills of design.

Though we strongly feel that the emphasis on design as well as analysis is the appropriate way to organize the study of algorithms, a cautionary remark is in order. First, we have not included every known design principle. One example is linear programming, which is one of the most successful techniques but is often discussed in a course of its own. Secondly, the student should be inhibited from taking a cookbook approach to algorithm design by assuming that each algorithm must derive from only a single technique. This is not so.

A major portion of this book, Chapters 3 through 9, deals with the different design strategies. First each strategy is described in general terms. Typically a "program abstraction" is given which outlines the form that the computation will take if this strategy can be applied. Following this there are a succession of examples which reveal the intricacies and varieties of the general strategy. The examples are somewhat loosely ordered in terms of increasing complexity. The type of complexity may arise in several ways. Usually we begin with a problem which is very simple to understand and requires no data structures other than a one-dimensional array. For this problem it is usually obvious that the design strategy yields a correct solution. Later examples may require a proof that an algorithm based on this design technique does work. Or, the later algorithms may require more sophisticated data structures (e.g., trees or graphs) and their analyses may be more complex.

The major goal of this organization is to emphasize the arts of synthesis and analysis of algorithms. Auxiliary goals are to expose the student to good program structure and to proofs of algorithm correctness.

The algorithms in this book are presented in a pseudocode that resembles C and Pascal. Section 1.2.1 describes the pseudocode conventions.
Executable versions (in C++) of many of these algorithms can be found in our home page. Most of the algorithms presented in this book are short, and the language constructs used to describe them are simple enough that anyone can understand them. Chapters 13, 14, and 15 deal with parallel computing.

Another special feature of this book is that we cover the area of randomized algorithms extensively. Many of the algorithms discussed in Chapters 13, 14, and 15 are randomized. Some randomized algorithms are presented in the other chapters as well. An introductory one-quarter course on parallel algorithms might cover Chapters 13, 14, and 15 and perhaps some minimal additional material.

We have identified certain sections of the text (indicated with (*)) that are more suitable for advanced courses. We view the material presented in this book as ideal for a one-semester or two-quarter course given to juniors, seniors, or graduate students. It does require prior experience with programming in a higher level language, but everything else is self-contained. Practically speaking, it seems that a course on data structures is helpful, if only for the fact that the students have greater programming maturity.

For a school on the quarter system, the first quarter might cover the basic design techniques as given in Chapters 3 through 9: divide-and-conquer, the greedy method, dynamic programming, search and traversal, backtracking, branch-and-bound, and algebraic methods (see Table I). The second quarter would cover Chapters 10 through 15: lower bound theory, NP-completeness and approximation methods, PRAM algorithms, mesh algorithms, and hypercube algorithms (see Table II).

TABLE I: FIRST QUARTER
Week | Subject | Reading
1  | Introduction | 1.1 to 1.3
2  | Introduction; Data structures | 1.4; 2.1, 2.2
3  | Data structures | 2.3 to 2.6
4  | Divide-and-conquer | Chapter 3 (Assignment I due)
5  | The greedy method | Chapter 4 (Exam I)
6  | Dynamic programming | Chapter 5
7  | Search and traversal techniques | Chapter 6 (Assignment II due)
8  | Backtracking | Chapter 7
9  | Branch-and-bound | Chapter 8
10 | Algebraic methods | Chapter 9 (Assignment III due; Exam II)

TABLE II: SECOND QUARTER
Week | Subject | Reading
1  | Lower bound theory | 10.1 to 10.3
2  | Lower bound theory; NP-complete and NP-hard problems | 10.4; 11.1, 11.2
3  | NP-complete and NP-hard problems | 11.3, 11.4
4  | NP-complete and NP-hard problems; Approximation algorithms | 11.5, 11.6; 12.1, 12.2 (Assignment I due)
5  | Approximation algorithms | 12.3 to 12.6 (Exam I)
6  | PRAM algorithms | 13.1 to 13.4
7  | PRAM algorithms | 13.5 to 13.9 (Assignment II due)
8  | Mesh algorithms | 14.1 to 14.5
9  | Mesh algorithms; Hypercube algorithms | 14.6 to 14.8; 15.1 to 15.3
10 | Hypercube algorithms | 15.4 to 15.8 (Assignment III due; Exam II)

For a semester schedule where the student has not been exposed to data structures and O-notation, Chapters 1 through 7, 11, and 13 is about the right amount of material (see Table III). A more rigorous pace would cover Chapters 1 to 7, 11, 13, and 14 (see Table IV). An advanced course, for those who have prior knowledge about data structures and O-notation, might consist of Chapters 3 to 11 and 13 to 15 (see Table V).

TABLE III: SEMESTER — Medium pace (no prior exposure)
Week | Subject | Reading
1  | Introduction | 1.1 to 1.3
2  | Introduction; Data structures | 1.4; 2.1, 2.2
3  | Data structures | 2.3 to 2.6
4  | Divide-and-conquer | 3.1 to 3.4 (Assignment I due)
5  | Divide-and-conquer | 3.5 to 3.7 (Exam I)
6  | The greedy method | 4.1 to 4.4
7  | The greedy method | 4.5 to 4.7 (Assignment II due)
8  | Dynamic programming | 5.1 to 5.5
9  | Dynamic programming | 5.6 to 5.10
10 | Search and traversal | 6.1 to 6.4 (Assignment III due; Exam II)
11 | Backtracking | 7.1 to 7.3
12 | Backtracking | 7.4 to 7.6
13 | NP-complete and NP-hard problems | 11.1 to 11.3 (Assignment IV due)
14 | NP-complete and NP-hard problems | 11.4 to 11.6
15 | PRAM algorithms | 13.1 to 13.4
16 | PRAM algorithms | 13.5 to 13.9 (Assignment V due; Exam III)

TABLE IV: SEMESTER — Rigorous pace (no prior exposure)
Week | Subject | Reading
1  | Introduction | 1.1 to 1.3
2  | Introduction; Data structures | 1.4; 2.1, 2.2
3  | Data structures | 2.3 to 2.6
4  | Divide-and-conquer | 3.1 to 3.5 (Assignment I due)
5  | Divide-and-conquer; The greedy method | 3.6, 3.7; 4.1 to 4.3 (Exam I)
6  | The greedy method | 4.4 to 4.7
7  | Dynamic programming | 5.1 to 5.7 (Assignment II due)
8  | Dynamic programming; Search and traversal | 5.8 to 5.10; 6.1, 6.2
9  | Search and traversal; Backtracking | 6.3, 6.4; 7.1, 7.2
10 | Backtracking | 7.3 to 7.6 (Assignment III due; Exam II)
11 | NP-hard and NP-complete problems | 11.1 to 11.3
12 | NP-hard and NP-complete problems | 11.4 to 11.6
13 | PRAM algorithms | 13.1 to 13.4 (Assignment IV due)
14 | PRAM algorithms | 13.5 to 13.9
15 | Mesh algorithms | 14.1 to 14.3
16 | Mesh algorithms | 14.4 to 14.8 (Assignment V due; Exam III)

TABLE V: SEMESTER — Advanced course (rigorous pace)
Week | Subject | Reading
1  | Divide-and-conquer | 3.1 to 3.5
2  | Divide-and-conquer; The greedy method | 3.6, 3.7; 4.1 to 4.3
3  | The greedy method | 4.4 to 4.7
4  | Dynamic programming | Chapter 5 (Assignment I due)
5  | Search and traversal techniques | Chapter 6 (Exam I)
6  | Backtracking | Chapter 7
7  | Branch-and-bound | Chapter 8 (Assignment II due)
8  | Algebraic methods | Chapter 9
9  | Lower bound theory | Chapter 10
10 | NP-complete and NP-hard problems | 11.1 to 11.3 (Assignment III due; Exam II)
11 | NP-complete and NP-hard problems | 11.4 to 11.6
12 | PRAM algorithms | 13.1 to 13.4
13 | PRAM algorithms | 13.5 to 13.9 (Assignment IV due)
14 | Mesh algorithms | 14.1 to 14.5
15 | Mesh algorithms; Hypercube algorithms | 14.6 to 14.8; 15.1 to 15.3
16 | Hypercube algorithms | 15.4 to 15.8 (Assignment V due; Exam III)

Programs for most of the algorithms given in this book are available from the following URL: http://www.cise.ufl.edu/~raj/BOOK.html. Please send your comments to [email protected].

For homework there are numerous exercises at the end of each chapter. The most popular and instructive homework assignment we have found is one which requires the student to execute and time two programs using the same data sets. Since most of the algorithms in this book provide all the implementation details, they can easily be made use of. Translating these algorithms into any programming language should be easy. The problem then reduces to devising suitable data sets and obtaining timing results. The timing results should agree with the asymptotic analysis that was done for the algorithm. This is a nontrivial task which can be both educational and fun. Most importantly, it emphasizes an aspect of this field that is often neglected, that there is an experimental side to the practice of algorithms.

Acknowledgements

We are grateful to Martin J. Biernat, Jeff Jenness, Saleem Khan, Ming-Yang Kao, Douglas M. Campbell, and Stephen P. Leach for their critical comments, which have immensely enhanced our presentation. We are thankful to the students at UF for pointing out mistakes in earlier versions.
We are also thankful to Teo Gonzalez, Danny Krizanc, and David Wei, who carefully read portions of this book.

Ellis Horowitz
Sartaj Sahni
Sanguthevar Rajasekaran
June, 1997

Chapter 1

INTRODUCTION

1.1 WHAT IS AN ALGORITHM?

The word algorithm comes from the name of a Persian author, Abu Ja'far Mohammed ibn Musa al Khowarizmi (c. 825 A.D.), who wrote a textbook on mathematics. This word has taken on a special significance in computer science, where "algorithm" has come to refer to a method that can be used by a computer for the solution of a problem. This is what makes an algorithm different from words such as process, technique, or method.

Definition 1.1 [Algorithm] An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition, all algorithms must satisfy the following criteria:

1. Input. Zero or more quantities are externally supplied.
2. Output. At least one quantity is produced.
3. Definiteness. Each instruction is clear and unambiguous.
4. Finiteness. If we trace out the instructions of an algorithm, then for all cases, the algorithm terminates after a finite number of steps.
5. Effectiveness. Every instruction must be very basic so that it can be carried out, in principle, by a person using only pencil and paper. It is not enough that each operation be definite as in criterion 3; it also must be feasible. □

An algorithm is composed of a finite set of steps, each of which may require one or more operations. The possibility of a computer carrying out these operations necessitates that certain constraints be placed on the type of operations an algorithm can include.

Criteria 1 and 2 require that an algorithm produce one or more outputs and have zero or more inputs that are externally supplied. According to criterion 3, each operation must be definite, meaning that it must be perfectly clear what should be done. Directions such as "add 6 or 7 to x" or "compute 5/0" are not permitted because it is not clear which of the two possibilities should be done or what the result is.

The fourth criterion for algorithms we assume in this book is that they terminate after a finite number of operations. A related consideration is that the time for termination should be reasonably short. For example, an algorithm could be devised that decides whether any given position in the game of chess is a winning position. The algorithm works by examining all possible moves and countermoves that could be made from the starting position. The difficulty with this algorithm is that even using the most modern computers, it may take billions of years to make the decision. We must be very concerned with analyzing the efficiency of each of our algorithms.

Criterion 5 requires that each operation be effective; each step must be such that it can, at least in principle, be done by a person using pencil and paper in a finite amount of time. Performing arithmetic on integers is an example of an effective operation, but arithmetic with real numbers is not, since some values may be expressible only by an infinitely long decimal expansion. Adding two such numbers would violate the effectiveness property.

Algorithms that are definite and effective are also called computational procedures. One important example of computational procedures is the operating system of a digital computer.
This procedure is designed to control the execution of jobs in such a way that when no jobs are available, it does not terminate but continues in a waiting state until a new job is entered. Though computational procedures include important examples such as this one, we restrict our study to computational procedures that always terminate.

To help us achieve the criterion of definiteness, algorithms are written in a programming language. Such languages are designed so that each legitimate sentence has a unique meaning. A program is the expression of an algorithm in a programming language. Sometimes words such as procedure, function, and subroutine are used synonymously for program.

Most readers of this book have probably already programmed and run some algorithms on a computer. This is desirable because before you study a concept in general, it helps if you have had some practical experience with it. Perhaps you had some difficulty getting started in formulating an initial solution to a problem, or perhaps you were unable to decide which of two algorithms was better. The goal of this book is to teach you how to make these decisions.

The study of algorithms includes many important and active areas of research. There are four distinct areas of study one can identify:

1. How to devise algorithms — Creating an algorithm is an art which may never be fully automated. A major goal of this book is to study various design techniques that have proven to be useful in that they have often yielded good algorithms. By mastering these design strategies, it will become easier for you to devise new and useful algorithms. Many of the chapters of this book are organized around what we believe are the major methods of algorithm design. The reader may now wish to glance back at the table of contents to see what these methods are called. Some of these techniques may already be familiar, and some have been found to be so useful that books have been written about them. Dynamic programming is one such technique. Some of the techniques are especially useful in fields other than computer science, such as operations research and electrical engineering. In this book we can only hope to give an introduction to these many approaches to algorithm formulation. All of the approaches we consider have applications in a variety of areas including computer science. But some important design techniques such as linear, nonlinear, and integer programming are not covered here, as they are traditionally covered in other courses.

2. How to validate algorithms — Once an algorithm is devised, it is necessary to show that it computes the correct answer for all possible legal inputs. We refer to this process as algorithm validation. The algorithm need not as yet be expressed as a program. It is sufficient to state it in any precise way. The purpose of the validation is to assure us that this algorithm will work correctly independently of the issues concerning the programming language it will eventually be written in. Once the validity of the method has been shown, a program can be written and a second phase begins. This phase is referred to as program proving or sometimes as program verification. A proof of correctness requires that the solution be stated in two forms. One form is usually as a program which is annotated by a set of assertions about the input and output variables of the program. These assertions are often expressed in the predicate calculus. The second form is called a specification, and this may also be expressed in the predicate calculus. A proof consists of showing that these two forms are equivalent in that for every given legal input, they describe the same output. A complete proof of program correctness requires that each statement of the programming language be precisely defined and all basic operations be proved correct. All these details may cause a proof to be very much longer than the program.
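As a small, hedged illustration of the idea of annotating a program with assertions about its input and output variables, the following C++ fragment is a sketch of our own (the function isqrt and its precondition bound are choices made for this example, not part of the text). The run-time assert statements stand in for the predicate-calculus specification; a proof of correctness would have to show that they hold for every legal input, whereas testing can only check them on the inputs actually tried.

    #include <cassert>

    // isqrt(n) is intended to return the largest integer r with r*r <= n.
    unsigned isqrt(unsigned n) {
        assert(n <= 1000000000u);          // precondition: keep r*r in range
        unsigned r = 0;
        while ((r + 1) * (r + 1) <= n)     // loop invariant: r*r <= n
            ++r;
        assert(r * r <= n && (r + 1) * (r + 1) > n);   // postcondition
        return r;
    }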
3. How to analyze algorithms — This field of study is called analysis of algorithms. As an algorithm is executed, it uses the computer's central processing unit (CPU) to perform operations and its memory (both immediate and auxiliary) to hold the program and data. Analysis of algorithms or performance analysis refers to the task of determining how much computing time and storage an algorithm requires. This is a challenging area which sometimes requires great mathematical skill. An important result of this study is that it allows you to make quantitative judgments about the value of one algorithm over another. Another result is that it allows you to predict whether the software will meet any efficiency constraints that exist. Questions such as how well does an algorithm perform in the best case, in the worst case, or on the average are typical. For each algorithm in the text, an analysis is also given. Analysis is more fully described in Section 1.3.

4. How to test a program — Testing a program consists of two phases: debugging and profiling (or performance measurement). Debugging is the process of executing programs on sample data sets to determine whether faulty results occur and, if so, to correct them. However, as E. Dijkstra has pointed out, "debugging can only point to the presence of errors, but not to their absence." In cases in which we cannot verify the correctness of output on sample data, the following strategy can be employed: let more than one programmer develop programs for the same problem, and compare the outputs produced by these programs. If the outputs match, then there is a good chance that they are correct. A proof of correctness is much more valuable than a thousand tests (if that proof is correct), since it guarantees that the program will work correctly for all possible inputs. Profiling or performance measurement is the process of executing a correct program on data sets and measuring the time and space it takes to compute the results. These timing figures are useful in that they may confirm a previously done analysis and point out logical places to perform useful optimization. A description of the measurement of timing complexity can be found in Section 1.3.5. For some of the algorithms presented here, we show how to devise a range of data sets that will be useful for debugging and profiling.

These four categories serve to outline the questions we ask about algorithms throughout this book. As we can't hope to cover all these subjects completely, we content ourselves with concentrating on design and analysis, spending less time on program construction and correctness.
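To make the output-comparison strategy of point 4 concrete, here is a minimal C++ sketch, under assumptions of our own: the "two independently written programs" are a hand-written insertion sort and the library routine std::sort, and the sample data sets are small random arrays. The names insertionSort and the test parameters are illustrative only.

    #include <algorithm>
    #include <cstdlib>
    #include <iostream>
    #include <vector>

    // One of the two "programs": a straightforward insertion sort.
    std::vector<int> insertionSort(std::vector<int> a) {
        for (std::size_t i = 1; i < a.size(); ++i) {
            int key = a[i];
            std::size_t j = i;
            while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
            a[j] = key;
        }
        return a;
    }

    int main() {
        for (int test = 0; test < 100; ++test) {         // 100 random data sets
            std::vector<int> data(50);
            for (int& x : data) x = std::rand() % 1000;

            std::vector<int> expected = data;
            std::sort(expected.begin(), expected.end()); // the second "program"

            if (insertionSort(data) != expected) {       // compare the outputs
                std::cout << "outputs disagree on test " << test << '\n';
                return 1;
            }
        }
        std::cout << "all outputs match\n";
    }

Matching outputs do not prove either routine correct, but, as the discussion above notes, they give a good measure of confidence on the data sets tried.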
EXERCISES

1. Look up the words algorism and algorithm in your dictionary and write down their meanings.

2. The name al-Khowarizmi (algorithm) literally means "from the town of Khowarazm." This city is now known as Khiva, and is located in Uzbekistan. See if you can find this country in an atlas.

3. Use the WEB to find out more about al-Khowarizmi, e.g., his dates, a picture, or a stamp.

1.2 ALGORITHM SPECIFICATION

1.2.1 Pseudocode Conventions

In computational theory, we distinguish between an algorithm and a program. The latter does not have to satisfy the finiteness condition. For example, we can think of an operating system that continues in a "wait" loop until more jobs are entered. Such a program does not terminate unless the system crashes. Since our programs always terminate, we use "algorithm" and "program" interchangeably in this text.

We can describe an algorithm in many ways. We can use a natural language like English, although if we select this option, we must make sure that the resulting instructions are definite. Graphic representations called flowcharts are another possibility, but they work well only if the algorithm is small and simple. In this text we present most of our algorithms using a pseudocode that resembles C and Pascal.

1. Comments begin with // and continue until the end of the line.

2. Blocks are indicated with matching braces: { and }. A compound statement (i.e., a collection of simple statements) can be represented as a block. The body of a procedure also forms a block. Statements are delimited by ;.

3. An identifier begins with a letter. The data types of variables are not explicitly declared. The types will be clear from the context. Whether a variable is global or local to a procedure will also be evident from the context. We assume simple data types such as integer, float, char, boolean, and so on. Compound data types can be formed with records. Here is an example:

    node = record
    {
        datatype_1 data_1;
        ...
        datatype_n data_n;
        node *link;
    }

In this example, link is a pointer to the record type node. Individual data items of a record can be accessed with → and period. For instance, if p points to a record of type node, p → data_1 stands for the value of the first field in the record. On the other hand, if q is a record of type node, q.data_1 will denote its first field.

4. Assignment of values to variables is done using the assignment statement

    ⟨variable⟩ := ⟨expression⟩;

5. There are two boolean values true and false. In order to produce these values, the logical operators and, or, and not and the relational operators <, ≤, =, ≠, ≥, and > are provided.

6. Elements of multidimensional arrays are accessed using [ and ]. For example, if A is a two-dimensional array, the (i, j)th element of the array is denoted as A[i, j]. Array indices start at zero.

7. The following looping statements are employed: for, while, and repeat-until. The while loop takes the following form:

    while ⟨condition⟩ do
    {
        ⟨statement 1⟩
        ...
        ⟨statement n⟩
    }

As long as ⟨condition⟩ is true, the statements get executed. When ⟨condition⟩ becomes false, the loop is exited. The value of ⟨condition⟩ is evaluated at the top of the loop.

The general form of a for loop is

    for variable := value1 to value2 step step do
    {
        ⟨statement 1⟩
        ...
        ⟨statement n⟩
    }

Here value1, value2, and step are arithmetic expressions. A variable of type integer or real or a numerical constant is a simple form of an arithmetic expression. The clause "step step" is optional and taken as +1 if it does not occur. step could either be positive or negative. variable is tested for termination at the start of each iteration. The for loop can be implemented as a while loop as follows:
    while ((variable − fin) ∗ step ≤ 0) do
    {
        ⟨statement 1⟩
        ...
        ⟨statement n⟩
        variable := variable + incr;
    }

A repeat-until statement is constructed as follows:

    repeat
        ⟨statement 1⟩
        ...
        ⟨statement n⟩
    until ⟨condition⟩

The statements are executed as long as ⟨condition⟩ is false. The value of ⟨condition⟩ is computed after executing the statements.

The instruction break; can be used within any of the above looping instructions to force exit. In case of nested loops, break; results in the exit of the innermost loop that it is a part of. A return statement within any of the above also will result in exiting the loops. A return statement results in the exit of the function itself.

8. A conditional statement has the following forms:

    if ⟨condition⟩ then ⟨statement⟩
    if ⟨condition⟩ then ⟨statement 1⟩ else ⟨statement 2⟩

Here ⟨condition⟩ is a boolean expression and ⟨statement⟩, ⟨statement 1⟩, and ⟨statement 2⟩ are arbitrary statements (simple or compound). We also employ the following case statement:

    case
    {
        :⟨condition 1⟩: ⟨statement 1⟩
        ...
        :⟨condition n⟩: ⟨statement n⟩
        :else: ⟨statement n + 1⟩
    }

Here ⟨statement 1⟩, ⟨statement 2⟩, etc. could be either simple statements or compound statements. A case statement is interpreted as follows. If ⟨condition 1⟩ is true, ⟨statement 1⟩ gets executed and the case statement is exited. If ⟨condition 1⟩ is false, ⟨condition 2⟩ is evaluated. If ⟨condition 2⟩ is true, ⟨statement 2⟩ gets executed and the case statement exited, and so on. If none of the conditions ⟨condition 1⟩, ..., ⟨condition n⟩ are true, ⟨statement n + 1⟩ is executed and the case statement is exited. The else clause is optional.

9. Input and output are done using the instructions read and write. No format is used to specify the size of input or output quantities.

10. There is only one type of procedure: Algorithm. An algorithm consists of a heading and a body. The heading takes the form

    Algorithm Name(⟨parameter list⟩)

where Name is the name of the procedure and ⟨parameter list⟩ is a listing of the procedure parameters. The body has one or more (simple or compound) statements enclosed within braces { and }. An algorithm may or may not return any values. Simple variables to procedures are passed by value. Arrays and records are passed by reference. An array name or a record name is treated as a pointer to the respective data type.

As an example, the following algorithm finds and returns the maximum of n given numbers:

    Algorithm Max(A, n)
    // A is an array of size n.
    {
        Result := A[1];
        for i := 2 to n do
            if A[i] > Result then Result := A[i];
        return Result;
    }

In this algorithm (named Max), A and n are procedure parameters; Result and i are local variables.
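For readers who want to run the algorithms, pseudocode such as Max translates almost line for line into C++. The short transcription below is only an independent sketch of ours, not part of the pseudocode conventions: the name maxElement, the use of std::vector, and the zero-based indexing (positions 0 through n−1 rather than 1 through n) are choices made for the example.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Return the largest of the values stored in a (a is assumed nonempty).
    double maxElement(const std::vector<double>& a) {
        double result = a[0];                      // corresponds to Result := A[1]
        for (std::size_t i = 1; i < a.size(); ++i)
            if (a[i] > result) result = a[i];
        return result;
    }

    int main() {
        std::vector<double> a = {3.0, 9.5, 2.0, 7.25};
        std::cout << maxElement(a) << '\n';        // prints 9.5
    }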
Next we present two examples to illustrate the process of translating a problem into an algorithm.

Example 1.1 [Selection sort] Suppose we must devise an algorithm that sorts a collection of n ≥ 1 elements of arbitrary type. A simple solution is given by the following:

    From those elements that are currently unsorted, find the smallest and place it next in the sorted list.

Although this statement adequately describes the sorting problem, it is not an algorithm because it leaves several questions unanswered. For example, it does not tell us where and how the elements are initially stored or where we should place the result. We assume that the elements are stored in an array a, such that the ith integer is stored in the ith position a[i], 1 ≤ i ≤ n. The resulting algorithm correctly sorts a collection of n ≥ 1 elements; the result remains in a[1 : n] such that a[1] ≤ a[2] ≤ ··· ≤ a[n]. □

1.2.2 Recursive Algorithms

    Algorithm TowersOfHanoi(n, x, y, z)
    // Move the top n disks from tower x to tower y.
    {
        if (n ≥ 1) then
        {
            TowersOfHanoi(n − 1, x, z, y);
            write ("move top disk from tower", x, "to top of tower", y);
            TowersOfHanoi(n − 1, z, y, x);
        }
    }

Algorithm 1.3 Towers of Hanoi

Given a set of n ≥ 1 elements, the problem is to print all possible permutations of this set. For example, if the set is {a, b, c}, then the set of permutations is {(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a)}. It is easy to see that given n elements, there are n! different permutations. A simple algorithm can be obtained by looking at the case of four elements (a, b, c, d). The answer can be constructed by writing

1. a followed by all the permutations of (b, c, d)
2. b followed by all the permutations of (a, c, d)
3. c followed by all the permutations of (a, b, d)
4. d followed by all the permutations of (a, b, c)

The expression "followed by all the permutations" is the clue to recursion. It implies that we can solve the problem for a set with n elements if we have an algorithm that works on n − 1 elements. These considerations lead to Algorithm 1.4, which is invoked by Perm(a, 1, n). Try this algorithm out on sets of length one, two, and three to ensure that you understand how it works. □

    Algorithm Perm(a, k, n)
    {
        if (k = n) then write (a[1 : n]); // Output permutation.
        else // a[k : n] has more than one permutation.
             // Generate these recursively.
            for i := k to n do
            {
                t := a[k]; a[k] := a[i]; a[i] := t;
                Perm(a, k + 1, n);
                // All permutations of a[k + 1 : n]
                t := a[k]; a[k] := a[i]; a[i] := t;
            }
    }

Algorithm 1.4 Recursive permutation generator
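The following C++ fragment is a rough, self-contained rendering of two of the ideas above: the informal selection-sort rule of Example 1.1 and the recursive permutation generator of Algorithm 1.4. It is a sketch under our own naming and zero-based-indexing choices (selectionSort, perm), not code taken from the text.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Selection sort: from the elements that are still unsorted, find the
    // smallest and place it next in the sorted part of the array.
    void selectionSort(std::vector<int>& a) {
        for (std::size_t i = 0; i + 1 < a.size(); ++i) {
            std::size_t j = i;                       // position of current minimum
            for (std::size_t k = i + 1; k < a.size(); ++k)
                if (a[k] < a[j]) j = k;
            std::swap(a[i], a[j]);
        }
    }

    // Recursive permutation generator in the style of Perm(a, k, n):
    // print all permutations of a[k..n-1], keeping a[0..k-1] fixed.
    void perm(std::vector<char>& a, std::size_t k) {
        if (k == a.size()) {                         // one complete permutation
            for (char c : a) std::cout << c;
            std::cout << '\n';
            return;
        }
        for (std::size_t i = k; i < a.size(); ++i) {
            std::swap(a[k], a[i]);
            perm(a, k + 1);                          // permutations of a[k+1..n-1]
            std::swap(a[k], a[i]);                   // restore the original order
        }
    }

    int main() {
        std::vector<int> v = {5, 2, 9, 1};
        selectionSort(v);
        for (int x : v) std::cout << x << ' ';       // prints 1 2 5 9
        std::cout << '\n';

        std::vector<char> s = {'a', 'b', 'c'};
        perm(s, 0);                                  // the 3! = 6 permutations
    }

As with the pseudocode, trying perm on sets of one, two, and three elements is a good way to see how the recursion unwinds.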
EXERCISES

1. Horner's rule is a means for evaluating a polynomial at a point x0 using a minimum number of multiplications. If the polynomial is A(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0, Horner's rule is

    A(x0) = (···(a_n x0 + a_{n−1}) x0 + ··· + a_1) x0 + a_0

Write an algorithm to evaluate a polynomial using Horner's rule.

2. Given n boolean variables x_1, x_2, ..., and x_n, we wish to print all possible combinations of truth values they can assume. For instance, if n = 2, there are four possibilities: true, true; true, false; false, true; and false, false. Write an algorithm to accomplish this.

3. Devise an algorithm that inputs three integers and outputs them in nondecreasing order.

4. Present an algorithm that searches an unsorted array a[1 : n] for the element x. If x occurs, then return a position in the array; else return zero.

5. The factorial function n! has value 1 when n ≤ 1 and value n ∗ (n − 1)! when n > 1. Write both a recursive and an iterative algorithm to compute n!.

6. The Fibonacci numbers are defined as f_0 = 0, f_1 = 1, and f_i = f_{i−1} + f_{i−2} for i > 1. Write both a recursive and an iterative algorithm to compute f_n.

7. Give both a recursive and an iterative algorithm to compute the binomial coefficient (n choose m) as defined in Section 1.2.2, where (n choose 0) = (n choose n) = 1.

8. Ackermann's function A(m, n) is defined as follows:

    A(m, n) = n + 1                      if m = 0
    A(m, n) = A(m − 1, 1)                if n = 0
    A(m, n) = A(m − 1, A(m, n − 1))      otherwise

This function is studied because it grows very fast for small values of m and n. Write a recursive algorithm for computing this function. Then write a nonrecursive algorithm for computing it.

9. The pigeonhole principle states that if a function f has n distinct inputs but less than n distinct outputs, then there exist two inputs a and b such that a ≠ b and f(a) = f(b). Present an algorithm to find a and b such that f(a) = f(b). Assume that the function inputs are 1, 2, ..., and n.

10. Give an algorithm to solve the following problem: Given n, a positive integer, determine whether n is the sum of all of its divisors, that is, whether n is the sum of all t such that 1 ≤ t < n and t divides n.

1.3 PERFORMANCE ANALYSIS

1.3.1 Space Complexity

For the algorithm Sum, the space needed is at least (n + 3): n for a[ ], and one each for n, i, and s. □

Example 1.6 Let us consider the algorithm RSum (Algorithm 1.7). As in the case of Sum, the instances are characterized by n. The recursion stack space includes space for the formal parameters, the local variables, and the return address. Assume that the return address requires only one word of memory. Each call to RSum requires at least three words (including space for the values of n, the return address, and a pointer to a[ ]). Since the depth of recursion is n + 1, the recursion stack space needed is ≥ 3(n + 1). □

1.3.2 Time Complexity

The time T(P) taken by a program P is the sum of the compile time and the run (or execution) time. The compile time does not depend on the instance characteristics. Also, we may assume that a compiled program will be run several times without recompilation. Consequently, we concern ourselves with just the run time of a program. This run time is denoted by t_P (instance characteristics).

Because many of the factors t_P depends on are not known at the time a program is conceived, it is reasonable to attempt only to estimate t_P. If we knew the characteristics of the compiler to be used, we could proceed to determine the number of additions, subtractions, multiplications, divisions, compares, loads, stores, and so on, that would be made by the code for P. So, we could obtain an expression for t_P(n) of the form

    t_P(n) = c_a ADD(n) + c_s SUB(n) + c_m MUL(n) + c_d DIV(n) + ...

where n denotes the instance characteristics, and c_a, c_s, c_m, c_d, and so on, respectively, denote the time needed for an addition, subtraction, multiplication, division, and so on, and ADD, SUB, MUL, DIV, and so on, are functions whose values are the numbers of additions, subtractions, multiplications, divisions, and so on, that are performed when the code for P is used on an instance with characteristic n.

Obtaining such an exact formula is in itself an impossible task, since the time needed for an addition, subtraction, multiplication, and so on, often depends on the numbers being added, subtracted, multiplied, and so on. The value of t_P(n) for any given n can be obtained only experimentally. The program is typed, compiled, and run on a particular machine. The execution time is physically clocked, and t_P(n) obtained. Even with this experimental approach, one could face difficulties. In a multiuser system, the execution time depends on such factors as system load, the number of other programs running on the computer at the time program P is run, the characteristics of these other programs, and so on.
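The kind of experimental clocking just described can be carried out with a few lines of C++. The following sketch, with program, input sizes, and clock choices of our own (std::chrono::steady_clock, a simple summation routine), simply times one run per value of n; it is meant only to show the mechanics, not to be a careful measurement methodology.

    #include <chrono>
    #include <iostream>
    #include <vector>

    // The program being timed: sum the n elements of a.
    double sum(const std::vector<double>& a) {
        double s = 0.0;
        for (double x : a) s += x;
        return s;
    }

    int main() {
        for (int n = 100000; n <= 1600000; n *= 2) {
            std::vector<double> a(n, 1.0);
            auto start = std::chrono::steady_clock::now();
            volatile double s = sum(a);                  // volatile keeps the call
            auto stop = std::chrono::steady_clock::now();
            std::chrono::duration<double, std::milli> t = stop - start;
            std::cout << "n = " << n << "  time = " << t.count() << " ms"
                      << "  (sum = " << s << ")\n";
        }
    }

On a lightly loaded machine the measured times should grow roughly linearly with n, which is exactly the behavior the step-count analysis below predicts for Sum.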
Given the minimal utility of determining the exact number of additions, subtractions, and so on, that are needed to solve a problem instance with characteristics given by n, we might as well lump all the operations together (provided that the time required by each is relatively independent of the instance characteristics) and obtain a count for the total number of operations. We can go one step further and count only the number of program steps.

A program step is loosely defined as a syntactically or semantically meaningful segment of a program that has an execution time that is independent of the instance characteristics. For example, the entire statement

    return a + b + b ∗ c + (a + b − c)/(a + b) + 4.0;

of Algorithm 1.5 could be regarded as a step since its execution time is independent of the instance characteristics (this statement is not strictly true, since the time for a multiply and divide generally depends on the numbers involved in the operation).

The number of steps any program statement is assigned depends on the kind of statement. For example, comments count as zero steps; an assignment statement which does not involve any calls to other algorithms is counted as one step; in an iterative statement such as the for, while, and repeat-until statements, we consider the step counts only for the control part of the statement. The control parts for for and while statements have the following forms:

    for i := ⟨expr⟩ to ⟨expr1⟩ do
    while (⟨expr⟩) do

Each execution of the control part of a while statement is given a step count equal to the number of step counts assignable to ⟨expr⟩. The step count for each execution of the control part of a for statement is one, unless the counts attributable to ⟨expr⟩ and ⟨expr1⟩ are functions of the instance characteristics. In this latter case, the first execution of the control part of the for has a step count equal to the sum of the counts for ⟨expr⟩ and ⟨expr1⟩ (note that these expressions are computed only when the loop is started). Remaining executions of the for statement have a step count of one; and so on.

We can determine the number of steps needed by a program to solve a particular problem instance in one of two ways. In the first method, we introduce a new variable, count, into the program. This is a global variable with initial value 0. Statements to increment count by the appropriate amount are introduced into the program. This is done so that each time a statement in the original program is executed, count is incremented by the step count of that statement.

Example 1.7 When the statements to increment count are introduced into Algorithm 1.6, the result is Algorithm 1.8. The change in the value of count by the time this program terminates is the number of steps executed by Algorithm 1.6. Since we are interested in determining only the change in the value of count, Algorithm 1.8 may be simplified to Algorithm 1.9. For every initial value of count, Algorithms 1.8 and 1.9 compute the same final value for count. It is easy to see that in the for loop, the value of count will increase by a total of 2n. If count is zero to start with, then it will be 2n + 3 on termination. So each invocation of Sum (Algorithm 1.6) executes a total of 2n + 3 steps. □

    Algorithm Sum(a, n)
    {
        s := 0.0;
        count := count + 1; // count is global; it is initially zero.
        for i := 1 to n do
        {
            count := count + 1; // For for
            s := s + a[i]; count := count + 1; // For assignment
        }
        count := count + 1; // For last time of for
        count := count + 1; // For the return
        return s;
    }

Algorithm 1.8 Algorithm 1.6 with count statements added

    Algorithm Sum(a, n)
    {
        for i := 1 to n do count := count + 2;
        count := count + 3;
    }

Algorithm 1.9 Simplified version of Algorithm 1.8
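The count-variable method of Example 1.7 can also be tried directly on a machine. The C++ sketch below is our own transcription of that idea (the global name count and the sample array are illustrative choices): a global counter is incremented next to each statement of the summation routine by that statement's step count, and for an n-element array the final value comes out to 2n + 3, in agreement with the analysis.

    #include <iostream>
    #include <vector>

    long long count = 0;                          // global step counter

    double sum(const std::vector<double>& a) {
        double s = 0.0;
        ++count;                                  // for s := 0.0
        for (std::size_t i = 0; i < a.size(); ++i) {
            ++count;                              // for the for-loop control
            s += a[i];
            ++count;                              // for the assignment
        }
        ++count;                                  // last test of the for loop
        ++count;                                  // for the return
        return s;
    }

    int main() {
        std::vector<double> a(10, 1.5);
        sum(a);
        std::cout << "steps = " << count << '\n'; // prints 23 = 2*10 + 3
    }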
Let trsum(m) be the increase in the value of count when Algorithm 1.10 terminates. We see that trsum(0) = 2. When n > 0, count increases by 2 plus whatever increase results from the invocation of RSum from within the else clause. From the definition of trsum: it follows that this additional increase is tysum(n— 1). So, if the value of count is zero initially, its value at the time of termination is 2+¢rsum(n—1), n>. 1 Algorithm RSum(a,n) 2 { 3 count := count +1; // For the if conditional 4 if (n <0) then 5 6 count = count +1; // For the return 7 return 0.0; 8 } 9 else 10 { i count := count +1; // For the addition, function 12 // invocation and return 13 return RSum(a,n — 1) + a[n]; 14 15 } Algorithm 1.10 Algorithm 1.7 with count statements added When analyzing a recursive program for its step count, we often obtain a recursive formula for the step count, for example, tesun(n) = {2 ifn=0 Sum) = 1 24 trsum(n —1) ifn >0 These recursive formulas are referred to as recurrence relations. One way of solving any such recurrence relation is to make repeated substitutions for each occurrence of the function tasym on the right-hand side until all such 22 CHAPTER 1. INTRODUCTION trsum(n) = 2+ trsum(n — 1) 242+ trsum(n — 2) = 2(2) + trsum(n — 2) n(2) + trsum(0) = 2n+2, n20 So the step count for RSum (Algorithm 1.7) is 2n + 2. a The step count is useful in that it tel how the run time for a program changes with changes in the instance characteristics. From the step count for Sum, we see that if n is doubled, the run time also doubles (approximately); if n increases by a factor of 10, the run time increases by a factor of 10; and so on. So, the run time grows linearly in n. We say that Sum is a linear time algorithm (the time complexity is linear in the instance characteristic n). Definition 1.3 [Input size] One of the instance characteristics that is fre- quently used in the literature is the input size. The input size of any instanc of a problem is defined to be the number of words (or the number of ele ments) needed to describe that instance. The input size for the problem of summing an array with n elements is n + 1, n for listing the n elements and 1 for the value of n (Algorithms 1.6 and 1.7). The problem tackled in Algorithm 1.5 has an input size of 3. If the input to any problem instance is a single clement, the input size is normally taken to be the number of bits needed to specify that element. Run times for many of the algorithms presented in this text are expressed as functions of the corresponding input siz Example 1.9 [Matrix addition] Algorithm 1.11 is to add two m xn matrices a and b together. Introducing the count-incrementing statements leads to Algorithm 1,12, Algorithm 1:13 is a simplified version of Algorithm 1.12 that computes the same value for count. Examining Algorithm 1.13, we see that line 7 is executed n times for each value of i, or a total of mn times; line 5 is executed m times; and line 9 is executed once. If count is 0 to begin with, it will be 2mn + 2m + 1 when Algorithm 1.13 terminates. From this analysis we see that if m > n, then it is better to interchange the two for statements in Algorithm 1.11. If this is done, the step count becomes 2mn+2n-+1. Note that in this example the instance characteristics are given by m and n and the input size is 2mn + 2. a The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement. 
The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement. This figure is often arrived at by first determining the number of steps per execution (s/e) of the statement and the total number of times (i.e., frequency) each statement is executed. The s/e of a statement is the amount by which the count changes as a result of the execution of that statement. By combining these two quantities, the total contribution of each statement is obtained. By adding the contributions of all statements, the step count for the entire algorithm is obtained.

    Algorithm Add(a, b, c, m, n)
    {
        for i := 1 to m do
            for j := 1 to n do
                c[i, j] := a[i, j] + b[i, j];
    }

Algorithm 1.11 Matrix addition

    Algorithm Add(a, b, c, m, n)
    {
        for i := 1 to m do
        {
            count := count + 1; // For 'for i'
            for j := 1 to n do
            {
                count := count + 1; // For 'for j'
                c[i, j] := a[i, j] + b[i, j];
                count := count + 1; // For the assignment
            }
            count := count + 1; // For loop initialization and
                                // last time of 'for j'
        }
        count := count + 1;     // For loop initialization and
                                // last time of 'for i'
    }

Algorithm 1.12 Matrix addition with counting statements

    1   Algorithm Add(a, b, c, m, n)
    2   {
    3       for i := 1 to m do
    4       {
    5           count := count + 2;
    6           for j := 1 to n do
    7               count := count + 2;
    8       }
    9       count := count + 1;
    10  }

Algorithm 1.13 Simplified algorithm with counting only

In Table 1.1, the number of steps per execution and the frequency of each of the statements in Sum (Algorithm 1.6) have been listed. The total number of steps required by the algorithm is determined to be 2n + 3. It is important to note that the frequency of the for statement is n + 1 and not n. This is so because i has to be incremented to n + 1 before the for loop can terminate.

    Statement                | s/e | frequency | total steps
    Algorithm Sum(a, n)      |  0  |     —     |      0
    {                        |  0  |     —     |      0
       s := 0.0;             |  1  |     1     |      1
       for i := 1 to n do    |  1  |   n + 1   |    n + 1
          s := s + a[i];     |  1  |     n     |      n
       return s;             |  1  |     1     |      1
    }                        |  0  |     —     |      0
    Total                    |     |           |   2n + 3

Table 1.1 Step table for Algorithm 1.6

Table 1.2 gives the step count for RSum (Algorithm 1.7). Notice that under the s/e (steps per execution) column, the else clause has been given a count of 1 + t_RSum(n − 1). This is the total cost of this line each time it is executed. It includes all the steps that get executed as a result of the invocation of RSum from the else clause. The frequency and total steps columns have been split into two parts: one for the case n = 0 and the other for the case n > 0. This is necessary because the frequency (and hence total steps) for some statements is different for each of these cases.

                                     |       |   frequency   |  total steps
    Statement                        |  s/e  | n = 0 | n > 0 | n = 0 | n > 0
    Algorithm RSum(a, n)             |   0   |   —   |   —   |   0   |   0
    {                                |   0   |   —   |   —   |   0   |   0
       if (n ≤ 0) then               |   1   |   1   |   1   |   1   |   1
          return 0.0;                |   1   |   1   |   0   |   1   |   0
       else return                   |       |       |       |       |
          RSum(a, n − 1) + a[n];     | 1 + x |   0   |   1   |   0   | 1 + x
    }                                |   0   |   —   |   —   |   0   |   0
    Total                            |       |       |       |   2   | 2 + x

    x = t_RSum(n − 1)

Table 1.2 Step table for Algorithm 1.7

Table 1.3 corresponds to algorithm Add (Algorithm 1.11). Once again, note that the frequency of the first for loop is m + 1 and not m. This is so as i needs to be incremented up to m + 1 before the loop can terminate. Similarly, the frequency for the second for loop is m(n + 1).

    Statement                            | s/e | frequency | total steps
    Algorithm Add(a, b, c, m, n)         |  0  |     —     |      0
    {                                    |  0  |     —     |      0
       for i := 1 to m do                |  1  |   m + 1   |    m + 1
          for j := 1 to n do             |  1  |  m(n + 1) |   mn + m
             c[i, j] := a[i, j] + b[i, j]; | 1 |    mn     |     mn
    }                                    |  0  |     —     |      0
    Total                                |     |           | 2mn + 2m + 1

Table 1.3 Step table for Algorithm 1.11

When you have obtained sufficient experience in computing step counts, you can avoid constructing the frequency table and obtain the step count as in the following example.
    Statement                           s/e   frequency   total steps
    1  Algorithm Add(a, b, c, m, n)      0        -            0
    2  {                                 0        -            0
    3      for i := 1 to m do            1      m + 1        m + 1
    4          for j := 1 to n do        1     m(n + 1)     mn + m
    5              c[i,j] := a[i,j]+b[i,j];  1    mn           mn
    6  }                                 0        -            0
    Total                                                2mn + 2m + 1

Table 1.3 Step table for Algorithm 1.11

Example 1.10 [Fibonacci numbers] The Fibonacci sequence of numbers starts as

    0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

Each new term is obtained by taking the sum of the two previous terms. If we call the first term of the sequence f0, then f0 = 0, f1 = 1, and in general

    f_n = f_{n−1} + f_{n−2},   n ≥ 2

Fibonacci (Algorithm 1.14) takes as input any nonnegative integer n and prints the value f_n. To analyze the time complexity of this algorithm, we need to consider the two cases (1) n = 0 or 1 and (2) n > 1. When n = 0 or 1, lines 4 and 5 get executed once each. Since each line has an s/e of 1, the total step count for this case is 2. When n > 1, lines 4, 8, and 14 are each executed once. Line 9 gets executed n times, and lines 11 and 12 get executed n − 1 times each (note that the last time line 9 is executed, i is incremented to n + 1, and the loop exited). Line 8 has an s/e of 2, line 12 has an s/e of 2, and line 13 has an s/e of 0. The remaining lines that get executed have s/e's of 1. The total steps for the case n > 1 is therefore 4n + 1. □

1   Algorithm Fibonacci(n)
2   // Compute the nth Fibonacci number.
3   {
4       if (n ≤ 1) then
5           write (n);
6       else
7       {
8           fnm2 := 0; fnm1 := 1;
9           for i := 2 to n do
10          {
11              fn := fnm1 + fnm2;
12              fnm2 := fnm1; fnm1 := fn;
13          }
14          write (fn);
15      }
16  }

Algorithm 1.14 Fibonacci numbers

Summary of Time Complexity

The time complexity of an algorithm is given by the number of steps taken by the algorithm to compute the function it was written for. The number of steps is itself a function of the instance characteristics. Although any specific instance may have several characteristics (e.g., the number of inputs, the number of outputs, the magnitudes of the inputs and outputs), the number of steps is computed as a function of some subset of these. Usually, we choose those characteristics that are of importance to us. For example, we might wish to know how the computing (or run) time (i.e., time complexity) increases as the number of inputs increases. In this case the number of steps will be computed as a function of the number of inputs alone. For a different algorithm, we might be interested in determining how the computing time increases as the magnitude of one of the inputs increases. In this case the number of steps will be computed as a function of the magnitude of this input alone. Thus, before the step count of an algorithm can be determined, we need to know exactly which characteristics of the problem instance are to be used. These define the variables in the expression for the step count. In the case of Sum, we chose to measure the time complexity as a function of the number n of elements being added. For algorithm Add, the choice of characteristics was the number m of rows and the number n of columns in the matrices being added.

Once the relevant characteristics (n, m, p, q, r, ...) have been selected, we can define what a step is. A step is any computation unit that is independent of the characteristics (n, m, p, q, r, ...). Thus, 10 additions can be one step; 100 multiplications can also be one step; but n additions cannot. Nor can m/2 additions, p + q subtractions, and so on, be counted as one step.

A systematic way to assign step counts was also discussed.
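As a concrete companion to Example 1.10 above, the following C++ sketch mirrors Algorithm 1.14; the identifiers fnm1, fnm2, and fn follow the pseudocode, and the main driver is illustrative only.

    #include <iostream>

    // Mirrors Algorithm 1.14: prints the nth Fibonacci number, n >= 0.
    void fibonacci(int n) {
        if (n <= 1) {
            std::cout << n << "\n";           // lines 4-5 of Algorithm 1.14
        } else {
            int fnm2 = 0, fnm1 = 1, fn = 0;   // line 8
            for (int i = 2; i <= n; i++) {    // line 9
                fn = fnm1 + fnm2;             // line 11
                fnm2 = fnm1;                  // line 12
                fnm1 = fn;
            }
            std::cout << fn << "\n";          // line 14
        }
    }

    int main() { fibonacci(10); }  // prints 55

With the step-counting conventions above, this computation costs 2 steps when n ≤ 1 and 4n + 1 steps when n > 1.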
Once this has been done, the time complexity (i.e., the total step count) of an algorithm can be obtained using either of the two methods discussed. ‘The examples we have looked at so far were sufficiently simple that the time complexities were nice functions of fairly simple characteristics like the number of inputs and the number of rows and columns. For many algo- rithms, the time complexity is not dependent solely on the number of inputs or outputs or some other easily specified characteristic. For example, the searching algorithm you wrote for Exercise 4 in Section 1.2, may terminate in one step if ¢ is the first element examined by your algorithm, or it may take two steps (this happens if « is the second element examined), and so on. In other words, knowing n alone is not enough to estimate the run time of your algorithm. ‘We can extricate ourselves from the difficulties resulting from situations when the chosen parameters are not adequate to determine the step count uniquely by defining three kinds of step counts: best case, worst case, and average. The best-case step count is the minimum number of steps that can be executed for the given parameters. The worst-case step count is the maximum number of steps that can be executed for the given parameters. ‘The average step count is the average number of steps executed on instances with the given parameters. Our motivation to determine step counts is to be able to compare the time complexities of two algorithms that compute the same function and also to predict the growth in run time as the i Determining the exact step count (best case, worst case, or average) of an algorithm can prove to be an exceedingly difficult task. Expending immense effort to determine the step count exactly is not a very worthwhile endeavor, since the notion of a step is itself inexact. (Both the instructions x := 3 and 2 := yt+2+(«/y) +(«*y*z—2/z); count as one step.) Because of the inexactness of what a step stands for, the exact step count is not very useful for comparative purposes. An exception to this is when the difference between the step counts of two algorithms is very large, as in 3n + 3 versus 100n +10. We might feel quite safe in predicting that the algorithm with step count 3n+3 will run in less time than the one with step count 100n+ 10. But even in this case, it is not necessary to know that the exact step count is 100n + 10. Something like, “it’s about 80n or 85n or 75n,” is adequate to arrive at the same conclusion. For most situations, it is adequate to be able to make a statement like en? < tp(n) < ean? or ta(n,m) n+ cym, where cy and ec are non- negative constants. This is so because if we have two algorithms with a complexity of en? + con and can respectively, then we know that the one with complexity c3n will be faster than the one with complexity cyn? + egn for sufficiently large values of n. For small values of n, either algorithm could be faster (depending on cy, ¢2, and c3). If cy = 1, ¢2 = 2, and cz = 100, then 1.3. PERFORMANCE ANALYSIS 29 ein? + con < egn for n < 98 ont qn? + en > cn for n > 98. Ife , and ¢s = 1000, then en? + en < egn for n < 998. No matter what the values of c, ¢2, and cg, there will be an n beyond which the algorithm with complexity c3n will be faster than the one with complexity ¢)n? + ¢yn. This value of n will be called the break-even point. If the break-even point is zero, then the algorithm with complexity csn is always faster (or at least as fast). The exact break-even point cannot be determined analytically. 
The algorithms have to be run on a computer in order to determine the break-even point. To know that there is a break-even point, it is sufficient to know that one algorithm has complexity c1*n^2 + c2*n and the other c3*n for some constants c1, c2, and c3. There is little advantage in determining the exact values of c1, c2, and c3.

1.3.3 Asymptotic Notation (O, Ω, Θ)

With the previous discussion as motivation, we introduce some terminology that enables us to make meaningful (but inexact) statements about the time and space complexities of an algorithm. In the remainder of this chapter, the functions f and g are nonnegative functions.

Definition 1.4 [Big "oh"] The function f(n) = O(g(n)) (read as "f of n is big oh of g of n") iff (if and only if) there exist positive constants c and n0 such that f(n) ≤ c*g(n) for all n, n ≥ n0. □

Example 1.11 The function 3n + 2 = O(n) as 3n + 2 ≤ 4n for all n ≥ 2. 3n + 3 = O(n) as 3n + 3 ≤ 4n for all n ≥ 3. 100n + 6 = O(n) as 100n + 6 ≤ 101n for all n ≥ 6. 10n² + 4n + 2 = O(n²) as 10n² + 4n + 2 ≤ 11n² for all n ≥ 5. 1000n² + 100n − 6 = O(n²) as 1000n² + 100n − 6 ≤ 1001n² for n ≥ 100. 6*2ⁿ + n² = O(2ⁿ) as 6*2ⁿ + n² ≤ 7*2ⁿ for n ≥ 4. 3n + 3 = O(n²) as 3n + 3 ≤ 3n² for n ≥ 2. 10n² + 4n + 2 = O(n⁴) as 10n² + 4n + 2 ≤ 10n⁴ for n ≥ 2. 3n + 2 ≠ O(1) as 3n + 2 is not less than or equal to c for any constant c and all n ≥ n0. 10n² + 4n + 2 ≠ O(n). □

We write O(1) to mean a computing time that is a constant. O(n) is called linear, O(n²) is called quadratic, O(n³) is called cubic, and O(2ⁿ) is called exponential. If an algorithm takes time O(log n), it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n) is better than O(n²) but not as good as O(n). These seven computing times, O(1), O(log n), O(n), O(n log n), O(n²), O(n³), and O(2ⁿ), are the ones we see most often in this book.

As illustrated by the previous example, the statement f(n) = O(g(n)) states only that c*g(n) is an upper bound on the value of f(n) for all n, n ≥ n0. It does not say anything about how good this bound is. Notice that n = O(2ⁿ), n = O(n²), n = O(n³), and so on. For the statement f(n) = O(g(n)) to be informative, g(n) should be as small a function of n as one can come up with for which f(n) = O(g(n)). So, while we often say that 3n + 3 = O(n), we almost never say that 3n + 3 = O(n²), even though this latter statement is correct.

From the definition of O, it should be clear that f(n) = O(g(n)) is not the same as O(g(n)) = f(n). In fact, it is meaningless to say that O(g(n)) = f(n). The use of the symbol = is unfortunate because this symbol commonly denotes the equals relation. Some of the confusion that results from the use of this symbol (which is standard terminology) can be avoided by reading the symbol = as "is" and not as "equals."

Theorem 1.2 obtains a very useful result concerning the order of f(n) (that is, the g(n) in f(n) = O(g(n))) when f(n) is a polynomial in n.

Theorem 1.2 If f(n) = a_m*n^m + ... + a_1*n + a_0, then f(n) = O(n^m).

Proof: f(n) ≤ |a_m|n^m + ... + |a_1|n + |a_0| ≤ n^m (|a_m| + |a_{m−1}|/n + ... + |a_0|/n^m) ≤ n^m (|a_m| + ... + |a_0|) for n ≥ 1. So, f(n) = O(n^m) (assuming that m is fixed). □

Definition 1.5 [Omega] The function f(n) = Ω(g(n)) (read as "f of n is omega of g of n") iff there exist positive constants c and n0 such that f(n) ≥ c*g(n) for all n, n ≥ n0. □

Example 1.12 The function 3n + 2 = Ω(n) as 3n + 2 ≥ 3n for n ≥ 1 (the inequality holds for n ≥ 0, but the definition of Ω requires an n0 > 0). 3n + 3 = Ω(n) as 3n + 3 ≥ 3n for n ≥ 1. 100n + 6 = Ω(n) as 100n + 6 ≥ 100n for n ≥ 1. 10n² + 4n + 2 = Ω(n²) as 10n² + 4n + 2 ≥ n² for n ≥ 1. 6*2ⁿ + n² = Ω(2ⁿ) as 6*2ⁿ + n²
≥ 2ⁿ for n ≥ 1. Observe also that 3n + 3 = Ω(1), 10n² + 4n + 2 = Ω(n), 10n² + 4n + 2 = Ω(1), 6*2ⁿ + n² = Ω(n¹⁰⁰), 6*2ⁿ + n² = Ω(n⁵⁰), 6*2ⁿ + n² = Ω(n²), 6*2ⁿ + n² = Ω(n), and 6*2ⁿ + n² = Ω(1). □

As in the case of the big oh notation, there are several functions g(n) for which f(n) = Ω(g(n)). The function g(n) is only a lower bound on f(n). For the statement f(n) = Ω(g(n)) to be informative, g(n) should be as large a function of n as possible for which the statement f(n) = Ω(g(n)) is true. So, while we say that 3n + 3 = Ω(n) and 6*2ⁿ + n² = Ω(2ⁿ), we almost never say that 3n + 3 = Ω(1) or 6*2ⁿ + n² = Ω(1), even though both of these statements are correct. Theorem 1.3 is the analogue of Theorem 1.2 for the omega notation.

Theorem 1.3 If f(n) = a_m*n^m + ... + a_1*n + a_0 and a_m > 0, then f(n) = Ω(n^m).

Proof: Left as an exercise. □

Definition 1.6 [Theta] The function f(n) = Θ(g(n)) (read as "f of n is theta of g of n") iff there exist positive constants c1, c2, and n0 such that c1*g(n) ≤ f(n) ≤ c2*g(n) for all n, n ≥ n0. □

Example 1.13 The function 3n + 2 = Θ(n) as 3n + 2 ≥ 3n for all n ≥ 2 and 3n + 2 ≤ 4n for all n ≥ 2, so c1 = 3, c2 = 4, and n0 = 2. 3n + 3 = Θ(n), 10n² + 4n + 2 = Θ(n²), 6*2ⁿ + n² = Θ(2ⁿ), and 10 log n + 4 = Θ(log n). 3n + 2 ≠ Θ(1), 3n + 3 ≠ Θ(n²), 10n² + 4n + 2 ≠ Θ(n), 10n² + 4n + 2 ≠ Θ(1), 6*2ⁿ + n² ≠ Θ(n²), 6*2ⁿ + n² ≠ Θ(n¹⁰⁰), and 6*2ⁿ + n² ≠ Θ(1). □

The theta notation is more precise than both the big oh and omega notations. The function f(n) = Θ(g(n)) iff g(n) is both an upper and lower bound on f(n).

Notice that the coefficients in all of the g(n)'s used in the preceding three examples have been 1. This is in accordance with practice. We almost never find ourselves saying that 3n + 3 = O(3n), that 10 = O(100), that 10n² + 4n + 2 = Ω(4n²), that 6*2ⁿ + n² = Θ(6*2ⁿ), or that 6*2ⁿ + n² = Ω(4*2ⁿ), even though each of these statements is true.

Theorem 1.4 If f(n) = a_m*n^m + ... + a_1*n + a_0 and a_m > 0, then f(n) = Θ(n^m).

Proof: Left as an exercise. □

Definition 1.7 [Little "oh"] The function f(n) = o(g(n)) (read as "f of n is little oh of g of n") iff

    lim (n → ∞) f(n)/g(n) = 0.  □

Example 1.14 The function 3n + 2 = o(n²) since lim (n → ∞) (3n + 2)/n² = 0. 3n + 2 = o(n log n). 3n + 2 = o(n log log n). 6*2ⁿ + n² = o(3ⁿ). 6*2ⁿ + n² = o(2ⁿ log n). 3n + 2 ≠ o(n). 6*2ⁿ + n² ≠ o(2ⁿ). □

Analogous to o is the notation ω defined as follows.

Definition 1.8 [Little omega] The function f(n) = ω(g(n)) (read as "f of n is little omega of g of n") iff

    lim (n → ∞) g(n)/f(n) = 0.  □

Example 1.15 Let us reexamine the time complexity analyses of the previous section. For the algorithm Sum (Algorithm 1.6) we determined that t_Sum(n) = 2n + 3. So, t_Sum(n) = Θ(n). For Algorithm 1.7, t_RSum(n) = 2n + 2 = Θ(n). □

Although we might all see that the O, Ω, and Θ notations have been used correctly in the preceding paragraphs, we are still left with the question, Of what use are these notations if we have to first determine the step count exactly? The answer to this question is that the asymptotic complexity (i.e., the complexity in terms of O, Ω, and Θ) can be determined quite easily without determining the exact step count. This is usually done by first determining the asymptotic complexity of each statement (or group of statements) in the algorithm and then adding these complexities. Tables 1.4 through 1.6 do just this for Sum, RSum, and Add (Algorithms 1.6, 1.7, and 1.11).
Statement s/e ency | total steps T Algorithm Sum(a,n) [0 [— (0) 2{ 0 9(0) 3 os: 1 fa e(1) 4 fori:=1tondo |1 |n+1 Q(n) 5 s=stali}; 1 fa O(n) 6 | return 5; 1 |1 91) o |- (0) Total On) Table 1.4 Asymptotic complexity of Sum (Algorithm 1.6) Although the analyses of Tables 1.4 through 1.6 are carried out in terms of step counts, it is correct to interpret tp(n) = O(g(n)), tp(n) = A(g(n)), or tp(n) = O(g(n)) as a statement about the computing time of algorithm P. This is so because each step takes only @(1) time to execute. 1.3, PERFORMANCE ANALYSIS 33 { frequency total steps | Statement s/e n=0 n>0]}n=0 n>0 TT Algorithm RSum(a,n) 0 — 10 (0) 2 0 | | (0) | 3 if (2 <0) then 1 1 1 1 (1) | 4 return 0.0; 1 1 0 1 (0) 5 else return 6 RSum(a,n—1)+a{n]); |} 1+2| 0 1 mt) O(1+2) Lz 0 = 0 00) Total 2 O(1 +2) @ = trSum(r — 1) Table 1.5 Asymptotic complexity of RSum (Algorithm 1.7). [[Statement s/e | frequency | total steps T Algorithm Add(a,,¢,m,n) [0 | — O(0) ot 0 |= (0) 1 | O(m) O(m) 1 | (mn) — | O(mn) efi, i} = afi, j] + Of, Js | 1 | O(n) — | O(mn) o [- (0) ~ | Otmn) Table 1.6 Asymptotic complexity of Add (Algorithm 1.11) 34 CHAPTER 1. INTRODUCTION After you have had some experience using the table method, you will be in a position to arrive at the asymptotic complexity of an algorithm by taking a more global approach. We elaborate on this method in the following examples. Example 1.16 [Permutation generator] Consider Perm (Algorithm 1.4). When k =n, we see that the time taken is O(n). When k 1, a Example 1.17 [Magic square] The next example we consider is a problem from recreational mathematics. A magic square is an n X n matrix of the integers 1 to n? such that the sum of every row, column, and diagonal is the . Figure 1.2 gives an example magic square for the c: In this example, the common sum is 65. sam Figure 1.2 Example magic square H. Coxeter has given the following simple rule for generating a magic square when n is odd: Start with 1 in the middle of the top row; then go up and left, assigning numbers in increasing order to empty squares; if you fall off the square imagine the same square as tiling the plane and continue; if a square is occupied, move down instead and continue. 1.3. PERFORMANCE ANALYSIS 35, agic square of Figure 1.2 was formed using this rule. Algorithm 1.15 ting an m x n magic square for the case in which n is odd. This results from Coxeter’s rule. ‘The magic square is represented using a two-dimensional array having n rows and n columns. For this application it is convenient to number the rows (aud columns) from 0 to n—1 rather than from 1 to n. Thus, when the algorithmn “falls off the square,” the mod operator sets i and/or j back to Oorn=1 The time to initialize and output the square is ©(n?). The third for loop (in which key ranges over 2 through n?) is iterated n? — 1 times and each iteration takes (1) time. So, this for loop takes ©(n?) time. Hence the overall time complexity of Magic is @(n?). Since there are n? positions in which the algorithm must place a number, we see that @(n”) is the best bound an algorithm for the magic square problem can have. a Example 1.18 [Computing 2"] Our final example is to compute 2” for any real number « and integer n > 0. A naive algorithm for solving this problem is to perform n — 1 multiplications as follows: pou ; 1 to n—1 do power := power * x; Samet This algorithm takes ©(n) time. A better approach is to employ the “re peated squaring” trick. 
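A minimal C++ sketch of the trick appears below, ahead of the fuller discussion that follows. It is a compact variant of the same idea rather than a line-by-line transcription of the pseudocode that comes later; the function name fast_power, the double type for x, and the unsigned exponent are illustrative assumptions.

    // Computes x^n with Theta(log n) multiplications by repeated squaring.
    // At the top of each iteration, result * base^n equals the original x^n.
    double fast_power(double x, unsigned int n) {
        double result = 1.0, base = x;
        while (n > 0) {
            if (n & 1) result *= base;  // this bit of the exponent is set
            base *= base;               // square once per bit of n
            n >>= 1;
        }
        return result;
    }

The loop body executes once per bit of n, so the number of multiplications is proportional to log n rather than to n.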
Consider the special case in which n is an integral power of 2 (that is, in which n equals 2* for some integer k). The following algorithm computes <”, power for to k do power := power?; The value of power after q iterations of the for loop is ©". Therefore, this al- gorithm takes only @(k) = ©(logn) time, which is a significant improvement over the run time of the first algorithm. Can the same algorithm be used when n is not an integral power of 2? Fortunately, the answer is yes. Let bxbj-1-- bibp be the binary representa- tion of the integer n, This means that n = Sho bg2%. Now, = (2x) + (2) (a) a eee (2) Also observe that by is nothing but n mod 2 and that |n/2J is byby—1 ---bi in binary form, These observations lead us to Exponentiate (Algorithm 1.16) for computing 2”. 36 CHAPTER 1. INTRODUCTION 1 Algorithm Magic(n) 2 // Create a magic square of size n, n being odd 3 { 4 if ((n mod 2) = 0) then 5 6 write (‘'n is even"); return; 7 8 else 9 { 10 for i:= 0ton—1do // Initialize square to zero. ll for j ton—1do square(i,j] = 0; 12 square[O,(n — 1)/2] = 1; // Middle of first row B // (i,j) is the current position. 4 j= (n—1)/3 15 for key := 2 to n? do 16 { 17 // Move up and left. The next two if statements. 18 7/ may be replaced by the mod_ operator if 19 ]/ -1 mod n has the value n — 20 if (i> 1) then k 21 if (j > 1) then: ; n-1; 22 if (square{k,[] > 1) then i+1) mod n; 23 else // square(k,!] is empty. 24 { 25 dss kyj: 26 27 squarefi, j] = key; 28 } 29 // Output the magic square. 30 for i:=0 to n—1do 31 for j = 0 to n—1 do write (square(i, j]); 32 ry) Algorithm 1.15 Magic square 1.3. PERFORMANCE ANALYSIS 37 1 Algorithm Exponentiate(«,n) 2 // Return 2” for an integer n > 0. 3 4 m= nj power = 1; an 5 while (m > 0) do 6 7 while ((m mod 2) = 0) do 8 9 m= [m/2I; 10 } ll mi := m — 1; power := power * 2} 12 13 return power; Ty Algorithm 1.16 Computation of 2” Proving the correctness of this algorithm is left as an exercise. The vari- able m starts with the value of n, and after every iteration of the innermost while loop (line 7), its value decreases by a factor of at least 2. Thus there will be only ©(log n) iterations of the while loop of line 7. Each such itera~ tion takes @(1) time. Whenever control exits from the innermost while loop, the value of m is odd and the instructions m :=_m — 1; power := power * z} xecuted once. After this execution, since m becomes even, either the entered again or the outermost while loop (line innermost while loop i 5) is exited (in case m = 0). Therefore the instructions m := m — 1; power := power * z; can only be executed O(logn) times. In summary, the overall run time of Exponentiate is O(log n). a 1.3.4 Practical Complexities ‘We have seen that the time complexity of an algorithm is generally some function of the instance cha . This function is very useful in de- termining how the time requir ments ‘vary as the instance characteristics change, The complexity function can also be used to compare two algo- rithms P and Q that perform the same task. Assume that algorithm P has complexity @(n) and algorithm Q has complexity O(n”). We can assert that algorithin P is faster than algorithm Q for sufficiently large n. To see the validity of this assertion, observe that the computing time of P is bounded 38 CHAPTER 1. INTRODUCTION from above by cn for some constant ¢ and for all n,n > ny, whereas that of Q is bounded from below by dn? for some constant d and all n, n > ng. Since cn < dn? for n > c/d, algorithm P is faster than algorithm @ whenever n > max{n1,n2, ¢/d}. 
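A quick numeric illustration of this break-even behavior follows; the constants c = 100 and d = 1 are made up purely for the sake of the example.

    #include <iostream>

    int main() {
        // Hypothetical bounds: time_P(n) <= c*n and time_Q(n) >= d*n*n.
        const double c = 100.0, d = 1.0;
        for (int n = 50; n <= 150; n += 25) {
            double p = c * n, q = d * n * n;
            std::cout << "n=" << n << "  cn=" << p << "  dn^2=" << q
                      << (p < q ? "  (P faster)" : "  (Q faster or equal)") << "\n";
        }
        std::cout << "break-even near n = c/d = " << c / d << "\n";
    }

For n below c/d = 100 the quadratic algorithm is the faster one here, which is exactly the "sufficiently large" caveat taken up next.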
You should always be cautiously aware of the presence of the phrase “suf- ficiently large” in an assertion like that of the preceding discussion. When deciding which of the two algorithms to use, you must know whether the n you are dealing with is in fact, sufficiently large. If algorithm P runs in n milliseconds, whereas algorithm @ runs in n? milliseconds, and if you always have n < 10°, then, other factors being equal, algorithm Q is the one to use To get a feel for how the various functions grow with n, you are advised to study Table 1.7 and Figure 1.3 very closely. It is evident from Table 1.7 and Figure 1.3 that the function 2" grows very rapidly with n. In fact, if an algorithm needs 2" steps for execution, then when n = 40, the number of steps needed is approximately 1.1 « 10!2, On a computer performing one billion steps per second, this would require about 18.3 minutes. If n = 50, the same algorithm would run for about 13 days on this computer. When n = 60, about 310.56 years are required to execute the algorithm and when n (00, about 4+ 10! years are needed. So, we may conclude that the utility of algorithms with exponential complexity is limited to small n (typically n< 40). logn | n[nlogn| 1 7 ea oy 1 0 T T 2 1| 2 2 4 8 4 2] 4 8} 16 64 16 3| 8 24) 64] 512 256 4| 16 64) 256] 4,096 65,536 5 | 32| 160 | 1,024 | 32,768 | 4,294,967,296 Table 1.7 Function values Algorithms that have a complexity that is a polynomial of high degree are also of limited utility. For example, if an algorithm needs no steps, then using our 1-billion-steps-per-second computer, we need 10 seconds when n. 0, 3171 years when n = 100, and 3.1710" years when n = 1000. If the algorithm’s complexity had been n* steps instead, then we would need one second when r = 1000, 110.67 minutes when n = 10,000, and 11.57 days when n = 100,000. 1.3. PERFORMANCE ANALYSIS 39 Figure 1.3 Plot of function values 40 CHAPTER 1. INTRODUCTION Table 1.8 gives the time needed by a one-billion-steps-per-second com- puter to execute an algorithm of complexity f(n) instructions. You should note that currently only the fastest computers can execute about 1 billion instructions per second. From a practical standpoint, it is evident that for reasonably large n (say n> 100), only algorithms of small complexity (such as n, nlogn, n, and n°) are feasible. Further, this is the case even if you could build a computer capable of executing 10!” instructions per second, In this case, the computing times of Table 1.8 would decrease by a factor of 1000. Now, when n = 100, it would take 3.17 years to execute n!° instruc- tions and 4 « 10" years to execute 2” instructions. Tine for [eu ngisations on 10° meine computer = aaa] pale we | Fay LLiny fayeaT ‘ Ta iar Ta a | oaks tei pia e esate S| othe Mis fay | 100 Tae 6 ns iu | ‘ime 16,000 | 10a 130 vom | r6.6r min | “hist yoo,e00 | 100 | 1.60 me Tos | thera’ | sin yr 000,000 | ‘rns | sors | ror in | aurige | sizto¥ ye Table 1.8 Times on a 1-billion-steps-per-second computer 1.3.5 Performance Measurement Performance measurement is concerned with obtaining the space and time requirements of a particular algorithm. These quantities depend on the compiler and options used as well as on the computer on which the algorithm is run. 
Unless otherwise stated, all performance values provided in this book are obtained using the Gnu C++ compiler, the default compiler options, and the Spare 10/30 computer workstation: In keeping with the discussion of the preceding section, we do not concern ourselves with the space and time needed for compilation. We justify this by the assumption that each program (after it has been fully debugged) is compiled once and then executed several times. Certainly, the space and time needed for compilation are important during program testing, when more time is spent on this task than in running the compiled code. We do not consider measuring the run-time space requirements of a pro- gram. Rather, we focus on measuring the computing time of a program. To obtain the computing (or run) time of a program, we need a clocking procedure. We assume the existence of a program GetTime() that returns the current time in milliseconds. 1,3. PERFORMANCE ANALYSIS 41 Suppose we wish to measure the worst-case performance of the sequential search algorithm (Algorithm 1.17). Before we can do this, we need to (1) decide on the values of n for which the times are to be obtained and (2) determine, for each of the above values of n, the data that exhibit the worst- case behavior. 1 Algorithm SeqSearch(a, x,n) 2 // Search for « in a[1: n]. a(0] is used as additional spa 3 4 isn; a[0] == 2; 5 while (a[i] # 2) doi 6 return i; 7} Algorithm 1.17 Sequential search ased on the amount of timing we wish to perform and also on what we expect to do with the times once re obtained. Assume that for Algorithm 1,17, our intent is simply to predict how long it will take, in the worst case, to sea for x, given the size n of a. An asymptotic analysis reveals that this time is @(n). So, we expect a plot of the times to be a straight line. Theoretically, if we know the times for any two values of n, the straight line is determined, and we can obtain the time for all other values of n from this line. In pra we need the times for more than two values of n, This is so for the following reasons: 1. Asymptotic analysis tells us the behavior only for sufficiently large values of n. For smaller values of n, the run time may not follow the asymptotic curve. To determine the point beyond which the asymp- totic curve is followed, we need to examine the times for several values. ofn. 2. Even in the region where the asymptotic behavior is exhibited, the times may not lie exactly on the predicted curve (straight line in the case of Algorithm 1.17) because of the effects of low-order terms that are discarded in the asymptotic analysis. For instance, an al- gorithm with asymptotic complexity O(n) can have time complexity cin + logn-+eg or, for that matter, any other function of n in which the highest-order term is cn for some constant c), 1 > 0. It is reasonable to expect that the asymptotic behavior of Algorithm 1.17 begins for some n that is smaller than 100. So, for n > 100, we obtain the 42 CHAPTER 1. INTRODUCTION run time for just a few values. A reasonable choice is n = 200, 300, 400, ... , 1000. There is nothing magical about this choice of values, We can just as well use n = 500,1,000,1,500,...,10,000 or n = 512, 1,024, 2,048, 25. It costs us more in terms of computer time to use the latter choi and we probably do not get any better information about the run time of Algorithm 1.17 using these choices. For n in the range (0, 100] we carry out a more-refined measuremer we are not quite sure where the asymptotic behavior begins. 
Of course, if our measurements show that the straight-line behavior does not begin in this range, we have to perform a more-detailed measurement in the range (100, 200], and so on, until the onset of this behavior is detected. ‘Times in the range (0, 100] are obtained in steps of 10 beginning at n = 0. Algorithm 1.17 exhibits its worst-case behavior when z is chosen such that. it is not one of the a{i]’s. For definiteness, we set ali] 121 (b) D% n(n + 1)(2n+1)/6, n>1 () Dhoa! = @!-yf(e-1), #41, n>0 Determine the frequency counts for all statements in the following two algorithm segments: 1 3 \ for i:=1tondo 2 while (i n); 8 t=; 9 while (i < |n/2|) do 10 ul =i+]; 12 13 } Algorithm 1,23 Example algorithm Algorithm Transpose(a,n) { for i:= 1 to n—1do for j :=i+1tondo if 5 aff, J] = alj, ah; al Algorithm 1.24 Matrix transpose 1.3. PERFORMANCE ANALYSIS 51 ltondo j:=1tondo efi, j] := efi, J] + aft, A] * b[k, 3]; Algorithm 1.25 Matrix multiplication 7. (a) Do Exercise 4 for Algorithm 1.26. This algorithm multiplies two: matrices a and 6, where a is an m x n matrix and b is an nx p matrix. 1 Algorithm Mult(a,, c,m,n,p) 2 { 3 for i:= 1 to mdo 4 for j = 1 to pdo 5 6 efi, i] = 05 7 for k := 1 to n do 8 = efi, i] + aff, kl + Ok, js 9 } Ww } Algorithm 1.26 Matrix multiplication (b) Under what conditions is it profitable to interchange the two out- ermost for loops? 8. Show that the following equalities are corr: (a) bn? -6n = O(n?) (b) nt = O(n") (c) 2n?2" 4+ nlogn = @(n?2") (@) Dey? = OW & . Obtain worst-ci CHAPTER 1. INTRODUCTION (ce) Doi = O(n4). (f) n2" +642" = O(n") (g) n}+108n? = @(n') (h) 6n3/(logn-+1) = O(n) (i) nhl + nlogn = O(n!) (i) n**4 nF logn = @(n***) for all fixed k and ¢, k > 0 and € > 0 (k) 10n® + 15n4 + 100n?2" = O(100n?2") (1) 338n3 + 4n? = Q(n?) (m) 33n3 +4n? = Q(n3) . Show that the following equalities are incorrect: (a) 10n? +9 = O(n) (b) n?logn = O(n?) (c) n?/logn = O(n?) (a) 82" + 6n23" = O(n*2") . Prove Theorems 1.3 and 1.4. . Analyze the computing time of SelectionSort (Algorithm 1.2). se run times for SelectionSort (Algorithm 1.2). Do this for suitable values of n in the range [0, 100]. Your report must include a plan for the experiment as well as the measured times. These times are to be provided both in a table and as a graph . Consider the algorithm Add (Algorithm 1.11). (a) Obtain run times for n = 1,10,20,... , 100. (b) Plot the times obtained in part (a) . Do the previous exercise for matrix multiplication (Algorithm 1.26). . A complex-valued matrix X is represented by a pair of matrices (A, B), where A and B contain real values. Write an algorithm that computes the product of two complex-valued matrices (A, B) and (C, D), where (A,B) « (C,D) = (A+4B) * (C+ iD) = (AC — BD) + i(AD + BC) Determine the number of additions and multiplications if the matrices arealln x n. 14, RANDOMIZED ALGORITHMS 53 1.4 RANDOMIZED ALGORITHMS 1.4.1 Basics of Probability Theory Probability theory has the goal of characterizing the outcomes of natural or conceptual “experiments.” Examples of such expe sing a coin ten times, rolling a die three times, playing a lottery, gambling, picking a ball from an urn containing white and red balls, and so on. ble outcome of an experiment is called a sample point and the le outcomes is known as the sample space S is text ich a sample space is called a discrete sample An event is a subset of the sample space S. If the sample space consists of n sample points, then there are 2” possible events. 
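These definitions can be checked by brute force for a small experiment. The C++ sketch below enumerates the sample space of tossing three coins and counts the sample points falling in one particular event; the bit-pattern encoding and the chosen event (exactly two heads) are illustrative choices, and the example that follows works through the same experiment by hand.

    #include <iostream>

    int main() {
        int points_in_event = 0;
        // The 2^3 = 8 outcomes of three coin tosses, encoded as 3-bit
        // patterns (bit i = 1 means the ith coin came up heads).
        for (int outcome = 0; outcome < 8; outcome++) {
            int heads = ((outcome >> 0) & 1) + ((outcome >> 1) & 1)
                      + ((outcome >> 2) & 1);
            if (heads == 2) points_in_event++;   // event: exactly two heads
        }
        std::cout << "8 sample points, " << points_in_event
                  << " of them in the event\n";  // prints 3
    }

Since this sample space has 8 points, there are 2^8 = 256 possible events, one for each subset of the sample points.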
Example 1.19 [Tossing three coins] When a coin is tossed, there are two possible outcomes: heads (H) and tails (T). Consider the experiment of throwing three coins. There are eight possible outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT. Each such outcome is a sample point. The sets {HHT, HTT, TTT}, {HHH, TTT}, and { } are three possible events. The third event has no sample points and is the empty set. For this experiment there are 2⁸ possible events. □

Definition 1.9 [Probability] The probability of an event E is defined to be |E|/|S|, where S is the sample space. □

Example 1.20 [Tossing three coins] The probability of the event {HHT, HTT, TTT} is 3/8. The probability of the event {HHH, TTT} is 2/8 = 1/4 and that of the event { } is zero. □

Note that the probability of S, the sample space, is 1.

Example 1.21 [Rolling two dice] Let us look at the experiment of rolling two (six-faced) dice. There are 36 possible outcomes, some of which are (1,1), (1,2), (1,3), and so on. What is the probability that the sum of the two faces is 10? The event that the sum is 10 consists of the sample points (4,6), (5,5), and (6,4). Therefore, the probability of this event is 3/36 = 1/12. □

Definition 1.10 [Mutual exclusion] Two events E1 and E2 are said to be mutually exclusive if they do not have any common sample points, that is, if E1 ∩ E2 = ∅. □

Example 1.22 [Tossing three coins] When we toss three coins, let E1 be the event that there are two H's and let E2 be the event that there are at least two T's. These two events are mutually exclusive since they have no common sample points. On the other hand, if E3 is defined to be the event that there is at least one T, then E1 and E3 will not be mutually exclusive since they will have THH, HTH, and HHT as common sample points. □

The probability of event E is denoted as Prob.[E]. The complement of E, denoted E̅, is defined to be S − E. If E1 and E2 are two events, the probability of E1 or E2 or both happening is denoted as Prob.[E1 ∪ E2]. The probability of both E1 and E2 occurring at the same time is denoted as Prob.[E1 ∩ E2]. The corresponding event is E1 ∩ E2.

Theorem 1.5

1. Prob.[E̅] = 1 − Prob.[E].
2. Prob.[E1 ∪ E2] = Prob.[E1] + Prob.[E2] − Prob.[E1 ∩ E2] ≤ Prob.[E1] + Prob.[E2]. □

Definition 1.11 [Conditional probability] Let E1 and E2 be any two events of an experiment. The conditional probability of E1 given E2, denoted by Prob.[E1|E2], is defined as Prob.[E1 ∩ E2]/Prob.[E2]. □

Example 1.23 [Tossing four coins] Consider the experiment of tossing four coins. Let E1 be the event that the number of H's is even and let E2 be the event that there is at least one H. Then, E2 is the complement of the event that there are no H's. The probability of no H's is 1/16. Therefore, Prob.[E2] = 1 − 1/16 = 15/16. Prob.[E1 ∩ E2] is 7/16, since the event E1 ∩ E2 includes the sample points HHHH, HHTT, HTHT, HTTH, THHT, THTH, and TTHH. Thus, Prob.[E1|E2] is (7/16)/(15/16) = 7/15. □

Definition 1.12 [Independence] Two events E1 and E2 are said to be independent if Prob.[E1 ∩ E2] = Prob.[E1] * Prob.[E2]. □

Example 1.24 [Rolling a die twice] Intuitively, we say two events E1 and E2 are independent if the probability of one event happening is in no way affected by the occurrence of the other event. In other words, if Prob.[E1|E2] = Prob.[E1], these two events are independent. Suppose we roll a die twice. What is the probability that the outcome of the second roll is 5 (call this event E1), given that the outcome of the first roll is 4 (call this event E2)? The answer is 1/6 no matter what the outcome of the first roll is.
In this case E, and Ep are independent. Therefore, Prob.[E, 0 E] = 1.4, RANDOMIZED ALGORITHMS a »in is flipped 100 times what Example 1.25 (Flipping a coin 100 times] Ifa is the probability that all of the outcomes are tails? The probability that the first outcome is T is 4. Since the outcome of the second flip is independent of the outcome of the first flip, the probability that the first two ou can be obtained by multiplying the corresponding probabilities to . Extending the argument to all 100 outcomes, we conclude that the 100 sis (+). In this case we say the outcomes probability of obt of the 100 coin fli ning 100 s are mutually independent. oO Definition 1.13 [Random variable] Let $ be the sample space of an exper- iment. A random variable on § is a function that maps the elements of S to the set of real numbers. For any sample point s € S, X(s) denotes the image of s under this mapping. If the range of X, that is, the set of values X can take, is finite, we say X is discrete Let the range of a discrete random variable X be {r,r2,.--s7m}» Then, Prob.[X = ri], for any i, is defined to be the the number of sample points whose image is r; divided by the number of sample points in S. In this text we are concerned mostly with discrete random variables. Oo Example 1.26 We flip a coin four times. The sample space consists of 2* sample points. We can define a random variable X on $ as the number of heads in the coin flips. For this random variable, then, X(HTHH) X(HHHA) =4, and so on. The possible values that X can take 0, 1,2. and 4. Thus X is discrete. Prob.[X = 0] is +5, since the only sample point whose image is 0 is TTTT. Prob[X = 1] is 7, since the four sample points HITT, THTT, TTHT, and TITH have 1'as their image. a Definition 1.14 [Expected value] If the sample space of an experiment is S = {81,89,....5n}, the expected value or the mean of any random variable X is defined to be D7; Prob.[s)] *X(s:) = 1 Dy X(s2)- a Example 1.27 [Coin tosses] The sample space corresponding to the exper- iment of tossing three coins is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. If X is the number of heads in the coin flips, then the expected value of X is 1(3+2+24+1+2+14+1+40)=15 Oo Definition 1.15 [Probability distribution] Let X be a discrete random vari- able defined over the sample space S. Let {r1,r2,-.-,?m} be its range. Then. the probability distribution of X is the sequence Prob[X = ri], Prob{X = rg], ..., Prob{X = rm]. Notice that 27", Prob[X =r] = 1. o 56 CHAPTER 1. INTRODUCTION Example 1.28 [Coin tosses] If a coin is flipped three times and X is the number of heads, then X can take on four values, 0, 1, 2, and 3. The probability distribution of X is given by Prob.[X = 0] = ¢, Prob.| y= 3, Prob[X =2] = 3, and Prob[X = 3] = a Definition 1.16 [Binomial distribution] A Bernoulli trial is an experiment that has two possible outcomes, namely, success and failure. The probability of success is p. Consider the experiment of conducting the Bernoulli trial n times. This experiment has a sample space $ with 2" sample points. Let X be a random variable on S defined to be the numbers of suct in the n trials. The variable X is said to have a binomial distribution with parameters (n, p). The expected value of X is np. Also, Prob.[X =i] = (;)ea — pr i o In several applications, it is necessary to estimate the probabilities at the tail ends of probability distributions. One such estimate is provided by the following lemma. 
Lemma 1.1 [Markov's inequality] If X is any nonnegative random variable whose mean is jz, then Prob[X >a] < aIr o Example 1.29 Let : be the mean of a random variable X. We can use Markov’s lemma (also called Markov’s inequality) to make the following statement: “The probability that the value of X exceeds 2y is < 4." Con- sider the example: if we toss a coin 1000 times, what is the probability that the number of heads is > 600? If X is the number of heads in 1000 tosses, then, the expected value of X, E[X], is 500. Applying Markov’s with « = 600 and j= 500, we infer that P[X > 600] < § ‘Though Markov’s inequality can be applied to any nonnegative random variable, it is rather weak. We can obtain tighter bounds for a number of important distributions including the binomial distribution, These bounds are due to Chernoff. Chernoff bounds as applied to the binomial distribution are employed in this text to analyze randomized algorithms. 1.4, RANDOMIZED ALGORITHMS 57 Lemma 1.2 [Chernoff bounds] If X is a binomial with parameters (n, p), and m > np is an integer, then Prob(X >m) < (2) elm—np), (1.1) Also, Prob.(X < |(1—e)pn]) < eenri?) (1.2) and Prob.(X > [(1+e)np]) < e-en/3) (1.3) for allO 600. We can use Equation 1.3 to estimate this probability. ‘The value for ¢ here is 0.2. Also, n = 1000 and p = 4. Equation 1.3 now becomes PLX > 600] < ef-(0-2)°(800/3)] = ¢-20/3 << 9.901273 This estimate is more precise than that given by Markov’s inequality. 0 1.4.2. Randomized Algorithms: An Informal Description A randomized algorithm is one that makes use of a randomizer (such as a random number generator). Some of the decisions made in the algorithm pend on the output of the randomizer. Since the output of any random- izer might differ in an unpredictable way from run to run, the output of a randomized algorithm could also differ from run to run for the same input. The execution time of a randomized algorithm could also vary from run to run for the same input. Randomized algorithms can be categorized into two classes: The first is algorithms that always produce the same (correct) output for the same input. These are called Las Vegas algorithms. The execution time of a Las Vegas algorithm depends on the output of the randomizer. If we are lucky, the algorithm might terminate fast, and if not, it might run for a longer period of time. In general the execution time of a Las Vegas algorithm is characte as a random variable (see Section 1.4.1 for a definition). The second is algorithms whose outputs might differ from run to run (for the same input). These are called Monte Carlo algorithms. Consider any problem for which there are only two possible answers, say, yes and no. If a Monte Carlo algorithm is employed to solve such a problem, then the algorithm might give incorrect answers depending on the output of the randomizer. We require that the probability of an incorrect answer from a Monte Carlo algorithm be low. ‘Typically, for a fixed input, a Monte Carlo algorithm does not display 58 CHAPTER 1. INTRODUCTION much variation in execution time between runs, whereas in the case of a Las Vegas algorithm this variation is significant. We can think of a randomized algorithm with one possible randomizer output to be different from the same algorithm with a different, possible randomizer output. Therefore, a randomized algorithm can be viewed as a family of algorithms. For a given input, some of the algorithms in this family may run for indefinitely long periods of time (or may give incorrect answers). 
The objective in the design of a randomized algorithm is to ensure that the number of such bad algorithms in the family is only a small fraction of the total number of algorithms. If for any input we can show that at least 1 —« (c being very close to 0) fraction of algorithms in the family will run quickly (respectively give the correct answer) on that input, then clearly, a random algorithm in the family will run quickly (or output the correct answer) on any input with probability > 1 —e. In this case we say that this family of algorithms (or this randomized algorithm) runs quickly (respectively gives the correct answer) with probability at least 1 —e, where e is called the error probability, Definition 1.17 [The O()] Just like the O() notation is used to characterize the run times of non randomized algorithms, O() is used for characterizing the run times of Las Vegas algorithms. We say a Las Vegas algorithm has a resource (time, space, and so on.) bound of O(g(n)) if there exists a constant csuch that the amount of resource used by the algorithm (on any input of size n) is no more than cag(n) with probability > 1 — 2;. We shall refer to these bounds as high probability bounds. Similar definitions apply also to such functions as 6(), 0, 6(), ete. 0 Definition 1.18 [High probability] By high probability we mean a probability of > 1—n~ for any fixed a. We call a the probability parameter. a As mentioned above, the run time T of any Las Vegas algorithm is typi- cally characterized as a random variable over a sample space S. The sample points of $ are all possible outcomes for the randomizer used in the algo- rithm. Though it is desirable to obtain the distribution of T, often this is a challenging and unnecessary task. The expected value of T often suffi good indicator of the run time. We can do better than obtaining the mean of T but short of computing the exact distribution by obtaining the high probability bounds. The high probability bounds of our interest are of the form “With high probability the value of T will not exceed To,” for some appropriate Ty. Several results from probability theory can be employed to obtain high probability bounds on any random variable. Two of the more useful such results are Markov’s inequality and Chernoff bounds. 1.4. RANDOMIZED ALGORITHMS 59 Next we give two examples of randomized algorithms. The first is of the Las Vegas type and the second is of the Monte Carlo type. Other examples are presented throughout the text. We say a Monte Carlo (Las Vegas) al- gorithm has failed if it does not give a correct answer (terminate within a specilied amount of time). 1.4.3 Identifying the Repeated Element Consider an array a[] of m numbers that has 3 distinct elements and > copies of another element. The problem is to identify the repeated element. Any deterministic algorithm for solving this problem will need at least +2 time steps in the worst case. This fact can be argued as follows: Gonsider an adversary who has perfect knowledge about the algorithm used and who is in charge of selecting the input for the algorithm. Such an adversary can make sure that the first 3 + 1 elements examined by the algorithm are all distinct. Even after having looked at 5 + 1 elements, the algorithm will not be in a position to infer the repeated element. It will have to examine at least # + 2 elements and hence take at least +2 time steps. In contrast there is a simple and elegant randomized Las Vegas algorithm that takes only O(logn) time. 
Tt randomily picks two array elements and checks whether they come from two different cells and have the same value. If they do, the repeated element has been found. If not, this basic step of sampling is repeated as many times as it takes to identify the repeated element. Tn this algorithm, the sampling performed is with repetitions; that is, the first and second elements are randomly picked from out of the n elements (each element being equally likely to be picked). Thus there is a probability (equal to 1) that the same array element is picked each time. If we just check for the equality of the two elements picked, our answer might be incorrect (in case the algorithm picked the same array index each time). Therefore, it ential to make sure that the two array indices picked are different and the two array cells contain the same value. is es Th array algorithm is given in Algorithm 1.27. The algorithm returns the index of one of the copies of the repeated element. Now we prove that the run time of the above algorithm is O(logn). Any iteration of the while loop will be successful in identifying the repeated number if # is any one the ® array indices corresponding to the repeated clement and j is any one of the sae % indices other than i. In other words, the probability that the algorithm quits in any given iteration of the while loop is P = 22-0 | which is > t for all n > 10. This implies that the probability that the algorithm does not quit in a given iteration is < 60 CHAPTER 1. INTRODUCTION 1 RepeatedElement(a, n) 2 // Finds the repeated element from a[1 : n]. 3 4 while (true) do 5 { 6 i := Random() mod n +1; j := Random() mod n+1; 7 // ‘and j are random numbers in the range (1,7). 8 if (i ¥ j) and (a(i] = a{j])) then return #; 9 10 } Algorithm 1.27 Identifying the repeated array number Therefore, the probability that the algorithm does not quit in 10 iterations 10 is < (4) < 1074. So, Algorithm 1.27 will terminate in 10 iterations or less with probability > .8926. The probability that the algorithm does not . : e100 ° : terminate in 100 iterations is < (#)" < 2.04+10-!. That is, almost certainly the algorithm will quit in 100 iterations or less. If n equals 2+ 10°, for example, any deterministic algorithm will have to spend at least one million time steps, as opposed to the 100 iterations of Algorithm 1.27! In general, the probability that the algorithm does not quit in the first ca log n (c is a constant to be fixed) iterations is < (4/5)°a!96" = ymca (5/4) which will be < n°@ if we pick ¢ > Geter Thus the algorithm terminates in —-fe7q70logn iterations or less with probability > 1—n7®. Since each iteration of the while loop takes O(1) time, the run time of the algorithm is O(logn). Note that this algorithm, if it terminates, will always output the correct answer and hence is of the Las Vegas type. The above analysis shows that the algorithm will terminate quickly with high probability. The same problem of inferring the repeated element can be solved using many deterministic algorithms. For example, sorting the array is one But sorting takes 0(n logn) time (proved in Chapter 10). An alternative is to partition the array into [] parts, where each part (possibly except for one part) has three array el and to search the individual parts for 1.4. RANDOMIZED ALGORITHMS 61 the repeated element. At least one of the parts will have two copies of the repeated element. (Prove this!) 
The run time of this algorithm is Q(n) 14.4 Primality Testing Any integer greater than one is said to be a prime if its only divisors are 1 and the integer itself. By convention, we take 1 to be a nonprime. ‘Then 2,3,5.7, 11, and 13 are the first six primes. Given an integer 1, the problem of deciding whether n is a prime is known as primality testing. It has a number of applications including cryptology. Ifa number n is composite (i.e., nonprime), it must have a divisor < | /nJ. This observation leads to the following simple algorithm for primality testing Consider each number £ in the interval [2, | /7]] and check whether £ divide: n. If none of these numbers divides n, then n. is prime; otherwise it is composite. Assuming that it takes ©(1) time to determine whether one integer divides another, the naive primality testing algorithm has a run time of O(\/n). ‘The input size for this problem is [(logn + 1)], since n can be represented in binary form with these many bits. Thus the run time of this simple algorithm is exponential in the input size (notice that Yn = 22!°8”) We can devise a Monte Carlo randomized algorithm for primality testing that runs in time O((logn)*). The output of this algorithm is correct with high probability. If the input is prime, the algorithm never gives an incorrect answer. However, if the input number is composite (ie., nonprime), then there is a small probability that the answer may be incorrect. Algorithms of this kind are said to have one-sided error. Before presenting further details, we list two theorems from number the- ory that will serve as the backbone of the algorithm. The proofs of these theorems can be found in the references supplied at the end of this chapter. Theorem 1.6 [Fermat] If n is prime, then a"~' = 1 (mod n) for any in- teger a 1 — ne, 1 Prime0(n, a) 2 // Returns true if n is a prime and false otherwise. 3 // cis the probability parameter. 4 { 5 -h 6 1 to large do // Specify large. 7 8 mi=gy=ly 9 a:= Random() mod q +1; 10 // Choose a random number in the range [1,n — 1]. ll Z=a 12 // Compute a"! mod n. 13. while (m > 0) do 44 15 while (m mod 2 = 0) do 16 { 7 = z* mod n; m:= |m/2]; 18 } 19 m:=m—1; y:=(y*2) mod n; 20 21 if (y # 1) then return false; 22 // ta"! mod n is not 1, n is not a prime. 23 } 24 return true; 25 + Algorithm 1.28 Primality testing: first attempt If the input is prime, Algorithm 1.28 will never output an incorrect an- swer. If n is composite, will Fermat’s equation never be satisfied for any a less than n and greater than one? If so, the above algorithm has to examine just one a before coming up with the correct answer. Unfortunately, the 1.4. RANDOMIZED ALGORITHMS 63 answer to this question is no. Even if n is composite, Fermat’s equation may be satisfied depending on the a chosen. Is it the case that for every n (that is composite) there will be some nonzero constant fraction of a's less than n that will not satisfy Fermat’s equation? If the answer is yes and if the above algorithm tries a sufficiently large number of a’s, there is a high probability that at least one a violating Fermat's equation will be found and hence the correct answer be output. Here again, the answer is no. There are composite numbers (known as Carmichael numbers) for which every a that is less than and relatively prime to n will satisfy Fermat’s equation. (The number of a’s that do not satisfy Fermat's equation need not be a constant fraction.) The numbers 561 and 1105 are examples of Carmichael numbers. 
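A compact C++ rendering of the Fermat check that Prime0 (Algorithm 1.28) performs for a single randomly chosen a is sketched below; the helper name mod_pow and the use of 64-bit arithmetic (adequate only for moderate n) are illustrative assumptions.

    #include <cstdint>

    // Computes (a^e) mod n by repeated squaring; assumes n fits in 32 bits
    // so that the intermediate products fit in 64 bits.
    std::uint64_t mod_pow(std::uint64_t a, std::uint64_t e, std::uint64_t n) {
        std::uint64_t result = 1 % n;
        a %= n;
        while (e > 0) {
            if (e & 1) result = (result * a) % n;
            a = (a * a) % n;
            e >>= 1;
        }
        return result;
    }

    // One Fermat trial: returns false only if a witnesses that n is composite.
    bool fermat_trial(std::uint64_t n, std::uint64_t a) {
        return mod_pow(a, n - 1, n) == 1;
    }

As noted above, a composite n can still pass this check for an unlucky choice of a (and, for Carmichael numbers, for every a relatively prime to n), which is the weakness the modification described next addresses.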
Fortunately, a slight modification of the above algorithm takes care of these problems. The modified primality testing algorithm (also known Miller-Rabin’s algorithm) is the same as PrimeO (Algorithm 1.28) except that within the body of Prime0, we also look for nontrivial square roots of n. The modified version is given in Algorithm 1.29. We assume that n is odd. Miller-Rabin’s algorithm will never give an incorrect answer if the input prime, since Fermat's equation will always be satisfied and no nontrivial square root of 1 modulo n can be found. If n is composite, the above algorithm will detect the compositeness of n if the randomly chosen a either leads to the discovery of a nontrivial square root of 1 or violates Fermat's equation. Call any such a a witness to the compositeness of n. What is the probability that a randomly chosen a will be a witness to the compositenes of n? This question is answered by the following theorem (the proof can be found in the references at the end of this chapter). Theorem 1.8 There are at least witnesses to the compositeness of n if n is composite and odd. o Assume that 7 is composite (since if n is prime, the algorithm will always be correct). The probability that a randomly chosen a will be a witn > 4!, which is very nearly equal to $. This means that a randomly chosen a will fail to be a witness with probability < } Therefore, the probability that none of the first log a’s chosen is a ae witness is < ( = n°. In other words, the algorithm Prime will 2 give an incorrect answer with only probability < n~¢ The run time of the outermost while loop is nearly the same as that of Exponentiate (Algorithm 1.16) and equal to O(logn). Since this while loop is executed O(log n) times, the run time of the whole algorithm is O(log? n). 66 CHAPTER 1. 'TRODUCTION 5. Given a 2-sided coin. Using this coin, how will you simulate an n-sided coin (a) when n is a power of 2?. (b) when n is not a power of 22. 6. Compute the run time analysis of the Las Vegas algorithm given in Algorithm 1.30 and express it using the O() notation. LasVegas() { while (true) do i := Random() mod 2; if (i > 1) then return; eI OAR Algorithm 1.30 A Las Vegas algorithm 7. There are /7i copies of an element in the array c. Every other element of ¢ occurs exactly once. If the algorithm RepeatedElement is used to identify the repeated clement of ¢, will the run time still be O(log n)? If'so, why? If not, what is the new run tim 8. What is the minimum number of times that an element should be repeated in an array (the other elements of the array occurring exactly once) so that it can be found using RepeatedElement in O(log n) time? 9. An array a has of a particular unknown element x. Every other element in a has at most 2 copies. Present an O(logn) time Monte Carlo algorithm to identify x. The answer should be correct with high probability. Can you develop an O(logn) time Las Vegas algorithm for the same problem? 10. Consider the naive Monte Carlo algorithm for primality testing pre- 1 sented in Algorithm 1.31. Here Power(x,y) computes «¥. What should be the value of ¢ for the algorithm’s output to be correct with high probability? 1. Let A be a Monte Carlo algorithm that solves a decision problem 7 in time T. The output of A is correct with probability > $. Show how 14. RANDOMIZED ALGORITHMS 67 Mee eedSeHsene Primel(n) // Specify t. 
for i:=1to do { Power(7, 0.5); 3 andom() mod m + 2; if ((n mod j) = 0) then return false; // If j divides n, nis not prime, 0 l return true; 23 Algorithm 1.31 Another primality testing algorithm 12. 13. you can modify A so that its answer is correct with high probability. ‘The modified version can take O(T log n) time. In general a Las Vegas algorithm is preferable to a Monte Carlo algo- rithm, since the answer given by the former is guaranteed to be correct. ‘There may be critical situations in which even a very small probability of an incorrect answer is unacceptable. Say there is a Monte Carlo algorithm for solving a problem = in 7; time units whose output is correct with probability > 5. Also assume that there is another algo- rithm that can check whether a given answer is valid for 7 in Ty time units. Show how you use these two algorithms to arrive at a Las Vegas algorithm for solving 1 in time O((T; +72) log n) The problem considered here is that of searching for an element x in an array a{l : nj. Algorithm 1.17 gives a deterministic ©(n) time algorithm for this problem. Show that any deterministic algorithm will have to take O(n) time in the worst case for this problem. In contrast a randomized Las Vegas algorithm that searches for n — 1)”. Two more substantial operations are inserting and deleting elements. ‘The corresponding algorithms are Add and Delete (Algorithm 2.1). Each execution of Add or Delete takes a constant amount of time and is independent of the number of elements in the stack. Another way to represent a stack is by using links (or pointers). A node is a collection of data and link information. A stack can be represented by using nodes with two fields, possibly called data and link. The data field of each node contains an item in the stack and the corresponding link field points to the node containing the next item in the stack. The link field of the last node is zero, for we assume that all nodes have an address greater than zero, For example, a stack with the items A, B, C, D, and E inserted in that order, looks as in Figure 2.2. 2.1. STACKS AND QUEUES 7 Algorithm Add(itern) // Push an element onto the stack. Return true if su // else return false. item is used as an input. if (top >» —1) then write ("Stack is full!"); return false; else top := top + 1; stack{top] := item; return true; } 1 Algorithm Delete(item) 2 // Pop the top clement from the stack. Return true if success 3 // else return false. item is used as an output. 4 5 if (top < 0) then 6 i write ("Stack is empty!"); return false; 8 9 else 10 { 1l itern := stack|top]; top := top — 1; return true; 12 } 1} Algorithm 2.1 Operations on a stack stack i 2 + -Plttel 4+] tele data link Figure 2.2 Example of a five-element, linked stack 72 CHAPTER 2. ELEMENTARY DATA STRUCTURES // Type is the type of data. node =record Type data; node links 1 Algorithm Add(item) 2 3 // Get a new node. 4 temp := new node; 5 if (temp # 0) then 6 7 (temp — data) := item; (temp + link) := top; 8 top := temp; return true; 9 10 else iL 12 write ("Out of space! 13 return false; 14 } 15 } 1 Algorithm Delete(item) 2 3 if (top = 0) then 4 5 write ("Stack is empty 6 return false; 7 8 else 9 { 10 iter := (top + data); temp := top; ul top := (top — link); 12 delete temp; return true; 13 4 } Algorithm 2.2 Link representation of a stack 2.1. STACKS AND QUEUES 73 The variable top points to the topmost node (the last item inserted) in the list. The empty stack is represented by setting top := 0. 
Because of the way the links are pointing, insertion and deletion are easy to accomplish. See Algorithm 2.2. In the case of Add, the statement temp := new node; assigns to the variable termp the address of an available node. If no more nodes exist, it returns (). If a node exists, we store appropriate values into the two fields of the node. Then the variable top is updated to point to the new top element of the list. Finally, true is returned. If no more space exists, it prints an error incssage and returns false. Refering to Delete, if the stack is empty, then trying to delete an item produces the error message "Stack is empty!" and false is returned. Otherwise the top element is stored as the value of the variable iter, a pointer to the first node is saved, and top is updated to point to the next node. The deleted node is returned for future use and finally true is returned. ‘The use of links to represent a stack requires more storage than the se- quential array stack[0 : n — 1] does. However, there is greater flexibility when using links, for many structures can simultaneously use the same pool of available space. Most importantly the times for insertion and deletion using cither representation are independent of the size of the stack. An efficient queue representation can be obtained by taking an array q(0 : n — 1] and treating it as if it were circular. Elements are inserted by increasing the variable rear to the next free position. When rear = n — 1, the next element is entered at g[0] in case that spot is free. The variable front always points one position counterclockwise from the first element in the queue. The variable front = rear if and only if the queue is empty and we initially set front := rear := 0. Figui 3 illustrates two of the possible configurations for a circular queue containing the four elements J1 to J4 with n > 4. To insert an element, it is necessary to move rear one position clockwise. This can be done using the code if (rear else rear 1) then rear : rear +13 A more elegant way to do this is to use the built-in modulo operator which computes remainders. Before doing an insert, we increase the rear pointer by saying rear := (rear + 1) mod nj. Similarly, it is necessary to move front one position clockwise each time a deletion is made. An examination of Algorithm 2.3(a) and (b) shows that by treating the array circularly, addition and deletion for queues can be carried out in a fixed amount of time or O(1). One surprising feature in these two algorithms is that the test for queue full in AddQ and the test for queue empty in DeleteQ are the same. In the 74 CHAPTER 2. ELEMENTARY DATA STRUCTURES [nd] / {n-3] In-2] 10} [n-l] front = 0; rear=4 front = n-4; rear =0 Figure 2.3 Circular queue of capacity n — 1 containing four elements J1, J2, 33, and J4 case of AddQ, however, when front = rear, there is actually one space free, q{rear] ‘e the first element in the queue is not at q[front| but is one postion elk se from this point. However, if we insert an item there, then leaves front = rear. To avoid this, we signal queue full and permit a maximum of n —1 rather than n elements to be in the queue at any time. One way to use all n positions is to use another variable, tag, to distinguish between the two situations: that is, tag = 0 if and only if the queue is empty. This however slows down the two algorithms. 
Since the AddQ and DeleteQ algorithms are used many times in any problem involving queues, the loss of one queue position is more than made up by the reduction in computing time. Another way to represent a queue is by using links. Figure 2.4 shows a queue with the four elements A, B, C, and D entered in that order. As with the linked stack example, each node of the queue is composed of the two fields data and link. A queue is pointed at by two variables, front and rear. Deletions are made from the front, and inserti front = 0 signals an empty queue. The procedures for insertion and dele in linked queues are left as exercises. EXERCISES 1. Write algorithms for AddQ and DeleteQ, assuming the queue is repre- sented as a linked list. 2.1. STACKS AND QUEUES it) 1 Algorithm AddQ(item) 2 // Insert item in the circular queue stored in g[0 :n — 1]. 3 // rear points to the last item, and front is one 4 // position counterclockwise from the first item in g. 5 6 rear := (rear +1) mod n; // Advance rear clockwise. 7 if (front = rear) then 8 9 write ("Queue is full!"); 40 if (front = 0) then rear :=n—1; ul else rear := rear — 1; 2 // Move rear one position counterclockwise. Ls return false; u } 15, else 16 { 17 glrear] := item; // Insert new item. 1s return true; ct) } 20 } (a) Addition of an element 1 Algorithm DeleteQ(item) 2 // Removes and returns the front element of the queue q[0 :n— 1} 3 4 if (front = rear) then 5 6 write ("Queue is empty! 7 return false; 8 Bi 9 else Ww { 1 front := (front +1) mod n; // Advance front clockwise. i) itern := qlfront]; // Set item to front of queue. 13 return true; ul } 1s } (b) Deletion of an clement Algorithm 2.3 Basic queue operations 76 CHAPTER 2. ELEMENTARY DATA STRUCTURES data link [ex B c front rear Figure 2.4 A linked queue with four elements 2. A linear list is being maintained circularly in an array ¢[0 : n—1] with J and r set up as for circular queues. (a) Obtain a formula in terms of f,r, and n for the number of elements in the list. (b) Write an algorithm to dele (c) Write an algorithm to ins kth element. ¢ the kth element in the list. rt an element y immediately after the What is the time complexity of your algorithms for parts (b) and (c)? 3. Let X = (a1,...,2,) and Y = (y1,-.., ym) be two linked lists. Write an algorithm to merge the two lists to obtain the linked list Z = (21, yts 22-925 00-5 Ems Ym: Tms1s-+++8n) ifm < nor Z = (01,y1,02, 2, >@nsYnsYnt1s-++ Ym) ifm > n. 4, A double-ended queue (deque) is a linear list for which insertions and deletions can occur at either end. Show how to represent a deque in a one-dimensional array and write algorithms that insert and delete at either end. eo . Consider the hypothetical data object X2. The object X2 is a linear list with the restriction that although additions to the list can be made at either end, deletions can be made from one end only. Design a linked list representation for X2. Specify initial and boundary conditions for your representation. 2.2 TREES Definition 2.1 [Tree] A tree is a finite set of one or more nodes such that there is a specially designated node called the root and the remaining nod are partitioned into n > 0 disjoint sets Ti,...,Tn, where each of these sets is a tree, The sets T),...,Tp are called the subtrees of the root. o 2.2. TREES 7 2.2.1 Terminology There are many terms that are often used when referring to trees. Consider the tree in Figure 2.5. This tree has 13 nodes, each data item of a node being a single letter for convenience. 
The root contains A (we usually say node A), and we normally draw trees with their roots at the top. The number of. subtrees of a node is called its degree. The degree of A , of C is 1, and of F is 0. Nodes that have degree zero are called leaf or terminal nodes. The set {K, L, F, G, M, I, J} is the set of leaf nodes of Figure 2.5. The other nodes are referred to as nonterminals. The roots of the subtrees of a node X are the children of X. The node X is the parent of its children. Thus the children of D are H, I, and J, and the parent of D is A. level (A) 1 2 (G) 3 (M) 4 Figure 2.5 A sample tree Children of the same parent are said to be siblings. For example H, I, and J are siblings. We can extend this terminology if we need to so that we k for the grandparent of M, which is D, and so on. ‘The degree of a the maximum degree of the nodes in the tree. The tree in Figure 2.5 has degree 3. The ancestors of a node are all the nodes along the path from the root to that node. The ancestors of M are A, D, and H. The level of a node is defined by initially letting the root be at level one. If a node is at level p, then its children are at level p +1. Figure 2.5 shows the levels of all nodes in that tree. The height or depth of a tree is defined to be the maximum level of any node in the tree. A forest is a set of n > 0 disjoint trees. The notion of a forest is very close to that of a tree because if we remove the root of a tree, we get a forest. For example, in Figure 2.5 if we remove A, we get a forest with three tret 78 CHAPTER 2. ELEMENTARY DATA STRUCTURES Now how do we represent a tree in a computer's memory? If we wish to use a linked list in which one node corresponds to one node in the tree, then a node must have a varying number of fields depending on the number of children. However, it is often simpler to write algorithms for a data representation in which the node size is fixed. We can represent a tree using a fixed node size list structure. Such a list representation for the tree of Figure 2.5 is given in Figure 2.6. In this figure nodes have three fields: tag, data, and link. ‘The fields data and link are used as before with the exception that when tag = 1, data contains a pointer to a list rather than a data item. ‘A tree is represented by storing the root in the first node followed by nodes that point to sublists each of which contains one subtree of the root. tt) +s] E The tag field of a node is one if it has a down-pointing arrow; otherwise it is 2eTO. Figure 2.6 List representation for the tree of Figure 2.5 2.2.2 Binary Trees A binary tree is an important type of tree structure that occurs very often. It is characterized by the fact that any node can have at most two children; that is, there is no node with degree greater than two. For binary trees we distinguish between the subtree on the left and that on the right, whereas for other trees the order of the subtrees is irrelevant. Furthermore a binary tree is allowed to have zero nodes whereas any other tree must have at least one node. ‘Thus a binary tree is really a different kind of object than any other tree. Definition 2.2 A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right o subtrees. Figure 2.7 shows two sample binary trees. These two trees are speci: kinds of binary trees. Figure 2.7(a) is a skewed tree, skewed to the left. 2.2. TREES 79 There is a corresponding tree skewed to the right, which is not shown. 
The tree in Figure 2.7(b) is called a complete binary tree. This kind of tree is defined formally later on. Notice that for this tree all terminal nodes are on two adjacent levels. The terms that we introduced for trees, such as degree, level, height, leaf, parent, and child, all apply to binary trees in the same way. level nv (b) (a) Figure 2.7 Two sample binary trees Lemma 2.1 The maximum number of nodes on level i of a binary tree is 2-1. Also, the maximum number of nodes in a binary tree of depth k is 2 1,k > 0. a The binary tree of depth & that has exactly 2* — 1 nodes is called a full binary tree of depth k. Figure 2.8 shows a full binary tree of depth 4. ‘A very clegant sequential representation for full binary trees results from sequentially numbering the nodes, starting with the node on level one, then going to those on level two, and so on. Nodes on any level are numbered from left. to right (see Figure 2.8). A binary tree with n nodes and depth & is complete iff its nodes correspond to the nodes that are numbered one to n in the full binary tree of depth &. A consequence of this definition is that in a complete tree, leaf nodes occur on at most two adjacent levels. The nodes 80 CHAPTER 2. ELEMENTARY DATA STRUCTURES of an n-node complete tree may be compactly stored in a one-dimensional array, tree{1 : n], with the node numbered i being stored in tree{i]. The next lemma shows how to easily determine the locations of the parent, left child, and right child of any node i in the binary tree without explicitly storing any link information. Figure 2.8 Full binary tree of depth 4 Lemma 2.2 If a complete binary tree with n nodes is represented sequen- tially as described before, then for any node with index i, 1 n, i has no left child. 3. rehild(i) is at 26+ 1 if +1 n, ihas no right child. 0 ‘This representation can clearly be used for all binary trees though in most there is a lot of unutilized space. For complete binary trees the representation is ideal as no space is wasted. For the skewed tree of Figure 2.7, however, less than a third of the array is utilized. In the worst case a right-skewed tree of depth k requires 2* — 1 locations. Of these only k are occupied. Although the sequential representation, as in Figure 2.9, appears to be good for complete binary trees, it is wasteful for many other binary trees. In addition, the representation suffers from the general inadequacies of sequen- tial representations. Insertion or deletion of nodes requires the movement 2.3. DICTIONARIES 81 ee Ago) |e - — we [a|B|C|D|E|F|G a] t | (1) (2) @) (4) G) (6) (7) (8) 9) see (16) Figure 2.9 Sequential representation of the binary trees of Figure 2.7 of potentially many nodes to reflect the change in level number of the re maining nodes. These problems can be easily overcome through the use of a linked representation. Each node has three fields: Ichild, data, and rchild. Although this node structure makes it difficult to determine the parent of a node, for most applications it is adequate. In case it is often necessary to be able to determine the parent of a node, then a fourth field, parent, can be included with the obvious interpretation. The representation of the binary trees of Figure 2.7 using a three-field structure is given in Figure 2.10. 2.3. DICTIONARIES An abstract data type that supports the operations insert, delete, and search is called a dictionary. Dictionaries have found application in the design of numerous algorithms. 
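Before looking at concrete realizations, it is convenient to fix the interface itself. The short C++ sketch below is not from the text; the names Dictionary, Search, Insert, and Delete are illustrative only, and any of the structures mentioned in this section, from hash tables to binary search trees, could be placed behind it.

    // An abstract dictionary of elements of type E indexed by keys of type K.
    // A concrete realization (for example, a binary search tree) supplies the bodies.
    template <class K, class E>
    class Dictionary {
    public:
        virtual ~Dictionary() {}
        virtual bool Search(const K& key, E& element) const = 0; // true iff key is present
        virtual bool Insert(const K& key, const E& element) = 0; // false if key already present
        virtual bool Delete(const K& key) = 0;                   // false if key is absent
    };

Example 2.1, which follows, can be read as one client of such an interface: checking whether a book is available is a Search, issuing it is a Delete, and accepting its return is an Insert.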
Example 2.1 Consider the database of books maintained in a library sys- tem, When a user wants to check whether a particular book is available, a search operation is called for. If the book is available and is issued to the user, a delete operation can be performed to remove this book from the set of available books. When the user returns the book, it can be inserted bac into the a It is essential that we are able to support the above-mentioned opera~ tions as efficiently as possible since these operations are performed quite frequently. A number of data structures have been devised to realize a dic- tionary. At avery high level these can be categorized as comparison methods and direct access methods. Hashing is an example of the latter. We elaborate only on binary search trees which are an example of the former. 82 CHAPTER 2. ELEMENTARY DATA STRUCTURES (a) (b) Figure 2.10 Linked representations for the binary trees of Figure 2.7 2.3. DICTIONARIES 83 2.3.1. Binary Search Trees Definition 2.3 [Binary search tree] A binary search tree is a binary tree. It may he empty. If it is not empty, then it satisfies the following properti 1, Every element has a key and no two elements have the same key (i.e. the keys are distinct). 2. The keys (if any) in the left subtree are smaller than the key in the root. 3. The keys (if any) in the right subtree are larger than the key in the root 4, The left and right subtrees are also binary search trees. A binary search tree can support the operations search, insert, and delete among others. In fact, with a binary search tree, we can searcl th for a data element both by key value and by rank (ie., find an element with key 2, find the fifth-smallest element, delete the element with key x, delete the fifth-smallest element, insert an element and determine its rank, and so on) ‘There is some redundancy in the definition of a binary search tree. Prop- erties 2, 3, and 4 together imply that the keys must be distinct. So, property 1 can be replaced by the property: The root has a key. Some examples of binary trees in which the elements have distinct keys are shown in Figure 2.11. The tree of Figure 2.11(a) is not a binary search tree, despite the fact that it satisfies properties 1, 2, and 3. The right subtree fails to satisfy property 4. This subtree is not a binary search tree, as its right subtree has a key value (22) that is smaller than that in the subtree’s root (25). The binary trees of Figure 2.11(b) and (c) are binary search trees. Searching a Binary Search Tree Since the definition of a binary search tree is recursive, it is easiest to des a recursive search method. Suppose we wish to search for an element with key x. An element could in general be an arbitrary structure that has as one of its fields a key. We assume for simplicity that the element just consists of a key and use the terms element and key interchangeably. We begin at the root. If the root is 0, then the sea ins no elements and the search is unsuccessful. Otherwise, we compare x with the key in the root. If x equals this key, then the search terminates successfully. If z is less than the key in the root, then no element in the right subtree can have key value x, and only the left subtree is to be searched. If z is larger than the key in the root, only the right subtree needs to be searched. The subtrees can be searched recursively as in Algorithm 2.4. ‘This function assumes a linked 84 CHAPTER 2. ELEMENTARY DATA STRUCTURES Figure 2.11 Binary trees representation for the search tree. 
Each node has the three fields lchild, rehild, and data. The recursion of Algorithm 2.4 is easily replaced by a while loop, as in Algorithm 2.5. Algorithm Search(t, x) if (t= 0) then return 0; else if (r = t + data) then return ¢; else if (x Ichild,«)s else return Search(t —> rehild, ©); wera kee Algorithm 2.4 Recursive search of a binary search tree If we wish to search by rank, each node should have an additional field leftsize, which is one plus the number of clements in the left subtree of the node. For the search tree of Figure 2.11(b), the nodes with keys 2, 5, 30, and 40, respectively, have leftsize equal to 1, 2, 3, and 1. Algorithm 2.6 searches for the kth-smallest element. As can be seen, a binary search tree of height h can be searched by key as well as by rank in O(h) time. 2.3. DICTIONARIES 85 Algorithm |Search(z) found : tis tree; while ((¢ #0) and not found) do ib false; if t+ data)) then found else if (x < (t > data)) then t: else t:= (t > rehild)s ue; > Iehild)s ae if (not found) then return 0; else return 1; Algorithm 2.5 Tterative search of a binary search tree ene cerncaan 10 i 12 13 15. 16 Algorithm Searchk(k) found := false; t := trees while ((t #0) and not found) do if (k =(t — leftsize)) then found else if (k < (t+ leftsize)) then ¢ else Ms k= k(t + leftsize); t= (t > rehild); } if (not found) then return 0; else return #; Algorithm 2.6 Searching a binary search tree by rank 86 CHAPTER 2. ELEMENTARY DATA STRUCTURES Insertion into a Binary Search Tree To insert a new element xr, we must first verify that its key is different from those of existing elements. To do this, a search is carried out. If the search is ful, then the element is inserted at the point the s¢ terminated. tance, to insert an element with key 80 into the tree of Figure 2.12(a), we first search for 80. This search tern ecessfully, and the last e examined is the one with key 40. The new inserted as the right child of this node. The resulting search tree is shown in Figure 2.12(b). Figure 2.12(c) shows the result of inserting the key 35 into the search tree of Figure 2.12(b). (a) (b) (©) Figure 2.12 Inserting into a binary search tree Algorithm 2.7 implements the insert strategy just described. If a node has a leftsize field, then this is updated too. Regardless, the insertion can be performed in O(h) time, where h is the height of the search tre Deletion from a Binary Search Tree Deletion of a leaf element is quite easy. For example, to delete 35 from the tree of Figure 2.12(c), the left-child field of its parent is set to 0 and the node disposed. This gives us the tree of Figure 2.12(b). To delete the 80 from this tree, the right-child field of 40 is set to 0; this gives the tree of Figure 2.12(a). Then the node containing 80 is disposed. The deletion of a nonleaf element that has only one child is also easy. The node containing the element to be deleted is disposed, and the single child takes the place of the disposed node. So, to delete the element 5 from the tree of Figure 2.12(b), we simply change the pointer from the parent node (i.c., the node containing 30) to the single-child node (i.e. the node containing 2). 2.3. DICTIONARIES 87 Algorithm Insert(:r) i { / Insert « into the binary search tree. found := false; ch for «. q is the parent of p. while ((p #0) and not found) do { 5 // Save p. (p > data)) then found + if (x < (p data)) then p else p := (p> rehild); true; (p > Ichild); // Perform insertion. 
if (not found) then p:= new TreeNodes (p + Ichild) := 05 (p > rehild) := 0; (p + data) = 23 if (tree #0) then if (© < (q > data)) then (q -> Ichild) else (q— rehild) := p; else tree := ps Algorithm 2.7 Insertion into a binary search tree 88 CHAPTER 2. ELEMENTARY DATA STRUCTURES When the element to be deleted is in a nonleaf node that has two children, the element is replaced by either the largest element in its left subtree or the smnallest one in its right subtree. Then we proceed to delete this replacing clement from the subtree from which it was taken. For instance, if we wish to delete the element with key 30 from the tree of Figure 2.13(a), then we replace it by either the largest element, 5, in its left subtree or the smallest element, 40, in its right subtree. Suppose we opt for the largest element in the left subtree. The 5 is moved into the root, and the tree of Figure 2.13(b) is obtained. Now we must delete the second 5. Since this node has only one child, the pointer from its parent is changed to point to this child. The tree of Figure 2.13(c) is obtained. We can verify that regardless of whether the replacing element is the largest in the left subtree or the smallest in the right subtree, it is originally in a node with a degree of at most one. So, deleting it from this node is quite easy. We leave the writing of the deletion procedure as an exercise, It should be evident that a deletion can be performed in O(h) time if the search tree has a height of h. (a) (b) (c) Figure 2.13 Deletion from a binary search tree Height of a Binary Search Tree Unless care become as large as 7. This used to insert the keys (1, 2, binary search tree. It can, however, be shown that whe deletions are made at random using the procedures given h the binary search tree is O(log n) on the average. Search trees with a worst-case height of O(log n) are called balanced search trees. Balanced search trees that permit searches, inserts, and deletes to be performed in O(log n) time are listed in Table 2.1. Examples include AVL trees, 2-3 trees, Red-Black trees, and B-trees. On the other hand splay trees is taken, the height of a binary search tree with n elements can the case, for instance, when Algorithm 2.7 n], in this order, into an initially empty insertions and the height of 2.3, DICTIONARIES 89 take ((logn) time for each of these operations in the amortized sense. A description of these balanced trees can be found in the book by E. Horowitz, S. Sahni, and D. Mehta ‘d at the end of this chapter. search insert delete Oin) (wey Oin) (wey O(n) (wey O(log n) (av) | Ollogn) (av) | O(ogn) (av) [AVL tree Ollogn) (we) | O(log n) (we) | Ollog n) (we) 2-3 tree Ollogn) (we) | O(log n) (we) | O(log n) (we) Rod-Black tree | Ofllogn) (we) | Ollogn) (we) | Ollogn) (1 [B-tree Ollogn) (we) | Ollogn) (we) | O(log n) (we) ~[OTog n) fam) | Ollog n) (am) [OUog n) (am) 2.3.2 Cost Amortization Suppose that a sequence I1, 12, D1, 13, 14, 15, 16, D2, 17 of insert and delete operations is performed on a set. Assume that the actual cost of each of the seven inserts is one. (We use the terms cost and complezity interchangeably.) By this, we mean that each insert takes one unit of time. Further, suppose that the delete operations D1 and D2 have an actual cost of eight and ten, respectively. So, the total cost of the sequence of operations is 25. In an amortization scheme we charge some of the actual cost of an oper- ation to other operations. This reduces the charged cost of some operations and increases that of others. 
The amortized cost of an operation is the total cost charged to it. The cost transferring (amortization) scheme is required to be such that the sum of the amortized costs of the operations is greater than or equal to the sum of their actual costs. If we charge one unit of the cost of a delete operation to each of the inserts since the last delete operation (if any), then two units of the cost of D1 get transferred to II and I2 (the charged cost of each increases by one), and four units of the cost of D2 get transferred to 13 to 16. The amortized cos ach of 11 to 16 becomes two, that of 17 becomes equal to its actual cost (that is, one), and that of each of D1 and D2 becomes 6. The sum of the amortized costs is 25, which is the same as the sum of the actual costs. Now suppose we can prove that no matter what: sequence of insert and delete operations is performed, we can charge costs in such a way that the amortized cost of each insertion is no more than two and that of each deletion 90 CHAPTER 2. ELEMENTARY DATA STRUCTURES is no more than six. This enables us to claim that the actual cost of any insert/delete sequence is no more than 2 * i +6 *d, where i and d ar respectively, the number of insert and delete operations in the sequence. Suppose that the actual cost of a deletion is no more than ten and that of an insertion is one. Using actual costs, we can conclude that the sequence cost is no more than i + 10 « d. Combining these two bounds, we obtain min{2 «i +6 *d, i+ 10 *d} as a bound on the sequence cost. Hence, using the notion of cost amortization, we can obtain tighter bounds on the complexity of a sequence of operations. ‘The amortized time complexity to perform insert, delete, and search op- erations in splay trees is O(logn). This amortization is over n operations. In other words, the total time taken for processing an arbitrary sequence of n operations is O(n logn). Some operations may take much longer than O(log n) time, but when amortized over n operations, each operation costs Ollogn) time. EXERCISES 1. Write an algorithm to delete an element 2, from a binary search tree t. What is the time complexity of your algorithm? 2. Present an algorithm to start with an initially empty binary search tree and make n random insertions. Use a uniform random number generator to obtain the values to be inserted. Measure the height of the resulting binary search tree and divide this height by log, n. Do this for n = 100,500, 1,000, 2,000, 3,000,...,10,000. Plot the ratio height/logyn as a function of n. The ratio should be approximately constant (around 2). Verify that this is so. 3. Suppose that each node in a binary search tree also has the field leftsize as described in the text. Design an algorithm to insert an element « into such a binary search tree. The complexity of your algo- rithm should be O(h), where h is the height of the search tree. Show that this is the case. 4. Do Exercise 3, but this time present an algorithm to delete the element with the kth-smallest key in the binary search tr Find an efficient data structure for representing a subset § of the in- tegers from 1 to n. Operations we wish to perform on the set are eo e INSERT(i) to insert the integer i to the set S. If i is already in the set, this instruction must be ignored. ¢ DELETE to delete an arbitrary member from the set. ¢ MEMBER(i) to check whether i is a member of the set. 2.4. 
PRIORITY QUEUES 91 Your data structure should enable each one of the above operations in constant time (irrespective of the cardinality of 8). e Any algorithm that merges two sorted lists of size n and m, respec- ‘ely, inust make at least n +m — 1 comparisons in the worst case What implications does this have on the run time of any comparison- based algorithm that combines two binary search trees that have n and m elements, respectively? 7. [t is known that every comparison-based algorithm to sort n elements must make O(n logn) comparisons in the worst case. What implica- tions does this have on the complexity of initializing a binary search tree with n clements? 2.4 PRIORITY QUEUES Any data structure that supports the operations of search min (or max), insert, and delete min (or max, respectively) is called a priority queue. Example 2.2 Suppose that we are selling the services of a machine. Bach user pays a fixed amount per use, However, the time needed by each user is different. We wish to maximize the returns from this machine under the assumption that the machine is not to be kept idle unless no user is available. This can be done by maintaining a priority queue of all persons waiting to use the machine. Whenever the machine becomes available, the user with the smallest. time requirement is selected. Hence, a priority queue that supports delete min is required. When a new user requests the machine, his or her request is put into the priority queue. If each user needs the same amount of time on the machine but people are willing to pay different amounts for the service, then a priority queue based on the amount of payment can be maintained. Whenever the machine becoines available, the user willing to pay the most is selected. This requires a delete max operation. a Example 2.3 Suppose that we are simulating a large factory. This factory has many machines and many jobs that require processing on some of the machines. An event is said to occur whenever a machine completes the processing of a job. When an event occurs, the job has to be moved to the queue for the next machine (if any) that it needs. If this queue is empty, the job can be assigned to the machine immediately. Also, a new job can be scheduled on the machine that has become idle (provided that its queue is not empty). To determine the occurrence of events, a priority queue is used. This queue contains the finish time of all jobs that are presently being worked on. 92 CHAPTER 2. ELEMENTARY DATA STRUCTURES The next event occurs at the least time in the priority queue. So, a priority queue that supports delete min can be used in this application. Qo The simplest way to represent a priority queue is as an unordered linear list. Suppose that we have n elements in this queue and the delete max operation is to be supported. If the list is represented sequentially, additions are most easily performed at the end of this list. Hence, the insert time is O(1). A deletion requires a search for the clement with the largest key, followed by its deletion. Since it takes O(n) time to find the largest element in an n-clement unordered list, the delete time is O(n). Bach deletion takes @(n) time. An alternative is to use an ordered linear list. The elements are in nondecreasing order if a sequential representation is used. The delete time for cach representation is (1) and the insert time O(n). When a mag heap is used, both additions and deletions can be performed in O(logn) time. 
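To make this trade-off concrete, the following C++ sketch realizes the unordered-list representation supporting delete max. It is illustrative only; the class name ListPriorityQueue and the use of std::vector are not from the text. Insert simply appends at the end of the list in O(1) time, whereas DeleteMax must scan the entire list and so takes Theta(n) time.

    #include <vector>
    #include <algorithm>

    // Priority queue kept as an unordered linear list.
    template <class T>
    class ListPriorityQueue {
        std::vector<T> a;                          // elements in arbitrary order
    public:
        void Insert(const T& x) { a.push_back(x); }    // O(1): add at the end of the list

        bool DeleteMax(T& x) {                     // Theta(n): search for the largest key
            if (a.empty()) return false;
            typename std::vector<T>::iterator m =
                std::max_element(a.begin(), a.end());
            x = *m;
            *m = a.back();                         // move the last element into the hole
            a.pop_back();
            return true;
        }
    };

The max heap developed next removes this Theta(n) scan: both insertion and deletion of the maximum then cost O(log n) in the worst case.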
2.4.1 Heaps Definition 2.4 [Heap] A maz (min) heap is a complete binary tree with the property that the value at each node is at least as large as (as small as) the values at its children (if they exist). Call this property the heap property. a In this section we study in detail an efficient way of realizing a priority queue. We might first consider using a queue since inserting new elements would be very efficient. But finding the largest element would necessitate a scan of the entire queue. A second suggestion might be to use a sorted list that is stored sequentially. But an insertion could require moving all of the items in the list. What we want is a data structure that allows both operations to be done efficiently. One such data structure is the max heap. The definition of a max heap implies that one of the largest elements is at the root of the heap. If the elements are distinct, then the root contains the largest item. A max heap can be implemented using an array a{ J. To insert an element into the heap, one adds it “at the bottom” of the heap and then compares it with its parent, grandparent, greatgrandparent, and so on, until it is less than or equal to one of these values. Algorithm Insert (Algorithm 2.8) describes this process in detail. Figure 2.14 shows one example of how Insert would insert a new value into an existing heap of six elements. It is clear from Algorithm 2.8 and Figure 2.14 that the time for Insert can vary. In the best case the new element is correctly positioned initially and no values need to be rearranged. In the worst case the number of executions of the while loop is proportional to the number of levels in the heap. Thus if there are n elements in the heap, inserting a new element takes Q(logn) time in the worst case. 2.4, PRIORITY QUEUES 93 1 Algorithm Insert(a,n) 2 3 // inserts a{n] into the heap which is stored in a[1 sn — 1] 4 i= nj item = aln}; 5 while ((i > 1) and (a[|i/2|] < item)) do 6 { 7 afi] = afli/2f]s t= |é/2]s 8 } 9 a{i] := item; return true; 10 } Algorithm 2.8 Insertion into a heap Figure 2.14 Action of Insert inserting 90 into an existing heap 94 CHAPTER 2. ELEMENTARY DATA STRUCTURES To delete the maximum key from the max heap, we use an algorithm called Adjust. Adjust takes as input the array a{ | and the integers i and n. It regards a[1 : n] as a complete binary tree. If the subtrees rooted at 2i and 2i+1 are already max heaps, then Adjust will rearrange elements of a[ ] such that the tree rooted at i is also a max heap. The maximum element from the max heap a(l : n] can be deleted by deleting the root of the corresponding complete binary tree. The last element of the array, that is, a{n], is copied to the root, and finally we call Adjust(a, 1, — 1). Both Adjust and DelMax are described in Algorithm 2.9. 1 Algorithm Adjust(a, i,n) 2 // The complete binary trees with roots 2i and 2i +1 are 3 // combined with node i to form a heap rooted at i. No 4 // node has an address greater than n or less than 1 5 6 j = 2i; item == ali); 7 while (j alj]) then break; 13 //_A position for item is found. 4 af j/2]] = ali]; J = 295 15 16 a{|j/2)) := item; 7 } 1 Algorithm DelMax(a,n, 2) 2 // Delete the maximum from the heap a[1 : n] and store it in x. 3 4 if (n = 0) then 5 { 6 write ("heap is empty"); return false; 7 } 8 x :=a{l}; a[1] = a{n); 9 Adjust(a, 1,n — 1); return true; 10 } Algorithm 2.9 Deletion from a heap 2.4. PRIORITY QUEUES 95 Note that the worst-case run time of Adjust is also proportional to the height of the tree. 
Therefore, if there are n elements in a heap, deleting the maximum can be done in O(logn) time. To sort n elements, it suffices to make n insertions followed by n deletions from a heap. Algorithm 2.10 has the details. Since insertion and deletion take O(logn) time each in the worst case, this sorting algorithm has a time complexity of O(n logn) Algorithm Sort(a,n) // Sort the elements a[1 : n]. for i fori 1 to n do Insert(a, i); nto 1 step —1 do DelMax(a, i, x) ali] = 23 i coenansene } Algorithm 2.10 A sorting algorithm It turns out that we can insert n elements into a heap faster than we can apply Insert n times. Before getting into the details of the new algorithm, let us consider how the n inserts take place. Figure 2.15 shows how the data (40, 80, 35, 90, 45, 50, and 70) move around until a heap is created when using Insert. Trees to the left of any — represent the state of the array a{1 : i before some call of Insert. Trees to the right of + show how the array was altered by Insert to produce a heap. The array is drawn as a complete binary tree for clarity. The data set that causes the heap creation method using Insert to behave in the worst way is a set of elements in ascending order. Each new element rises to become the new root. There are at most 2'~' nodes on level i of a coniplete binary tree, 1 < i < flog(n + 1)]. For a node on level i the distance to the root is i—1. Thus the worst-case time for heap creation using Insert is SD, (6 1)2°! < flogg(n + 1)]2!82™+H1 = O(nlogn) 1+ és floga(n+ U1 A surprising fact about Insert is that its average behavior on n random in- puts is asymptotically faster than its worst case, O(n) rather than O(n log n). 96. CHAPTER 2. ELEMENTARY DATA STRUCTURES o-€ (80) (90) 90 Oe (30) - a ee, ) Figure 2.15 Forming a heap from the set {40, 80,35, 90, 45, 50, 70} 2.4. PRIORITY QUEUES 97 This implies that on the average a constant number of levels in the tree. It is possible to devise an algorithm that can perform n inserts in O(n) time rather than O(nlogn). This reduction is achieved by an algorithm that regards any array a[l : n] as a complete binary tree and works from the leaves up to the root, level by level. At each level, the left and right subtrees of any node are heaps. Only the value in the root node may violate the heap property. Given n elements in a[l : n], we can create a heap by applying Adjust. It is easy to see that leaf nodes are already heaps. So we can begin by call- ing Adjust for the parents of leaf nodes and then work our way up, level by level, until the root is reached. The resultant algorithm is Heapify (Algo- rithm 2.11). In Figure 2.16 we observe the action of Heapify as it creates a heap out of the given seven elements. The initial tree is drawn in Fig- ure 2.16(a). Since n = 7, the first call to Adjust has i= 3. In Figure 2.16(b) the threc elements 118, 151, and 132 are rearranged to form a heap. Sub- sequently Adjust is called with i = 2 and i = 1; this gives the trees in Figure 2.16(c) and (d). ch new value only ri 1 Algorithm Heapify(a,n) 2 // Readjust the elements in a{1 : n] to form a heap. 3 4 for i := |n/2| to 1 step —1 do Adjust(a, i,7); Boy Algorithm 2.11 Creating a heap out of n arbitrary elements For the worst-case analysis of Heapify let 2°! 0) do i: 4 return i; 5} Algorithm 2.13 Simple algorithms for union and find Algorithm 2.13 gives the descriptions of the union and find operations just discussed. Although these two algorithms are very easy to state, their performance characteristics are not very good. 
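A direct C++ transcription of these two algorithms makes the weakness easy to observe. The sketch below is illustrative only; elements are numbered 1 through n, p[i] holds the parent of i, and in this sketch a zero entry marks a root. The class name SimpleSets is not from the text.

    #include <vector>

    class SimpleSets {
        std::vector<int> p;                  // p[i] = parent of i; 0 marks a root
    public:
        explicit SimpleSets(int n) : p(n + 1, 0) {}   // n singleton sets {1}, ..., {n}

        int SimpleFind(int i) const {        // follow parent pointers up to the root
            while (p[i] > 0) i = p[i];
            return i;
        }
        void SimpleUnion(int i, int j) {     // i and j must be roots of distinct sets
            p[i] = j;                        // make j the parent of i
        }
    };

Nothing in this representation prevents the trees from degenerating into long chains, and that is precisely the source of the poor performance.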
For instance, if we start with q elements each in a set of its own (that is, S; = {i}, 1 plj}) then 7 {// i has fewer nodes. Plt] == 55 pig] := temp; 10 else u {// j has fewer or equal nodes. 12 Lj] = i; pli] := temp; 13. 4 } Algorithm 2.14 Union algorithm with weighting rule 2.5. SETS AND DISJOINT SET UNION 107 Let 1 be a tree with m nodes created by WeightedUnion. Consider the last union operation performed, Union(k, j). Let a be the number of nodes in tree j, and m—a the number in &. Without loss of generality we can assume l 0) do r := pir); // Find the root. 7 while (i #r) do // Collapse nodes from i to root r. 8 9 plils pli] = 10 } il return 7; 12 } Algorithm 2.15 Find algorithm with collapsing rule In the algorithms WeightedUnion and CollapsingFind, use of the collaps- ing rule roughly doubles the time for an individual find. However, it redui the worst-case time over a sequence of finds. The worst-case complexity of processing a sequence of unions and finds using WeightedUnion and Collaps- ingFind is stated in Lemma 2.4. This lemma makes use of a function a(p, q) that is related to a functional inverse of Ackermann’s function A(i, j). These functions are defined as follows: A(1, j) = 27 for j>1 A(i,i) = Al 1,2) fori >2 Ali,j) = AGi—1, AQ, — 1) for i, 7 > 2 a(p.q) = min{z > 1A(2, ED > logsg}, p> q>1 The function A(i,j) is a very rapidly growing function. Consequently, @ grows very slowly as p and q are increased. In fact, since A(3,1) = 16, a(p,q) < 3 for q < 2!© = 65,536 and p > q. Since A(4,1) is a very large number and in our application q is the number 7 of set elements and p is n+ f (f is the number of finds), a(p,q) < 4 for all practical purposes. Lemma 2.4 [Tarjan and Van Leeuwen] Assume that we start with a forest of trees, each having one node. Let T(f,u) be the maximum time required to process any intermixed sequence of f finds and u unions, Assume that u>¥. Then 110 CHAPTER 2. ELEMENTARY DATA STRUCTURES kiln + fa(f +n,n)] < T(f,u) < keln + fatf +n,n)] for some positive constants ky and kp. Qo ‘The requirement that u > when u < 4, some elements are involved in no union operation. These elements remain in singleton sets throughout the sequence of union and find operations and can be eliminated from consideration, as find operations that, involve these can be done in O(1) time each. Even though the function a(f,u) isa very slowly growing function, the complexity of our solution to the set representation problem is not linear in the number of unions and finds. The space requirements are one node for each clement. In the exercises, we explore alternatives to the weight rule and the col- lapsing rule that preserve the time bounds of Lemma 2.4. really not significant, as EXERCISES 1. Suppose we start with n sets, each containing a distinct element. (a) Show that if wu unions are performed, then no set contains more than u + 1 elements. (b) Show that at most n — 1 unions can be performed before the number of sets becomes 1. (c) Show that if fewer than [¥] unions are performed, then at least one set with a single element in it remains. (d) Show that if w unions are performed, then at least max{n —2u, 0} singleton sets remain. 2. Experimentally compare the performance of SimpleUnion and Sim- pleFind (Algorithm 2.13) with WeightedUnion (Algorithm 2.14) and CollapsingFind (Algorithm 2.15). For this, generate a random sequence of union and find operations (a) Present an algorithm HeightUnion that uses the height rule for union operations instead of the weighting rule. 
This rule is defined below: Definition 2.7 [Height rule] If the height of tree i is less than that of tree j, then make j the parent of i; otherwise make i the parent of j. o Your algorithm must run in O(1) time and should maintain the height of each tree as a negative number in the p field of the root. 2.5. SETS AND DISJOINT SET UNION ibhe (b) (d) ) ) Show that the height bound of Lemma 2.3 applies to trees con- structed using the height rule. Give an example of a sequence of unions that start with n single- ton sets and create trees whose heights equal the upper bounds given in Lemma 2.3. Assume that each union is performed using the height rule. Experiment with the algorithms WeightedUnion (Algorithm 2.14) and HeightUnion to determine which produces better results when used in conjunction with CollapsingFind (Algorithm 2.15). Write an algorithm SplittingFind that uses path splitting, defined below, for the find operations instead of path collapsing, Definition 2.8 [Path splitting] ‘The parent pointer in each node (except the root and its child) on the path from i to the root is changed to point to the node’s grandparent. a Note that when path splitting is used, a single pass from i to the root suffices. R. Tarjan and J ‘an Leeuwen have shown that. Lemma 2.4 holds when path splitting is used in conjunction with either the weight or the height rule for unions. Experiment with CollapsingFind (Algorithm 2.15) and SplittingFind to determine which produces better results when used in conjunc- tion with WeightedUnion (Algorithm 2.14). Design an algorithm HalvingFind that uses path halving, defined below, for the find operations instead of path collapsing. Definition 2.9 [Path halving] In path halving, the parent pointer of every other node (except the root and its child) on the path from i to the root is changed to point to the nodes grandparent. a Note that path halving, like path splitting (Exercise 4), can be implemented with a single pass from i to the root. However, in path halving, only half as many pointers are changed as in path splitting. Tarjan and Van Leeuwen have shown that Lemma 2.4 holds when path halving is used in conjunction with either the weight or the height rule for unions. Experiment with CollapsingFind and HalvingFind to determine which produces better results when used in conjunction with Weighte- dUnion (Algorithm 2.14). 112 CHAPTER 2. ELEMENTARY DATA STRUCTURES 2.6 GRAPHS 2.6.1 Introduction The first recorded evidence of the use of graphs dates back to 1736, when Leonhard Euler used them to solve the now classical Konigsberg bridge prob- lem. In the town of Kénigsberg (now Kaliningrad) the river Pregel (Pre- golya) flows around the island Kneiphof and then divides into two. There are, therefore, four land areas that have this river on their borders (see Fi; ure 2.24(a)). These land areas are interconnected by seven bridges, labeled a to g. The land areas themselves are labeled A to D. The Konigsberg bridge problem is to determine whether, starting at one land area, it is possible to walk across all the bridges exactly once in returning to the starting land area. One possible walk: Starting from land area B, walk across bridge a to island A, take bridge ¢ to area D, take bridge g to C, take bridge d to A, take brid; to B, and take bridge f to D. This walk does not go across all bridge a nor does it return to the starting land area B. 
Ruler answered the Konigsberg bridge problem in the negative: The people of Kénigsberg cannot walk across each bridge exactly once and return to the starting point. He solved the problem by representing the land areas ‘tices and the bridges as edges in a graph (actually a multigraph) as in Figure 2.24(b). His solution is clegant and applies to all graphs. Defining the degree of a vertex to be number of edges incident to it, Euler showed that there is a walk starting at any vertex, going through each edge exactly once and terminating at the start vertex if and only if the degree of each vertex is even. A walk that does this is called Bulerian, There is no Eulerian walk for the Kénigsberg bridge problem, as all four vertices are of odd degree. Since this first application, graphs have been used in a wide variety of applications. Some of these applications are the analysi cuits, finding shortest routes, project planning, identification of chemical compounds, statistical mechani cybernetics, linguistics, 80 sciences, and so on. Indeed, it might well be said that of all mathematical structures, graphs are the most widely used. 2.6.2 Definitions ‘A graph G consists of two sets V and B. The set V is a finite, nonempty set of vertices. The set E is a set of pairs of vertices; these pairs are called edges. The notations V (@) and E(G) represent the sets of vertices and edges, respectively, of graph G. We also write G = (V, E) to represent a graph. In an undi graph the pair of vertices representing any edge is unordered. Thus, the pairs (u,v) and (v,u) represent the same edge. In a directed graph cach edge is represented by a directed pair (u,v); u is the tail and v the 2.6. GRAPHS 113 Figure 2.24 Section of the river Pregel in Kénigsberg and Euler's graph 114 CHAPTER 2. ELEMENTARY DATA STRUCTURES head of the edge. Therefore, (v,u) and (u,v) represent two different edges. Figure 2.25 shows three graphs: G1, Gz, and G3. The graphs G and G2 are undirected; Gs is directed. Figure 2.25 Three sample graphs The set representations of these graphs are V(Gi) = {1,2,3,4} EG) = {( V(G2) = {1,2,3.4,5,6,7} E(G2) = {( ViG5) = 123} E(Gs) = {6 Notice that the edges of a directed graph are drawn with an arrow from the tail to the head. The graph G2 is a tree; the graphs G1 and G's are not. Since we define the edges and vertices of a graph as sets, we impose the following restrictions on graphs: 1. A graph may not have an edge from a vertex v back to itself. That is, edges of the form (v, v) and (v,v) are not legal. Such edges are known as self-edges or self-loops. If we permit self-edges, we obtain a data object referred to as a graph with self-edges. An example is shown in Figure 2.26(a). 2. A graph may not have multiple occurrene remove this restriction, we obtain a data obj graph (see Figure 2.26(b)). of the same edge. If we ct referred to as a multi- The number of distinct unordered pairs (u,v) with u # v in a graph with n vertices is “=. ‘This is the maximum number of edges in any n-vertex, undirected graph. An n-vertex, undirected graph with exactly “= edges is said to be complete. The graph G of Figure 2.26(a) is the complete graph 2.6. GRAPHS M5 (a) Graph with a self edge (b) Multigraph Figure 2.26 Examples of graphlike structures on four vertices, whereas Gy and Gy are not complete graphs. In the case of a directed graph on n vertices, the maximum number of edges is n(n — 1) If (u,v) is an edge in E(@), then we say vertices u and v ate adjacent and edge (u,v) is incident on vertices u and v. 
The vertices adjacent to vertex 2 in Gy are 4, 5, and 1. The edges incident on vertex 3 in Gy are (1,3), (3,6), and (3,7). If (u,v) is a directed edge, then vertex u is adjacent to v, and v is adjacent from u. The edge (u,») is incident to u and v. In Gs, the edges incident to vertex 2 are (1,2), (2,1), and (2,3) ‘A subgraph of G is a graph G’ such that V(G') C V(G) and E(G!) © E(G). Figure 2.27 shows some of the subgraphs of G, and Gs. A path from vertex u to vertex » in graph G is a sequence of vertices Usin, i2y.-- i450, such that (u,i1), (i152), +++ (ikyv) are edges in E(G). If G’ is directed, then the path consists of the edges (u, 1), (i1,i2)s «++. (ixs») in E(G'). The length of a path is the number of edges on it. A simple path is a path in which all vertices except possibly the first and last are distinct. ‘A path such as (1,2), (2,4), (4,3), is also written as 1, 2, 4, 3, Paths 1, 2, 4, 3 and 1. 2, 4, 2 of G, are both of length 3. The first is a simple path: the second is not. The path 1, 2, 3 is a simple directed path in G3, but 1, 2, 3, 2 is not a path in Gy, as the edge (3,2) is not in E(G3). ‘A cyele is a simple path in which the first and last vertices are the same. ‘The path 1, 2, 3, 1 is a cycle in G; and 1, 2, 1 isa cycle in Gy. For directed graphs we normally add the prefix “directed” to the terms cycle and path. In an undirected graph G, two vertices u and v are said to be connected iff there is a path in G from u to v (since G is undirected, this means there must path from v tou). An undirected graph is said to be connected iff y pair of distinct vertices u and v in V(G), there is a path from u to » in @. Graphs G; and G2 are connected, whereas G, of Figure 2.28 is not. 116 CHAPTER 2. ELEMENTARY DATA STRUCTURES © ® (3) ) @ Gi) Gi) (a) Some of the subgraphs of G; © ® © D a v Vs ® ‘a ) @ Gi) Gi) (iy) (b) Some of the subgraphs of G3 Figure 2.27 Some subgraphs 2.6. GRAPHS u7 A connected component (or simply a component) H of an undirected graph is a mazimal connected subgraph. By “maximal,” we mean that G contain no other subgraph that is both connected and properly contains H. Gy he two components, Hy and Hy (see Figure 2.28) & Gs Figure 2.28 A graph with two connected components A tree is a connected acyclic (i.e., has no cycles) graph. A directed graph G is said to be strongly connected iff for every pair of distinct vertices u and v in V(G), there is a directed path from u to v and also from v to u. The graph Gs (repeated in Figure 2.29(a)) is not strongly connected, as there is no path from vertex 3 to 2. A strongly connected component imal subgraph that is strongly connected. The graph G3 has two connected components (see Figure 2.29(b)) ‘The degree of a vertex is the number of edges incident to that vertex. The «legree of vertex 1 in Gi, is 3. If G is a directed graph, we define the in-deyree of a vertex v to be the number of edges for which v is the head. ‘The vut-degree is defined to be the number of edges for which v is the tail. Vertex 2 of Gy has in-degree 1, out-degree 2, and degree 3. If d; is the degree of vertex iin a graph G with n vertices and e edges, then the number of edges is -(S) In the remainder of this chapter, we refer to a directed graph as a digraph. When we use the term graph, we assume that it is an undirected graph. 118 CHAPTER 2. ELEMENTARY DATA STRUCTURES rn (a) (b) Figure 2.29 A graph and its strongly connected components 2.6. 
Graph Representations Although several representations for graphs are pos three most commonly used: adjacency matrices, adjacency lists, and ad- jacency multilists. Once again, the choice of a particular representation nds on the application we have in mind and the functions we expect to perform on the graph. ible, we study only the Adjacency Matrix Let G = (V,E) be a graph with n vertices, n > 1. The adjacency matrix mal n Xn array, say a, with the property that ali, j] = 1 iff the edge (i,j) ((i,) for a directed graph) is in E(@). ‘The element afi, j] = 0 if there is no Such edge in G. The adjacency matrices for the graphs G1, G3, and Gy are shown in Figure 2.30. The adjacency matrix for an undirected graph is symmetric, as the edge (i, j) is in E(@) iff the edge (j,i) is also in B(G). The adjacency matrix for a directed graph may not be ymmetric (as is the case for G3). ‘The space needed to represent a graph ig its adjacency matrix is n? bits. About half this space can be saved in the case of an undirected graph by storing only the upper or lower triangle of the matrix. From the adjacency matrix, we can readily determine whether there is an edge connecting any two vertices i and j. For an undirected graph the degree of any vertex # is its row sum: 2.6. GRAPHS a 12345678 1f0 110000 0| 210010000; 3110010000 123 4 4jo 1100000 et eee 500000100 21011 1}/o0 10 600001010 31101 2/101 700000101 4/1110 3/0 0 0 80000001 0| @G, () Gs OG. Figure 2.30 Adjacency matrices Sali] j=l For a directed graph the row sum is the out-degree, and the column sum is the in-degree Suppose we want to answer a nontrivial question about graphs, such as How many edges are there in G? or Is G connected? Adjacency matrices require at least n? time, as n? —n entries of the matrix (diagonal enti are eto) have to be examined. When graphs are sparse (i.e., most of the terms in the adjacency matrix are zero), we would expect that the former question could be answered in significantly less time, say O(c +n), where is the number of edges in G, and e < ®. Such a speedup is made possible through the use of a representation in which only the edges that are in G are explicitly stored. ‘This leads to the next representation for graphs, adjacency lists. Adjacency Lists In this representation of graphs, the n rows of the adjacency matrix are represented as n linked lists. ‘There is one list for each vertex in G. ‘The nodes in list i represent the vertices that are adjacent from vertex i. Bach node has at least two fields: verter and link. The vertez field contains the indices of the vertices adjacent to vertex i. The adjacency lists for Gi, Gs, 120 CHAPTER 2. ELEMENTARY DATA STRUCTURES feed moded vertex link aa} { 4 2 3/0 2 3 a ijo B) 1) 0 4 [0] head nodes m0 {279 | ir) 3 10 BI 0 (b) G3 head nodes ©) Gs Figure 2.31 Adjacency lists 2.6. GRAPHS ae and Gz are shown in Figure 2.31. Notice that the verti ch list are not required to be ordered. Each list has a head node. The head nodes are sequential, and so provide easy random access to the adjacency list for any particular vertex. For an undirected graph with n vertices and e edges, this representation requires n head nodes and 2¢ list nodes. Each list node has two fields. In terms of the number of bits of storage needed, this count should be multiplied by log n for the head nodes and logn + loge for the list nodes, as it takes O(log m) bits to represent a number of value m. Often, you can sequentially pack the nodes on the adjacency lists, and thereby eliminate the use of pointers. 
In this case, an array node [1 : n + 2e +1] can be used, The nodei| gives the starting point of the list for vertex i, 1 N4 NS vertex 3: N2—>N4->N6 vertex 4: N3 NS >N6 Figure 2.35 Adjacency multilists for G, of Figure 2.25(a) 124 CHAPTER 2, ELEMENTARY DATA STRUCTURES Weighted Edges In many applications, the edges of a graph have weights assigned to them. ‘These weights may represent the distance from one vertex to another or the cost of going from one vertex to an adjacent vertex. In these applications, the adjacency matrix entries a[i,j] keep this information too. When adja- cency lists are used, the weight information can be kept in the list nodes by including an additional field, weight. A graph with weighted edges is called a network. EXERCISES 1. Does the multigraph of Figure 2.36 have an Eulerian walk? If so, find one. Figure 2.36 A multigraph 2. For the digraph of Figure 2.37 obtain (a) the in-degree and out-degree of each vertex (b) its adjacency-matrix representation (c) its adjacency-list representation (d) i (e) adjacency-multilist representation its strongly connected components 3. Devise a suitable representation iy "graphs so that they can b stored a Ajacency matrix. Write another algorithm that crea from the disk input. 4. Draw the complete undirected graphs on one, two, three, four, and five vertices. Prove that the number of edges in an n-vertex comple n(n) 2.6. GRAPHS 125 Figure 2.37 A digraph 5. Us the directed graph of trongly connected? List all the simple paths. ure (3) Figure 2.38 A directed graph 6. Obtain the adjacency-matrix, adj representations of the graph of Figu ncy-list, and adjacency-multilist 2.38. Show that the sum of the degrees of the vertices of an undirected graph. is twice the number of edges. 8. Prove or dispro HGUV, 8) is afte di of each vertex is at le: in G, ted graph such that the out-degr t one, then there is a directed cycle 9. (a) Let G be a connected, undirected graph on n vertices. Show that G must have at least n — 1 edges and that all connected, undirected graphs with n—1 edges are trees. 126, CHAPTER 2. ELEMENTARY DATA STRUCTURES (b) What is the minimum number of edges in a strongly connected digraph with n vertices? What form do such digraphs have? 10. For an undir are equivalent: ted graph G with n vert ces, prove that the following (a) G isa tree. (b) G is connected, but if any edge is removed, the resulting graph is not connected. (c) For any two distinct vertices u € V(G) and v € V(G), there is exactly one simple path from u to v. (4) G contains no cycles and has n — 1 edges. 11. Write an algorithm to input the number of vertices in an undirected graph and its edges one by one and to set up the linked adjacen representation of the graph. You may assume that no edge is input twice. What is the run time of your procedure as a function of the number of vertices and the number of edges? 12. Do the preceding exercise but now set up the multilist representation. 13. Let G be an undirected, connected graph with at least one vertex of odd degree. Show that G contains no Eulerian walk. 2.7 REFERENCES AND READINGS A wide-ranging examination of data structures and their efficient implemen- tation can be found in the following: Fundamentals of Data Structures in C++, by E. Horowitz, 8. Sahni, and D. Mehta, Computer Science Press, 1995. Data Structures and Algorithms 1: Sorting and Searching, by K. Mehlhorn, Springer-Verlag, 1984. Introduction to Algorithms: A Creative Approach, by U. Manber, Addison- Wesley, 1989. 
Handbook of Algorithms and Data Structures, second edition, by G. H. Gonnet and R. Baeza-Yates, Addison-Wesley, 1991. Proof of Lemma 2.4 can be found in “Worst-case analysis of set_union algorithms,” by R. Tarjan and J. Van Leeuwen, Journal of the ACM 31; no. 2 (1984): 245-281. Chapter 3 DIVIDE-AND-CONQUER 3.1 GENERAL METHOD Given a function to compute on n inputs the divide-and-conquer strategy suggests splitting the inputs into distinct subsets, 1 < k 15 7 Apply DAndC to each of t 8; 8 return Combine(DAndC(P;),DAndC(P3),. . ..DAndC(P4))3 9 10 } Algorithm 3.1 Control abstraction for divide-and-conquer computing time of DAndC is described by the recurrence relation 7 (n) small ‘9 70) = { HP) 20m) +- -4T(m) + fn) otherwise — 1) where T(n) is the time for DAndC on any input of size n and g(n) is the time to compute the answer directly for small inputs. ‘The function f(n) is the time for dividing P and combining the solutions to subproblems. For divide- and-conquer-based algorithms that produce subproblems of the same type as the original problem, it is very natural to first describe such algorithms using recursion. ‘The complexity of many divide-and-conquer algorithms is given by recur- rences of the form fel) n=1 , oe { aP(n/b) + f(n) n>1 (3.2) where a and 6 are known constants. We assume that T(1) is known and n is a power of b (ic., n =F). One of the methods for solving any such recurrence relation is called the substitution method. This method repeatedly makes substitution for each occurrence of the function T in the right-hand side until all such occurrences disappear. 3.1. GENERAL METHOD cae Example 3.1 Consider the case in which a = 2 and b = 2. Let T(1) = 2 and f(n) =n. We have T(n) = 27(n/2) +n 22T(n/4) + n/2 +n AT (n/4) +2n 4[27(n/8) +n/4] +2n 87 (n/8) +3n In general, we see that T(r) = 2'T(n/2") + in, for any logyn >i > 1. In particular, then, T(n) = 2! "T(n/2!°82") + nlogy n, corresponding to the choice of i = logy n. Thus, T(n) = nT(1) + nlogyn = nlogyn + 2n. a Beginning with the recurrence (3.2) and using the substitution method, it can be shown that T(n) = ni" (T(1) + u(n)] where u(n) = Tf, A(b’) and A(n) = f(n)/nl°%*, Table 3.1 tabulates the asymptotic value of u(m) for various values of h(n). ‘This table allows one to casily obtain the asymptotic value of T(n) for many of the recurrences one encounters when analyzing divide-and-conquer algorithms. Let us consider some examples using this table. hin) a(n) j O(n"), r <0 OU) O((ogn)), 750 | O(Mlogny* E+ D) ny, 1 >0 OUR) ‘Table 3.1 u(n) values for various h(n) values Example 3.2 Look at the following recurrence when n is a power of 2: T(l =1 rin) ={ Tn)2) +e aS 130 CHAPTER 3. DIVIDE-AND-CONQUER Comparing with (3.2), we see that a = 1, b = 2, and f(n) = c. So, log,(a) = O and h(n) = f(n)/n'%* = ¢ = e(logn)® = @((logn)?). From Table 3.1, we obtain u(n) = O(logn). So, T(n) = n!°%4[c+ O(logn)] = O(logn). Qo Example 3.3 Next consider the case in which a = 2, 6 = 2, and f(n) =n. For this recurrence, log, a = 1 and h(n) = f(n)/ (log n)°). Hence, u(n) = O(log n) and T(n) = n[P(1) + O(ogn)] = O(n log n). a Example 3.4 As another example, consider the recurrence T(n) = 77(n/2)+ 18n?, n > 2 and a power of 2. We obtain a = 7, b= 2, and f(n) = 18n?. So, log,a = logy 7 ~ 2.81 and h(n) = 18n2/n!%? = 18n?—let27 = O(n"), where r= 2—log,7 <0. So, u(n) = O(1). The expression for T(n) is Tin) = nlo&T7(1) + O(1)] = O(n'%7) as T(1) is assumed to be a constant. a Example 3.5 Asa final example, consider the recurrence T(n) 4n§, n > 3 and a power of 3. 
Comparing with (3.2), we obtain a = 9, b = 3, and f(n) = 4n5. So, logya = 2 and h(n) = 4n8/n? = 4n* = Q(n4). From Table 3.1, we see that u(n) = O(h(n)) = O(n). So, T(n) = n2{T(1) + O(n*) = O(n’) as T(1) can be assumed constant. a EXERCISES 1. Solve the recurrence relation (3.2) for the following choices of a,b, and J(n) (c being a constant): (a) a=1, b= 2, and f(n) = cn (b) a =5, b=4, and f(n) = cn? (c) a = 28, b=3, and f(n) = cn? 2. Solve the following recurrence relations using the substitution method: (a) All three recurrences of Exercise 1. (b) n<4 T 1 T(n)={ T(ym)+e n>4 3.2. BINARY SEARCH 131 © rny= fi n<4 (2) = 1 21m) +logn n>4 (ay ns4 1 T(n) = { (Vn) + eben > 4 3.2 BINARY SEARCH Let aj, 1 ag: In this case the sublist to be searched is dy+1,...,a¢. P reduces to (€- qydgeiy--+54e,2). In this example, any given problem P gets divided (reduced) into one new subproblem. This division takes only @(1) time. After a compari- son with aq, the instance remaining to be solved (if any) can be solved by using this divide-and-conquer scheme again. If q is always chosen such that «a, is the middle element (that is, ¢ = [(n +1)/2]), then the result- ing search algorithm is known as binary search. Note that the answer to the uew subproblem is also the answer to the original problem P: there is no need for any combining. Algorithm 3.2 describes this binary search method, where BinSrch has four inputs a[ ], 7,1, and x. It is initially invoked as BinSrch(a, 1,n, 2). A nonrecursive version of BinSrch is given in Algorithm 3.3. BinSearch has three inputs a,n, and c. The while loop continues processing as long as there are more clements left to check. At the conclusion of the procedure 0 is returned if x is not present, or j is returned, such that alj] = z. Is BinSearch an algorithm? We must be sure that all of the operations such as comparisons between x and a[mid] are well defined. The relational operators carry out the comparisons among elements of a correctly if these ‘operators are appropriately defined. Does BinSearch terminate? We observe 132 CHAPTER 3. DIVIDE-AND-CONQUER 1 Algorithm BinSrch(a, i,1, 2) 2 // Given an array afi : 1) of elements in nondecreasing 3. // order, 1 0, determine whether z is present, and 4 // ifso, return j such that x = a{j]; else return 0. 5 6 low := 1; high := n 7 while (low < high) do 8 { 9 mid := [(low + high)/2\3 10 if (« < almid)) then high := mid— 1; ul else if (x: > almid]) then low -= mid + 1; 12 else return mid; 13 14 return 0; 15 } Algorithm 3.3 Iterative binary search 3.2. BINARY SEARCH 133 that low and high are integer variables such that each time through the loop either « is found or low is increased by at least one or high is decreased by at least one. ‘Thus we have two sequences of integers approaching each other and eventually low becomes greater than high and causes termination in a finite number of steps if z is not present. Example 3.6 Let us select the 14 entries Sicc settee ssa eeeess eet aseeeaee ae 31, 142, 151 place then in a{1 : 14), and simulate the steps that BinSearch goes through as it searches for different values of 2. Only the variables low, high, and mid uced to be traced as we simulate the algorithm. We try the following values for ir: 151, —14, and 9 for two successful searches and one unsuccessful search. ‘Table 3.2 shows the traces of BinSearch on these three inputs. 
a r= 11 low high mid z=-M low high mid tow 7 1 WT ee tae i oe 2 M13 12 i deed id eee found 241 not found 2 =9 low high mid row 7 1 6 3 4° 6 5 found rch on 14 elements Table 3.2 Three examples of binary se: ‘These examples may give us a little more confidence about Algorithm 3.3, but they by no means prove that it is correct. Proofs of algorithms are very useful because they establish the correctness of the algorithm for all possible inputs, whereas testing gives much less in the way of guarantees. Unfortunately, algorithm proving is a very difficult process and the complete proof of an algorithm can be many times longer than the algorithm itself. We content ourselves with an informal “proof” of BinSearch. Theorem 3.1 Algorithm BinSearch(a,n,z) works correctly. We assume that all statements work as expected and that compar- s such as x > a[mid] are appropriately carried out. Initially low = 1, high += n, n > 0, and a[l] < a[2] <--- < a{n). If n = 0, the while loop is 134 CHAPTER 3. DIVIDE-AND-CONQUER not entered and 0 is returned. Otherwise we observe that each time through the loop the possible elements to be checked for equality with « are allow] allow + 1], ..., amid], ..., alhigh]. If « = almid], then the algorithm ter- minates successfully. Otherwise the range is narrowed by either increasing low to mid + 1 or decreasing high to mid ~ 1. Clearly this narrowing of the range does not affect the outcome of the search. If low becomes greater than high, then « is not present and hence the loop is exited. a Notice that to fully test binary search, we need not concern ourselves with the values of a[1 : n]. By varying « sufficiently, we can observe all possible computation sequences of BinSearch without devising different values for a ‘To test all successful searches, x must take on the n values in a. To test all unsuccessful searches, « need only take on n+ 1 different values. Thus the complexity of testing BinSearch is 2n + 1 for each n. Now let’s analyze the execution profile of BinSearch. ‘The two relevant characteristics of this profile are the frequency counts and space required for the algorithm. For BinSearch, storage is required for the n elements of the array plus the variables low, high, mid, and x, or n+4 locations. As for the time, there are three poss’ ies to consider: the best, average, and worst cases. Suppose we begin by determining the time for BinSearch on the prev ous data set. We observe that the only operations in the algorithm are comparisons and some arithmetic and data movements. We concentrate on comparisons between x and the elements in a{ ], recognizing that. the fre- quency count of all other operations is of the same order as that for these comparisons. Comparisons between 2 and elements of a[ ] are referred to as element comparisons. We assume that only one comparison is needed to determine which of the three possibilities of the if statement holds. The number of element comparisons needed to find each of the 14 elements is ® ue @ © 8 @ Bs im oy By nay oa Ba Blements it 0) ee ara get od teat tol tae lle sige ade ist beri te te ee No element requires more than 4 comparisons to be found. The average is obtained by summing the comparisons needed to find all 14 items and dividing by 14; this yields 45/14, or approximately 3.21, comparisons per successful search on the average. There are 15 possible ways that an unsuc- cessful search may terminate depending on the value of x. If x < a[l], the algorithm requires 3 element comparisons to determine that « is not present. 
For all the remaining possibilities, BinSearch requires 4 element comparisons. ‘Thus the average number of element comparisons for an unsuccessful search is (3 + 14 * 4)/15 = 59/15 = 3.93. The analysis just done applies to any sorted sequence containing 14 ele- ments. But the result we would prefer is a formula for n elements. A good 3.2. BINARY SEARCH 135 way to derive such a formula plus a better way to understand the algorithm is to consider the sequence of values for mid that are produced by BinSearch for all possible values of x. ‘These values are nicely described using a binary decision tree in which the value in each node is the value of mid. For ex- ample, if m = 14, then Figure 3.1 contains a binary decision tree that. traces the way in which these values are produced by BinSearch. as) dooooodoooooooc cee Figure 3.1 Binary decision tree for binary ‘The first comparison is x with a[7]. If x < a[7], then the next comparison is with a(3]; similarly, if z > a[7], then the next comparison is with a[11]. Each path through the tree represents a sequence of comparisons in the binary search method. If x is present, then the algorithm will end at one of the circular nodes that lists the index into the array where « was found. If x ix not present, the algorithm will terminate at one of the square nodes. Circular nodes are called internal nodes, and square nodes are referred to as external nodes. Theorem 3.2 Ifn isin the range [2*~', 2*), then BinSearch makes at most k element comparisons for a successful search and either k— 1 or k comparisons for an unsuccessful search. (In other words the time for a successful search is O(log n) and for an unsuccessful search is @(logn)). Proof: Consider the binary decision tree describing the action of BinSearch on n clements. All successful searches end at a circular node whereas all unsuccessful searches end at a square node. If 2*-! a[n|). However, for successful searches BinSearchl may make (logn)/2 more element comparisons than BinSearch (for example, when x = alrid]). ‘The analysis of BinSearch] is left as an ex- ercise. It should be easy to see that the best-, average-, and worst-case times for BinSearch1 are O(log n) for both successful and unsuccessful searches. ‘These two algorithms were run on a Spare 10/30. ‘The first two rows in Table 3.3 represent the average time for a successful search. ‘The second set of two rows give the average times for all possible unsuccessful searches. For both successful and unsuccessful searches BinSearch1 did marginally better than BinSearch. EXERCISES 1. Run the recursive and iterative versions of binary search and compare the times. For appropriate sizes of n, have each algorithm find every clement in the set. Then try all n +1 possible unsuccessful searches. 2. Prove by induction the relationship £ = I +2n for a binary tree with n internal nodes. The variables and J are the external and internal path length, respectively. 138 CHAPTER 3. DIVIDE-AND-CONQUER Coransene ll ab 13 14 15 Algorithm BinSearch1(a,n, 7) // Same specifications as BinSearch except n > 0 low := 1; high =n +15 // high is one more than possible. while (low < (high — 1)) do mid := |(low + high) /2]3 if (x < almid)) then high := mid; // Only one comparison in the loop. else low := mid; // x > almid) if (x = a[low]) then return low; // « is present. else return 0; // is not present. 
Algorithm 3.4 Binary search using one comparison per cycle {| Array sizes [5,000 [10,000 | 15,000 [20,000 | 25,000 [ 30,000 successful searches [a earch | 51.30 | 67.95 73.85 _[ 7677 | 73.40 inSearchl | 47.68 | 53.92 68.95_[ 711i inSearch | 50.40 [78.20 1.15 |_BinSearchT_| 41.93 69.22 [72.26 Table 3.3 Computing times for two binary search algorithms; times are in microseconds FINDING THE MAXIMUM AND MINIMUM 139 3. In an infinite array, the first n cells contain integers in sorted order and the rest of the cells are filled with oo. Present an algorithm that lakes z as input and finds the position of x in the array in @(logn) lime. You are not given the value of n 4, Devise a “binary” search algorithm that splits the set not into two sets of (almost) equal sizes but into two sets, one of which is twice the si of the other. How does this algorithm compare with binary search? 5. Devise a ternary search algorithm that first tests the element at posi- tion n/3 for equality with some value «r, and then checks the element at 2n/3 and either discovers « or reduces the set size to one-third the size of the original. Compare this with binary search. 6. (a) Prove that BinSearch1 works correctly. (b) Verify that the following algorithm segment functions correcth . 4 o v according to the specifications of binary search. Discuss its com- puting time. low := 1; high repeat { mid := (low + high)/2J3 if (x > alméd)) then low := mids else high := mid } until (low + 1) = high) ns 3.3. FINDING THE MAXIMUM AND MINIMUM Let us consider another simple problem that can be solved by the divide- and-conquer technique. ‘The problem is to find the maximum and minimum. items in a set of n clements. Algorithm 3.5 is a straightforward algorithm to accomplish this. In analyzing the time complexity of this algorithm, we once again con- centrate on the number of element comparisons. The justification for this is thal the frequency count for other operations in this algorithm is of the same order as that for element comparisons. More importantly, when the elements in a[1 : n] are polynomials, vectors, very large numbers, or strings of characters, the cost of an element comparison is much higher than the cost of the other operations. Hence the time is determined mainly by the total cost of the element comparisons. StraightMaxMin requires 2(n — 1) element comparisons in the best, aver- age, and worst cases, An immediate improvement is possible by realizing 140 CHAPTER 3. DIVIDE-AND- 1 Algorithm StraightMaxMin(a,n, max, min) 2 // Set maz to the maximum and min to the minimum of a[l : 1] 3 4 maz := min := a{l}; 5 for i:= 2 to n do 6 7 if (a{i] > mar) then ma: 8 if (ali) < min) then min := afi] 9 10 ¥ Algorithm 3.5 Straightforward maximum and minimum that the comparison ali] < min is necessary only when afi] > maz is false. Hence we can replace the contents of the for loop by if (a[i] > maz) then maz := else if (ali) < min) then min := ali; Now the best case occurs when the elements are in increasing order. is n — 1. The worst case occurs when creasing order. In this case the number of element. comparisons is 2(n — 1). The average number of element comparisons is less than 2(n — 1). On the average, ali] is greater than maz half the time, and so the average number of comparisons is 3n/2— 1. A eee algorithm for this problem would proceed as fol- lows: Let P = (n,a[i],...,a[j]) denote an arbitrary instance of the problem. 
Here n is the number of elements in the list afi) and we are inter- ed in finding the maximum and minimum of this list. Let Small(P)/ be true when n < 2. In this case, the maximum and minimum are a{i] ifn = 1. Itn the problem can be solved by making one comparison. If the list has more than two elements, P has to be divided into smaller fa s. For example, we might divide P into the two instances P; ([n/2| ,a[1],..-, a[n/2]]) and P, = (n ~ [n/2| ,a{[n/2] + 1],..-,¢[n]). AE ter having divided P into two smaller subproblems, we can solve them by recursively invoking the same divide-and-conquer algorithm. How can we combine the solutions for P, and P to obtain a solution for P? If MAX(P) and MIN(P) are the maximum and minimum of the elements in P, then MAX(P) is the larger of MAX(P;) and MAX(P,). Also, MIN(P) is the smaller of MIN(P,) and MIN(P3). 3.3. FINDING THE MAXIMUM AND MINIMUM 1 Algorithm 3.6 results from applying the strategy just described. MaxMin is a recursive algorithm that finds the maximum and minimum of the set of elements {a(i),a(i + 1),...,a(j)}. ‘The situation of set sizes one (i = j) and two (i = j — 1) are handled separately. For sets containing more than two elements, the midpoint is determined (just as in binary search) and two new subproblems are generated. When the maxima and minima of these subproblems are determined, the two maxima are compared and the two miniuia are compared to achieve the solution for the entire 1 Algorithm MaxMin(i, j,maz, min) 2 //a{l:n] isa global array. Parameters i and j are integers, 3) //1 min1) then min := minl; 28 } 29 } Algorithm 3.6 Recursively finding the maximum and minimum 142 CHAPTER 3. DIVIDE-AND-CONQUER ‘The procedure is initially invoked by the statement MaxMin(1,n,,y) Suppose we simulate MaxMin on the following nine elements: a fi) 2] 8) 4 B) @ 7 8 22 138 -5 -8 15 60 17 31 47 A good way of keeping track of recursive calls is to build a tree by adding a node each time a new call is made. For this algorithm each node has four items of information: i, j, mac, and min. On the array a[ ] above, the tree of Figure 3.2 is produced. 8,9,47,31 (Om 1,2,22,13] 3.3,-5.-5| Figure 3.2 Trees of recursive calls of MaxMin Examining Figure 3.2, we see that the root node contains 1 and 9 as the values of i and j corresponding to the initial call to MaxMin. ‘This execution produces two new calls to MaxMin, where i and j have the values 1, 5 and 6, 9, respectively, and thus split the set into two subsets of approximately the same size. From the tree we can immediately see that the maximum depth of recursion is four (including the first call). 'The circled numbers in the upper left corner of each node represent the orders in which maz and min are assigned values. 3.3. FINDING THE MAXIMUM AND MINIMUM 143 Now what is the number of element comparisons needed for MaxMin? If T(n) represents this number, then the resulting recurrence relation is 1 n=2 0 n=1 { T([n/2])+T([n/2])+2 n>2 T(n) = When n is a power of two, n = 2* for some positive integer k, then T(n) = 2T(n/2) +2 2(2T(n/4) +2) +2 AT(n/4)+4+2 (3.3) 212) + Dicice1 2 att 49 2 = 3n/2—2 Note that 3n/2 ~ 2 is the best- parisons when n is a power of tw Compared with the 2n ~ 2 comparisons for the straightforward method, this is a saving of 25% in comparisons. It can be shown that no algorithm based on comparisons uses less than 3n/2~ 2 comparisons. So in this sense algorithm MaxMin is optimal (see Chapter 10 for more details). But does this imply that MaxMin is better in practice? Not necessarily. 
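Before taking up that question, it may help to see the recursion in a compiled language. The following C++ sketch mirrors Algorithm 3.6; passing the array as a parameter and indexing it from zero are implementation conveniences only, and the sketch is not meant as the definitive rendering of MaxMin.

#include <iostream>
#include <vector>

// Recursive maximum and minimum in the style of Algorithm 3.6.
// On return, mx and mn hold the largest and smallest values in a[i..j].
void MaxMin(const std::vector<int>& a, int i, int j, int& mx, int& mn)
{
    if (i == j) {                         // one element
        mx = mn = a[i];
    } else if (i == j - 1) {              // two elements: one comparison
        if (a[i] < a[j]) { mx = a[j]; mn = a[i]; }
        else             { mx = a[i]; mn = a[j]; }
    } else {                              // divide at the midpoint, then combine
        int mid = (i + j) / 2;
        int mx1, mn1;
        MaxMin(a, i, mid, mx, mn);
        MaxMin(a, mid + 1, j, mx1, mn1);
        if (mx1 > mx) mx = mx1;           // two comparisons to combine
        if (mn1 < mn) mn = mn1;
    }
}

int main()
{
    // The nine-element example of Figure 3.2.
    std::vector<int> a = { 22, 13, -5, -8, 15, 60, 17, 31, 47 };
    int mx, mn;
    MaxMin(a, 0, (int)a.size() - 1, mx, mn);
    std::cout << "max = " << mx << ", min = " << mn << "\n";  // max = 60, min = -8
    return 0;
}

Running it on the nine elements used in Figure 3.2 reproduces the answers obtained there. With this concrete version in mind, we can weigh the practical costs of the recursion.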
In terms of storage, MaxMin is worse than the straightforward algorithm because it requires stack space for i, j, max,min,mazl, and min1. Given n elements, there will be [logy n| +1 levels of recursion and we need to save seven values for each recursive call (don’t forget the return address is also needed). Let us see what the count is when element comparisons have the same cost as comparisons between i and j. Let C(n) be this number. First, we observe that lines 6 and 7 in Algorithm 3.6 can be replaced with if (@ > 5-1) { // Small(P) average-, and worst-case number of com- to achieve the same effect. Hence, a single comparison between i and j is adequate to implement the modified if statement. Assuming n = 2* for some positive integer k, we get olny = { 200) +3 n>2 n=2 144 CHAPTER 3. DIVIDE-AND-CONQUER Solving this equation, we obtain C(n) = 2C(n/2) +3 4C(n/4) +643 34 2k-10(2) +3 De? 2 ee = #434213 = 5n/2-3 The comparative figure for StraightMaxMin is 3(n — 1) (including the com- parison needed to implement the for loop). This is larger than 5n/2— 3. Despite this, MaxMin will be slower than StraightMaxMin because of the overhead of stacking é,j,maz, and min for the recursion. Algorithm 3.6 makes several points. If comparisons among the elements of a{ | are much more costly than comparisons of integer variables, then the divide-and-conquer technique has yielded a more efficient (actually an opti- mal) algorithm. On the other hand, if this assumption is not true, the tech- nique yields a less-efficient algorithm. Thus the divide-and-conquer strategy is seen to be only a guide to better algorithm design which may not always succeed. Also we see that it is sometimes necessary to work out the con- stants associated with the computing time bound for an algorithm. Both MaxMin and StraightMaxMin are @(n), so the use of asymptotic notation is not enough of a discriminator in this situation. Finally, sce the exe for another way to find the maximum and minimum using only 3n/2 comparisons. Note: In the design of any divide-and-conquer algorithm, typically, it is a straightforward task to define Small(P) and S(P). So, from now on, we only discuss how to divide any given problem P and how to combine the solutions to subproblems. EXERCISES 1. Translate algorithm MaxMin into a computationally equivalent proce- dure that uses no recursion. 2. Test your iterative version of MaxMin derived above against Straight- MaxMin. Count all comparisons. 3. There is an iterative algorithm for finding the maximum and minimum which, though not a divide-and-conquer-based algorithm, is proba- bly more efficient than MaxMin. It works by comparing consecutive pairs of elements and then comparing the larger one with the current maximum and the smaller one with the current minimum. Write out 3.4. MERGE SORT 145 the algorithm completely and analyze the number of comparisons it requires. 4. In Algorithm 3.6, what happens if lines 7 to 17 are dropped? Does the resultant function still compute the maximum and minimum elements correctly? 3.4 MERGE SORT ‘As another example of divide-and-conquer, we investigate a sorting algo- rithm that has the nice property that in the worst case its complexity is O(nlogn). This algorithm is called merge sort. We assume throughout that the elements are to be sorted in nondecreasing order. Given a sequence of rn elements (also called keys) a[l],...,a{n], the general idea is to imagine them split into two sets a/l],...,a[|n/2|] and a[in/2| + 1],...,a[n]. 
Each and the resulting sorted sequences are merged to produce a single sorted sequence of n elements. Thus we have another ideal example of the divide-and-conquer strategy in which the splitting is into two equal-sized sets and the combining operation is the merging of two sorted sets into one. MergeSort (Algorithm 3.7) describ recursion and a function Merge (Algorithm 3.8) which merges two sorted sets. Before executing MergeSort, the n elements should be placed in a{1 : n Then MergeSort(1,n) causes the keys to be rearranged into nondecreasing order in a, this process very succinctly using Example 3.7 Consider the array of ten elements a1 : 10] = (310, 285, 179, 652, 351, 423, 861, 254, 450, 520). Algorithm MergeSort begins by splitting a|] into two subarrays each of size five (aL : 5] and a[6 : 10)). The elements in a[l : 5] are then split into two subarrays of size three (a/1 : 3]) and two (a[4 : 5]). Then the items in a[1 ; 3] are split into subarrays of size two (all : 2}) and one (a[3 : 3}). The two values in a[1 : 2] are split a final time into one-element subarrays, and now the merging begins. Note that no movement of data has yet taken place. A record of the subarrays is implicitly maintained by the recursive mechanism. Pictorially the file can now be viewed as (310 | 285 | 179 | 652, 351 | 423, 861, 254, 450, 520) where vertical bars indicate the boundaries of subarrays. Elements a[1] and a2] are merged to yield (285, 310 | 179 | 652, 351 | 423, 861, 254, 450, 520) 146 CHAPTER 3. DIVIDE-AND-CONQUER 1 Algorithm MergeSort(low, high) 2 // allow : high] is a global array to be sorted 3. // Small(P) is true if there is only one element 4 —// to sort. In this case the list is already sorted. 5 6 if (low < high) then // If there are more than one element 8 // Divide P into subproblems. 9 // Find where to split the set. 10 mid := |(low + high)/2|3 u // Solve the subproblems. 12 MergeSort(/ow, mid); 1B MergeSort(mid + 1, high); 14 // Combine the solutions. 15 Merge(low, mid, high); 16 ww } Algorithm 3.7 Merge sort ‘Then a[3] is merged with a[l : 2] and (179, 285, 310 | 652, 351 | 423, 861, 254, 450, 520) is produced. Next, elements a{4] and a[5] are merged: (179, 285, 310 | 351, 652 | 423, 861, 254, 450, 520) and then a[1 :3] and a[4: 5] (179, 285, 310, 351, 652 | 423, 861, 254, 450, 520) At this point the algorithm has returned to the first invocation of MergeSort and is about to process the second recursive call. Repeated recursive calls are invoked producing the following subarrays: (179, 285, 310, 351, 652 | 423 | 861 | 254 | 450, 520) Elements a[6] and a[7] are merged. Then a[8] is merged with a[6 : 7]: 3.4. MERGE SORT 147 Algorithm Merge(low, mid, high) // allow : high] is a global array containing two sorted 7/ subsets in allow : mid] and in almid +1: high]. The goal // is to merge these two sets into a single set residing 7/ im allow : high]. b[ ] is an auxiliary global array. h:= low; i:= low; j = mid +13 while ((h < mid) and (j < high)) do i (a{h] < a{j]) then bfi] = afh];h = h +1; else bf) = als I+ itl; } if (h > mid) then for k:=j to high do Off] = alk]s i= a +15 else for k :=h to mid do i+ } for k := low to high do alk] = blk]; Algorithm 3.8 Merging two sorted subarrays using auxiliary storage 148 CHAP1 JR 3. 
DIVIDE-AND-CONQUER (179, 285, 310, 351, 652 | 254, 423, 861 | 450, 520) Next a[9] and a[10] are merged, and then a[6 : 8] and a[9 : 10): (179, 285, 310, 351, 652 | 254, 423, 450, 520, 861) At this point there are two sorted subarrays and the final merge produces the fully sorted result (179, 254, 285, 310, 351, 423, 450, 520, 652, 861) 9.10 10,10 Figure 3.3 Tree of calls of MergeSort(1, 10) Figure 3.3 is a tree that represents the sequence of recursive calls that are produced by MergeSort when it is applied to ten clements. The pair of values in each node are the values of th low and high. Notice how the splitting continues until sets containing a single element are produced. Figure 3.4 is a tree representing the calls to procedure Merge by MergeSort. For example, the node containing 1, 2, and 3 represents the merging of a{1 : 2] with a[3). Oo If the time for the merging operation is proportional to n, then the com- puting time for merge sort is described by the recurrence relation Tin) = { % n= 1,aa constant (") = | 27(n/2) +n n>1,c a constant 3.4. MERGE SORT 149 When n ix a power of 2, n = 2", we can solve this equation by succ substitutions: 2(2T(n/4) + en/2) + en AT (n/4) + 2cn 4(2T (n/8) + en/4) + en T(n) 2T(1) + ken an+cnlogn It is easy to see that if 2* 1. { for j to n do // al :j — 1] is already sorted. idem := ali}; t= 5 — 1; while ((i > 1) and (item < ali])) do afi +1) := afi i:= 1-15 ali + 1] := item; Algorithm 3.9 Insertion sort The statements within the while loop can be executed zero up to a maximum of j times. Since j goes from 2 to n, the wors! procedure is bounded by Its best-case computing time is @(m) under the assumption that the body of the while loop is never entered. This will be true when the data is already in sorted order. We are now ready to present the revised version of merge sort with the inclusion of insertion sort and the links. Function MergeSort1 (Algorithm 3.10) is initially invoked by placing the keys of the records to be sorted in a{l : n] and setting link[1 : n] to zero. Then one says MergeSortl(1,n). A pointer to a list of indices that give the elements of a[ | in sorted order is returned. Insertion sort is used whenever the number of items to be sorted is less than 16. The version of insertion sort as given by Algorithm 3.9 needs to be altered so that it sorts allow : high] into a linked list. Call the altered version InsertionSortl. The revised merging function, Mergel, is given in Algorithm 3.11. 152 CHAPTER 3. DIVIDE-AND-CONQUER Algorithm MergeSort1 (low, high) // The global array allow : high] is sorted in nondec: // using the auxiliary array link|low : high]. The values in link // represent a list of the indices low through high giving a[ ] in 1 2 easing order 3 4 5 // sorted order. A pointer to the beginning of the list is returned. 6 7 8 9 if (high — low) < 15) then return InsertionSortl(a, link, low, high); else 10 ll mid := |(low + high)/2|; 12 q = MergeSortl (Low, mid); 13 MergeSortl (mid + 1, high); 14 return Mergel(q,r); 15 16 } Algorithm 3.10 Merge sort using links Example 3.8 As an aid to understanding this new version of merge sort, suppose we simulate the algorithm as it sorts the eight-element sequence (50, 10, 25, 30, 15, 70, 35, 55). We ignore the fact that less than 16 elements would normally be sorted using InsertionSort. The link array is initialized to zero. Table 3.4 shows how the link array changes after each call of MergeSortl completes. 
On each row the value of p points to the list in link that was created by the last completion of Mergel. To the right are the subsets of sorted elements that are represented by these lists. For example, in the last row p = 2 which begins the list of links 2, 5, 3, 4, 7, 1, 8, and 6; this implies a[2] < a8] < a[3] < ala] < af7] < all] < af8] < a6). o EXERCISES 1. Why is it necessary to have the auxiliary array [low : high] in function Merge? Give an example that shows why in-place merging is inefficient. -case time of procedure MergeSort is O(n log n). What is its e time? Can we say that the time for MergeSort is O(n log n)? 3. A sorting method is said to be stable if at the end of the method, identical elements occur in the same order as in the original unsorted 3.4. MERGE SORT 153 1 2 s to lists contained in the global array 3 Link{0] is introduced only for convenience and need 4 alized. The lists pointed at by q and r are merged 5 // anda pointer to the beginning of the merged list is returned. 6 { 7 8 9 tisgspssryk:=0; // The new list starts at link(0] 9 while ((i #0) and (j #0) do 10 { // While both lists are nonempty do ul if (ali] < a{j}) then 12 { // Find the smaller key. 13 link|k] := i; k 2= i; i 2= link(s uM // Add a new key to the list. 5 16 else 17 { 18 link{h] = yj = linklj]; 19 } 20 } al if (i = 0) then link|A] 22 else link|k] := 4; 23 return link|0]; 2} Algorithm 3.11 Merging linked lists of sorted elements 154 CHAPTER 3. DIVIDE-AND-CONQUER @ @ @ OY ® 6 M ®) ~ 56 2% 30 13 7035 ob eg ooo oo 6 2 0 1 0 0 0 0 0 © (10,50) 3 0 1 4 0 0 0 6 06 10; 50), (25, 30) 2 0 3 4 1 0 0 0 oO 5, 30, 50) ee gd Ge 0. 0) 50), (15, 70) 7 0 3 4 1 6 5 8 0 30), (15, 70) 5 0 3 4 1 7 0 8 6 50) (15, 35, 2 8 5 4 7 3 0 1 6 30, 35, 50,5 MergeSort1 applied to a[l : 8] = (50, 10, 25, 30, 15, 70, 35, 55) Table 3.4 Example of link array changes set. Is merge sort a stable sorting method? 4. Suppose a[l : m] and b[1 : n] both contain sorted element decreasing order. Write an algorithm that merges the c{l : m+n}. Your algorithm should be shorter than Algorithm 3.8 (Merge) since you can now place a large value in am + 1] and b[n +1] o Given a file of n records that are partially sorted as x1 < az <++- < ain and 241 <--- < tp, is it possible to sort the entire file in time O(n) using only a small fixed amount of additional storage? 6. Another way to sort a file of n records is to scan the file, merge conse utive pairs of size one, then merge pairs of size two, and so on. Write an algorithm that carries out this process. Show how your algorithm works on the data set (100, 300, 150, 450, 250, 350, 200, 400, 500). 7. A version of insertion sort is used by Algorithm 3.10 to sort small subarrays. However, its parameters and intent are slightly different from the procedure InsertionSort of Algorithm 3.9. Write a version of insertion sort that will work as Algorithm 3.10 expects. 8. The sequences X;, X2,..., Xp are sorted sequences such that D6, [Xj] = n. Show how to merge these ¢ sequences in time O(n log é). 3.5 QUICKSORT ‘The divide-and-conquer approach can be used to arrive at an efficient sorting method different from merge sort. In merge sort, the file a[1 : n] was divided 3.5. QUICKSORT 155 at its midpoint into subarrays which were independently sorted and later merged. In quicksort, the division into two subarrays is made so that the sorted subarrays do not need to be merged later. 
This is accomplished by rearranging the elements in a[l : n] such that ali] < ali] for all i between 1 and mm and all j between m+ 1 and n for some m, 1 alm] and that a{m] is the partitioning element. Ifm = 1 and p—1 =n, then a{r +1] must be defined and must be greater than or equal to all elements in a[1 : nj. The assumption that alr] is the partition element is merely for convenience; other choices for the partitioning element than the first item in the set are better in practice. The function Interchange(a, i, j) anges afi] with a[j]. Example 3.9 As an example of how Partition works, consider the following ray of nine elements. The function is initially invoked as Partition(a, 1, 10). The ends of the horizontal line indicate those elements which were inter- changed to produce the next row. The element a{1] = 65 is the partitioning element and it is eventually (in the sixth row) determined to be the fifth smallest. clement of the set. Notice that the remaining elements are unsorted but partitioned about a[5] = 65. a J) (2) (4) (5) (6) (@ (8) (9) (10) ¢ p 65 70 8085 60 55 50 45 +00 2 9 65 45 75 80 85 60 55 50 70 +00 3 8 65 45 50 80 85 60 55 75 70 +0 4 7 65 45 50 55 85 60 80 75 70 +0 65 45 50 55 60 85 80 75 70 +0 6 5 on ° Of 40) 00 55) Gor 90) 008 75) 70. co Using Hoare’s clever method of partitioning a set of elements about a chosen element, we can directly devise a divide-and-conquer method for completely sorting n elements. Following a call to the function Partition, two sets 5; and S$ are produced. All elements in S; are less than or equal 156 CHAPTER 3. DIVIDE-AND-CONQUER 1 Algorithm Partition(a, m,p) 2 // Within a[m],a{m + 1],...,a[p —1] the elements are 3 // rearranged in such a manner that if initi t=a[m), 4 // then after completion a[q] = ¢ for some q between m 5 // and p—1, afk] t 6 //forq > 0); 4 repeat 15, j=j-l; 16 until (a[j] < v); 7 if (i < j) then Interchange(a, i, j); 18 } until (i > 3); 19 alm) := alj); al] = v; return j; 20 } 1 Algorithm Interchange(a, i, ) 2 // Exchange afi] with alj. 3 { 4 afi); 5 aft] := aj); alg] = ps 6 Algorithm 3.12 Partition the array a[m : p — 1] about alm] 3.5. QUICKSORT 157 to the elements in $2. Hence $; and $2 can be sorted independently. Each set is sorted by reusing the function Partition. Algorithm 3.13 describes the complete proces: 1 Algorithm QuickSort(p,q) 2 // Sorts the elements alp],...,a{q] which reside in the global 3 // array a[l: n) into ascending order; a[n + 1] is considered to 4 // be defined and must be > all the elements in a{l : n] 5 6 A (p ol 1 nl a for example. n is the smallest value in »—1]. The amount of stack space needed can be reduced to O(logn) which is less than 2 log n. As remarked in Section 3.4, InsertionSort is exceedingly fast for n less than about 16. Hence InsertionSort can be used to speed up QuickSort2 whenever g—p < 16. The exercises explore various possibilities for selection of the partition element. 3.5.1 Performance Measurement QuickSort and MergeSort were evaluated on a SUN workstation 10/30. In both cases the recursive versions were used. For QuickSort the Partition fu ry out the median of three rule (i.e. the partitioning clement was the median of alm], all (m+p—1)/2|] and alp—1]). Each data ed of random integers in the range (0, 1000). Tables 3.5 and 3.6 record the actual computing times in milliseconds. Table 3.5 displays the average computing times. For each n, 50 random data sets were used. 
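The median-of-three rule referred to above can be sketched as follows. The helper below is purely illustrative (it is not the code that produced the timings); it assumes the same index convention as Partition(a, m, p) and returns the index, among the first, middle, and last positions of a[m : p-1], holding the median value.

#include <vector>

// Illustrative median-of-three selection (not the authors' code).
// Among a[m], a[(m+p-1)/2], and a[p-1], return the index of the median
// value; a quicksort using this rule interchanges a[m] with that element
// before calling Partition(a, m, p).
int MedianOfThree(const std::vector<int>& a, int m, int p)
{
    int i = m, j = (m + p - 1) / 2, k = p - 1;
    if (a[i] > a[j]) { int t = i; i = j; j = t; }   // now a[i] <= a[j]
    if (a[j] > a[k]) { int t = j; j = k; k = t; }   // now a[k] is the largest
    if (a[i] > a[j]) { int t = i; i = j; j = t; }   // now a[j] is the median
    return j;
}

With such a splitter, an input that is already in sorted order no longer forces the quadratic behavior exhibited by the plain version of QuickSort.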
Table 3.6 shows the worst-case computing times for the 50 data Scanning the tables, we immediately see that QuickSort is faster than MergeSort for all values. Even though both algorithms require O(n logn) time on the average, QuickSort usually performs well in practice. The exer- cises «liscuss other tests that would make useful comparisons. 3.5.2 Randomized Sorting Algorithms Though algorithm QuickSort has an average time of O(n log n) on n elements, its worst-case time is O(n”). On the other hand it does not make use of any 160 CHAPTER 3. DIVIDE-AND-CONQUER 1 Algorithm QuickSort2(p, q) 2 // Sorts the elements in alp : g] 3 4 // stack is a stack of size 2log(n) 5 repeat 6 in while (p 5. But 5 is not a magic number; in the machine employed, this seems to give the best results. In general this number should be determined empirically. 1 Algorithm RQuickSort(p, q) 2 // Sorts the elements alp],... ,a{g] which reside in the global 3 // array a[1: n] into ascending order. a[n +1] is considered to 4 // be defined and must be > all the elements in a{1 : 5 6 if (p 5) then 9 Interchange(a,Random() mod (q— p+ 1) +p,p)3 10 n(a,p.q + 1); 1 is the position of the partitioning element 12 RQuickSort(p, j — 1)3 13 RQuickSort(j + 1,q)5 14 15 } Algorithm 3.15 Randomized quick sort algorithm The proof of the fact that RQuickSort has an expected O(n logn) time is the same as the proof of the average time of QuickSort. Let A(n) be the average time of RQuickSort on any input of n elements, Then the number of elements in the second part will be 0,1,2,....m—2, or n—1, all with an equal probability of 1 (in the probability space of outcomes for the randomizer) Thus the recurrence relation for A(n) will be A(n) = 2 SD (Ae-1) + Anh) #41 1Skhen This is the same as Equation 3.4, and hence its solution is O(n logn). RQuickSort and QuickSort (without employing the median of three ele- ments rule) were evaluated on a SUN 10/30 workstation. Table 3.7 displays 3.5. QUICKSORT 163 the times for the two algorithms in milliseconds averaged over 100 runs. For each 7, the input considered was the sequence of numbers 1,2,...,n._ As we can sce from the table, RQuickSort performs much better than QuickSort. Note that the times shown in this table for QuickSort are much more than the corresponding entries in Tables 3.5 and 3.6, ‘The reason is that Quick- Sort makes @(n2) comparisons on inputs that are already in sorted order. However. on random inputs its average performance is very good. n 1000 | 2000 | 3000 | 4000 | 5000 | QuickSort [195.5 | 759.2] 1728 [3165 | 4829 [RQuickSort [9.4 [21.0 [305 | 416 | 528 Table 3.7 Comparison of QuickSort and RQuickSort ou the input afi] = i, 1 é,}. After having partitioned X into s+1 parts, we sort each part recursively. For a proper choice of s, the number of comparisons made in this algorithm is only nlogn + 0(nlogn) Note the constant 1 before nlogn. We see in Chapter 10 that this number is very close to the information theoretic lower bound for sorting. Choose s = j35- The and comparisons if we u: sample clements in an array, say 6[ ], for each « € X, we can determine which part X; it belongs to in < logn comparisons using binary search on 6{ ]. Thus the partitioning process takes n log n + O(n) comparisons. In the exercises you are asked to show that with high probability the cardinality 164 CHAPTER 3. DIVIDE-AND-CONQUER Algorithm RSort(a,7) // Sort the elements a{1 : nj. 
Randomly sample s elements from al ]s Sort this sample; Partition the input using the sorted sample as partition keys; Sort each part separately; wryoakenre Algorithm 3.16 A randomized algorithm for sorting of each X; is no more than O("logn) = O(log?n). Using HeapSort or MergeSort to sort each of the X;’s (without employing recursion on any of them), the total cost of sorting the X;’s is a4 oH YE OUK Hog |Xsl) = max {log Xi} 1° OUD ist j, then the kth-smallest element is the (k — j)th-smallest element in alj +1: n]. The resulting algorithm ig function Select1 (Algorithm 3.17). This function places the kth-smallest element into position a[k] and partitions the remaining elements so that afi] < alk], 1 alk], k alk] fork 0, such that TKn) < ens +l YS Thlm-)+ Y Tie@-v), n>2 Mk Reign So, R(n) < ont A max {¥} km-)+ Y RE-Y} 1stek kien Fi nt not Rin) < ent amex { SY R)+ HRW}, n>2 (3.8) nok F We assume that ¢ is chosen such that R(1) < ¢ and show, by induction on n, that R(n) < 4en. Induction Base: For n = 2, (3.8) gives Rin) < 20+ } max {R(1), R()} < 2.5¢ < 4en Induction Hypothesis: Assume R(n) < den for all n,2 2, the elements of C’ can be computed using matrix multiplication and addition operations applied to matrices of size n/2 x n/2. Since n is a power of 2, these matrix products can be recur- wvely computed by the same algorithm we are using for the n x n case. This algorithm will continue applying itself to smaller-sized submatrices until n becomes suitably small (n = 2) so that the product is computed directly. To compute AB using (3.12), we need to perform eight multiplications of n/2 x n/2 matrices and four additions of n/2 x n/2 matrices. Since two n/2xn/2 matrices can be added in time cn? for some constant c, the overall computing time T(n) of the resulting divide-and-conquer algorithm is given by the recurrence elements are typically T(n) = b n<2 ™=\ 8T(n/2)+en? n>2 where b and c are constants. This recurrence can be solved in the same way as earlier recurrences to obtain T(n) = O(n*). Hence no improvement over the conventional method has been made. Since matrix multiplications are more expensive than matrix additions (O(n3) versus O(n2)), we can attempt to reformulate the equations for Ciz so as to have fewer multiplications and possibly more additions. Volker Strassen has discovered a way to compute the Cjj’s of (3.12) using only 7 multiplications and 18 additions or subtractions. His method involves first computing the seven n/2 x n/2 matrices P, Q, R, S, T, U, and Vas in (3.13). Then the Cij’s are computed using the formulas in (3.14). As can be seen, P, Q, R, S, T, U, and V can be computed using 7 matrix multiplications and 10 matrix additions or subtractions. The Cjj’s require an additional 8 additions or subtractions: 3.7. STRASSEN’S MATRIX MULTIPLICATION P = (Au + Ax)(Bir + Boo) Q = (Aa + Av) Bi Ro = Ai(By2— Bn) 8 Az(Bo — Bu) T (An + Aiz) Boo U (Ag, — Ari)(Bui + Bia) Vo = (Az = A22)(Bai + Boe) P+S-T+V R+T Cy = Q+8 Cy = P+R-Q+U The resulting recurrence relation for T(n) is 4 b n<2 Tn) = { IT(n/2)+ar2 n>2 where a and 6 are constants. Working with this formula, we get Tn) = an?[l+7/4+ (7/4)? +--+ (7/4) 1] + FTL) en? (7/4)!82" + 7°82", © a constant logs 4+loge 7log24 4 ql027 ) = O(n?*!) IA en O(n: EXERCISES Verify by hand that Equations 4 for Cri, C12, Cai, and C22. 3 and 181 (3.13) (3.14) (3.15) 3.14 yield the correct values 2. Write an algorithm that multiplies two n x n matrices using O(n*) op- erations. Determine the precise number of multiplic and array element accesses. 3. 
IE is a nonnegative constant, then prove that the recurrence tn) =f n=1 Y= 1 37(n/2)+kn n>1 has the following solution (for na power of 2): T(n) = 3kn!°%* — 2kn ions, additions, (3.16) (3.17) 184 CHAPTER 3. DIVIDE-AND-CONQUER Figure 3.6 Convex hull: an example (1) obtain the vertices of the convex hull (these vertices are also called ex- treme points), and (2) obtain the vertices of the convex hull in some order (clockwise, for example). Here is a simple algorithm for obtaining the extreme points of a given set 5 of points in the plane. To check whether a particular point p € S is extreme, look at each possible triplet of points and see whether p lies in the triangle formed by these three points. If p lies in any such triangle, it is not extreme; otherwise it is. Testing whether p lies in a given triangle can be done in O(1) time (using the methods described in Section 3.8.1). Since there are @(n*) possible triangles, it takes @(n°) time to determine whether a given point is an extreme point or not. Since there are n points, this algorithm runs in a total of ©(n*) time. ing divide-and-conquer, we can solve both versions of the convex hull problem in @(7 log n) time. We develop three algorithms for the convex hull in this section, The first has a worst-case time of @(n?) whereas its aver- age time is @(nlogn). This algorithm has a divide-and-conquer structure similar to that of QuickSort. The second has a worst-case time complexity of O(nlogn) and is not based on divide-and-conquer. The third algorithm is based on divide-and-conquer and has a time complexity of (nlogn) in the worst case. Before giving further details, we digress to discuss some primitive geometric methods that are used in the convex hull algorithms. 3.8.1 Some Geometric Primitives Let A be an n x n matrix whose elements are {aj}, 1 1 Consider the directed line segment (p,,p2) from some point p1 = (21.41) to some other point pz = (1r2, y2). If q = (0r3,ys) is another point, we say q is to the left (right) of (p1,p2) if the angle ppoq is a left (right) turn. (An angle is said to be a left (right) turn if it is less than or equal to (greater than or equal to) 180°.) We can check whether q is to the left (right) of (p1,p2) by evaluating the determinant of the following matrix: vy} ty £3 oe cesses If this determinant is positive (negative), then q is to the left (right) of (p1,p2). If this determinant is zero, the three points are colinear. This test can be used, for example, to check whether a given point p is within a triangle formed by three points, say p;,p2. and py (in clockwise order). The point p is within the triangle iff p is to the right of the line segments (p;,p2), (p2,ps), aud (p3,p1)- Also, for any three points (1,41), (2+y2). and (x. formed by the corresponding triangle is given by one- determinant. Let. py, p2,..+.Pn be the verti order. Let p be any other point. It ). the signed area If of the above of the convex polygon Q in clockwi desired to check whether p li in the interior of Q or outside. Consider a horizontal line h that extends from —oe to oo and goes through p. There are two possibilities: (1) h does not intersect. any of the edges of Q, (2) h intersects some of the edges of Q. If case (I) is true, then, p is outside Q. In case (2), there can be at most two points of intersection. If h intersects Q at a single point, it is counted as two. Count the number of points of intersections that are to the left of p. 
If this number is even, then p is external to Q; otherwise it is internal to Q This method of checking whether p is interior to Q takes @(n) time. 3.8.2 The QuickHull Algorithm An algorithm that is similar to QuickSort can be devised to compute the convex hull of a set X of n points in the plane. This algorithm, called QuickHull, first identifies the two points (call them p; and pa) of X with the sinallest and largest -coordinate values. Assume now that there are no ties. Later we see how to handle ties, Both p; and p» are extreme points and part of the convex hull. The set X is divided into X; and Xz so that 186 CHAPTER 3. DIVIDE-AND-CONQUER X; has all the points to the left of the li gment (p),p2) and X2 has all the points to the right of (p1,p2). Both X; and X include the two points p: and py. Then, the convex bulls of X; and X» (called the upper hull and lower hull, respectively) are computed using a divide-and-conquer algorithm called Hull. The union of these two convex hulls is the overall convex hull. If there is more than one point with the smallest «-coordinate, let p, and pi be the points from among these with the least and largest y-coordinates, respectively. Similarly define p', and pf for the points with the largest x- coordinate values. Now X1 will be all the points to the left of (p/p) (including p'’ and p'!) and X2 will be all the points to the right of (p},p)) (including p{, and p5). In the rest of the discussion we assume for simplicity that there are no ties for p; and pz. Appropriate modifications are needed in the event of ties. now describe how Hull computes the convex hull of X;. We determine a point of X, that belongs to the convex hull of X1 and use it to partition the problem into two independent subproblems. Such a point is obtained by computing the area formed by p,,p, and p» for each p in X, and picking the one with the largest (absolute) area. Ties are broken by picking the point p for which the angle ppip2 is maximum. Let ps be that point. Now X; is divided into two parts; the first part contains all the points of X, that are to the left of (p1,ps) (including p; and ps), and the second part contains all the points of X; that are to the left of (p3, p2) (including py and pz) (see Figure 3.7). There cannot be any point of X, that is to the left of both (p1,p3) and (p3,p2). Also, all the other points are interior points and can be dropped from future consideration. The convex hull of each part is computed recursively, and the two convex hulls are merged easily by placing one next to the other in the right order. If there are m points in X), we can identify the point of division ps in time O(m). Partitioning X; into two parts can also be done in O(m) time. Merging the two convex hulls can be done in time O(1). Let T(m) stand for the run time of Hull on a list of m points and let 1m and mg denote the sizes of the two resultant parts. Note that m; +m: , all of which are left turns. The final hull obtained is p, 1,3, 4,6,9, and 10, which are points on the hull in counterclockwise (cew) order. a is given in Algorithm 3.21. In this algorithm the set of s a doubly linked list ptslist. Function Scan runs in O(n) triplet examined, either the scan moves one node ahead or one point gets removed. In the latter case, the scan moves one node back. Also note that for each triplet, the test as to whether a left or right turn is formed can be done in O(1) time. Function Area computes the signed area formed by three points. 
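A minimal, self-contained C++ sketch of this primitive is given below; passing plain coordinates rather than the point records used in Algorithm 3.21 is an illustrative simplification.

#include <iostream>

// Signed area of the triangle (x1,y1), (x2,y2), (x3,y3); this equals
// one-half of the determinant of Section 3.8.1. A positive result means
// the third point is to the left of the directed segment from the first
// point to the second (a left turn), a negative result means a right
// turn, and zero means the three points are colinear.
double Area(double x1, double y1, double x2, double y2,
            double x3, double y3)
{
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1));
}

int main()
{
    std::cout << Area(0, 0, 1, 0, 1, 1) << "\n";    // left turn:  prints 0.5
    std::cout << Area(0, 0, 1, 0, 1, -1) << "\n";   // right turn: prints -0.5
    return 0;
}

Since each such test costs only O(1) time, the scan itself remains linear.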
The major work in the algorithm is in sorting the points. Since sorting takes O(n log n) time, the total time of Graham's scan algorithm is O(n log). 3.8.4 An O(nlogn) Divide-and-Conquer Algorithm In this section we present a simple divide-and-conquer algorithm, called DCHull, which also takes O(nlogn) time and computes the convex bull in clockwise order. 3.8. CONVEX HULL 189 point. = record{ float .r; float y; point «prev; point *next; Algorithm Scan(list) // list is a pointer to the first node in the input list. ap = list; «pl := list; repeat { p2:= (pl — next); if (p24 nest) £0) then p3 else return; // End of the list temp := Area((pl + x). (pl + y). (p24 2), (p2 + y), (p3 + (p3 + y))s if (temp > 0.0) then pl := (pl + next); // If pl, p2,p3 form a left turn, move one point ahead: // Xf not, delete p2 and move back. else (p2 + next); (pl + newt) := p3; (p3 + prev) := pl; delete p2; pl:= (pl prev); } until (false); Algorithm ConvexHull(ptslist) { // ptslist is a pointer to the first item of the input list. Find 7/ the point p in ptslist of lowest y-coordinate. Sort the 7/ points according to the angle m Sort(pislist); Scan(ptslist); PrintList(ptslist)s Algorithm 3.21 Graham's scan algorithm 190 CHAPTER 3. DIVIDE-AND-CONQUER Given a set X of n points, like that in the case of QuickHull, the problem is reduced to finding the upper hull and the lower hull separately and then putting them together. Since the computations of the upper and lower hulls very similar, we restrict our discussion to computing the upper hull. The divide-and-conquer algorithm for computing the upper hull partitions X into two nearly equal halves. Partitioning is done according to the x-coordinate values of points using the median ¢-coordinate as the splitter (see Section 3.6 for a discussion on median finding). Upper hulls are recursively computed for the two halves. These two hulls are then merged by finding the line of tangent ( a straight line cting a point each from the two halve such that all the points of X are on one side of the line) (see Figure 3.9). Figure 3.9 Divide and conquer to compute the convex hull To begin with, the points p; and py are identified [where p; (pz) is the point with the least (largest) «-coordinate value]. This can be done in O(n) time. Ties can be handled in exactly the same manner as in QuickHull. So, assume that there are no ties. All the points that are to the left of the line segment (p1,p2) are separated from those that are to the right. This separation also can be done in O(n) time. From here on, by input” and *X” we mean all the points that are to the left of the line segment (p1,p2). Also let |X| = N. Sort the input points according to their x-coordinate values. Sorting, can be done in O(N log N) time. This sorting is done only once in the computation of the upper hull. Let q1,q2,-.-.qiv be the sorted order of these 3.8. CONVEX HULL 191 points. Now partition the input into two equal halves with q1,q2,...,@N/2 in the first half and qyjo41.dyyo42+++--dy in the second half. The upper hull of each half is computed recursively. Let H; and H be the upper hulls. Upper hulls are maintained as linked lists in clockwise order. We refer to the first element in the list as the leftmost point and the last element as the rightuiost point The line of tangent is then found in O(log? N) time. If (u,v) is the line of tangent, then all the points of H; that are to the right of u are dropped, Similarly, all the points that are to the left of v in Hz are dropped. 
The remaining part of Hy, the line of tangent, and the remaining part of Ho form the upper hull of the given input set. If 7(N) is the run time of the above recursive algorithm for the upper hull on an input of NV points, then we have T(N) = 2T(N/2) + O(log? N) which solves to P(N) = O(.V). Thus the run time is dominated by the initial sorting step. The only part of the algorithm that remains to be specified is how to find the line of tangent (u,v) in O(log? N) time, The way to find the tangent is to start from the middle point, call it p, of Hy. Here the middle point refers to the middle element of the corresponding list. Find the tangent of p with Hy, Let (p,q) be the tangent. Using (p,q), we can determine whether u is to the left of, equal to, or to the right of p in Hy, A binary search in this fashion on the points of H reveals u. Use a similar procedure to isolate v. Lemma 3.1 Let Hy and Hy be two upper hulls with at most m points each. If p is any point of Hy, its point q of tangency with Hy can be found in O(log m) time. Proof. If q' is any point in Hj, we can check whether q' is to the left of, equal to, or to the right of q in O(1) time (see Figure 3.10). In Figure 3.10, wx and y are the left and right neighbors of q' in Hy. respectively. If Zpq's is aright. turn and dpq'y is a left turn, then q is to the right of q' (see case 1 of Figure 3.10). If Zpq'e and /pq'y are both right turns, then q/ = q (see case 2 of Figure 3.10); otherwise q is to the left of q/ (see case 3 of Figure 3.10). ‘Thus we can perform a binary search on the points of Hy and identify q in O(log m) time. a Lemma 3.2 If H; and H2 are two upper hulls with at most m points each, their common tangent can be computed in O(log” m) time. Proof. Let u € Hy and v € Hp be such that (u,v) is the line of tangent. Also let p be an arbitrary point of Hy and let q € Hz be such that (p,q) is a 192 CHAPTER 3, DIVIDE-AND-CONQUER. case | case 3 Figure 3.10 Proof of Lemma 3.1 tangent of Hz. Given p and q, we can check in O(1) time whether w is to the left of, equal to, or to the right of p (see Figure 3.11). Here x and y are left and right neighbors, respectively, of p in Hy. If (p,q) is also tangential to Ay, then p =u. If Zxpq is a left turn, then u is to the left of p; otherwise u is to the right of p. This suggests a binary search for u. For each point p of Hj chosen, we have to determine the tangent from p to Hy and then decide the relative positioning of p with respect to u. We can do this computation in O(logm x logm) = O(log? m) time. oO In summary, given two upper hulls with ¥ points each, the line of tangent can be computed in O(log? N) time. Theorem 3.4 A convex hull of n points in the plane can be computed in O(n log n) time. Oo 3.9. REFERENCES AND READINGS 193 A, Figure 3.11 Proof of Lemma 3 EXERCISES 1. Write an algorithin in pseudocode that implements QuickHull and test t using suitable data. de the divide-and-conquer algorithm DCHull and test it using ap- propriate data. 3. Run the three algorithms for convex hull discussed in thi: ion on various random inputs and compare their performances. 4, Algorithm DCHull can be modified as follows: Instead of using the mnedian as the splitter, we could use a randomly chosen point as the splitter. Then X is partitioned into two around this point. The rest of the function DCHull is the same. Write code for this modified algorithm. and compare it with DCHull empirically. 5. Let $ be a set of n points in the plane. 
Tt is given that there is only a constant (say ¢) number of points on the hull of $. Can you devise a convex hull algorithm for $ that runs in time o(nlogn)? Conceive of special algorithms for ¢ = 3 and ¢ =4 first and then generalize. 3.9 REFERENCES AND READINGS Algorithm MaxMin (Algorithm 3.6) is due to I. Pohl and the quicksort algo- rithm (Algorithm 3.13) is due to C. A. R. Haore. The randomized sorting algorithin in Algorithm 3.16 is due to W. D. Frazer and A. C. McKeller and 194 CHAPTER 3. DIVIDE-AND-CONQUER the selection algorithm of Algorithm 3.19 is due to M. Blum, R. Floyd, V. Pratt, R. Rivest and R. E. Tarjan. For more on randomized sorting and selection see: “Expected time bounds for selection,” by R. Floyd and R. Rivest, Commu nications of the ACM 18, no. 3 (1975): 165-172 “Samplesort: A Sampling Approach to Minimal Storage Tree Sorting,” by W. D. Frazer and A. C. McKellar, Journal of the ACM 17, no. 3 (1970): 496-507. “Derivation of Randomized Sorting and Selection Algorithms,” by S. Ra- jasekaran and J. H. Reif, in Parallel Algorithm Derivation and Program Transformation, edited by R. Paige, J. H. Reif, and R. Wachter, Kluwer Academic Publishers, 1993, pp. 187-205. ‘The matrix multiplication algorithm in Section 3.7 is due to V. Strassen. For more information on the matrix multiplication problem see “Matrix mul- tiplication via arithmetic progressions,” by D. Coppersmith and 8. Wino- grad, Journal of Symbolic Computation 9 (1990): 251-280. A complex O(n?) time algorithm for multiplying two n x n matrices is given in this paper. For more applications of divide-and-conquer see: Computational Geometry, by F. Preparata and M. I. Shamos, Springer Verlag, 1985. Computational Geometry: An Introduction Through Randomized Algorithms by K. Mulmuley, Prentice-Hall, 1994. Introduction to Algorithms: A Creative Approach, by U. Wesley, 1989. lanber, Addison- 3.10 ADDITIONAL EXERCISES 1. What happens to the worst-case rin time of quicksort if we use the median of the given keys as the splitter key? (Assume that the selection algorithm of Section 3.6 is employed to determine the median). 2. The sets A and B have n elements each given in the form of sorted arrays. Present an O(n) time algorithm to compute AUB and AN B. 3. The sets A and B have m and n elements (respectively) from a linear order. These sets are not necessarily sorted. Also assume that m > n). Pr worst case in time O (ni2£%) and checks whether all these n numbers each number is an integer in the ent an algorithm that runs in the are distinct. Your algorithm should use only O(n) space. Let S' be a sequence of n? integers in the range [1,n]. Let R(i) be the number of i's in the sequence (for i = 1,2,...,n). Given 8, we have to compute an approximate value of R(i) for each i. If N(j) is an approximation to R(i),i = 1,...,n, it should be the case that (with high probability) N(i) > R(i) for each i and DL, N(i) = O(n?). Of course we can do this computation in deterministic O(n”) time. Design a randomized algorithm for this problem that runs in time O(n log? n). Chapter 4 THE GREEDY METHOD 4.1 THE GENERAL METHOD ‘The greedy method is perhaps the most straightforward design technique we consider in this text, and what’s more it can be applied to a wide variety of problems. Most, though not all, of these problems have n inputs and require us to obtain a subset that satisfies some constraints. Any subset that satis- fies these constraints is called a feasible solution. 
We need to find a feasible solution that either maximizes or minimizes a given objective function. A feasible solution that does this is called an optimal solution. There is usu- ally an obvious way to determine a feasible solution but not necessarily an optimal solution. The greedy method suggests that one can devise an algorithm that works in stages, considering one input at a time. At each stage, a decision is made regarding whether a particular input is in an optimal solution. This is done by considering the inputs in an order determined by some selection proce- dure. If the inclusion of the next input into the partially constructed optimal solution will result in an infeasible solution, then this input is not added to the partial solution. Otherwise, it is added. The selection procedure itself is based on some optimization measure. This measure may be the objecti function. In fact, several different optimization measures may be plausible for a given problem. Most of these, however, will result in algorithms that generate suboptimal solutions. This version of the greedy technique is called the subset paradigm. We can describe the subset paradigm abstractly, but more pr above, by considering the control abstraction in Algorithm 4.1. ‘The function Select selects an input from a[] and removes it. The selected input’s value is assigned to x. Feasible is a Boolean-valued function that determines whether « can be included into the solution vector. The function Union combines x with the solution and updates the objective function. The isely than oe 198 CHAPTER 4. THE GREEDY METHOD 1 Algorithm Greedy(a,n) 2 //a{l:n| contains the n inputs. 3 4 solution := 0; // Initialize the solution. 5 for i:=1 ton do 6 { 7 = Select (a); 8 if Feasible(solution, xr) then 9 solution := Union(solution, x); 10 } u return solution; 12} function Greedy d s a greedy algorithm will look, once a particular problem is chosen and the functions Select, Feasible, and Union are properly implemented. For problems that do not call for the s greedy method we make de dering the inputs in some order. Each decision is made using an optimization criterion that can be computed using decisions already made. Call this version of the greedy method the ordering paradigm. Sections 3, 4.4, and 4.5 consider problems that fit the subset paradigm, and Sec 4.7, and 4.8 consider problems that fit the ordering paradigm. tion of an optimal subset, in the EXERCISE 1. Write a control abstraction for the ordering paradigm. 4.2 KNAPSACK PROBLEM Let us try to apply the greedy method to solve the knapsack problem. We are given n objects and a knapsack or bag. Object é has a weight w; and the knapsack has a capacity m. If a fraction 27;, 0 <2; < 1, of object i is placed into the knapsack, then a profit of p;r; is earned. The objective is to obtain a filling of the knapsack that maximizes the total profit earned. Since the knapsack capacity is m, we require the total weight of all chosen objects to be at most m. Formally, the problem can be stated as 4.2, KNAPSACK PROBLEM 199 maximize > pyri (4.1) Sten subject to S> wir; wyng < m T and 2; =Oorl, l 6, then the node u gets split and d(u) is set to zero. Computation proceeds from the leaves toward the root. In the tree of Figure 4.2, let 6 = 5. For each of the leaf nodes 7,8,5,9, and 10 the delay is zero. The delay for any node is computed only after the delays for its children have been determined. Let u be any node and C(u) be the set of all children of u. 
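Returning to the knapsack formulation in (4.1), one natural greedy rule considers the objects in nonincreasing order of the profit-to-weight ratio p_i/w_i and fills the knapsack until only a fraction of the next object fits. A minimal Python sketch of this rule follows; the function name, the 0-based indexing, and the sample data are my own choices, not the text's.

def greedy_knapsack(profits, weights, m):
    # Returns (total profit, x) where x[i] is the fraction of object i chosen.
    n = len(profits)
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * n
    remaining = m
    total = 0.0
    for i in order:
        if weights[i] <= remaining:      # object i fits entirely
            x[i] = 1.0
            remaining -= weights[i]
            total += profits[i]
        else:                            # take only the fraction that still fits
            x[i] = remaining / weights[i]
            total += profits[i] * x[i]
            break
    return total, x

print(greedy_knapsack([25, 24, 15], [18, 15, 10], 20))   # (31.5, [0.0, 1.0, 0.5])

For this small instance the ratio rule yields profit 31.5, which is optimal for the fractional problem; ordering by profit alone or by weight alone both do worse here.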
Then d(u) is given by d(u) = —- + w(u,v)} Using the above formula, for the tree of Figure 4.2, d(4) = 4. Since d(4) + w(2,4) = 6 > 6, node 4 gets split. We set d(4) = 0. Now d(2) can be 4.3. TREE VERTEX SPLITTING 205, Figure 4.2 An example tree computed and is equal to 2. Since d(2) +1(1,2) exceeds 6, node 2 gets split and d(2) is set to zero. Then d(6) is equal to 3. Also, since d(6)+w(3,6) > 6, node 6 has to be split. Set d(6) to zero. Now d(3) is computed as 3. Finally, d(1) is computed as 5. Figure 4.3 shows the final tree that results after splitting the nodes 2, 4, and 6. This algorithm is described in Algorithm 4.3, which is invoked as TVS(root, 6), root being the root of the tree. The order in which TVS visits (i.e., computes the delay values of) the nodes of the tree is called the post order and is studied again in Chapter 6. Figure 4.3 The final tree after splitting the nodes 2, 4, and 6 206 CHAPTER 4. THE GREEDY METHOD 1 Algorithm TVS(T, 6) 2 // Determine and output the nodes to be split. 3 // w() is the weighting function for the edges. 4 5 if (T #0) then 6 { 7 dt 8 for each child v of T do 9 10 TVS(u, 6)3 u d[T] = max{d{T], dlv] + w(T,»)}5 12 13 if ((F is not the root) and 4 (dT) + w(parent(T),T) > 5) then 15 16 write (T); d[T] = 0; 17 } 18 19 } Algorithm 4.3 The tree vertex splitting algorithm Algorithm TVS takes @(n) time, where n is the number of nodes in the tree. This can be seen as follows: When TVS is called on any node T, only a constant number of operations are performed (excluding the time taken for the recursive calls). Also, TVS is called only once on each node T in the tree. Algorithm 4.4 is a revised version of Algorithm 4.3 for the special case of directed binary trees. A sequential representation of the tree (see Section 2.2) has been employed. The tree is stored in the array tree[ ] with the root at tree[1]. Edge weights are stored in the array weight| ]. If tree[i] has a tree node, the weight of the incoming edge from its parent is stored in weight[i]. The delay of node i is stored in dji]. The array d[ | is initialized to zero at the beginning. Entries in the arrays tree| | and weight[ ] corresponding to nonexistent nodes will be zero. As an example, for the tree of Figure 4.2, tree{ ] will be set to {1,2,3,0,4,5,6,0,0,7,8,0,0,9, 10} starting at cell 1. Also, weight[ ] will be set to {0,4,2,0, 2, 1,3, 0,0, 1,4,0, 0,2,3} at the beginning, starting from cell 1. The algorithm is invoked as TVS(1, 6). Now. we show that TVS (Algorithm 4.3) will always split a minimal number of nodes. 4.3. TREE VERTEX SPLITTING 207 Algorithm TVS(i, 6) // Determine and output a minimum cardinality split set. // The tree is realized using the sequential representation. // Root is at tree[1]. N is the largest number such that 1 2 3 4 5 Y/ ode. 6 7 8 bs if (tree[i] #0) then // If the tree is not empty if (21 > N) then d{i] := 0; // i is a leaf. else i 1ax(d[i], d[2i] + weight|2i]); a if (2i+1< N) then 15 TVS(2i +1, 8)3 16 d{i] := max(qfi), d[2i + 1] + weight[2i + 1])s 3 19 if ((tree[i] # 1) and (d{i] + weight[i] > 6)) then 21 write (¢ree{i]); di] : Algorithm 4.4 TVS for the special case of binary trees Theorem 4.2 Algorithm TVS outputs a minimum cardinality set U such that d(P/U) < 6 on any tree T, provided no edge of T has weight > 6. Proof: The proof is by induction on the number of nodes in the tree. If the has a single node, the theorem is true. Assume the theorem for all trees e |U|—1. In other words, |W’| > |U|. a EXERCISES 1. 
For the tree of Figure 4.2 solve the TVSP when (a) 6 = 4 and (b) 6=6. 2. Rewrite TVS (Algorithm 4.3) for general trees. Make use of pointers. 4.4 JOB SEQUENCING WITH DEADLINES We are given a set of n jobs. Associated with job ¢ is an integer deadline dj > 0 and a profit pj > 0. For any job i the profit p; is earned iff the job is completed by its deadline. To complete a job, one has to process the job on a machine for one unit of time. Only one machine is available for processing jobs. A feasible solution for this problem is a subset J of jobs such that each. job in this subset can be completed by its deadline. The value of a feasible solution J is the sum of the profits of the jobs in J, or Dicy pi. An optimal solution is a feasible solution with maximum value. Here problem involves the identification of a subset, it fits the subset paradigm. Example 4.2 Let n= 4, (p1,p2,ps; ps) = (100, 10, 15, 27) and (di, da, ds, da) (2,1,2,1). The feasible solutions and their values are: feasible processing solution sequence value 1 (1,2) 1 110 2 (14,3) 1, 30r3,1 115 3 (4) 4 127 ae (ea) oes 25 5. (3,4) 4,3 42 6 (ot 100 7 QQ) 2 10 8 (3) 8 15 9 (4) 4 QT Solution 3 is optimal. In this solution only jobs 1 and 4 are proc: the value is 127. These jobs must be processed in the order job 4 followed by job 1. Thus the processing of job 4 begins at time zero and that of job 1 is completed at time 2. a sed and 4.4, JOB SEQUENCING WITH DEADLINES 209 ‘To formulate a greedy algorithm to obtain an optimal solution, we must formulate an optimization measure to determine how the next job is chosen. As a first attempt we can choose the objective function 37j.) pi as our op- timization measure. Using this measure, the next job to include is the oue that increases 7.) p; the most, subject to the constraint that the resulting J is a feasible sohition. This requires us to consider jobs in nonincreasing order of the p;’s. Let us apply this criterion to the data of Example 4.2. We begin with J =0 and Dy; p: =0. Job 1 is added to J as it has the largest profit and J = {1} is a feasible solution. Next, job 4 is considered. The solution J = {1,4} is also feasible. Next, job 3 is considered and discarded as J = {1,3,4} is not feasible. Finally, job 2 is considered for inclusion into J. It is discarded as J = {1,2,4} is not feasible. Hence, we are left with the solution J = {1,4} with value 127. This is the optimal solution for the given problem instance. Theorem 4.4 proves that the greedy algorithm just described always obtains an optimal solution to this sequencing problem. Before attempting the proof, let us see how we can determine whether a given JJ is a feasible solution. One obvious way is to try out all possible permutations of the jobs in J and check whether the jobs in J can be pro- cessed in any one of these permutations (sequences) without violating the deadlines. For a given pernmtation o = 71, %2,i3,.-., i, this is easy to do, since the earliest time job ig,1 < q < k, will be completed is q. If q > dj,. then using c, at least job i, will not be completed by its deadline, However, if |J| = i, this requires checking i! permutations. Actually, the feasibility of a set J can be determined by checking only one permutation of the jobs in J. This permutation is any one of the permutations in which jobs are ordered in nondecreasing order of deadlines. Theorem 4.3 Let J be a set of k jobs and o = it, i2,..., i, a permutation of jobs in J such that dj, < dj, <-+- < dj,. 
Then J is a feasible solution iff the jobs in J can be processed in the order o without violating any deadline. Proof: Clearly, if the jobs in J can be processed in the order o without violating any deadline, then J is a feasible solution. So, we have only to show that if J is feasible, then o represents a possible order in which the jobs can be processed. If J is feasible, then there exists o! = r1,72.....Tr such that d,, >, 1 a. In o! we can interchange rq and ry. Since d,, > dy,, the resulting permutation o” = 81,82,...)6 represents an order in which the jobs can be processed without violating a deadline. Continuing in this way, o! can be transformed into o without violating any deadline. Hence, the theorem is proved. oO ‘Theorem 4.3 is true even if the jobs have different processing times t; > 0 (see the exercises). 210 CHAPTER 4. THE GREEDY METHOD Theorem 4.4 The greedy method described above always obtains an opti- mal solution to the job sequencing problem. Proof: Let (pi,d;),1 < i py for all jobs b that are in J but not in J. To see this, note that if p, > pa, then the greedy method would consider job b before job a and include it into I. Now, consider feasible schedules $; and Sy for I and J respectively. Let. ibe a job such that i € I and i € J. Let i be scheduled from ¢ to +1 in S; and t' to t'+1lin Sj. Ift < t’, then we can interchange the job (if any) heduled in [¢’, #” + 1] in S; with i. If no job is scheduled in [¢’, t! + 1] in J, then i is moved to {#’, +1]. The resulting schedule is also feasible. If t’ < t, then a similar transformation can be made in $j. In this way, we can obtain ‘hedules S', and S', with the property that all jobs common to J and J are theduled at the same time. Consider the interval [tq,te + 1] in S} in which the job a (defined above) is scheduled. Let b be the job (if any) scheduled in S', in this interval. From the choice of a,pq > py. Scheduling a from ta to t; + 1 in $4 and discarding job b gives us a feasible schedule for job set J! = J~ {b}U {a}. Clearly, J’ has a profit value no less than that of J and differs from I in one less job than J does. By repeatedly using the transformation just described, J can be trans- formed into I with no decrease in profit value. So J must be optimal. 0 A high-level description of the greedy algorithm just discussed appears as Algorithm 4.5. This algorithm constructs an optimal set J of jobs that can be processed by their due times. The selected jobs can be processed in the order given by Theorem 4.3. Now, let us see how to represent the set J and how to carry out the test of lines 7 and 8 in Algorithm 4.5. Theorem 4.3 tells us how to determin whether all jobs in J U {i} can be completed by their deadlines. We avoid sorting the jobs in J each time by keeping the jobs in J ordered by deadlines. We can use an array d[1 : n] to store the deadlines of the jobs in the order of their p-values. The set J itself can be represented by a one- dimensional array J[1 : k] such that J[r], 1 py > ++: > pn. Further it assumes that'n > 1 and the deadline d{i] of job i is at least 1. Note that no job with d{i] <1 can ever be finished by its deadline. Theorem 4.5 proves that JS is a correct implementation of the greedy strategy. Theorem 4.5 Function JS is a correct implementation of the greedy-based method described above. Proof: Since d{i] > 1. the job with the largest p; will always be in the greedy solution. As the jobs are in nonincreasing order of the pj's, line 8 in Algorithm 4.6 includes the job with largest p;. 
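The strategy just described, considering jobs in nonincreasing order of profit and adding a job whenever the enlarged set passes the single-permutation test of Theorem 4.3, can also be sketched directly. The Python sketch below is illustrative and is not the text's Algorithm 4.5 or 4.6: the names are mine, jobs are 0-indexed, and the set J is simply re-sorted by deadline on every attempted insertion.

def job_sequencing(profits, deadlines):
    # Returns (total profit, selected jobs); jobs are 0-indexed here.
    order = sorted(range(len(profits)), key=lambda i: profits[i], reverse=True)
    J = []                                    # selected jobs, kept in deadline order
    for i in order:
        trial = sorted(J + [i], key=lambda j: deadlines[j])
        # Feasible iff, in nondecreasing deadline order, the job in slot q
        # (1-based) has deadline at least q (Theorem 4.3).
        if all(deadlines[j] >= q + 1 for q, j in enumerate(trial)):
            J = trial
    return sum(profits[j] for j in J), J

# Example 4.2: profits (100, 10, 15, 27), deadlines (2, 1, 2, 1)
print(job_sequencing([100, 10, 15, 27], [2, 1, 2, 1]))    # (127, [3, 0]), i.e., jobs 4 and 1

Because the trial set is re-sorted for every job, this sketch takes O(n^2 log n) time; the insertion technique used by JS below avoids the repeated sorting.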
‘The for loop of line 10 considers the remaining jobs in the order required by the greedy method described earlier. At all times, the set of jobs already included in the solution is maintained in J. If J[i], 1 < i < k, is the set already included, then J is such that d[J[iJ] < d[J[i + 1], 1 dli], w w. This is verified in line 16 (note r = w on exit from the while loop if d[J[q]] 4a, w <4 1,1<1< nare the deadlin 3. // ave ordered such that pil] > pl2] > - 4 5 // is the ith job in the optimal solution, 1 <7 < k. 5 // Also, at termination d{J{i]] < d[J[i +1], 1 d{i]) and (d[J[r)] #r)) do r= 16 if ((d[J[r]] < d[i]) and (d{i] > r)) then V7 18 // Insert i into J[]. 19 ‘0 (r +1) step —1 do J[qg+1 20 i; kth 21 22 23 return k; 24 } Algorithm 4.6 Greedy algorithm for sequencing unit time jobs with dead- lines and profits For JS there are two possible parameters in terms of which its complexity can be measured. We can use n, the number of jobs, and s, the number of jobs included in the solution J. The while loop of line 15 in Algorithm 4.6 is iterated at most h times. Fach iteration takes @(1) time. If the c of line 16 is true, then lines 19 and 20 are executed. These ©(k — r) time to insert job i. Hence, the total time for each iteration of the for loop of line 10 is ©(k). This loop is iterated n — 1 times. If s is the final value of k, that is, s is the number of jobs in the final solution, then the total time needed by algorithm JS is @(sn). Since s pl2| > +++ > pln] and that 6 = min{n,max;(d{i))}. 4{ 5 // Initially there are b +1 single node trees. 6 0 to b do ffi]:= i; 7 0; // Initialize. 8 to n do 9 {// Use greedy rule, 10 q:= CollapsingFind(min(n, d{i])); ql if (f(q] #0) then 12 13 k= k+13 J[k]:= i; // Select job i. 14 m= CollapsingFind( f(g) 1) 15 WeightedUnion(rn, 4); 16 fla] := fim); // q may be new root. 7 } 18. 19 } Algorithm 4.7 Faster algorithm for job sequencing EXERCISES 1 You are given a set of n jobs. Associated with each job # i time #; and a deadline d; by which it must be complet schedule is a permutation of the jobs such that if the jobs are processed in that order, then each job finishes by its deadline, Define a greedy schedule to be one in which the jobs are processed in nondecreasing order of deadlines. Show that if there exists a feasible schedule, then all greedy schedules are feasible. [Optimal assignment] Assume there are n workers and n jobs. Let v¥jj be the value of assigning worker i to job j. An assignment of workers to jobs corresponds to the assignment of 0 or 1 to the variables aj, 1 < i, 3 0. 3. (a) What is the solution generated by the function JS when n = 7, (Piypa,.-+.P7) = (3,5,20, 18, 1,6,30), and (di.dg,....d7) = (1,3,4,3, 2,1, 2)? (b) Show that Theorem 4.3 is true even if jobs have different process- ing requirements. Associated with job i is a profit pj > 0, a time requirement t; > 0, and a deadline d; > ti. (c) Show that for the situation of part (a), the greedy method of this ss Id an optimal solution. 4. (a) For the job section, show that the subset J repre processed according to err job i in J hasn't been assigned a processing time, then assign it to the slot [a — 1, a], where a is the least integer r such that 1 cost{k, j])) 25 then near|k] := js 6 } 2 return mincost; 28 F Algorithm 4.8 Prim’s minimum-cost spanning tree algorithin 222 CHAPTER 4. THE GREEDY METHOD 4 in with no edges ; he current graph with no edges selected. Edge (1,6) is the frst edge considered. It is inchuded in the spanning tree being built. This yields the graph of Figure 4.8(b). 
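The near[]-array strategy of Algorithm 4.8 can be rendered in executable form. The Python sketch below is illustrative only (the names, the 0-based vertex numbering, and the INF sentinel for missing edges are mine); like the pseudocode it works from a cost adjacency matrix, keeps a near[] value for every vertex not yet in the tree, and therefore runs in O(n^2) time.

INF = float("inf")

def prim(cost):
    # cost[i][j] is the edge weight, or INF if edge (i, j) is absent.
    # Returns (mincost, tree) for a connected graph with n >= 2 vertices.
    n = len(cost)
    k, l = min(((i, j) for i in range(n) for j in range(n) if i != j),
               key=lambda e: cost[e[0]][e[1]])          # minimum-cost edge
    tree, mincost = [(k, l)], cost[k][l]
    near = [l if cost[i][l] < cost[i][k] else k for i in range(n)]
    near[k] = near[l] = -1                              # -1 marks "already in the tree"
    for _ in range(n - 2):
        # Choose the cheapest vertex j not yet in the tree.
        j = min((v for v in range(n) if near[v] != -1),
                key=lambda v: cost[v][near[v]])
        tree.append((j, near[j]))
        mincost += cost[j][near[j]]
        near[j] = -1
        for v in range(n):                              # update near[] for the rest
            if near[v] != -1 and cost[v][j] < cost[v][near[v]]:
                near[v] = j
    return mincost, tree

g = [[INF, 2, INF, 6], [2, INF, 3, 8], [INF, 3, INF, 5], [6, 8, 5, INF]]
print(prim(g))    # (10, [(0, 1), (2, 1), (3, 2)])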
Next, the edge (3, 4) is selected and included in the tree (Figure 4.8(c)). The next edge to be considered is (2,7). Its inclusion in the tree being built does not create a cycle, so we get the graph of Figure 4.8(d). Edge (2,3) is considered next and included in the tree Figure 4.8(e). Of the edges not yet. considered, (7,4) has the least cost. It is considered next. Its inclusion in the tree results in a cycle, so this edge is discarded. Edge (5,4) is the next edge to be added to the tree being built. This results in the configuration of Figure 4.8(f). The next edge to be considered is the edge (7,5). It is discarded, as its inclusion creates a cycle. Finally, edge (6,5) is considered and inchided in the tree being built. This completes the spanning tree. The resulting tree (Figure 4.6(b)) has cost 99. a For clarity, Kruskal’s method is written out more formally in Algorithm 4.9. Initially E is the set of all edges in G. The only functions we wish to perform on th are (1) determine an edge with minimum cost (line 4) and (2) delete this edge (line 5). Both these functions can be performed efficiently if the edges in E are maintained as a sorted sequential list. It is not essential to sort all the edges so long as the next edge for line 4 can be determined easily. If the edges are maintained as a minheap, then the next edge to consider can be obtained in O(log |E|) time. The construction of the heap itself takes O(|E|) time. ‘To be able to perform step 6 efficiently, the vertices in G should be grouped together in such a way that one can easily determine whether the vertices » and w are already connected by the earlier selection of edges. If they are, then the edge (v,w) is to be discarded. If they are not, then (v, w) is to be added to #. One possible grouping is to place all vertices in the same connected component of t into a set (all connected components of t will also be trees). Then, two vertices 7 and w are connected in ¢ iff they are in the same set. For example, when the edge (2,6) is to be considered, the sets are {1,2}, {3,4,6}, and {5}. Vertices 2 and 6 are in different sets so these sets are combined to give {1,2,3,4,6} and {5}. The next edge to be considered is (1,4). Since vertices 1 and 4 are in the same set, the edge is rejected. The edge (3,5) connects vertices in different sets and results in the final span- ning tree. Using the set representation and the union and find algorithms of Section 2.5, we can obtain an efficient (almost linear) implementation of line 6. The computing time is, therefore, determined by the time for lines 4 and 5, which in the worst case is O(|E|log |E|). If the representations discussed above are used, then the pseudocode of Algorithm 4.10 results. In line 6 an initial heap of edges is constructed. In line 7 each vertex is assigned to a distinct set (and hence to a distinct tree). The set t is the set of edges to be included in the minimum-cost spanning 4.5. MINIMUM-COST SPANNING TREES 223 (d) te) (f Figure 4.8 Stages in Kruskal’s algorithm tree and i is the number of edges in t. The set t can be represented as a sequential list using a two-dimensional array ¢[{1 : n—1, Edge (u,v) can be added to ¢ by the assignments ¢{i, 1] =u; and ¢{i, 2] = v3. In the while loop of line 10, edges are removed from the heap one by one in nondecreasing order of cost. Line 14 determines the sets containing u and v. 
If j # k, then su and v are in different sets (and so in different trees) and edge included into ¢, The sets containing u and v are combined (line 20) v, the edge (u,v) is discarded as its inclusion into £ would create a cycle. Line 23 determines whether a spanning tree was found. It follows that i # n — 1 iff the graph G is not connected. computing time is O(|E|log |E|), where E is the edge set of G. Theorem 4.6 Kruskal’s algorithm generates a minimum-cost spanning tree for every connected undirected graph G. 224 CHAPTER 4, THE GRE) wer oakene t=O while ((¢ has less than n ~ 1 edges) and (E 0) do { Choose an edge (v, w) from E of lowest cost; Delete (v,w) from Es if (v,w) does not create a cycle in t then add (v,w) to 3 else discard (v,w) Algorithm 4.9 Early form of minimum-cost spanning tree algorithm due to Kruskal Algorithm Kruskal( £, cost, n,t) // Bis the set of edges in G. G has n vertices. cost[u, v] is the // cost of edge (u,v). t is the set of edges in the minimum-cost, // spanning tree. The final cost is returned. Construct @ heap out of the edge costs using Heapifys vertex isina different set. = 0.05 while ((i cost(q). Now, reconsider the graph with edge set E(t’) U {q}. Removal of any edge on the cycle q,¢1,€2,.« ill leave behind a tree ¢” (Exercise 5). In particular, if we delete the edge then the resulting tree ¢” will have a cost no more than the cost of f’ (as cost(e)) > cost(e)). Hence, t” is also a minimmum-cost tree. By repeatedly using the transformation described above, tree t! can be transformed into the spanning tree ¢ without any increase in cost. Henc is a minimum-cost spanning tree. 4.5.3 An Optimal Randomized Algorithm (+) Any algorithm for finding the minimum-cost spanning tree of a given graph G(V,E) will have to spend Q(|V| + |E|) time in the worst case, since it has to examine each node and each edge at least once before determining the correct answer. A randomized Las Vegas algorithm that runs in time O(\V| + |E\) can be devised as follows: (1) Randomly sample m edges from G (for some suitable m). (2) Let G’ be the induced subgraph; that is, G’ has V as its node set and the sampled edges in its edge set. ‘The subgraph G! need not be connected. Recursively find a minimum-cost spanning tree for each component of G'. Let F be the resultant minimum-cost spanning forest of G'. (3) Using F, eliminate certain edges (called the F-heavy ed of G that cannot possibly be in a minimum-cost spanning tree. Let G" be the graph that results from G after elimi: (4) Recursively find a minimum-cost spanning will also be a minimum-cost spanning tree for G. Steps 1 to 3 are useful in reducing the number of edges in G. The al- gorithm can be speeded up further if we can reduce the number of nod in the input graph as well. Such a node elimination can be effected using the Borivka steps. In a Borivka step, for each node, an incident edge with minimum weight is chosen. For example in Figure 4.9(a), the edge (1,3) is 226, CHAPTER 4. THE GREEDY METHOD chosen for node 1, the edge (6,7) is chosen for node 7, and so on. All the chosen edges are shown with thick lines. The connected components of the induced graph are found. In the example of Figure 4.9(a), the nodes 1, 2, and 3 form one component, the nodes 4 and 5 form a second component, and the nodes 6 and 7 form another component. Replace each component with a single node, The component with nodes 1, 2, and 3 is replaced with the node a. The other two components are replaced with the nodes b and c. respectively. 
Edges within the individual components are thrown away. The resultant graph is shown in Figure 4.9(b). In this graph keep only an edge of minimum weight between any two nodes. Delete any isolated nodes. Since an edge is chosen for every node, the number of nodes after one Borvka step reduces by a factor of at least two. A minimum-cost span- ning tree for the reduced graph can be extended easily to get a minimum- cost spanning tree for the original graph. If £! is the set of edges in the minimum-cost spanning tree of the reduced graph, we simply include into E' the edges chosen in the Bortvka step to obtain the minimum-cost span- ning tr ges for the original graph. In the example of Figure 4.9, a minimum-cost spanning tree for (c) will consist of the edges (a,b) and (b, ¢) Thus a minimum-cost spanning tree for the graph of (a) will have the edge: (1,3), (3, 2), (4,5), (6,7), (3,4), and (2,6). More details of the algorithms are given below. Definition 4.2 Let F bea forest that forms a subgraph of a given weighted graph G(V, £). Ifu and v are any two nodes in F, let F(u,v) denote the path (if any) connecting u and v in F and let Feost(u,v) denote the maximum weight of any edge in the path F(u,v). If there is no path between u and v in F, Feost(u,v) is taken to be oo. Any edge (z,y) of G is said to be F-heavy if cost|,y] > Feost(x,y) and F-light otherwise, a Note that all the edges of F are F-light. Also, any F-heavy edge cannot belong to a minimum-cost spanning tree of G. ‘The proof of this is left as an exercise. The randomized algorithm applies two Borivka steps to reduce the number of nodes in the input graph. Next, it samples the edges of G and processes them to eliminate a constant fraction of them. A minimum-cost spanning tree for the resultant reduced graph is recursively computed. From this tree, a spanning tree for G is obtained. A detailed description of the algorithm appears as Algorithm 4.11. Lemma 4.3 states that Step 4 can be completed in time O(|V| + |E}) The proof of this can be found in the references supplied at the end of this chapter. Step 1 takes O(/V|+|E|) time and step 2 takes O(|B|) time. Step 6 takes O(|E|) time as well. The time taken in all the recursive calls in steps 3 and 5 can be shown to be O(|V|+|E]). For a proof, see the references at the end of the chapter. A crucial fact that is used in the proof is that both the number of nodes and the mumber of edges are reduced by a constant factor, with high probability, in each level of recursion, 4.5. MINIMUM-COST SPANNING TREES 227 (b) (c) Figure 4.9 A Borivka step Lemma 4.3 Let G(V,£) be any weighted graph and let F be a subgraph of G that forms a forest. Then, all the F-heavy edges of G can be identified in time O(\V| + |Z). a Theorem 4.7 A minimum-weight spanning tree for any given weighted graph can be computed in time O(|V| + |E|) a EXERCISES 1. Compute a minimum cost spanning tree for the graph of Figure 4.10 using (a) Prim’s algorithm and (b) Kruskal’s algorithm. Prove that Prim’s method of this section generates minimum-cost spanning trees, 228 CHAPTER 4. THE GREEDY METHOD Step 1. Apply two Boriivka steps. At the end, the number of nodes will have decreased by a factor at least 4. Let the resultant graph be G(V, ) Step 2. Form a subgraph G'(V', B") of G, where each edge of G is chosen randomly to be in E’ with probability . The expected number of edges in B’ is #1 Step 3. Recursively find a minimum-cost spanning forest F’ for a. Step 4. Eliminate all the F-heavy edges from G. 
With high probability, at least a constant fraction of the edges of G will be eliminated. Let G” be the resultant graph. Step 5. Compute a minimum-cost spanning tree (call it T”) for G" recursively. The tree T" will also be a minimum-cost spanning tree for G. Step 6. Return the edges of T” together with the edges chosen in the Boriivka steps of step 1. These are the edges of a minimum- cost spanning tree for G. Algorithm 4.11 An optimal randomized algorithm 3. (a) Rewrite Prim’s algorithm under the assumption that the graphs are represented by adjacency lists. (b) Program and run the above version of Prim algorithm against Algorithm 4.9. Compare the two on a representative set of graphs (c) Analyze precisely the computing time and space requirements of your new version of Prim’s algorithm using adjacency lists. 4. Program and run Kruskal’s algorithm, described in Algorithm 4.10. 5. (a) Show that if is a spanning tree for the undi You will have to modify functions Heapify and Adjust of Chapter 2. Use the same test data you devised to test Prim’s algorithm in Exercise 3. ected graph G, then the addition of an edge q, q ¢ E(t) and q € E(G), to t creates a unique cycle. 4.6. OPTIMAL STORAGE ON TAPES 229 Figure 4.10 Graph for Exercise 1 (b) Show that if any of the edges on this unique cycle is deleted from E(t) U{q}, then the remaining edges form a spanning tree of G. 6. In Figure 4.9, find a minimum-cost spanning tree for the graph of part (c) and extend the tree to obtain a minimum cost spanning tree for the graph of part (a). Verify the correctness of your answer by applying cither Prim’s algorithm or Kruskal’s algorithm on the graph of part (a). 7. Let G(V, E) be any weighted connected graph. (a) If C is any eycle of G, then show that the heaviest edge of C cannot belong to a minimum-cost spanning tree of G. (b) Assume that F is a forest that is a subgraph of G. Show that any F-heavy edge of G cannot belong to a minimum-cost spanning tree of G. 8. By considering the complete graph with n vertices, show that the num- ber of spanning trees in an n vertex graph can be greater than 2"! ~2 4.6 OPTIMAL STORAGE ON TAPES ‘There are n programs that are to be stored on a computer tape of length 1. Associated with each program i is a length l;,1 < i Ij, then interchanging ig and iy results in a permutation J’ with dl!) = |Sn-k+ 0h, | + (0 ka ky +i, + (n—b+ hi, Subtracting d(J’) from d(Z), we obtain d(I) — d(I') (n-a+I1)(li, —i,) + (n— b+ 1)(li, — bi) (b—a)(li, — fi) > 0 Hence, no permutation that is uot in nondecreasing order of the 1,’s can have minimum d. It is easy to see that all permutations in nondecreasing order of the I;’s have the same d value. Hence, the ordering defined by i; = 4,1 1 tapes, Tp,..-.Tin1, then the programs are to be distributed over these tapes. For each tape a storage permutation is to be provided. If is the storage permutation for the subset of programs on tape j, then d(J,) s as defined earlier. The total retrieval time (TD) is Socjem 1 dj). The objective is to store the programs in such a way as to minimize T'D. The obvious generalization of the solution for the one-tape case is to consider the programs in nondecreasing order of l's. The program currently 232 CHAPTER 4. THE GREEDY METHOD 1 Algorithm Store(n,m) 2 // nis the number of programs and m the number of tapes. 
3 0; // Next tape to store on 5 = 1tondo 6 7 write ("append program", i, 8 “to permutation for tape", j)s 9 j= (J +1) mod m; 10 } u } Algorithm 4.12 Assigning programs to tapes being considered is placed on the tape that re: in the minimum increas in TD. This tape will be the one with the least amount of tape used so far. If there is more than one tape with this property, then the one with the smallest index can be used. If the jobs are initially ordered so that |) < lp < +++ 1. The problem is to select a maximum subset @ of the programs for storage on the tape. (A maximum subset is one with the maximum number of programs in it). A greedy algorithm for this problem would build the subset Q by including programs in nondecreasing order of aj (a) Assume the P; are ordered such that a1 < a2 < +++ < aq. Write a function for the above strategy. Your function should output an array s[1 : n] such that s[i] = 1 if P; is in Q and sfi] = 0 otherw a (b) Show that this strategy always finds a maximum subset Q such that Dpcqai Sl. (c) Let Q be the subset obtained using the above greedy strategy. How small can the tape utilization ratio (p,q a1)/l get? (d) Suppose the objective now is to determine a subset of programs that. maximizes the tape utilization ratio. A greedy approach 234 CHAPTER 4. THE GREEDY METHOD would be to consider programs in nonincreasing order of aj. If there is enough space left on the tape for P;, then it is included in Q. Assume the programs ar red so that ay > az > +++ > dn. Write a function i time and space complexity? (c) Show that the strategy of part (d) doesn’t necessarily yield a subset that maximizes (S°p,cqai)/l. How small can this ratio get? Prove your bound. 4, Assume n programs of lengths l1,[9,... ln are to be stored on a tape. Program i is to be retrieved with frequency f;. If the programs are stored in the order #1,i2,...,%n, the expected retrieval time (ERT) is [Eu x 4] ISA (a) Show that storing the programs in nondecreasing order of I; does not necessarily minimize the ERT. (b) Show that storing the programs in nonincreasing order of f; does not necessarily minimize the ERT. Show that the ERT is minimized when the programs are stored in nonincreasing order of f;/li. 5. Consider the tape storage problem of this section. Assume that two tapes T1 and T2, are available and we wish to distribute n given programs of lengths ly, l2,...Jy onto these two tapes in such a manner that the maximum retrieval time is minimized. That is, if A and B are the sets of programs on the tapes T1 and 2 respectively, then we wish to choose A and B such that max { Sieqlis Dien hi } is minimized. A possible greedy approach to obtaining and B would be to start with A and B initially empty. Then consider the programs one at a time. The program currently being considered is assigned to set A if Sel = min { Dicali, Diew li }; otherwise it is assigned to B. Show that this does ‘not guarantee optimal solutions even if ly < [p< +++ < In Show that the same is true if we require f) > ly > +++ > ly. 4.7 OPTIMAL MERGE PATTERNS In Section 3.4 we saw that two sorted files containing n and m records respectively could be merged together to obtain one sorted file in time O(n+ m). When more than two sorted files are to be merged together, the merge can be accomplished by repeatedly merging sorted files in pairs. Thus, if 4.7. OPTIMAL MERGE PATTERNS 235 files 21.72.73, and 21 are to be merged, we could first merge 21 and x2 to get a file yy. Then we could merge y; and <3 to get yo. 
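The assignment rule of Algorithm 4.12 amounts to dealing the programs, taken in nondecreasing order of length, onto tapes 0, 1, ..., m-1 in wraparound fashion. A small Python sketch of this rule follows; the names are mine, and unlike Algorithm 4.12 it performs the sort itself rather than assuming presorted input.

def store(lengths, m):
    # lengths[i] is the length of program i; m is the number of tapes.
    # Returns, for each tape, the list of program indices in storage order.
    tapes = [[] for _ in range(m)]
    j = 0                                            # next tape to store on
    for i in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        tapes[j].append(i)
        j = (j + 1) % m
    return tapes

print(store([4, 8, 5, 2, 9, 7], 2))                  # [[3, 2, 1], [0, 5, 4]]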
Finally, we could merge yz and 2 to get the desired sorted file. Alternatively, we could first merge 2; and ry getting y1, then merge 3 and x4 and get yp, and finally merge y; and yp and get the desired sorted file. Given n sorted files, there are many ways in which to pairwise merge them into a single sorted file. Different pairings require differing amounts of computing time. The problem we address ourselves to now is that of determining an optimal way (one requiring the fowest comparisons) to pairwise merge n sorted files. Since this problem calls for an ordering among the pairs to be merged, it fits the ordering paradigm Example 4.9 The files 1,7, and :ry are three sorted files of length 30,20, and 10 records each. Merging 1 and :r2 requires 50 record moves. Merging the result with «3 requires another 60 moves. The total number of record moves required to merge the three files this way is 110. If, instead, we first merge tz and 3 (taking 30 moves) and then 21 (taking 60 moves), the total record moves made is only 90. Hence, the second merge pattern is faster than the first. a A greedy attempt to obtain an optimal merge pattern is easy to formulate. Since merging an n-record file and an m-record file requires possibly n+ m record moves, the obvious choice for a selection criterion is: at each step merge the two smallest size files together. Thus, if we have five files (x),...,25) with sizes (20, 30, 10,5,30), our greedy rule would generate the following merge pattern: merge x; and -r3 to get 2; (|zi| = 15), merge 2, and xy to get zz (|zo| = 35), merge zy and x5 to get 23 (|z3| = 60), and merge z, and 25 to get the answer 24. The total number of record moves is 205 One can verify that this is an optimal merge pattern for the given problem instance. The merge pattern such as the one just described will be referred to as a two-way merge pattern (each merge step involves the merging of two files). The two-way merge patterns can be represented by binary merge trees. Figure 4.11 shows a binary merge tree representing the optimal merge pattern obtained for the above five files. The leaf nodes are drawn as squares and represent the given five files. These nodes are called external nodes. The remaining nodes are drawn as circles and are called internal nodes. Each internal node has exactly two children, and it represents the file obtained by merging the files represented by its two children. ‘The number in each node is the length (i., the number of records) of the file represented by that node. external node 21 is at a distance of 3 from the root node 24 (a node at level i is at a distance of i— 1 from the root). Hence, the records of file 24 are moved three times, once to get 21, once again to get 2», and finally one more time to get z4. If d, is the distance from the root to the external 236 CHAPTER 4. THE GREEDY METHOD Figure 4.11 Binary merge tree representing a merge pattern node for file x; and qi, the length of 2; is then the total number of record moves for this binary merge tree is Sava i This sum is called the weighted external path length of the tree. An optimal two-way merge pattern corresponds to a binary merge tree with minimum weighted external path length. The function Tree of Algo- rithm 4.13 uses the greedy rule stated earlier to obtain a two-way merge tree for n files. The algorithm has as input a list list of n trees. Bach node in a tree has three fields, Ichild, rchild, and weight. Initially, each tree in list has exactly one node. 
This node is an external node and has [child and rehild fields zero whereas weight is the length of one of the n files to be merged. During the course of the algorithm, for any tree in list with root node t, t - weight is the length of the merged file it represents (t + weight equals the sum of the lengths of the external nodes in tree t). Function Tree uses two functions, Least(list) and Insert(list,t). Least(list) finds a tree in list whose root has least weight and returns a pointer to this tree. This tree is removed from list. Insert(list,t) inserts the tree with root t into list. The- orem 4.10 shows that Tree (Algorithm 4.13) generates an optimal two-way merge tree. 4,7. OPTIMAL MERGE PATTERNS 237 treenode = record { treenode * child; treenode + rchilds integer weight; k 1 Algorithm Tree(n) 2 // list is a global list of n single node 3. // binary trees as described above. at 5 for i:= 1ton—1do of 7 pt := new treenodes // Get a new tree node. 8 (pt + Ichild) := Least(List); // Merge two trees with 9 (pt + rehild) := Least(list); // smallest lengths. 10 (pt + weight) := ((pt + Ichild) > weight) u +((pt + rehild) + weight); 12 Insert (list, pt)s 13 14 return Least(list); // Tree left in list is the merge tree 15 } Algorithm 4.13 Algorithm to generate a two-way merge tree Example 4.10 Let us see how algorithm Tree works when list initially rep- resents six files with lengths (2,3,5,7,9, 13). Figure 4.12 shows list at the end of each iteration of the for loop. The binary merge tree that results at the end of the algorithm can be used to determine which files are merged. Merging is performed on those files which are lowest (have the greatest depth) in the tree. a The main for loop in Algorithm 4.13 is executed n — 1 times. If list is kept in nondecreasing order according to the weight value in the roots, then Least(list) requires only O(1) time and Insert(list,t) can be done in O(n) time. Hence the total time taken is O(n2). In case list is represented as a minheap in which the root value is less than or equal to the values of its children (Section 2.4), then Least(list) and Insert(List, t) can be done in O(log n) time. In this case the computing time for Tree is O(nlogn). Some speedup may be obtained by combining the Insert of line 12 with the Least of line 9. 238, CHAPTER 4. THE GREEDY METHOD Theorem 4.10 If ist initially contains n > 1 single node trees with weight values (q1,92,.--,@n), then algorithm Tree generates an optimal two-way merge tree for n files with these lengths. Proof: The proof is by induction on n. For n = 1, a tree with no internal nodes is returned and this tree is clearly optimal. For the induction hypoth- esis, assume the algorithm generates an optimal two-way merge tree for all am), 1 L(T2). (b) Using the data of a, obtain T1 and T2 € D such that L(T1) = L(L2) but SL(T1) > SL(T2) (c) Show that if the subalgorithm Least used in algorithm Tree is such that in case of a tie it returns the tree with least depth, then Tree a tree with the properties of T* 4.8 SINGLE-SOURCE SHORTEST PATHS Graphs can be used to represent the highway structure of a state or country with vertices representing cities and edges representing sections of highway. The edges can then be assigned weights which may be either the distance between the two cities connected by the edge or the average time to drive along that, sect hway. A motorist wishing to drive from city A to B would be interested in answers to the following questions: 242 CHAPTER 4. 
THE GREEDY METHOD Path Length 11,4 10 2) 1,4,5 25 314,52 45 4) 1,3 45 (a) Graph (b) Shortest paths from 1 Figure 4.15 Graph and shortest paths from vertex 1 to all destinations ¢ Is there a path from A to B? If there is more than one path from A to B, which is the shortest path? The problems defined by these questions are special cases of the path problem we study in this section. The length of a path is now defined to be the sum of the weights of the edges on that path. The starting vertex of the path is referred to as the sourc ind the last vertex the destination. The graphs are digraphs to allow for one-way streets. In the problem we consider, we are given a directed graph G = (V,B), a weighting function cost for the edges of G, and a source vertex vp. The problem is to determine the shortest paths from vp to all the remaining vertices of G. It is assumed that all the weights are positive. The shortest path between vg and some other node v is an ordering among a subset of the edges. Hence this problem fits the ordering paradigm. Example 4.11 Consider the directed graph of Figure 4.15(a). The numbers on the edges are the weights. If node 1 is the source vertex, then the shortest path from 1 to 2 is 1,4,5,2. The length of th 1,2 which is of length 50. There is no path from 1 to 6. Figure 4.15(b) lists the shortest paths from node 1 to nodes 4,5, 2, and 3, respectively. The paths have been listed in nondecreasing order of path length. a To formulate a greedy-based algorithm to generate the shortest paths. we must conceive of a multistage solution to the problem and also of an optimization measure. One possibility is to build the shortest paths one by 4.8. SINGLE-SOURCE SHORTEST PATHS 243 one. As an optimization measure we can use the sum of the lengths of all paths so far generated. For this measure to be minimized, each individual path must be of minimum length. If we have already constructed i shortest paths, then using this optimization measure, the next path to be constructed should be the next shortest minimum length path. The greedy way (and also a systematic way) to generate the shortest paths from vp to the remaining vertices is to generate these paths in nondecreasing order of path length. First, a shortest path to the nearest vertex is generated, Then a shortest path to the second nearest vertex is gencrated, and so on, For the graph of Figure 4.15(a) the nearest vertex to v = 1 is 4 (cos#[1,4] = 10). The path 1,4 is the first path gener The second nearest vertex to node 1 is 5 and the distance betwoen | and 5 is 25. The path 1,4,5 is the next path generated. In order to generate the shortest paths in this order, we need to be able to determine (1) the next vertex to which a shortest path must be generated and (2) a shortest path to this vertex. Let S denote the set of vertices (including vo) to which the shortest paths have already been generated. For w not in S, let dist{w] be the length of the shortest path Starting from v9, going through only those vertices that are in S, and ending at, w. We observe that: 1. If the next. shortest path is to vertex u, then the path begins at 1, ends at u, and goes through only those vertices that are in §. To prove this, we must show that all the intermediate vertices on the shortest path to ware in S. Assume there is a vertex w on this path that is not in S. Then, the vp to u path also contains a path from vg to w that is of length less than the vg to u path. By assumption the shortest. 
paths are being generated in nondecreasing order of path length, and so the shorter path vp to w must already have been generated. Hence, there can be no intermediate vertex that is not in S. 2. The destination of the next path generated must be that of vertex u which has the minimum distance, dist{u]. among all vertices not in S’ This follows from the definition of dist and observation 1. In case there are several vertices not in S with the same dist, then any of these may be selected 3. Having selected a vertex u as in observation 2 and generated the short- est vo to u path, vertex u becomes a member of S. At this point the length of the shortest paths starting at vy, going though vertices only in S, and ending at a vertex w not in S may decrease: that is, the value of dist{w] may change. If it does change, then it must be due to a shorter path starting at vp and going to u and then to w. The intermediate vertices on the vp to u path and the u to w path must all be in S. Further, the vp to uw path must be the shortest such path: otherwise dist[w] is not defined properly, Also, the u to w path can be chosen so as not to contain any intermediate vertices. Therefore, 244 CHAPTER 4. THE GREEDY METHOD we can conclude that if dist{w] is to change (i.e., decrease), then it is because of a path from vp to u to w, where the path from vp to u is the shortest such path and the path from u to w is the edge (u, w) The length of this path is dist[u] + cost{u, w]. The above observations lead to a simple Algorithm 4.14 for the single- source shortest path problem. This algorithm (known as Dijkstra’s algo- rithm) only determines the lengths of the shortest paths from vp to all other vertices in G. The generation of the paths requires a minor extension to this algorithm and is left as an exercise. In the function ShortestPaths (Algorithm 4.14) is assumed that the n vertices of G are numbered 1 through n. The maintained as a bit array with S[é] = 0 if vertex i is not in S and if it is. It is assumed that the graph itself is represented by its cost adjacency matrix with cost{i, j]’s being the weight of the edge (i,j). The weight cost(i, j] is set to some large number, oo, in case the edge (i, j) is not in E(G). For i= j, cost{i, j) can be set to any nonnegative number without affecting the outcome of the algorithm. From our earlier discussion, it is easy to see that the algorithm is correct. The time taken by the algorithm on a graph with n vertices is O(n?). To see this, note that. the for loop of line 7 in Algorithm 4.14 takes @(n) time. The for loop of line 12 is executed n — 2 times. Each execution of this loop requires O(n) time at lines 15 and 16 to select the next vertex and again at the for loop of line 18 to update dist. So the total time for this loop is O(n?). In case a list t of vertices currently not in s is maintained, then the number of nodes on this list would at any time be n — num. This would speed up lines 15 and 16 and the for loop of line 18, but the asymptotic time would remain O(n). This and other variations of the algorithm are explored in the exercises. Any shortest path algorithm must examine each edge in the graph at least once since any of the edges could be in a shortest path. Hence, the minimum possible time for such an algorithm would be Q(/B]). Since c adjacency matrices were used to represent the graph, it takes O(n?) time just to determine which edges are in G, and so any shortest path algorithm using this representation must take Q(n?) time. 
For this representation then, algorithm ShortestPaths is optimal to within a constant factor. If a change to adjacency lists is made, the overall frequency of the for loop of line 18 can be brought down to O(|E]) (since dist can change only for vertices adjacent from u). If V — S$ is maintained as a red-black tree (see Section 2.4.2), each execution of lines 15 and 16 takes O(logn) time. Note that a red-black tree supports the following operations in O(logn) time: insert, delete (an arbitrary clement), find-min, and search (for an arbitrary element). Each update in line 21 takes O(logn) time as well (since an update can be done using a delete and an insertion into the red-black tree). Thus the overall run time is O((n + |E|) log n). 4.8. SINGLE-SOURCE SHORTEST PATHS 245 1 Algorithm ShortestPaths(v, cost, dist,n) 2 // dist{j), 1 dist{u] + cost[u,w])) then 21 dist[w] := dist{u] + cost{u, w]s 22 23 } Algorithm 4.14 Greedy algorithm to generate shortest paths Example 4.12 Consider the eight vertex digraph of Figure 4.16(2) with cost adjacency matrix as in Figure 4.16(b). The values of dist and the vertices selected at each iteration of the for loop of line 12 in Algorithm 4.14 for finding all the shortest paths from Boston are shown in Figure 4.17. To begin with, S contains only Boston. In the first iteration of the for loop (that is, for num = 2), the city u that is not in S and whose dist[u] is minimum is identified to be New York. New York e the set $. Also the dist{ | values of Chicago, Miami, and New Orleans get altered since there are shorter paths to these cities via New York. In the next iteration of the for loop, t! ‘ity that enters S is Miami since it has the smallest dis¢[ | value from among all the nodes not in S. None of the dist{ | values are altered. The algorithm continues in a similar fashion and terminates when only seven of the eight vertices are in S. By the definition of dist, the distance of the last vertex, in this case Los Angeles, is correct as the shortest path from Boston to Los Angeles can go through only the remaining six vertices. 0 246 CHAPTER 4. THE GREEDY METHOD Boston _G) /250 4 San Francisco New York O— 300) 100 ae 100 Los Angeles New Orleans (a) Digraph 1 A 3 4 5 6 7 8 1 0 7 2 | 300 0 3. 100 800 0 4 1200 0 5 1500 0. .250) 6 1000 0 900 1400 | 7 0 1000 8 | 1700 o| (b) Length-adjacency matrix Figure 4.16 Figures for Example 4.12 One can easily verify that the edges on the shortest paths from a ver- tex v to all remaining vertices in a connected undirected graph @ form a spanning tree of G. This spanning tree is called a shortest-path spanning tree. Clearly, this spanning tree may be different for different root vertices v. Figure 4.18 shows a graph G, its minimum-cost spanning tree, and a shortest-path spanning tree from vertex 1 4.8. SINGLE-SOURCE SHORTEST PATHS 247 Iteration | S Vertex, LA SF DEN CHI BOST MIA NO| selected. | a QQ) 3) 4] (s] (6) a {8} Initial | - 0 +00 yoo 1500 oO 250 +0 +00 | 15.6. iL } Figure 4.17 Action of ShortestPaths EXERCISES 1. Use algorithm ShortestPaths to obtain in nondecreasing order the lengths of the shortest paths from vertex 1 to all remaining vertices in the di- graph of Figure 4.19. 2. Using the directed graph of Figure 4.20 explain why ShortestPaths will not work properly. What is the shortest path between vertices 1, and v7 7 3. 
Rewrite algorithm ShortestPaths under the following assumptions: (a) @ is represented by its adjacency lists, The head nodes are HEAD(1),..., HEAD(m) and each list node has three fields: VER- TEX, COST, and LINK. COST is the length of the corresponding edge and n the number of vertices in G. (b) Instead of representing S, the set of vertices to which the shortest paths have already been found, the set T = V(G) — $ is repre- sented using a linked list. What can you say about the computing time of your new algorithm relative to that of ShortestPaths? 4. Modify algorithm ShortestPaths so that it obtains the shortest. p: in addition to the lengths of these paths. What is the computing time of your algorithn 248 CHAPTER 4. THE GREEDY METHOD (c) Shortest path spanning tree from vertex 1. Figure 4.18 Graphs and spanning trees Figure 4.19 Directed graph 4.9. REFERENCES AND READINGS 249 Figure 4.20 Another directed graph 4.9 REFERENCES AND READINGS The linear time algorithm in Section 4.3 for the tree vertex splitting problem can be found in “Vertex upgrading problems for VLSI,” by D. Paik, Ph.D thesis, Department of Computer Science, University of Minnesota, October 1991. ‘The two greedy methods for obtaining minimum-cost spanning trees are due to R. C. Prim and J. B. Kruskal, respectively, An O(c log log») time spanning tree algorithm has been given by A. C. Yao. The optimal randomized algorithm for minimum-cost spanning trees pre- sented in this chapter appears in “A randomized linear-time algorithm for finding minimum spanning tr y P.N. Klein and R. E. Tarjan, in Pro- ceedings of the 26th Annual Symposium on Theory of Computing, 1994, pp. 9-15. See also “A randomized linear-time algorithm to find minimum span- ning trees,” by D. R. Karger, P. N. Klein, and R. E. Tarjan, Journal of the ACM 42, no. 2 (1995): 321-328. Proof of Lemma 4.3 can be found in “Verification and sensitivity analysis of minimum spanning trees in linear time.” by B. Dixon, M. Rauch, and R. E Tarjan, SIAM Journal on Computing 21 (1992): 1184-1192, and in “A simple minimum spanning tree verification algorithm,” by V. King, Proceedings of the Workshop on Algorithms and Data Structures, 1 5. ‘A very nearly linear time algorithm for minimum-cost spanning trees ap- pears in “Efficient algorithms for finding minimum spanning trees in undi- rected and directed graphs,” by H. N. Gabow, Z. Galil, T. Spencer, and R. E. Tarjan, Combinatorica 6 (1986): 109-122. 250 CHAPTER 4. THE GREEDY METHOD A linear time algorithm for minimum-cost spanning trees on a stronger model where the edge weights can be manipulated in their binary form is given in “Trans-dichotomous algorithms for minimum spanning trees and shortest paths,” by M. Fredman and D. E. Willard, in Proceedings of the ist Annual Symposium on Foundations of Computer Science, 1990. pp. 719-725. The greedy method developed here to optimally store programs on tapes was first devised for a machine scheduling problem. In this problem n jobs have to be scheduled on m processors. Job i takes ¢; amount of time. ‘The time at which a job finishes is the sum of the job times for all jobs preced- ing and including job i. The average finish time corresponds to the mean access time for programs on tapes. The (m!)"/™ schedules referred to in Theorem 4.9 are known as SPT (shortest processing time) schedules. The rule to generate SPT schedules as well as the rule of Exercise 4 (Section 4.6) are due to W. E. Smith. The greedy algorithm for generating optimal merge trees is due to D. Huffman. 
For a given set {q1, ..., qn} there are many sets of Huffman codes minimizing Σ q_i d_i. From amongst these code sets there is one that has minimum Σ d_i and minimum max {d_i}. An algorithm to obtain this code set was given by E. S. Schwartz.

The shortest-path algorithm of the text is due to E. W. Dijkstra. For planar graphs, the shortest-path problem can be solved in linear time as has been shown in "Faster shortest-path algorithms for planar graphs," by P. Klein, S. Rao, and M. Rauch, in Proceedings of the ACM Symposium on Theory of Computing, 1994.

The relationship between greedy methods and matroids is discussed in Combinatorial Optimization, by E. Lawler, Holt, Rinehart and Winston, 1976.

4.10 ADDITIONAL EXERCISES

1. [Coin changing] Let A_n = {a1, a2, ..., an} be a finite set of distinct coin types (for example, a1 = 50¢, a2 = 25¢, a3 = 10¢, and so on). We can assume each ai is an integer and a1 > a2 > ... > an. Each type is available in unlimited quantity. The coin-changing problem is to make up an exact amount C using a minimum total number of coins. C is an integer > 0.

(a) Show that if an ≠ 1, then there exists a finite set of coin types and a C for which there is no solution to the coin-changing problem.

(b) Show that there is always a solution when an = 1.

(c) When an = 1, a greedy solution to the problem makes change by using the coin types in the order a1, a2, ..., an. When coin type ai is being considered, as many coins of this type as possible are given. Write an algorithm based on this strategy. Show that this algorithm doesn't necessarily generate solutions that use the minimum total number of coins. (A small illustration of this strategy follows these exercises.)

(d) Show that if A_n = {k^{n-1}, k^{n-2}, ..., k^0} for some k > 1, then the greedy method of part (c) always yields solutions with a minimum number of coins.

2. [Set cover] You are given a family S of m sets S_i, 1 ≤ i ≤ m.
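The following C++ sketch (ours; the denominations are illustrative) renders the greedy strategy of part (c) of the coin-changing exercise directly; the second call shows a denomination set on which the greedy count is not minimum:

#include <cstdio>
#include <vector>

// Greedy change-making: consider coin types in decreasing order of value
// and give as many coins of each type as possible (part (c) of Exercise 1).
int greedy_change(const std::vector<int>& a, int C) {
    int coins = 0;
    for (int ai : a) {          // a[0] > a[1] > ... > a[n-1], with a[n-1] == 1
        coins += C / ai;
        C %= ai;
    }
    return coins;
}

int main() {
    // With {50, 25, 10, 5, 1} and C = 63, greedy gives 50+10+1+1+1, 5 coins,
    // which happens to be optimal for this denomination set.
    std::printf("%d\n", greedy_change({50, 25, 10, 5, 1}, 63));
    // With {6, 4, 1} and C = 8, greedy gives 6+1+1 (3 coins),
    // but 4+4 (2 coins) is better, so the greedy method is not always optimal.
    std::printf("%d\n", greedy_change({6, 4, 1}, 8));
}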
CHAPTER 5

DYNAMIC PROGRAMMING

5.1 THE GENERAL METHOD

Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions.

Example 5.1 [Knapsack] The solution to the knapsack problem (Section 4.2), in which we maximize Σ p_i x_i subject to Σ w_i x_i ≤ m and 0 ≤ x_i ≤ 1, can be viewed as the result of a sequence of decisions. We have to decide the values of x_i, 1 ≤ i ≤ n. First we make a decision on x_1, then on x_2, and so on. An optimal sequence of decisions maximizes the objective function. □

Example 5.2 [Optimal merge patterns] This problem was discussed in Section 4.7. An optimal merge pattern tells us which pair of files should be merged at each step. As a decision sequence, the problem calls for us to decide which pair of files should be merged first, which pair second, which pair third, and so on. An optimal sequence of decisions is a least-cost sequence. □

Example 5.3 [Shortest path] One way to find a shortest path from vertex i to vertex j in a directed graph G is to decide which vertex should be the second vertex, which the third, which the fourth, and so on, until vertex j is reached. An optimal sequence of decisions is one that results in a path of least length. □

For some of the problems that may be viewed in this way, an optimal sequence of decisions can be found by making the decisions one at a time and never making an erroneous decision. This is true for all problems solvable by the greedy method. For many other problems, it is not possible to make stepwise decisions (based only on local information) in such a manner that the sequence of decisions made is optimal.

Example 5.4 [Shortest path] Suppose we wish to find a shortest path from vertex i to vertex j. Let A_i be the set of vertices adjacent from vertex i. Which of the vertices in A_i should be the second vertex on the path? There is no way to make a decision at this time and guarantee that future decisions leading to an optimal sequence can be made. If on the other hand we wish to find a shortest path from vertex i to all other vertices in G, then at each step, a correct decision can be made (see Section 4.8). □

One way to solve problems for which it is not possible to make a sequence of stepwise decisions leading to an optimal decision sequence is to try all possible decision sequences. We could enumerate all decision sequences and then pick out the best. But the time and space requirements may be prohibitive. Dynamic programming often drastically reduces the amount of enumeration by avoiding the enumeration of some decision sequences that cannot possibly be optimal. In dynamic programming an optimal sequence of decisions is obtained by making explicit appeal to the principle of optimality.

Definition 5.1 [Principle of optimality] The principle of optimality states that an optimal sequence of decisions has the property that whatever the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with regard to the state resulting from the first decision. □

Thus, the essential difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated. In dynamic programming, many decision sequences may be generated. However, sequences containing suboptimal subsequences cannot be optimal (if the principle of optimality holds) and so will not (as far as possible) be generated.

Example 5.5 [Shortest path] Consider the shortest-path problem of Example 5.3. Assume that i, i1, i2, ..., ik, j is a shortest path from i to j. Starting with the initial vertex i, a decision has been made to go to vertex i1. Following this decision, the problem state is defined by vertex i1 and we need to find a path from i1 to j. It is clear that the sequence i1, i2, ..., ik, j must constitute a shortest i1 to j path. If not, let i1, r1, r2, ..., rq, j be a shortest i1 to j path. Then i, i1, r1, ..., rq, j is an i to j path that is shorter than the path i, i1, i2, ..., ik, j. Therefore the principle of optimality applies for this problem. □

Example 5.6 [0/1 knapsack] The 0/1 knapsack problem is similar to the knapsack problem of Section 4.2 except that the x_i's are restricted to have a value of either 0 or 1. Using KNAP(l, j, y) to represent the problem

maximize Σ_{l ≤ i ≤ j} p_i x_i
subject to Σ_{l ≤ i ≤ j} w_i x_i ≤ y          (5.1)
x_i = 0 or 1, l ≤ i ≤ j

the 0/1 knapsack problem is KNAP(1, n, m). □

Let g_i(y) denote the value of an optimal solution to KNAP(i + 1, n, y). The forward recurrence

g_i(y) = max {g_{i+1}(y), g_{i+1}(y - w_{i+1}) + p_{i+1}}          (5.3)

can be solved by beginning with g_n(y) = 0 for y ≥ 0 and g_n(y) = -∞ for y < 0. From g_n(y), one can obtain g_{n-1}(y) using (5.3) with i = n - 1. Then, using g_{n-1}(y), one can obtain g_{n-2}(y). Repeating in this way, one can determine g_1(y) and finally g_0(m) using (5.3) with i = 0.

Example 5.11 [0/1 knapsack] Consider the case in which n = 3, w1 = 2, w2 = 3, w3 = 4, p1 = 1, p2 = 2, p3 = 5, and m = 6. We have to compute g_0(6). The value of g_0(6) = max {g_1(6), g_1(4) + 1}.

In turn, g_1(6) = max {g_2(6), g_2(3) + 2}. But g_2(6) = max {g_3(6), g_3(2) + 5} = max {0, 5} = 5. Also, g_2(3) = max {g_3(3), g_3(3 - 4) + 5} = max {0, -∞} = 0. Thus, g_1(6) = max {5, 2} = 5.

Similarly, g_1(4) = max {g_2(4), g_2(4 - 3) + 2}. But g_2(4) = max {g_3(4), g_3(4 - 4) + 5} = max {0, 5} = 5. The value of g_2(1) = max {g_3(1), g_3(1 - 4) + 5} = max {0, -∞} = 0. Thus, g_1(4) = max {5, 0 + 2} = 5.

Therefore, g_0(6) = max {5, 5 + 1} = 6. □
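The computation in Example 5.11 can be checked mechanically. The following C++ sketch (ours; only the function names are new, the instance is that of the example) evaluates recurrence (5.3) by straightforward recursion:

#include <cstdio>
#include <algorithm>
#include <vector>

const int NEG_INF = -1000000;                 // stands in for -infinity

// g(i, y): value of an optimal solution to KNAP(i+1, n, y),
// evaluated directly from recurrence (5.3); w[i], p[i] hold w_{i+1}, p_{i+1}.
int g(int i, int y, const std::vector<int>& w, const std::vector<int>& p) {
    int n = (int)w.size();
    if (y < 0) return NEG_INF;                // g_i(y) = -infinity for y < 0
    if (i == n) return 0;                     // g_n(y) = 0 for y >= 0
    return std::max(g(i + 1, y, w, p),                  // x_{i+1} = 0
                    g(i + 1, y - w[i], w, p) + p[i]);   // x_{i+1} = 1
}

int main() {
    std::vector<int> w = {2, 3, 4}, p = {1, 2, 5};
    std::printf("g0(6) = %d\n", g(0, 6, w, p));   // prints 6, as in Example 5.11
}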
Example 5.12 [Shortest path] Let P_j be the set of vertices adjacent to vertex j (that is, k ∈ P_j iff (k, j) ∈ E(G)). For each k ∈ P_j, let Γ_k be a shortest i to k path. The principle of optimality holds, and a shortest i to j path is the shortest of the paths {Γ_k, j | k ∈ P_j}.

To obtain this formulation, we started at vertex j and looked at the last decision made. The last decision was to use one of the edges (k, j), k ∈ P_j. In a sense, we are looking backward on the i to j path. □

Example 5.13 [0/1 knapsack] Looking backward on the sequence of decisions x_1, x_2, ..., x_n, we see that

f_j(y) = max {f_{j-1}(y), f_{j-1}(y - w_j) + p_j}          (5.4)

where f_j(y) is the value of an optimal solution to KNAP(1, j, y).

The value of an optimal solution to KNAP(1, n, m) is f_n(m). Equation 5.4 can be solved by beginning with f_0(y) = 0 for all y, y ≥ 0, and f_0(y) = -∞ for all y, y < 0. From this, f_1, f_2, ..., f_n can be successively obtained. □

The solution method outlined in Examples 5.12 and 5.13 may indicate that one has to look at all possible decision sequences to obtain an optimal decision sequence. This is not the case. Because of the use of the principle of optimality, decision sequences containing subsequences that are suboptimal are not considered. Although the total number of different decision sequences is exponential in the number of decisions (if there are d choices for each of the n decisions to be made, then there are d^n possible decision sequences), dynamic programming algorithms often have a polynomial complexity.

Another important feature of the dynamic programming approach is that optimal solutions to subproblems are retained so as to avoid recomputing their values. The use of these tabulated values makes it natural to recast the recursive equations into an iterative algorithm. Most of the dynamic programming algorithms in this chapter are expressed in this way.

The remaining sections of this chapter apply dynamic programming to a variety of problems. These examples should help you understand the method better and also realize the advantage of dynamic programming over explicitly enumerating all decision sequences.

EXERCISES

1. The principle of optimality does not hold for every problem whose solution can be viewed as the result of a sequence of decisions. Find two problems for which the principle does not hold. Explain why the principle does not hold for these problems.

2. For the graph of Figure 5.1, find the shortest path between the nodes 1 and 2. Use the recurrence relations derived in Examples 5.10 and 5.13.
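To illustrate the remark above about recasting the recursive equations into an iterative, tabulated form, the backward recurrence (5.4) can be evaluated row by row. The C++ sketch below (ours) does this for the instance of Example 5.11 and again yields f_n(m) = 6; integer weights are assumed so that y can index an array:

#include <cstdio>
#include <algorithm>
#include <vector>

// Bottom-up evaluation of recurrence (5.4):
//   f_j(y) = max{ f_{j-1}(y), f_{j-1}(y - w_j) + p_j }
// Only f_{j-1} is needed to compute f_j, so a single row suffices.
int knapsack_backward(const std::vector<int>& w, const std::vector<int>& p, int m) {
    std::vector<int> f(m + 1, 0);             // f_0(y) = 0 for 0 <= y <= m
    for (size_t j = 0; j < w.size(); j++)     // compute f_1, f_2, ..., f_n
        for (int y = m; y >= w[j]; y--)       // descend so f[y - w_j] is still f_{j-1}
            f[y] = std::max(f[y], f[y - w[j]] + p[j]);
    return f[m];                              // f_n(m)
}

int main() {
    std::printf("%d\n", knapsack_backward({2, 3, 4}, {1, 2, 5}, 6));  // prints 6
}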
5.2 MULTISTAGE GRAPHS

A multistage graph G = (V, E) is a directed graph in which the vertices are partitioned into k ≥ 2 disjoint sets V_i, 1 ≤ i ≤ k.

5.3 ALL-PAIRS SHORTEST PATHS

Let G = (V, E) be a directed graph with n vertices and let cost be its cost adjacency matrix. The all-pairs shortest-path problem is to determine a matrix A such that A(i, j) is the length of a shortest path from i to j. In contrast to Section 4.8, where we required cost(i, j) ≥ 0 for every edge (i, j), we only require that G have no cycles with negative length. Note that if we allow G to contain a cycle of negative length, then the shortest path between any two vertices on this cycle has length -∞.

Let us examine a shortest i to j path in G, i ≠ j. This path originates at vertex i and goes through some intermediate vertices (possibly none) and terminates at vertex j. We can assume that this path contains no cycles, for if there is a cycle, then this can be deleted without increasing the path length (no cycle has negative length). If k is an intermediate vertex on this shortest path, then the subpaths from i to k and from k to j must be shortest paths from i to k and k to j, respectively. Otherwise, the i to j path is not of minimum length. So, the principle of optimality holds. This alerts us to the prospect of using dynamic programming. If k is the intermediate vertex with highest index, then the i to k path is a shortest i to k path in G going through no vertex with index greater than k - 1. Similarly the k to j path is a shortest k to j path in G going through no vertex of index greater than k - 1. We can regard the construction of a shortest i to j path as first requiring a decision as to which is the highest indexed intermediate vertex k. Once this decision has been made, we need to find two shortest paths, one from i to k and the other from k to j. Neither of these may go through a vertex with index greater than k - 1. Using A^k(i, j) to represent the length of a shortest path from i to j going through no vertex of index greater than k, we obtain

A(i, j) = min {min_{1 ≤ k ≤ n} {A^{k-1}(i, k) + A^{k-1}(k, j)}, cost(i, j)}          (5.7)

Clearly, A^0(i, j) = cost(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ n, and the same reasoning gives A^k in terms of A^{k-1}:

A^k(i, j) = min {A^{k-1}(i, j), A^{k-1}(i, k) + A^{k-1}(k, j)},  k ≥ 1          (5.8)

In an implementation, ∞ (the cost of a missing edge) can be represented by a suitably large number M. If A(i, j) ≥ (n - 1)M, then there is no directed path from i to j in G. Even for this choice of ∞, care should be taken to avoid any floating point overflows.

The time needed by AllPaths (Algorithm 5.3) is especially easy to determine because the looping is independent of the data in the matrix A. Line 11 is iterated n³ times, and so the time for AllPaths is Θ(n³). An exercise examines the extensions needed to obtain the i to j paths with these lengths. Some speedup can be obtained by noticing that the innermost for loop need be executed only when A(i, k) and A(k, j) are not equal to ∞.

EXERCISES

1. (a) Does the recurrence (5.8) hold for the graph of Figure 5.7? Why?

Figure 5.7 Graph for Exercise 1

(b) Why does Equation 5.8 not hold for graphs with cycles of negative length?

2. Modify the function AllPaths so that a shortest path is output for each pair (i, j). What are the time and space complexities of your new algorithm?

3. Let A be the adjacency matrix of a directed graph G. Define the transitive closure A⁺ of A to be a matrix with the property A⁺(i, j) = 1 iff G has a directed path, containing at least one edge, from vertex i to vertex j. A⁺(i, j) = 0 otherwise. The reflexive transitive closure A* is a matrix with the property A*(i, j) = 1 iff G has a path, containing zero or more edges, from i to j. A*(i, j) = 0 otherwise.

(a) Obtain A⁺ and A* for the directed graph of Figure 5.8.

Figure 5.8 Graph for Exercise 3

(b) Let A^k(i, j) = 1 iff there is a path with zero or more edges from i to j going through no vertex of index greater than k. Define A^0 in terms of the adjacency matrix A.

(c) Obtain a recurrence between A^k and A^{k-1} similar to (5.8). Use the logical operators or and and rather than min and +.

(d) Write an algorithm, using the recurrence of part (c), to find A*. Your algorithm can use only O(n²) space. What is its time complexity?

(e) Show that A⁺ = A × A*, where matrix multiplication is defined as A⁺(i, j) = ∨_{k=1}^{n} (A(i, k) ∧ A*(k, j)). The operation ∨ is the logical or operation, and ∧ the logical and operation. Hence A⁺ may be computed from A*.
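For reference, recurrence (5.8) translates directly into a triply nested loop. The following C++ sketch is ours (it is not the text's Algorithm 5.3 verbatim, and the sample cost matrix is illustrative); ∞ is represented by a value small enough that two of them can be added without overflow, in the spirit of the remark about (n - 1)M above:

#include <cstdio>
#include <algorithm>
#include <vector>

// All-pairs shortest paths from recurrence (5.8):
//   A^k(i,j) = min{ A^{k-1}(i,j), A^{k-1}(i,k) + A^{k-1}(k,j) }
// A holds the cost adjacency matrix on entry and the answer on exit.
void all_paths(std::vector<std::vector<long long>>& A) {
    int n = (int)A.size();
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                A[i][j] = std::min(A[i][j], A[i][k] + A[k][j]);
}

int main() {
    const long long INF = 1LL << 40;          // "infinity": no direct edge
    // Cost matrix of a three-vertex digraph (illustrative values).
    std::vector<std::vector<long long>> A = {
        {0,   4,  11},
        {6,   0,   2},
        {3, INF,   0}};
    all_paths(A);
    for (auto& row : A) {
        for (long long x : row) std::printf("%5lld", x);
        std::printf("\n");
    }
}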
5.4 SINGLE-SOURCE SHORTEST PATHS: GENERAL WEIGHTS

We now consider the single-source shortest path problem discussed in Section 4.8 when some or all of the edges of the directed graph G may have negative length. ShortestPaths (Algorithm 4.14) does not necessarily give the correct results on such graphs. To see this, consider the graph of Figure 5.9. Let v = 1 be the source vertex. Referring back to Algorithm 4.14, since n = 3, the loop of lines 12 to 22 is iterated just once. Also u = 3 in lines 15 and 16, and so no changes are made to dist[ ]. The algorithm terminates with dist[2] = 7 and dist[3] = 5. The shortest path from 1 to 3 is 1, 2, 3. This path has length 2, which is less than the computed value of dist[3].

Figure 5.9 Directed graph with a negative-length edge

When negative edge lengths are permitted, we require that the graph have no cycles of negative length. This is necessary to ensure that shortest paths consist of a finite number of edges. For example, in the graph of Figure 5.5, the length of the shortest path from vertex 1 to vertex 3 is -∞. The length of the path 1, 2, 1, 2, 1, 2, ..., 1, 2, 3 can be made arbitrarily small, as was shown in Example 5.14.

When there are no cycles of negative length, there is a shortest path between any two vertices of an n-vertex graph that has at most n - 1 edges on it. To see this, note that a path that has more than n - 1 edges must repeat at least one vertex and hence must contain a cycle. Elimination of the cycles from the path results in another path with the same source and destination. This path is cycle-free and has a length that is no more than that of the original path, as the length of the eliminated cycles was at least zero. We can use this observation on the maximum number of edges on a cycle-free shortest path to obtain an algorithm to determine a shortest path from a source vertex to all remaining vertices in the graph. As in the case of ShortestPaths (Algorithm 4.14), we compute only the length, dist[u], of the shortest path from the source vertex v to u. An exercise examines the extension needed to construct the shortest paths.

Let dist^ℓ[u] be the length of a shortest path from the source vertex v to vertex u under the constraint that the shortest path contains at most ℓ edges. Then, dist^1[u] = cost[v, u], 1 ≤ u ≤ n. As noted above, when there are no cycles of negative length, only paths with at most n - 1 edges need be considered, so dist^{n-1}[u] is the length of an unrestricted shortest path from v to u. Our goal then is to compute dist^{n-1}[u] for all u. This can be done using the dynamic programming methodology, based on the following observations:

1. If the shortest path from v to u with at most k, k > 1, edges has no more than k - 1 edges, then dist^k[u] = dist^{k-1}[u].

2. If the shortest path from v to u with at most k, k > 1, edges has exactly k edges, then it is made up of a shortest path from v to some vertex j followed by the edge (j, u). The path from v to j has k - 1 edges, and its length is dist^{k-1}[j]. All vertices i such that the edge (i, u) is in the graph are candidates for j. Since we are interested in a shortest path, the i that minimizes dist^{k-1}[i] + cost[i, u] is the correct value for j.

These observations result in the following recurrence for dist:

dist^k[u] = min {dist^{k-1}[u], min_i {dist^{k-1}[i] + cost[i, u]}}

This recurrence can be used to compute dist^k from dist^{k-1}, for k = 2, 3, ..., n - 1.

Example 5.16 Figure 5.10 gives a seven-vertex graph, together with the arrays dist^k, k = 1, ..., 6. These arrays were computed using the equation just given. For instance, dist^k[1] = 0 for all k since 1 is the source node. Also, dist^1[2] = 6, dist^1[3] = 5, and dist^1[4] = 5, since there are edges from 1 to these nodes. The distance dist^1[ ] is ∞ for the nodes 5, 6, and 7 since there are no edges to these from 1.

dist^2[2] = min {dist^1[2], min_i {dist^1[i] + cost[i, 2]}}
          = min {6, 0 + 6, 5 - 2, 5 + ∞, ∞ + ∞, ∞ + ∞, ∞ + ∞} = 3

Here the terms 0 + 6, 5 - 2, 5 + ∞, ∞ + ∞, ∞ + ∞, and ∞ + ∞ correspond to a choice of i = 1, 3, 4, 5, 6, and 7, respectively. The rest of the entries are computed in an analogous manner. □

dist^k[1..7], k = 1, ..., 6:

k = 1:  0  6  5  5  ∞  ∞  ∞
k = 2:  0  3  3  5  5  4  ∞
k = 3:  0  1  3  5  2  4  7
k = 4:  0  1  3  5  0  4  5
k = 5:  0  1  3  5  0  4  3
k = 6:  0  1  3  5  0  4  3

Figure 5.10 Shortest paths with negative edge lengths: (a) a directed graph; (b) dist^k
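Unlike Algorithm 5.4 below, which overwrites a single dist array, the following C++ sketch (ours) evaluates the recurrence literally, keeping dist^{k-1} and dist^k separate, so its intermediate rows correspond to a table such as the one in Figure 5.10. The sample graph is illustrative and is not the graph of the figure:

#include <cstdio>
#include <algorithm>
#include <vector>

struct Edge { int from, to; long long cost; };

// Literal evaluation of
//   dist^k[u] = min{ dist^{k-1}[u], min_i{ dist^{k-1}[i] + cost[i,u] } }
// for k = 2, ..., n-1, starting from dist^1[u] = cost[v,u].
// Vertices are 1..n; INF marks "no edge"/unreachable.
std::vector<long long> shortest_paths_by_rows(int n, int v,
                                              const std::vector<Edge>& edges) {
    const long long INF = 1LL << 50;
    std::vector<long long> prev(n + 1, INF);   // dist^1
    prev[v] = 0;
    for (const Edge& e : edges)
        if (e.from == v) prev[e.to] = std::min(prev[e.to], e.cost);
    for (int k = 2; k <= n - 1; k++) {         // compute dist^2, ..., dist^{n-1}
        std::vector<long long> cur = prev;     // start from dist^{k-1}[u]
        for (const Edge& e : edges)
            if (prev[e.from] < INF)            // guard against INF + negative cost
                cur[e.to] = std::min(cur[e.to], prev[e.from] + e.cost);
        prev = std::move(cur);
    }
    return prev;                               // dist^{n-1}
}

int main() {
    // Edges (from, to, length); one negative edge, no negative cycle.
    std::vector<Edge> edges = {{1, 2, 6}, {1, 3, 5}, {3, 2, -2}, {2, 4, 1}};
    auto d = shortest_paths_by_rows(4, 1, edges);
    for (int u = 1; u <= 4; u++) std::printf("dist[%d] = %lld\n", u, d[u]);
}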
It can be shown that if we use the same memory location dist[u] for dist^k[u], k = 1, ..., n - 1, then the final value of dist[u] is still dist^{n-1}[u]. Using this fact and the recurrence for dist shown above, we arrive at the pseudocode of Algorithm 5.4 to compute the length of the shortest path from vertex v to each other vertex of the graph. This algorithm is referred to as the Bellman and Ford algorithm.

1   Algorithm BellmanFord(v, cost, dist, n)
2   // Single-source/all-destinations shortest
3   // paths with negative edge costs
4   {
5       for i := 1 to n do // Initialize dist.
6           dist[i] := cost[v, i];
7       for k := 2 to n - 1 do
8           for each u such that u ≠ v and u has
9                   at least one incoming edge do
10              for each (i, u) in the graph do
11                  if dist[u] > dist[i] + cost[i, u] then
12                      dist[u] := dist[i] + cost[i, u];
13  }

Algorithm 5.4 Bellman and Ford algorithm to compute shortest paths

Each iteration of the for loop of lines 7 to 12 takes O(n²) time if adjacency matrices are used and O(e) time if adjacency lists are used. Here e is the number of edges in the graph. The overall complexity is O(n³) when adjacency matrices are used and O(ne) when adjacency lists are used.

The observed complexity of the shortest-path algorithm can be reduced by noting that if none of the dist values change on one iteration of the for loop of lines 7 to 12, then none will change on successive iterations. So, this loop can be rewritten to terminate either after n - 1 iterations or after the first iteration in which no dist values are changed, whichever occurs first. Another possibility is to maintain a queue of vertices i whose dist values changed on the previous iteration of the for loop. These are the only values for i that need to be considered in line 10 during the next iteration. When a queue of these values is maintained, we can rewrite the loop of lines 7 to 12 so that on each iteration, a vertex i is removed from the queue, and the dist values of all vertices adjacent from i are updated as in lines 11 and 12. Vertices whose dist values decrease as a result of this are added to the end of the queue unless they are already on it. The loop terminates when the queue becomes empty. These two strategies to improve the performance of BellmanFord are considered in the exercises. Other strategies for improving performance are discussed in References and Readings.

EXERCISES

1. Find the shortest paths from node 1 to every other node in the graph of Figure 5.11 using the Bellman and Ford algorithm.

2. Prove the correctness of BellmanFord (Algorithm 5.4). Note that this algorithm does not faithfully implement the computation of the recurrence for dist^k. In fact, for k < n - 1, the dist values following iteration k of the for loop of lines 7 to 12 may not be dist^k.

3. Transform BellmanFord into a program. Assume that graphs are represented using adjacency lists in which each node has an additional field
