کامپیوتر و IT و اینترنتعلوم مهندسی

مبانی ساختمان داده ها و الگوریتم ها 2

صفحه 1:
1 0:11:02 5ع عاناناناعبانا/1 5 2:10 120 5© 1" Focus of the course: ° Datastructuresand relatedalgorithms 5 Correctnessand(timeand space) complexity | Prerequisites 5 CS 2o0:stacks, queues,lists, binary searchtrees, ... 5 CS 4o:functions, recurrence equations,induction, ... 9 CS 60:C,C++,andUNIX

صفحه 2:
Course Organization BGrading: 5 See course web page = Policy: 5 Nolatehomeworks. 5 Cheatingandplagiaris: F gradeand disciplinary actions 9 ‏1[آأذ[ذ‎ Homepage: www.cs.ucsbh.edu/~cs730a = Email:cs730a@cs.ucsh.edu "Teaching assistants: See course web page

صفحه 3:
Introduction = Afamous quote: Program =Algorithm+ Data Structure. = ALC of youhave programmed! thus have already been exposedto algorithmsanddata structure. = Perhaps youdidn't see themas separateentities: = Perhaps yousaw data structures as simple programming constructs (provided by STL--standardtemplatelibrary). = However, data structures are quite distinct fromalgorithms, andveryimportantintheir own right.

صفحه 4:
Objectives = Themain focus of this courseis tointroduce youtoa systematic study of a(gorithmsanddata structure. = Thetwoguiding principles of the courseare:abstractionand formal analysis. @ Abstraction: Wefocus on topics that are broadly applicableto avariety of problems. = Analysis; Wewant aformal way tocompare two objects (data structures or algorithms). = Inparticular,wewill worry about ‘always correct'-ness,and worst-case bounds ontimeandmemory (space).

صفحه 5:
Textbook | Textbook for thecourseis: = DataStructuresandAlgorithmAnalysisin C++ = byMarkAllen Weiss =ButIwillusematerial fromother booksand research papers,sotheultimate source shouldbe my lectures.

صفحه 6:
Course Outline ™ C++Review (Ch. 7) © A(gorithmAnalysis (Ch. 2) Setswithinsert /delete/member: Hashing (Ch. 5) Setsingeneral: Balancedsearch trees (Ch.4and72.2) ® Setswithpriority: Heaps, priority queues (Ch. 6) ® Graphs:Shortest-pathalgorithms (Ch. 9.7.9.3.2) ® Setswithdisjoint union: Union/findtrees (Ch. 8.7.8.5) Graphs: Minimum spanning trees (Ch. 9.5) Sorting (Ch. 7)

صفحه 7:
120: AlgorithmAnalysis = Foundations of ACgorithmAnalysisandData Structures. = Analysis: © dow topredictanalgorithm’'s performance 2 How wellanalgorithmscalesup 5 4fow tocompare different algorithms for a problem = DataStructures 5 Sow to efficiently store, access,manage data Oo Data structures effect algorithm's performance

صفحه 8:
Example Algorithms = Twoalgorithms for computing the Factorial = Whichoneis better? ® int factorial (int n) ( if(n<=7) return?! efse return n *factorial(n-7)! > ™ intfactorial dnt n) ( if(n<es) returns! efse( fact=1{ for (R=2ikeon{f++) fact *- ‏ع‎ return fact! 0 )

صفحه 9:
Examples of famous algorithms ™ Constructions of Euclid = Newton's root finding = Fast Fourier Transform = Compression (Huffman, Lempel-Ziv, GIF, MPEG) = DES,RSAencryption ® Simplex algorithmfor linear programming = Shortest Path Algorithms (Dijkstra, Bel(man-Ford) = Error correcting codes (CDs, DVDs) = TCPcongestioncontrol,IP routing = Pattern matching (Genomics) ™ Search Engines

صفحه 10:
Role of ACgorithmsin Modern World = Enormousamount of data E-commerce (Amazon, Ebay) Network traffic (telecom billing, monitoring) Database transactions (Sales,inventory) Scientificmeasurements (astrophysics, geology) Sensor networks. RFID tags 8 قا ۵ 8 0 ظ Bioinformatics (genome, protein bank) = Amazonhiredfirst Chief Algorithms Officer (‘Udi Manber)

صفحه 11:
Areal-worldProblem = CommunicationintheInternet = Message (email, ftp) broken downintoIP packets. ™ Sender/receiver identified byIP address. ™ ThepacketsareroutedthroughtheInternet by special computers calledRouters. = Eachpacket is stampedwithits destinationaddress, but not the route. ™ Because the Internet topologyandnetworkloadis constantly changing, routersmust discover routes dynamically. = What should the Routing Table look like?

صفحه 12:
IP PrefixesandRouting ™ Eachrouter is really aswitch:it receives packetsat several input ports,andappropriately sends themout to output ports. = Thus, for eachpacket, the router needs to transfer the packet to that output port that getsit closer toits destination. ™ Shouldeachrouter keepa table:IP address x Output Port? ™ How bigis this table? = Whenalinkor router fails, how muchinformation wouldneed tobemodified? = A router typically forwards several million packets/sec!

صفحه 13:
Data Structures ™ TheIP packet forwardingisa Data Structureproblem! = Efficiency, scalability isveryimportant. ™ Similarly, how does Googlefind the documents matching your query sofast? ۳ Uses sophisticatedalgorithms tocreateindexstructures,which arejust data structures. = Algorithmsanddata structuresareubiquitous. = Withthe data glut created by the new technologies, theneedto organize,search,andupdate MASSIVE amounts of information FASTis more severe than ever before.

صفحه 14:
Algorithms toProcess theseData Whichare the top KselCers? Correlation between time spent at awebsiteandpurchase amount? Whichflowsat a router account for >7%traffic? DidsourceS senda packet inlast s seconds? Sendanalarmifanyinternational arrival matchesa profileinthe database Similarity matches against genome databases Etc.

صفحه 15:
Max Subsequence Problem ‎Givena sequence of integers. A7,A2,.... An findthemaximum possible value‏ "ا ‎ofa subsequence Ai,...,. Aj.‏ ‎= Numberscan be negative. ‎© Youwanta contiguous chunkwithlargest sum. ‎= Example: -2,77,-4,13,-5,-2 = Theansweris 20 (subseq. A2throughA4). ‎= Wewill discuss 4 different algorithms,withtime complexities O(n*),0(n*), O(n log n),andO(n). ‎=| Withn = 70%, algorithm7 may take>zoyears: algorithm4will takea fraction ofa second! ‎

صفحه 16:
Algorithm 7for Max Subsequence Sum "Given A,,.,A,,findthemaximumvalue of A;+A,,,+' +A, oifthemaxvalueis negative int maxSum = 0; | ou for( int i = 0; i < a.size( ); i+ +) for( int j = i; j < a.size( ‏نالل‎ nt nim OD )- 3( ۱02 2 )[- 3( j- Ja 120 ‏دز‎ ‏زه‎ jou Jou a k<=j; ‏وف‎ ‎thisSum += a[ k ]; if( thisSum > maxSum ) maxSum = thisSum; =Time ‏مکحم مدمه(‎ OU) int thisSu for( int k +) 6

صفحه 17:
Algorithm 2 BIdea:Givensumfromitoj-7,wecan compute the sum fromitojinconstant time. = This eliminates one nestedloop,andreduces the running timetoO(n’). into maxSum = 0; for( int i = 0; i < a.size( ); i+ +) int thisSum = 0; for( int j = i; j < a.size( ); i++) thisSum += a[ j J; if( thisSum > maxSum ) maxSum = thisSum; ae

صفحه 18:
Algorithm 3 | This algorithmuses divide-and-conquer paradigm. = Suppose we split theinput sequenceat midpoint. = Themax subsequence isentirely in the left half, entirelyinthe right half, orit straddles themidpoint. "Example: Ceft half | right half ‏اقا | اهب‎ -2 @ Maxinleftis 6 (A7throughA3):maxinright is 8 (Aé through A7). But straddling maxis 77(A7thruA7).

صفحه 19:
Algorithm 3(cont.) = Example: Ceft half | right half 4-35 -2| -126-2 = Max subsequencesin eachhalf found by recursion, = How dowefindthe straddling max subsequence? = Key Observation: O Left half of the straddling sequenceis themax subsequence endingwith-2. 5 Right halfisthemax subsequence beginning with-7. = aCinear scanletsus compute thesein O(n) time.

صفحه 20:
Algorithm 3: Analysis = The divideandconquer is best analyzedthrough recurrence: 1010-1 Tn) =2T(n/2) + O(n) = This recurrence solves toT(n) =O(nlogn).

صفحه 21:
Algorithms 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2 int maxSum = 0, thisSum = 0; for( int j = 0; j < a.size( ); j++) thisSum += al j ]; if (thisSum > maxSum ) maxSum = thisSum; else if ( thisSum < 0 ) thisSum = 0; mg return maxSum; B but wny aves it Worr: 1.2. proof of correctness. ea

صفحه 22:
Proof of Correctness = Max subsequencecannot start or endatanegativeAi. = More generally, themax subsequence cannot haveaprefix with anegativesum. Ex: -2 11 -4 13 -5 -2 = Thus, if weeverfindthat Ai through Aj sums to<o,thenwecan advanceitoj+7 © Proof. Supposejis thefirst index afteri when the sumbecomes <o © Themax subsequencecannot start at anypbetweeniandj. Because A; throughA, ,ispositive,sostarting at iwowdhave beeneven better.

صفحه 23:
و ‎int maxSum =0, thisSum=o!‏ ‎for(intj=o1j<a.size()tj++)‏ thisSum+=aljli if (thisSum>maxSum) maxSum = thisSum! elseif (thisSum<o) thisSum= oi! 20 returnmaxSum * Thealgorithm resets whenever prefixis<o. Otherwise,it forms new sumsandupdates maxSumin onepass.

صفحه 24:
Why Efficient Algorithms Matter ™ Suppose N = 70° = APCcan read/process Nrecordsin 7 sec. = But ifsomealgorithmdoes N*Ncomputation, thenit takes 7M seconds = 77 days!!! ‎700City Traveling Salesman Problem.‏ "ا ‎5 A supercomputer checking 700 billion tours/sec still requires 10 years! ‎= Fast factoring algorithms can breakencryption schemes. Algorithms research determines what is safe code length. (> 700 digits) ‎

صفحه 25:
How to Measure Algorithm Performance = What metric should beused to judge algorithms? 5 Length of the program (Lines of code) 5 Ease of programming (bugs,maintenance) Memory required DRunning time =Runningtimeis the dominant standard. OD Quantifiableandeasy tocompare OD Often thecritical bottleneck

صفحه 26:
Abstraction = Analgorithmmay run differently depending on: 5 thehardwareplatform(PC,Cray,Sun) Oo theprogramming language (C, Java,C++) 5 theprogrammer (you, me, Bill Joy) = While different in detail,all hardwareandprog models are equivalent in some sense: Turingmachines. ™ It suffices tocount basicoperations. ™ Crude but valuable measure of algorithm's performance asa function ofinput size.

صفحه 27:
Average, Best,and Worst-Case = Onwhichinput instances should the algorithm's performance bejudged? ا ۲ Real world distributions difficult topredict ™ Best case: 5 Seemsunrealistic = Worst case: 5 Givesanabsolute guarantee 5 WewilCuse the worst-casemeasure.

صفحه 28:
Examples B Vector additionZ= A+B for (inti=oli<nitit++) 2111 - ۵11 +4 Tin)=cn =Vector (inner) multiplication z=A*B z=0! for (inti=oli<niit++) 2 <2+ ۵11*2111 ‏؛‎ ‎7)90( - ‏ار + أ‎

صفحه 29:
Examples = Vector (outer) multiplicationZ=A*B for (int i=oti<nii++) for (int j=o!j<nij++) Zlij] = Alt] * BU] ‏؛‎ ‎Tn) =c,n? = Aprogram does all the above Tn)=c,+¢,n+¢,n7f

صفحه 30:
Simplifying the Bound BT(n)=c,n' +c, n°’ +e, Ne? +46, NHC, Otoocomplicated 5 toomany terms O Difficult to compare two expressions, eachwith 70 or 20 terms = Dowereally needthat many terms?

صفحه 31:
Simplifications ™ Keepjust oneterm! 5 thefastest growing term (dominates theruntime) = Noconstant coefficients are kept 5 Constant coefficients affected by machines, languages, etc. = Asymtoticbehavior (as ngetslarge) is determinedentirely by theCeading term. 5 Example. Tin) =70n* +n7+40n + 800 © Ifn=7,000, then Tin) = 10,007,040,800 © erroriso.o7%ifwe drop all but the n? term 5 Inanassembly line the sCowest worker determines the throughput rate oa

صفحه 32:
080 00و Simplification = Drop theconstant coefficient 5 Does not effect the relCativeorder ca? r= 1000 n, g= 4 02, b=n?, k= 0.01 2° co 1 20 300 40 600 eco 7۵5 00

صفحه 33:
Simplification = The fastergrowing term (suchas 2") eventuallywill outgrow the sCower growing terms (e.g.,7000n) no matter what their coefficients! = Put another way,givenacertainincreasein alCocatedtime,a higher order algorithmwill not reap the benefit by solving muchlarger problem

صفحه 34:
Complexity and Tractability 10 [Olus| .0305| tus) Ins| 10۳3 105 1۳5 10° | 1ous 10° |100us Assume the computer does 1 billion ops per sec.

صفحه 35:
0 1 0 1 1 2 1 2 2 4 8 4 2 4 8 16 04 16 3 8 24 64 512 256 4 16 64 256 4056 65,536, 5 160 1,024 32,766 6 ‏سم‎ 2۰ 3 5 ani 3 en, in 5 eins can anlog n aun a ow ۲ ۰ ea 3 ۰ ‏سر‎ ‎vo AL” gh aa A com P19 Rw) (ee eet logan 0 ‏وین و 2 هيه يه‎ ۳ elt a Aa ۳ log n

صفحه 36:
Another View ™ More resources (timeand/or processing power) translateinto Cargeproblems solvedif complexity is Cow T(n) Problem size | Problem Increase in solved in 103 size solved | Problem sec in 104sec_ |size 100n 10 100 10 1000n 1 10 10 5n? 14 45 3:2 NB 10 22 2.2 2n 10 13 ‏ون‎

صفحه 37:
Asympotics Tm) keep one drop coef 3n°+4n+1 3r r 101 r°-+102 1012 r 15 ‏د15 6م‎ r an’-ntc an rv = They all havethesame “growth rate

صفحه 38:
Caveats = Follow the spirit, not theletter 5a 200onalgorithmismoreexpensive than n? algorithmwhen n < 700 ™ Other considerations: ‏لآ‎ times Daprogramrunonsmall data sets D ease of coding, porting,maintenance Omemory requirements

صفحه 39:
Asymptotic Notations ™ Big-0, ‘bounded above by”: Tin) = O(f(n)) 5 For somecand N,T(n) <c'fin) whenever n> N. ™ Big-Omega, ‘bounded below by”: Tin) = Q(fi(n)) 5 For somec>oand N, Tin) =c'fin) whenever n> N. 5 Sameasfin) =O(T(n)). = Big-Theta, ‘boundedaboveand below": Tin) = O(f(n)) 5 Tn) =Of(n)) andalso Tin) = Q(f(n)) @ Little-o, ‘strictly bounded above”: Tin) =o(fin)) 5 Tn)/fin)0 as n> 00

صفحه 40:
By Pictures = Big-Oh(most commonly used) / ‏تا‎ | پ ۱ص ‎Big-Omega‏ = ‎N‏ 07 اء 6 ان 3 = Big-Theta Dexactly Bsmall-o 5 not asexpensiveas... N 0

صفحه 41:
Example Tn) =r + 27 O(?) Q(?) ‏م۰‎ 0 ‏م زر‎ rm ‏بر‎ ‎rm @r

صفحه 42:
Asymptomic rug Si n jk Lal pa Sei Examples ee

صفحه 43:
Summary (Why O(n)?) BT(n)=c,n' +c, ,n*'+C, ,N? +..46,N4+C, = Toocomplicated Zo) Casingletermwithconstant coefficient dropped = Muchsimpler,extratermsand coefficients donot matterasymptotically "Other criteriahardto quantify

صفحه 44:
Runtime Analysis = useful rules 5 simple statements (read, write, assign) ©» O17) (constant) 5 simple operations (+- * 2 om 5 sequence of simple statements/operations © rule ofsums 5 for, do, while Coops © rules of products

صفحه 45:
Runtime Analysis (cont.) = Twoimportant rules O Rule of sums © if youdoa number of operations in sequence, the runtime is dominated by the most expensive operation Oo Rule of products ® if yourepeat anoperationa number of times, the total runtimeis the runtime of the operation multiplied by the iterationcount

صفحه 46:
Runtime Analysis (cont.) if(cond)then O(7) body, Tin) else body, Tn) endif Tn) = O(max (T,(n), T,(n))

صفحه 47:
Runtime Analysis (cont.) = Methodcalls ‏نا‎ 5 ۲ 9 © Detc. @ A sequence of operations when call sequences are flattened Tin) =max(T,,(n),T,(n), Tn)

صفحه 48:
Example for (t=7fi<n $i++) if A(i) >maxVal then max Val=A(i)! maxPos=i! Asymptotic Complexity:O(n)

صفحه 49:
Example for (t=7!i<n-7!i++) for j=n{j>=i+7! j--) if (AG-7) > AG)) then temp =AG-7)! AG- - (4 20( 2 ‏؛ مزالت‎ endif endfor endfor = Asymptotic Complexity is O(n?)

صفحه 50:
5 ۷ 11۵07 بقل BT(n)isdefinedrecursivelyin terms of T(k),k<n = The recurrence relations allow Tin) to be‘unwound” recursively into some base cases (e.g.,T(0) or T(7)). = Examples: 5 Factorial 5 Hanoi towers

صفحه 51:
Example: Factorial int factorial (int n)¢ te =Tn- +d =r 2)+d+d ‏حيرا‎ 3( + 0+ 0+ if (m<=7) return?! e(se returnn *factorial(n-7)! > (+ ‏-ه)‎ (۵ factorial (n)=n*n-7*n-2*.., #7 =e+(n- 0 n *factorial(n-7) T(n) 3 ‏ی‎ T(n-1) n-2 *factorialin-3) T(n-2 wo ۲ ( ) “oo 2 *factorial(7) T(1) 60

صفحه 52:
Example: Factorial (cont.) int factorials(int n) ¢ if (n<=7) return?! elset fact=7! for (k=2!k<=n k++) ١ a fact *=k! an returnfact! | 00 > 2 = BothalgorithmsareO(n).

صفحه 53:
Example: Hanot Towers BHanoi(n,ABC) = @ H{anoi(n-7,A,C,B)+Hanoi(7,A,B,C)+Hanoi(n-7,C,B,A) 1 =2T(n- 1)+c =?’ TUn- 2)+2c+c =2T(n- 3)+2?c+2c+e 2+1(6 ج.. +2 2) + 117 - 2+1(0 + +22 +اع)- ‎=O(2")‏

صفحه 54:
Worst Case, Best Case, and Average Case template<class T> void SelectionSort(T a[], int n) { //Early-terminating version of selection sort bool sorted = false; for (int size=n; !sorted && (size>1); size--) { int pos = 0; sorted = true; // find largest for (int i = 1; i < size; i++) if (a[pos] <= a[i]) pos = i; else sorted = false; // out of order Swap(a[pos], a[size - 1]); } = Worst Case = Best Case

صفحه 55:
f(N) (N) (N) T(N)=O(£(N)) ۱ mo=wande=7,fiNI=N TUN)=6N+4 <= ‏محال‎ - 79010: 30<-4 ‏مبهوج‎ - 030 0360 ممجكدوه NONI? 31-7 و2000 31-7 و3010 2-7 200-002 ۱ 31-37 و3010 2-00 1000000-01( 007 مدو

صفحه 56:
An Analogy: Cooking Recipes = Algorithmsare detailedandpreciseinstructions. = txample:bakeachocolatemoussecake. 5 Convert rawingredientsintoprocessedoutput. 3 Hardware (PC, supercomputer vs. oven, stove) 5 Pots,pans,pantryaredata structures. = Interplay ofhardwareandalgorithms © Different recipesfor oven, stove,microwaveetc, = Newadvances, 5 New models: clusters,Internet,workstations 5 Microwave cooking, 5-minute recipes, refrigeration 6ه

CS 130 A: Data Structures and Algorithms   1 Focus of the course:  Data structures and related algorithms  Correctness and (time and space) complexity Prerequisites  CS 20: stacks, queues, lists, binary search trees, …  CS 40: functions, recurrence equations, induction, …  CS 60: C, C++, and UNIX Course Organization Grading:  See course web page  Policy:     No late homeworks. Cheating and plagiaris: F grade and disciplinary actions Online info: Homepage: www.cs.ucsb.edu/~cs130a Email: cs130a@cs.ucsb.edu  Teaching assistants: See course web page  2 Introduction      3 A famous quote: Program = Algorithm + Data Structure. All of you have programmed; thus have already been exposed to algorithms and data structure. Perhaps you didn't see them as separate entities; Perhaps you saw data structures as simple programming constructs (provided by STL--standard template library). However, data structures are quite distinct from algorithms, and very important in their own right. Objectives      4 The main focus of this course is to introduce you to a systematic study of algorithms and data structure. The two guiding principles of the course are: abstraction and formal analysis. Abstraction: We focus on topics that are broadly applicable to a variety of problems. Analysis: We want a formal way to compare two objects (data structures or algorithms). In particular, we will worry about "always correct"-ness, and worst-case bounds on time and memory (space). Textbook Textbook for the course is:  Data Structures and Algorithm Analysis in C++  by Mark Allen Weiss   5 But I will use material from other books and research papers, so the ultimate source should be my lectures. Course Outline  C++ Review (Ch. 1) Algorithm Analysis (Ch. 2) Sets with insert/delete/member: Hashing (Ch. 5)  Sets in general: Balanced search trees (Ch. 4 and 12.2)  Sets with priority: Heaps, priority queues (Ch. 6)  Graphs: Shortest-path algorithms (Ch. 9.1 – 9.3.2)  Sets with disjoint union: Union/find trees (Ch. 8.1–8.5)  Graphs: Minimum spanning trees (Ch. 9.5) Sorting (Ch. 7)    6 130a: Algorithm Analysis    7 Foundations of Algorithm Analysis and Data Structures. Analysis:  How to predict an algorithm’s performance  How well an algorithm scales up  How to compare different algorithms for a problem Data Structures  How to efficiently store, access, manage data  Data structures effect algorithm’s performance Example Algorithms    Two algorithms for computing the Factorial Which one is better? int factorial (int n) { if (n <= 1) return 1; else return n * factorial(n-1); }  } 8 int factorial (int n) { if (n<=1) return 1; else { fact = 1; for (k=2; k<=n; k++) fact *= k; return fact; } Examples of famous algorithms            9 Constructions of Euclid Newton's root finding Fast Fourier Transform Compression (Huffman, Lempel-Ziv, GIF, MPEG) DES, RSA encryption Simplex algorithm for linear programming Shortest Path Algorithms (Dijkstra, Bellman-Ford) Error correcting codes (CDs, DVDs) TCP congestion control, IP routing Pattern matching (Genomics) Search Engines Role of Algorithms in Modern World   10 Enormous amount of data  E-commerce (Amazon, Ebay)  Network traffic (telecom billing, monitoring)  Database transactions (Sales, inventory)  Scientific measurements (astrophysics, geology)  Sensor networks. RFID tags  Bioinformatics (genome, protein bank) Amazon hired first Chief Algorithms Officer Manber) (Udi A real-world Problem        11 Communication in the Internet Message (email, ftp) broken down into IP packets. Sender/receiver identified by IP address. The packets are routed through the Internet by special computers called Routers. Each packet is stamped with its destination address, but not the route. Because the Internet topology and network load is constantly changing, routers must discover routes dynamically. What should the Routing Table look like? IP Prefixes and Routing       12 Each router is really a switch: it receives packets at several input ports, and appropriately sends them out to output ports. Thus, for each packet, the router needs to transfer the packet to that output port that gets it closer to its destination. Should each router keep a table: IP address x Output Port? How big is this table? When a link or router fails, how much information would need to be modified? A router typically forwards several million packets/sec! Data Structures       13 The IP packet forwarding is a Data Structure problem! Efficiency, scalability is very important. Similarly, how does Google find the documents matching your query so fast? Uses sophisticated algorithms to create index structures, which are just data structures. Algorithms and data structures are ubiquitous. With the data glut created by the new technologies, the need to organize, search, and update MASSIVE amounts of information FAST is more severe than ever before. Algorithms to Process these Data        14 Which are the top K sellers? Correlation between time spent at a web site and purchase amount? Which flows at a router account for > 1% traffic? Did source S send a packet in last s seconds? Send an alarm if any international arrival matches a profile in the database Similarity matches against genome databases Etc. Max Subsequence Problem        15 Given a sequence of integers A1, A2, …, An, find the maximum possible value of a subsequence Ai, …, Aj. Numbers can be negative. You want a contiguous chunk with largest sum. Example: -2, 11, -4, 13, -5, -2 The answer is 20 (subseq. A2 through A4). We will discuss 4 different algorithms, with time complexities O(n3), O(n2), O(n log n), and O(n). With n = 106, algorithm 1 may take > 10 years; algorithm 4 will take a fraction of a second! Algorithm 1 for Max Subsequence Sum  Given A1,…,An , find the maximum value of Ai+Ai+1+···+Aj 0 if the max value is negative int maxSum = 0; +) ) +)  16 O(1) for( int i = 0; i < a.size( ); i+ for( int j = i; j < a.size( ); j++ O(1) { O(1) int thisSum = 0; for( int k = i; k <= j; k+ O(1) thisSum += a[ k ]; if( thisSum > maxSum ) maxSum = thisSum; } 3 Time return complexity: maxSum;On  O( j  i) n 1 n 1 n 1 j i i 0 j i O( ( j  i)) O(  ( j  i)) Algorithm 2 Idea: Given sum from i to j-1, we can compute the sum from i to j in constant time.  This eliminates one nested loop, and reduces the running time to O(n2).  into maxSum = 0; +) for( int i = 0; i < a.size( ); i+ j++ ) int thisSum = 0; for( int j = i; j < a.size( ); { ) 17 thisSum; } thisSum += a[ j ]; if( thisSum > maxSum maxSum = Algorithm 3 This algorithm uses divide-and-conquer paradigm.  Suppose we split the input sequence at midpoint.  The max subsequence is entirely in the left half, entirely in the right half, or it straddles the midpoint.  Example: left half | right half 4 -3 5 -2 | -1 2 6 -2   18 Max in left is 6 (A1 through A3); max in right is 8 (A6 through A7). But straddling max is 11 (A1 thru A7). Algorithm 3 (cont.)      19 Example: left half | right half 4 -3 5 -2 | -1 2 6 -2 Max subsequences in each half found by recursion. How do we find the straddling max subsequence? Key Observation:  Left half of the straddling sequence is the max subsequence ending with -2.  Right half is the max subsequence beginning with -1. A linear scan lets us compute these in O(n) time. Algorithm 3: Analysis  The divide and conquer is best analyzed through recurrence: T(1) = 1 T(n) = 2T(n/2) + O(n)  20 This recurrence solves to T(n) = O(n log n). Algorithm 4 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2 0; int maxSum = 0, thisSum = for( int j = 0; j < a.size( ); j+ +) { thisSum += a[ j ]; if ( thisSum > maxSum ) maxSum = thisSum; else if ( thisSum < 0 ) thisSum = 0; }  Time return maxSum; clearly complexity }  21 O(n) But why does it work? I.e. proof of correctness. Proof of Correctness    Max subsequence cannot start or end at a negative Ai. More generally, the max subsequence cannot have a prefix with a negative sum. Ex: -2 11 -4 13 -5 -2 Thus, if we ever find that Ai through Aj sums to < 0, then we can advance i to j+1  Proof. Suppose j is the first index after i when the sum becomes < 0  The max subsequence cannot start at any p between i and j. Because Ai through Ap-1 is positive, so starting at i would have been even better. 22 Algorithm 4 int maxSum = 0, thisSum = 0; for( int j = 0; j < a.size( ); j++ ) { thisSum += a[ j ]; if ( thisSum > maxSum ) maxSum = thisSum; else if ( thisSum < 0 ) thisSum = 0; } return maxSum • 23 The algorithm resets whenever prefix is < 0. Otherwise, it forms new sums and updates maxSum in one pass. Why Efficient Algorithms Matter    Suppose N = 106 A PC can read/process N records in 1 sec. But if some algorithm does N*N computation, then it takes 1M seconds = 11 days!!!  100 City Traveling Salesman Problem.  A supercomputer checking 100 billion tours/sec still requires 10100 years!  Fast factoring algorithms can break encryption schemes. Algorithms research determines what is safe code length. (> 100 digits) 24 How to Measure Algorithm Performance 25  What metric should be used to judge algorithms?  Length of the program (lines of code)  Ease of programming (bugs, maintenance)  Memory required  Running time  Running time is the dominant standard.  Quantifiable and easy to compare  Often the critical bottleneck Abstraction  An algorithm may run differently depending on:  the hardware platform (PC, Cray, Sun)  the programming language (C, Java, C++)  the programmer (you, me, Bill Joy)  While different in detail, all hardware and prog models are equivalent in some sense: Turing machines.  It suffices to count basic operations.  Crude but valuable measure of algorithm’s performance as a function of input size. 26 Average, Best, and Worst-Case  On which input instances should the algorithm’s performance be judged?  Average case:  Real world distributions difficult to predict Best case:  Seems unrealistic Worst case:  Gives an absolute guarantee  We will use the worst-case measure.   27 Examples  Vector addition Z = A+B for (int i=0; i<n; i++) Z[i] = A[i] + B[i]; T(n) = c n  Vector (inner) multiplication z =A*B z = 0; for (int i=0; i<n; i++) z = z + A[i]*B[i]; T(n) = c’ + c1 n 28 Examples  Vector (outer) multiplication Z = A*BT for (int i=0; i<n; i++) for (int j=0; j<n; j++) Z[i,j] = A[i] * B[j]; T(n) = c2 n2;  A program does all the above T(n) = c0 + c1 n + c2 n2; 29 Simplifying the Bound  T(n) = ck nk + ck-1 nk-1 + ck-2 nk-2 + … + c1 n + co too complicated  too many terms  Difficult to compare two expressions, each with 10 or 20 terms  Do we really need that many terms?  30 Simplifications    Keep just one term!  the fastest growing term (dominates the runtime) No constant coefficients are kept  Constant coefficients affected by machines, languages, etc. Asymtotic behavior (as n gets large) is determined entirely by the leading term.   31 Example. T(n) = 10 n3 + n2 + 40n + 800  If n = 1,000, then T(n) = 10,001,040,800  error is 0.01% if we drop all but the n3 term In an assembly line the slowest worker determines the throughput rate Simplification  32 Drop the constant coefficient  Does not effect the relative order Simplification  The faster growing term (such as 2n) eventually will outgrow the slower growing terms (e.g., 1000 n) no matter what their coefficients!  Put another way, given a certain increase in allocated time, a higher order algorithm will not reap the benefit by solving much larger problem 33 Complexity and Tractability T(n) n 10 20 30 40 50 100 103 104 105 106 n n log n n2 n3 n4 .01s .03s .1s 1s 10s .02s .09s .4s 8s 160s .03s .15s .9s s 810s .04s .21s 1.6s s 2.56ms .05s .28s s s 6.25ms .1s .66s 10s 1ms 100ms 1s 9.96s 1ms 1s 16.67m s 130s 100ms 16.67m 115.7d s 1.66ms 10s 11.57d 3171y ms 19.92ms 16.67m 31.71y 3.17107y n10 10s 2.84h 6.83d 121d 3.1y 3171y 3.171013y 3.171023y 3.171033y 3.171043y 2n 1s 1ms 1s 18m 13d 41013y 3210283y Assume the computer does 1 billion ops per sec. 34 log n n n log n n2 0 1 0 1 1 2 2 4 2 4 8 16 3 8 24 64 4 16 64 256 5 32 160 1,024 n2 2n 70000 n3 1 8 64 512 4096 32,768 2n 100000 60000 40000 30000 100 20000 n log n 10000 n 0 n 35 n2 n 1000 n3 n3 n log n 10000 50000 2n 2 4 16 256 65,536 4,294,967,296 log n log n 10 1 n Another View  More resources (time and/or processing power) translate into large problems solved if complexity is low 36 T(n) Problem size Problem Increase in solved in 103 size solved Problem sec in 104 sec size 100n 10 100 10 1000n 1 10 10 5n2 14 45 3.2 N3 10 22 2.2 2n 10 13 1.3 Asympotics  37 T(n) keep one drop coef 3n2+4n+1 3 n2 n2 101 n2+102 101 n2 n2 15 n2+6n 15 n2 n2 a n2+bn+c a n2 n2 They all have the same “growth” rate Caveats Follow the spirit, not the letter  a 100n algorithm is more expensive than n2 algorithm when n < 100  Other considerations:  a program used only a few times  a program run on small data sets  ease of coding, porting, maintenance  memory requirements  38 Asymptotic Notations  Big-O, “bounded above by”: T(n) = O(f(n))  For some c and N, T(n)  c·f(n) whenever n > N.  Big-Omega, “bounded below by”: T(n) = (f(n))  For some c>0 and N, T(n)  c·f(n) whenever n > N.  Same as f(n) = O(T(n)).  Big-Theta, “bounded above and below”: T(n) = (f(n))  T(n) = O(f(n)) and also T(n) = (f(n))  Little-o, “strictly bounded above”: T(n) = o(f(n))  T(n)/f(n)  0 as n   39 By Pictures Big-Oh (most commonly used)  bounded above  Big-Omega  bounded below  Big-Theta  exactly  Small-o N0  not as expensive as ...  N0 N0 40 Example 3 2 T(n) n  2n O(?) (?)  10 n 5 n 3 n 41 0 n 2 n 3 n Examples f (n) c k i  i1 ci n  in1 i  in1 i2 n k  i1i  in0ri n! n  i11/ i 42 Asymptomic (1) k (n ) (n2) (n3) k1 (n ) (rn ) (n(n/e)n ) (logn) Summary (Why O(n)?)  T(n) = ck nk + ck-1 nk-1 + ck-2 nk-2 + … + c1 n + co  Too complicated  O(nk ) a single term with constant coefficient dropped  Much simpler, extra terms and coefficients do not matter asymptotically  Other criteria hard to quantify  43 Runtime Analysis  Useful rules  simple statements (read, write, assign)   simple operations (+ - * / == > >= < <=   rule of sums for, do, while loops  44 O(1) sequence of simple statements/operations   O(1) (constant) rules of products Runtime Analysis (cont.)  Two important rules  Rule of sums   Rule of products  45 if you do a number of operations in sequence, the runtime is dominated by the most expensive operation if you repeat an operation a number of times, the total runtime is the runtime of the operation multiplied by the iteration count Runtime Analysis (cont.) if (cond) then body1 else body2 O(1) T1(n) T2(n) endif T(n) = O(max (T1(n), T2(n)) 46 Runtime Analysis (cont.) Method calls  A calls B  B calls C  etc.  A sequence of operations when call sequences are flattened  T(n) = max(TA(n), TB(n), TC(n)) 47 Example for (i=1; i<n; i++) if A(i) > maxVal then maxVal= A(i); maxPos= i; Asymptotic Complexity: O(n) 48 Example for (i=1; i<n-1; i++) for (j=n; j>= i+1; j--) if (A(j-1) > A(j)) then temp = A(j-1); A(j-1) = A(j); A(j) = tmp; endif endfor endfor  49 Asymptotic Complexity is O(n2) Run Time for Recursive Programs T(n) is defined recursively in terms of T(k), k<n  The recurrence relations allow T(n) to be “unwound” recursively into some base cases (e.g., T(0) or T(1)).  Examples:  Factorial  Hanoi towers  50 Example: Factorial T(n) T(n 1)  d T(n 2)  d  d T(n 3)  d  d  d int factorial (int n) { if (n<=1) return 1; else return n * factorial(n-1); } .... T(1)  (n 1) * d factorial (n) = n*n-1*n-2* … *1 n c  (n 1) * d * factorial(n-1) n-1 T(n) * factorial(n-2) n-2 O(n) T(n-1) * factorial(n-3) T(n-2) … 2 *factorial(1) T(1) 51 Example: Factorial (cont.) int factorial1(int n) { if (n<=1) return 1; else { fact = 1; O(1) for (k=2;k<=n;k++) O(n) fact *= k; O(1) return fact; } }  52 Both algorithms are O(n). Example: Hanoi Towers Hanoi(n,A,B,C) =  Hanoi(n-1,A,C,B)+Hanoi(1,A,B,C) +Hanoi(n-1,C,B,A)  T(n) 2T(n 1)  c 22T(n 2)  2c  c 23T(n 3)  22 c  2c  c .... 2n 1T(1)  (2n 2  ... 2 1)c (2n 1  2n 2  ... 2 1)c O(2n ) 53 Worst Case, Best Case, and Average Case template<class T> void SelectionSort(T a[], int n) { // Early-terminating version of selection sort bool sorted = false; for (int size=n; !sorted && (size>1); size--) { int pos = 0; sorted = true; // find largest for (int i = 1; i < size; i++) if (a[pos] <= a[i]) pos = i; else sorted = false; // out of order Swap(a[pos], a[size - 1]); } } Worst Case  Best Case  54 n0 c f(N) T(N) f(N) T(N)=O(f(N))               55 T(N)=6N+4 : n0=4 and c=7, f(N)=N T(N)=6N+4 <= c f(N) = 7N for N>=4 7N+4 = O(N) 15N+20 = O(N) N2=O(N)? N log N = O(N)? N log N = O(N2)? N2 = O(N log N)? N10 = O(2N)? 6N + 4 = W(N) ? 7N? N+4 ? N2? N log N? N log N = W(N2)? 3 = O(1) 1000000=O(1) Sum i = O(N)? An Analogy: Cooking Recipes     56 Algorithms are detailed and precise instructions. Example: bake a chocolate mousse cake.  Convert raw ingredients into processed output.  Hardware (PC, supercomputer vs. oven, stove)  Pots, pans, pantry are data structures. Interplay of hardware and algorithms  Different recipes for oven, stove, microwave etc. New advances.  New models: clusters, Internet, workstations  Microwave cooking, 5-minute recipes, refrigeration

51,000 تومان