صفحه 1:
1 0:11:02 5ع عاناناناعبانا/1 5 2:10 120 5©
1" Focus of the course:
° Datastructuresand relatedalgorithms
5 Correctnessand(timeand space) complexity
| Prerequisites
5 CS 2o0:stacks, queues,lists, binary searchtrees, ...
5 CS 4o:functions, recurrence equations,induction, ...
9 CS 60:C,C++,andUNIX
صفحه 2:
Course Organization
BGrading:
5 See course web page
= Policy:
5 Nolatehomeworks.
5 Cheatingandplagiaris: F gradeand disciplinary actions
9 1[آأذ[ذ
Homepage: www.cs.ucsbh.edu/~cs730a
= Email:cs730a@cs.ucsh.edu
"Teaching assistants: See course web page
صفحه 3:
Introduction
= Afamous quote: Program =Algorithm+ Data Structure.
= ALC of youhave programmed! thus have already been exposedto
algorithmsanddata structure.
= Perhaps youdidn't see themas separateentities:
= Perhaps yousaw data structures as simple programming
constructs (provided by STL--standardtemplatelibrary).
= However, data structures are quite distinct fromalgorithms,
andveryimportantintheir own right.
صفحه 4:
Objectives
= Themain focus of this courseis tointroduce youtoa systematic
study of a(gorithmsanddata structure.
= Thetwoguiding principles of the courseare:abstractionand
formal analysis.
@ Abstraction: Wefocus on topics that are broadly applicableto
avariety of problems.
= Analysis; Wewant aformal way tocompare two objects (data
structures or algorithms).
= Inparticular,wewill worry about ‘always correct'-ness,and
worst-case bounds ontimeandmemory (space).
صفحه 5:
Textbook
| Textbook for thecourseis:
= DataStructuresandAlgorithmAnalysisin C++
= byMarkAllen Weiss
=ButIwillusematerial fromother booksand research
papers,sotheultimate source shouldbe my lectures.
صفحه 6:
Course Outline
™ C++Review (Ch. 7)
© A(gorithmAnalysis (Ch. 2)
Setswithinsert /delete/member: Hashing (Ch. 5)
Setsingeneral: Balancedsearch trees (Ch.4and72.2)
® Setswithpriority: Heaps, priority queues (Ch. 6)
® Graphs:Shortest-pathalgorithms (Ch. 9.7.9.3.2)
® Setswithdisjoint union: Union/findtrees (Ch. 8.7.8.5)
Graphs: Minimum spanning trees (Ch. 9.5)
Sorting (Ch. 7)
صفحه 7:
120: AlgorithmAnalysis
= Foundations of ACgorithmAnalysisandData Structures.
= Analysis:
© dow topredictanalgorithm’'s performance
2 How wellanalgorithmscalesup
5 4fow tocompare different algorithms for a problem
= DataStructures
5 Sow to efficiently store, access,manage data
Oo Data structures effect algorithm's performance
صفحه 8:
Example Algorithms
= Twoalgorithms for computing the Factorial
= Whichoneis better?
® int factorial (int n) (
if(n<=7) return?!
efse return n *factorial(n-7)!
>
™ intfactorial dnt n) (
if(n<es) returns!
efse(
fact=1{
for (R=2ikeon{f++)
fact *- ع
return fact!
0
)
صفحه 9:
Examples of famous algorithms
™ Constructions of Euclid
= Newton's root finding
= Fast Fourier Transform
= Compression (Huffman, Lempel-Ziv, GIF, MPEG)
= DES,RSAencryption
® Simplex algorithmfor linear programming
= Shortest Path Algorithms (Dijkstra, Bel(man-Ford)
= Error correcting codes (CDs, DVDs)
= TCPcongestioncontrol,IP routing
= Pattern matching (Genomics)
™ Search Engines
صفحه 10:
Role of ACgorithmsin Modern World
= Enormousamount of data
E-commerce (Amazon, Ebay)
Network traffic (telecom billing, monitoring)
Database transactions (Sales,inventory)
Scientificmeasurements (astrophysics, geology)
Sensor networks. RFID tags
8 قا ۵ 8 0 ظ
Bioinformatics (genome, protein bank)
= Amazonhiredfirst Chief Algorithms Officer (‘Udi Manber)
صفحه 11:
Areal-worldProblem
= CommunicationintheInternet
= Message (email, ftp) broken downintoIP packets.
™ Sender/receiver identified byIP address.
™ ThepacketsareroutedthroughtheInternet by special
computers calledRouters.
= Eachpacket is stampedwithits destinationaddress, but not the
route.
™ Because the Internet topologyandnetworkloadis constantly
changing, routersmust discover routes dynamically.
= What should the Routing Table look like?
صفحه 12:
IP PrefixesandRouting
™ Eachrouter is really aswitch:it receives packetsat several
input ports,andappropriately sends themout to output ports.
= Thus, for eachpacket, the router needs to transfer the packet to
that output port that getsit closer toits destination.
™ Shouldeachrouter keepa table:IP address x Output Port?
™ How bigis this table?
= Whenalinkor router fails, how muchinformation wouldneed
tobemodified?
= A router typically forwards several million packets/sec!
صفحه 13:
Data Structures
™ TheIP packet forwardingisa Data Structureproblem!
= Efficiency, scalability isveryimportant.
™ Similarly, how does Googlefind the documents matching your
query sofast?
۳ Uses sophisticatedalgorithms tocreateindexstructures,which
arejust data structures.
= Algorithmsanddata structuresareubiquitous.
= Withthe data glut created by the new technologies, theneedto
organize,search,andupdate MASSIVE amounts of
information FASTis more severe than ever before.
صفحه 14:
Algorithms toProcess theseData
Whichare the top KselCers?
Correlation between time spent at awebsiteandpurchase
amount?
Whichflowsat a router account for >7%traffic?
DidsourceS senda packet inlast s seconds?
Sendanalarmifanyinternational arrival matchesa
profileinthe database
Similarity matches against genome databases
Etc.
صفحه 15:
Max Subsequence Problem
Givena sequence of integers. A7,A2,.... An findthemaximum possible value "ا
ofa subsequence Ai,...,. Aj.
= Numberscan be negative.
© Youwanta contiguous chunkwithlargest sum.
= Example: -2,77,-4,13,-5,-2
= Theansweris 20 (subseq. A2throughA4).
= Wewill discuss 4 different algorithms,withtime complexities O(n*),0(n*),
O(n log n),andO(n).
=| Withn = 70%, algorithm7 may take>zoyears: algorithm4will takea
fraction ofa second!
صفحه 16:
Algorithm 7for Max Subsequence Sum
"Given A,,.,A,,findthemaximumvalue of A;+A,,,+' +A,
oifthemaxvalueis negative
int maxSum = 0; | ou
for( int i = 0; i < a.size( ); i+
+)
for( int j = i; j < a.size( نالل nt nim
OD )- 3( ۱02 2 )[- 3(
j- Ja 120 دز
زه jou Jou a
k<=j; وف
thisSum += a[ k ];
if( thisSum > maxSum )
maxSum = thisSum;
=Time مکحم مدمه( OU)
int thisSu
for( int k
+)
6
صفحه 17:
Algorithm 2
BIdea:Givensumfromitoj-7,wecan compute the sum
fromitojinconstant time.
= This eliminates one nestedloop,andreduces the
running timetoO(n’).
into maxSum = 0;
for( int i = 0; i < a.size( ); i+
+)
int thisSum = 0;
for( int j = i; j < a.size( );
i++)
thisSum += a[ j J;
if( thisSum > maxSum
)
maxSum =
thisSum;
ae
صفحه 18:
Algorithm 3
| This algorithmuses divide-and-conquer paradigm.
= Suppose we split theinput sequenceat midpoint.
= Themax subsequence isentirely in the left half,
entirelyinthe right half, orit straddles themidpoint.
"Example:
Ceft half | right half
اقا | اهب -2
@ Maxinleftis 6 (A7throughA3):maxinright is 8 (Aé through
A7). But straddling maxis 77(A7thruA7).
صفحه 19:
Algorithm 3(cont.)
= Example:
Ceft half | right half
4-35 -2| -126-2
= Max subsequencesin eachhalf found by recursion,
= How dowefindthe straddling max subsequence?
= Key Observation:
O Left half of the straddling sequenceis themax subsequence
endingwith-2.
5 Right halfisthemax subsequence beginning with-7.
= aCinear scanletsus compute thesein O(n) time.
صفحه 20:
Algorithm 3: Analysis
= The divideandconquer is best analyzedthrough
recurrence:
1010-1
Tn) =2T(n/2) + O(n)
= This recurrence solves toT(n) =O(nlogn).
صفحه 21:
Algorithms
2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2
int maxSum = 0, thisSum =
0;
for( int j = 0; j < a.size( );
j++)
thisSum += al j ];
if (thisSum > maxSum )
maxSum = thisSum;
else if ( thisSum < 0 )
thisSum = 0;
mg return maxSum;
B but wny aves it Worr: 1.2. proof of correctness.
ea
صفحه 22:
Proof of Correctness
= Max subsequencecannot start or endatanegativeAi.
= More generally, themax subsequence cannot haveaprefix with
anegativesum.
Ex: -2 11 -4 13 -5 -2
= Thus, if weeverfindthat Ai through Aj sums to<o,thenwecan
advanceitoj+7
© Proof. Supposejis thefirst index afteri when the sumbecomes
<o
© Themax subsequencecannot start at anypbetweeniandj.
Because A; throughA, ,ispositive,sostarting at iwowdhave
beeneven better.
صفحه 23:
و
int maxSum =0, thisSum=o!
for(intj=o1j<a.size()tj++)
thisSum+=aljli
if (thisSum>maxSum)
maxSum = thisSum!
elseif (thisSum<o)
thisSum= oi!
20
returnmaxSum
* Thealgorithm resets whenever prefixis<o.
Otherwise,it forms new sumsandupdates
maxSumin onepass.
صفحه 24:
Why Efficient Algorithms Matter
™ Suppose N = 70°
= APCcan read/process Nrecordsin 7 sec.
= But ifsomealgorithmdoes N*Ncomputation, thenit takes 7M
seconds = 77 days!!!
700City Traveling Salesman Problem. "ا
5 A supercomputer checking 700 billion tours/sec still requires
10 years!
= Fast factoring algorithms can breakencryption schemes.
Algorithms research determines what is safe code length. (> 700
digits)
صفحه 25:
How to Measure Algorithm Performance
= What metric should beused to judge algorithms?
5 Length of the program (Lines of code)
5 Ease of programming (bugs,maintenance)
Memory required
DRunning time
=Runningtimeis the dominant standard.
OD Quantifiableandeasy tocompare
OD Often thecritical bottleneck
صفحه 26:
Abstraction
= Analgorithmmay run differently depending on:
5 thehardwareplatform(PC,Cray,Sun)
Oo theprogramming language (C, Java,C++)
5 theprogrammer (you, me, Bill Joy)
= While different in detail,all hardwareandprog models are
equivalent in some sense: Turingmachines.
™ It suffices tocount basicoperations.
™ Crude but valuable measure of algorithm's performance asa
function ofinput size.
صفحه 27:
Average, Best,and Worst-Case
= Onwhichinput instances should the algorithm's performance
bejudged?
ا
۲ Real world distributions difficult topredict
™ Best case:
5 Seemsunrealistic
= Worst case:
5 Givesanabsolute guarantee
5 WewilCuse the worst-casemeasure.
صفحه 28:
Examples
B Vector additionZ= A+B
for (inti=oli<nitit++)
2111 - ۵11 +4
Tin)=cn
=Vector (inner) multiplication z=A*B
z=0!
for (inti=oli<niit++)
2 <2+ ۵11*2111 ؛
7)90( - ار + أ
صفحه 29:
Examples
= Vector (outer) multiplicationZ=A*B
for (int i=oti<nii++)
for (int j=o!j<nij++)
Zlij] = Alt] * BU] ؛
Tn) =c,n?
= Aprogram does all the above
Tn)=c,+¢,n+¢,n7f
صفحه 30:
Simplifying the Bound
BT(n)=c,n' +c, n°’ +e, Ne? +46, NHC,
Otoocomplicated
5 toomany terms
O Difficult to compare two expressions, eachwith 70 or
20 terms
= Dowereally needthat many terms?
صفحه 31:
Simplifications
™ Keepjust oneterm!
5 thefastest growing term (dominates theruntime)
= Noconstant coefficients are kept
5 Constant coefficients affected by machines, languages, etc.
= Asymtoticbehavior (as ngetslarge) is determinedentirely by
theCeading term.
5 Example. Tin) =70n* +n7+40n + 800
© Ifn=7,000, then Tin) = 10,007,040,800
© erroriso.o7%ifwe drop all but the n? term
5 Inanassembly line the sCowest worker determines the
throughput rate
oa
صفحه 32:
080 00و
Simplification
= Drop theconstant coefficient
5 Does not effect the relCativeorder
ca? r= 1000 n, g= 4 02, b=n?, k= 0.01 2°
co 1 20 300 40 600 eco 7۵5 00
صفحه 33:
Simplification
= The fastergrowing term (suchas 2") eventuallywill
outgrow the sCower growing terms (e.g.,7000n) no
matter what their coefficients!
= Put another way,givenacertainincreasein
alCocatedtime,a higher order algorithmwill not reap
the benefit by solving muchlarger problem
صفحه 34:
Complexity and Tractability
10 [Olus| .0305| tus) Ins| 10۳3 105 1۳5
10° | 1ous
10° |100us
Assume the computer does 1 billion ops per sec.
صفحه 35:
0 1 0 1 1 2
1 2 2 4 8 4
2 4 8 16 04 16
3 8 24 64 512 256
4 16 64 256 4056 65,536,
5 160 1,024 32,766 6
سم 2۰ 3 5
ani 3 en, in 5
eins can anlog n
aun a ow ۲ ۰
ea 3 ۰ سر
vo AL” gh
aa A
com P19 Rw) (ee eet logan
0 وین و 2 هيه يه ۳ elt a Aa
۳ log n
صفحه 36:
Another View
™ More resources (timeand/or processing power) translateinto
Cargeproblems solvedif complexity is Cow
T(n) Problem size | Problem Increase in
solved in 103 size solved | Problem
sec in 104sec_ |size
100n 10 100 10
1000n 1 10 10
5n? 14 45 3:2
NB 10 22 2.2
2n 10 13 ون
صفحه 37:
Asympotics
Tm) keep one drop coef
3n°+4n+1 3r r
101 r°-+102 1012 r
15 د15 6م r
an’-ntc an rv
= They all havethesame “growth rate
صفحه 38:
Caveats
= Follow the spirit, not theletter
5a 200onalgorithmismoreexpensive than n?
algorithmwhen n < 700
™ Other considerations:
لآ times
Daprogramrunonsmall data sets
D ease of coding, porting,maintenance
Omemory requirements
صفحه 39:
Asymptotic Notations
™ Big-0, ‘bounded above by”: Tin) = O(f(n))
5 For somecand N,T(n) <c'fin) whenever n> N.
™ Big-Omega, ‘bounded below by”: Tin) = Q(fi(n))
5 For somec>oand N, Tin) =c'fin) whenever n> N.
5 Sameasfin) =O(T(n)).
= Big-Theta, ‘boundedaboveand below": Tin) = O(f(n))
5 Tn) =Of(n)) andalso Tin) = Q(f(n))
@ Little-o, ‘strictly bounded above”: Tin) =o(fin))
5 Tn)/fin)0 as n> 00
صفحه 40:
By Pictures
= Big-Oh(most commonly used) /
تا |
پ ۱ص Big-Omega =
N 07 اء 6 ان 3
= Big-Theta
Dexactly
Bsmall-o
5 not asexpensiveas... N
0
صفحه 41:
Example
Tn) =r + 27
O(?) Q(?)
م۰ 0
م زر
rm بر
rm @r
صفحه 42:
Asymptomic
rug
Si
n jk
Lal
pa
Sei
Examples
ee
صفحه 43:
Summary (Why O(n)?)
BT(n)=c,n' +c, ,n*'+C, ,N? +..46,N4+C,
= Toocomplicated
Zo)
Casingletermwithconstant coefficient dropped
= Muchsimpler,extratermsand coefficients donot
matterasymptotically
"Other criteriahardto quantify
صفحه 44:
Runtime Analysis
= useful rules
5 simple statements (read, write, assign)
©» O17) (constant)
5 simple operations (+- *
2 om
5 sequence of simple statements/operations
© rule ofsums
5 for, do, while Coops
© rules of products
صفحه 45:
Runtime Analysis (cont.)
= Twoimportant rules
O Rule of sums
© if youdoa number of operations in sequence, the runtime
is dominated by the most expensive operation
Oo Rule of products
® if yourepeat anoperationa number of times, the total
runtimeis the runtime of the operation multiplied by the
iterationcount
صفحه 46:
Runtime Analysis (cont.)
if(cond)then O(7)
body, Tin)
else
body, Tn)
endif
Tn) = O(max (T,(n), T,(n))
صفحه 47:
Runtime Analysis (cont.)
= Methodcalls
نا 5
۲ 9 ©
Detc.
@ A sequence of operations when call sequences are
flattened
Tin) =max(T,,(n),T,(n), Tn)
صفحه 48:
Example
for (t=7fi<n $i++)
if A(i) >maxVal then
max Val=A(i)!
maxPos=i!
Asymptotic Complexity:O(n)
صفحه 49:
Example
for (t=7!i<n-7!i++)
for j=n{j>=i+7! j--)
if (AG-7) > AG)) then
temp =AG-7)!
AG- - (4
20( 2 ؛ مزالت
endif
endfor
endfor
= Asymptotic Complexity is O(n?)
صفحه 50:
5 ۷ 11۵07 بقل
BT(n)isdefinedrecursivelyin terms of T(k),k<n
= The recurrence relations allow Tin) to be‘unwound”
recursively into some base cases (e.g.,T(0) or T(7)).
= Examples:
5 Factorial
5 Hanoi towers
صفحه 51:
Example: Factorial
int factorial (int n)¢ te
=Tn- +d
=r 2)+d+d
حيرا 3( + 0+ 0+
if (m<=7) return?!
e(se returnn *factorial(n-7)!
>
(+ -ه) (۵
factorial (n)=n*n-7*n-2*.., #7 =e+(n- 0
n *factorial(n-7) T(n) 3
ی T(n-1)
n-2 *factorialin-3) T(n-2
wo ۲ ( )
“oo
2 *factorial(7)
T(1)
60
صفحه 52:
Example: Factorial (cont.)
int factorials(int n) ¢
if (n<=7) return?!
elset
fact=7!
for (k=2!k<=n k++) ١ a
fact *=k! an
returnfact! | 00
>
2
= BothalgorithmsareO(n).
صفحه 53:
Example: Hanot Towers
BHanoi(n,ABC) =
@ H{anoi(n-7,A,C,B)+Hanoi(7,A,B,C)+Hanoi(n-7,C,B,A)
1
=2T(n- 1)+c
=?’ TUn- 2)+2c+c
=2T(n- 3)+2?c+2c+e
2+1(6 ج.. +2 2) + 117 -
2+1(0 + +22 +اع)-
=O(2")
صفحه 54:
Worst Case, Best Case, and Average Case
template<class T>
void SelectionSort(T a[], int n)
{ //Early-terminating version of selection sort
bool sorted = false;
for (int size=n; !sorted && (size>1); size--) {
int pos = 0;
sorted = true;
// find largest
for (int i = 1; i < size; i++)
if (a[pos] <= a[i]) pos = i;
else sorted = false; // out of order
Swap(a[pos], a[size - 1]);
}
= Worst Case
= Best Case
صفحه 55:
f(N)
(N)
(N)
T(N)=O(£(N))
۱ mo=wande=7,fiNI=N
TUN)=6N+4 <= محال - 79010: 30<-4
مبهوج - 030
0360 ممجكدوه
NONI?
31-7 و2000
31-7 و3010
2-7
200-002
۱
31-37 و3010
2-00
1000000-01(
007 مدو
صفحه 56:
An Analogy: Cooking Recipes
= Algorithmsare detailedandpreciseinstructions.
= txample:bakeachocolatemoussecake.
5 Convert rawingredientsintoprocessedoutput.
3 Hardware (PC, supercomputer vs. oven, stove)
5 Pots,pans,pantryaredata structures.
= Interplay ofhardwareandalgorithms
© Different recipesfor oven, stove,microwaveetc,
= Newadvances,
5 New models: clusters,Internet,workstations
5 Microwave cooking, 5-minute recipes, refrigeration
6ه
CS 130 A: Data Structures and Algorithms
1
Focus of the course:
Data structures and related algorithms
Correctness and (time and space) complexity
Prerequisites
CS 20: stacks, queues, lists, binary search trees, …
CS 40: functions, recurrence equations, induction, …
CS 60: C, C++, and UNIX
Course Organization
Grading:
See course web page
Policy:
No late homeworks.
Cheating and plagiaris: F grade and disciplinary
actions
Online info:
Homepage: www.cs.ucsb.edu/~cs130a
Email: cs130a@cs.ucsb.edu
Teaching assistants: See course web page
2
Introduction
3
A famous quote: Program = Algorithm + Data Structure.
All of you have programmed; thus have already been
exposed to algorithms and data structure.
Perhaps you didn't see them as separate entities;
Perhaps you saw data structures as simple programming
constructs (provided by STL--standard template library).
However, data structures are quite distinct from
algorithms, and very important in their own right.
Objectives
4
The main focus of this course is to introduce you to a
systematic study of algorithms and data structure.
The two guiding principles of the course are: abstraction
and formal analysis.
Abstraction: We focus on topics that are broadly
applicable to a variety of problems.
Analysis: We want a formal way to compare two objects
(data structures or algorithms).
In particular, we will worry about "always correct"-ness,
and worst-case bounds on time and memory (space).
Textbook
Textbook for the course is:
Data Structures and Algorithm Analysis in
C++
by Mark Allen Weiss
5
But I will use material from other books and
research papers, so the ultimate source should
be my lectures.
Course Outline
C++ Review (Ch. 1)
Algorithm Analysis (Ch. 2)
Sets with insert/delete/member: Hashing (Ch. 5)
Sets in general: Balanced search trees (Ch. 4 and 12.2)
Sets with priority: Heaps, priority queues (Ch. 6)
Graphs: Shortest-path algorithms (Ch. 9.1 – 9.3.2)
Sets with disjoint union: Union/find trees (Ch. 8.1–8.5)
Graphs: Minimum spanning trees (Ch. 9.5)
Sorting (Ch. 7)
6
130a: Algorithm Analysis
7
Foundations of Algorithm Analysis and Data Structures.
Analysis:
How to predict an algorithm’s performance
How well an algorithm scales up
How to compare different algorithms for a problem
Data Structures
How to efficiently store, access, manage data
Data structures effect algorithm’s performance
Example Algorithms
Two algorithms for computing the Factorial
Which one is better?
int factorial (int n) {
if (n <= 1) return 1;
else return n * factorial(n-1);
}
}
8
int factorial (int n) {
if (n<=1) return 1;
else {
fact = 1;
for (k=2; k<=n; k++)
fact *= k;
return fact;
}
Examples of famous algorithms
9
Constructions of Euclid
Newton's root finding
Fast Fourier Transform
Compression (Huffman, Lempel-Ziv, GIF, MPEG)
DES, RSA encryption
Simplex algorithm for linear programming
Shortest Path Algorithms (Dijkstra, Bellman-Ford)
Error correcting codes (CDs, DVDs)
TCP congestion control, IP routing
Pattern matching (Genomics)
Search Engines
Role of Algorithms in Modern World
10
Enormous amount of data
E-commerce (Amazon, Ebay)
Network traffic (telecom billing, monitoring)
Database transactions (Sales, inventory)
Scientific measurements (astrophysics, geology)
Sensor networks. RFID tags
Bioinformatics (genome, protein bank)
Amazon hired first Chief Algorithms Officer
Manber)
(Udi
A real-world Problem
11
Communication in the Internet
Message (email, ftp) broken down into IP packets.
Sender/receiver identified by IP address.
The packets are routed through the Internet by special
computers called Routers.
Each packet is stamped with its destination address, but
not the route.
Because the Internet topology and network load is
constantly changing, routers must discover routes
dynamically.
What should the Routing Table look like?
IP Prefixes and Routing
12
Each router is really a switch: it receives packets at
several input ports, and appropriately sends them out to
output ports.
Thus, for each packet, the router needs to transfer the
packet to that output port that gets it closer to its
destination.
Should each router keep a table: IP address x Output
Port?
How big is this table?
When a link or router fails, how much information would
need to be modified?
A router typically forwards several million packets/sec!
Data Structures
13
The IP packet forwarding is a Data Structure problem!
Efficiency, scalability is very important.
Similarly, how does Google find the documents matching
your query so fast?
Uses sophisticated algorithms to create index structures,
which are just data structures.
Algorithms and data structures are ubiquitous.
With the data glut created by the new technologies, the
need to organize, search, and update MASSIVE amounts
of information FAST is more severe than ever before.
Algorithms to Process these Data
14
Which are the top K sellers?
Correlation between time spent at a web site and
purchase amount?
Which flows at a router account for > 1% traffic?
Did source S send a packet in last s seconds?
Send an alarm if any international arrival matches a
profile in the database
Similarity matches against genome databases
Etc.
Max Subsequence Problem
15
Given a sequence of integers A1, A2, …, An, find the maximum
possible value of a subsequence Ai, …, Aj.
Numbers can be negative.
You want a contiguous chunk with largest sum.
Example: -2, 11, -4, 13, -5, -2
The answer is 20 (subseq. A2 through A4).
We will discuss 4 different algorithms, with time complexities O(n3),
O(n2), O(n log n), and O(n).
With n = 106, algorithm 1 may take > 10 years; algorithm 4 will
take a fraction of a second!
Algorithm 1 for Max Subsequence Sum
Given A1,…,An , find the maximum value of Ai+Ai+1+···+Aj
0 if the max value is negative
int maxSum = 0;
+)
)
+)
16
O(1)
for( int i = 0; i < a.size( ); i+
for( int j = i; j < a.size( ); j++
O(1)
{
O(1)
int thisSum = 0;
for( int k = i; k <= j; k+
O(1)
thisSum += a[ k ];
if( thisSum > maxSum )
maxSum = thisSum;
}
3
Time return
complexity:
maxSum;On
O( j i)
n 1
n 1 n 1
j i
i 0 j i
O( ( j i)) O( ( j i))
Algorithm 2
Idea: Given sum from i to j-1, we can compute
the sum from i to j in constant time.
This eliminates one nested loop, and reduces the
running time to O(n2).
into maxSum = 0;
+)
for( int i = 0; i < a.size( ); i+
j++ )
int thisSum = 0;
for( int j = i; j < a.size( );
{
)
17
thisSum;
}
thisSum += a[ j ];
if( thisSum > maxSum
maxSum
=
Algorithm 3
This algorithm uses divide-and-conquer
paradigm.
Suppose we split the input sequence at
midpoint.
The max subsequence is entirely in the left half,
entirely in the right half, or it straddles the
midpoint.
Example:
left half
|
right half
4 -3 5 -2 | -1 2 6 -2
18
Max in left is 6 (A1 through A3); max in right is 8 (A6
through A7). But straddling max is 11 (A1 thru A7).
Algorithm 3 (cont.)
19
Example:
left half
|
right half
4 -3 5 -2 | -1 2 6 -2
Max subsequences in each half found by recursion.
How do we find the straddling max subsequence?
Key Observation:
Left half of the straddling sequence is the max
subsequence ending with -2.
Right half is the max subsequence beginning with -1.
A linear scan lets us compute these in O(n) time.
Algorithm 3: Analysis
The divide and conquer is best analyzed through
recurrence:
T(1) = 1
T(n) = 2T(n/2) + O(n)
20
This recurrence solves to T(n) = O(n log n).
Algorithm 4
2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2
0;
int maxSum = 0, thisSum =
for( int j = 0; j < a.size( ); j+
+)
{
thisSum += a[ j ];
if ( thisSum > maxSum )
maxSum = thisSum;
else if ( thisSum < 0 )
thisSum = 0;
}
Time
return
maxSum; clearly
complexity
}
21
O(n)
But why does it work? I.e. proof of correctness.
Proof of Correctness
Max subsequence cannot start or end at a negative Ai.
More generally, the max subsequence cannot have a
prefix with a negative sum.
Ex:
-2 11 -4 13 -5 -2
Thus, if we ever find that Ai through Aj sums to < 0, then
we can advance i to j+1
Proof. Suppose j is the first index after i when the sum
becomes < 0
The max subsequence cannot start at any p between i
and j. Because Ai through Ap-1 is positive, so starting at
i would have been even better.
22
Algorithm 4
int maxSum = 0, thisSum = 0;
for( int j = 0; j < a.size( ); j++ )
{
thisSum += a[ j ];
if ( thisSum > maxSum )
maxSum = thisSum;
else if ( thisSum < 0 )
thisSum = 0;
}
return maxSum
•
23
The algorithm resets whenever prefix is <
0. Otherwise, it forms new sums and
updates maxSum in one pass.
Why Efficient Algorithms Matter
Suppose N = 106
A PC can read/process N records in 1 sec.
But if some algorithm does N*N computation, then it
takes 1M seconds = 11 days!!!
100 City Traveling Salesman Problem.
A supercomputer checking 100 billion tours/sec still
requires 10100 years!
Fast factoring algorithms can break encryption
schemes. Algorithms research determines what is safe
code length. (> 100 digits)
24
How to Measure Algorithm Performance
25
What metric should be used to judge
algorithms?
Length of the program (lines of code)
Ease of programming (bugs, maintenance)
Memory required
Running time
Running time is the dominant standard.
Quantifiable and easy to compare
Often the critical bottleneck
Abstraction
An algorithm may run differently depending on:
the hardware platform (PC, Cray, Sun)
the programming language (C, Java, C++)
the programmer (you, me, Bill Joy)
While different in detail, all hardware and prog models
are equivalent in some sense: Turing machines.
It suffices to count basic operations.
Crude but valuable measure of algorithm’s performance
as a function of input size.
26
Average, Best, and Worst-Case
On which input instances should the algorithm’s
performance be judged?
Average case:
Real world distributions difficult to predict
Best case:
Seems unrealistic
Worst case:
Gives an absolute guarantee
We will use the worst-case measure.
27
Examples
Vector addition Z = A+B
for (int i=0; i<n; i++)
Z[i] = A[i] + B[i];
T(n) = c n
Vector (inner) multiplication z =A*B
z = 0;
for (int i=0; i<n; i++)
z = z + A[i]*B[i];
T(n) = c’ + c1 n
28
Examples
Vector (outer) multiplication Z = A*BT
for (int i=0; i<n; i++)
for (int j=0; j<n; j++)
Z[i,j] = A[i] * B[j];
T(n) = c2 n2;
A program does all the above
T(n) = c0 + c1 n + c2 n2;
29
Simplifying the Bound
T(n) = ck nk + ck-1 nk-1 + ck-2 nk-2 + … + c1 n +
co
too complicated
too many terms
Difficult to compare two expressions, each
with 10 or 20 terms
Do we really need that many terms?
30
Simplifications
Keep just one term!
the fastest growing term (dominates the runtime)
No constant coefficients are kept
Constant coefficients affected by machines,
languages, etc.
Asymtotic behavior (as n gets large) is determined
entirely by the leading term.
31
Example. T(n) = 10 n3 + n2 + 40n + 800
If n = 1,000, then T(n) = 10,001,040,800
error is 0.01% if we drop all but the n3 term
In an assembly line the slowest worker determines the
throughput rate
Simplification
32
Drop the constant coefficient
Does not effect the relative order
Simplification
The faster growing term (such as 2n) eventually
will outgrow the slower growing terms (e.g.,
1000 n) no matter what their coefficients!
Put another way, given a certain increase in
allocated time, a higher order algorithm will not
reap the benefit by solving much larger problem
33
Complexity and Tractability
T(n)
n
10
20
30
40
50
100
103
104
105
106
n
n log n
n2
n3
n4
.01s
.03s
.1s
1s
10s
.02s
.09s
.4s
8s
160s
.03s
.15s
.9s
s
810s
.04s
.21s 1.6s
s
2.56ms
.05s
.28s s s
6.25ms
.1s
.66s
10s
1ms
100ms
1s 9.96s
1ms
1s
16.67m
s 130s 100ms 16.67m
115.7d
s 1.66ms
10s 11.57d
3171y
ms 19.92ms 16.67m 31.71y 3.17107y
n10
10s
2.84h
6.83d
121d
3.1y
3171y
3.171013y
3.171023y
3.171033y
3.171043y
2n
1s
1ms
1s
18m
13d
41013y
3210283y
Assume the computer does 1 billion ops per sec.
34
log n n n log n
n2
0
1
0
1
1
2
2
4
2
4
8
16
3
8
24
64
4
16
64
256
5
32
160 1,024
n2
2n
70000
n3
1
8
64
512
4096
32,768
2n
100000
60000
40000
30000
100
20000
n log n
10000
n
0
n
35
n2
n
1000
n3
n3
n log n
10000
50000
2n
2
4
16
256
65,536
4,294,967,296
log n
log n
10
1
n
Another View
More resources (time and/or processing power) translate
into large problems solved if complexity is low
36
T(n)
Problem size Problem
Increase in
solved in 103 size solved Problem
sec
in 104 sec
size
100n
10
100
10
1000n
1
10
10
5n2
14
45
3.2
N3
10
22
2.2
2n
10
13
1.3
Asympotics
37
T(n)
keep one
drop coef
3n2+4n+1
3 n2
n2
101 n2+102
101 n2
n2
15 n2+6n
15 n2
n2
a n2+bn+c
a n2
n2
They all have the same “growth” rate
Caveats
Follow the spirit, not the letter
a 100n algorithm is more expensive than
n2 algorithm when n < 100
Other considerations:
a program used only a few times
a program run on small data sets
ease of coding, porting, maintenance
memory requirements
38
Asymptotic Notations
Big-O, “bounded above by”: T(n) = O(f(n))
For some c and N, T(n) c·f(n) whenever n > N.
Big-Omega, “bounded below by”: T(n) = (f(n))
For some c>0 and N, T(n) c·f(n) whenever n > N.
Same as f(n) = O(T(n)).
Big-Theta, “bounded above and below”: T(n) = (f(n))
T(n) = O(f(n)) and also T(n) = (f(n))
Little-o, “strictly bounded above”: T(n) = o(f(n))
T(n)/f(n) 0 as n
39
By Pictures
Big-Oh (most commonly used)
bounded above
Big-Omega
bounded below
Big-Theta
exactly
Small-o
N0
not as expensive as ...
N0
N0
40
Example
3
2
T(n) n 2n
O(?) (?)
10
n
5
n
3
n
41
0
n
2
n
3
n
Examples
f (n)
c
k
i
i1 ci n
in1 i
in1 i2
n k
i1i
in0ri
n!
n
i11/ i
42
Asymptomic
(1)
k
(n )
(n2)
(n3)
k1
(n )
(rn )
(n(n/e)n )
(logn)
Summary (Why O(n)?)
T(n) = ck nk + ck-1 nk-1 + ck-2 nk-2 + … + c1 n +
co
Too complicated
O(nk )
a single term with constant coefficient
dropped
Much simpler, extra terms and coefficients do
not matter asymptotically
Other criteria hard to quantify
43
Runtime Analysis
Useful rules
simple statements (read, write, assign)
simple operations (+ - * / == > >= < <=
rule of sums
for, do, while loops
44
O(1)
sequence of simple statements/operations
O(1) (constant)
rules of products
Runtime Analysis (cont.)
Two important rules
Rule of sums
Rule of products
45
if you do a number of operations in sequence, the
runtime is dominated by the most expensive
operation
if you repeat an operation a number of times, the
total runtime is the runtime of the operation
multiplied by the iteration count
Runtime Analysis (cont.)
if (cond) then
body1
else
body2
O(1)
T1(n)
T2(n)
endif
T(n) = O(max (T1(n), T2(n))
46
Runtime Analysis (cont.)
Method calls
A calls B
B calls C
etc.
A sequence of operations when call sequences
are flattened
T(n) = max(TA(n), TB(n), TC(n))
47
Example
for (i=1; i<n; i++)
if A(i) > maxVal then
maxVal= A(i);
maxPos= i;
Asymptotic Complexity: O(n)
48
Example
for (i=1; i<n-1; i++)
for (j=n; j>= i+1; j--)
if (A(j-1) > A(j)) then
temp = A(j-1);
A(j-1) = A(j);
A(j) = tmp;
endif
endfor
endfor
49
Asymptotic Complexity is O(n2)
Run Time for Recursive Programs
T(n) is defined recursively in terms of T(k), k<n
The recurrence relations allow T(n) to be
“unwound” recursively into some base cases
(e.g., T(0) or T(1)).
Examples:
Factorial
Hanoi towers
50
Example: Factorial
T(n)
T(n 1) d
T(n 2) d d
T(n 3) d d d
int factorial (int n) {
if (n<=1) return 1;
else return n * factorial(n-1);
}
....
T(1) (n 1) * d
factorial (n) = n*n-1*n-2* … *1
n
c (n 1) * d
* factorial(n-1)
n-1
T(n)
* factorial(n-2)
n-2
O(n)
T(n-1)
* factorial(n-3) T(n-2)
…
2
*factorial(1)
T(1)
51
Example: Factorial (cont.)
int factorial1(int n) {
if (n<=1) return 1;
else {
fact = 1;
O(1)
for (k=2;k<=n;k++)
O(n)
fact *= k;
O(1)
return fact;
}
}
52
Both algorithms are O(n).
Example: Hanoi Towers
Hanoi(n,A,B,C) =
Hanoi(n-1,A,C,B)+Hanoi(1,A,B,C)
+Hanoi(n-1,C,B,A)
T(n)
2T(n 1) c
22T(n 2) 2c c
23T(n 3) 22 c 2c c
....
2n 1T(1) (2n 2 ... 2 1)c
(2n 1 2n 2 ... 2 1)c
O(2n )
53
Worst Case, Best Case, and Average Case
template<class T>
void SelectionSort(T a[], int n)
{
// Early-terminating version of selection sort
bool sorted = false;
for (int size=n; !sorted && (size>1); size--) {
int pos = 0;
sorted = true;
// find largest
for (int i = 1; i < size; i++)
if (a[pos] <= a[i]) pos = i;
else sorted = false; // out of order
Swap(a[pos], a[size - 1]);
}
}
Worst Case
Best Case
54
n0
c f(N)
T(N)
f(N)
T(N)=O(f(N))
55
T(N)=6N+4 : n0=4 and c=7, f(N)=N
T(N)=6N+4 <= c f(N) = 7N for N>=4
7N+4 = O(N)
15N+20 = O(N)
N2=O(N)?
N log N = O(N)?
N log N = O(N2)?
N2 = O(N log N)?
N10 = O(2N)?
6N + 4 = W(N) ? 7N? N+4 ? N2? N log N?
N log N = W(N2)?
3 = O(1)
1000000=O(1)
Sum i = O(N)?
An Analogy: Cooking Recipes
56
Algorithms are detailed and precise instructions.
Example: bake a chocolate mousse cake.
Convert raw ingredients into processed output.
Hardware (PC, supercomputer vs. oven, stove)
Pots, pans, pantry are data structures.
Interplay of hardware and algorithms
Different recipes for oven, stove, microwave etc.
New advances.
New models: clusters, Internet, workstations
Microwave cooking, 5-minute recipes, refrigeration