صفحه 1:
1 0:11:02 5ع عاناناناعبانا/1 5 2:10 120 5©
1" Focus of the course:
° Datastructuresand relatedalgorithms
5 Correctnessand(timeand space) complexity
| Prerequisites
5 CS 2o0:stacks, queues,lists, binary searchtrees, ...
5 CS 4o:functions, recurrence equations,induction, ...
9 CS 60:C,C++,andUNIX
صفحه 2:
Course Organization
BGrading:
5 See course web page
= Policy:
5 Nolatehomeworks.
5 Cheatingandplagiaris: F gradeand disciplinary actions
9 1[آأذ[ذ
Homepage: www.cs.ucsbh.edu/~cs730a
= Email:cs730a@cs.ucsh.edu
"Teaching assistants: See course web page
صفحه 3:
Introduction
= Afamous quote: Program =Algorithm+ Data Structure.
= ALC of youhave programmed! thus have already been exposedto
algorithmsanddata structure.
= Perhaps youdidn't see themas separateentities:
= Perhaps yousaw data structures as simple programming
constructs (provided by STL--standardtemplatelibrary).
= However, data structures are quite distinct fromalgorithms,
andveryimportantintheir own right.
صفحه 4:
Objectives
= Themain focus of this courseis tointroduce youtoa systematic
study of a(gorithmsanddata structure.
= Thetwoguiding principles of the courseare:abstractionand
formal analysis.
@ Abstraction: Wefocus on topics that are broadly applicableto
avariety of problems.
= Analysis; Wewant aformal way tocompare two objects (data
structures or algorithms).
= Inparticular,wewill worry about ‘always correct'-ness,and
worst-case bounds ontimeandmemory (space).
صفحه 5:
Textbook
| Textbook for thecourseis:
= DataStructuresandAlgorithmAnalysisin C++
= byMarkAllen Weiss
=ButIwillusematerial fromother booksand research
papers,sotheultimate source shouldbe my lectures.
صفحه 6:
Course Outline
™ C++Review (Ch. 7)
© A(gorithmAnalysis (Ch. 2)
Setswithinsert /delete/member: Hashing (Ch. 5)
Setsingeneral: Balancedsearch trees (Ch.4and72.2)
® Setswithpriority: Heaps, priority queues (Ch. 6)
® Graphs:Shortest-pathalgorithms (Ch. 9.7.9.3.2)
® Setswithdisjoint union: Union/findtrees (Ch. 8.7.8.5)
Graphs: Minimum spanning trees (Ch. 9.5)
Sorting (Ch. 7)
صفحه 7:
120: AlgorithmAnalysis
= Foundations of ACgorithmAnalysisandData Structures.
= Analysis:
© dow topredictanalgorithm’'s performance
2 How wellanalgorithmscalesup
5 4fow tocompare different algorithms for a problem
= DataStructures
5 Sow to efficiently store, access,manage data
Oo Data structures effect algorithm's performance
صفحه 8:
Example Algorithms
= Twoalgorithms for computing the Factorial
= Whichoneis better?
® int factorial (int n) (
if(n<=7) return?!
efse return n *factorial(n-7)!
>
™ intfactorial dnt n) (
if(n<es) returns!
efse(
fact=1{
for (R=2ikeon{f++)
fact *- ع
return fact!
0
)
صفحه 9:
Examples of famous algorithms
™ Constructions of Euclid
= Newton's root finding
= Fast Fourier Transform
= Compression (Huffman, Lempel-Ziv, GIF, MPEG)
= DES,RSAencryption
® Simplex algorithmfor linear programming
= Shortest Path Algorithms (Dijkstra, Bel(man-Ford)
= Error correcting codes (CDs, DVDs)
= TCPcongestioncontrol,IP routing
= Pattern matching (Genomics)
™ Search Engines
صفحه 10:
Role of ACgorithmsin Modern World
= Enormousamount of data
E-commerce (Amazon, Ebay)
Network traffic (telecom billing, monitoring)
Database transactions (Sales,inventory)
Scientificmeasurements (astrophysics, geology)
Sensor networks. RFID tags
8 قا ۵ 8 0 ظ
Bioinformatics (genome, protein bank)
= Amazonhiredfirst Chief Algorithms Officer (‘Udi Manber)
صفحه 11:
Areal-worldProblem
= CommunicationintheInternet
= Message (email, ftp) broken downintoIP packets.
™ Sender/receiver identified byIP address.
™ ThepacketsareroutedthroughtheInternet by special
computers calledRouters.
= Eachpacket is stampedwithits destinationaddress, but not the
route.
™ Because the Internet topologyandnetworkloadis constantly
changing, routersmust discover routes dynamically.
= What should the Routing Table look like?
صفحه 12:
IP PrefixesandRouting
™ Eachrouter is really aswitch:it receives packetsat several
input ports,andappropriately sends themout to output ports.
= Thus, for eachpacket, the router needs to transfer the packet to
that output port that getsit closer toits destination.
™ Shouldeachrouter keepa table:IP address x Output Port?
™ How bigis this table?
= Whenalinkor router fails, how muchinformation wouldneed
tobemodified?
= A router typically forwards several million packets/sec!
صفحه 13:
Data Structures
™ TheIP packet forwardingisa Data Structureproblem!
= Efficiency, scalability isveryimportant.
™ Similarly, how does Googlefind the documents matching your
query sofast?
۳ Uses sophisticatedalgorithms tocreateindexstructures,which
arejust data structures.
= Algorithmsanddata structuresareubiquitous.
= Withthe data glut created by the new technologies, theneedto
organize,search,andupdate MASSIVE amounts of
information FASTis more severe than ever before.
صفحه 14:
Algorithms toProcess theseData
Whichare the top KselCers?
Correlation between time spent at awebsiteandpurchase
amount?
Whichflowsat a router account for >7%traffic?
DidsourceS senda packet inlast s seconds?
Sendanalarmifanyinternational arrival matchesa
profileinthe database
Similarity matches against genome databases
Etc.
صفحه 15:
Max Subsequence Problem
Givena sequence of integers. A7,A2,.... An findthemaximum possible value "ا
ofa subsequence Ai,...,. Aj.
= Numberscan be negative.
© Youwanta contiguous chunkwithlargest sum.
= Example: -2,77,-4,13,-5,-2
= Theansweris 20 (subseq. A2throughA4).
= Wewill discuss 4 different algorithms,withtime complexities O(n*),0(n*),
O(n log n),andO(n).
=| Withn = 70%, algorithm7 may take>zoyears: algorithm4will takea
fraction ofa second!
صفحه 16:
Algorithm 7for Max Subsequence Sum
"Given A,,.,A,,findthemaximumvalue of A;+A,,,+' +A,
oifthemaxvalueis negative
int maxSum = 0; | ou
for( int i = 0; i < a.size( ); i+
+)
for( int j = i; j < a.size( نالل nt nim
OD )- 3( ۱02 2 )[- 3(
j- Ja 120 دز
زه jou Jou a
k<=j; وف
thisSum += a[ k ];
if( thisSum > maxSum )
maxSum = thisSum;
=Time مکحم مدمه( OU)
int thisSu
for( int k
+)
6
صفحه 17:
Algorithm 2
BIdea:Givensumfromitoj-7,wecan compute the sum
fromitojinconstant time.
= This eliminates one nestedloop,andreduces the
running timetoO(n’).
into maxSum = 0;
for( int i = 0; i < a.size( ); i+
+)
int thisSum = 0;
for( int j = i; j < a.size( );
i++)
thisSum += a[ j J;
if( thisSum > maxSum
)
maxSum =
thisSum;
ae
صفحه 18:
Algorithm 3
| This algorithmuses divide-and-conquer paradigm.
= Suppose we split theinput sequenceat midpoint.
= Themax subsequence isentirely in the left half,
entirelyinthe right half, orit straddles themidpoint.
"Example:
Ceft half | right half
اقا | اهب -2
@ Maxinleftis 6 (A7throughA3):maxinright is 8 (Aé through
A7). But straddling maxis 77(A7thruA7).
صفحه 19:
Algorithm 3(cont.)
= Example:
Ceft half | right half
4-35 -2| -126-2
= Max subsequencesin eachhalf found by recursion,
= How dowefindthe straddling max subsequence?
= Key Observation:
O Left half of the straddling sequenceis themax subsequence
endingwith-2.
5 Right halfisthemax subsequence beginning with-7.
= aCinear scanletsus compute thesein O(n) time.
صفحه 20:
Algorithm 3: Analysis
= The divideandconquer is best analyzedthrough
recurrence:
1010-1
Tn) =2T(n/2) + O(n)
= This recurrence solves toT(n) =O(nlogn).
صفحه 21:
Algorithms
2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2
int maxSum = 0, thisSum =
0;
for( int j = 0; j < a.size( );
j++)
thisSum += al j ];
if (thisSum > maxSum )
maxSum = thisSum;
else if ( thisSum < 0 )
thisSum = 0;
mg return maxSum;
B but wny aves it Worr: 1.2. proof of correctness.
ea
صفحه 22:
Proof of Correctness
= Max subsequencecannot start or endatanegativeAi.
= More generally, themax subsequence cannot haveaprefix with
anegativesum.
Ex: -2 11 -4 13 -5 -2
= Thus, if weeverfindthat Ai through Aj sums to<o,thenwecan
advanceitoj+7
© Proof. Supposejis thefirst index afteri when the sumbecomes
<o
© Themax subsequencecannot start at anypbetweeniandj.
Because A; throughA, ,ispositive,sostarting at iwowdhave
beeneven better.
صفحه 23:
و
int maxSum =0, thisSum=o!
for(intj=o1j<a.size()tj++)
thisSum+=aljli
if (thisSum>maxSum)
maxSum = thisSum!
elseif (thisSum<o)
thisSum= oi!
20
returnmaxSum
* Thealgorithm resets whenever prefixis<o.
Otherwise,it forms new sumsandupdates
maxSumin onepass.
صفحه 24:
Why Efficient Algorithms Matter
™ Suppose N = 70°
= APCcan read/process Nrecordsin 7 sec.
= But ifsomealgorithmdoes N*Ncomputation, thenit takes 7M
seconds = 77 days!!!
700City Traveling Salesman Problem. "ا
5 A supercomputer checking 700 billion tours/sec still requires
10 years!
= Fast factoring algorithms can breakencryption schemes.
Algorithms research determines what is safe code length. (> 700
digits)
صفحه 25:
How to Measure Algorithm Performance
= What metric should beused to judge algorithms?
5 Length of the program (Lines of code)
5 Ease of programming (bugs,maintenance)
Memory required
DRunning time
=Runningtimeis the dominant standard.
OD Quantifiableandeasy tocompare
OD Often thecritical bottleneck
صفحه 26:
Abstraction
= Analgorithmmay run differently depending on:
5 thehardwareplatform(PC,Cray,Sun)
Oo theprogramming language (C, Java,C++)
5 theprogrammer (you, me, Bill Joy)
= While different in detail,all hardwareandprog models are
equivalent in some sense: Turingmachines.
™ It suffices tocount basicoperations.
™ Crude but valuable measure of algorithm's performance asa
function ofinput size.
صفحه 27:
Average, Best,and Worst-Case
= Onwhichinput instances should the algorithm's performance
bejudged?
ا
۲ Real world distributions difficult topredict
™ Best case:
5 Seemsunrealistic
= Worst case:
5 Givesanabsolute guarantee
5 WewilCuse the worst-casemeasure.
صفحه 28:
Examples
B Vector additionZ= A+B
for (inti=oli<nitit++)
2111 - ۵11 +4
Tin)=cn
=Vector (inner) multiplication z=A*B
z=0!
for (inti=oli<niit++)
2 <2+ ۵11*2111 ؛
7)90( - ار + أ
صفحه 29:
Examples
= Vector (outer) multiplicationZ=A*B
for (int i=oti<nii++)
for (int j=o!j<nij++)
Zlij] = Alt] * BU] ؛
Tn) =c,n?
= Aprogram does all the above
Tn)=c,+¢,n+¢,n7f
صفحه 30:
Simplifying the Bound
BT(n)=c,n' +c, n°’ +e, Ne? +46, NHC,
Otoocomplicated
5 toomany terms
O Difficult to compare two expressions, eachwith 70 or
20 terms
= Dowereally needthat many terms?
صفحه 31:
Simplifications
™ Keepjust oneterm!
5 thefastest growing term (dominates theruntime)
= Noconstant coefficients are kept
5 Constant coefficients affected by machines, languages, etc.
= Asymtoticbehavior (as ngetslarge) is determinedentirely by
theCeading term.
5 Example. Tin) =70n* +n7+40n + 800
© Ifn=7,000, then Tin) = 10,007,040,800
© erroriso.o7%ifwe drop all but the n? term
5 Inanassembly line the sCowest worker determines the
throughput rate
oa
صفحه 32:
080 00و
Simplification
= Drop theconstant coefficient
5 Does not effect the relCativeorder
ca? r= 1000 n, g= 4 02, b=n?, k= 0.01 2°
co 1 20 300 40 600 eco 7۵5 00
صفحه 33:
Simplification
= The fastergrowing term (suchas 2") eventuallywill
outgrow the sCower growing terms (e.g.,7000n) no
matter what their coefficients!
= Put another way,givenacertainincreasein
alCocatedtime,a higher order algorithmwill not reap
the benefit by solving muchlarger problem
صفحه 34:
Complexity and Tractability
10 [Olus| .0305| tus) Ins| 10۳3 105 1۳5
10° | 1ous
10° |100us
Assume the computer does 1 billion ops per sec.
صفحه 35:
0 1 0 1 1 2
1 2 2 4 8 4
2 4 8 16 04 16
3 8 24 64 512 256
4 16 64 256 4056 65,536,
5 160 1,024 32,766 6
سم 2۰ 3 5
ani 3 en, in 5
eins can anlog n
aun a ow ۲ ۰
ea 3 ۰ سر
vo AL” gh
aa A
com P19 Rw) (ee eet logan
0 وین و 2 هيه يه ۳ elt a Aa
۳ log n
صفحه 36:
Another View
™ More resources (timeand/or processing power) translateinto
Cargeproblems solvedif complexity is Cow
T(n) Problem size | Problem Increase in
solved in 103 size solved | Problem
sec in 104sec_ |size
100n 10 100 10
1000n 1 10 10
5n? 14 45 3:2
NB 10 22 2.2
2n 10 13 ون
صفحه 37:
Asympotics
Tm) keep one drop coef
3n°+4n+1 3r r
101 r°-+102 1012 r
15 د15 6م r
an’-ntc an rv
= They all havethesame “growth rate
صفحه 38:
Caveats
= Follow the spirit, not theletter
5a 200onalgorithmismoreexpensive than n?
algorithmwhen n < 700
™ Other considerations:
لآ times
Daprogramrunonsmall data sets
D ease of coding, porting,maintenance
Omemory requirements
صفحه 39:
Asymptotic Notations
™ Big-0, ‘bounded above by”: Tin) = O(f(n))
5 For somecand N,T(n) <c'fin) whenever n> N.
™ Big-Omega, ‘bounded below by”: Tin) = Q(fi(n))
5 For somec>oand N, Tin) =c'fin) whenever n> N.
5 Sameasfin) =O(T(n)).
= Big-Theta, ‘boundedaboveand below": Tin) = O(f(n))
5 Tn) =Of(n)) andalso Tin) = Q(f(n))
@ Little-o, ‘strictly bounded above”: Tin) =o(fin))
5 Tn)/fin)0 as n> 00
صفحه 40:
By Pictures
= Big-Oh(most commonly used) /
تا |
پ ۱ص Big-Omega =
N 07 اء 6 ان 3
= Big-Theta
Dexactly
Bsmall-o
5 not asexpensiveas... N
0
صفحه 41:
Example
Tn) =r + 27
O(?) Q(?)
م۰ 0
م زر
rm بر
rm @r
صفحه 42:
Asymptomic
rug
Si
n jk
Lal
pa
Sei
Examples
ee
صفحه 43:
Summary (Why O(n)?)
BT(n)=c,n' +c, ,n*'+C, ,N? +..46,N4+C,
= Toocomplicated
Zo)
Casingletermwithconstant coefficient dropped
= Muchsimpler,extratermsand coefficients donot
matterasymptotically
"Other criteriahardto quantify
صفحه 44:
Runtime Analysis
= useful rules
5 simple statements (read, write, assign)
©» O17) (constant)
5 simple operations (+- *
2 om
5 sequence of simple statements/operations
© rule ofsums
5 for, do, while Coops
© rules of products
صفحه 45:
Runtime Analysis (cont.)
= Twoimportant rules
O Rule of sums
© if youdoa number of operations in sequence, the runtime
is dominated by the most expensive operation
Oo Rule of products
® if yourepeat anoperationa number of times, the total
runtimeis the runtime of the operation multiplied by the
iterationcount
صفحه 46:
Runtime Analysis (cont.)
if(cond)then O(7)
body, Tin)
else
body, Tn)
endif
Tn) = O(max (T,(n), T,(n))
صفحه 47:
Runtime Analysis (cont.)
= Methodcalls
نا 5
۲ 9 ©
Detc.
@ A sequence of operations when call sequences are
flattened
Tin) =max(T,,(n),T,(n), Tn)
صفحه 48:
Example
for (t=7fi<n $i++)
if A(i) >maxVal then
max Val=A(i)!
maxPos=i!
Asymptotic Complexity:O(n)
صفحه 49:
Example
for (t=7!i<n-7!i++)
for j=n{j>=i+7! j--)
if (AG-7) > AG)) then
temp =AG-7)!
AG- - (4
20( 2 ؛ مزالت
endif
endfor
endfor
= Asymptotic Complexity is O(n?)
صفحه 50:
5 ۷ 11۵07 بقل
BT(n)isdefinedrecursivelyin terms of T(k),k<n
= The recurrence relations allow Tin) to be‘unwound”
recursively into some base cases (e.g.,T(0) or T(7)).
= Examples:
5 Factorial
5 Hanoi towers
صفحه 51:
Example: Factorial
int factorial (int n)¢ te
=Tn- +d
=r 2)+d+d
حيرا 3( + 0+ 0+
if (m<=7) return?!
e(se returnn *factorial(n-7)!
>
(+ -ه) (۵
factorial (n)=n*n-7*n-2*.., #7 =e+(n- 0
n *factorial(n-7) T(n) 3
ی T(n-1)
n-2 *factorialin-3) T(n-2
wo ۲ ( )
“oo
2 *factorial(7)
T(1)
60
صفحه 52:
Example: Factorial (cont.)
int factorials(int n) ¢
if (n<=7) return?!
elset
fact=7!
for (k=2!k<=n k++) ١ a
fact *=k! an
returnfact! | 00
>
2
= BothalgorithmsareO(n).
صفحه 53:
Example: Hanot Towers
BHanoi(n,ABC) =
@ H{anoi(n-7,A,C,B)+Hanoi(7,A,B,C)+Hanoi(n-7,C,B,A)
1
=2T(n- 1)+c
=?’ TUn- 2)+2c+c
=2T(n- 3)+2?c+2c+e
2+1(6 ج.. +2 2) + 117 -
2+1(0 + +22 +اع)-
=O(2")
صفحه 54:
Worst Case, Best Case, and Average Case
template<class T>
void SelectionSort(T a[], int n)
{ //Early-terminating version of selection sort
bool sorted = false;
for (int size=n; !sorted && (size>1); size--) {
int pos = 0;
sorted = true;
// find largest
for (int i = 1; i < size; i++)
if (a[pos] <= a[i]) pos = i;
else sorted = false; // out of order
Swap(a[pos], a[size - 1]);
}
= Worst Case
= Best Case
صفحه 55:
f(N)
(N)
(N)
T(N)=O(£(N))
۱ mo=wande=7,fiNI=N
TUN)=6N+4 <= محال - 79010: 30<-4
مبهوج - 030
0360 ممجكدوه
NONI?
31-7 و2000
31-7 و3010
2-7
200-002
۱
31-37 و3010
2-00
1000000-01(
007 مدو
صفحه 56:
An Analogy: Cooking Recipes
= Algorithmsare detailedandpreciseinstructions.
= txample:bakeachocolatemoussecake.
5 Convert rawingredientsintoprocessedoutput.
3 Hardware (PC, supercomputer vs. oven, stove)
5 Pots,pans,pantryaredata structures.
= Interplay ofhardwareandalgorithms
© Different recipesfor oven, stove,microwaveetc,
= Newadvances,
5 New models: clusters,Internet,workstations
5 Microwave cooking, 5-minute recipes, refrigeration
6ه