Page 1:
Chapter 17: Recovery System
Page 2:
Chapter 17: Recovery System
Failure Classification
Storage Structure
Recovery and Atomicity
Log-Based Recovery
Shadow Paging
Recovery With Concurrent Transactions
Buffer Management
Failure with Loss of Nonvolatile Storage
Advanced Recovery Techniques
ARIES Recovery Algorithm
Remote Backup Systems
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan
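Page 3:
Failure Classification
Transaction failure:
  Logical errors: the transaction cannot complete due to some internal error condition
  System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes the system to crash.
  Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted by a system crash
  Database systems have numerous integrity checks to prevent corruption of disk data
Disk failure: a head crash or similar disk failure destroys all or part of disk storage
  Destruction is assumed to be detectable: disk drives use checksums to detect failures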
Page 4:
Recovery Algorithms
Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures
  Focus of this chapter
Recovery algorithms have two parts
  1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures
  2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
Page 5:
Storage Structure
Volatile storage:
  does not survive system crashes
  examples: main memory, cache memory
Nonvolatile storage:
  survives system crashes
  examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
Stable storage:
  a mythical form of storage that survives all failures
  approximated by maintaining multiple copies on distinct nonvolatile media
Page 6:
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
  copies can be at remote sites to protect against disasters such as fire or flooding
Failure during data transfer can still result in inconsistent copies. A block transfer can result in:
  Successful completion
  Partial failure: the destination block has incorrect information
  Total failure: the destination block was never updated
Protecting storage media from failure during data transfer (one solution):
  Execute the output operation as follows (assuming two copies of each block):
  1. Write the information onto the first physical block.
  2. When the first write successfully completes, write the same information onto the second physical block.
  3. The output is completed only after the second write successfully completes.
Page 7:
Stable-Storage Implementation (Cont.)
Protecting storage media from failure during data transfer (cont.):
Copies of a block may differ due to a failure during the output operation. To recover from failure:
  1. First find inconsistent blocks:
     Expensive solution: compare the two copies of every disk block.
     Better solution:
       Record in-progress disk writes on non-volatile storage (non-volatile RAM or a special area of disk).
       Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these.
       Used in hardware RAID systems
  2. If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it by the other copy. If both have no error, but are different, overwrite the second block by the first block.
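The two-copy output protocol and its recovery rule can be sketched in Python. This is a minimal illustration, not a definitive implementation: the class name `StableBlock`, the `fail_after_first` flag used to simulate a crash, and the use of CRC-32 as the checksum are all assumptions made for the example.

```python
import zlib


class StableBlock:
    """Two physical copies of one logical block, each stored with a checksum."""

    def __init__(self, data: bytes = b""):
        self.copies = [self._seal(data), self._seal(data)]

    @staticmethod
    def _seal(data: bytes):
        # Hypothetical on-disk format: (payload, CRC-32 of payload).
        return (data, zlib.crc32(data))

    @staticmethod
    def _is_good(copy) -> bool:
        data, checksum = copy
        return zlib.crc32(data) == checksum

    def output(self, data: bytes, fail_after_first: bool = False) -> None:
        # Step 1: write the first physical block.
        self.copies[0] = self._seal(data)
        if fail_after_first:
            return  # simulated crash between the two writes
        # Step 2: only after the first write completes, write the second block.
        self.copies[1] = self._seal(data)

    def recover(self) -> None:
        first, second = self.copies
        if not self._is_good(first):
            self.copies[0] = second   # bad checksum: overwrite by the other copy
        elif not self._is_good(second):
            self.copies[1] = first
        elif first != second:
            self.copies[1] = first    # both good but different: first copy wins

    def read(self) -> bytes:
        return self.copies[0][0]
```

After a crash between the two writes, `recover()` makes both copies agree on the value that reached the first block; if one copy fails its checksum, the other copy is used instead.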
Page 8:
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the following two operations:
  input(B) transfers the physical block B to main memory.
  output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept.
  Ti's local copy of a data item X is called xi.
We assume, for simplicity, that each data item fits in, and is stored inside, a single block.
Page 9:
Data Access (Cont.)
A transaction transfers data items between system buffer blocks and its private work-area using the following operations:
  read(X) assigns the value of data item X to the local variable xi.
  write(X) assigns the value of local variable xi to data item X in the buffer block.
  Both these commands may necessitate the issue of an input(BX) instruction before the assignment, if the block BX in which X resides is not already in memory.
Transactions
  Perform read(X) while accessing X for the first time;
  All subsequent accesses are to the local copy.
  After the last access, the transaction executes write(X).
output(BX) need not immediately follow write(X). The system can perform the output operation when it deems fit.
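The four operations input, output, read and write can be made concrete with a small Python sketch. The class names `Storage` and `Transaction` and the dictionary-based layout of blocks are assumptions for illustration only; the point is that `write` changes only the buffer block, and the disk changes only on `output`.

```python
class Storage:
    """Disk blocks plus the buffer blocks currently in main memory."""

    def __init__(self, disk):
        self.disk = disk            # block id -> {data item: value}
        self.buffer = {}            # buffer blocks in main memory
        # Each data item is assumed to live in exactly one block.
        self.block_of = {item: b for b, items in disk.items() for item in items}

    def input(self, block):
        self.buffer[block] = dict(self.disk[block])

    def output(self, block):
        self.disk[block] = dict(self.buffer[block])


class Transaction:
    def __init__(self, storage):
        self.storage = storage
        self.local = {}             # private work area: the local copies x_i

    def _ensure_buffered(self, item):
        block = self.storage.block_of[item]
        if block not in self.storage.buffer:
            self.storage.input(block)   # issue input(B_X) if B_X is not in memory
        return block

    def read(self, item):
        block = self._ensure_buffered(item)
        self.local[item] = self.storage.buffer[block][item]
        return self.local[item]

    def write(self, item):
        block = self._ensure_buffered(item)
        self.storage.buffer[block][item] = self.local[item]
```

A transaction would typically call `read("A")`, modify `local["A"]`, and then call `write("A")`; the value on disk stays unchanged until the system decides to call `output` on the block.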
Page 10:
Example of Data Access
[Figure: the buffer with its buffer blocks in main memory, and the transactions' work areas]
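Page 11:
Recovery and Atomicity
Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state.
Consider a transaction Ti that transfers $50 from account A to account B; the goal is either to perform all database modifications made by Ti or none at all.
Several output operations may be required for Ti (to output A and B). A failure may occur after one of these modifications has been made but before all of them are made.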
Page 12:
Recovery and Atomicity (Cont.)
To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself.
We study two approaches:
  log-based recovery, and
  shadow-paging
We assume (initially) that transactions run serially, that is, one after the other.
Page 13:
Log-Based Recovery
A log is kept on stable storage.
  The log is a sequence of log records, and maintains a record of update activities on the database.
When transaction Ti starts, it registers itself by writing a <Ti start> log record
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write, and V2 is the value to be written to X.
  The log record notes that Ti has performed a write on data item X. X had value V1 before the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage (that is, they are not buffered)
Two approaches using logs
  Deferred database modification
  Immediate database modification
Page 14:
Deferred Database Modification
The deferred database modification scheme records all modifications to the log, but defers all the writes to after partial commit.
Assume that transactions execute serially
A transaction starts by writing a <Ti start> record to the log.
A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X
  Note: the old value is not needed for this scheme
The write is not performed on X at this time, but is deferred.
When Ti partially commits, <Ti commit> is written to the log
Finally, the log records are read and used to actually execute the previously deferred writes.
Page 15:
Deferred Database Modification (Cont.)
During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are there in the log.
Redoing a transaction Ti (redo Ti) sets the value of all data items updated by the transaction to the new values.
Crashes can occur while
  the transaction is executing the original updates, or
  while recovery action is being taken
Example transactions T0 and T1 (T0 executes before T1):
  T0: read(A)            T1: read(C)
      A := A - 50            C := C - 100
      write(A)               write(C)
      read(B)
      B := B + 50
      write(B)
Page 16:
Deferred Database Modification (Cont.)
Below we show the log as it appears at three instances of time.
  (a)                 (b)                 (c)
  <T0 start>          <T0 start>          <T0 start>
  <T0, A, 950>        <T0, A, 950>        <T0, A, 950>
  <T0, B, 2050>       <T0, B, 2050>       <T0, B, 2050>
                      <T0 commit>         <T0 commit>
                      <T1 start>          <T1 start>
                      <T1, C, 600>        <T1, C, 600>
                                          <T1 commit>
If the log on stable storage at the time of the crash is as in case:
  (a) No redo actions need to be taken
  (b) redo(T0) must be performed since <T0 commit> is present
  (c) redo(T0) must be performed followed by redo(T1) since <T0 commit> and <T1 commit> are present
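Recovery under deferred modification reduces to one rule: redo a transaction if and only if the log holds both its start and its commit record. The following Python sketch illustrates that rule; the tuple encoding of log records (`("start", T)`, `("write", T, X, V)`, `("commit", T)`) and the function name `recover_deferred` are assumptions made for the example.

```python
def recover_deferred(log, db):
    """Redo every transaction that has both a start and a commit record.

    log: list of tuples ("start", T), ("write", T, X, new_value), ("commit", T)
    db:  dict mapping data item -> value, as found on disk after the crash
    """
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Forward scan: install the new values of committed transactions only.
    for rec in log:
        if rec[0] == "write" and rec[1] in started and rec[1] in committed:
            _, _txn, item, new_value = rec
            db[item] = new_value
    return db
```

Because redo only assigns new values, running the function a second time (for example after a crash during recovery) leaves the database in the same state.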
Page 17:
Immediate Database Modification
The immediate database modification scheme allows database updates of an uncommitted transaction to be made as the writes are issued
  since undoing may be needed, update log records must have both the old value and the new value
The update log record must be written before the database item is written
  We assume that the log record is output directly to stable storage
  Can be extended to postpone log record output, so long as prior to execution of an output(B) operation for a data block B, all log records corresponding to items in B are flushed to stable storage
Output of updated blocks can take place at any time before or after transaction commit
The order in which blocks are output can be different from the order in which they are written.
Page 18:
Immediate Database Modification Example
  Log                      Write          Output
  <T0 start>
  <T0, A, 1000, 950>
  <T0, B, 2000, 2050>
                           A = 950
                           B = 2050
  <T0 commit>
  <T1 start>
  <T1, C, 700, 600>
                           C = 600
                                          BB, BC
  <T1 commit>
                                          BA
Note: BX denotes the block containing X.
Page 19:
Immediate Database Modification (Cont.)
The recovery procedure has two operations instead of one:
  undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti
  redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti
Both operations must be idempotent
  That is, even if the operation is executed multiple times the effect is the same as if it is executed once
  Needed since operations may get re-executed during recovery
When recovering after failure:
  Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.
  Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.
Undo operations are performed first, then redo operations.
Page 20:
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.
  (a)                     (b)                     (c)
  <T0 start>              <T0 start>              <T0 start>
  <T0, A, 1000, 950>      <T0, A, 1000, 950>      <T0, A, 1000, 950>
  <T0, B, 2000, 2050>     <T0, B, 2000, 2050>     <T0, B, 2000, 2050>
                          <T0 commit>             <T0 commit>
                          <T1 start>              <T1 start>
                          <T1, C, 700, 600>       <T1, C, 700, 600>
                                                  <T1 commit>
Recovery actions in each case above are:
  (a) undo(T0): B is restored to 2000 and A to 1000.
  (b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
  (c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
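The undo-then-redo procedure for immediate modification can be sketched as follows. The tuple encoding `("update", T, X, old, new)` and the function name `recover_immediate` are assumptions for illustration, and serial execution is assumed, as in this part of the chapter.

```python
def recover_immediate(log, db):
    """Undo incomplete transactions, then redo committed ones.

    log: list of ("start", T), ("update", T, X, old, new), ("commit", T)
    db:  dict of data item -> value found on disk after the crash
    """
    started = [rec[1] for rec in log if rec[0] == "start"]
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    to_undo = {t for t in started if t not in committed}

    # Undo first: scan backwards, restoring old values.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in to_undo:
            _, _txn, item, old, _new = rec
            db[item] = old

    # Then redo: scan forwards, installing new values.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, item, _old, new = rec
            db[item] = new
    return db
```

Both loops only assign values taken from the log, so repeating the whole procedure gives the same result, which is the idempotence the scheme requires.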
Page 21:
Checkpoints
Problems in the recovery procedure as discussed earlier:
  1. searching the entire log is time-consuming
  2. we might unnecessarily redo transactions which have already output their updates to the database.
Streamline the recovery procedure by periodically performing checkpointing
  1. Output all log records currently residing in main memory onto stable storage.
  2. Output all modified buffer blocks to the disk.
  3. Write a log record <checkpoint> onto stable storage.
Page 22:
Checkpoints (Cont.)
During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti.
  1. Scan backwards from the end of the log to find the most recent <checkpoint> record
  2. Continue scanning backwards till a record <Ti start> is found.
  3. Need only consider the part of the log following the above start record. The earlier part of the log can be ignored during recovery, and can be erased whenever desired.
  4. For all transactions (starting from Ti or later) with no <Ti commit>, execute undo(Ti). (Done only in case of immediate modification.)
  5. Scanning forward in the log, for all transactions starting from Ti or later with a <Ti commit>, execute redo(Ti).
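Steps 1 to 3 of the checkpoint procedure, which decide how much of the log recovery has to look at, can be sketched as below. The function name `relevant_suffix` and the tuple encoding of records are assumptions; serial execution is assumed, so at most one transaction can be active at the checkpoint.

```python
def relevant_suffix(log):
    """Return the part of the log that recovery must examine (serial execution).

    log: list of tuples whose first field is "start", "update", "commit"
         or "checkpoint"
    """
    # Step 1: find the most recent checkpoint record.
    cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=None)
    if cp is None:
        return log                      # no checkpoint: the whole log is relevant
    # Step 2: continue backwards to the nearest <Ti start> record.
    for i in range(cp, -1, -1):
        if log[i][0] == "start":
            # Step 3: only the log from this start record onwards matters.
            return log[i:]
    return log[cp:]
```

The returned suffix can then be handed to the undo and redo steps; everything before it may be discarded.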
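Page 23:
Example of Checkpoints
[Figure: timeline of transactions T1 to T4 relative to the checkpoint and the system failure]
T1 can be ignored (its updates were already output to disk because of the checkpoint)
T2 and T3 are redone.
T4 is undone.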
Page 24:
Shadow Paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute serially
Idea: maintain two page tables during the lifetime of a transaction: the current page table, and the shadow page table
Store the shadow page table in nonvolatile storage, such that the state of the database prior to transaction execution may be recovered.
  The shadow page table is never modified during execution
To start with, both page tables are identical. Only the current page table is used for data item accesses during execution of the transaction.
Whenever any page is about to be written for the first time
  A copy of this page is made onto an unused page.
  The current page table is then made to point to the copy
  The update is performed on the copy
Page 25:
Sample Page Table
[Figure: a page table whose entries point to pages on disk]
Page 26:
Example of Shadow Paging
Shadow and current page tables after a write to a page
[Figure: shadow page table, current page table, and pages on disk]
Page 27:
Shadow Paging (Cont.)
To commit a transaction:
  1. Flush all modified pages in main memory to disk
  2. Output the current page table to disk
  3. Make the current page table the new shadow page table, as follows:
     keep a pointer to the shadow page table at a fixed (known) location on disk.
     to make the current page table the new shadow page table, simply update the pointer to point to the current page table on disk
Once the pointer to the shadow page table has been written, the transaction is committed.
No recovery is needed after a crash: new transactions can start right away, using the shadow page table.
Pages not pointed to from the current/shadow page table should be freed (garbage collected).
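The copy-on-first-write behaviour and the pointer-swap commit of shadow paging can be illustrated with a small in-memory model. The class `ShadowPagedDB` and its method names are assumptions for this sketch; garbage collection of unreferenced pages is deliberately left out.

```python
class ShadowPagedDB:
    """In-memory model of shadow paging for one serial transaction at a time."""

    def __init__(self, pages):
        self.disk = list(pages)                  # physical pages
        self.shadow = list(range(len(pages)))    # committed (shadow) page table
        self.current = None                      # page table of the running transaction

    def begin(self):
        self.current = list(self.shadow)         # both tables start out identical

    def write(self, logical_page, value):
        if self.current[logical_page] == self.shadow[logical_page]:
            # First write to this page: copy it onto an unused physical page
            # and make the current page table point to the copy.
            self.disk.append(self.disk[self.shadow[logical_page]])
            self.current[logical_page] = len(self.disk) - 1
        self.disk[self.current[logical_page]] = value   # update the copy only

    def read(self, logical_page):
        table = self.current if self.current is not None else self.shadow
        return self.disk[table[logical_page]]

    def commit(self):
        # The single atomic step: the shadow pointer now refers to the current table.
        self.shadow, self.current = self.current, None

    def crash(self):
        self.current = None                      # the shadow table was never modified
```

After `crash()`, reads go through the untouched shadow table, so no recovery work is needed; the orphaned copies are what a garbage collector would later reclaim.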
Page 28:
Shadow Paging (Cont.)
Advantages of shadow-paging over log-based schemes
  no overhead of writing log records
  recovery is trivial
Disadvantages:
  Copying the entire page table is very expensive
    Can be reduced by using a page table structured like a B+-tree
    No need to copy the entire tree, only need to copy paths in the tree that lead to updated leaf nodes
  Commit overhead is high even with the above extension
    Need to flush every updated page, and the page table
  Data gets fragmented (related pages get separated on disk)
  After every transaction completion, the database pages containing old versions of modified data need to be garbage collected
  Hard to extend the algorithm to allow transactions to run concurrently
    Easier to extend log based schemes
Page 29:
Recovery With Concurrent Transactions
We modify the log-based recovery schemes to allow multiple transactions to execute concurrently.
  All transactions share a single disk buffer and a single log
  A buffer block can have data items updated by one or more transactions
We assume concurrency control using strict two-phase locking;
  i.e. the updates of uncommitted transactions should not be visible to other transactions
  Otherwise how to perform undo if T1 updates A, then T2 updates A and commits, and finally T1 has to abort?
Logging is done as described earlier.
  Log records of different transactions may be interspersed in the log.
The checkpointing technique and the actions taken on recovery have to be changed
  since several transactions may be active when a checkpoint is performed.
Page 30:
Recovery With Concurrent Transactions (Cont.)
Checkpoints are performed as before, except that the checkpoint log record is now of the form
  <checkpoint L>
where L is the list of transactions active at the time of the checkpoint
  We assume no updates are in progress while the checkpoint is carried out (will relax this later)
When the system recovers from a crash, it first does the following:
  1. Initialize undo-list and redo-list to empty
  2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found.
     For each record found during the backward scan:
       if the record is <Ti commit>, add Ti to redo-list
       if the record is <Ti start>, then if Ti is not in redo-list, add Ti to undo-list
  3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list
Page 31:
Recovery With Concurrent Transactions (Cont.)
At this point undo-list consists of incomplete transactions which must be undone, and redo-list consists of finished transactions that must be redone.
Recovery now continues as follows:
  1. Scan the log backwards from the most recent record, stopping when <Ti start> records have been encountered for every Ti in undo-list.
     During the scan, perform undo for each log record that belongs to a transaction in undo-list.
  2. Locate the most recent <checkpoint L> record.
  3. Scan the log forwards from the <checkpoint L> record till the end of the log.
     During the scan, perform redo for each log record that belongs to a transaction on redo-list
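The construction of undo-list and redo-list and the two scans that follow can be sketched in Python. The record encoding (`("checkpoint", L)` with `L` a list of active transactions, `("update", T, X, old, new)`) and the function names are assumptions. For brevity the backward undo scan walks the whole log instead of stopping once every start record has been seen; the result is the same.

```python
def build_lists(log):
    """Backward scan up to the most recent <checkpoint L> record."""
    undo_list, redo_list = set(), set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "commit":
            redo_list.add(rec[1])
        elif kind == "start" and rec[1] not in redo_list:
            undo_list.add(rec[1])
        elif kind == "checkpoint":
            # Every transaction active at the checkpoint that did not commit later.
            undo_list |= {t for t in rec[1] if t not in redo_list}
            break
    return undo_list, redo_list


def recover(log, db):
    undo_list, redo_list = build_lists(log)
    # Backward scan: undo the updates of incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in undo_list:
            db[rec[2]] = rec[3]          # restore the old value
    # Forward scan from the most recent checkpoint: redo finished transactions.
    cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=0)
    for rec in log[cp:]:
        if rec[0] == "update" and rec[1] in redo_list:
            db[rec[2]] = rec[4]          # install the new value
    return db
```

On a log where T1 and T2 are active at the checkpoint and T3 starts and commits afterwards, `build_lists` yields undo-list {T1, T2} and redo-list {T3}.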
Page 32:
Example of Recovery
Go over the steps of the recovery algorithm on the following log:
  <T0 start>
  <T0, A, 0, 10>
  <T0 commit>
  <T1 start>
  <T1, B, 0, 10>
  <T2 start>                 /* Scan in step 1 comes up to here */
  <T2, C, 0, 10>
  <T2, C, 10, 20>
  <checkpoint {T1, T2}>
  <T3 start>
  <T3, A, 10, 20>
  <T3, D, 0, 10>
  <T3 commit>
Page 33:
Log Record Buffering
Log record buffering: log records are buffered in main memory, instead of being output directly to stable storage.
  Log records are output to stable storage when a block of log records in the buffer is full, or a log force operation is executed.
Log force is performed to commit a transaction by forcing all its log records (including the commit record) to stable storage.
Several log records can thus be output using a single output operation, reducing the I/O cost.
Page 34:
Log Record Buffering (Cont.)
The rules below must be followed if log records are buffered:
  Log records are output to stable storage in the order in which they are created.
  Transaction Ti enters the commit state only when the log record <Ti commit> has been output to stable storage.
  Before a block of data in main memory is output to the database, all log records pertaining to data in that block must have been output to stable storage.
    This rule is called the write-ahead logging or WAL rule
    Strictly speaking, WAL only requires undo information to be output
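The write-ahead logging rule can be enforced mechanically: before a data block is output, force the log up to the last record that touched that block. The sketch below assumes a hypothetical per-record sequence number (called `lsn` here) and two illustrative classes, `LogManager` and `BufferPool`; neither name comes from the chapter.

```python
class LogManager:
    def __init__(self):
        self.buffer = []      # log records still in main memory
        self.stable = []      # log records already on stable storage
        self.next_lsn = 0     # hypothetical sequence number for each record

    def append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.buffer.append((lsn, record))
        return lsn

    def force(self, up_to_lsn=None):
        # Records leave the buffer strictly in the order they were created.
        while self.buffer and (up_to_lsn is None or self.buffer[0][0] <= up_to_lsn):
            self.stable.append(self.buffer.pop(0))


class BufferPool:
    def __init__(self, log):
        self.log = log
        self.blocks = {}      # block id -> (contents, lsn of the last update)
        self.disk = {}

    def update(self, block, contents, record):
        lsn = self.log.append(record)
        self.blocks[block] = (contents, lsn)

    def output(self, block):
        contents, lsn = self.blocks[block]
        self.log.force(up_to_lsn=lsn)   # WAL: log records first, then the data block
        self.disk[block] = contents
```

Committing a transaction would correspond to appending its commit record and calling `force()` with no argument, so that the commit record itself reaches stable storage.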
Page 35:
Database Buffering
The database maintains an in-memory buffer of data blocks
  When a new block is needed, if the buffer is full an existing block needs to be removed from the buffer
  If the block chosen for removal has been updated, it must be output to disk
As a result of the write-ahead logging rule, if a block with uncommitted updates is output to disk, log records with undo information for the updates are output to the log on stable storage first.
No updates should be in progress on a block when it is output to disk. This can be ensured as follows.
  Before writing a data item, the transaction acquires an exclusive lock on the block containing the data item
  The lock can be released once the write is completed.
    Such locks held for a short duration are called latches.
  Before a block is output to disk, the system acquires an exclusive latch on the block
    Ensures no update can be in progress on the block
Page 36:
Buffer Management (Cont.)
The database buffer can be implemented either
  in an area of real main-memory reserved for the database, or
  in virtual memory
Implementing the buffer in reserved main-memory has drawbacks:
  Memory is partitioned before-hand between the database buffer and applications, limiting flexibility.
  Needs may change, and although the operating system knows best how memory should be divided up at any time, it cannot change the partitioning of memory.
Page 37:
Buffer Management (Cont.)
Database buffers are generally implemented in virtual memory in spite of some drawbacks:
  When the operating system needs to evict a page that has been modified, to make space for another page, the page is written to swap space on disk.
  When the database decides to write a buffer page to disk, the buffer page may be in swap space, and may have to be read from swap space on disk and output to the database on disk, resulting in extra I/O!
    Known as the dual paging problem.
  Ideally, when swapping out a database buffer page, the operating system should pass control to the database, which in turn outputs the page to the database instead of to swap space (making sure to output log records first)
    Dual paging can thus be avoided, but common operating systems do not support such functionality.
Page 38:
Failure with Loss of Nonvolatile Storage
So far we assumed no loss of non-volatile storage
A technique similar to checkpointing is used to deal with loss of non-volatile storage
  Periodically dump the entire content of the database to stable storage
  No transaction may be active during the dump procedure; a procedure similar to checkpointing must take place
    Output all log records currently residing in main memory onto stable storage.
    Output all buffer blocks onto the disk.
    Copy the contents of the database to stable storage.
    Output a record <dump> to the log on stable storage.
  To recover from disk failure
    restore the database from the most recent dump.
    Consult the log and redo all transactions that committed after the dump
Can be extended to allow transactions to be active during the dump; known as fuzzy dump or online dump
  Will study fuzzy checkpointing later
Page 39:
Page 40:
Advanced Recovery Techniques
Support high-concurrency locking techniques, such as those used for B+-tree concurrency control
Operations like B+-tree insertions and deletions release locks early.
  They cannot be undone by restoring old values (physical undo), since once a lock is released, other transactions may have updated the B+-tree.
  Instead, insertions (resp. deletions) are undone by executing a deletion (resp. insertion) operation (known as logical undo).
For such operations, undo log records should contain the undo operation to be executed
  called logical undo logging, in contrast to physical undo logging.
Redo information is logged physically (that is, the new value for each write) even for such operations
  Logical redo is very complicated since the database state on disk may not be "operation consistent"
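Page 41:
Advanced Recovery Techniques (Cont.)
Operation logging is done as follows:
  1. When the operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique identifier of the operation instance.
  2. While the operation is executing, normal log records with physical redo and physical undo information are logged.
  3. When the operation completes, <Ti, Oj, operation-end, U> is logged, where U contains the information needed to perform a logical undo.
If a crash/rollback occurs before the operation completes:
  the operation-end log record is not found, and
  the physical undo information is used to undo the operation.
If a crash/rollback occurs after the operation completes:
  the operation-end log record is found, and in this case
  logical undo is performed using U; the physical undo information for the operation is ignored.
Redo of an operation (after a crash) still uses physical redo information.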
Page 42:
Advanced Recovery Techniques (Cont.)
Rollback of transaction Ti is done as follows:
Scan the log backwards
  1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a special redo-only log record <Ti, X, V1>.
  2. If a <Ti, Oj, operation-end, U> record is found
     Rollback the operation logically using the undo information U.
       Updates performed during roll back are logged just like during normal operation execution.
       At the end of the operation rollback, instead of logging an operation-end record, generate a record <Ti, Oj, operation-abort>.
     Skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found
Page 43:
Advanced Recovery Techniques (Cont.)
Scan the log backwards (cont.):
  3. If a redo-only record is found, ignore it
  4. If a <Ti, Oj, operation-abort> record is found:
     skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found.
  5. Stop the scan when the record <Ti, start> is found
  6. Add a <Ti, abort> record to the log
Some points to note:
  Cases 3 and 4 above can occur only if the database crashes while a transaction is being rolled back.
  Skipping of log records as in case 4 is important to prevent multiple rollback of the same operation.
Page 44:
Advanced Recovery Techniques (Cont.)
The following actions are taken when recovering from a system crash
  1. Scan the log forward from the last <checkpoint L> record
     Repeat history by physically redoing all updates of all transactions,
     Create an undo-list during the scan as follows
       undo-list is set to L initially
       Whenever <Ti start> is found, Ti is added to undo-list
       Whenever <Ti commit> or <Ti abort> is found, Ti is deleted from undo-list
This brings the database to its state as of the crash, with committed as well as uncommitted transactions having been redone.
Now undo-list contains transactions that are incomplete, that is, have neither committed nor been fully rolled back.
Page 45:
Advanced Recovery Techniques (Cont.)
Recovery from system crash (cont.)
  2. Scan the log backwards, performing undo on log records of transactions found in undo-list.
     Transactions are rolled back as described earlier.
     When <Ti start> is found for a transaction Ti in undo-list, write a <Ti abort> log record.
     Stop the scan when <Ti start> records have been found for all Ti in undo-list
This undoes the effects of incomplete transactions (those with neither commit nor abort log records). Recovery is now complete.
Page 46:
Advanced Recovery Techniques (Cont.)
Checkpointing is done as follows:
  1. Output all log records in memory to stable storage
  2. Output to disk all modified buffer blocks
  3. Output to the log on stable storage a <checkpoint L> record.
Transactions are not allowed to perform any actions while checkpointing is in progress.
Fuzzy checkpointing allows transactions to progress while the most time consuming parts of checkpointing are in progress
  Performed as described on the next slide
Page 47:
Advanced Recovery Techniques (Cont.)
Fuzzy checkpointing is done as follows:
  1. Temporarily stop all updates by transactions
  2. Write a <checkpoint L> log record and force the log to stable storage
  3. Note the list M of modified buffer blocks
  4. Now permit transactions to proceed with their actions
  5. Output to disk all modified buffer blocks in list M
     blocks should not be updated while being output
     Follow WAL: all log records pertaining to a block must be output before the block is output
  6. Store a pointer to the checkpoint record in a fixed position last_checkpoint on disk
When recovering using a fuzzy checkpoint, start the scan from the checkpoint record pointed to by last_checkpoint
  Log records before last_checkpoint have their updates reflected in the database on disk, and need not be redone.
  Incomplete checkpoints, where the system had crashed while performing a checkpoint, are handled safely
Page 48:
ARIES Recovery Algorithm
Page 49:
ARIES
ARIES is a state of the art recovery method
  Incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery
  The "advanced recovery algorithm" we studied earlier is modeled after ARIES, but greatly simplified by removing optimizations
Unlike the advanced recovery algorithm, ARIES
  1. Uses a log sequence number (LSN) to identify log records
     Stores LSNs in pages to identify what updates have already been applied to a database page
  2. Physiological redo
  3. Dirty page table to avoid unnecessary redos during recovery
  4. Fuzzy checkpointing that only records information about dirty pages, and does not require dirty pages to be written out at checkpoint time
     More coming up on each of the above ...
Page 50:
ARIES Optimizations
Physiological redo
  The affected page is physically identified; the action within the page can be logical
    Used to reduce logging overheads
      e.g. when a record is deleted and all other records have to be moved to fill the hole
        Physiological redo can log just the record deletion
        Physical redo would require logging of old and new values for much of the page
    Requires the page to be output to disk atomically
      Easy to achieve with hardware RAID, also supported by some disk systems
      Incomplete page output can be detected by checksum techniques,
        But extra actions are required for recovery
        Treated as a media failure
Page 51:
ARIES Data Structures
A log sequence number (LSN) identifies each log record
  Must be sequentially increasing
  Typically an offset from the beginning of the log file to allow fast access
    Easily extended to handle multiple log files
Each page contains a PageLSN which is the LSN of the last log record whose effects are reflected on the page
  To update a page:
    X-latch the page, and write the log record
    Update the page
    Record the LSN of the log record in PageLSN
    Unlock the page
  A page flush to disk S-latches the page
    Thus the page state on disk is operation consistent
      Required to support physiological redo
  PageLSN is used during recovery to prevent repeated redo
    Thus ensuring idempotence
Page 52:
ARIES Data Structures (Cont.)
Each log record contains the LSN of the previous log record of the same transaction
  LSN  TransId  PrevLSN  RedoInfo  UndoInfo
  The LSN in the log record may be implicit
A special redo-only log record called a compensation log record (CLR) is used to log actions taken during recovery that never need to be undone
  Also serves the role of the operation-abort log records used in the advanced recovery algorithm
  Has a field UndoNextLSN to note the next (earlier) record to be undone
    Records in between would have already been undone
    Required to avoid repeated undo of already undone actions
  LSN  TransID  UndoNextLSN  RedoInfo
Page 53:
ARIES Data Structures (Cont.)
DirtyPageTable
  List of pages in the buffer that have been updated
  Contains, for each such page
    PageLSN of the page
    RecLSN, an LSN such that log records before this LSN have already been applied to the page version on disk
      Set to the current end of log when a page is inserted into the dirty page table (just before being updated)
      Recorded in checkpoints, helps to minimize redo work
Checkpoint log record
  Contains:
    DirtyPageTable and list of active transactions
    For each active transaction, LastLSN, the LSN of the last log record written by the transaction
  A fixed position on disk notes the LSN of the last completed checkpoint log record
Page 54:
ARIES Recovery Algorithm
ARIES recovery involves three passes
Analysis pass: Determines
  Which transactions to undo
  Which pages were dirty (disk version not up to date) at the time of the crash
  RedoLSN: the LSN from which redo should start
Redo pass:
  Repeats history, redoing all actions from RedoLSN
    RecLSN and PageLSNs are used to avoid redoing actions already reflected on the page
Undo pass:
  Rolls back all incomplete transactions
    Transactions whose abort was complete earlier are not undone
      Key idea: no need to undo these transactions: earlier undo actions were logged, and are redone as required
Page 55:
ARIES Recovery: Analysis
Analysis pass
  Starts from the last complete checkpoint log record
    Reads in the DirtyPageTable from the log record
    Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable
      In case no pages are dirty, RedoLSN = checkpoint record's LSN
    Sets undo-list = list of transactions in the checkpoint log record
    Reads the LSN of the last log record for each transaction in undo-list from the checkpoint log record
  Scans forward from the checkpoint
  Continued on the next page
Page 56:
ARIES Recovery: Analysis (Cont.)
Analysis pass (cont.)
  Scans forward from the checkpoint
    If any log record is found for a transaction not in undo-list, adds the transaction to undo-list
    Whenever an update log record is found
      If the page is not in DirtyPageTable, it is added with RecLSN set to the LSN of the update log record
    If a transaction end log record is found, delete the transaction from undo-list
    Keeps track of the last log record for each transaction in undo-list
      May be needed for later undo
  At the end of the analysis pass:
    RedoLSN determines where to start the redo pass
    RecLSN for each page in DirtyPageTable is used to minimize redo work
    All transactions in undo-list need to be rolled back
Page 57:
ARIES Redo Pass
Redo pass: repeats history by replaying every action not already reflected in the page on disk, as follows:
  Scans forward from RedoLSN. Whenever an update log record is found:
    1. If the page is not in DirtyPageTable or the LSN of the log record is less than the RecLSN of the page in DirtyPageTable, then skip the log record
    2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is less than the LSN of the log record, redo the log record
    NOTE: if either test is negative the effects of the log record have already appeared on the page. The first test avoids even fetching the page from disk!
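The two tests of the ARIES redo pass can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not ARIES itself: log records are dictionaries with hypothetical keys `lsn`, `page` and `new_value`, the dirty page table maps a page id to its RecLSN, and "fetching a page" is a dictionary lookup that the function counts.

```python
def redo_pass(log, redo_lsn, dirty_page_table, disk_pages):
    """Replay update records from redo_lsn; return how many pages were fetched.

    log:              list of dicts {"lsn", "page", "new_value"} in LSN order
    dirty_page_table: dict page id -> RecLSN
    disk_pages:       dict page id -> {"value", "page_lsn"}
    """
    fetched = 0
    for rec in log:
        if rec["lsn"] < redo_lsn:
            continue
        page_id = rec["page"]
        # Test 1: decided from the dirty page table alone, without any disk access.
        if page_id not in dirty_page_table or rec["lsn"] < dirty_page_table[page_id]:
            continue
        page = disk_pages[page_id]      # stands in for fetching the page from disk
        fetched += 1
        # Test 2: redo only if the page does not yet reflect this log record.
        if page["page_lsn"] < rec["lsn"]:
            page["value"] = rec["new_value"]
            page["page_lsn"] = rec["lsn"]
    return fetched
```

Updating `page_lsn` after each redo is what makes a repeated redo pass harmless: on a second run the second test fails for every record already applied.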
Page 58:
ARIES Undo Actions
When an undo is performed for an update log record
  Generate a CLR containing the undo action performed (actions performed during undo are logged physically or physiologically).
    The CLR for record n is noted as n' in the figure below
  Set UndoNextLSN of the CLR to the PrevLSN value of the update log record
    Arrows indicate the UndoNextLSN value
ARIES supports partial rollback
  Used e.g. to handle deadlocks by rolling back just enough to release the required locks
  The figure indicates forward actions after partial rollbacks
    records 3 and 4 initially, later 5 and 6, then full rollback
[Figure: log records 1 2 3 4 4' 3' 5 6 6' 5' 2' 1', with arrows showing the UndoNextLSN of each CLR]
Page 59:
ARIES: Undo Pass
Undo pass
  Performs a backward scan on the log, undoing all transactions in undo-list
    The backward scan is optimized by skipping unneeded log records as follows:
      The next LSN to be undone for each transaction is set to the LSN of the last log record for the transaction found by the analysis pass.
      At each step pick the largest of these LSNs to undo, skip back to it and undo it
      After undoing a log record
        For ordinary log records, set the next LSN to be undone for the transaction to the PrevLSN noted in the log record
        For compensation log records (CLRs) set the next LSN to be undone to the UndoNextLSN noted in the log record
          All intervening records are skipped since they would have been undone already
  Undos are performed as described earlier
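How PrevLSN and UndoNextLSN steer the undo pass can be sketched as follows. The record layout (dictionaries with hypothetical keys `type`, `txn`, `page`, `old`, `prev`, `undo_next`) and the function name `undo_pass` are assumptions for illustration; start records are omitted, so the first update of a transaction has `prev` set to `None`.

```python
def undo_pass(log, last_lsn_of, pages):
    """Roll back every transaction in last_lsn_of, writing CLRs as it goes.

    log:         dict lsn -> record (update records and CLRs)
    last_lsn_of: dict txn -> LSN of its last log record (from the analysis pass)
    pages:       dict page id -> current value
    """
    next_undo = dict(last_lsn_of)
    next_lsn = max(log) + 1
    while next_undo:
        # Pick the largest LSN among the transactions still being rolled back.
        txn = max(next_undo, key=next_undo.get)
        rec = log[next_undo[txn]]
        if rec["type"] == "update":
            pages[rec["page"]] = rec["old"]
            # The CLR's UndoNextLSN is the PrevLSN of the record just undone.
            log[next_lsn] = {"type": "clr", "txn": txn, "page": rec["page"],
                             "redo": rec["old"], "undo_next": rec["prev"]}
            next_lsn += 1
            target = rec["prev"]
        else:
            # A CLR is never undone: jump straight to its UndoNextLSN.
            target = rec["undo_next"]
        if target is None:
            del next_undo[txn]          # rollback of this transaction is complete
        else:
            next_undo[txn] = target
    return log
```

If a transaction had already undone records 4 and 3 before the crash (CLRs 4' and 3'), the pass starts at 3', jumps to record 2, and never touches 3 or 4 again.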
Page 60:
Other ARIES Features
Recovery independence
  Pages can be recovered independently of others
    E.g. if some disk pages fail they can be recovered from a backup while other pages are being used
Savepoints:
  Transactions can record savepoints and roll back to a savepoint
    Useful for complex transactions
    Also used to roll back just enough to release locks on deadlock
Page 61:
Other ARIES Features (Cont.)
Fine-grained locking:
  Index concurrency algorithms that permit tuple level locking on indices can be used
    These require logical undo, rather than physical undo, as in the advanced recovery algorithm
Recovery optimizations: For example:
  The dirty page table can be used to prefetch pages during redo
  Out of order redo is possible:
    redo can be postponed on a page being fetched from disk, and performed when the page is fetched.
    Meanwhile other log records can continue to be processed
Page 62:
Page 63:
Remote Backup Systems
Remote backup systems provide high availability by allowing transaction processing to continue even if the primary site is destroyed.
[Figure: the primary site sending log records over the network to the backup site]
Page 64:
Remote Backup Systems (Cont.)
Detection of failure: the backup site must detect when the primary site has failed
  to distinguish primary site failure from link failure, maintain several communication links between the primary and the remote backup.
Transfer of control:
  To take over control, the backup site first performs recovery using its copy of the database and all the log records it has received from the primary.
    Thus, completed transactions are redone and incomplete transactions are rolled back.
  When the backup site takes over processing it becomes the new primary
  To transfer control back to the old primary when it recovers, the old primary must receive redo logs from the old backup and apply all updates locally.
Page 65:
Remote Backup Systems (Cont.)
Time to recover: to reduce delay in takeover, the backup site periodically processes the redo log records (in effect, performing recovery from the previous database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-spare configuration permits very fast takeover:
  The backup continually processes redo log records as they arrive, applying the updates locally.
  When failure of the primary is detected, the backup rolls back incomplete transactions, and is ready to process new transactions.
Alternative to remote backup: a distributed database with replicated data
  Remote backup is faster and cheaper, but less tolerant to failure
    more on this in Chapter 19
Page 66:
Remote Backup Systems (Cont.)
Ensure durability of updates by delaying transaction commit until the update is logged at the backup; avoid this delay by permitting lower degrees of durability.
One-safe: commit as soon as the transaction's commit log record is written at the primary
  Problem: updates may not arrive at the backup before it takes over.
Two-very-safe: commit when the transaction's commit log record is written at the primary and the backup
  Reduces availability since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both primary and backup are active. If only the primary is active, the transaction commits as soon as its commit log record is written at the primary.
  Better availability than two-very-safe; avoids the problem of lost transactions in one-safe.
صفحه 67:
End of Chapter
صفحه 68:
input(A)
output(B)
صفحه 69:
Portion of the Database Log Corresponding to T0 and T1
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>
صفحه 70:
State of the Log and Database Corresponding to T0 and T1
Log Database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>
صفحه 71:
Portion of the System Log Corresponding to T0 and T1
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
<T0 commit>
<T1 start>
<T1, C, 700, 600>
<T1 commit>
صفحه 72:
State of System Log and Database Corresponding to T0 and T1
Log Database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
<T0 commit>
<T1 start>
<T1, C, 700, 600>
<T1 commit>
Chapter 17: Recovery System
Database System Concepts
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 17: Recovery System
Failure Classification
Storage Structure
Recovery and Atomicity
Log-Based Recovery
Shadow Paging
Recovery With Concurrent Transactions
Buffer Management
Failure with Loss of Nonvolatile Storage
Advanced Recovery Techniques
ARIES Recovery Algorithm
Remote Backup Systems
Database System Concepts, 5th Ed.
17.2
©Silberschatz, Korth and Sudarshan
Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to some internal error
condition
System errors: the database system must terminate an active transaction
due to an error condition (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes
the system to crash.
Fail-stop assumption: non-volatile storage contents are assumed to not be
corrupted by system crash
Database systems have numerous integrity checks to prevent corruption
of disk data
Disk failure: a head crash or similar disk failure destroys all or part of disk
storage
Destruction is assumed to be detectable: disk drives use checksums to detect
failures
Recovery Algorithms
Recovery algorithms are techniques to ensure database consistency and
transaction atomicity and durability despite failures
Focus of this chapter
Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough
information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state that
ensures atomicity, consistency and durability
Storage Structure
Volatile storage:
does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on distinct nonvolatile media
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
copies can be at remote sites to protect against disasters such as fire or
flooding.
Failure during data transfer can still result in inconsistent copies: Block transfer
can result in
Successful completion
Partial failure: destination block has incorrect information
Total failure: destination block was never updated
Protecting storage media from failure during data transfer (one solution):
Execute output operation as follows (assuming two copies of each block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information
onto the second physical block.
3. The output is completed only after the second write successfully
completes.
Stable-Storage Implementation (Cont.)
Protecting storage media from failure during data transfer (cont.):
Copies of a block may differ due to failure during output operation. To recover from
failure:
1. First find inconsistent blocks:
   1. Expensive solution: Compare the two copies of every disk block.
   2. Better solution:
      Record in-progress disk writes on non-volatile storage (non-volatile RAM
      or special area of disk).
      Use this information during recovery to find blocks that may be
      inconsistent, and only compare copies of these.
      Used in hardware RAID systems
2. If either copy of an inconsistent block is detected to have an error (bad checksum),
overwrite it by the other copy. If both have no error, but are different, overwrite
the second block by the first block.
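The two-copy output protocol and the recovery rule above can be sketched as follows. This is a minimal illustration, not the slides' implementation: the dictionary block layout, the CRC32 checksum, and the names make_block, stable_output and recover are all assumptions made for the example.

```python
import zlib

def make_block(data: bytes) -> dict:
    """Store the payload together with a CRC32 checksum (assumed block format)."""
    return {"data": data, "crc": zlib.crc32(data)}

def is_valid(block: dict) -> bool:
    """A block is valid when its stored checksum matches its payload."""
    return zlib.crc32(block["data"]) == block["crc"]

def stable_output(copies: list, data: bytes, crash_after_first: bool = False) -> None:
    """Write copy 0, then copy 1; the output completes only after the second write."""
    copies[0] = make_block(data)
    if crash_after_first:
        return  # simulated failure between the two writes
    copies[1] = make_block(data)

def recover(copies: list) -> None:
    """Make the two copies agree again after a failure."""
    first_ok, second_ok = is_valid(copies[0]), is_valid(copies[1])
    if first_ok and not second_ok:
        copies[1] = dict(copies[0])      # bad checksum on second copy
    elif second_ok and not first_ok:
        copies[0] = dict(copies[1])      # bad checksum on first copy
    elif first_ok and second_ok and copies[0]["data"] != copies[1]["data"]:
        copies[1] = dict(copies[0])      # both valid but different: first overwrites second

copies = [make_block(b"old"), make_block(b"old")]
stable_output(copies, b"new", crash_after_first=True)
recover(copies)
```

After recovery both copies hold the new value, because the first write had completed before the simulated crash.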
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the
following two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
Ti's local copy of a data item X is called xi.
We assume, for simplicity, that each data item fits in, and is stored inside, a single
block.
Data Access (Cont.)
Transaction transfers data items between system buffer blocks and its private
work-area using the following operations :
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item X in the buffer
block.
Both these commands may necessitate the issue of an input(BX) instruction
before the assignment, if the block BX in which X resides is not already in
memory.
Transactions
Perform read(X) while accessing X for the first time;
All subsequent accesses are to the local copy.
After last access, transaction executes write(X).
output(BX) need not immediately follow write(X). System can perform the output
operation when it deems fit.
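A small simulation may help separate the four operations. It is only a sketch: the dictionaries standing in for disk and buffer, the BLOCK_OF mapping, and the Transaction class are hypothetical names introduced for illustration.

```python
disk = {"BX": {"X": 100}}   # physical blocks on disk
buffer = {}                 # buffer blocks in main memory
BLOCK_OF = {"X": "BX"}      # assumed mapping: data item -> block holding it

def input_block(b):
    """input(B): copy physical block B into the buffer."""
    buffer[b] = dict(disk[b])

def output_block(b):
    """output(B): copy buffer block B back over the physical block."""
    disk[b] = dict(buffer[b])

class Transaction:
    def __init__(self):
        self.local = {}     # private work area holding the local copies xi

    def read(self, item):
        b = BLOCK_OF[item]
        if b not in buffer:          # block not in memory yet: issue input(BX)
            input_block(b)
        self.local[item] = buffer[b][item]

    def write(self, item):
        b = BLOCK_OF[item]
        if b not in buffer:
            input_block(b)
        buffer[b][item] = self.local[item]

t = Transaction()
t.read("X")
t.local["X"] -= 50          # all work happens on the local copy
t.write("X")
state_before_output = disk["BX"]["X"]   # disk is unchanged until output(BX)
output_block("BX")
```

The point of the sketch is that write(X) only changes the buffer block; the disk changes when the system later chooses to perform output(BX).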
Example of Data Access
[Figure: buffer blocks A and B in main memory, the disk, and the work areas of T1 (x1, y1)
and T2 (x2); input(A) and output(B) move blocks between disk and buffer, read(X) and
write(Y) move data items between buffer blocks and a work area.]
Recovery and Atomicity
Modifying the database without ensuring that the transaction will commit may leave
the database in an inconsistent state.
Consider transaction Ti that transfers $50 from account A to account B; goal is
either to perform all database modifications made by Ti or none at all.
Several output operations may be required for Ti (to output A and B). A failure
may occur after one of these modifications has been made but before all of them
are made.
Recovery and Atomicity (Cont.)
To ensure atomicity despite failures, we first output information describing the
modifications to stable storage without modifying the database itself.
We study two approaches:
log-based recovery, and
shadow-paging
We assume (initially) that transactions run serially, that is, one after the other.
Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a record of update
activities on the database.
When transaction Ti starts, it registers itself by writing a
<Ti start> log record
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1
is the value of X before the write, and V2 is the value to be written to X.
Log record notes that Ti has performed a write on data item Xj. Xj had value
V1 before the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage (that is,
they are not buffered)
Two approaches using logs
Deferred database modification
Immediate database modification
Deferred Database Modification
The deferred database modification scheme records all modifications to the log,
but defers all the writes to after partial commit.
Assume that transactions execute serially
Transaction starts by writing <Ti start> record to log.
A write(X) operation results in a log record <Ti, X, V> being written, where
V is the new value for X
Note: old value is not needed for this scheme
The write is not performed on X at this time, but is deferred.
When Ti partially commits, <Ti commit> is written to the log
Finally, the log records are read and used to actually execute the previously
deferred writes.
Deferred Database Modification (Cont.)
During recovery after a crash, a transaction needs to be redone if and only if both
<Ti start> and <Ti commit> are there in the log.
Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the
transaction to the new values.
Crashes can occur while
the transaction is executing the original updates, or
while recovery action is being taken
Example transactions T0 and T1 (T0 executes before T1):
T0: read(A)              T1: read(C)
    A := A - 50              C := C - 100
    write(A)                 write(C)
    read(B)
    B := B + 50
    write(B)
Deferred Database Modification (Cont.)
Below we show the log as it appears at three instances of time.
[Figure: the log at three instants: (a) before <T0 commit> is written, (b) after
<T0 commit> but before <T1 commit>, (c) after <T1 commit>.]
If log on stable storage at time of crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <T1 commit> are present
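Case (b) can be traced with a short sketch of deferred-modification recovery. The tuple encoding of log records and the function name recover_deferred are assumptions for illustration; the initial balances 1000, 2000 and 700 are the ones used in the immediate-modification example of this chapter.

```python
def recover_deferred(log, db):
    """Redo every transaction that has both a start and a commit record."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    to_redo = started & committed
    for rec in log:                       # forward scan of the log
        if rec[0] == "write" and rec[1] in to_redo:
            _, _txn, item, new_value = rec
            db[item] = new_value          # setting a value is idempotent
    return db

# Log as in case (b): T0 committed, T1 did not.
log_b = [
    ("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"), ("write", "T1", "C", 600),
]
db = recover_deferred(log_b, {"A": 1000, "B": 2000, "C": 700})
```

Only T0 is redone; T1's deferred write to C is simply never applied, so no undo is required under this scheme.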
Immediate Database Modification
The immediate database modification scheme allows database updates of an
uncommitted transaction to be made as the writes are issued
since undoing may be needed, update logs must have both old value and new
value
Update log record must be written before database item is written
We assume that the log record is output directly to stable storage
Can be extended to postpone log record output, so long as prior to execution
of an output(B) operation for a data block B, all log records corresponding
to items in B have been flushed to stable storage
Output of updated blocks can take place at any time before or after transaction
commit
Order in which blocks are output can be different from the order in which they
are written.
Immediate Database Modification Example
Log                      Write        Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                         A = 950
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
                                      BB, BC
<T1 commit>
                                      BA
Note: BX denotes block containing X.
Immediate Database Modification (Cont.)
Recovery procedure has two operations instead of one:
undo(Ti) restores the value of all data items updated by Ti to their old values,
going backwards from the last log record for Ti
redo(Ti) sets the value of all data items updated by Ti to the new values, going
forward from the first log record for Ti
Both operations must be idempotent
That is, even if the operation is executed multiple times the effect is the same as
if it is executed once
Needed since operations may get re-executed during recovery
When recovering after failure:
Transaction Ti needs to be undone if the log contains the record
<Ti start>, but does not contain the record <Ti commit>.
Transaction Ti needs to be redone if the log contains both the record <Ti start>
and the record <Ti commit>.
Undo operations are performed first, then redo operations.
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.
[Figure: the log at three instants: (a) before <T0 commit> is written, (b) after
<T0 commit> but before <T1 commit>, (c) after <T1 commit>.]
Recovery actions in each case above are:
(a) undo(T0): B is restored to 2000 and A to 1000.
(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
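The undo-then-redo procedure for case (b) can be sketched as below. The tuple log format and the name recover_immediate are illustrative assumptions; the on-disk state passed in is one possible state at the time of the crash, chosen for the example.

```python
def recover_immediate(log, db):
    """Undo uncommitted transactions, then redo committed ones."""
    started = {r[1] for r in log if r[0] == "start"}
    committed = {r[1] for r in log if r[0] == "commit"}
    # Undo pass: scan backwards, restoring old values (V1).
    for r in reversed(log):
        if r[0] == "update" and r[1] in started - committed:
            db[r[2]] = r[3]
    # Redo pass: scan forwards, installing new values (V2).
    for r in log:
        if r[0] == "update" and r[1] in committed:
            db[r[2]] = r[4]
    return db

# Log as in case (b): records are ("update", Ti, X, V1, V2).
log_b = [
    ("start", "T0"), ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "C", 700, 600),
]
# Assumed disk state at the crash: A not yet output, B and C already output.
db = recover_immediate(log_b, {"A": 1000, "B": 2050, "C": 600})
again = recover_immediate(log_b, dict(db))   # running recovery twice changes nothing
```

Because both passes only assign logged values, repeating recovery after a second crash gives the same result, which is the idempotence requirement stated above.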
Checkpoints
Problems in recovery procedure as discussed earlier:
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have already
output their updates to the database.
Streamline recovery procedure by periodically performing checkpointing
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint> onto stable storage.
Checkpoints (Cont.)
During recovery we need to consider only the most recent transaction Ti that
started before the checkpoint, and transactions that started after Ti.
1. Scan backwards from end of log to find the most recent <checkpoint>
record
2. Continue scanning backwards till a record <Ti start> is found.
3. Need only consider the part of log following above start record. Earlier
part of log can be ignored during recovery, and can be erased whenever
desired.
4. For all transactions (starting from Ti or later) with no <Ti commit>,
execute undo(Ti). (Done only in case of immediate modification.)
5. Scanning forward in the log, for all transactions starting from Ti or
later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints
[Figure: timeline of transactions T1 to T4 relative to the checkpoint at time Tc and the
system failure at time Tf.]
T1 can be ignored (updates already output to disk due to checkpoint)
T2 and T3 redone.
T4 undone
Shadow Paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if
transactions execute serially
Idea: maintain two page tables during the lifetime of a transaction: the current page
table, and the shadow page table
Store the shadow page table in nonvolatile storage, such that state of the database prior
to transaction execution may be recovered.
Shadow page table is never modified during execution
To start with, both the page tables are identical. Only current page table is used for data
item accesses during execution of the transaction.
Whenever any page is about to be written for the first time
A copy of this page is made onto an unused page.
The current page table is then made to point to the copy
The update is performed on the copy
Sample Page Table
Example of Shadow Paging
Shadow and current page tables after write to page 4
Shadow Paging (Cont.)
To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
keep a pointer to the shadow page table at a fixed (known) location on disk.
to make the current page table the new shadow page table, simply update the
pointer to point to current page table on disk
Once pointer to shadow page table has been written, transaction is committed.
No recovery is needed after a crash — new transactions can start right away,
using the shadow page table.
Pages not pointed to from current/shadow page table should be freed (garbage
collected).
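The copy-on-first-write rule and the pointer-swap commit can be sketched as a toy in-memory model. Everything here (the ShadowPagedDB class, its dictionaries standing in for disk pages and page tables, the crash method) is a hypothetical illustration; garbage collection of unreferenced pages is left out.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.disk_pages = dict(enumerate(pages))        # page id -> contents
        self.page_tables = {0: list(self.disk_pages)}   # table id -> list of page ids
        self.shadow_ptr = 0      # fixed location on disk: id of the shadow page table
        self.current = None      # current page table (exists only during a transaction)
        self.copied = set()

    def begin(self):
        # Current page table starts out identical to the shadow page table.
        self.current = list(self.page_tables[self.shadow_ptr])
        self.copied = set()

    def write(self, logical_page, value):
        if logical_page not in self.copied:   # first write: copy to an unused page
            new_id = max(self.disk_pages) + 1
            self.disk_pages[new_id] = self.disk_pages[self.current[logical_page]]
            self.current[logical_page] = new_id
            self.copied.add(logical_page)
        self.disk_pages[self.current[logical_page]] = value

    def read_committed(self, logical_page):
        table = self.page_tables[self.shadow_ptr]
        return self.disk_pages[table[logical_page]]

    def commit(self):
        table_id = max(self.page_tables) + 1
        self.page_tables[table_id] = self.current   # output current page table
        self.shadow_ptr = table_id                  # pointer update is the commit point
        self.current = None

    def crash(self):
        self.current = None     # the uncommitted current page table is lost

db = ShadowPagedDB(["a", "b", "c"])
db.begin(); db.write(1, "B"); db.crash()
after_crash = db.read_committed(1)      # old contents, no recovery step needed
db.begin(); db.write(1, "B2"); db.commit()
after_commit = db.read_committed(1)
```

The shadow page table is never touched by write, which is why a crash before the pointer update leaves the pre-transaction state intact.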
Shadow Paging (Cont.)
Advantages of shadow-paging over log-based schemes
no overhead of writing log records
recovery is trivial
Disadvantages :
Copying the entire page table is very expensive
Can be reduced by using a page table structured like a B+-tree
– No need to copy entire tree, only need to copy paths in the tree that lead to
updated leaf nodes
Commit overhead is high even with above extension
Need to flush every updated page, and page table
Data gets fragmented (related pages get separated on disk)
After every transaction completion, the database pages containing old versions of
modified data need to be garbage collected
Hard to extend algorithm to allow transactions to run concurrently
Easier to extend log based schemes
Recovery With Concurrent Transactions
We modify the log-based recovery schemes to allow multiple transactions to execute
concurrently.
All transactions share a single disk buffer and a single log
A buffer block can have data items updated by one or more transactions
We assume concurrency control using strict two-phase locking;
i.e. the updates of uncommitted transactions should not be visible to other
transactions
Otherwise how to perform undo if T1 updates A, then T2 updates A
and commits, and finally T1 has to abort?
Logging is done as described earlier.
Log records of different transactions may be interspersed in the log.
The checkpointing technique and actions taken on recovery have to be changed
since several transactions may be active when a checkpoint is performed.
Recovery With Concurrent Transactions (Cont.)
Checkpoints are performed as before, except that the checkpoint log record is now of
the form
<checkpoint L>
where L is the list of transactions active at the time of the checkpoint
We assume no updates are in progress while the checkpoint is carried out (will
relax this later)
When the system recovers from a crash, it first does the following:
1. Initialize undo-list and redo-list to empty
2. Scan the log backwards from the end, stopping when the first <checkpoint L>
record is found.
For each record found during the backward scan:
if the record is <Ti commit>, add Ti to redo-list
if the record is <Ti start>, then if Ti is not in redo-list, add Ti to undo-list
3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list
Recovery With Concurrent Transactions (Cont.)
At this point undo-list consists of incomplete transactions which must be undone,
and redo-list consists of finished transactions that must be redone.
Recovery now continues as follows:
1. Scan log backwards from most recent record, stopping when
<Ti start> records have been encountered for every Ti in undo-list.
During the scan, perform undo for each log record that belongs to a
transaction in undo-list.
2. Locate the most recent <checkpoint L> record.
3. Scan log forwards from the <checkpoint L> record till the end of the log.
During the scan, perform redo for each log record that belongs to a
transaction on redo-list
Example of Recovery
Go over the steps of the recovery algorithm on the following log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
<T1 start>    /* Backward undo scan stops here */
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
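The steps can be checked against this log with a short sketch. The tuple encoding, the function names build_lists and recover_concurrent, and the disk state assumed at the crash are illustrative choices; the sketch also assumes the log contains at least one checkpoint record.

```python
log = [
    ("start", "T0"), ("update", "T0", "A", 0, 10), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "B", 0, 10),
    ("start", "T2"), ("update", "T2", "C", 0, 10), ("update", "T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("start", "T3"), ("update", "T3", "A", 10, 20), ("update", "T3", "D", 0, 10),
    ("commit", "T3"),
]

def build_lists(log):
    """Backward scan to the most recent checkpoint, building undo-list and redo-list."""
    undo_list, redo_list = [], []
    for rec in reversed(log):
        if rec[0] == "commit":
            redo_list.append(rec[1])
        elif rec[0] == "start" and rec[1] not in redo_list:
            undo_list.append(rec[1])
        elif rec[0] == "checkpoint":
            for t in rec[1]:                 # transactions active at the checkpoint
                if t not in redo_list and t not in undo_list:
                    undo_list.append(t)
            break
    return undo_list, redo_list

def recover_concurrent(log, db):
    undo_list, redo_list = build_lists(log)
    pending = set(undo_list)
    for rec in reversed(log):                # undo pass, backwards
        if not pending:
            break                            # every <Ti start> in undo-list was seen
        if rec[0] == "update" and rec[1] in undo_list:
            db[rec[2]] = rec[3]              # restore old value
        elif rec[0] == "start" and rec[1] in pending:
            pending.discard(rec[1])
    cp = max(i for i, r in enumerate(log) if r[0] == "checkpoint")
    for rec in log[cp:]:                     # redo pass, forwards from the checkpoint
        if rec[0] == "update" and rec[1] in redo_list:
            db[rec[2]] = rec[4]              # install new value
    return db

lists = build_lists(log)
recovered = recover_concurrent(log, {"A": 20, "B": 10, "C": 20, "D": 10})
```

For this log the undo-list is T1 and T2 and the redo-list is T3, so the backward scan runs until <T1 start> and the forward scan reapplies only T3's updates.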
Log Record Buffering
Log record buffering: log records are buffered in main memory, instead of
being output directly to stable storage.
Log records are output to stable storage when a block of log records in the
buffer is full, or a log force operation is executed.
Log force is performed to commit a transaction by forcing all its log records
(including the commit record) to stable storage.
Several log records can thus be output using a single output operation, reducing the
I/O cost.
Log Record Buffering (Cont.)
The rules below must be followed if log records are buffered:
Log records are output to stable storage in the order in which they are
created.
Transaction Ti enters the commit state only when the log record
<Ti commit> has been output to stable storage.
Before a block of data in main memory is output to the database, all log
records pertaining to data in that block must have been output to stable storage.
This rule is called the write-ahead logging or WAL rule
– Strictly speaking WAL only requires undo information to be output
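The three rules can be sketched with a toy log manager and buffer manager. The classes LogManager and BufferManager, the integer sequence numbers used to order records, and the tuple record format are assumptions made for this example, not part of the slides at this point.

```python
class LogManager:
    def __init__(self):
        self.buffer = []      # log records still in main memory
        self.stable = []      # log records already on stable storage
        self.next_lsn = 0     # assumed sequence number, used only for ordering

    def append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.buffer.append((lsn, record))
        return lsn

    def force(self, up_to_lsn):
        """Output buffered records in creation order, up to and including up_to_lsn."""
        while self.buffer and self.buffer[0][0] <= up_to_lsn:
            self.stable.append(self.buffer.pop(0))

class BufferManager:
    def __init__(self, log):
        self.log = log
        self.blocks = {}      # block id -> (contents, number of last log record for it)
        self.disk = {}

    def update(self, block, txn, item, old, new):
        lsn = self.log.append((txn, item, old, new))
        contents = dict(self.blocks.get(block, ({}, -1))[0])
        contents[item] = new
        self.blocks[block] = (contents, lsn)

    def output(self, block):
        contents, lsn = self.blocks[block]
        self.log.force(lsn)               # WAL: log records reach stable storage first
        self.disk[block] = dict(contents)

    def commit(self, txn):
        lsn = self.log.append((txn, "commit"))
        self.log.force(lsn)               # commit state only after this returns

log = LogManager()
bm = BufferManager(log)
bm.update("BA", "T0", "A", 1000, 950)
stable_before_output = list(log.stable)   # nothing forced yet
bm.output("BA")
bm.commit("T0")
```

Forcing by prefix keeps records in creation order, so outputting a block may also flush earlier records that belong to other blocks.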
Database Buffering
Database maintains an in-memory buffer of data blocks
When a new block is needed, if buffer is full an existing block needs to be
removed from buffer
If the block chosen for removal has been updated, it must be output to disk
As a result of the write-ahead logging rule, if a block with uncommitted updates is output
to disk, log records with undo information for the updates are output to the log on stable
storage first.
No updates should be in progress on a block when it is output to disk. Can be ensured
as follows.
Before writing a data item, transaction acquires exclusive lock on block containing
the data item
Lock can be released once the write is completed.
Such locks held for short duration are called latches.
Before a block is output to disk, the system acquires an exclusive latch on the block
Ensures no update can be in progress on the block
Buffer Management (Cont.)
Database buffer can be implemented either
in an area of real main-memory reserved for the database, or
in virtual memory
Implementing buffer in reserved main-memory has drawbacks:
Memory is partitioned before-hand between database buffer and
applications, limiting flexibility.
Needs may change, and although operating system knows best how
memory should be divided up at any time, it cannot change the partitioning of
memory.
Buffer Management (Cont.)
Database buffers are generally implemented in virtual memory in spite of some
drawbacks:
When operating system needs to evict a page that has been modified, to make
space for another page, the page is written to swap space on disk.
When database decides to write buffer page to disk, buffer page may be in
swap space, and may have to be read from swap space on disk and output
to the database on disk, resulting in extra I/O!
Known as dual paging problem.
Ideally when swapping out a database buffer page, operating system should
pass control to database, which in turn outputs page to database instead of to
swap space (making sure to output log records first)
Dual paging can thus be avoided, but common operating systems do not
support such functionality.
Failure with Loss of Nonvolatile Storage
So far we assumed no loss of non-volatile storage
Technique similar to checkpointing used to deal with loss of non-volatile storage
Periodically dump the entire content of the database to stable storage
No transaction may be active during the dump procedure; a procedure similar to
checkpointing must take place
Output all log records currently residing in main memory onto stable storage.
Output all buffer blocks onto the disk.
Copy the contents of the database to stable storage.
Output a record <dump> to log on stable storage.
To recover from disk failure
restore database from most recent dump.
Consult the log and redo all transactions that committed after the dump
Can be extended to allow transactions to be active during dump;
known as fuzzy dump or online dump
Will study fuzzy checkpointing later
Advanced Recovery Algorithm
Advanced Recovery Techniques
Support high-concurrency locking techniques, such as those used for B+-tree
concurrency control
Operations like B+-tree insertions and deletions release locks early.
They cannot be undone by restoring old values (physical undo), since once a
lock is released, other transactions may have updated the B+-tree.
Instead, insertions (resp. deletions) are undone by executing a deletion (resp.
insertion) operation (known as logical undo).
For such operations, undo log records should contain the undo operation to be
executed
called logical undo logging, in contrast to physical undo logging.
Redo information is logged physically (that is, new value for each write) even for
such operations
Logical redo is very complicated since database state on disk may not be
“operation consistent”
Advanced Recovery Techniques (Cont.)
Operation logging is done as follows:
1. When operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique
identifier of the operation instance.
2. While operation is executing, normal log records with physical redo and physical
undo information are logged.
3. When operation completes, <Ti, Oj, operation-end, U> is logged, where U
contains information needed to perform a logical undo.
If crash/rollback occurs before operation completes:
the operation-end log record is not found, and
the physical undo information is used to undo operation.
If crash/rollback occurs after the operation completes:
the operation-end log record is found, and in this case
logical undo is performed using U; the physical undo information for the
operation is ignored.
Redo of operation (after crash) still uses physical redo information.
Advanced Recovery Techniques (Cont.)
Rollback of transaction Ti is done as follows:
Scan the log backwards
1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a
special redo-only log record <Ti, X, V1>.
2. If a <Ti, Oj, operation-end, U> record is found
Rollback the operation logically using the undo information U.
– Updates performed during roll back are logged just like during
normal operation execution.
– At the end of the operation rollback, instead of logging an
operation-end record, generate a record <Ti, Oj, operation-abort>.
Skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found
Advanced Recovery Techniques (Cont.)
Scan the log backwards (cont.):
3. If a redo-only record is found ignore it
4. If a <Ti, Oj, operation-abort> record is found:
skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found.
5. Stop the scan when the record <Ti start> is found
6. Add a <Ti abort> record to the log
Some points to note:
Cases 3 and 4 above can occur only if the database crashes while a
transaction is being rolled back.
Skipping of log records as in case 4 is important to prevent multiple rollback of
the same operation.
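The six rollback cases can be sketched as one backward scan. This is a simplified illustration under stated assumptions: log records are tuples, a Python set stands in for the B+-tree index, U is encoded as a pair such as ("delete", key), and the updates made while undoing an operation are not themselves logged here, unlike in the scheme described above.

```python
def rollback(log, txn, db, index):
    """Roll back txn; db holds data items, index is a set standing in for a B+-tree."""
    skip_until = None                      # operation whose records are being skipped
    for rec in reversed(list(log)):        # scan a snapshot; new records are appended
        if rec[1] != txn:
            continue
        kind = rec[0]
        if skip_until is not None:
            if kind == "operation-begin" and rec[2] == skip_until:
                skip_until = None
            continue
        if kind == "update":               # case 1: physical undo plus redo-only record
            db[rec[2]] = rec[3]
            log.append(("redo-only", txn, rec[2], rec[3]))
        elif kind == "operation-end":      # case 2: logical undo using U
            op_id, (undo_op, key) = rec[2], rec[3]
            if undo_op == "delete":
                index.discard(key)
            else:
                index.add(key)
            log.append(("operation-abort", txn, op_id))
            skip_until = op_id
        elif kind == "operation-abort":    # case 4: operation was already rolled back
            skip_until = rec[2]
        elif kind == "start":              # cases 5 and 6
            log.append(("abort", txn))
            break
        # case 3: redo-only records fall through and are ignored

log = [
    ("start", "T1"),
    ("operation-begin", "T1", "O1"),
    ("update", "T1", "leaf7", "old", "new"),      # physical record inside the operation
    ("operation-end", "T1", "O1", ("delete", 42)),
    ("update", "T1", "A", 100, 50),
]
db, index = {"leaf7": "new", "A": 50}, {42}
rollback(log, "T1", db, index)
```

Note that the physical record for leaf7 is skipped once the operation-end record has been seen: the insertion of key 42 is undone logically, and the page contents are left as they are.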
Advanced Recovery Techniques (Cont.)
The following actions are taken when recovering from system crash
1. Scan log forward from last <checkpoint L> record
   1. Repeat history by physically redoing all updates of all transactions,
   2. Create an undo-list during the scan as follows
      undo-list is set to L initially
      Whenever <Ti start> is found Ti is added to undo-list
      Whenever <Ti commit> or <Ti abort> is found, Ti is deleted from
      undo-list
This brings database to state as of crash, with committed as well as uncommitted
transactions having been redone.
Now undo-list contains transactions that are incomplete, that is, have neither
committed nor been fully rolled back.
Advanced Recovery Techniques (Cont.)
Recovery from system crash (cont.)
2. Scan log backwards, performing undo on log records of transactions found in
undo-list.
Transactions are rolled back as described earlier.
When <Ti start> is found for a transaction Ti in undo-list, write a <Ti
abort> log record.
Stop scan when <Ti start> records have been found for all Ti in undo-list
This undoes the effects of incomplete transactions (those with neither commit
nor abort log records). Recovery is now complete.
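The two passes can be sketched as follows for the purely physical case. The tuple log format and the name recover_repeat_history are assumptions for illustration, and logical (operation) undo is left out to keep the example short.

```python
def recover_repeat_history(log, db, checkpoint_index):
    """log[checkpoint_index] is assumed to be the last ("checkpoint", L) record."""
    undo_list = set(log[checkpoint_index][1])
    # Pass 1: repeat history, redoing every update after the checkpoint.
    for rec in log[checkpoint_index + 1:]:
        kind = rec[0]
        if kind == "update":              # ("update", Ti, X, V1, V2)
            db[rec[2]] = rec[4]
        elif kind == "redo-only":         # ("redo-only", Ti, X, V)
            db[rec[2]] = rec[3]
        elif kind == "start":
            undo_list.add(rec[1])
        elif kind in ("commit", "abort"):
            undo_list.discard(rec[1])
    # Pass 2: roll back the incomplete transactions, scanning backwards.
    pending = set(undo_list)
    for rec in reversed(list(log)):       # snapshot, since records are appended below
        if not pending:
            break
        if rec[0] == "update" and rec[1] in pending:
            db[rec[2]] = rec[3]
            log.append(("redo-only", rec[1], rec[2], rec[3]))
        elif rec[0] == "start" and rec[1] in pending:
            log.append(("abort", rec[1]))
            pending.discard(rec[1])
    return db

log = [
    ("start", "T1"),
    ("checkpoint", ["T1"]),
    ("update", "T1", "B", 0, 10),
    ("start", "T2"), ("update", "T2", "C", 0, 10), ("commit", "T2"),
]
db = recover_repeat_history(log, {"B": 0, "C": 0}, checkpoint_index=1)
```

In this run T1's update is first redone and then undone, which looks wasteful but is exactly what lets the undo pass start from the state as of the crash.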
Advanced Recovery Techniques (Cont.)
Checkpointing is done as follows:
1. Output all log records in memory to stable storage
2. Output to disk all modified buffer blocks
3. Output to log on stable storage a <checkpoint L> record.
Transactions are not allowed to perform any actions while checkpointing is in
progress.
Fuzzy checkpointing allows transactions to progress while the most time
consuming parts of checkpointing are in progress
Performed as described on next slide
Advanced Recovery Techniques (Cont.)
Fuzzy checkpointing is done as follows:
1. Temporarily stop all updates by transactions
2. Write a <checkpoint L> log record and force log to stable storage
3. Note list M of modified buffer blocks
4. Now permit transactions to proceed with their actions
5. Output to disk all modified buffer blocks in list M
blocks should not be updated while being output
Follow WAL: all log records pertaining to a block must be output before
the block is output
6. Store a pointer to the checkpoint record in a fixed position last_checkpoint on
disk
When recovering using a fuzzy checkpoint, start scan from the checkpoint record
pointed to by last_checkpoint
Log records before last_checkpoint have their updates reflected in database
on disk, and need not be redone.
Incomplete checkpoints, where system had crashed while performing
checkpoint, are handled safely
ARIES Recovery Algorithm
ARIES
ARIES is a state of the art recovery method
Incorporates numerous optimizations to reduce overheads during normal
processing and to speed up recovery
The “advanced recovery algorithm” we studied earlier is modeled after
ARIES, but greatly simplified by removing optimizations
Unlike the advanced recovery algorithm, ARIES
1. Uses log sequence number (LSN) to identify log records
Stores LSNs in pages to identify what updates have already been
applied to a database page
2. Physiological redo
3. Dirty page table to avoid unnecessary redos during recovery
4. Fuzzy checkpointing that only records information about dirty pages, and
does not require dirty pages to be written out at checkpoint time
More coming up on each of the above …
ARIES Optimizations
Physiological redo
  Affected page is physically identified, action within page can be logical
  Used to reduce logging overheads
  – e.g. when a record is deleted and all other records have to be moved to fill hole
    » Physiological redo can log just the record deletion
    » Physical redo would require logging of old and new values for much of the page
  Requires page to be output to disk atomically
  – Easy to achieve with hardware RAID, also supported by some disk systems
  – Incomplete page output can be detected by checksum techniques,
    » But extra actions are required for recovery
    » Treated as a media failure
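The logging saving from physiological redo can be illustrated with a toy page modelled as a Python list of records. This is only a sketch under that assumption; the function names and the tuple log formats are hypothetical, not ARIES record layouts.

```python
def delete_record(page, slot):
    """Delete the record in `slot`; later records shift down to fill the hole."""
    del page[slot]


def physiological_log(page_id, slot):
    # Page is identified physically, the action within the page is logical.
    return ("delete", page_id, slot)


def physical_log(page_id, before, after):
    # One (page, position, old value, new value) entry per changed position.
    def at(p, i):
        return p[i] if i < len(p) else None

    n = max(len(before), len(after))
    return [(page_id, i, at(before, i), at(after, i))
            for i in range(n) if at(before, i) != at(after, i)]


before = ["a", "b", "c", "d"]
page = list(before)
delete_record(page, 1)
small = physiological_log("P7", 1)
large = physical_log("P7", before, page)
```

Here the physiological record is a single tuple, while the physical log needs an entry for every position that moved.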
ARIES Data Structures
Log sequence number (LSN) identifies each log record
Must be sequentially increasing
Typically an offset from beginning of log file to allow fast access
Easily extended to handle multiple log files
Each page contains a PageLSN which is the LSN of the last log record whose
effects are reflected on the page
To update a page:
X-latch the page, and write the log record
Update the page
Record the LSN of the log record in PageLSN
Release the latch on the page
A page flush to disk S-latches the page
Thus page state on disk is operation consistent
– Required to support physiological redo
PageLSN is used during recovery to prevent repeated redo
Thus ensuring idempotence
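The update protocol and the PageLSN check can be sketched as below. This is an illustrative model only: the Page class, the global log list, and the (lsn, key, value) record shape are assumptions made for the example, and latching is omitted and mentioned only in a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_lsn: int = 0                       # LSN of last log record reflected here
    slots: dict = field(default_factory=dict)


log = []  # assumed log; the LSN of a record is its 1-based position


def update_page(page, key, value):
    # A real system would hold an X-latch on the page across these steps.
    lsn = len(log) + 1
    log.append((lsn, key, value))   # write the log record
    page.slots[key] = value         # update the page
    page.page_lsn = lsn             # record the LSN in PageLSN
    return lsn


def redo(page, record):
    """Reapply `record` only if the page does not already reflect it."""
    lsn, key, value = record
    if page.page_lsn >= lsn:
        return False                # already applied: repeated redo is a no-op
    page.slots[key] = value
    page.page_lsn = lsn
    return True


live = Page()
update_page(live, "a", 1)
stale = Page()                      # disk copy that missed the update
first = redo(stale, log[0])
second = redo(stale, log[0])
```

The second call returns False, which is the idempotence property the PageLSN comparison provides.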
ARIES Data Structures (Cont.)
Each log record contains LSN of previous log record of the same transaction
Log record fields: LSN | TransId | PrevLSN | RedoInfo | UndoInfo
LSN in log record may be implicit
Special redo-only log record called compensation log record (CLR) used to log
actions taken during recovery that never need to be undone
Also serve the role of operation-abort log records used in advanced recovery
algorithm
Have a field UndoNextLSN to note next (earlier) record to be undone
Records in between would have already been undone
Required to avoid repeated undo of already undone actions
CLR fields: LSN | TransID | UndoNextLSN | RedoInfo
ARIES Data Structures (Cont.)
DirtyPageTable
List of pages in the buffer that have been updated
Contains, for each such page
PageLSN of the page
RecLSN is an LSN such that log records before this LSN have already
been applied to the page version on disk
– Set to current end of log when a page is inserted into dirty page table
(just before being updated)
– Recorded in checkpoints, helps to minimize redo work
Checkpoint log record
Contains:
DirtyPageTable and list of active transactions
For each active transaction, LastLSN, the LSN of the last log record
written by the transaction
Fixed position on disk notes LSN of last completed
checkpoint log record
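A rough sketch of how a dirty page table might maintain RecLSN and PageLSN is shown below. The class and method names are hypothetical, and the sketch assumes LSNs are assigned so that the end of the log at insertion time equals the LSN of the update about to be written.

```python
class DirtyPageTable:
    def __init__(self):
        self.entries = {}   # page id -> {"rec_lsn": ..., "page_lsn": ...}

    def note_update(self, page_id, lsn):
        # RecLSN is fixed when the page first becomes dirty; PageLSN follows
        # every later update.
        entry = self.entries.setdefault(
            page_id, {"rec_lsn": lsn, "page_lsn": lsn})
        entry["page_lsn"] = lsn

    def note_flush(self, page_id):
        # A page written to disk is no longer dirty.
        self.entries.pop(page_id, None)

    def redo_lsn(self, checkpoint_lsn):
        # Earliest RecLSN, or the checkpoint's own LSN if nothing is dirty.
        return min((e["rec_lsn"] for e in self.entries.values()),
                   default=checkpoint_lsn)


dpt = DirtyPageTable()
dpt.note_update("P1", 10)
dpt.note_update("P2", 12)
dpt.note_update("P1", 15)
start = dpt.redo_lsn(checkpoint_lsn=20)
```

Storing the table in the checkpoint record lets recovery compute the redo starting point without scanning the whole log.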
ARIES Recovery Algorithm
ARIES recovery involves three passes
Analysis pass: Determines
Which transactions to undo
Which pages were dirty (disk version not up to date) at time of crash
RedoLSN: LSN from which redo should start
Redo pass:
Repeats history, redoing all actions from RedoLSN
RecLSN and PageLSNs are used to avoid redoing actions already
reflected on page
Undo pass:
Rolls back all incomplete transactions
Transactions whose abort was complete earlier are not undone
– Key idea: no need to undo these transactions: earlier undo actions were
logged, and are redone as required
ARIES Recovery: Analysis
Analysis pass
Starts from last complete checkpoint log record
Reads in DirtyPageTable from log record
Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable
In case no pages are dirty, RedoLSN = checkpoint record’s LSN
Sets undo-list = list of transactions in checkpoint log record
Reads LSN of last log record for each transaction in undo-list from
checkpoint log record
Scans forward from checkpoint
ARIES Recovery: Analysis (Cont.)
Analysis pass (cont.)
Scans forward from checkpoint
  If any log record found for transaction not in undo-list, adds transaction to undo-list
  Whenever an update log record is found
    If page is not in DirtyPageTable, it is added with RecLSN set to LSN of the update log record
  If transaction end log record found, delete transaction from undo-list
  Keeps track of last log record for each transaction in undo-list
    May be needed for later undo
At end of analysis pass:
RedoLSN determines where to start redo pass
RecLSN for each page in DirtyPageTable used to minimize redo work
All transactions in undo-list need to be rolled back
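The analysis pass can be sketched as a single forward scan. This is a simplified illustration: the log is assumed to be a dict from LSN to a record dict with hypothetical keys ("type", "txn", "page", "dirty_pages", "active"), and only update and end records are modelled.

```python
def analysis_pass(log, checkpoint_lsn):
    ckpt = log[checkpoint_lsn]
    dirty = dict(ckpt["dirty_pages"])     # page id -> RecLSN
    undo_list = dict(ckpt["active"])      # txn id -> LSN of its last record
    # RedoLSN comes from the checkpoint's table, or the checkpoint itself.
    redo_lsn = min(dirty.values(), default=checkpoint_lsn)

    for lsn in sorted(l for l in log if l > checkpoint_lsn):
        rec = log[lsn]
        if rec["type"] == "end":
            undo_list.pop(rec["txn"], None)
            continue
        # Adds the transaction if new, and tracks its last log record.
        undo_list[rec["txn"]] = lsn
        if rec["type"] == "update" and rec["page"] not in dirty:
            dirty[rec["page"]] = lsn
    return redo_lsn, dirty, undo_list


log = {
    1: {"type": "update", "txn": "T1", "page": "P1"},
    2: {"type": "update", "txn": "T2", "page": "P2"},
    3: {"type": "checkpoint", "dirty_pages": {"P1": 1},
        "active": {"T1": 1, "T2": 2}},
    4: {"type": "update", "txn": "T2", "page": "P2"},
    5: {"type": "end", "txn": "T2"},
    6: {"type": "update", "txn": "T3", "page": "P3"},
}
redo_lsn, dirty, undo_list = analysis_pass(log, checkpoint_lsn=3)
```

In this example T2 finishes after the checkpoint and drops out of the undo-list, while T3 starts after the checkpoint and is added to it.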
ARIES Redo Pass
Redo Pass: Repeats history by replaying every action not already reflected in the
page on disk, as follows:
Scans forward from RedoLSN. Whenever an update log record is found:
1. If the page is not in DirtyPageTable or the LSN of the log record is less than the RecLSN of the page in DirtyPageTable, then skip the log record
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is less than the LSN of the log record, redo the log record
NOTE: if a log record is skipped by either test, its effects have already appeared on the page. The first test avoids even fetching the page from disk!
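The two redo tests can be sketched as follows. The record layout, the dict-based "disk", and the returned bookkeeping lists are assumptions made for this illustration; the dirty page table is reduced to a map from page id to RecLSN.

```python
def redo_pass(log, redo_lsn, dirty_page_table, disk_pages):
    redone, fetched = [], set()
    for lsn in sorted(l for l in log if l >= redo_lsn):
        rec = log[lsn]
        if rec["type"] != "update":
            continue
        page_id = rec["page"]
        # Test 1: decided from the table alone, no page fetch needed.
        if page_id not in dirty_page_table or lsn < dirty_page_table[page_id]:
            continue
        page = disk_pages[page_id]          # stands in for a disk read
        fetched.add(page_id)
        # Test 2: compare against the PageLSN stored on the page.
        if page["page_lsn"] < lsn:
            page["data"][rec["key"]] = rec["value"]
            page["page_lsn"] = lsn
            redone.append(lsn)
    return redone, fetched


log = {
    1: {"type": "update", "page": "P1", "key": "x", "value": 1},
    2: {"type": "update", "page": "P2", "key": "y", "value": 2},
    3: {"type": "update", "page": "P1", "key": "x", "value": 3},
}
disk = {
    "P1": {"page_lsn": 1, "data": {"x": 1}},   # reached disk after LSN 1
    "P2": {"page_lsn": 2, "data": {"y": 2}},   # fully flushed, not dirty
}
redone, fetched = redo_pass(log, 1, {"P1": 1}, disk)
```

Only record 3 is reapplied: record 2 is rejected by the first test without touching P2, and record 1 is rejected by the PageLSN comparison.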
ARIES Undo Actions
When an undo is performed for an update log record
  Generate a CLR containing the undo action performed (actions performed during undo are logged physically or physiologically).
    CLR for record n noted as n' in figure below
  Set UndoNextLSN of the CLR to the PrevLSN value of the update log record
    Arrows indicate UndoNextLSN value
ARIES supports partial rollback
  Used e.g. to handle deadlocks by rolling back just enough to release required locks
  Figure indicates forward actions after partial rollbacks
    Records 3 and 4 initially, later 5 and 6, then full rollback
Figure: log records in order 1, 2, 3, 4, 4', 3', 5, 6, 6', 5', 2', 1' (arrows from each CLR to its UndoNextLSN not shown)
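The partial rollback scenario described for the figure can be reproduced with a small single-transaction model. The classes Rec and TxnLog and the to_lsn parameter (standing in for a savepoint) are hypothetical names for this sketch, and the actual undo of page contents is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    lsn: int
    label: str
    prev_lsn: Optional[int]
    is_clr: bool = False
    undo_next_lsn: Optional[int] = None    # meaningful only for CLRs


class TxnLog:
    def __init__(self):
        self.records = []
        self.last_lsn = None

    def _append(self, label, is_clr=False, undo_next_lsn=None):
        rec = Rec(len(self.records) + 1, label, self.last_lsn,
                  is_clr, undo_next_lsn)
        self.records.append(rec)
        self.last_lsn = rec.lsn
        return rec.lsn

    def update(self, label):
        return self._append(label)

    def rollback(self, to_lsn=0):
        """Undo updates with LSN > to_lsn; to_lsn=0 is a full rollback."""
        nxt = self.last_lsn
        while nxt is not None and nxt > to_lsn:
            rec = self.records[nxt - 1]
            if rec.is_clr:
                nxt = rec.undo_next_lsn    # skip work that is already undone
            else:
                self._append(rec.label + "'", True, rec.prev_lsn)
                nxt = rec.prev_lsn


t = TxnLog()
t.update("1")
save = t.update("2")
t.update("3")
t.update("4")
t.rollback(to_lsn=save)          # undoes 4 and 3
save2 = t.last_lsn
t.update("5")
t.update("6")
t.rollback(to_lsn=save2)         # undoes 6 and 5
t.rollback()                     # full rollback: only 2 and 1 remain
labels = [r.label for r in t.records]
```

During the final rollback the chain of UndoNextLSN values leads from 5' through 3' directly to record 2, so records 3 to 6 are never undone a second time.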
ARIES: Undo Pass
Undo pass
Performs backward scan on log undoing all transactions in undo-list
  Backward scan optimized by skipping unneeded log records as follows:
    Next LSN to be undone for each transaction set to LSN of last log record for transaction found by analysis pass.
    At each step pick largest of these LSNs to undo, skip back to it and undo it
    After undoing a log record
    – For ordinary log records, set next LSN to be undone for transaction to PrevLSN noted in the log record
    – For compensation log records (CLRs) set next LSN to be undone to UndoNextLSN noted in the log record
      » All intervening records are skipped since they would have been undone already
Undos performed as described earlier
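The undo pass across several transactions can be sketched as below. The dict-based log, the record keys ("type", "txn", "prev", "undo_next", "undoes"), and the returned list are assumptions for illustration; applying the undo to pages is omitted.

```python
def undo_pass(log, undo_list):
    """log: LSN -> record dict; undo_list: txn id -> LSN of its last record."""
    next_undo = dict(undo_list)
    undone = []
    new_lsn = max(log) + 1
    while next_undo:
        # Pick the largest pending LSN so the log is visited backwards.
        txn = max(next_undo, key=next_undo.get)
        lsn = next_undo[txn]
        rec = log[lsn]
        if rec["type"] == "clr":
            target = rec["undo_next"]      # intervening records already undone
        else:
            log[new_lsn] = {"type": "clr", "txn": txn,
                            "undo_next": rec["prev"], "undoes": lsn}
            new_lsn += 1
            undone.append(lsn)
            target = rec["prev"]
        if target is None:
            del next_undo[txn]             # transaction fully rolled back
        else:
            next_undo[txn] = target
    return undone


log = {
    1: {"type": "update", "txn": "T1", "prev": None},
    2: {"type": "update", "txn": "T2", "prev": None},
    3: {"type": "update", "txn": "T1", "prev": 1},
    4: {"type": "clr", "txn": "T1", "prev": 3, "undo_next": 1, "undoes": 3},
    5: {"type": "update", "txn": "T2", "prev": 2},
}
undone = undo_pass(log, {"T1": 4, "T2": 5})
```

Record 3 was already compensated by the CLR at LSN 4 before the crash, so the pass undoes only 5, 2 and 1 and writes three new CLRs.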
Other ARIES Features
Recovery Independence
Pages can be recovered independently of others
E.g. if some disk pages fail they can be recovered from a backup while other
pages are being used
Savepoints:
Transactions can record savepoints and roll back to a savepoint
Useful for complex transactions
Also used to roll back just enough to release locks on deadlock
Other ARIES Features (Cont.)
Fine-grained locking:
Index concurrency algorithms that permit tuple level locking on indices can be
used
These require logical undo, rather than physical undo, as in advanced
recovery algorithm
Recovery optimizations: For example:
Dirty page table can be used to prefetch pages during redo
Out of order redo is possible:
redo can be postponed on a page being fetched from disk, and
performed when page is fetched.
Meanwhile other log records can continue to be processed
Remote Backup Systems
Remote backup systems provide high availability by allowing transaction processing to
continue even if the primary site is destroyed.
Remote Backup Systems (Cont.)
Detection of failure: Backup site must detect when primary site has failed
  To distinguish primary site failure from link failure, maintain several communication links between the primary and the remote backup.
Transfer of control:
  To take over control, backup site first performs recovery using its copy of the database and all the log records it has received from the primary.
    Thus, completed transactions are redone and incomplete transactions are rolled back.
  When the backup site takes over processing it becomes the new primary
  To transfer control back to old primary when it recovers, old primary must receive redo logs from the old backup and apply all updates locally.
Remote Backup Systems (Cont.)
Time to recover: To reduce delay in takeover, backup site periodically processes the redo log records (in effect, performing recovery from previous database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-Spare configuration permits very fast takeover:
  Backup continually processes redo log records as they arrive, applying the updates locally.
  When failure of the primary is detected the backup rolls back incomplete transactions, and is ready to process new transactions.
Alternative to remote backup: distributed database with replicated data
  Remote backup is faster and cheaper, but less tolerant to failure
  More on this in Chapter 19
Remote Backup Systems (Cont.)
Ensure durability of updates by delaying transaction commit until update is logged at backup; avoid this delay by permitting lower degrees of durability.
One-safe: commit as soon as transaction’s commit log record is written at primary
  Problem: updates may not arrive at backup before it takes over.
Two-very-safe: commit when transaction’s commit log record is written at primary and backup
  Reduces availability since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both primary and backup are active. If only the primary is active, the transaction commits as soon as its commit log record is written at the primary.
  Better availability than two-very-safe; avoids problem of lost transactions in one-safe.
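The three durability levels can be compared with a small decision function evaluated at the primary. This is a sketch only: the Safety enum, the can_commit function, and its boolean parameters are hypothetical, and failure detection is assumed to be given.

```python
from enum import Enum


class Safety(Enum):
    ONE_SAFE = "one-safe"
    TWO_VERY_SAFE = "two-very-safe"
    TWO_SAFE = "two-safe"


def can_commit(policy, logged_at_primary, logged_at_backup, backup_active):
    """Decide whether the primary may report the transaction as committed."""
    if not logged_at_primary:
        return False
    if policy is Safety.ONE_SAFE:
        return True                      # backup may lag behind
    if policy is Safety.TWO_VERY_SAFE:
        return logged_at_backup          # blocks whenever the backup is down
    # Two-safe: wait for the backup only while it is active.
    return logged_at_backup or not backup_active


backup_down = {
    policy: can_commit(policy, True, False, backup_active=False)
    for policy in Safety
}
```

With the backup down, one-safe and two-safe still allow commits while two-very-safe does not, which is the availability difference noted above.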
End of Chapter
Block Storage Operations
Portion of the Database Log Corresponding to T0 and T1
State of the Log and Database Corresponding to T0 and T1
Portion of the System Log Corresponding to T0 and T1
State of System Log and Database Corresponding to T0 and T1