Modern Information Retrieval
Lecture 1: Introduction
Marjan Ghazvininejad, Sharif University, Spring 2012

Lecture Overview
• Introduction to the Course
• Introduction to Information Retrieval
• The Information Seeking Process
• Information Retrieval History and Developments
• Discussion
• References

Purposes of the Course
• To impart a basic theoretical understanding of IR models
  - Boolean
  - Vector Space
  - Probabilistic (including Language Models)
• To examine major application areas of IR including:
  - Web Search
  - Text categorization and clustering
  - Cross language retrieval
  - Text summarization
  - Digital Libraries

Purposes of the Course …
• To understand how IR performance is measured:
  - Recall/Precision
  - Statistical significance
• Gain hands-on experience with IR systems

Introduction
• Goal of IR is to retrieve all and only the “relevant” documents in a collection for a particular user with a particular need for information
  - Relevance is a central concept in IR theory
• How does an IR system work when the “collection” is all documents available on the Web?
  - Web search engines are stress-testing the traditional IR models

Information Retrieval
• The goal is to search large document collections (millions of documents) to retrieve small subsets relevant to the user’s information need
• Examples are:
  - Internet search engines
  - Digital library catalogues

Information Retrieval …
• Some application areas within IR
  - Cross language retrieval
  - Speech/broadcast retrieval
  - Text categorization
  - Text summarization
• Subject to objective testing and evaluation
  - hundreds of queries
  - millions of documents

Origins
• Communication theory revisited
• Problems with transmission of meaning
[Diagram: communication model with Source, Message, Encoding, Channel (with Noise), Decoding, Message, Destination]
[Diagram: the IR analogue with Source, Message, Encoding (writing/indexing), Storage, Decoding (retrieval/reading), Message, Destination]

Components of an IR System
[Diagram labels: Documents; Authoritative Indexing Rules; Indexing Process; Index Records & Document Surrogates; User’s Information Need; Query Specification Process; Query; Severe Information Loss; Retrieval Rules; Retrieval Process; List of Documents Relevant to User’s Information Need]

Review: Information Overload
• “The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.” (Varian & Lyman)
• “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)

The Standard Retrieval Interaction Model
[Diagram: the standard retrieval interaction model]

Standard Model of IR
• Assumptions:
  - The goal is maximizing precision and recall simultaneously
  - The information need remains static
  - The value is in the resulting document set

Problems with Standard Model
• Users learn during the search process:
  - Scanning titles of retrieved documents
  - Reading retrieved documents
  - Viewing lists of related topics/thesaurus terms
  - Navigating hyperlinks
• Some users don’t like long (apparently) disorganized lists of documents

IR is an Iterative Process
[Diagram labels: Repositories, Goals, Workspace]

IR is a Dialog
• The exchange doesn’t end with the first answer
• Users can recognize elements of a useful answer, even when incomplete
• Questions and understanding change as the process continues

Bates’ “Berry-Picking” Model
• Standard IR model
  - Assumes the information need remains the same throughout the search process
• Berry-picking model
  - Interesting information is scattered like berries among bushes
  - The query is continually shifting

Berry-Picking Model
A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89)
[Diagram: the searcher's path through successive queries Q0, Q1, Q2, Q3, Q4, Q5]

Berry-Picking Model …
• The query is continually shifting
• New information may yield new ideas and new directions
• The information need
  - Is not satisfied by a single, final retrieved set
  - Is satisfied by a series of selections and bits of information found along the way

Restricted Form of the IR Problem
• The system has available only pre-existing, “canned” text passages
• Its response is limited to selecting from these passages and presenting them to the user
• It must select, say, 10 or 20 passages out of millions or billions!

Information Retrieval
• Revised Task Statement: Build a system that retrieves documents that users are likely to find relevant to their queries
• This set of assumptions underlies the field of Information Retrieval

IR History Overview
• Information Retrieval History
  - Early “IR”
  - Non-Computer IR (mid 1950’s)
  - Interest in computer-based IR from mid 1950’s
  - Modern IR: large-scale evaluations, Web-based search and search engines (1990’s)

Origins
• Very early history of content representation
  - Sumerian tokens and “envelopes”
  - Alexandria: pinakes

Visions of IR Systems
• Rev. John Wilkins, 1600’s: The Philosophical Language and tables
• Wilhelm Ostwald and Paul Otlet, 1910’s: The “monographic principle” and Universal Classification
• Emanuel Goldberg, 1920’s - 1940’s
• H.G. Wells, “World Brain: The idea of a permanent World Encyclopedia.” (Introduction to the Encyclopédie Française, 1937)
• Vannevar Bush, “As we may think.” Atlantic Monthly, 1945.
• Term “Information Retrieval” coined by Calvin Mooers, 1952

Card-Based IR Systems
• Uniterm (Casey, Perry, Berry, Kent: 1958)
  - Developed and used from mid 1940’s
[Figure: two Uniterm cards, each headed by a term and listing document numbers]
EXCURSION 43821
90 58 241 49 52 17 130 281 119 92 57 88 640 122 97 158 139 870 178 199 342 248 298 63 83 93 34 44 104 269 25 66 75 86 115 146 157 207
LUNAR 12457
110 73 44 15 46 7 28 18139 12 430 241 79 42 113 74 85 76 17 78 820 761 109 602 233 134 95 136 37 118 194 165 127 198 901 179 982 377 288 407

Card-Based IR Systems …
• Batten Optical Coincidence Cards (“Peek-a-Boo Cards”), 1948
[Figure: cards labeled Excursion and Lunar]

Card-Based IR Systems …
• Zatocode (edge-notched cards), Mooers, 1951
[Figure: sample document cards with Title, Author, and Abstract fields filled with placeholder text]

Computer-Based IR Systems
• Bagley’s 1951 MS thesis from MIT suggested that searching 50 million item records, each containing 30 index terms, would take approximately 41,700 hours
  - Due to the need to move and shift the text in core memory while carrying out the comparisons
• 1957 - Desk Set with Katharine Hepburn and Spencer Tracy - EMERAC

Historical Milestones in IR Research
• 1958 Statistical Language Properties (Luhn)
• 1960 Probabilistic Indexing (Maron & Kuhns)
• 1961 Term association and clustering (Doyle)
• 1965 Vector Space Model (Salton)
• 1968 Query expansion (Rocchio, Salton)
• 1972 Statistical Weighting (Sparck Jones)
• 1975 2-Poisson Model (Harter, Bookstein, Swanson)
• 1976 Relevance Weighting (Robertson, Sparck Jones)
• 1980 Fuzzy sets (Bookstein)
• 1981 Probability without training (Croft)

Historical Milestones in IR Research …
• 1983 Linear Regression (Fox)
• 1983 Probabilistic Dependence (Salton, Yu)
• 1985 Generalized Vector Space Model (Wong, Raghavan)
• 1987 Fuzzy logic and RUBRIC/TOPIC (Tong, et al.)
• 1990 Latent Semantic Indexing (Dumais, Deerwester)
• 1991 Polynomial & Logistic Regression (Cooper, Gey, Fuhr)
• 1992 TREC (Harman)
• 1992 Inference networks (Turtle, Croft)
• 1994 Neural networks (Kwok)
• 1998 Language Models (Ponte, Croft)

Boolean IR Systems
• Synthex at SDC, 1960
• Project MAC at MIT, 1963 (interactive)
• BOLD at SDC, 1964 (Harold Borko)
• 1964 New York World’s Fair: Becker and Hayes produced a system to answer questions (based on airline reservation equipment)
• SDC began production for a commercial service in 1967: ORBIT
• NASA-RECON (1966) becomes DIALOG
• 1972 Data Central/Mead introduced LEXIS: full text of legal information
• Online catalogs: late 1970’s and 1980’s

Experimental IR systems
• Probabilistic indexing: Maron and Kuhns, 1960
• SMART: Gerard Salton at Cornell, vector space model, 1970’s
• SIRE at Syracuse
• I3R: Croft
• Cheshire I (1990)
• TREC: 1992
• Inquery
• Cheshire II (1994)
• MG (1995?)
• Lemur (2000?)

The Internet and the WWW
• Gopher, Archie, Veronica, WAIS
• Tim Berners-Lee, 1991, creates WWW at CERN (originally hypertext only)
• Web-crawler
• Lycos
• Alta Vista
• Inktomi
• Google
• (and many others)

Information Retrieval – Historical View
Research
• Boolean model, statistics of language (1950’s)
• Vector space model, probabilistic indexing, relevance feedback (1960’s)
• Probabilistic querying (1970’s)
• Fuzzy set/logic, evidential reasoning (1980’s)
• Regression, neural nets, inference networks, latent semantic indexing, TREC (1990’s)
Industry
• DIALOG, LexisNexis
• STAIRS (Boolean based)
• Information industry (O($B))
• Verity TOPIC (fuzzy logic)
• Internet search engines (O($100B?)) (vector space, probabilistic)

Research Sources in Information Retrieval
• ACM Transactions on Information Systems
• Journal of the American Society for Information Science
• Document Analysis and IR Proceedings (Las Vegas)
• Information Processing and Management (Pergamon)
• Journal of Documentation
• SIGIR Conference Proceedings
• TREC Conference Proceedings
• Lectures in Computer Science

Research Systems Software
• INQUERY (Croft)
• OKAPI (Robertson)
• PRISE (Harman)
  - http://potomac.ncsl.nist.gov/prise
• SMART (Buckley)
• MG (Witten, Moffat)
• CHESHIRE (Larson)
  - http://cheshire.berkeley.edu
• LEMUR toolkit
• Lucene
• Others

Next Time
• Basic Concepts in IR
• Readings
  - Chapter 1 in IR text (?????????????)
  - Joyce & Needham “The Thesaurus Approach to Information Retrieval” (in Readings book)
  - Luhn “The Automatic Derivation of Information Retrieval Encodements from Machine-Readable Texts” (in Readings)
  - Doyle “Indexing and Abstracting by Association, Pt I” (in Readings)
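
Ahead of the next lecture, the Recall/Precision measures named in the course purposes can be illustrated with a small sketch. It assumes the usual set-based definitions (precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved). The function name and the document IDs are hypothetical, invented for illustration, and not taken from the course materials.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    An empty denominator is reported as 0.0 (an assumption of this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved_docs = [17, 44, 90, 241]        # what the system returned
relevant_docs = [17, 44, 139, 241, 407]   # what a judge marked relevant

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Three of the four retrieved documents are relevant, and three of the five relevant documents were found. Returning more documents tends to raise recall and lower precision, which is why the standard model's goal of maximizing both simultaneously is demanding.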
