Modern Information Retrieval
Lecture 1: Introduction
Lecture Overview
• Introduction to the Course
• Introduction to Information Retrieval
• The Information Seeking Process
• Information Retrieval History and Developments
• Discussion
• References
Marjan Ghazvininejad
Sharif University, Spring 2012
Purposes of the Course
• To impart a basic theoretical understanding of IR models:
  - Boolean
  - Vector Space
  - Probabilistic (including Language Models)
• To examine major application areas of IR, including:
  - Web search
  - Text categorization and clustering
  - Cross-language retrieval
  - Text summarization
  - Digital libraries
Purposes of the Course …
• To understand how IR performance is measured:
  - Recall/Precision
  - Statistical significance
• To gain hands-on experience with IR systems
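As a rough illustration of the recall and precision measures named above, the sketch below computes both for a single query. The function name and document IDs are made up for the example, and it assumes relevance judgments are available as a plain set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: what fraction of the retrieved documents is relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents was retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)
```

Here three of the four retrieved documents are relevant, while only three of the six relevant documents were found.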
Introduction
• The goal of IR is to retrieve all and only the “relevant” documents in a collection for a particular user with a particular need for information
  - Relevance is a central concept in IR theory
• How does an IR system work when the “collection” is all documents available on the Web?
  - Web search engines are stress-testing the traditional IR models
Information Retrieval
• The goal is to search large document collections (millions of documents) to retrieve small subsets relevant to the user’s information need
• Examples are:
  - Internet search engines
  - Digital library catalogues
Information Retrieval …
• Some application areas within IR:
  - Cross-language retrieval
  - Speech/broadcast retrieval
  - Text categorization
  - Text summarization
• Subject to objective testing and evaluation:
  - hundreds of queries
  - millions of documents
Origins
• Communication theory revisited
• Problems with transmission of meaning
[Figure: communication model. Source → Encoding → Channel (with Noise) → Decoding → Destination, carrying a Message]
[Figure: the same model applied to IR. Source → Encoding (writing/indexing) → Storage → Decoding (retrieval/reading) → Destination, carrying a Message]
Components of an IR System
[Figure: components of an IR system]
  Documents + Authoritative Indexing Rules → Indexing Process → Index Records & Document Surrogates
  User’s Information Need → Query Specification Process → Query
  Index Records & Document Surrogates + Query + Retrieval Rules → Retrieval Process → List of Documents Relevant to User’s Information Need
  Diagram annotation: Severe Information Loss
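The three processes in the diagram can be sketched, very loosely, as three small functions. This is a toy under stated assumptions rather than how any particular system works: the documents are invented, the "indexing rules" are just lowercasing and whitespace splitting, and the "retrieval rule" is a Boolean AND over the query terms.

```python
from collections import defaultdict


def build_index(documents):
    """Indexing process: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def specify_query(information_need):
    """Query specification process: reduce a free-text need to a list of terms.

    Word order and phrasing are discarded here, a small instance of the
    information loss the diagram points out.
    """
    return information_need.lower().split()


def retrieve(index, query_terms):
    """Retrieval process: return the documents that contain every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical three-document collection.
documents = {
    1: "lunar excursion module design",
    2: "lunar surface geology",
    3: "excursion fares for rail travel",
}
index = build_index(documents)
result = retrieve(index, specify_query("Lunar excursion"))
print(result)
```

Only document 1 contains both terms, so it is the only one returned.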
Review: Information Overload
• “The world's total yearly production of print, film,
optical, and magnetic content would require roughly 1.5
billion gigabytes of storage. This is the equivalent of
250 megabytes per person for each man, woman,
and child on earth.” (Varian & Lyman)
• “The greatest problem of today is how to teach people
to ignore the irrelevant, how to refuse to know things,
before they are suffocated. For too many facts are
as bad as none at all.” (W.H. Auden)
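The two figures in the Varian & Lyman quote can be cross-checked against each other. The short calculation below assumes decimal units (1 GB = 1000 MB), which the quote does not state.

```python
# Figures quoted in the slide (Varian & Lyman).
total_gigabytes = 1_500_000_000   # "roughly 1.5 billion gigabytes"
megabytes_per_person = 250        # "250 megabytes per person"

# Assumption: decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000

# Number of people over which the total is being divided.
implied_population = total_gigabytes * MB_PER_GB // megabytes_per_person
print(implied_population)
```

The two numbers are consistent with a world population of about six billion.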
The Standard Retrieval Interaction Model
[Figure: interaction flow diagram; surviving labels: Information Need, Send to System]
Standard Model of IR
• Assumptions:
  - The goal is maximizing precision and recall simultaneously
  - The information need remains static
  - The value is in the resulting document set
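The first assumption hides a tension: looking further down a ranked result list can only raise recall, but it usually lowers precision. The sketch below shows this on a made-up ranking. The document IDs, the relevance judgments, and the function name are all hypothetical.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall computed over the top-k documents of a ranked list."""
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranked output and relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5"]
relevant = {"d3", "d1", "d2", "d5"}

curve = [precision_recall_at_k(ranking, relevant, k)
         for k in range(1, len(ranking) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"k={k}  precision={p:.2f}  recall={r:.2f}")
```

In this example precision is perfect at k=1 but recall is only 0.25. Full recall is reached only at k=8, where half of the retrieved documents are not relevant.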
Problems with Standard Model
• Users learn during the search process:
  - Scanning titles of retrieved documents
  - Reading retrieved documents
  - Viewing lists of related topics/thesaurus terms
  - Navigating hyperlinks
• Some users don’t like long, (apparently) disorganized lists of documents
IR is an Iterative Process
[Figure: iterative cycle linking Repositories, Workspace, and Goals]
IR is a Dialog
• The exchange doesn’t end with the first answer
• Users can recognize elements of a useful answer, even when incomplete
• Questions and understanding change as the process continues
Bates’ “Berry-Picking” Model
• Standard IR model
  - Assumes the information need remains the same throughout the search process
• Berry-picking model
  - Interesting information is scattered like berries among bushes
  - The query is continually shifting
Berry-Picking Model
A sketch of a searcher… “moving through many actions towards a
general goal of satisfactory completion of research related to an
information need.” (after Bates 89)
[Figure: search path through successive queries Q0, Q1, Q2, Q3, Q4, Q5]
Berry-Picking Model …
• The query is continually shifting
• New information may yield new ideas and new directions
• The information need:
  - Is not satisfied by a single, final retrieved set
  - Is satisfied by a series of selections and bits of information found along the way
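One way to picture the points above is a loop in which each picked document changes the next query. The following is only a toy sketch: the collection is invented, matching is plain term overlap, and `follow_new_terms` is a hypothetical stand-in for the human judgment that actually drives the shift.

```python
def search(collection, query):
    """Return IDs of documents sharing at least one term with the query, in ID order."""
    return [doc_id for doc_id, terms in sorted(collection.items())
            if terms & query]


def berry_pick(collection, first_query, next_query, max_steps=5):
    """Follow a shifting query, keeping one new document per step.

    next_query(query, picked_terms) models how what the searcher just read
    changes what they ask next.
    """
    query, basket, trail = set(first_query), [], []
    for _ in range(max_steps):
        trail.append(frozenset(query))
        fresh = [d for d in search(collection, query) if d not in basket]
        if not fresh:
            break
        picked = fresh[0]
        basket.append(picked)
        query = next_query(query, collection[picked])
    return basket, trail


def follow_new_terms(query, picked_terms):
    """Hypothetical behaviour: pursue whatever terms the picked document added."""
    return (picked_terms - query) or query


# Invented collection: document ID -> set of index terms.
collection = {
    1: {"lunar", "excursion"},
    2: {"excursion", "module"},
    3: {"module", "design"},
    4: {"rail", "fares"},
}
basket, trail = berry_pick(collection, {"lunar"}, follow_new_terms)
print(basket)
print([sorted(q) for q in trail])
```

The initial query on its own matches only document 1. The basket ends up holding documents 1, 2, and 3 because the query moved from "lunar" to "excursion" to "module" along the way.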
Restricted Form of the IR Problem
• The system has available only pre-existing, “canned”
text passages
• Its response is limited to selecting from these passages
and presenting them to the user
• It must select, say, 10 or 20 passages out of
millions or billions!
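Selecting a handful of passages from a very large pool is a top-k selection problem. The sketch below uses Python's `heapq.nlargest` to do this without sorting the whole pool. The passages are invented, and the term-overlap score is only a placeholder for whatever scoring a real system would use.

```python
import heapq


def select_passages(passages, query_terms, k=10):
    """Return the k passages that share the most terms with the query."""
    query_terms = set(query_terms)

    def score(passage):
        # Placeholder scoring: number of distinct query terms in the passage.
        return len(query_terms & set(passage.lower().split()))

    # nlargest picks the k best-scoring items without a full sort.
    return heapq.nlargest(k, passages, key=score)


# Hypothetical pool of canned passages.
passages = [
    "lunar excursion module design notes",
    "rail excursion fares",
    "geology of the lunar surface",
    "quarterly budget report",
]
top = select_passages(passages, ["lunar", "excursion"], k=2)
print(top)
```

The system never writes new text. Its whole response is the choice of which existing passages to show.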
Information Retrieval
• Revised Task Statement:
  - Build a system that retrieves documents that users are likely to find relevant to their queries
• This set of assumptions underlies the field of Information Retrieval
IR History Overview
• Information Retrieval History
  - Early “IR”
  - Non-Computer IR (mid 1950’s)
  - Interest in computer-based IR from mid 1950’s
  - Modern IR: large-scale evaluations, Web-based search and search engines (1990’s)
Origins
• Very early history of content representation
  - Sumerian tokens and “envelopes”
  - Alexandria: pinakes
Visions of IR Systems
• Rev. John Wilkins, 1600’s: The Philosophical Language and tables
• Wilhelm Ostwald and Paul Otlet, 1910’s: The “monographic principle” and Universal Classification
• Emanuel Goldberg, 1920’s to 1940’s
• H.G. Wells, “World Brain: The idea of a permanent World Encyclopedia.” (Introduction to the Encyclopédie Française, 1937)
• Vannevar Bush, “As We May Think.” Atlantic Monthly, 1945.
• Term “Information Retrieval” coined by Calvin Mooers, 1952
Card-Based IR Systems
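• Uniterm (Casey, Perry, Berry, Kent: 1958)
  - Developed and used from mid 1940’s

Uniterm card EXCURSION (43821), document numbers by final digit:
  0: 90, 130, 640, 870
  1: 241, 281
  2: 52, 92, 122, 342
  3: 63, 83, 93
  4: 34, 44, 104
  5: 25, 75, 115
  6: 66, 86, 146
  7: 17, 57, 97, 157, 207
  8: 58, 88, 158, 178, 248, 298
  9: 49, 119, 139, 199, 269

Uniterm card LUNAR (12457), document numbers by final digit:
  0: 110, 430, 820
  1: 241, 761, 901
  2: 12, 42, 602, 982
  3: 73, 113, 233
  4: 44, 74, 134, 194
  5: 15, 85, 95, 165
  6: 46, 76, 136
  7: 7, 17, 37, 127, 377, 407
  8: 28, 78, 118, 198, 288
  9: 79, 109, 179, 18139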
Card-Based IR Systems …
• Batten Optical Coincidence Cards (“Peek-a-Boo Cards”), 1948
[Figure: two optical coincidence cards, labeled Excursion and Lunar]
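Optical coincidence can be pictured as a bitwise AND. Each term card has a hole at the position of every document indexed under that term, and stacking two cards lets light through only where both have a hole. The sketch below models a card as an integer bitmask. The document numbers are a small illustrative selection, not a full card, and the function names are invented for the example.

```python
def punch_card(doc_numbers):
    """Represent a term card as an integer with one bit ("hole") per document."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card


def read_holes(card):
    """List the document numbers whose holes are open in the card."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]


# A few document numbers for illustration only.
excursion = punch_card([17, 44, 52, 90, 241])
lunar = punch_card([7, 17, 44, 110, 241])

# Stacking the cards: light passes only where both have a hole.
coincidence = read_holes(excursion & lunar)
print(coincidence)
```

The surviving positions are the documents indexed under both terms, which is the same result a Boolean AND query would give.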
Card-Based IR Systems …
• Zatocode (edge-notched cards) Mooers, 1951
[Figure: three sample document cards (Document 1, Document 200, Document 34), each with Title, Author, and Abstract fields filled with placeholder text; the term “Lunar” appears on some of the cards]
Computer-Based IR Systems
• Bagley’s 1951 MS thesis from MIT suggested that searching 50 million item records, each containing 30 index terms, would take approximately 41,700 hours
  - Due to the need to move and shift the text in core memory while carrying out the comparisons
• 1957: Desk Set with Katharine Hepburn and Spencer Tracy (EMERAC)
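To put Bagley's estimate above in perspective, the figures can be broken down per record and per index term. This is plain arithmetic on the numbers quoted in the slide. It assumes the 41,700 hours refers to one sequential pass over all records, which the slide does not spell out.

```python
records = 50_000_000        # item records to search
terms_per_record = 30       # index terms per record
estimated_hours = 41_700    # estimated search time

total_seconds = estimated_hours * 3600
seconds_per_record = total_seconds / records
seconds_per_term = seconds_per_record / terms_per_record
years = estimated_hours / (24 * 365)

print(f"{seconds_per_record:.2f} s per record")
print(f"{seconds_per_term:.2f} s per index term")
print(f"{years:.1f} years of continuous running")
```

Under that assumption the estimate works out to about three seconds per record, or nearly five years of continuous running for one search.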
Historical Milestones in IR Research
• 1958 Statistical Language Properties (Luhn)
• 1960 Probabilistic Indexing (Maron & Kuhns)
• 1961 Term association and clustering (Doyle)
• 1965 Vector Space Model (Salton)
• 1968 Query expansion (Rocchio, Salton)
• 1972 Statistical Weighting (Sparck Jones)
• 1975 2-Poisson Model (Harter, Bookstein, Swanson)
• 1976 Relevance Weighting (Robertson, Sparck Jones)
• 1980 Fuzzy sets (Bookstein)
• 1981 Probability without training (Croft)
Historical Milestones in IR Research …
• 1983 Linear Regression (Fox)
• 1983 Probabilistic Dependence (Salton, Yu)
• 1985 Generalized Vector Space Model (Wong, Raghavan)
• 1987 Fuzzy logic and RUBRIC/TOPIC (Tong, et al.)
• 1990 Latent Semantic Indexing (Dumais, Deerwester)
• 1991 Polynomial & Logistic Regression (Cooper, Gey, Fuhr)
• 1992 TREC (Harman)
• 1992 Inference networks (Turtle, Croft)
• 1994 Neural networks (Kwok)
• 1998 Language Models (Ponte, Croft)
Boolean IR Systems
• Synthex at SDC, 1960
• Project MAC at MIT, 1963 (interactive)
• BOLD at SDC, 1964 (Harold Borko)
• 1964 New York World’s Fair: Becker and Hayes produced a system to answer questions (based on airline reservation equipment)
• SDC began production for a commercial service in 1967: ORBIT
• NASA-RECON (1966) becomes DIALOG
• 1972: Data Central/Mead introduced LEXIS, full text of legal information
• Online catalogs: late 1970’s and 1980’s
Experimental IR systems
• Probabilistic indexing – Maron and Kuhns, 1960
• SMART – Gerard Salton at Cornell – Vector space model,
1970’s
• SIRE at Syracuse
• I3R – Croft
• Cheshire I (1990)
• TREC – 1992
• Inquery
• Cheshire II (1994)
• MG (1995?)
• Lemur (2000?)
The Internet and the WWW
• Gopher, Archie, Veronica, WAIS
• Tim Berners-Lee creates the WWW at CERN, 1991
  - originally hypertext only
• WebCrawler
• Lycos
• Alta Vista
• Inktomi
• Google
• (and many others)
Information Retrieval: Historical View

Research
• Boolean model, statistics of language (1950’s)
• Vector space model, probabilistic indexing, relevance feedback (1960’s)
• Probabilistic querying (1970’s)
• Fuzzy set/logic, evidential reasoning (1980’s)
• Regression, neural nets, inference networks, latent semantic indexing, TREC (1990’s)

Industry
• DIALOG, LexisNexis
• STAIRS (Boolean based)
• Information industry (O($B))
• Verity TOPIC (fuzzy logic)
• Internet search engines (O($100B?)) (vector space, probabilistic)
Research Sources in Information Retrieval
• ACM Transactions on Information Systems
• Journal of the American Society for Information Science
• Document Analysis and IR Proceedings (Las Vegas)
• Information Processing and Management (Pergamon)
• Journal of Documentation
• SIGIR Conference Proceedings
• TREC Conference Proceedings
• Lectures in Computer Science
Research Systems Software
• INQUERY (Croft)
• OKAPI (Robertson)
• PRISE (Harman)
  - http://potomac.ncsl.nist.gov/prise
• SMART (Buckley)
• MG (Witten, Moffat)
• CHESHIRE (Larson)
  - http://cheshire.berkeley.edu
• LEMUR toolkit
• Lucene
• Others
Next Time
• Basic Concepts in IR
• Readings
  - Chapter 1 in IR text (?????????????)
  - Joyce & Needham, “The Thesaurus Approach to Information Retrieval” (in Readings book)
  - Luhn, “The Automatic Derivation of Information Retrieval Encodements from Machine-Readable Texts” (in Readings)
  - Doyle, “Indexing and Abstracting by Association, Pt I” (in Readings)
References