
CS 352H: Computer Systems Architecture

Page 1:
CS 352H: Computer Systems Architecture
Lecture 1: What is Computer Architecture and why should I care?
Professor Emmett Witchel
University of Texas at Austin
witchel@cs.utexas.edu

Page 2:
Goals
• Understand the “how” and “why” of computer system organization
  – Instruction Set Architecture
  – System Organization (processor, memory, I/O)
  – Microarchitecture
  – Virtualization
• Learn methods of evaluating performance
  – Metrics & benchmarks
• Learn how to make systems go fast
  – Pipelining, caching
  – Parallelism (ILP, DLP, TLP)
  – Application specific architectures (graphics, signal proc.)
• Preview of where architecture is heading

Page 3:
Logistics
Lectures: T/Th 12:30-2:00pm, PAI 3.14
Instructor: Prof. Emmett Witchel, W 1:15-2:15
TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk1
Grading: see web page
Texts: Hennessy & Patterson, Computer Organization and Design (Fourth Edition), including CD. Revised Fourth Edition preferred, not required.

Page 4:
CS352H Online
URL: www.cs.utexas.edu/users/witchel/CS352H
I will occasionally email you via blackboard and by your registered email address. I expect this channel to be reliable and timely.
Discussion group: via blackboard, login at courses.utexas.edu
  – General, Homeworks, Project
Computer Architecture Seminar Series: www.cs.utexas.edu/users/cart/arch

Page 5:
Assignment for Next Tuesday
• Turn in student survey forms, if you want
• Read the Moore paper (see webpage)
  – Write a review of 1/2-1 page (see syllabus)
  – Review should include
    • Summary of content of paper
    • Your observations on the most interesting/important aspects
    • Your observations on its relevance today
  – Be prepared to discuss on Tuesday in class

Page 6:
Discussion
• Are you interested in taking this course?
• One question about computer science
• One question about computer architecture
CS352H Fall 2007

Page 7:
Arch vs. µarch
Abstraction layers, top to bottom:
• Specification: compute the fibonacci sequence
• Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
• ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
• microArchitecture: registers
• Logic
• Transistors
• Physics/Chemistry
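
The Specification, Program and ISA layers can be put side by side in a short sketch. This is an illustration only: the seed values a[0] = 0 and a[1] = 1, the function names, and the exact load/add/store sequence are assumptions, since the slide shows only a fragment of each layer.

```python
def fib_program(n=100, seed=(0, 1)):
    """'Program' layer: the loop from the slide, with assumed seed values."""
    a = list(seed) + [0] * (n - 2)
    for i in range(2, n):
        a[i] = a[i - 1] + a[i - 2]
    return a


def fib_isa(n=100, seed=(0, 1)):
    """Hypothetical 'ISA' layer: the same loop as explicit register operations."""
    mem = list(seed) + [0] * (n - 2)   # memory holding the array a
    regs = {"r1": 0, "r2": 0}          # two architectural registers
    for i in range(2, n):
        regs["r1"] = mem[i - 1]                 # load  r1, a[i-1]
        regs["r2"] = mem[i - 2]                 # load  r2, a[i-2]
        regs["r2"] = regs["r2"] + regs["r1"]    # add   r2, r2, r1
        mem[i] = regs["r2"]                     # store r2, a[i]
    return mem


if __name__ == "__main__":
    print(fib_program()[:11])
    print(fib_program() == fib_isa())
```

Both functions meet the same specification. The ISA-level version only makes visible the loads, adds and stores that a compiler would have to emit for the loop.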

Page 8:
CS352H Topics
• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
  – Dynamic ILP machines
  – Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
• Computer system implementation

Page 9:
Making This Class Work For You
• Plus and minus grades
• Clickers

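Page 10:
What is Computer Architecture?
Figure labels: Applications, Technology, Interfaces (ISA, API, I/O Chan, Link), Machine Organization (IR, Regs), Measurement & Evaluation, Computer Architect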

Page 11:
Technology Constraints
• Yearly improvement
  – Semiconductor technology
    • 60% more devices per chip (doubles every 18 months)
    • 15% faster devices (doubles every 5 years)
    • Slower wires
  – Magnetic Disks
    • 60% increase in density
  – Circuit boards
    • 5% increase in wire density
  – Cables
    • no change
Figure labels: process nodes 1000nm, 800nm, 350nm, 250nm, 130nm, 90nm; years 1989, 1992, 1995, 1998, 2002, 2006
>100x more devices since 1989; 10x faster devices
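
The doubling times quoted on this slide follow from the yearly growth rates by compound growth. The sketch below checks that arithmetic. The function names are made up for illustration, and the 17-year span (1989 to 2006) is read off the figure labels, so treat it as an assumption.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """Total improvement after compounding for the given number of years."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    print("devices double every %.1f months" % (12 * doubling_time_years(0.60)))
    print("speed doubles every %.1f years" % doubling_time_years(0.15))
    print("device count growth over 17 years: %.0fx" % growth_factor(0.60, 17))
    print("speed growth over 17 years: %.1fx" % growth_factor(0.15, 17))
```

At 60% per year the device count doubles in a little under 18 months, and at 15% per year speed doubles in just under 5 years. Over 17 years the 15% rate compounds to roughly the slide's "10x faster devices".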

Page 12:
Changing Technology leads to Changing Architecture
• 1970s
  – multi-chip CPUs
  – semiconductor memory very expensive
  – microcoded control
  – complex instruction sets (good code density)
• 1980s
  – single-chip CPUs, on-chip RAM feasible
  – simple, hard-wired control
  – simple instruction sets
  – small on-chip caches
• 1990s
  – lots of transistors
  – complex control to exploit instruction-level parallelism
• 2000s
  – even more transistors
  – Power wall
  – Transition to CMPs
  – Multi-level caches
• 2010s
  – Embedded vs. Desktop vs. Data center (cloud)
  – New storage (PCM, flash)
  – Simpler cores and lots of them
  – Optimizing for power

Page 13:
Intel 4004 - 1971
• The first microprocessor
• 2,300 transistors
• 108 kHz
• 10µm process

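Page 14:
Some Recent Chips!
Intel Pentium IV
• 42 million transistors
• 4GHz
• 0.13µm process
• Could fit ~15,000 4004s on this chip!
NVidia - GeForce 6800
• 222 million transistors
• 400MHz
• 0.13µm process
Intel Itanium II (Montecito)
• 1.7 billion transistors
• 1.6 GHz
• 90nm process
IBM Cell
• 8 vector processors + 1 PPC
• 4 GHz
• 90nm process
Note on the slide: net revenue was around $35 billion a year for most of the …; R&D about $5 billion a year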

Page 15:
Any Architecture You Want (as long as it is x86)
Figure: processor families in TOP500 supercomputers (number of systems by year)

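Page 16:
Application Constraints
• Applications drive machine ‘balance’
  – Numerical simulations
    • floating-point performance
    • main memory bandwidth
  – Transaction processing
    • I/Os per second
    • integer CPU performance
  – Decision support
    • I/O bandwidth
  – Embedded control
    • I/O timing, power
  – Media processing
    • low-precision ‘pixel’ arithmetic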

Page 17:
Application-Driven Architectures
• General purpose - good performance on “all” programs
  – x86 family, ARM, PowerPC, etc.
• Application specificity can focus on:
  – Types of concurrency available
  – Domain of deployment (server, handheld, desktop)
• Today - overview of graphics processors
  – Interface (instruction set architecture - ISA)
  – Processor organization
  – Concurrent elements

Page 18:
Apple’s iPad/iPhone4 Powered by A4 Chip
• A4 is a modified ARM Cortex run at 1GHz
  – Integrated processor, graphics, memory controller
• Among other claims, ARM says the processor gets a near "25 percent processing power boost, even at same processor speed, from the use of a new instruction pipelining system."
  – We will cover pipelining in this class.
• Claim: 10 hours of 1024x768 video at 25W
• Let’s look at the Freescale i.MX51

Page 19:
Performance: Latency and Throughput
• Latency: time to complete an operation
• Throughput: work completed per unit time
• Consider plumbing
  – Low latency: turn on faucet and water comes out
  – High bandwidth: lots of water (e.g., to fill a pool)
• What is “High speed Internet?”
  – Low latency: needed for interactive gaming
  – High bandwidth: needed for downloading large files
  – Marketing departments like to conflate latency and bandwidth…
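
One way to see why latency and bandwidth are separate quantities is a first-order transfer-time model: total time is roughly the fixed latency plus size divided by bandwidth. The sketch below is illustrative only. The function name and both sets of link numbers are assumptions, not figures from the lecture.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed latency plus the time to push the bytes through."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Two hypothetical links (all numbers are illustrative assumptions).
low_latency_link = {"latency_s": 0.010, "bandwidth_bytes_per_s": 1e6}
high_bandwidth_link = {"latency_s": 0.200, "bandwidth_bytes_per_s": 1e8}

game_packet = 100             # bytes: small message, latency dominates
large_file = 1_000_000_000    # bytes: big download, bandwidth dominates

if __name__ == "__main__":
    for name, link in [("low-latency", low_latency_link),
                       ("high-bandwidth", high_bandwidth_link)]:
        print(name,
              "packet: %.4f s" % transfer_time(game_packet, **link),
              "file: %.1f s" % transfer_time(large_file, **link))
```

Under these assumed numbers the low-latency link wins for the small game packet and the high-bandwidth link wins for the large file, which is the distinction the slide draws between gaming and downloading.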

Page 20:
Relationship between Latency and Throughput
• Latency and bandwidth only loosely coupled
  – Henry Ford: assembly lines increase bandwidth without reducing latency
• My factory takes 1 day to make a Model-T Ford.
  – But I can start building a new car every 10 minutes
  – At 24 hrs/day, I can make 24 * 6 = 144 cars per day
  – A special order for 1 green car still takes 1 day
  – Throughput is increased, but latency is not.
• Latency reduction is difficult
• Often, one can buy bandwidth
  – E.g., more memory chips, more disks, more computers
  – Big server farms (e.g., Google) are high bandwidth
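
The Model-T numbers can be checked with a small steady-state calculation. This is a sketch under the assumption that the line is already full (start-up effects are ignored). The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def assembly_line_cars_per_day(start_interval_minutes):
    """Steady-state throughput when a new car is started every fixed interval."""
    return MINUTES_PER_DAY // start_interval_minutes


def one_at_a_time_cars_per_day(latency_minutes):
    """Throughput if each car must finish before the next one is started."""
    return MINUTES_PER_DAY // latency_minutes


latency_minutes = MINUTES_PER_DAY   # 1 day to build a single car, as on the slide
start_interval_minutes = 10         # a new car enters the line every 10 minutes

if __name__ == "__main__":
    print("latency per car (days):", latency_minutes / MINUTES_PER_DAY)
    print("assembly line:", assembly_line_cars_per_day(start_interval_minutes), "cars/day")
    print("one at a time:", one_at_a_time_cars_per_day(latency_minutes), "cars/day")
```

The assembly line raises throughput from 1 car per day to 144, while the time to finish any single car stays at one day.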

Page 21:
What is cloud computing?
• Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)
• Infrastructure as a service (IaaS)
  – Amazon’s EC2 (elastic compute cloud)
• Platform as a service (PaaS)
  – Google gears
  – Microsoft Azure
• Software as a service (SaaS)
  – Gmail
  – Facebook
  – Flickr

Page 22:
Services Economies of Scale
• Substantial economies of scale possible
• 2006 comparison of very large service with small/mid-sized (~1000 servers):
  – Large Service [$13/Mb/s/mth]: $0.04/GB; Medium [$95/Mb/s/mth]: $0.30/GB (7.1x)
  – Large Service: $4.6/GB/year (2x in 2 DC); Medium: $26.00/GB/year* (5.7x)
  – Large Service: over 1,000 servers/admin; Enterprise: ~140 servers/admin (7.1x)
• High cost of entry
  – Physical plant expensive: 15MW roughly $200M
• Summary: significant economies of scale but at very high cost of entry
  – Small number of large players likely outcome
Thanks, James Hamilton, Amazon
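
The multipliers in parentheses are ratios between the medium (or enterprise) figure and the large-service figure. The sketch below recomputes them from the numbers printed on the slide. The variable names are illustrative, and the remark about rounding is a guess rather than something the slide states.

```python
def ratio(worse, better):
    """How many times more expensive (or less efficient) the smaller operator is."""
    return worse / better


per_gb_transfer = ratio(0.30, 0.04)      # $/GB, medium vs. large
per_mbps_month = ratio(95, 13)           # $/Mb/s/month, medium vs. large
per_gb_year = ratio(26.00, 4.6)          # $/GB/year, medium vs. large
servers_per_admin = ratio(1000, 140)     # large vs. enterprise

if __name__ == "__main__":
    print("transfer, per GB:        %.1fx" % per_gb_transfer)
    print("transfer, per Mb/s/mth:  %.1fx" % per_mbps_month)
    print("per GB per year:         %.1fx" % per_gb_year)
    print("servers per admin:       %.1fx" % servers_per_admin)
```

The per-GB-year and servers-per-admin ratios reproduce the slide's 5.7x and 7.1x. The transfer figures as printed give about 7.3x to 7.5x rather than 7.1x, possibly because the slide's per-GB prices are rounded.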

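Page 23:
Graphics has dedicated chip in PCs
Block diagram:
• CPU: 582 Million transistors (Intel “Kentsfield” quad core, QX6700, 65nm, two dies, 8MB L2$)
• Memory Controller Chip (“North Bridge”), with attached Memory
• Graphics Processor: 681 Million transistors (GeForce 8800, 90nm), attached via AGP or PCIe
• Input/Output Glue Chip (“South Bridge”): Disk, Keyboard, PCIe, etc.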

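Page 24:
GPU/CPU Performance comparison
Figure (y-axis: GFLOPS), with marked reference points:
• IBM Cell: ~200 GFLOPS
• Core 2 Quad 3GHz: 96 GFLOPS
Legend: G80 = GeForce 8800 GTX, G71 = GeForce 7900 GTX, G70 = GeForce 7800 GTX, NV40 = GeForce 6800 Ultra, NV35 = GeForce FX 5950 Ultra, NV30 = GeForce FX 5800
Source: NVIDIA (except CELL and Core2 Quad)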

Page 25:
Why a dedicated processing chip?
• 1) Specialization – becoming less important with time
• 2) Parallelism – becoming more important
Graphics processors are the only highly-parallel processors in every desktop machine.
128 “processors” * 2 FLOPS @ 1.35 GHz
You can program them!
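
The line "128 processors * 2 FLOPS @ 1.35 GHz" multiplies out to a peak rate. The sketch below does that multiplication, assuming "2 FLOPS" means two floating-point operations per processor per clock cycle. The function name is hypothetical, and the 96 GFLOPS CPU figure is the Core 2 Quad number from the performance comparison slide.

```python
def peak_gflops(processors, flops_per_cycle, clock_ghz):
    """Peak rate = units * operations per cycle * cycles per nanosecond."""
    return processors * flops_per_cycle * clock_ghz


gpu_peak = peak_gflops(processors=128, flops_per_cycle=2, clock_ghz=1.35)
cpu_peak = 96.0   # Core 2 Quad 3GHz, GFLOPS, as quoted in the deck

if __name__ == "__main__":
    print("GPU peak: %.1f GFLOPS" % gpu_peak)
    print("GPU / CPU: %.1fx" % (gpu_peak / cpu_peak))
```

Under that reading the GPU's peak is about 345.6 GFLOPS, roughly 3.6 times the quoted CPU figure. Peak rates are upper bounds, and sustained performance depends on the workload.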

Page 26:
Graphics requires programmability
Every application does something a bit different.
Example Cg “shader” program (invoked like a “callback” function):
void normalmapped(float2 normalMapTexCoord : TEXCOORD0, …
                  out float4 color : COLOR,
                  uniform float ambient, …)
{
    float3 normalTex, …;
    normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
    …
    diffuse = saturate(dot(normal, normLightDir));
    …
    color = Kd * (ambient + diffuse) + Ks * pow(specular, specularExponent);
}

Page 27:
GeForce 8800

Page 28:
Next Time
• Performance evaluation
• Basic computer organization
• How chips are made
• Start in on instruction set review/overview
• Always check web page for assignments

