CS 352H: Computer Systems Architecture
Slide 1: CS 352H: Computer Systems Architecture
Lecture 1: What is Computer Architecture and why should I care?
Professor Emmett Witchel
University of Texas at Austin
witchel@cs.utexas.edu
Slide 2: Goals
- Understand the “how” and “why” of computer system organization
  - Instruction Set Architecture
  - System Organization (processor, memory, I/O)
  - Microarchitecture
  - Virtualization
- Learn methods of evaluating performance
  - Metrics & benchmarks
- Learn how to make systems go fast
  - Pipelining, caching
  - Parallelism (ILP, DLP, TLP)
  - Application-specific architectures (graphics, signal processing)
- Preview of where architecture is heading
Slide 3: Logistics
- Lectures: T/Th 12:30-2:00pm, PAI 3.14
- Instructor: Prof. Emmett Witchel, W 1:15-2:15
- TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1
- Grading: see web page
- Texts: Hennessy & Patterson, Computer Organization and Design (Fourth Edition), including CD. Revised Fourth Edition preferred, not required.
Slide 4: CS352H Online
- URL: www.cs.utexas.edu/users/witchel/CS352H
- I will occasionally email you via Blackboard and at your registered email address. I expect this channel to be reliable and timely.
- Discussion group: via Blackboard (log in at courses.utexas.edu): General, Homeworks, Project
- Computer Architecture Seminar Series: www.cs.utexas.edu/users/cart/arch
Slide 5: Assignment for Next Tuesday
- Turn in student survey forms, if you want
- Read the Moore paper (see web page)
- Write a review of 1/2 to 1 page (see syllabus). The review should include:
  - A summary of the paper's content
  - Your observations on its most interesting/important aspects
  - Your observations on its relevance today
- Be prepared to discuss it in class on Tuesday
Slide 6: Discussion (CS352H, Fall 2007)
- Are you interested in taking this course?
- One question about computer science
- One question about computer architecture
Slide 7: Arch vs. µarch
Levels, from top to bottom:
- Specification: compute the Fibonacci sequence
- Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
- ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1; registers
- Microarchitecture
- Logic
- Transistors
- Physics/Chemistry
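The program level of the stack can be made concrete. Below is a minimal C sketch of the slide's loop. It assumes a[0] = 0 and a[1] = 1, which the slide does not state, and uses 64-bit unsigned elements. The function name fill_fibonacci is hypothetical. At the ISA level, each iteration roughly becomes loads of the previous elements, an add, and a store, which is what the load/add lines on the slide illustrate.

```c
#include <stdint.h>
#include <stddef.h>

/* Fill a[0..n-1] with Fibonacci numbers using the slide's recurrence
 * a[i] = a[i-1] + a[i-2].
 * Assumption (not on the slide): a[0] = 0 and a[1] = 1.
 * F(93) is the largest Fibonacci number that fits in uint64_t, so n must be
 * at most 94; the slide's bound i < 100 would overflow 64-bit integers.
 * Returns 0 on success, -1 if n is outside [2, 94]. */
int fill_fibonacci(uint64_t *a, size_t n)
{
    if (n < 2 || n > 94)
        return -1;
    a[0] = 0;
    a[1] = 1;
    for (size_t i = 2; i < n; i++)
        a[i] = a[i - 1] + a[i - 2];   /* the loop body from the slide */
    return 0;
}
```

Rejecting n > 94 rather than silently wrapping is a design choice. The same source loop could run unchanged on very different microarchitectures, which is the point of the Arch vs. µarch split.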
Slide 8: CS352H Topics
- Technology trends
- Instruction set architectures
- Pipelining
- Modern pipelined architectures
- Dynamic ILP machines
- Static ILP machines
- Cache memory systems
- Virtual memory
- Multiprocessors
- Computer system implementation
Slide 9: Making This Class Work For You
- Plus and minus grades
- Clickers
Slide 10: What is Computer Architecture?
[Figure: the computer architect sits between Technology and Applications, dealing with Interfaces (ISA, API, link, I/O channel, registers, IR), Machine Organization, and Measurement & Evaluation.]
Slide 11: Technology Constraints
Yearly improvement:
- Semiconductor technology
  - 60% more devices per chip (doubles every 18 months)
  - 15% faster devices (doubles every 5 years)
  - Slower wires
- Magnetic disks: 60% increase in density
- Circuit boards: 5% increase in wire density
- Cables: no change
[Figure: process generations from 1989 to 2006 (1000nm, 800nm, 350nm, 250nm, 130nm, 90nm), annotated ">100x more devices since 1989" and "10x faster devices".]
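The doubling times in parentheses follow from the yearly rates by compounding. The short derivation below is standard arithmetic, not taken from the slides. Here r is the yearly growth fraction and t_2 is the doubling time in years.

```latex
% Doubling time t_2 for a quantity that grows by a fraction r per year
(1+r)^{t_2} = 2 \quad\Longrightarrow\quad t_2 = \frac{\ln 2}{\ln(1+r)}

% Devices per chip, r = 0.60
t_2 = \frac{\ln 2}{\ln 1.60} \approx \frac{0.693}{0.470} \approx 1.47\ \text{years} \approx 18\ \text{months}

% Device speed, r = 0.15
t_2 = \frac{\ln 2}{\ln 1.15} \approx \frac{0.693}{0.140} \approx 4.96\ \text{years} \approx 5\ \text{years}

% Compounding: ten years at 60\% per year already exceeds 100x
1.60^{10} \approx 110
```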
Slide 12: Changing Technology Leads to Changing Architecture
- 1970s: multi-chip CPUs; semiconductor memory very expensive; microcoded control; complex instruction sets (good code density)
- 1980s: single-chip CPUs, on-chip RAM feasible; simple, hard-wired control; simple instruction sets; small on-chip caches
- 1990s: lots of transistors; complex control to exploit instruction-level parallelism
- 2000s: even more transistors; power wall; transition to CMPs (chip multiprocessors); multi-level caches
- 2010s: embedded vs. desktop vs. data center (cloud); new storage (PCM, flash); simpler cores and lots of them; optimizing for power
Slide 13: Intel 4004 (1971)
- The first microprocessor
- 2,300 transistors
- 108 kHz
- 10µm process
Slide 14: Some Recent Chips!
- Intel Pentium IV: 42 million transistors, 4GHz, 0.13µm process. Could fit ~15,000 4004s on this chip!
- NVidia GeForce 6800: 222 million transistors, 400MHz, 0.13µm process
- Intel Itanium II (Montecito): 1.7 billion transistors, 1.6 GHz, 90nm process
- IBM Cell: 8 vector processors + 1 PPC, 4 GHz, 90nm process
Intel's net revenue was around $35 billion a year for most of the aughts; R&D was about $5 billion a year.
Slide 15: Any Architecture You Want (as long as it is x86)
Slide 16: Application Constraints
Applications drive machine ‘balance’:
- Numerical simulations: floating-point performance, main memory bandwidth
- Transaction processing: I/Os per second, integer CPU performance
- Decision support: I/O bandwidth
- Embedded control: I/O timing, power
- Media processing: low-precision ‘pixel’ arithmetic
Slide 17: Application-Driven Architectures
- General purpose: good performance on “all” programs (x86 family, ARM, PowerPC, etc.)
- Application specificity can focus on:
  - Types of concurrency available
  - Domain of deployment (server, handheld, desktop)
- Today: overview of graphics processors
  - Interface (instruction set architecture, ISA)
  - Processor organization
  - Concurrent elements
Slide 18: Apple's iPad/iPhone 4, Powered by the A4 Chip
- The A4 is a modified ARM Cortex running at 1GHz
- Integrated processor, graphics, and memory controller
- Among other claims, ARM says the processor gets a nearly 25 percent processing-power boost, even at the same clock speed, from a new instruction pipelining system. We will cover pipelining in this class.
- Claim: 10 hours of 1024x768 video on a 25 Wh (watt-hour) battery
- Let's look at the Freescale i.MX51
Slide 19: Performance: Latency and Throughput
- Latency: time to complete an operation
- Throughput: work completed per unit time
- Consider plumbing:
  - Low latency: turn on the faucet and water comes out
  - High bandwidth: lots of water (e.g., to fill a pool)
- What is “high-speed Internet”?
  - Low latency: needed for interactive gaming
  - High bandwidth: needed for downloading large files
- Marketing departments like to conflate latency and bandwidth…
Slide 20: Relationship between Latency and Throughput
- Latency and bandwidth are only loosely coupled
- Henry Ford: assembly lines increase bandwidth without reducing latency
  - My factory takes 1 day to make a Model T Ford.
  - But I can start building a new car every 10 minutes.
  - At 24 hrs/day, I can make 24 * 6 = 144 cars per day.
  - A special order for 1 green car still takes 1 day.
  - Throughput is increased, but latency is not.
- Latency reduction is difficult
- Often, one can buy bandwidth
  - E.g., more memory chips, more disks, more computers
  - Big server farms (e.g., Google) are high bandwidth
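The Ford example is a pipeline in steady state. Below is a minimal C sketch of its arithmetic, with all times in minutes. The function names are hypothetical. The cars_in_progress helper applies Little's law (items in flight = throughput × latency), which the slide does not mention by name.

```c
/* Steady-state model of a pipelined assembly line.
 * All names are hypothetical; times are in minutes. */

/* Throughput: once the line is full, one car finishes every interval_min. */
double cars_per_day(double interval_min)
{
    return (24.0 * 60.0) / interval_min;
}

/* Little's law: cars in progress = throughput x latency
 * = (1 / interval_min) x latency_min. */
double cars_in_progress(double latency_min, double interval_min)
{
    return latency_min / interval_min;
}

/* Time to finish n cars on an initially empty line: the first car takes the
 * full latency, and each later car finishes one interval after the previous. */
double time_for_n_cars(double latency_min, double interval_min, unsigned n)
{
    if (n == 0)
        return 0.0;
    return latency_min + (double)(n - 1) * interval_min;
}
```

With a 1440-minute latency and a 10-minute interval, the line delivers 144 cars per day and holds 144 cars in progress at once. The single green car (n = 1) still takes the full 1440 minutes: pipelining raised throughput without shortening latency.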
Slide 21: What is cloud computing?
- Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)
- Infrastructure as a service (IaaS): Amazon's EC2 (Elastic Compute Cloud)
- Platform as a service (PaaS): Google Gears, Microsoft Azure
- Software as a service (SaaS): Gmail, Facebook, Flickr
Slide 22: [Figure] Thanks, James Hamilton, Amazon
Slide 23: Graphics Has a Dedicated Chip in PCs
[Figure: PC block diagram. The CPU (Intel “Kentsfield” quad core, QX6700, 65nm, two dies, 8MB L2$; 582 million transistors) connects to the memory controller chip (“North Bridge”) and its memory. The graphics processor (GeForce 8800, 90nm; 681 million transistors) attaches to the North Bridge over AGP/PCIe and has its own memory. The input/output glue chip (“South Bridge”) connects disk, keyboard, PCIe, etc.]
Slide 24: GPU/CPU Performance Comparison
[Chart: peak GFLOPS of NVIDIA GPUs (NV30 = GeForce FX 5800, NV35 = GeForce FX 5950 Ultra, NV40 = GeForce 6800 Ultra, G70 = GeForce 7800 GTX, G71 = GeForce 7900 GTX, G80 = GeForce 8800 GTX) compared with IBM Cell (~200 GFLOPS*) and Core 2 Quad 3GHz (96 GFLOPS*).]
Source: NVIDIA (*except Cell and Core 2 Quad)
Slide 25: Why a Dedicated Processing Chip?
1) Specialization: becoming less important with time
2) Parallelism: becoming more important
- Graphics processors are the only highly parallel processors in every desktop machine: 128 “processors” * 2 FLOPs/cycle @ 1.35 GHz
- You can program them!
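The parallelism figure is a peak-rate product: units × FLOPs per unit per cycle × clock. The minimal C sketch below uses a hypothetical function name. For the GPU it gives 128 × 2 × 1.35 ≈ 345.6 GFLOPS (the 2 FLOPs/cycle is presumably one multiply-add). The Core 2 Quad check assumes 4 cores × 8 single-precision FLOPs per cycle at 3 GHz. That per-core figure is an assumption not stated on the slides, but it reproduces the 96 GFLOPS on the comparison chart.

```c
/* Peak floating-point rate in GFLOPS (hypothetical helper):
 * units x FLOPs per unit per cycle x clock in GHz.
 * This is a theoretical peak, not a sustained rate on real code. */
double peak_gflops(unsigned units, unsigned flops_per_cycle, double clock_ghz)
{
    return (double)units * (double)flops_per_cycle * clock_ghz;
}
```

Under these assumptions the GPU's peak is about 3.6 times the CPU's. Real applications reach only a fraction of either peak.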
Slide 26: Graphics Requires Programmability
Every application does something a bit different. Example Cg “shader” program (invoked like a “callback” function):

    void normalmapped(float2 normalMapTexCoord : TEXCOORD0, …
                      out float4 color : COLOR,
                      uniform float ambient, …)
    {
        float3 normalTex, …;
        normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
        …
        diffuse = saturate(dot(normal, normLightDir));
        …
        color = Kd * (ambient + diffuse) + Ks * pow(specular, specularExponent);
    }
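To show what the visible lines of the shader compute per pixel, here is a minimal scalar C sketch. The struct and helper names are hypothetical stand-ins for Cg's float3, dot(), saturate(), and pow(). Kd and Ks are treated as scalars, although a real shader likely uses color vectors. The pow stand-in only handles integer exponents, which is a simplification. The specular term is passed in because the slide elides how it is computed.

```c
/* Minimal stand-in for Cg's float3 type. */
typedef struct { float x, y, z; } float3;

/* Dot product, like Cg's dot(). */
static float dot3(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Clamp to [0, 1], like Cg's saturate(). */
static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Stand-in for pow() restricted to non-negative integer exponents
 * (a simplification; Cg's pow takes a float exponent). */
static float pow_uint(float base, unsigned exponent)
{
    float result = 1.0f;
    while (exponent-- > 0)
        result *= base;
    return result;
}

/* Scalar version of the shader's lighting:
 *   diffuse = saturate(dot(normal, normLightDir))
 *   color   = Kd * (ambient + diffuse) + Ks * pow(specular, specularExponent)
 * normal and light_dir are assumed to be normalized already. */
float shade(float3 normal, float3 light_dir, float ambient,
            float specular, unsigned specular_exponent, float kd, float ks)
{
    float diffuse = clamp01(dot3(normal, light_dir));
    return kd * (ambient + diffuse) + ks * pow_uint(specular, specular_exponent);
}
```

A GPU runs a function like this independently for every pixel, which is why the many parallel “processors” pay off.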
Slide 27: GeForce 8800
Slide 28: Next Time
- Performance evaluation
- Basic computer organization
- How chips are made
- Start in on instruction set review/overview
- Always check the web page for assignments