Page 1:
Cloud Computing:
Concepts, Technologies
and Business Implications
B. Ramamurthy & K. Madurai
bina@buffalo.edu & kumar.madurai@ctg.com
This talk is partially supported by the National
Science Foundation under grant DUE #0920335.
© Wipro Chennai 2011 6/23/2010
Page 2:
Outline of the talk
* Introduction to cloud context
  - Technology context: multi-core, virtualization, 64-bit
    processors, parallel computing models, big-data storages...
  - Cloud models: IaaS (Amazon AWS), PaaS (Microsoft Azure),
    SaaS (Google App Engine)
* Demonstration of cloud capabilities
  - Cloud models
  - Data and Computing models: MapReduce
  - Graph processing using Amazon Elastic MapReduce
* A case-study of real business application of
the cloud
* Questions and Answers
Page 3:
Speakers’ Background in cloud
computing
[Speaker biographies; most text lost in extraction. Surviving fragments: "…antic technology business", "… of Management, … University"]
Page 4:
Introduction: A Golden Era in
Computing
* Powerful multi-core processors
* General purpose …
* Explosion of …
* … software methodologies
* Virtualization leveraging the powerful hardware
* Wider bandwidth for communication
Page 5:
Cloud Concepts, Enabling-
technologies, and Models:
The Cloud Context
Page 6:
Evolution of Internet Computing
[Figure: scale plotted against time; surviving labels: web, deep web]
Page 7:
Top Ten Largest Databases
Top ten largest databases (2007)
[Bar chart; y-axis ticks up to 7000 (units not recoverable); legible bar labels: CIA, Amazon, YouTube, ChoicePoint, Sprint, Google, AT&T]
Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
Page 8:
Challenges
* Alignment with the needs of the business / user /
non-computer specialists / community and society
* Need to address the scalability issue: large scale
data, high performance computing, automation,
response time, rapid prototyping, and rapid time to
production
* Need to effectively address (i) ever shortening cycle
of obsolescence, (ii) heterogeneity and (iii) rapid
changes in requirements
* Transform data from diverse sources into intelligence
and deliver intelligence to the right people/users/systems
* What about providing all this in a cost-effective
manner?
Page 9:
Enter the cloud
* Cloud computing is Internet-based computing,
whereby shared resources, software and
information are provided to computers and other
devices on demand, like the electricity grid.
* Cloud computing is the culmination of
numerous attempts at large scale computing with
seamless access to virtually limitless resources.
  - on-demand computing, utility computing, ubiquitous computing,
    autonomic computing, platform computing, edge computing, elastic
    computing, grid computing, ...
Page 10:
Grid Technology: A slide from my presentation
to Industry (2005)
* Emerging enabling technology.
* Natural evolution of distributed systems and the
Internet.
* Middleware supporting network of systems to
facilitate sharing, standardization and openness.
* Infrastructure and application model dealing with
sharing of compute cycles, data, storage and other
resources.
* Publicized by prominent industries as on-demand
computing, utility computing, etc.
* …ring “…” to masses
Page 11:
It is a changed world now...
* Explosive growth in applications: biomedical informatics, space
exploration, business analytics, web 2.0 social networking:
YouTube, Facebook
* Large-scale content generation: e-science and e-business data
deluge
* Extraordinary rate of digital content consumption: digital gluttony:
Apple iPhone, iPad, Amazon Kindle
* Exponential growth in compute capabilities: multi-core, storage,
bandwidth, virtual machines (virtualization)
* Very short cycle of obsolescence in technologies: Windows Vista >
Windows 7; Java versions; C > C#; Python
* Newer architectures: web services, persistence models,
distributed file systems/repositories (Google, Hadoop), multi-core,
wireless and mobile
* Diverse knowledge and skill levels of the workforce
* You simply cannot manage this complex situation with your
traditional IT infrastructure.
Page 12:
Answer: Cloud Computing?
* Requirements and models:
  - platform (PaaS),
  - software (SaaS),
  - infrastructure (IaaS),
  - Services-based application programming interface (API)
* A cloud computing environment can provide one
or more of these requirements for a cost
* Pay as you go model of business
* When using a public cloud the model is more like
renting a property than owning one.
* An organization could also maintain a private
cloud and/or use both.
Page 13:
Enabling Technologies
Page 14:
… Providers
* Accessible through web services
Page 15:
Windows Azure
* Enterprise-level on-demand capacity builder
* Fabric of cycles and storage available on request
for a cost
* You have to use the Azure API to work with the
infrastructure offered by Microsoft
* Significant features: web role, worker role, blob
storage, table and drive storage
Page 16:
Amazon EC2
* Amazon EC2 is one large complex web service.
* EC2 provides an API for instantiating computing
instances with any of the operating systems
supported.
* It can facilitate computations through Amazon
Machine Images (AMIs) for various other models.
* Signature features: S3, Cloud Management
Console, MapReduce Cloud, Amazon Machine
Image (AMI)
* Excellent distribution, load balancing, cloud
monitoring tools
Page 17:
Google App Engine
* This is more a web interface for a development
environment that offers a one stop facility for
design, development and deployment of
applications in Java, Go and Python.
* Google offers reliability, availability and
scalability on par with Google’s own applications
* Interface is software programming based
* Comprehensive programming platform
irrespective of the size (small or large)
* Signature features: templates and appspot,
excellent monitoring and management console
Page 18:
Demos
* Amazon AWS: EC2 & S3 (among the many
infrastructure services)
  - Linux machine
  - Windows machine
  - A three-tier enterprise application
* Google App Engine
  - Eclipse plug-in for GAE
  - Development and deployment of an application
* Windows Azure
  - Storage: blob store/container
  - MS Visual Studio Azure development and production environment
Page 19:
Cloud Programming
Models
Page 20:
The Context: Big-data
* Data mining huge amounts of data collected in a wide range of
domains from astronomy to healthcare has become essential for
planning and performance.
* We are in a knowledge economy.
  - Data is an important asset to any organization
  - Discovery of knowledge; enabling discovery; annotation of data
  - Complex computational models
  - No single environment is good enough: need elastic, on-demand capacities
* We are looking at newer
  - Programming models, and
  - Supporting algorithms and data structures.
Page 21:
Google File System
* The Internet introduced a new challenge in the form of
web logs and web crawler data: large scale, “peta scale”
* But observe that this type of data has a characteristic
distinctly different from your transactional or
“customer order” data: “write once read many (WORM)”
  - Privacy protected healthcare and patient information;
  - Historical financial data;
  - Other historical data
* Google exploited this characteristic in its Google
File System (GFS)
Page 22:
What is Hadoop?
* At Google, MapReduce operations are run on a
special file system called Google File System (GFS)
that is highly optimized for this purpose.
* GFS is not open source.
* Doug Cutting and others at Yahoo! reverse
engineered the GFS and called it Hadoop
Distributed File System (HDFS).
* The software framework that supports HDFS,
MapReduce and other related entities is called the
project Hadoop or simply Hadoop.
* This is open source and distributed by Apache.
Page 23:
Fault tolerance
Failure is the norm rather than the exception.
An HDFS instance may consist of thousands of server
machines, each storing part of the file system’s data.
Because there is a huge number of components, each with a
non-trivial probability of failure, there is always some
component that is non-functional.
Detection of faults and quick, automatic recovery
from them is a core architectural goal of HDFS.
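The claim that some component is always non-functional can be made concrete with a short calculation. The sketch below assumes components fail independently with the same probability; that independence assumption, the function name, and the sample values of n and p are illustrative and not from the talk.

```python
def prob_any_failure(n_components: int, p_fail: float) -> float:
    """Probability that at least one of n independent components is down.

    Assumes every component fails independently with probability p_fail
    (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_fail) ** n_components


if __name__ == "__main__":
    # Hypothetical per-component failure probability of 0.1%.
    for n in (10, 1_000, 10_000):
        print(n, prob_any_failure(n, 0.001))
```

Even with a small per-component failure probability, the chance that at least one component is down approaches 1 as the cluster grows, which is why automatic recovery is treated as a core design goal.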
Page 24:
HDFS Architecture
[Figure: HDFS architecture. Surviving labels: Metadata (Name, replicas, ...), e.g. /home/foo/data, 6, ...; Metadata ops; Block ops; Datanodes; Write]
Page 25:
Hadoop Distributed File System
[Figure labels: HDFS Server (Master node); HDFS Client; Name Nodes; Block size: 128M; Replicated]
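To illustrate the "Block size: 128M, Replicated" labels, here is a minimal sketch of splitting a file into fixed-size blocks and assigning each block to several datanodes. The replication factor of 3, the datanode names, and the round-robin placement are assumptions made for this example; real HDFS placement is more elaborate than this.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size shown on the slide


def num_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def place_blocks(file_size, datanodes, replication=3, block_size=BLOCK_SIZE):
    """Map each block index to `replication` distinct datanodes.

    Round-robin placement is a simplification chosen for illustration;
    it is not the placement policy HDFS actually uses.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    placement = {}
    for block in range(num_blocks(file_size, block_size)):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode names
    for block, replicas in place_blocks(300 * 1024 * 1024, nodes).items():
        print(block, replicas)
```

Because every block lives on several machines, losing one datanode leaves at least one other copy of each of its blocks available.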
Page 26:
What is MapReduce?
* MapReduce is a programming model Google has used
successfully in processing its “big-data” sets (more than 20 petabytes
per day)
  - A map function extracts some intelligence from raw data.
  - A reduce function aggregates the data output by the map
    according to some guide.
  - Users specify the computation in terms of a map and a
    reduce function.
  - The underlying runtime system automatically parallelizes the
    computation across large-scale clusters of machines.
  - The underlying system also handles machine failures, efficient
    communications, and performance issues.
-- Reference: Dean, J. and Ghemawat, S. 2008. MapReduce: simplified
data processing on large clusters. Communications of the ACM 51, 1 (Jan.
2008), 107-113.
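The map and reduce roles described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop or Google code; the function names and the in-memory shuffle step are assumptions made for the example.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a <word, 1> pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: aggregate all values for one key (here, by summing)."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "The grid the"]))
```

In a real deployment the map calls and the reduce calls run on different machines, and the runtime performs the grouping step; the user supplies only the two functions.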
Page 27:
… “mapreducable”
* Benchmark for comparing: Jim Gray’s challenge on
data-intensive computing. Ex: “Sort”
* Google uses it for wordcount, adwords, pagerank,
indexing data.
* Simple algorithms such as grep, text-indexing, reverse
indexing
* Bayesian classification: data mining domain
* Facebook uses it for various operations: demographics
* Financial services use it for analytics
* Astronomy: Gaussian analysis for locating
extra-terrestrial objects.
* Expected to play a critical role in semantic web and in
web 3.0
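As one example from the list above, reverse (inverted) indexing fits the same pattern: map emits a <word, document id> pair and reduce collects the document ids for each word. The sketch below runs in a single process; the document ids and function names are hypothetical.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map: emit a <word, doc_id> pair for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def inverted_index(docs):
    """Reduce: collect, for every word, the sorted list of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word, found_in in map_doc(doc_id, text):
            postings[word].add(found_in)
    return {word: sorted(ids) for word, ids in postings.items()}


if __name__ == "__main__":
    docs = {"d1": "cloud computing", "d2": "grid computing"}  # hypothetical corpus
    print(inverted_index(docs))
```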
Page 28:
[Figure: MapReduce data flow. Large scale data splits feed Map tasks that emit <key, 1> (<key, value> pairs); Reducers (say, Count) write output partitions such as P-0000 and P-0002]
Page 29:
MapReduce Engine
MapReduce requires a distributed file system and
an engine that can distribute, coordinate, monitor
and gather the results.
Hadoop provides that engine through HDFS (the file
system we discussed earlier) and the JobTracker +
TaskTracker system.
JobTracker is simply a scheduler.
A TaskTracker is assigned a Map or Reduce (or other
operation); the Map or Reduce task runs on a node, and so does
the TaskTracker; each task runs in its own JVM on
that node.
Page 30:
Demos
* Word count application: a simple foundation for
text-mining; with a small text corpus of inaugural
speeches by US presidents
* Graph analytics is the core of analytics involving
linked structures (about 110 nodes): shortest
path
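Shortest path on an unweighted graph can be expressed as repeated map and reduce rounds: in each round every reached node proposes its distance plus one to its neighbours (map), and each node keeps the smallest proposal it receives (reduce). The following is a single-process sketch of that idea under the assumption of unit edge weights; the graph, node names, and function names are illustrative and this is not the code used in the demo.

```python
import math


def bfs_iteration(graph, dist):
    """One map/reduce round of parallel breadth-first search."""
    # Map: every node re-emits its own distance; reached nodes also
    # propose (distance + 1) to each neighbour.
    proposals = []
    for node, d in dist.items():
        proposals.append((node, d))
        if d < math.inf:
            for neighbour in graph.get(node, []):
                proposals.append((neighbour, d + 1))
    # Reduce: each node keeps the minimum distance proposed for it.
    new_dist = {}
    for node, d in proposals:
        new_dist[node] = min(d, new_dist.get(node, math.inf))
    return new_dist


def shortest_paths(graph, source):
    """Iterate rounds until no distance changes; unreachable nodes stay at inf."""
    nodes = set(graph) | {n for nbs in graph.values() for n in nbs}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    while True:
        new_dist = bfs_iteration(graph, dist)
        if new_dist == dist:
            return dist
        dist = new_dist


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
    print(shortest_paths(g, "a"))
```

Each round corresponds to one MapReduce job, so the number of jobs grows with the longest shortest path from the source rather than with the number of nodes.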
Page 31:
A Case-study in Business:
Cloud Strategies
Page 32:
Predictive Quality Project Overview
Problem / Motivation:
* Identify special causes that relate to bad outcomes for the
quality-related parameters of the products and visually inspected defects
* Complex upstream process conditions and dependencies make the
problem difficult to solve using traditional statistical / analytical
methods
* Determine the optimal process settings that can increase the yield
and reduce defects through predictive quality assurance
* Potential savings are huge, as the cost of rework and rejects is very high
Solution:
* Use ontology to model the complex manufacturing processes and
utilize semantic technologies to provide key insights into how
outcomes and causes are related
* Develop a rich internet application that allows the user to evaluate
process outcomes and conditions at a high level and drill down to
specific areas of interest to address performance issues
Page 33:
Why Cloud Computing for this
Project
* Well-suited for incubation of new technologies
  - Semantic technologies still evolving
  - Use of Prototyping and Extreme Programming
  - Server and Storage requirements not completely known
* Technologies used (TopBraid, Tomcat) not part of
emerging or core technologies supported by
corporate IT
* Scalability on demand
* Development and implementation on a private
cloud
Page 34:
Public Cloud vs. Private Cloud
Rationale for Private Cloud:
* Security and privacy of business data was a big
concern
* Potential for vendor lock-in
* SLAs required for real-time performance and
reliability
* Cost savings of the shared model achieved
because of the multiple projects involving
semantic technologies that the company is
actively developing
Page 35:
Enterprise
What should IT Do
* Revise cost model to utility-based computing:
CPU/hour, GB/day etc.
* Include hidden costs for management, training
* Different cloud models for different applications: evaluate
* Use for prototyping applications and learn
* Link it to current strategic plans for Services-
Oriented Architecture, Disaster Recovery, etc.
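A utility-based cost model of the kind listed above (CPU/hour, GB/day, plus hidden costs) can be sketched as a simple formula. All rates and the overhead fraction below are hypothetical placeholders, not actual provider prices, and the function name is an assumption made for this example.

```python
def monthly_cost(cpu_hours, gb_days, cpu_rate, storage_rate, overhead_fraction=0.0):
    """Usage-based cost: compute plus storage, scaled by an overhead fraction.

    overhead_fraction stands in for hidden costs such as management and
    training; how to estimate it is left to the organization.
    """
    usage = cpu_hours * cpu_rate + gb_days * storage_rate
    return usage * (1 + overhead_fraction)


if __name__ == "__main__":
    # Hypothetical rates: 0.10 per CPU-hour and 0.01 per GB-day.
    print(monthly_cost(cpu_hours=100, gb_days=50, cpu_rate=0.10, storage_rate=0.01))
    print(monthly_cost(100, 50, 0.10, 0.01, overhead_fraction=0.2))
```

Running the same formula with rates from different providers or cloud models gives a like-for-like comparison for each application.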
Page 36:
References & useful links
* Amazon AWS: http://aws.amazon.com/free/
* AWS Cost Calculator:
http://calculator.s3.amazonaws.com/calc5.html
* Windows Azure: http://www.azurepilot.com/
* Google App Engine (GAE):
http://code.google.com/appengine/docs/whatisgoogleappengine.html
* Graph Analytics:
http://www.umiacs.umd.edu/~jimmylin/Cloud9/docs/content/Lin_Schatz_MLG2010.pdf
* For miscellaneous information:
http://www.cse.buffalo.edu/~bina
Page 37:
Summary
We illustrated cloud concepts and demonstrated
cloud capabilities through simple applications.
We discussed the features of the Hadoop Distributed File
System and MapReduce for handling big-data sets.
We also explored some real business issues in
the adoption of the cloud.
Cloud is indeed an impactful technology that is sure
to transform computing in business.