CLOUD COMPUTING NOTES

 CLOUD DATA MIGRATION: 

Cloud migration is the process of moving digital business operations into the cloud. Cloud migration is sort of like a physical move, except that it involves moving data, applications and IT processes from one data center to another, instead of packing up and moving physical goods. Most often, cloud migration describes the move from on-premises or legacy (old but still in use) infrastructure to the cloud.

 

WHAT IS LEGACY INFRASTRUCTURE ?

In computing, hardware or software is considered legacy if it is outdated but still in use. Legacy products and processes are usually not as efficient or secure as more up-to-date solutions. Businesses stuck running legacy systems are in danger of falling behind their competitors; they also face an increased risk of data breaches.

 

WHAT CLOUD MIGRATION STRATEGIES MAY AN ENTERPRISE ADOPT?

 i) Rehost 

ii) Refactor

iii) Revise

iv) Rebuild

v) Replace

These five strategies are commonly called the 5 Rs.

i) Rehost: Rehosting can be thought of as running the same applications, but on cloud servers. Companies that choose this strategy will select an infrastructure-as-a-service (IaaS) provider and recreate their application architecture on that infrastructure.

ii) Refactor: Companies that choose to refactor will reuse already existing code and frameworks, but run their applications on a platform-as-a-service (PaaS) provider's platform instead of on infrastructure-as-a-service, as in rehosting.

iii) Revise: This strategy involves partially rewriting (expanding) the code base, then deploying it by either rehosting or refactoring.

iv) Rebuild: To rebuild means rewriting and re-architecting the application from the ground up on a PaaS provider's platform.

v) Replace: Businesses can also take the opportunity to discard their old applications altogether and switch to an already built software-as-a-service (SaaS) application from a third-party vendor.




**************************************

In a distributed architecture, components are hosted on different platforms, and several components can cooperate with one another over a communication network in order to achieve a specific objective or goal. In this architecture, information processing is not confined to a single machine; rather, it is distributed over several independent computers.

There are several technology frameworks to support distributed architectures, including .NET, .NET web services, AXIS Java web services, and Globus Grid services.

Middleware is an infrastructure that appropriately supports the development and execution of distributed applications. It provides a buffer between an application and the network.

Some advantage of Distributed Architecture:

1. Resource sharing

2. Concurrency 

3. Scalability 

4. Fault tolerance

Disadvantages of Distributed Architecture:

1.  Complexity

2. Security

3. Manageability 

 

Client-Server Architecture:

Client: This is the first process, which issues a request to a second process.

Server: This is the second process, which receives the request, carries it out, and sends a reply to the client. In this architecture the server need not know anything about its clients, but a client must know the identity of the server.
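As a toy illustration of this request/reply pattern, the sketch below runs a server and a client in one Python process using the standard socket module; the loopback address, port number and message are arbitrary choices made here, not part of the notes.

```python
import socket
import threading

# The server socket: it listens and does not need to know its clients in advance.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 5000))
srv.listen(1)

def serve_one():
    conn, _addr = srv.accept()
    with conn:
        request = conn.recv(1024)                 # receive the request
        conn.sendall(b"reply to: " + request)     # carry it out and send a reply

t = threading.Thread(target=serve_one)
t.start()

# The client must know the server's identity (address and port).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 5000))
    cli.sendall(b"hello")                         # first process issues a request
    print(cli.recv(1024).decode())                # second process replies

t.join()
srv.close()
```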


VIRTUALIZATION CONCEPT: 

Creating a virtual machine over an existing operating system and hardware is referred to as hardware virtualization.

A virtual machine provides an environment that is logically separated from the underlying hardware.

The machine on which the virtual machine is created is known as the HOST machine, and the virtual machine is referred to as the GUEST machine.

This virtual machine is managed by software or firmware, which is known as a hypervisor.

There are two types of hypervisor: 1> TYPE 1 HYPERVISOR and 2> TYPE 2 HYPERVISOR.

 

TYPE 1 HYPERVISOR:

It executes on the bare (physical) system. Oracle VM and VirtualLogix VLX are examples of Type 1 hypervisors.

 TYPE 2 HYPERVISOR: 

It is a software interface that emulates the devices with which a system normally interacts. Containers, KVM and Microsoft Hyper-V are examples of Type 2 hypervisors.


TYPES OF HARDWARE VIRTUALIZATION:


1> Full virtualization

2> Emulation virtualization

3> Para Virtualization


1> FULL VIRTUALIZATION : 

In full virtualization, the underlying hardware is completely simulated. Guest software does not require any modification to run.

2> EMULATION VIRTUALIZATION

In emulation virtualization, the virtual machine simulates the hardware and hence becomes independent of it. The guest OS does not require modification.

3> PARA VIRTUALIZATION

In para-virtualization, the hardware is not simulated; the guest software runs in its own isolated domain.


 

 


 

 





**************************************

VIRTUAL CPU:

When we install a hypervisor, each physical CPU is abstracted into virtual CPUs (vCPUs). The hypervisor divides the available CPU cycles for each core and allows multiple VMs to time-share a given physical processor core. The hypervisor typically assigns one workload per virtual CPU. If the workload of a server needs more CPU cycles, it is better to deploy fewer VMs on a particular physical CPU. The following example illustrates the logic of virtual CPUs.

 .... pic.. 

I have a physical server with 2 processors (CPU 1 and CPU 2), and each of them has 4 physical cores, so in total we have 2 * 4 = 8 physical cores. Based on some calculation, our hypervisor provides 5 to 10 virtual CPUs for each physical core. So in total we will have 8 physical cores * (5 to 10 vCPUs) = 40 to 80 vCPUs,

which means that we have a maximum of 80 vCPUs to assign to virtual machines.
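As a quick sanity check, the same arithmetic can be written out in a few lines of Python; the 2 CPUs, 4 cores each, and the 5-10 vCPUs-per-core ratio are simply the numbers from the example above, not fixed properties of any hypervisor.

```python
physical_cpus = 2
cores_per_cpu = 4
physical_cores = physical_cpus * cores_per_cpu            # 2 * 4 = 8 cores

vcpus_per_core_min, vcpus_per_core_max = 5, 10            # ratio quoted by the hypervisor
total_vcpus_min = physical_cores * vcpus_per_core_min     # 8 * 5  = 40 vCPUs
total_vcpus_max = physical_cores * vcpus_per_core_max     # 8 * 10 = 80 vCPUs

print(f"{physical_cores} physical cores -> {total_vcpus_min} to {total_vcpus_max} vCPUs")
```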


VIRTUAL MEMORY:

Virtual memory simply works like the RAM in a physical machine. The memory resource settings for a virtual machine determine how much of the host memory is allocated to the virtual machine. The virtual hardware memory size determines how much memory is available to applications that run in the virtual machine. You can add, change and configure virtual memory resources or options to enhance virtual machine performance. You can set most of the memory parameters while creating the virtual machine, or this can also be done after the guest OS is installed. Most hypervisors require the virtual machine to be powered off before changing these settings.

...pic...

In the following diagram, we can see the total physical memory is divided between two virtual machines. 


VIRTUAL STORAGE:

Storage virtualization is the pooling of physical storage from multiple network storage devices into what appears to be a single storage device that is managed from a central console. We cannot assign more storage to virtual machines than the data cluster physically offers. In the following example, we have a data cluster of 12 TB in total and 4 VMs, each of which has been allocated some storage; in total, the maximum storage allocated to them is only 12 TB.


VIRTUAL NETWORKING:

We have VMs 1, 2, 3 and 4 running on the same host, and they would like to send network traffic back and forth. This is done by virtual network cards, as shown in the following diagram, which connect virtually to a virtual switch created by the hypervisor. This virtual switch communicates with the physical NIC card of the server, which is connected to a physical switch and then communicates with the rest of the network equipment.

 

IMPORTANT QUESTIONS:  5 MARKS

1> WHAT ARE THE LAYERS AND TYPES OF CLOUD?

2> WRITE A NOTE ON THE DESIRED FEATURES OF A CLOUD.

3> STATE THE ESSENTIAL CHARACTERISTICS OF CLOUD COMPUTING.

4> EXPLAIN IN DETAIL THE CLOUD DELIVERY MODELS.

5> DISCUSS THE OPERATIONAL AND ECONOMIC BENEFITS OF SAAS (SOFTWARE-AS-A-SERVICE).

6> WHAT ARE THE SECURITY CONSTRAINTS IN CLOUD COMPUTING?

7> WRITE NOTES ON GRID AND CLOUD.

8> HOW DO YOU IMPLEMENT THE HYBRID CLOUD?

9> DEFINE VIRTUAL CPU.

10> DEFINE VIRTUAL MEMORY.

11> DEFINE VIRTUAL STORAGE.

12> DEFINE VIRTUAL NETWORK.

13> WRITE A SHORT NOTE ON SOFTWARE VIRTUALIZATION AND NETWORK VIRTUALIZATION.

14> WHAT IS MEANT BY MIGRATION IN CLOUD COMPUTING? WHY DO WE USE MIGRATION?

15> DISCUSS THE ADVANTAGES OF SOA (SERVICE-ORIENTED ARCHITECTURE).

16> WRITE SHORT NOTES ON VIRTUALIZATION AND GOOGLE APP ENGINE.

17> WRITE SHORT NOTES ON HADOOP MAPREDUCE.

18> DEFINE HIGH PERFORMANCE COMPUTING (HPC).

19> DIFFERENCE BETWEEN GRID AND CLOUD COMPUTING.

20> DIFFERENCE BETWEEN DISTRIBUTED AND PARALLEL COMPUTING.

21> WRITE A SHORT NOTE ON THE ORIGIN OF CLOUD COMPUTING (HISTORY).

22> HOW IS CLOUD STORAGE DIFFERENT FROM AN ON-PREMISES DATA CENTER?

Simplicity, scalability, maintenance, and accessibility of data are the features we expect from any public cloud storage; these are the main assets of .... and are very difficult to achieve in an on-premises data center.

SIMPLICITY - 

We can easily create and set up a storage object in the Google or Azure cloud.

SCALABILITY -

Storage capacity is highly scalable and elastic.

MAINTENANCE, ACCESSIBILITY & BACKUP - 

Data in Azure or Google storage is searchable and accessible through the latest web technologies like HTTP and REST APIs.

Multi-protocol (HTTP, TCP, etc.) data access for modern applications makes Azure or Google stand out in the cloud.

We are not required to bother about data center maintenance and data backup; everything is handled by the cloud provider's team.

Azure or Google replication concepts are used to maintain different copies of data at different geographic locations. Using this, we can protect our data even if a natural disaster occurs.

High availability and disaster recovery are among the good features provided by cloud storage which we cannot get in an on-premises data center.

 

 [same IP address series, same domain, network address ]

23> DESCRIBE CLUSTER COMPUTING.

=> Cluster computing: a cluster is a group of computers connected to each other that work together as a single computer. These computers are often linked through a LAN. Clusters came into existence because of the high need for them: computing requirements are increasing at a high rate and there is more data to process, so clusters have been used widely in d.... A cluster is a tightly coupled system, and one of its characteristics is a centralized job management and scheduling system. All the computers in the cluster use the same hardware and OS, and the computers are in the same location, connected with a very high-speed network, to perform as a single computer. The resources of the cluster are managed by a centralized resource manager. A cluster is owned by a single organization or department. The interconnect is high-end with low latency and high bandwidth; security in the cluster is login/password based, and it has a medium level of privacy depending on user privileges.

.... 

Architecture of Cluster Computing: 

The architecture of cluster computing contains some main components: i) multiple standalone computers, ii) operating systems, iii) a high-performance interconnect, iv) communication software, and v) an application platform.

Advantages:

In a cluster, software is automatically installed and configured, and the nodes of the cluster can be added and managed easily. So it is very easy to deploy; it is an open system and very cost-effective to acquire and manage; clusters have many sources of support and supply; they are fast and very flexible; the system is optimized for performance as well as simplicity; and the software configuration can be changed at any time. It also saves the time spent searching the net for the latest drivers. The cluster system is very supportive as it includes software updates.

Disadvantage:

Cluster computing has some disadvantages: it is hard to manage without experience; when the size of the cluster (number of nodes) is large, it is difficult to find out that something has failed; and the programming environment is hard to improve when the software on some nodes differs from that on others.


*******************************************************************

WHAT IS A DATA CENTER?

A data center is essentially a building that provides space, power and cooling for network infrastructure. A data center design is based on a network of computing and storage resources that enables the delivery of shared applications and data. The key components of a data center design include routers, switches, firewalls, storage systems, servers and application delivery controllers.

WHY ARE DATA CENTERS IMPORTANT TO BUSINESS?

Data centers are an integral part of the enterprise, designed to support business applications and provide services such as:

1> Data storage, management, backup and recovery.

2> Productivity applications such as email services.

3> High-volume e-commerce transactions.

4> Powering online gaming communities.

5> Big data, machine learning and artificial intelligence based applications.

Today there are around 7 million data centers worldwide. Practically every business and government entity builds and maintains its own data center or has access to someone else's, if not both models. Many options are available today, such as renting servers at a colocation facility, using data center services maintained by a third party, or using public cloud-based services from providers like Amazon, Microsoft, Google, etc.

The core components of a data center:

The primary elements of a data center break down as follows:

1> Facility: 

The usable space available for IT equipment. Providing round-the-clock access to information makes data centers some of the world's most energy-consuming facilities. They are designed to optimize space and environmental control to keep equipment within specific temperature/humidity ranges.

2> Core component:

Equipment and software for IT operations and the storage of data and applications. This may include storage systems, servers, and network infrastructure such as switches, routers, firewalls, load balancers, etc.

3> Support Infrastructure:

Equipment contributing to securely sustaining the highest availability possible. The Uptime Institute has defined four tiers of data centers, with availability ranging from 99.671% to 99.995%.

Some component for supporting infrastructure include 

a) UPS: battery banks, generators and redundant power sources.

b) Environmental control: computer room air conditioners (CRAC); heating, ventilation and air conditioning systems (HVAC).

c) Physical security systems: biometric and video surveillance systems.

d) Operations staff: personnel available to monitor operations and maintain IT and infrastructure equipment around the clock.


Q> WHAT ARE THE STANDARDS FOR DATA CENTER INFRASTRUCTURE?

The most widely adopted standard for data center design is ANSI/TIA-942. It includes standards for ANSI/TIA-942-ready certification, which ensures compliance with one of four categories of data center tiers, rated for levels of redundancy and fault tolerance.

TIER -1 :

Basic site infrastructure: a Tier 1 data center offers limited protection against physical events. It has single-capacity components and a single, non-redundant distribution path.

TIER -2:

Redundant-capacity-component site infrastructure: this data center offers improved protection against physical events. It has redundant-capacity components and a single, non-redundant distribution path.

TIER -3:

Concurrently maintainable site infrastructure: this data center protects against virtually all physical events, providing redundant-capacity components and multiple independent distribution paths. Each component can be removed or replaced without disrupting services to end users.

TIER -4:

Fault-tolerant site infrastructure: this data center provides the highest levels of fault tolerance and redundancy. Redundant-capacity components and multiple independent distribution paths enable concurrent maintainability, and a single fault anywhere in the installation does not cause downtime. The sketch below turns these availability percentages into downtime figures.
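To make the tier percentages concrete, the short sketch below converts availability into maximum downtime per year. Only the 99.671% and 99.995% figures appear in the notes above; the Tier 2 and Tier 3 values used here are the commonly cited Uptime Institute numbers and should be treated as assumptions.

```python
HOURS_PER_YEAR = 24 * 365

tiers = {
    "Tier 1": 99.671,   # from the notes above
    "Tier 2": 99.741,   # commonly cited value (assumption)
    "Tier 3": 99.982,   # commonly cited value (assumption)
    "Tier 4": 99.995,   # from the notes above
}

for tier, availability in tiers.items():
    downtime_hours = HOURS_PER_YEAR * (1 - availability / 100)
    print(f"{tier}: {availability}% availability ~= {downtime_hours:.1f} hours of downtime per year")
```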


Q > TYPES OF DATA CENTER

Many types of data centers and service models are available. Their classification depends on whether they are owned by one or many organizations, what technologies they use for computing and storage, and even their energy efficiency. There are four main types of data centers:

Hyper scale data center: 


Co-located data center:

 

Wholesale Co-location data center: 

 

Enterprise data center:

 

There is another one, i.e.

Cloud data center: 

 

Communication (HTTPS security, firewall); Application (tier authentication, server load balancing, database SSL certificates); Storage security (encryption algorithms).

 

*******************************************************

MARKS: 5

1> EXPLAIN HADOOP CORE COMPONENTS.

2> STABILITY OF A TWO-LEVEL RESOURCE ALLOCATION ARCHITECTURE.

3> EXPLAIN MODERN IMPLEMENTATION OF SAAS (SOFTWARE AS A SERVICE) USING SOA COMPONENTS.

4> DEFINE HYPERVISOR AND ITS TYPES IN CLOUD COMPUTING WITH DIAGRAM.

5> LIST OUT THE TOP 10 OBSTACLES AND OPPORTUNITIES FOR ADOPTION AND GROWTH OF CLOUD COMPUTING.

6> EXPLAIN THE WORKING OF A BROKERED CLOUD STORAGE ACCESS SYSTEM WITH DIAGRAM.

7> DESCRIBE THE USE OF THE EC2 (AMAZON ELASTIC COMPUTE CLOUD) SERVICE IN AWS OR THE AMAZON PLATFORM.

8> ANALYSE THE REASONS FOR INTRODUCING STORAGE AREA NETWORKS (SAN).

9> WHAT IS CLOUD-BASED STORAGE? EXPLAIN MANAGED AND UNMANAGED CLOUD STORAGE WITH EXAMPLES.

10> BRIEFLY DESCRIBE OPENSTACK WITH ITS APPLICATIONS.

 

MARKS :10 

1> EXPLAIN BRIEFLY THE SECURITY CONCERNS OF CLOUD COMPUTING.

2> DISCUSS THE OPERATIONAL AND ECONOMIC BENEFITS OF SAAS.

3> EXPLAIN THE DEPLOYMENT MODELS OF CLOUD IN DETAIL.

4> EXPLAIN VIRTUAL MACHINE SECURITY IN CLOUD COMPUTING.

5> DESCRIBE IN DETAIL PROVIDER DATA AND ITS SECURITY.

6> DEFINE SERVICE-ORIENTED ARCHITECTURE. EXPLAIN THE COMPONENTS OF SERVICE-ORIENTED ARCHITECTURE.

7> DISCUSS THE KEY PRINCIPLES OF SERVICE-ORIENTED ARCHITECTURE.

8> DEFINE VIRTUALIZATION. WHAT IS THE NEED FOR VIRTUALIZATION IN CLOUD COMPUTING?

9> WRITE SHORT NOTES ON AMAZON S3 (SIMPLE STORAGE SERVICE) AND AMAZON SIMPLEDB.

10> VIRTUALIZATION OF CPU, MEMORY AND INPUT/OUTPUT DEVICES.

11> EXPLAIN THE GFS (GOOGLE FILE SYSTEM) CLUSTER ARCHITECTURE WITH A SUITABLE BLOCK DIAGRAM.

12> EXPLAIN VIRTUALIZATION FOR DATA CENTER AUTOMATION.

13> EXPLAIN THE CHALLENGES AND LEGAL ISSUES OF CLOUD COMPUTING.

14> WHAT ARE THE ROLES OF WEB SERVICES IN CLOUD COMPUTING?

15> WHAT IS A DATA CENTER?

16> DEFINE THE CORE COMPONENTS AND STANDARDS OF A DATA CENTER.

17> EXPLAIN WITH DIAGRAM THE HDFS (HADOOP DISTRIBUTED FILE SYSTEM) ARCHITECTURE.

18> EXPLAIN CLOUD INFRASTRUCTURE SECURITY AT THE APPLICATION LEVEL.

19> DISCUSS SCHEDULING ALGORITHMS FOR CLOUD COMPUTING.

20> WRITE SHORT NOTES ON BROKERED CLOUD STORAGE ACCESS AND STORAGE LOCATION AND TENANCY.

21> DESCRIBE CLUSTER COMPUTING.

************************************************************************

12> EXPLAIN THE VIRTUALIZATION FOR DATA CENTER AUTOMATION.

The dynamic nature of cloud computing has pushed data center workload, server and even hardware automation to a whole new level. Now, any data center provider looking to get into cloud computing must look at some form of automation to help them be as agile as possible in the cloud world. New technologies are forcing data center providers to adopt new methods to increase efficiency, scalability and redundancy. There are several big trends driving increased use of data center facilities: more devices, more users, more cloud, more workloads and a lot more data.

 The automation layers are given below,

a) Server layer: 

Server and hardware automation has come a long way. Administrators only need to deploy one server profile and allow new servers to pick up those settings. More data centers are trying to get into the cloud business. This means deploying high-density, fast-provisioning servers and blades (a blade server is a high-end server). With the on-demand nature of the cloud, being able to quickly deploy fully configured servers is a big plus for staying agile and proactive.

b) Software layer: 

Entire applications can be automated and provisioned based on usage and resource utilization. Using the latest load-balancing tools (i.e. traffic balancing, e.g. F5 load balancers), administrators are able to set thresholds for key applications running within the environment. If a load balancer, an F5 or a NetScaler (Citrix NetScaler) for example, sees that a certain type of application is receiving too many connections, it can set off a process that allows the administrator to provision another instance of the application, or a new server that will host the app. A small sketch of this threshold logic follows.
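The sketch below illustrates only the threshold idea. The connection limit and the provision_instance() helper are hypothetical placeholders for whatever API the load balancer or hypervisor actually exposes; real F5/NetScaler integrations work through their own management interfaces.

```python
CONNECTION_THRESHOLD = 500   # assumed per-instance limit chosen by the administrator

def provision_instance(app_name: str) -> None:
    # Placeholder for a call into the virtualization or cloud layer.
    print(f"Provisioning a new instance of {app_name} ...")

def check_application(app_name: str, active_connections: int) -> None:
    # If the application is receiving too many connections, scale it out.
    if active_connections > CONNECTION_THRESHOLD:
        provision_instance(app_name)

check_application("web-frontend", active_connections=742)   # triggers provisioning
```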

c) Virtual layer: 

The modern data center is now full of virtualization and virtual machines. Using solutions like Citrix Presentation Server, administrators are able to take workload provisioning to a whole new level. Imagine being able to set up a process that will kick-start the creation of a new virtual server when one starts to become over-utilized. Now administrators can create truly automated virtual machine environments where each workload is monitored, managed and controlled.

d) Cloud layer: 

This is a new and still emerging field, but some very large organizations are already deploying technologies like OpenStack, CloudStack and OpenNebula. Furthermore, they are tying these platforms in with big data management solutions like MapReduce and Hadoop. Organizations can deploy distributed data centers and have the entire cloud layer managed by a cloud control software platform. Engineers are able to monitor workloads, how data is being distributed, and the health of the cloud infrastructure (e.g. via a gossiping protocol, as used by the Cassandra database). The great part about this technology is that organizations can deploy a true private cloud with as much control and redundancy as a public cloud instance.

e) Data center layer: 

Although full data center automation technologies are not quite here yet, we are seeing more robotics appear within the data center environment. Robotic arms already control massive tape libraries for Google, and robotic automation is a thoroughly discussed concept among other large data center providers. In working with modern data center technologies, administrators strive to be as efficient as possible; this means deploying new types of automation solutions which span the entire technology stack.


 **********************************************

5> DISCUSS THE OPERATIONAL AND ECONOMIC BENEFITS OF SAAS (SOFTWARE-AS-A-SERVICE).

 

OPERATIONAL BENEFIT OF SAAS :

a) Managing business-driven IT projects -

A SaaS model provides the necessary infrastructure and thus leads to technology projects that address true business needs.

b) Increasing consumer demand - 

The SaaS model provides the reliability to deliver near-perfect, 99.99% system availability, so any number of users can access the system at any time from anywhere.

c) Addressing growth -

The SaaS model provides scalability that easily supports an increasing number of consumers so they can meet their own objectives.

d) Servicing New Market Quickly and Easily - 

SaaS allows the organization to quickly and easily add programs, so as to adapt to changes in demand at a faster rate.

e) On Demand - 

The solution is self-served and available for use as needed. 

f) Scalable - 

It allows for infinite scalability and quick processing times.


ECONOMIC BENEFIT OF SAAS :

SaaS not only saves time but also has greater financial benefits.

a) It reduces IT expenses

b) The implementation cost of SAAS is much lower than traditional software.

c) It redirects saved expenses towards business improvement. By utilizing SaaS, we are free to use as much of any software as we need. This gives easy and economical access to many programs.

d) SaaS vendors release upgrades for their software, so users need not put any effort into installing and upgrading the software.

e) Another main benefit of SaaS is that it can quickly and easily be accessed from anywhere using a modern web browser.


capex and opex

Rajesh Bose Books: https://www.amazon.in/Computers-Internet-Rajesh-Bose-Books/s?rh=n%3A1318105031%2Cp_27%3ARajesh+Bose

*************************************************


1. Differences between Public, Private, Hybrid and Community cloud.

 

Feature      | Public                              | Private                | Hybrid                    | Community
Host         | Service Provider                    | Enterprise             | Enterprise                | Community or Third-Party
Suitable for | Large Enterprise                    | Large Enterprise       | Small / Middle Enterprise | Financial, Health, Legal company
Access       | Internet                            | Intranet, VPN          | Intranet, VPN             | Intranet, VPN
Security     | Low                                 | Most secure            | Moderate                  | Secured
Cost         | Cheapest                            | High Cost              | Cost-Effective            | Cost-Effective
Owner        | Service Provider                    | Enterprise             | Enterprise                | Community
Users        | Organizations, public (individuals) | Business Organization  | Business Organization     | Community Members
Reliability  | Moderate                            | Very High              | Medium to High            | Very High
Scalability  | Very High                           | Limited                | Very High                 | Limited


2. Service Model 


Software-as-a-Service: There are no CAPEX and OPEX concepts.

Infrastructure-as-a-Service: There is only operational cost needed from the client side.

Platform-as-a-Service: There is only operational cost needed from the client side.

Packaged software: There are capital investments and operational costs involved from the client side.

In CAPEX, there is high risk when migration is done, because the server and storage costs are borne by the company. The cost of servers, storage and other devices is paid by the company, while cooling, electricity, etc. are provided by the cloud provider.

In OPEX, there is no need to wait at migration time; at any time, the company can change its cloud provider.


CLOUD MANAGEMENT TASK 

Cloud management means the management of cloud data using some tools (i.e. protocols) and monitoring all services every second.


AUDIT SYSTEM BACKUPS:

IT Audit -

Information Technology Audit

Income Tax Audit

For example, audit company names: PricewaterhouseCoopers (PwC), Ernst & Young. These companies are used for audit-checking purposes.

Performance is judged based on the share market: if a company has the highest share value, it can be considered a reputed one.

Backup: The audit company checks backups in a random order; backups must be kept both ways, i.e. online backup and offline backup. In a public cloud, backup can be done anywhere. In a private cloud, where and in which order backups are done is maintained in the SLA.

DATA FLOW OF THE SYSTEM : 

SERVER HBA CARD (FOUR PORTS), so that the system can remain in service all the time.

BEWARE OF VENDOR LOCK-IN :

MIGRATION: Time, bandwidth, data volume (TB), etc. are covered by rules and regulations, including how to hand over the data to the client, e.g. switching data from Google to Amazon. The company pays for that migration.

KNOWING PROVIDER SECURITY PROCEDURE: 

MONITOR CAPACITY PLANNING :

MONITOR AUDIT LOG :

 EVERY LOG ABOUT MACHINES

SOLUTION TESTING AND VALIDATION:

Upcoming client list: a prototype model is already built, and if a client requires it, the company provides it in the future.

...(note)


ADMINISTRATING FEATURES OF CLOUD

Resource administration: 

* Resource Configuration

* Security Enforcement

* Operations monitoring

* Provisioning of resources

* Management of policies

* Performance maintenance 

* Performance optimizing


Scheduling algorithms run in the cloud based on some protocol, such as First Come First Serve, Round Robin, the Min-Min algorithm (shortest of the short tasks first) and the Max-Min algorithm (the highest workload is given first). There is only one protocol in one load balancer. A small sketch of the Min-Min idea is given below.
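The sketch below illustrates the Min-Min idea in a few lines of Python: repeatedly pick the task whose earliest possible completion time is smallest and place it on the VM that gives that time. The task lengths and VM speeds are invented for illustration; a real cloud scheduler would also consider data transfer, priorities and QoS.

```python
tasks = {"t1": 4.0, "t2": 9.0, "t3": 2.0, "t4": 6.0}   # task lengths (arbitrary units)
vm_speeds = {"vm1": 1.0, "vm2": 2.0}                    # relative VM speeds
vm_ready = {vm: 0.0 for vm in vm_speeds}                # time at which each VM becomes free

schedule = []
remaining = dict(tasks)
while remaining:
    # Find, over all remaining tasks and all VMs, the smallest completion time.
    best = None
    for task, length in remaining.items():
        for vm, speed in vm_speeds.items():
            finish = vm_ready[vm] + length / speed
            if best is None or finish < best[2]:
                best = (task, vm, finish)
    task, vm, finish = best
    schedule.append((task, vm, finish))   # assign the winning task to the winning VM
    vm_ready[vm] = finish
    del remaining[task]

print(schedule)
```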

The above picture correlates to the diagram given below.



This is a 3-tier architecture, i.e. data center, VMs and database, where the user first reaches the data center.

There are server load balancers and link load balancers.

Users enter the VMs. Which user enters which VM is decided by the server load balancer: there are some protocols, and then, using some scheduling algorithm, the user is placed in a VM. The user authenticates and enters the VM, and the data is stored in the DB; by authentication you can enter the database. V1, V2 and V3 host the same application, and the user enters at the VMs.

There is no single shortest path to reach the server, because there are thousands of users; this can be handled by BGP (Border Gateway Protocol).

** Cloud Router uses Border Gateway Protocol (BGP) to exchange routes between your Virtual Private Cloud (VPC) network and your on-premises network. On Cloud Router, you configure an interface and a BGP peer for your on-premises router. The interface and BGP peer configuration together form a BGP session.

HOP :

Hop count. In wired networks, the hop count refers to the number of intermediate network devices through which data must pass between source and destination. Hop count is a rough measure of distance between two hosts. ... On a layer 3 network such as Internet Protocol (IP), each router along the data path constitutes a hop ...
 
 
**********************************************
 
Discuss 3 different cloud storage models

1. INSTANCE STORAGE
2. VOLUME STORAGE
3. OBJECT STORAGE

INSTANCE STORAGE OR VIRTUAL DISK IN THE CLOUD

In a traditional virtualized environment, the virtual disk storage model is the predominant one. Basically, this storage is used like a conventional virtual disk. It can be implemented in numerous ways; for example, DAS (direct-attached storage) is generally used to implement instance storage.


VOLUME STORAGE OR SAN (STORAGE AREA NETWORK)

Volume storage is also known as block storage. It supports operations such as reading, writing and keeping the system files of running virtual machines.

As suggested by its name, data is stored in structured blocks and volumes, where files are split into equal-sized blocks. Each block has its own address.

Input/Output Operations Per Second (IOPS):
When backend IOPS (home usage, data download) is greater than frontend IOPS (server storage, data upload), data fetches can be done easily. For storage functionality we can check IOPS; the storage administrator handles this.


OBJECT STORAGE OR NAS (Network attached storage)

Cloud-native applications need space for storing data that is shared between different VMs. Often, there is also a need for space that can extend to various data centers across multiple geographies; this is provided by object storage. For example, Amazon Simple Storage Service (S3) provides a single storage space across an entire region. Object storage stores data as objects, unlike other models which use a file hierarchy. Each object consists of data, metadata and a unique identifier. Object storage also holds a substantial amount of unstructured data; this kind of storage is used for storing songs in audio applications, photos on social media, or files in online services like Dropbox.
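As a hedged illustration of the object model, the sketch below uses the AWS SDK for Python (boto3) to put and get one object. The bucket name and key are placeholders, and credentials/region are assumed to be configured in the environment.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"          # hypothetical, pre-existing bucket name

# Each object is just data plus metadata, addressed by a unique key.
s3.put_object(Bucket=bucket, Key="photos/cat.jpg", Body=b"...binary data...",
              Metadata={"uploaded-by": "notes-example"})

# Objects are fetched back by the same key; there is no real directory hierarchy.
obj = s3.get_object(Bucket=bucket, Key="photos/cat.jpg")
print(obj["Metadata"], len(obj["Body"].read()))
```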



 
 
 
********************************

How do you implement the hybrid cloud?

 
The cloud infrastructure is a composition of two or more distinct cloud deployment models (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

Large portions of agencies that have already switched some processes over to cloud-based computing solutions have utilized hybrid cloud options. Few enterprises have the ability to switch over all of their IT services at one time; the hybrid option allows for a mix of on-premises and cloud options, which provides an easier transition. NASA is one example of a federal agency that is utilizing the hybrid cloud computing deployment model. Its Nebula open-source cloud computing project uses a private cloud for research and development as well as a public cloud to share datasets with external partners and the public. The hybrid cloud computing deployment model has also proven to be the option of choice for state and local governments as well, with states like Michigan and Colorado having already declared their cloud computing intentions with plans illustrating hybrid cloud deployment models.



Write short notes on (i) Software virtualization (ii) network virtualization.

 
(i) Software virtualization: It is the virtualization of applications or computer programs. One of the most widely used software virtualization products is Software Virtualization Solution (SVS), developed by Altiris. It is similar to hardware virtualization, where hardware is simulated as virtual machines. Software virtualization involves creating a virtual layer or virtual hard drive space where applications can be installed. From this virtual space, applications can be run as if they had been installed onto the host OS. Once users are finished using an application, they can switch it off. When an application is switched off, any changes that the application made to the host OS are completely reversed. This means that registry entries and installation directories will have no trace of the application having been installed or executed at all. The benefits of software virtualization are: the ability to run applications without making permanent registry or library changes; the ability to run multiple versions of the same application; and the ability to install applications that would otherwise conflict with each other.

(ii) Network virtualization: Network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, which is said to be a virtual network. Network virtualization involves platform virtualization. Network virtualization is categorized into external network virtualization and internal network virtualization. External network virtualization is the combining of many networks into a virtual unit. Internal network virtualization is providing network-like functionality to the software containers on a single system. Network virtualization enables connections between applications, services, dependencies and end users to be accurately emulated in the test environment.




Parameters a server scheduling algorithm should maintain:
1. Fault Tolerance
2. Throughput
3. Response Time,
4. Adaptability
5. Energy Consumption
 
Every parameter is dependent on another parameter. Thus an architecture has both commercial benefits and technical benefits.
The technical benefit is important.

If the technical benefit is high, the commercial benefit may be lower; that does not matter, because all the technical benefits are met.


Platform Scheduling,  Infrastructure scheduling.


Load balancing, high availability, fault tolerance.
Load Balancing:
Improves performance and redundancy; more cost-effective scaling.


In case of HA, some downtime is needed. In case of fault tolerance, no downtime is needed.
 
 

GFS ARCHITECTURE:

 

1. Each of these is typically a commodity Linux machine running a user-level server process.
2. Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64-bit chunk handle.
3. Chunk servers store chunks on local disks as Linux files.
4. For reliability, each chunk is replicated on multiple chunk servers.
5. GFS consists of a single master and multiple chunk servers.
6. The master maintains all file system metadata.
7. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
8. It also controls system-wide activities such as chunk lease management and chunk migration between chunk servers.
9. The master periodically communicates with each chunk server in HeartBeat messages to give it instructions and collect its state.
10. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunk servers. (A small offset-to-chunk-index sketch follows this list.)
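A tiny sketch of the client-side arithmetic implied by points 2 and 10: the client translates a byte offset into a chunk index (GFS uses fixed 64 MB chunks) and asks the master for that chunk's handle and replica locations. The file name and offset below are arbitrary.

```python
CHUNK_SIZE = 64 * 1024 * 1024        # GFS uses fixed-size 64 MB chunks

def chunk_index(byte_offset: int) -> int:
    # Which chunk of the file contains this byte offset?
    return byte_offset // CHUNK_SIZE

# The client sends (file name, chunk index) to the master and gets back the
# chunk handle plus the chunk servers holding the replicas; the data itself
# is then read directly from a chunk server.
request = ("/data/webcrawl.log", chunk_index(200 * 1024 * 1024))
print(request)                        # ('/data/webcrawl.log', 3)
```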




                        :::: HADOOP ::::::

WHAT IS HDFS ?
It is a unique design that provides storage for extremely large files, with a streaming data access pattern, and it runs on commodity hardware.

Extremely large files means that here we are talking about data in the range of petabytes (1000 TB).

Streaming data access pattern means HDFS is designed on the principle of write once, read many times. Once data is written, large portions of the data set can be processed any number of times.

Commodity hardware means hardware that is inexpensive and easily available in the market. This is one of the features which especially distinguishes HDFS from other file systems.
 


    ::::: WRITE AND READ OPERATION IN HDFS :::::

WRITE OPERATION in HDFS :

1 & 2. CREATE: A client initiates the write operation by calling the create() method of the DistributedFileSystem object, which creates a new file.

The DistributedFileSystem object connects to the NameNode using an RPC call and initiates the creation of the new file.

3. WRITE: Once the new record is created in the NameNode, an object of type FSDataOutputStream is returned to the client. The client uses it to write data into HDFS.

4. DATA QUEUE: FSDataOutputStream contains a DFSOutputStream object, which looks after communication with the DataNodes and NameNode.

5. DATA STREAMER: There is one more component, called the DataStreamer, which consumes this data queue. The DataStreamer also asks the NameNode for the allocation of new blocks, thereby picking desirable DataNodes to be used for replication.

6. DATANODE PIPELINE: Now the process of replication starts by creating a pipeline of DataNodes. In our case, we have chosen a replication level of 3, and hence there are three DataNodes in the pipeline.

7. The DataStreamer pours packets into the first DataNode in the pipeline.

8. Every DataNode in the pipeline stores the packets received by it and forwards the same to the next DataNode in the pipeline.

9. Another queue, the ack queue, is maintained by DFSOutputStream to store packets which are waiting for acknowledgement from the DataNodes.

10. Sending acknowledgement packets: Once acknowledgement for a packet in the queue is received from all DataNodes in the pipeline, it is removed from the ack queue (9). In the event of any DataNode failure, packets from this queue are used to reinitiate the operation.

11. After the client is done with writing data, it calls the close() method, which results in flushing the remaining data packets to the pipeline, followed by waiting for acknowledgements.

12. Once the final acknowledgement is received, the NameNode is contacted to tell it that the file write operation is complete.


READ OPERATION in HDFS: 

 
A data read request is served by the HDFS NameNode and DataNodes. Let us call the reader a client.
1. The client initiates a read request by calling the open() method of the FileSystem object; it is an object of type DistributedFileSystem.
2. This object connects to the NameNode using RPC and gets metadata information, such as the locations of the blocks of the file.
3. In response to this metadata request, the addresses of the DataNodes having a copy of each block are returned.
4. Once the addresses of the DataNodes are received, an object of type FSDataInputStream is returned to the client. FSDataInputStream contains DFSInputStream, which takes care of interaction with the DataNodes and NameNode. The client invokes the read() method, which causes DFSInputStream to establish a connection with the first DataNode holding the first block of the file.
5. Data is read in the form of streams, wherein the client invokes the read() method repeatedly. This read process continues till it reaches the end of the block.
6. Once the end of a block is reached, DFSInputStream closes the connection and moves on to locate the next DataNode for the next block.
7. Once the client is done with reading, it calls the close() method. (A minimal client-side sketch follows.)
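The steps above describe what the Java client library does internally. From an application's point of view, the whole write/read cycle is just a couple of calls; the rough sketch below uses the third-party Python `hdfs` (WebHDFS) package, and the NameNode URL, user name and path are placeholder assumptions.

```python
from hdfs import InsecureClient   # pip install hdfs

# WebHDFS endpoint of the NameNode (host, port and user are assumptions).
client = InsecureClient("http://namenode:9870", user="hadoop")

# Write once ...
client.write("/tmp/example.txt", data=b"hello hdfs", overwrite=True)

# ... read many times.
with client.read("/tmp/example.txt") as reader:
    print(reader.read())
```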




............ GPFS ................



There are three main types of nodes:
1. File system nodes: coordinate administrative tasks.
2. Manager nodes: one exists per file system. The different manager roles are the global lock manager, local lock manager, allocation manager, etc.
3. Storage nodes: they implement shared access to files, coordinate with the manager nodes during recovery, and allow both file data and metadata to be striped across multiple storage nodes.

Some other auxiliary nodes include:
1. Meta node: a node dynamically elected as the meta node for centralized management of a file's metadata. Election of the meta node is facilitated by the token server.
2. Token server: a token server tracks all tokens granted to all nodes in the cluster.
A token-granting algorithm is used to reduce the cost of token management.


What is GPFS used for?
IBM GPFS is a file system used to distribute and manage data across multiple servers, and it is implemented in many high-performance computing and large-scale storage environments. GPFS is among the leading file systems for high-performance computing applications.

How does a parallel file system work?
A parallel file system breaks up a data set and distributes, or stripes, the blocks across multiple storage drives, which can be located in local or remote servers. Users do not need to know the physical location of the data blocks to retrieve a file; the system uses a global namespace to facilitate data access. A toy sketch of striping follows.
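As a toy model of striping, the sketch below splits a byte string into fixed-size blocks, deals them round-robin across a few "drives", and reassembles them. Block size and drive count are arbitrary; a real parallel file system adds metadata, locking and fault tolerance on top of this idea.

```python
def stripe(data: bytes, block_size: int, num_drives: int):
    # Split the data into fixed-size blocks and deal them out round-robin.
    drives = [[] for _ in range(num_drives)]
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        drives[(i // block_size) % num_drives].append(block)
    return drives

def reassemble(drives):
    # Read the blocks back in round-robin order to rebuild the original file.
    blocks = []
    for j in range(max(len(d) for d in drives)):
        for drive in drives:
            if j < len(drive):
                blocks.append(drive[j])
    return b"".join(blocks)

striped = stripe(b"abcdefghij" * 10, block_size=8, num_drives=4)
assert reassemble(striped) == b"abcdefghij" * 10
```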


What is the GPFS file system on Linux?
GPFS is a high-performance clustered file system developed by IBM. It allows you to configure a highly available file system with concurrent access from a cluster of nodes. Cluster nodes can be servers running the AIX or Linux operating systems.


How does the General Parallel File System (GPFS) work in cluster systems?
GPFS is a cluster file system. This means that it provides concurrent access to a single file system or set of file systems from multiple nodes. Multiple GPFS clusters can share data within a location or across wide area network connections. The GPFS disk data structures support file systems with up to 4096 disks of up to 1 TB in size each, for a total of 4 petabytes per file system.

****************************************************************************************
 

Cloud Scheduling


Scheduling is at the heart of distributed compute -
> PaaS model -> workflow (job) scheduling.
> IaaS model -> virtual machine (VM) scheduling.

The scheduler decides which job/VM should go on which machine.
An effective scheduler can
> Reduce operational cost
> Reduce queue waiting time
> Increase resource utilization
 
 

Traditional Cloud Scheduling

FCFS

Simple
Lower headache
Fit for small size of data
No starvation
 

Priority Queue Scheduling

Avoid starvation 

Round Robin Scheduling

Time quantum is used.

Shortest Job First Scheduling

Based on the shortest execution time.

Multi-level feedback queue scheduling

Uses multiple queues with RR & FCFS.

Multi-level queue scheduling

Uses multiple queues with different scheduling policies.


Job Scheduling Framework in Clouds

Challenges

Allocating massive job requests while satisfying user QoS requirements.
Maintaining average response time.


STEPS
* User portal - manages job requests.
* Job scheduler - routing decisions & selects VM instance.
* Management module-
> VM monitor
> Job Monitor - keeps track of jobs
> Job profiling - identifies job types
> History repository - stores the records of jobs





Fault tolerant Scheduling in Cloud

* Fault tolerance technique
> Enhances reliability and availability
> Introduces redundancy
> Incurs extra overhead


Steps:
* Global Scheduler
> Analyzes information
> Makes decisions
> Sends the primary/backup copies of the task to different VMs.

* Local scheduler
> Rearranges the order of the local queue

* Resource Manager
> Decides how VMs should be added or migrated


Inter-Cloud Meta-Scheduling

> Multiple autonomous clouds.
> Functions under a single federated management entity.
> The algorithm estimates the queue lengths of neighboring processors.
> Reschedules the loads based on estimates
> The method aims to increase the possibility of achieving load balancing.
> Facilitates scalable resource provisioning.
> ICMS is based on a novel message exchange mechanism
> Offers improved flexibility, robustness and decentralization.


Workflow Scheduling

* Jobs arranged as workflows.
* Usually defined as a DAG
> Nodes represent tasks
> Edges represent flow
* Job completion time depends on (see the sketch after this list)
> DAG design
> Scope of parallelism
> Wait time in queue
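A short sketch of the first two factors: given task durations and dependency edges, the workflow's best-case completion time (ignoring queue wait and assuming unlimited parallelism) is the longest path through the DAG. The tasks, durations and edges below are made up for illustration.

```python
from functools import lru_cache

durations = {"fetch": 2, "clean": 3, "trainA": 5, "trainB": 4, "report": 1}
edges = {                      # task -> tasks that depend on it
    "fetch": ["clean"],
    "clean": ["trainA", "trainB"],
    "trainA": ["report"],
    "trainB": ["report"],
    "report": [],
}

@lru_cache(maxsize=None)
def finish_time(task: str) -> int:
    # A task finishes after all of its predecessors finish, plus its own duration.
    preds = [t for t, succs in edges.items() if task in succs]
    return durations[task] + max((finish_time(p) for p in preds), default=0)

makespan = max(finish_time(t) for t in durations)
print(makespan)   # longest path: fetch -> clean -> trainA -> report = 2+3+5+1 = 11
```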

VM Scheduling

* Too simple, single priority
* No guarantee of optimized resource utilization
* Dynamic VM scheduling & Migration
* Does not take hypervisor into loop
* Physical partitioning instead of logical partitioning.


Utility Driven Scheduling

New scheduling paradigm
* Goals
> Optimized resource utilization
> Relaxed QoS in over-committed scenarios
* Approach
> Partial Utility function
> Continuous resource monitoring & feedback


Condor

 
 

 WHAT IS CONDOR?
 
* Open source project out of the University of Wisconsin-Madison.
* Distributed computing research project in computer sciences, est. 1985
* To many, it is a batch system managing millions of machines worldwide, running many more jobs - for individuals, enterprises, governments and research organizations
* Under active development by multiple organizations
* Maintaining an active user community
* Multi-platform code base - RHEL, HP-UX, AIX, SLES, YDL, Solaris, Debian, FreeBSD, OS X, Windows - x86_64, Cell, UltraSPARC
* Distributed by multiple organizations, e.g. UW and Red Hat in Fedora and RHEL (MRG).
* A fundamental building block for creating clouds.
 
 

Cloud scheduling Implementation





Conclusions

* Scheduling and execution improve service quality of the clouds.
* Creates VMs and decreases the failure rate of task scheduling
* Increase in resource utilization
 
**************************************************************************************************
