(Deemed to be University), · 2018-10-04 · 20 •Network-attached storage (NAS) •Dedicated file...
Assistant Professor, Deptt. of CSE, Jamia Hamdard
(Deemed to be University), New Delhi, India.
http://www.jamiahamdard.edu
https://[email protected]
1. Types of Data
2. Data Sources
3. Data Collection
4. API
5. Data Storage
6. Data Storage Management
7. Storage Security
• Unstructured
• Structured
• Semi-structured
Unstructured
Structured
Semi-structured
• JSON (JavaScript Object Notation)
• BibTeX
• .csv
• tab-delimited text
• XML
• etc.
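To make the formats above concrete, the sketch below parses one record written as JSON, XML, and CSV using only Python's standard library. The record and its field names (`name`, `dept`) are made up for illustration and do not come from the slides.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in three of the formats listed above.
json_text = '{"name": "Asha", "dept": "CSE"}'
xml_text = "<person><name>Asha</name><dept>CSE</dept></person>"
csv_text = "name,dept\nAsha,CSE\n"

# JSON maps directly onto a Python dict.
from_json = json.loads(json_text)

# XML: collect each child element's tag and text.
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}

# CSV: the header row supplies the field names.
from_csv = dict(next(csv.DictReader(io.StringIO(csv_text))))

# All three parse to the same field/value pairs.
print(from_json == from_xml == from_csv)
```

In each case the field names travel with the data (keys, tags, or a header row), which is what lets a generic parser recover the structure.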
Structured
• Numerical
• Categorical
Numerical
• Continuous
• Discrete
Numerical
• Continuous
  • Interval data
  • Ratio data
Categorical
• Nominal data
• Ordinal data
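The four data types above differ in which operations are meaningful. The sketch below illustrates this with small, made-up samples; the variable names and values are assumptions chosen only for the example.

```python
from collections import Counter

# Hypothetical sample values, one list per data type.
blood_group = ["A", "O", "B", "O", "AB"]           # nominal: labels only
satisfaction = ["low", "high", "medium", "high"]   # ordinal: labels with an order
temperature_c = [20.5, 22.0, 19.5]                 # interval: differences are meaningful
weight_kg = [50.0, 75.0, 100.0]                    # ratio: true zero, so ratios are meaningful

# Nominal: counting (and hence the mode) is the sensible summary.
most_common_group = Counter(blood_group).most_common(1)[0][0]

# Ordinal: map labels to ranks so the values can be sorted.
rank = {"low": 0, "medium": 1, "high": 2}
sorted_satisfaction = sorted(satisfaction, key=rank.get)

# Interval: subtracting two values gives a meaningful difference.
temp_range = max(temperature_c) - min(temperature_c)

# Ratio: dividing two values gives a meaningful ratio.
weight_ratio = weight_kg[2] / weight_kg[0]

print(most_common_group, sorted_satisfaction, temp_range, weight_ratio)
```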
Social Networks
• Twitter and Facebook
• Blogs and comments
• Instagram, Flickr, Picasa, etc.
• YouTube
• Internet searches
• Mobile data content (text messages)
• User-generated maps
• etc.
Traditional Business Systems
• Commercial transactions
• Banking/stock records
• E-commerce
• Credit cards
• Medical records
• etc.
Internet of Things
• Sensors: traffic, weather, mobile phone location, etc.
• Security, surveillance videos, and images
• Satellite images
• Data from computer systems (logs, web logs, etc.)
• etc.
Variables of interest
• Which features should be included?
  • All vs. targeted
• How can we obtain ground truth for the target variable?
  • Manually
  • Crowdsourcing
  • Controlled experiments
• How much data is required?
• Is the data set representative enough?
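One common way to turn crowdsourced answers into ground truth is a majority vote over several workers' labels. The sketch below assumes that approach; the item IDs, labels, and the `majority_label` helper are hypothetical and not taken from the slides.

```python
from collections import Counter

def majority_label(votes):
    """Return the most frequent label, or None if there is no clear winner."""
    counts = Counter(votes).most_common()
    if not counts:
        return None  # nobody labeled this item
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for first place; send the item for manual review
    return counts[0][0]

# Hypothetical labels collected from three workers per item.
crowd_votes = {
    "tweet-1": ["spam", "spam", "not_spam"],
    "tweet-2": ["not_spam", "not_spam", "not_spam"],
    "tweet-3": ["spam", "not_spam", "unsure"],
}
ground_truth = {item: majority_label(v) for item, v in crowd_votes.items()}
print(ground_truth)
```

Items that come back as `None` are the ones where manual labeling is still needed.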
• Web API
  • Twitter REST API
  • Facebook Graph API
  • Amazon S3 REST API
• OS-based API
  • Cocoa
  • Carbon
  • WinAPI
• Database API
  • Drupal Database API
  • Django API
• Hardware
  • Google PowerMeter
  • CubeSensors
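Collecting data through a Web API usually means building an authenticated HTTP request and parsing a JSON response. The sketch below shows that pattern with Python's standard library. The endpoint `https://api.example.com/v1/posts`, the token, and the response layout are placeholders, not the interface of any of the services named above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and token; replace with a real service's values.
BASE_URL = "https://api.example.com/v1/posts"
TOKEN = "YOUR_ACCESS_TOKEN"

def build_request(query, limit=10):
    """Build (but do not send) an authenticated GET request."""
    params = urlencode({"q": query, "limit": limit})
    return Request(
        f"{BASE_URL}?{params}",
        headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
        method="GET",
    )

def parse_posts(body):
    """Extract the text of each post from a JSON response body."""
    return [post["text"] for post in json.loads(body)["data"]]

request = build_request("cloud storage", limit=2)

# A canned response stands in for urllib.request.urlopen(request).read().
sample_body = '{"data": [{"text": "first post"}, {"text": "second post"}]}'
posts = parse_posts(sample_body)
print(request.full_url, posts)
```

Keeping request construction separate from response parsing makes both halves testable without network access.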
Evolution of Storage Technology and Architecture
• Redundant Array of Independent Disks (RAID)
  • A way of storing the same data in different places on multiple hard disks.
• Direct-attached storage (DAS)
  • Connects directly to a server (host) or a group of servers in a cluster.
• Storage area network (SAN)
  • A separate network of storage devices for block-level communication between servers and storage; not accessible through the local area network (LAN) by other devices.
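The redundancy behind RAID can be sketched in a few lines. The example below shows two schemes as an illustration: mirroring (as in RAID 1) and XOR parity (as in RAID 5). The block contents and the `xor_blocks` helper are assumptions made for the example; real arrays work on fixed-size stripes managed by a controller or the operating system.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Mirroring (RAID 1 style): every disk holds a full copy of the block.
block = b"DATA"
mirror = [block, block]

# Parity (RAID 5 style): two data blocks plus one parity block on three disks.
disk0, disk1 = b"ABCD", b"WXYZ"
parity = xor_blocks(disk0, disk1)

# If disk1 fails, its contents can be rebuilt from the two survivors.
rebuilt_disk1 = xor_blocks(disk0, parity)
print(rebuilt_disk1 == disk1)
```

Mirroring spends a whole disk on each copy, while parity spends one disk's worth of space to protect several data disks against a single failure.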
Evolution of Storage Technology and Architecture
• Network-attached storage (NAS)
  • Dedicated file storage that enables multiple users and heterogeneous client devices to retrieve data from centralized disk capacity.
• Internet Protocol SAN (IP-SAN)
  • One of the latest evolutions in storage architecture.
  • Convergence of technologies used in SAN and NAS.
  • Provides block-level communication across a local or wide area network (LAN or WAN), resulting in greater consolidation and availability of data.
Generic cloud storage architecture
Cloud storage characteristics
Characteristic | Description
Manageability | The ability to manage a system with minimal resources
Access method | Protocol through which cloud storage is exposed
Performance | Performance as measured by bandwidth and latency
Multi-tenancy | Support for multiple users (or tenants)
Scalability | Ability to scale to meet higher demands or load in a graceful manner
Data availability | Measure of a system's uptime
Control | Ability to control a system, in particular to configure for cost, performance, or other characteristics
Storage efficiency | Measure of how efficiently the raw storage is used
Cost | Measure of the cost of the storage (commonly in dollars per gigabyte)
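Three of the characteristics in the table reduce to simple ratios. The sketch below computes them under common definitions (availability as uptime over total time, efficiency as usable over raw capacity, cost per usable gigabyte). The function names and the sample figures are assumptions for illustration only.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Fraction of the period during which the system was up."""
    return uptime_hours / total_hours

def storage_efficiency(usable_gb: float, raw_gb: float) -> float:
    """Fraction of raw capacity that is available for user data."""
    return usable_gb / raw_gb

def cost_per_gb(total_cost: float, usable_gb: float) -> float:
    """Cost in dollars per usable gigabyte."""
    return total_cost / usable_gb

# Hypothetical 30-day month with 3.6 hours of downtime.
hours = 30 * 24
print(f"availability: {availability(hours - 3.6, hours):.3%}")

# Hypothetical system: 1000 GB raw, 800 GB usable, costing 200 dollars.
print(f"efficiency: {storage_efficiency(800, 1000):.0%}")
print(f"cost: {cost_per_gb(200, 800):.2f} dollars/GB")
```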
Steps to take to choose the right data storage solution(s)
• Know your data
• Don't neglect unstructured data
• Understand your compliance needs
• Establish a data retention policy
• Look for a solution that fits your data
• Use a tiered storage approach
• Know your clouds
• Carefully choose storage providers
• Make sure your data is secure
• Leverage technologies that use deduplication, snapshotting, and cloning
• Have a disaster recovery plan
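Deduplication, mentioned in the list above, stores each distinct piece of content only once. The sketch below is a toy content-addressed store that splits data into fixed-size chunks and keys them by SHA-256 digest. The `DedupStore` class, the 4-byte chunk size, and the file names are assumptions for the example; production systems use much larger and often variable-size chunks.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical chunks are kept once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes
        self.files = {}    # file name -> ordered list of digests

    def put(self, name, data: bytes):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only new content
            digests.append(digest)
        self.files[name] = digests

    def get(self, name) -> bytes:
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

store = DedupStore(chunk_size=4)
store.put("a.txt", b"AAAABBBBAAAA")
store.put("b.txt", b"BBBBCCCC")

logical = sum(len(d) for d in store.files.values())
print(f"{logical} chunks referenced, {len(store.chunks)} chunks stored")
```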
Key management activities
• Monitoring
  • Continuous
  • Covers security, performance, accessibility, and capacity
• Reporting
  • Periodic
  • Covers resource performance, capacity, and utilization
• Provisioning
  • Hardware, software, and other resources needed to run a data center
  • Includes capacity and resource planning
Key management activities
• Capacity planning
  • Ensures that the user’s and the application’s future needs will be addressed in the most cost-effective and controlled manner.
• Resource planning
  • The process of evaluating and identifying required resources, such as personnel, the facility (site), and the technology.
  • Ensures that adequate resources are available to meet user and application requirements.
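A first-cut capacity plan often starts from a simple growth projection. The sketch below estimates how many months remain before a storage system fills up, assuming linear growth. The `months_until_full` helper and the sample figures are hypothetical; real planning would also account for seasonality and procurement lead time.

```python
import math

def months_until_full(capacity_tb, used_tb, growth_tb_per_month):
    """Whole months before usage reaches capacity, assuming linear growth."""
    if growth_tb_per_month <= 0:
        return None  # usage is flat or shrinking; capacity is never exhausted
    remaining = capacity_tb - used_tb
    if remaining <= 0:
        return 0  # already full
    return math.floor(remaining / growth_tb_per_month)

# Hypothetical array: 100 TB capacity, 64 TB used, growing 3 TB per month.
print(months_until_full(100, 64, 3))
```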
Key Challenges
• Exploding digital universe
• Increasing dependency on information
• Changing value of information
Information Lifecycle Management (ILM)
• The information lifecycle is the “change in the value of information” over time.
• ILM is a proactive strategy that enables an IT organization to effectively manage data throughout its lifecycle, based on predefined business policies.
• This allows an IT organization to optimize the storage infrastructure for maximum return on investment.
ILM strategy characteristics
• Business-centric
• Centrally managed
• Policy-based
• Heterogeneous
• Optimized
ILM implementation activities
• Classifying data and applications to enable differentiated treatment of information.
• Implementing policies by using information management tools.
• Managing the environment by using integrated tools.
• Organizing storage resources in tiers.
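Policy-based tiering can be sketched as a rule that maps how long data has been idle to a storage tier. The thresholds, tier names, and the `assign_tier` helper below are assumptions chosen for illustration; an actual ILM policy would be derived from business requirements.

```python
from datetime import date

# Hypothetical policy: (maximum days since last access, tier name).
TIER_POLICY = [
    (30, "tier 1: high-performance disk"),
    (365, "tier 2: capacity disk"),
]
ARCHIVE_TIER = "tier 3: archive"

def assign_tier(last_access: date, today: date) -> str:
    """Pick the first tier whose age limit covers the data's idle time."""
    idle_days = (today - last_access).days
    for max_days, tier in TIER_POLICY:
        if idle_days <= max_days:
            return tier
    return ARCHIVE_TIER

today = date(2024, 1, 1)
for last_access in (date(2023, 12, 20), date(2023, 6, 1), date(2020, 1, 1)):
    print(last_access, "->", assign_tier(last_access, today))
```

Because the policy is plain data, it can be managed centrally and changed without touching the code that applies it.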
Primary services of security
• Accountability
  • Accounts for all events and operations.
• Confidentiality
  • Provides the required secrecy of information.
  • Ensures that only authorized users have access to data.
• Integrity
  • Ensures that the information is unaltered.
• Availability
  • Ensures that authorized users have reliable and timely access to data.
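Integrity, as defined above, can be checked with a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library as one possible mechanism. The key and the record contents are placeholders, and a real deployment would load the key from a key store rather than from source code.

```python
import hashlib
import hmac

def sign(data: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag for the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Compare tags in constant time to detect any alteration."""
    return hmac.compare_digest(sign(data, key), tag)

key = b"hypothetical-secret-key"  # placeholder; load from a key store in practice
record = b"balance=1000"
tag = sign(record, key)

print(verify(record, key, tag))           # unaltered data
print(verify(b"balance=9000", key, tag))  # tampered data
```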
Primary services of security
• Controlling User Access to Data
• Protecting the Storage Infrastructure
• Data Encryption
• Securing Backup, Recovery, and Archive
• Firewall
• Access Control Switch
• Know the Vulnerability
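Controlling user access to data is often implemented as a permission check, with every attempt logged for accountability. The sketch below assumes a simple in-memory access-control list; the user names, operations, and the `access` helper are hypothetical.

```python
# Hypothetical access-control list: user -> set of permitted operations.
ACL = {
    "alice": {"read", "write"},
    "bob": {"read"},
}
audit_log = []  # accountability: every attempt is recorded

def access(user: str, operation: str, resource: str) -> bool:
    """Allow the operation only if the ACL grants it, and log the attempt."""
    allowed = operation in ACL.get(user, set())
    audit_log.append((user, operation, resource, "allowed" if allowed else "denied"))
    return allowed

access("alice", "write", "report.csv")
access("bob", "write", "report.csv")
access("mallory", "read", "report.csv")
for entry in audit_log:
    print(entry)
```

Unknown users fall through to an empty permission set, so the default is to deny.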
THANK YOU!