Cloud Computing Architecture Unit 3 – Cloud Storage and Data Management

Cloud storage keeps data on remote, provider-managed infrastructure, offering remote access, scalability, and cost-effectiveness. It enables users to store, share, and collaborate on data from anywhere, while providing durability and high availability through distributed storage across multiple servers and data centers. Various types of cloud storage cater to different needs: object storage for unstructured data, block storage for high-performance applications, file storage for shared access, and archive storage for long-term retention. Combined with sound data management practices, these options let organizations match each workload to the storage type that fits its performance, access, and cost requirements.

Cloud Storage Basics

  • Cloud storage involves storing data on remote servers accessed via the internet
  • Enables users to store, access, and share data from anywhere with an internet connection
  • Offers scalability, allowing users to easily increase or decrease storage capacity as needed
  • Provides data durability through redundancy and distributed storage across multiple servers and data centers
  • Offers high availability, ensuring data can be accessed even if individual components fail
  • Supports various data types, including structured, unstructured, and semi-structured data
  • Enables collaboration by allowing multiple users to access and modify shared data simultaneously
  • Reduces costs by replacing upfront hardware purchases and in-house maintenance with pay-as-you-go pricing
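
The redundancy and availability points above can be illustrated with a minimal in-memory sketch. It assumes a simple "write to every reachable replica, read from the first one that answers" policy; the `Replica` and `ReplicatedStore` classes and the replica names are hypothetical and do not correspond to any provider's API. Real systems add consistency protocols, repair, and failure detection that are omitted here.

```python
class Replica:
    """One storage node (hypothetical); `online` simulates reachability."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}


class ReplicatedStore:
    """Keeps a copy of every value on each reachable replica."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        # Write to every replica that is currently reachable.
        written = 0
        for replica in self.replicas:
            if replica.online:
                replica.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available")
        return written

    def get(self, key):
        # Serve the read from the first reachable replica holding the key.
        for replica in self.replicas:
            if replica.online and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


replicas = [Replica("dc-a"), Replica("dc-b"), Replica("dc-c")]
store = ReplicatedStore(replicas)
copies = store.put("report.pdf", b"quarterly numbers")

# Simulate the failure of one data center; the read still succeeds.
replicas[0].online = False
recovered = store.get("report.pdf")
```

Because three copies exist, losing one replica affects neither durability (the data still exists) nor availability (the read is served by another node).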

Types of Cloud Storage

  • Object storage stores data as objects, each with a unique identifier, metadata, and the data itself
    • Ideal for unstructured data like images, videos, and documents
    • Offers scalability and durability, making it suitable for large-scale data storage
  • Block storage divides data into fixed-size blocks, each with a unique address
    • Provides low-latency access and high performance
    • Suitable for applications requiring frequent read/write operations, such as databases and virtual machines
  • File storage organizes data in a hierarchical structure of directories and files
    • Offers shared access and familiar file system interfaces (NFS, SMB)
    • Suitable for use cases requiring shared file access, such as content management systems and collaborative workflows
  • Archive storage is designed for long-term retention of infrequently accessed data
    • Offers lower costs compared to other storage types
    • Suitable for data that needs to be retained for compliance, legal, or historical purposes (backups, logs)
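
To make the difference between object and block addressing concrete, here is a small in-memory sketch. It only models the two access patterns described above; the `ObjectStore` and `BlockDevice` classes are hypothetical illustrations, not a real storage API, and the 4-byte block size is chosen purely for readability.

```python
import uuid


class ObjectStore:
    """Object storage model: unique identifier -> metadata + data."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # unique identifier for the object
        self._objects[object_id] = {"metadata": metadata, "data": data}
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


class BlockDevice:
    """Block storage model: fixed-size blocks addressed by number."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # Every block starts out zero-filled.
        self._blocks = [bytes(block_size) for _ in range(block_count)]

    def write_block(self, address, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[address] = data

    def read_block(self, address):
        return self._blocks[address]


objects = ObjectStore()
photo_id = objects.put(b"\x89PNG...", content_type="image/png", owner="alice")
photo = objects.get(photo_id)

disk = BlockDevice(block_size=4, block_count=8)
disk.write_block(3, b"ABCD")
block = disk.read_block(3)
```

The object store returns the whole object together with its metadata, while the block device reads and writes individual fixed-size blocks by address, which is the pattern that databases and virtual machine disks rely on.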

Data Management Fundamentals

  • Data lifecycle management involves managing data from creation to deletion
    • Includes stages such as data ingestion, processing, storage, analysis, and archival
    • Helps optimize storage costs and ensure data compliance
  • Data governance establishes policies, procedures, and responsibilities for managing data
    • Ensures data quality, security, privacy, and regulatory compliance
    • Involves defining data ownership, access controls, and data retention policies
  • Data cataloging involves creating and maintaining a comprehensive inventory of an organization's data assets
    • Enables data discovery, understanding, and governance
    • Includes metadata management, data lineage, and data classification
  • Data integration combines data from various sources to provide a unified view
    • Enables data consistency, accuracy, and completeness
    • Involves data extraction, transformation, and loading (ETL) processes
  • Data backup and disaster recovery ensure data protection and business continuity
    • Regular backups help recover data in case of accidental deletion, corruption, or system failures
    • Disaster recovery plans outline procedures to restore data and systems in case of major disruptions (natural disasters, cyber-attacks)
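
The extract, transform, and load steps mentioned under data integration can be sketched as follows. This is a minimal example under stated assumptions: the two sources (`CRM_CSV` and `BILLING_JSON`), their field names, and the dictionary used as a "warehouse" are all made up for illustration.

```python
import csv
import io
import json

# Two hypothetical sources with different formats and field names.
CRM_CSV = "id,name,country\n1,Ada Lovelace,uk\n2,Grace Hopper,us\n"
BILLING_JSON = (
    '[{"customer_id": 1, "total": "120.50"},'
    ' {"customer_id": 2, "total": "80"}]'
)


def extract():
    """Read raw records from both sources."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    invoices = json.loads(BILLING_JSON)
    return customers, invoices


def transform(customers, invoices):
    """Normalize types and casing, then join the sources on customer id."""
    totals = {inv["customer_id"]: float(inv["total"]) for inv in invoices}
    unified = []
    for row in customers:
        customer_id = int(row["id"])
        unified.append({
            "customer_id": customer_id,
            "name": row["name"],
            "country": row["country"].upper(),
            "total_billed": totals.get(customer_id, 0.0),
        })
    return unified


def load(rows, target):
    """Write the unified rows into the target store, keyed by customer id."""
    for row in rows:
        target[row["customer_id"]] = row


warehouse = {}
load(transform(*extract()), warehouse)
```

After the run, `warehouse` holds one record per customer that combines fields from both sources, which is the "unified view" the integration step aims for.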

Cloud Storage Services and Providers

  • Amazon Web Services (AWS) offers various storage services, including Amazon S3 for object storage and Amazon EBS for block storage
    • Provides a wide range of storage classes with different performance and cost characteristics
    • Offers data management features like versioning, lifecycle policies, and cross-region replication
  • Microsoft Azure provides Azure Blob Storage for object storage and Azure Disk Storage for block storage
    • Offers data redundancy options, such as locally-redundant storage (LRS) and geo-redundant storage (GRS)
    • Provides data management capabilities, including access control, encryption, and data lifecycle management
  • Google Cloud Platform (GCP) offers Google Cloud Storage for object storage and Persistent Disks for block storage
    • Provides various storage classes (Standard, Nearline, Coldline, Archive) optimized for different access frequencies and costs
    • Offers data management features, such as object versioning, retention policies, and data transfer services
  • Other notable cloud storage providers include IBM Cloud, Oracle Cloud, and Alibaba Cloud
    • Each provider offers a range of storage services and data management capabilities tailored to specific use cases and industries

Data Security and Compliance

  • Encryption protects data from unauthorized access by converting it into a form that is unreadable without the decryption key
    • Data can be encrypted at rest (stored on disk) and in transit (transmitted over the network)
    • Providers offer server-side encryption, where the provider encrypts data after receiving it and before writing it to disk, and support client-side encryption, where the customer encrypts data before sending it to the cloud and keeps control of the keys
  • Access control ensures that only authorized users and applications can access data
    • Involves defining user roles, permissions, and access policies
    • Providers offer identity and access management (IAM) services to manage user identities, authentication, and authorization
  • Data sovereignty refers to the legal and regulatory requirements governing data storage and processing based on its geographic location
    • Determines which data protection laws apply, such as GDPR (European Union) and HIPAA (United States)
    • Providers offer data residency options, allowing customers to choose the geographic regions where their data is stored and processed
  • Auditing and monitoring help detect and investigate security incidents and ensure compliance
    • Providers offer logging and monitoring services that track user activities, data access patterns, and system events
    • Enables organizations to detect anomalies, respond to security threats, and generate compliance reports
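
Role-based access control and audit logging, as described above, can be sketched together in a few lines. This is an illustrative model only: the roles, users, and resource path are hypothetical, and real IAM services use much richer policy languages than this lookup table.

```python
from datetime import datetime, timezone

# Hypothetical roles and the actions each one permits.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}
# Hypothetical user-to-role assignments.
USER_ROLES = {"alice": "admin", "bob": "viewer"}

audit_log = []


def is_allowed(user, action):
    """Return True if the user's role grants the action (deny by default)."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


def access(user, action, resource):
    """Check a request and record it, whether allowed or denied."""
    allowed = is_allowed(user, action)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed


access("alice", "delete", "bucket/reports/q1.csv")
access("bob", "write", "bucket/reports/q1.csv")
access("mallory", "read", "bucket/reports/q1.csv")  # unknown user

# Denied requests are the ones an investigator would look at first.
denied = [entry["user"] for entry in audit_log if not entry["allowed"]]
```

Logging denied requests as well as granted ones is a deliberate choice here: repeated denials are often the earliest sign of a misconfiguration or an attack.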

Performance and Scalability

  • Throughput measures the amount of data that can be transferred to or from the storage system over a given period
    • Providers offer different storage classes with varying throughput capabilities to meet specific application requirements
    • Techniques like data partitioning and parallel access can improve throughput
  • Latency refers to the time it takes to complete a single read from or write to the storage system
    • Low latency is critical for applications requiring real-time data access, such as online transaction processing (OLTP) systems
    • Providers offer storage options optimized for low latency, such as solid-state drives (SSDs) and in-memory caches
  • Elasticity allows storage resources to scale up or down automatically based on demand
    • Providers offer auto-scaling capabilities that adjust storage capacity based on predefined rules or metrics
    • Enables organizations to handle sudden spikes in storage demand without manual intervention
  • Caching improves performance by storing frequently accessed data in a faster storage layer
    • Providers offer caching services, such as Amazon ElastiCache and Azure Cache for Redis, to accelerate data access
    • Caching can reduce the load on primary storage systems and improve application response times
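
The caching idea above can be sketched as a small read-through cache with least-recently-used (LRU) eviction. LRU is one common eviction policy chosen here for illustration; the source does not prescribe one. The `SlowStore` and `ReadThroughCache` classes are hypothetical stand-ins for a primary storage system and a managed cache.

```python
from collections import OrderedDict


class SlowStore:
    """Stand-in for primary storage; counts how often it is read."""

    def __init__(self, data):
        self._data = data
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._data[key]


class ReadThroughCache:
    """Serves repeated reads from memory, evicting the least recently used."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self.backend.read(key)  # fall through to primary storage
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


store = SlowStore({"a": 1, "b": 2, "c": 3})
cache = ReadThroughCache(store, capacity=2)
for key in ["a", "b", "a", "a", "c", "a"]:
    cache.get(key)
```

In this run, six requests result in only three reads against the backing store, because the frequently requested key stays in the cache.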

Cost Optimization Strategies

  • Storage tiering involves placing data in different storage classes based on access frequency and retention requirements
    • Frequently accessed data is stored in high-performance storage tiers (SSD), while infrequently accessed data is moved to lower-cost tiers (HDD, archive)
    • Providers offer automated tiering capabilities that move data between tiers based on predefined policies
  • Data lifecycle management helps reduce storage costs by automatically transitioning data to lower-cost storage tiers or deleting it when no longer needed
    • Providers offer lifecycle policies that can be configured to move data between storage classes or delete objects after a specified period
    • Enables organizations to optimize storage costs while ensuring data is retained only as long as necessary
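  • Data compression reduces the amount of storage space required by encoding data more compactly; lossless compression removes redundancy without discarding any information
    • Common algorithms such as Gzip and Snappy can be applied before data is stored, either by the application or by storage and analytics services that support them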
    • Compression can help reduce storage costs, especially for large datasets with high redundancy
  • Data deduplication eliminates duplicate copies of data, storing only unique data blocks
    • Providers offer deduplication techniques, such as block-level deduplication, to reduce storage consumption
    • Deduplication is particularly effective for backup and archival scenarios, where multiple copies of similar data may exist
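
Block-level deduplication and compression can be combined, as in the following sketch. It assumes fixed-size 4 KiB blocks identified by their SHA-256 digest and compressed with gzip; the function names, the block size, and the two sample "backups" are illustrative choices, not a description of any provider's implementation.

```python
import gzip
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def dedup_store(payload, block_store):
    """Store each unique block once (compressed); return the block recipe."""
    recipe = []
    for offset in range(0, len(payload), BLOCK_SIZE):
        block = payload[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:
            # Only blocks that have not been seen before consume space.
            block_store[digest] = gzip.compress(block)
        recipe.append(digest)
    return recipe


def restore(recipe, block_store):
    """Rebuild the original payload from its list of block digests."""
    return b"".join(gzip.decompress(block_store[d]) for d in recipe)


block_store = {}
# Two hypothetical nightly backups that differ only in their last block.
monday = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
tuesday = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"D" * BLOCK_SIZE

monday_recipe = dedup_store(monday, block_store)
tuesday_recipe = dedup_store(tuesday, block_store)
stored_bytes = sum(len(blob) for blob in block_store.values())
```

The two backups contain six blocks in total, but only four unique blocks are stored, and both backups can still be restored exactly from their recipes.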

Practical Applications and Use Cases

  • Backup and disaster recovery: Cloud storage enables organizations to store backups and replicate data across multiple regions for disaster recovery purposes
    • Providers offer backup services, such as Amazon EBS Snapshots and Azure Backup, to automate and manage backup processes
    • Cloud-based disaster recovery solutions, like CloudEndure Disaster Recovery (since succeeded by AWS Elastic Disaster Recovery) and Azure Site Recovery, help organizations quickly recover from disasters by failing over to cloud-based infrastructure
  • Big data analytics: Cloud storage provides a scalable and cost-effective platform for storing and analyzing large volumes of data
    • Providers offer big data services, such as Amazon EMR and Google BigQuery, that integrate with cloud storage to process and analyze data at scale
    • Cloud storage enables data lakes, where raw data can be stored in its native format for later processing and analysis
  • Content delivery: Cloud storage can be used to store and deliver static content, such as images, videos, and web assets, to end-users
    • Providers offer content delivery networks (CDNs), like Amazon CloudFront and Azure CDN, that cache content in edge locations closer to users for faster delivery
    • Cloud storage provides a scalable and reliable backend for serving content to global audiences
  • Archiving and long-term retention: Cloud storage is ideal for archiving data that needs to be retained for extended periods but is rarely accessed
    • Providers offer archive storage classes, such as Amazon S3 Glacier and Azure Archive Storage, that provide low-cost, durable storage for long-term data retention
    • Archival storage helps organizations meet regulatory compliance requirements and preserve historical data for future reference
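
Deciding when rarely accessed data should move to an archive tier, or be deleted once its retention period ends, can be expressed as a simple rule. The sketch below is a provider-neutral illustration: the tier names, the 30-day and 365-day thresholds, and the seven-year retention period are assumptions, not values taken from any provider's lifecycle policies.

```python
from datetime import date

# Hypothetical rules: (minimum idle days, tier), checked from coldest to hottest.
TIER_RULES = [(365, "archive"), (30, "infrequent"), (0, "standard")]
RETENTION_DAYS = 7 * 365  # assumed retention period


def choose_action(last_accessed, created, today):
    """Return the tier an object belongs in, or "delete" past retention."""
    if (today - created).days >= RETENTION_DAYS:
        return "delete"
    idle_days = (today - last_accessed).days
    for min_idle, tier in TIER_RULES:
        if idle_days >= min_idle:
            return tier
    return "standard"


today = date(2024, 6, 1)
recent = choose_action(date(2024, 5, 25), date(2024, 1, 1), today)
cooling = choose_action(date(2024, 3, 1), date(2023, 1, 1), today)
cold = choose_action(date(2022, 1, 1), date(2021, 1, 1), today)
expired = choose_action(date(2016, 1, 1), date(2015, 1, 1), today)
```

Here an object read a week ago stays in the standard tier, one idle for about three months moves to the infrequent-access tier, one idle for over a year is archived, and one older than the retention period is deleted.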


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
