Azure Files: serving files in the cloud
Pavel Shilovskiy
Content
• Architecture overview
• Azure Files & Linux
• Lessons learned
Azure scale
54 total Azure regions: 46 generally available + 8 coming soon
Azure in numbers
• Compute performance: 960 CPUs (largest in public cloud)
• Memory: 24 TB RAM (largest in public cloud)
• Remote storage (single disk): 160K IOPS (fastest in public cloud)
• Local storage: 3.7M IOPS (fastest in public cloud)
• VM-VM networking: 30 Gbps Ethernet, 100 Gbps InfiniBand (fastest in public cloud)
• Hybrid networking: 100 Gbps connectivity (fastest in public cloud)
• File storage: 100K IOPS (fastest in public cloud)
Azure physical infrastructure
[Figure: a Geography, enclosed by a data residency boundary, containing Region 1 and Region 2; each region contains Availability Zones 1, 2 and 3]
Azure Storage Architecture
[Figure: two Storage Stamps connected by geo replication; each stamp has a Load Balancer, Front-Ends (REST, FILE), a Partition Layer, and a Distributed FS Layer with intra-stamp replication]
Tables
[Figure: Azure Files Service layered on top of Azure Table and Blob Store]
• Azure Files uses the underlying Azure Tables infrastructure to store metadata associated with files/dirs, open handles to them and other state like byte range locks, leases, etc.
• An Azure Table is a simple NoSQL collection of rows with a common schema, sorted and searchable by a subset of ordered 'key' columns
• Two types of keys: Partition (coarse) and Row (fine)
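To make the key model concrete, here is a toy in-memory table, assuming string partition and row keys compared lexicographically. Keeping rows sorted by (partition key, row key) is what turns a prefix lookup, such as listing one directory's children, into a single contiguous range read. The class name, the `scan_prefix` helper and the `parent_id/file_name` row-key encoding in the usage note are hypothetical; this is not the Azure Tables API.

```python
from bisect import bisect_left, insort
from typing import Any


class SortedTable:
    """Toy table whose rows stay sorted by (PartitionKey, RowKey).

    Hypothetical illustration only; this is not the Azure Tables API.
    """

    def __init__(self) -> None:
        self._keys: list[tuple[str, str]] = []  # sorted key index
        self._rows: dict[tuple[str, str], dict[str, Any]] = {}

    def upsert(self, partition_key: str, row_key: str, columns: dict[str, Any]) -> None:
        key = (partition_key, row_key)
        if key not in self._rows:
            insort(self._keys, key)  # insert while keeping the index ordered
        self._rows[key] = columns

    def scan_prefix(self, partition_key: str, prefix: str) -> list[tuple[str, dict[str, Any]]]:
        """Rows of one partition whose row key starts with `prefix`, in key order.

        Keys sharing a prefix are contiguous in sorted order, so the scan starts
        at the first key >= (partition_key, prefix) and stops at the first miss.
        """
        start = bisect_left(self._keys, (partition_key, prefix))
        result = []
        for pk, rk in self._keys[start:]:
            if pk != partition_key or not rk.startswith(prefix):
                break
            result.append((rk, self._rows[(pk, rk)]))
        return result
```

For example, with row keys of the form `f"{parent_id:016x}/{name}"`, `scan_prefix("acct/share", f"{2:016x}/")` returns the children of directory 2 in name order.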
Tables
• Internal transaction APIs allow multiple rows from multiple tables to be modified with ACID semantics in a single transaction if they have the same partition key
• One share is served by multiple partitions, and distributed transactions are used to persist some SMB and REST protocol commands
• A file share's metadata is stored as a group of tables, the most notable of which are:
File             | A table of all files and directories. It is a hybrid type, keyed by either ParentId & FileName, or FileId (64-bit, like NTFS)
Page             | The allocated file ranges and their backing page blobs
Handle           | All open handles to files and directories
Lease            | All currently active SMB leases
Change Notify    | Registered change notifies
Byte Range Locks | All currently active byte range locks
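The single-partition rule can be illustrated with a toy table group that accepts a multi-table batch only when every row shares one partition key, and otherwise rejects it without touching any table. This is a hypothetical model of the semantics described above, not the internal Azure Tables transaction API; `TableGroup`, `Op` and `TransactionError` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Optional


class TransactionError(Exception):
    """Raised when a batch cannot be committed as a single transaction."""


@dataclass
class Op:
    table: str                         # e.g. "File" or "Handle"
    partition_key: str
    row_key: str
    columns: Optional[dict[str, Any]]  # None means "delete this row"


class TableGroup:
    """Toy group of tables committing multi-table batches atomically, but only
    when every row in the batch has the same partition key (hypothetical model)."""

    def __init__(self, table_names: list[str]) -> None:
        self.tables: dict[str, dict[tuple[str, str], dict[str, Any]]] = {
            name: {} for name in table_names
        }

    def commit(self, ops: list[Op]) -> None:
        # Validate the whole batch first, so a rejected batch changes nothing.
        if len({op.partition_key for op in ops}) > 1:
            raise TransactionError("batch spans more than one partition key")
        for op in ops:
            if op.table not in self.tables:
                raise TransactionError(f"unknown table: {op.table}")
        # In this in-memory toy the apply loop cannot raise once validation passed.
        for op in ops:
            rows = self.tables[op.table]
            key = (op.partition_key, op.row_key)
            if op.columns is None:
                rows.pop(key, None)
            else:
                rows[key] = dict(op.columns)
```

An Open-like batch that updates a File row and inserts a Handle row under the same partition key commits in one call; the same batch split across two partition keys is refused, which is exactly the case that needs a distributed protocol.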
Table Partitioning
• Azure Tables allows associating a set of tables as a group
• There are two types of file rows: Namespace and Data (technically a single merged row type, but shown separately here for clarity)
Namespace Rows: Account Name | Share Name | ParentId | FileName | ShareVersion | Other Columns (grouped into Partition Key and Row Key)
Data Rows, old 5TB shares: Account Name | Share Name | FileId | ShareVersion | Other Columns (grouped into Partition Key and Row Key)
Data Rows, new large (up to 100TB) shares: Account Name | Share Name | FileId | ShareVersion | Other Columns (grouped into Partition Key and Row Key)
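The practical difference between the two share types can be sketched as a pair of partition-key functions. This is a hypothetical model: it assumes that all namespace rows of a share stay under one partition key, and that large shares move FileId into the data rows' partition key (consistent with a large share being partitioned by FileId). The exact key split and the function names are assumptions. Under that model, an Open, which touches both a namespace row and a data row, fits in one single-partition transaction only on old shares.

```python
def namespace_partition(account: str, share: str) -> str:
    # Assumption: all namespace rows of a share use one partition key.
    return f"{account}/{share}"


def data_partition(account: str, share: str, file_id: int, large_share: bool) -> str:
    if large_share:
        # Assumption: FileId is part of the partition key, so different files'
        # data rows can live in different partitions (and on different BE nodes).
        # Fixed-width hex keeps lexicographic order equal to numeric FileId order.
        return f"{account}/{share}/{file_id:016x}"
    # Old 5TB shares: data rows use the same per-share partition key.
    return f"{account}/{share}"


def open_needs_distributed_txn(account: str, share: str, file_id: int, large_share: bool) -> bool:
    """True when a file's namespace row and data row live in different partitions."""
    return namespace_partition(account, share) != data_partition(account, share, file_id, large_share)
```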
Mapping Azure Files to hardware
• Front End (FE) nodes receive and manage connections from SMB and REST clients
• Any FE node can service any share
• One share is partitioned by FileId, with partitions served by a collection of Back End (BE) Table nodes
• Partitions are split and merged automatically to maintain uniform load
• Underlying data is managed by Extent Nodes (EN), the Distributed FS layer
[Figure: Front End Nodes 0, 1, 2 ... N above Back End Table Nodes 0 ... N, which are backed by Extent Nodes EN 0 ... EN N]
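One way to picture "partitioned by FileId, split and merged automatically" is a sorted map from FileId ranges to BE Table nodes that an FE can consult to route a request. The sketch below is a hypothetical model under that assumption: `PartitionMap`, its methods and the per-partition `load` figure are illustrative names, and the policy that decides when to split or merge is omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Partition:
    low: int          # inclusive lower FileId bound; the next partition's low is the upper bound
    node: str         # BE Table node currently serving this range
    load: float = 0.0 # recent request rate (hypothetical metric)


class PartitionMap:
    """Hypothetical range-partition map for one share, keyed by FileId."""

    def __init__(self, node: str) -> None:
        self.parts: list[Partition] = [Partition(low=0, node=node)]

    def lookup(self, file_id: int) -> Partition:
        if file_id < 0:
            raise ValueError("FileId must be non-negative")
        # Partitions are sorted by low bound; take the last one with low <= file_id.
        lows = [p.low for p in self.parts]
        return self.parts[bisect_right(lows, file_id) - 1]

    def split(self, index: int, at: int, new_node: str) -> None:
        """Split partition `index` at FileId `at`; the upper half moves to `new_node`."""
        p = self.parts[index]
        high = self.parts[index + 1].low if index + 1 < len(self.parts) else None
        if at <= p.low or (high is not None and at >= high):
            raise ValueError("split point must fall strictly inside the range")
        half = p.load / 2  # simplification: assume load divides evenly
        p.load = half
        self.parts.insert(index + 1, Partition(low=at, node=new_node, load=half))

    def merge(self, index: int) -> None:
        """Fold partition index+1 into partition index (the lower node keeps serving)."""
        upper = self.parts.pop(index + 1)
        self.parts[index].load += upper.load
```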
Data Flow Topology on a Single Share
[Figure: Front End nodes FE 0 ... FE N, one BE-N node, Back End Blob nodes BE-B0 ... BE-BN and Extent Nodes EN 0 ... EN N; successive builds of the slide add three flows: Namespace Metadata (BE-N), File Metadata* & Write Data (BE-B nodes) and File Read Data]
FE = Front End Node (client connection)
BE-N = Back End Namespace Node (namespace metadata)
BE-B = Back End Blob Node (file metadata)
EN = Extent Node (stores actual file data)
Multi-table transactions
• Namespace-oriented requests make the heaviest use of transactions across multiple tables
• Open/Create/Close make modifications to at least two tables but all within one commit
• Even reads/writes look at byte range locks and potentially break leases
• The built-in transaction support makes this relatively painless… before large file shares
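Because even reads and writes consult the Byte Range Locks table, every I/O carries a conflict check. Here is a minimal sketch of such a check, assuming Windows-style lock semantics: an exclusive lock blocks other handles' reads and writes, and a shared lock blocks writes from every handle, including its owner. Lease breaking is not modeled, and `ByteRangeLock` and `io_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRangeLock:
    handle_id: int
    offset: int
    length: int
    exclusive: bool


def overlaps(a_off: int, a_len: int, b_off: int, b_len: int) -> bool:
    """Half-open ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def io_allowed(locks: list[ByteRangeLock], handle_id: int,
               offset: int, length: int, is_write: bool) -> bool:
    for lk in locks:
        if not overlaps(lk.offset, lk.length, offset, length):
            continue
        if lk.exclusive and lk.handle_id != handle_id:
            return False  # another handle's exclusive lock blocks reads and writes
        if not lk.exclusive and is_write:
            return False  # a shared lock blocks writes from every handle
    return True
```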
Distributed transactions
• Writable metadata transactions are distributed and affect two different partitions
• Failures are common in the cloud environment: partitions being moved, network problems, etc.
• A fault-tolerant mechanism is needed to ensure consistency on failures
• The Table layer has no built-in distributed transactions, so we invented our own
• Open makes 2 commits: 1 for Namespace and 1 for corresponding Data partition – slow!
• Create/Close/SetInfo make 3 commits total on 2 partitions – even slower!
• 3-commit transactions are almost always rolled forward… unless rolled backward
Distributed transactions – 3-way commit
[Figure: sequence diagram between the FE, the NAMESPACE partition and the DATA partition showing three COMMIT steps; a second build shows two "Retry detected" events and the additional COMMIT steps that follow them]
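The 3-commit flow can be illustrated with a generic intent-record pattern. This is a hypothetical sketch, not Azure Files' actual protocol, whose record layout is not given here: commit 1 writes the namespace change together with an intent naming the pending data change, commit 2 applies that change idempotently on the data partition, and commit 3 clears the intent. Whoever finds a leftover intent, for example after a retry is detected, rolls the transaction forward; rolling backward (undoing commit 1 when commit 2 cannot be applied) is omitted. `PartitionStore`, `three_commit_txn` and `roll_forward` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartitionStore:
    """One table partition in a toy model: rows, pending intents, applied txn ids."""
    rows: dict[str, dict] = field(default_factory=dict)
    intents: dict[str, tuple[str, dict]] = field(default_factory=dict)
    applied: set[str] = field(default_factory=set)


class Crash(Exception):
    """Simulated failure (lost connection, moved partition) between commits."""


def apply_data(data: PartitionStore, txn_id: str, row: tuple[str, dict]) -> None:
    # Idempotent: a retried commit 2 for the same transaction is a no-op.
    if txn_id in data.applied:
        return
    key, columns = row
    data.rows[key] = columns
    data.applied.add(txn_id)


def three_commit_txn(ns: PartitionStore, data: PartitionStore, txn_id: str,
                     ns_row: tuple[str, dict], data_row: tuple[str, dict],
                     crash_after: Optional[int] = None) -> None:
    # Commit 1 (namespace partition): namespace change plus intent record
    # (one single-partition transaction in a real system).
    ns.rows[ns_row[0]] = ns_row[1]
    ns.intents[txn_id] = data_row
    if crash_after == 1:
        raise Crash()
    # Commit 2 (data partition): apply the data-side change.
    apply_data(data, txn_id, data_row)
    if crash_after == 2:
        raise Crash()
    # Commit 3 (namespace partition): transaction complete, drop the intent.
    del ns.intents[txn_id]


def roll_forward(ns: PartitionStore, data: PartitionStore) -> None:
    """Recovery: finish every transaction that still has an intent record."""
    for txn_id, data_row in list(ns.intents.items()):
        apply_data(data, txn_id, data_row)  # replay commit 2 (safe if already done)
        del ns.intents[txn_id]              # replay commit 3
```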
Azure Files & Linux
Linux Distribution              | Kernel Version     | SMB3 Encryption | Snapshots | Compounding | Directory Leases**
Ubuntu Server 18.04 LTS         | 4.18.0.1020.19     | Yes             | No        | No          | Yes
Ubuntu Server 19.04             | 5.0.0.19.20        | Yes             | Yes       | Yes         | Yes
Debian 9                        | 4.19.0-0.bpo.5***  | Yes             | Yes       | Yes*        | Yes
CentOS 7.6                      | 3.10.0-957         | Yes             | No        | No          | No
Red Hat Enterprise Linux 8      | 4.18.0-80          | Yes             | No        | No          | Yes
SUSE Linux Enterprise Server 15 | 4.12.14-150.22.1   | Yes             | No        | No          | No
* Basic compounding was introduced in Linux kernel 4.19; more advanced compounding was added in 4.20
** Directory leases are implemented for the root file handle only
*** Debian 9 kernel version from backports
Azure Files & Linux
• Discovered and fixed many bugs related to network problems, retries and reconnects (kernel 5.0+)
• The server IP address may change, so the client must re-resolve DNS on every reconnect (fix backported to stable kernels)
• SMB3 compounding helped a lot to reduce round trips (kernel 4.19+)
• IO size was increased from 1MB to 4MB (kernel 4.20+), and block size from 128KB to 1MB (kernel 5.1+)
• Easy and straightforward snapshot support: listing and mounting (kernel 4.19+) – see the next slide
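The kernel versions quoted above can be turned into a quick check. This is a minimal sketch, assuming mainline (upstream) version numbers: distribution kernels backport fixes and features, so the result only describes what a given upstream version includes, not what a distro kernel ships. The thresholds are transcribed from the bullets; the labels, `FEATURES` and the function names are hypothetical.

```python
# Minimum mainline kernel (major, minor) for each improvement listed in the bullets.
FEATURES: dict[str, tuple[int, int]] = {
    "network, retry and reconnect fixes": (5, 0),
    "SMB3 compounding": (4, 19),
    "snapshot listing and mounting": (4, 19),
    "4MB IO size": (4, 20),
    "1MB block size": (5, 1),
}


def parse_kernel(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '4.19.0-0.bpo.5' or '3.10.0-957'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def mainline_features(version: str) -> list[str]:
    """Improvements an upstream kernel of this version includes (backports ignored)."""
    v = parse_kernel(version)
    return [name for name, minimum in FEATURES.items() if v >= minimum]
```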
List snapshots on Linux
Mount snapshots on Linux
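SMB identifies share snapshots (previous versions) by tokens of the form `@GMT-YYYY.MM.DD-HH.MM.SS`. As we understand the Linux CIFS client, its `snapshot=` mount option instead takes the snapshot time as an NT timestamp (100-nanosecond intervals since 1601-01-01 UTC); treat that option's exact format as an assumption and verify it against mount.cifs(8). A minimal converter from token to NT time, with a hypothetical function name:

```python
from datetime import datetime, timezone

NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def gmt_token_to_nt_time(token: str) -> int:
    """Convert a token like '@GMT-2019.05.20-08.30.00' (UTC) into NT time:
    100-nanosecond intervals since 1601-01-01 00:00:00 UTC."""
    dt = datetime.strptime(token, "@GMT-%Y.%m.%d-%H.%M.%S").replace(tzinfo=timezone.utc)
    delta = dt - NT_EPOCH
    # Integer arithmetic avoids float rounding on these large values.
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
```

The Unix epoch is 11,644,473,600 seconds after the NT epoch, so `@GMT-1970.01.01-00.00.00` maps to 116444736000000000.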
Observations and Lessons Learned
• We now have some experience running the world’s largest SMB server
• Metadata operations are unfortunately common and expensive for us
• Compared even with srv2.sys on-premises, the Azure Files service pays a high price for its durability
• Repeated Open/Close operations and write-only handles are particularly bad
• Apps and clients leak handles, which causes problems later when the same files are opened or deleted
• Performance varies because of the distributed cloud environment
• Some applications may not be suitable for “lift and shift”, especially if they have never even been run against an on-prem file server
• Open, Close and I/O operations are all that matters for total aggregate End-to-End request time
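The last point can be made concrete with a simple additive model: aggregate end-to-end time is the sum of open, close and I/O costs. The per-operation latencies in the usage lines are made-up, illustrative numbers, not Azure Files measurements; the point is only the shape of the comparison between opening a handle per write and reusing one handle. `end_to_end_ms` is a hypothetical name.

```python
def end_to_end_ms(opens: int, closes: int, ios: int,
                  open_ms: float, close_ms: float, io_ms: float) -> float:
    """Additive model: total time is the sum of open, close and I/O costs."""
    return opens * open_ms + closes * close_ms + ios * io_ms


if __name__ == "__main__":
    # Hypothetical per-operation latencies in ms (illustrative, not measured).
    OPEN_MS, CLOSE_MS, IO_MS = 10.0, 5.0, 2.0
    writes = 100
    per_write_handle = end_to_end_ms(writes, writes, writes, OPEN_MS, CLOSE_MS, IO_MS)
    one_handle = end_to_end_ms(1, 1, writes, OPEN_MS, CLOSE_MS, IO_MS)
    print(f"open/write/close per chunk: {per_write_handle:.0f} ms")
    print(f"single reused handle:       {one_handle:.0f} ms")
```

With any such numbers, the open and close terms scale with the number of handles rather than the amount of data, which is why repeated Open/Close dominates when it is done per write.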
Resources
• Getting started blog with many useful links: http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx
• General availability announcement: https://azure.microsoft.com/en-us/blog/azure-file-storage-now-generally-available
• NTFS features currently not supported: https://msdn.microsoft.com/en-us/library/azure/dn744326.aspx
• Naming restrictions for REST compatibility: https://msdn.microsoft.com/library/azure/dn167011.aspx
• Large File Share announcement: https://azure.microsoft.com/en-us/blog/a-new-era-for-azure-files-bigger-faster-better
• Feedback: [email protected]
Questions?