
Tuning an SMB server implementation

Mark Rabinovich, Visuality Systems Ltd.

2015 Storage Developer Conference. © Visuality Systems Ltd. All Rights Reserved.

Who are we?

Visuality Systems Ltd. has been providing SMB solutions since 1998. NQE (the E stands for Embedded) is an SMB client/server implementation for the embedded world:
• Consumer devices: printers, MFPs, routers, smart devices, etc.
• Industrial automation, medical, aerospace and defense
• Anything else that is neither PC, Mac nor Samba

NQ Storage is an SMB server implementation for storage platforms.

This presentation is about NQ Storage.


Presentation Plan

• SMB Storage architecture highlights
• Performance factors
• Performance figures
• Tuning a server



Architecture


Architecture in general


Architecture explained

Transport:
• Responsible for receiving SMB requests and sending responses
• Delegates requests to the SMB Engine
• TCP (socket) transport
• SMBDirect (SMBD) transport over RDMA
• More platform-dependent transports can be plugged in

SMB Engine:
• Responsible for parsing SMB requests and composing responses
• Responsible for internal SMB semantics (e.g. IPC$)

VFS:
• Responsible for file operations
• Posix VFS implements a basic VFS on top of the local OS
• An external VFS can be plugged in (sketched below)
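To make the plug-in idea concrete, here is a minimal sketch of what a VFS callback table could look like in C. It is only an illustration: the type and function names (vfs_ops, posix_vfs, etc.) are assumptions, not the actual NQ Storage API.

/*
 * Hypothetical sketch of a pluggable VFS callback table. The type and
 * function names are illustrative assumptions, not the actual NQ Storage API.
 */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

typedef struct vfs_ops {
    int     (*open_file)(const char *path, int flags, int mode);
    ssize_t (*read_at)(int fd, void *buf, size_t len, off_t off);
    ssize_t (*write_at)(int fd, const void *buf, size_t len, off_t off);
    int     (*close_file)(int fd);
} vfs_ops;

/* The Posix VFS simply maps each callback onto the local OS. */
static int posix_open(const char *path, int flags, int mode)
{
    return open(path, flags, mode);
}

static const vfs_ops posix_vfs = {
    .open_file  = posix_open,
    .read_at    = pread,
    .write_at   = pwrite,
    .close_file = close,
};

int main(void)
{
    /* The SMB Engine would call through the table; an external VFS would
     * plug in its own implementation of the same structure. */
    int fd = posix_vfs.open_file("demo.txt", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    posix_vfs.write_at(fd, "hello", 5, 0);
    char buf[8] = { 0 };
    posix_vfs.read_at(fd, buf, 5, 0);
    printf("%s\n", buf);
    posix_vfs.close_file(fd);
    return 0;
}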


SMB request flow


Transport module handles concurrent requests

VFS module handles concurrent calls


SMB flow explained

This flow illustrates the “async” case, which is of particular interest for this presentation.

1. Transport receives a request and delegates it to a transport thread.
2. The SMB Engine parses the request and calls the VFS.
3. The VFS may decide to delegate the call to a VFS thread.
4. When finished, the VFS invokes an SMB Engine callback which sends the response. This callback may run in the context of a VFS thread.
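A minimal sketch of that asynchronous hand-off, assuming a plain pthread-based worker and a simplified callback signature; the names are illustrative, not the actual NQ Storage code.

/*
 * Minimal sketch of the asynchronous request flow described above, using
 * plain pthreads. Type and function names are illustrative assumptions.
 */
#include <pthread.h>
#include <stdio.h>

typedef struct smb_request {
    int  message_id;
    char command[16];
} smb_request;

/* SMB Engine callback: invoked by the VFS when the operation completes.
 * Note that it may run in the context of a VFS worker thread. */
static void engine_send_response(smb_request *req, int status)
{
    printf("response for mid=%d, status=%d (sent from a VFS thread)\n",
           req->message_id, status);
}

/* VFS worker thread: performs the (possibly slow) file operation, then
 * calls back into the SMB Engine. */
static void *vfs_worker(void *arg)
{
    smb_request *req = arg;
    int status = 0;                    /* ... the actual file I/O goes here ... */
    engine_send_response(req, status);
    return NULL;
}

/* Transport thread: the SMB Engine has parsed the request; the VFS decides
 * to delegate the call to a worker instead of blocking this thread. */
static void handle_request(smb_request *req)
{
    pthread_t tid;
    pthread_create(&tid, NULL, vfs_worker, req);
    pthread_detach(tid);
    /* The transport thread is now free to receive the next request. */
}

int main(void)
{
    static smb_request req = { .message_id = 1, .command = "READ" };
    handle_request(&req);
    pthread_exit(NULL);                /* let the detached worker finish */
}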



Performance Factors


Platform factors

CPU:
• Frequency
• Number of cores
• Hyper-threading – effectively doubles the number of cores

Network:
• Throughput (1Gb/s, 10Gb/s, InfiniBand, RoCE, etc.)
• NIC offloading (different techniques)
• RDMA offload

Drive:
• HDD
• SSD


Server parameters

We assume that each transport has a thread pool to serve concurrent requests.

VFS components may use separate thread pools for:
• Create
• Read
• Write
• Time-consuming IOCTLs (set file info, trim, etc.)
• Query Directory
• Other meta-operations

Additional parameters: the credit window and others (a configuration sketch follows below).
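Purely as an illustration of how many knobs that adds up to, a hypothetical configuration structure might look like this; the field names and default values are assumptions, not the actual NQ Storage configuration.

/*
 * Hypothetical bundle of the tunables listed above. The field names and the
 * default values are illustrative assumptions, not the real NQ configuration.
 */
#include <stdio.h>

typedef struct server_params {
    unsigned transport_threads;      /* per-transport pool for concurrent requests */
    unsigned vfs_create_threads;     /* separate VFS pools per operation class */
    unsigned vfs_read_threads;
    unsigned vfs_write_threads;
    unsigned vfs_ioctl_threads;      /* time-consuming IOCTLs, set file info, trim */
    unsigned vfs_querydir_threads;
    unsigned vfs_meta_threads;       /* other meta-operations */
    unsigned max_credits;            /* credit window granted per connection */
} server_params;

/* Placeholder defaults only; finding the right values for a given platform
 * is exactly what the rest of this presentation is about. */
static const server_params default_params = {
    .transport_threads    = 8,
    .vfs_create_threads   = 4,
    .vfs_read_threads     = 8,
    .vfs_write_threads    = 8,
    .vfs_ioctl_threads    = 2,
    .vfs_querydir_threads = 2,
    .vfs_meta_threads     = 2,
    .max_credits          = 64,
};

int main(void)
{
    printf("transport threads: %u, max credits: %u\n",
           default_params.transport_threads, default_params.max_credits);
    return 0;
}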


Credits

The credit window should not be a limiting factor – we can easily afford enough 1MB buffers. But how many buffers will be enough?

A satisfactory credit window is:

Max credits = <num of effective cores> + <NIC offload factor> + <drive speed factor> + <overhead>

• “NIC offload factor” – how many SMBs the adapter can receive and store in its buffers. For simplicity we count only receiving and do not consider transmitting.
• “Drive speed factor” – how many pending threads we need to keep the CPU loaded while the drive performs an I/O:

<drive speed factor> = <memory access speed> / <drive speed>


Credits (cont.)

Memory access speed (typical for DDR3) – 5000 MB/s
Drive speed (typical):
• HDD – 115 MB/s
• SSD – 400 MB/s

Example: 6 + 2 + 5000 / 115 + 5 ≈ 56

Is the above formula accurate?
• The NIC offload factor depends on hardware and is not always easy to determine.
• The drive speed factor varies.
• If we knew the number of threads, the credit window could be calculated easily and accurately.
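As a worked check of the example above, a small program applying the formula with the quoted numbers; the function name is just for illustration.

/*
 * Worked check of the hardware-based credit estimate above:
 * max credits = cores + NIC offload factor + (memory speed / drive speed) + overhead
 */
#include <stdio.h>

static unsigned max_credits_hw(unsigned effective_cores, unsigned nic_offload,
                               unsigned mem_speed_mbs, unsigned drive_speed_mbs,
                               unsigned overhead)
{
    return effective_cores + nic_offload
         + mem_speed_mbs / drive_speed_mbs   /* drive speed factor */
         + overhead;
}

int main(void)
{
    /* 6 cores, NIC offload factor 2, DDR3 ~5000 MB/s, HDD ~115 MB/s,
     * overhead 5  ->  about 56 credits, matching the example above. */
    printf("max credits (HDD): %u\n", max_credits_hw(6, 2, 5000, 115, 5));

    /* Same formula with the SSD figure quoted above (400 MB/s). */
    printf("max credits (SSD): %u\n", max_credits_hw(6, 2, 5000, 400, 5));
    return 0;
}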


Credits (cont.)

An alternative method uses mostly software parameters:

Max credits = <transport threads> + <max VFS threads> + <NIC offload factor> + <overhead>

Example: 20 + 20 + 2 + 3 = 45. We still depend on the NIC offload factor.
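And the same kind of check for the software-parameter variant; again only a sketch with the numbers quoted above.

/*
 * Software-parameter variant:
 * max credits = transport threads + max VFS threads + NIC offload factor + overhead
 */
#include <stdio.h>

static unsigned max_credits_sw(unsigned transport_threads, unsigned max_vfs_threads,
                               unsigned nic_offload, unsigned overhead)
{
    return transport_threads + max_vfs_threads + nic_offload + overhead;
}

int main(void)
{
    /* 20 transport threads + 20 VFS threads + NIC offload 2 + overhead 3 = 45 */
    printf("max credits: %u\n", max_credits_sw(20, 20, 2, 3));
    return 0;
}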


Thread pool size

The credit window question can be translated into the question of thread pool size(s). How big?
• Big enough to utilize all cores of the CPU.
• Not too big – larger numbers lead to saturation.

Which numbers are optimal? We will try to find tendencies:
• Trying different scenarios
• Trying various parameters
• The server platform remains the same


Other parameters

Buffer pre-allocation:
• SMB request buffers
• SMB response buffers
• RPC buffers

The optimal buffer pre-allocation may be calculated, while the optimal number of threads is not that easy to calculate.
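The slides do not spell the calculation out; one plausible sizing rule, stated purely as an assumption here, is to pre-allocate a request and a response buffer for every credit that can be outstanding across the expected number of connections.

/*
 * Purely illustrative sizing rule, stated as an assumption and not taken from
 * the slides: one request and one response buffer per outstanding credit,
 * across the expected number of concurrent connections.
 */
#include <stdio.h>

static unsigned long prealloc_bytes(unsigned max_credits, unsigned connections,
                                    unsigned long buf_size)
{
    return 2UL * max_credits * connections * buf_size;   /* request + response */
}

int main(void)
{
    /* e.g. 45 credits, 20 connections, 64KB buffers -> roughly 118 MB */
    printf("%lu bytes\n", prealloc_bytes(45, 20, 64UL * 1024));
    return 0;
}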



Performance Figures


Test platform

• HP ProLiant ML350P Generation 9
• Intel® Xeon® 1.90GHz, 6 cores
• 1000GB HP HDD over SATA
• HP Ethernet 1Gb/s
• HP Ethernet 10Gb/s


Performance … by server threads


Case: File download

Legend:
• Increasing Read threads, leaving Transport threads unchanged.
• Increasing Transport threads, leaving Read threads unchanged.
• Increasing Transport and Read threads.

Testware:
• SwiftTest, 20 users
• 100MB file
• 64K packets


… by server threads (cont.)


Case: File upload

Legend:
• Increasing Write threads, leaving Transport threads unchanged.
• Increasing Transport threads, leaving Write threads unchanged.
• Increasing Transport and Write threads.

Testware:
• SwiftTest, 20 users
• 100MB file
• 64K packets


… by server threads (cont.)


Case: File upload/download mix

Legend:
• Increasing Write threads, leaving Transport and Read threads unchanged.
• Increasing Read threads, leaving Write and Transport threads unchanged.
• Increasing Transport, Read and Write threads.

Testware:
• SwiftTest, 20 users
• 100MB file
• 64K packets


… by server threads (cont.)

Observations:
• Adding too many threads does not help – “saturation”.
• Increasing transport threads alone does not help; apparently the backend becomes the server’s bottleneck.
• Increasing VFS threads helps for the read and write scenarios. We still need transport threads for the mixed case.
• Reading is more sensitive to multiplexing than writing.


… by CPU cores


Case: File upload by CPU cores

Legend:
• All cores.
• One core.

Testware:
• SwiftTest, 20 users
• 100MB file
• 64K packets


… by CPU cores (cont.)


Case: SQL Server traffic simulation

Legend:
1. Random file access with a single core.
2. Random file access with six cores.
3. Sequential file access with a single core.
4. Sequential file access with six cores.

Transport, Read and Write threads are all increased.

Testware:
• SQLIO
• 60-second run
• 4K packets
• 8 outstanding requests


… by CPU cores (cont.)


Case: Low-load file uploading

Legend:
1. File uploading over multiple connections with a single core.
2. File uploading over multiple connections with six cores.

Both Transport and Write threads are increased.

Testware:
• SwiftTest, 20 users
• 100MB file
• 64K packets


… by CPU cores (cont.)


Case: High-load file uploading

Legend:
1. File uploading over multiple connections with a single core.
2. File uploading over multiple connections with six cores.

Both Transport and Write threads are increased.

Testware:
• SwiftTest, 1000 users
• 100MB file
• 64K packets


… by CPU cores (cont.)

Observations:
• More cores can utilize more threads.
• A single core can also benefit from threading (though less), apparently because some threads are blocked on I/O.
• The server is more sensitive to the number of threads in random-access scenarios.
• The server is more sensitive to the number of threads with smaller chunks.
• Under higher load, the number of cores becomes a more significant factor.



Tuning a Server


Platforms

Typical server platforms:
• SOHO NAS: ARM 1.2GHz dual core, HDD
• Mid-level storage: Atom® 2.13GHz quad core, HDD
• Top-end storage: Intel® Xeon® 3.4GHz quad core, SSD

Apparently, the ideal parameter values will be different for each of these categories. Even within the same category (e.g., top-end storage), the values may differ between two different platforms.

We need a methodology for choosing ideal parameters.


The challenge

• Find out the optimal parameters.
• Do it fast or, at least, do it automatically.
• Do it reliably.
• Solution example – Tune-a-Server.


Tune-a-Server

• Is part of NQ Server Management.
• Enumerates every single combination of the selected server parameters.
• Runs a set of tests for each combination. The test result is the time it takes to run the test – the less, the better. Each test has a weight.
• Calculates a score for each parameter combination by applying the test weights to the test results (see the sketch below).
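The core idea is a brute-force sweep with a weighted score. The sketch below only illustrates that idea; the parameter ranges, the fake test functions and the weights are assumptions, not the real tool.

/*
 * Illustrative sketch of the Tune-a-Server idea: enumerate every parameter
 * combination, run each weighted test, and keep the combination with the
 * lowest weighted time. Ranges, fake tests and weights are assumptions.
 */
#include <stdio.h>

#define NUM_TESTS 2

/* A "test" is really an external script; here it is faked as a function
 * returning a run time in seconds for a (transport, read, write) combination. */
typedef double (*test_fn)(unsigned transport, unsigned rd, unsigned wr);

static double fake_upload(unsigned t, unsigned r, unsigned w)
{
    (void)r;                                    /* upload mostly exercises write threads */
    return 100.0 / (t + w) + 0.05 * (t + w);    /* crude saturation term */
}

static double fake_sqlio(unsigned t, unsigned r, unsigned w)
{
    return 200.0 / (t + r + w) + 0.03 * (t + r + w);
}

int main(void)
{
    test_fn tests[NUM_TESTS]   = { fake_upload, fake_sqlio };
    double  weights[NUM_TESTS] = { 1.0, 2.0 };  /* e.g. SQLIO weighted higher */

    double best_score = 1e30;
    unsigned best_t = 0, best_r = 0, best_w = 0;

    for (unsigned t = 4; t <= 32; t += 4)            /* transport threads */
        for (unsigned r = 4; r <= 32; r += 4)        /* read threads */
            for (unsigned w = 4; w <= 32; w += 4) {  /* write threads */
                double score = 0.0;
                for (int i = 0; i < NUM_TESTS; i++)
                    score += weights[i] * tests[i](t, r, w);
                if (score < best_score) {
                    best_score = score;
                    best_t = t; best_r = r; best_w = w;
                }
            }

    printf("best combination: transport=%u read=%u write=%u (score %.2f)\n",
           best_t, best_r, best_w, best_score);
    return 0;
}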


Tune-a-Server (cont.)


Choose Tune-a-Server from the NQ Management Console.

This will start a wizard.


Tune-a-Server (cont.)


Select the parameters of interest.

For each of them, choose a range.

Other parameters will keep their default values.


Tune-a-Server (cont.)


Select the scripts to run (script == test).

Choose script weights – a weight reflects the script’s importance.


Tune-a-Server (cont.)


Scripts explained:
• A script runs a test and is expected to emulate a use-case scenario.
• A script can be any program whose result is evaluated by its run time – the less time, the better the result.
• Each of the experiments from this presentation may be a script.
• We need more script ideas – suggestions welcome.

Weights explained:
• For instance, a tool like SQLIO gets a bigger weight than plain file upload/download, since it emulates more practical cases.
• Writing is more sensitive to threading than reading (see the performance results above), so we can consider giving more weight to the upload script.


Tune-a-Server (cont.)


Run the scripts. This may take a long time – we usually run them overnight.


Tune-a-Server (cont.)


When done, the results may be exported to Excel and analyzed.



markr@visualitynq.com

Thank you!

Your feedback is very important to us.