Analyze Application Impact - Storage Subsystem v2


Transcript of Analyze Application Impact - Storage Subsystem v2


Analyzing Oracle's impact using simple userland tools - Storage subsystem (specific to Datawarehousing)

    Krishna Manoharan

    [email protected]


Introduction

Every application impacts the host operating system and connected sub-systems in a unique way.

In order to profile an application and understand its impact on the environment, there are a number of userland tools provided within the OS.

Many of these tools do not require super-user privileges, giving ordinary users such as DBAs or application developers the ability to see and gauge the impact of the application on the system.


    Subsystems in an environment

One needs to analyze the impact of an application on all the major subsystems in an environment:

CPU
Memory
Storage
Network


    Profiling an application

To profile an application, one needs to know:

What to observe (metrics)
How to observe (the tools to gather these metrics)
And finally, how to interpret the results (correlate, compare and draw conclusions)


Storage Subsystem

The storage subsystem normally consists of the following:

(Slide diagram: the IO path from Oracle on the host, through the filesystem, volume and HBA, to the array controller, cache, luns and the actual disks, with array management alongside.)

Filesystem - created for applications; can have different block sizes.
Volume - created from luns; can be of different layouts (stripe, concat, mirror, etc.).
Luns - carved from Raid Groups on the array and reached through cache; luns are also of different layouts (mirror, concat, stripe, etc.). Cache is the memory (staging area) on the array.
Disks - the actual disks on the array.


Storage Subsystem (Contd.)

Disks refer to the actual hard drives we are all familiar with. Disks come in different capacities (72GB, 146GB, 300GB), different kinds (FC, SATA, SAS) and different speeds (7200 RPM, 10K RPM, 15K RPM).

Cache refers to the memory on the array to which all writes are staged. Cache also contains pre-fetch data. The controller is the intelligence behind the array.

Raid groups are created using the disks on the array. Luns are carved out of the Raid Groups and assigned to the host. Luns can be of any size, as can volumes and filesystems. Raid groups can be of different layouts (mirror, stripe, stripe-mirror, Raid 5) and luns inherit the same layout. Luns are normally multipathed (with 2 or more paths).

Volumes are created from luns. Volumes can be created as concat, mirror, stripe-mirror, mirror-stripe, Raid 5, etc.

Filesystems have different block sizes; for vxfs, the block sizes are 1K to 8K.


    Storage Subsystem Metrics.

An IO request starts with the application issuing an IO system call (read, write).
Based on the current activity of the system, the request may be processed immediately or routed to a queue of requests (similar to a run queue on a CPU - the wait column in iostat).
It waits in the queue for a period of time until it can be dispatched (wait time - the wsvc_t column in iostat).
It then executes on the disk, taking time to complete (response time - the asvc_t column in iostat).

Corresponding to the above activities, there are also the size of the IO operation (computed from the iostat counters), the bandwidth (the kr/s + kw/s columns in iostat) and the number of IO operations (r/s + w/s in iostat).
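As a worked example with illustrative numbers (not from the slides): if iostat reports r/s = 100, w/s = 20, Mr/s = 50.0 and Mw/s = 2.0, then the load is 120 IOPS at 52 MB/sec, and the average IO size is (50.0 + 2.0) / (100 + 20), which is roughly 0.43 MB (about 440 KB) per operation.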


    Storage Subsystem Metrics (Contd.)

The common metrics used to describe storage performance are:

Wait - average number of transactions waiting to be serviced (similar to the run queue on a CPU).
Wait time - average time spent in the wait queue.
Service time - the time in milliseconds which a lun spends on servicing a request.
IOPS - number of IO operations per second.
IO sizes - the average size of an IO operation in KB or MB.
Throughput - the average bandwidth available in MB/sec.


    Storage Subsystem Tools

The following tools are used to capture storage statistics. Both run-time and historical data are very much essential.

Run-time data:

iostat - gives statistics at a lun level (service time, IOPS, IO sizes, throughput).
vxstat - gives statistics at a volume level (assuming we are using Veritas Volume Manager). Statistics available are again service time, IOPS and IO sizes.
vxdmpadm - gives statistics at a lun level (service time, IOPS and IO sizes).
odmstat - gives statistics for oracle datafiles (if using Veritas ODM).
swat (Sun StorEdge Workload Analysis Tool) - gives statistics at a lun level (service time, IOPS, throughput, IO sizes).
Oracle v$ views - historical and run-time data (not at a lun level).


    Storage Subsystem Tools (Contd.)

Historical data capture tools:

swat (Sun StorEdge Workload Analysis Tool) - gives statistics at a lun level (service time, IOPS, throughput, IO sizes).
sar - sar also captures disk stats.
Oracle v$ views - historical and run-time data (not at a lun level).


    Storage Subsystem Tools (Contd.)

Of the tools listed, only iostat, odmstat and swat can be run by a non-privileged user. Oracle v$ views can be viewed by anyone with appropriate oracle privileges.

Normally, luns assigned to a host are small and numerous, so using tools such as iostat is very cumbersome.
The most user-friendly tool is swat; it is ideal for collecting and analyzing data over the long term, since it collects and graphs the data for easy analysis.

Using iostat with the extended options (iostat -xnM) will give the most useful information. The columns to look for are wait, asvc_t, r/s, w/s, Mr/s and Mw/s.


Storage Subsystem Data Collection

mkrishna@viveka:> iostat -xnM 1

                    extended device statistics
 r/s  w/s  Mr/s  Mw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    1.2   0   0 c1t10d0
 0.6  1.6   0.0   0.0  0.0  0.0    0.0   19.2   0   1 c0t8d0
 0.0  0.1   0.0   0.0  0.0  0.0    0.0   11.2   0   0 c8t20d53

r/s (number of reads/second), w/s (number of writes/second), Mr/s (MB read/second) and Mw/s (MB written/second) are indicators of the workload.
wait - this is the run queue. It shows the pending operations waiting to be serviced. Normally, it should be 0.
wsvc_t - the wait time for the above statistic. Should be 0.
asvc_t - the average response time to a lun. It can vary anywhere from 1ms to 10ms. The average response time for a volume in a datawarehouse system should be 20-40ms; during heavy loads, volume response times can vary between 20-100ms.


    Storage Subsystem Data Collection (Contd.)

From an oracle perspective, the views which contain IO data are:

v$filestat - specific to oracle datafiles
v$sysstat - at the instance level
v$segstat - at the segment level
v$tempstat - temporary tablespace file stats
dba_hist_sysmetric_summary - data from the AWR snapshots

It is easy to write a SQL query which groups by mount point or filename and reports:

Number of IOPS
Response time
Throughput

(A sketch of such a query follows below.)

However, Oracle does not report statistics at a lun level. These need to come from the OS.
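A minimal sketch of such a query, joining v$filestat to v$datafile and grouping by the leading directory of the datafile path. The directory parsing, the column aliases and the centiseconds-to-milliseconds conversion are assumptions for illustration, not something the slides prescribe:

-- Per-directory IOPS and average response time from the cumulative v$filestat counters
SELECT SUBSTR(d.name, 1, INSTR(d.name, '/', -1) - 1)   AS mount_point,
       SUM(f.phyrds + f.phywrts)                        AS io_operations,
       ROUND(AVG(f.avgiotim) * 10, 1)                   AS avg_resp_ms
FROM   v$filestat f
       JOIN v$datafile d ON d.file# = f.file#
GROUP  BY SUBSTR(d.name, 1, INSTR(d.name, '/', -1) - 1)
ORDER  BY io_operations DESC;

Remember these are cumulative counters, so run the query before and after the load and work with the differences; throughput additionally needs the block counts (PHYBLKRD, PHYBLKWRT) multiplied by the database block size.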


    Storage Subsystem Data Collection (Contd.)

v$filestat - specific to oracle datafiles only (no redo or temp files).
It appears to be event driven and so should be accurate.
Since it is cumulative, one needs to take a snapshot of v$filestat before and after a load.
The relevant columns are:

PHYRDS + PHYWRTS = IOPS to the file.
AVGIOTIM - average response time of the file, in hundredths of a second; multiply by 10 to report it in ms. These timings can vary between 1 and 30 ms depending on the size of the file and the kind of activity. I would assume that 25 ms is about the maximum you should ever see.
MAXIORTM - maximum time spent on a single read, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest read ever on the file. Anything greater than 30 ms would raise alarms.
MAXIOWTM - maximum time spent on a single write, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest write ever on the file. Anything greater than 30 ms would raise alarms.

It is important to look at MAXIORTM and MAXIOWTM as these show the poorest performance for the datafile.
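One way to take that before/after snapshot is simply to copy the counters into a scratch table and diff them after the load; this is a hedged sketch and the table name filestat_before is a placeholder:

CREATE TABLE filestat_before AS
  SELECT file#, phyrds, phywrts, phyblkrd, phyblkwrt FROM v$filestat;

-- ... run the load under test, then compare ...

SELECT c.file#,
       c.phyrds  - b.phyrds                             AS reads_during_load,
       c.phywrts - b.phywrts                            AS writes_during_load,
       (c.phyrds + c.phywrts) - (b.phyrds + b.phywrts)  AS io_ops_during_load
FROM   v$filestat c
       JOIN filestat_before b ON b.file# = c.file#
ORDER  BY io_ops_during_load DESC;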


    Storage Subsystem Data Collection (Contd.)

v$sysstat - reports cumulative statistics at the instance level. Again, in order to understand the impact of a load, a snapshot of v$sysstat needs to be taken before and after the load.

    It appears to be event driven and so should be accurate.

physical read total IO requests + physical write total IO requests = IOPS

physical read total bytes + physical write total bytes = Throughput
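A minimal sketch, again meant to be run before and after the load so the deltas (divided by the elapsed seconds) give IOPS and throughput; the statistic names are the standard v$sysstat names quoted above:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('physical read total IO requests',
                'physical write total IO requests',
                'physical read total bytes',
                'physical write total bytes');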


    Storage Subsystem Data Collection (Contd.)

v$tempstat - cumulative temporary tablespace file stats. Again, in order to understand the impact of a load, a snapshot of v$tempstat needs to be taken before and after the load.

It appears to be event driven and so should be accurate.
The relevant columns are:

PHYRDS + PHYWRTS = IOPS to the file.
AVGIOTIM - average response time of the file, in hundredths of a second; multiply by 10 to report it in ms. These timings can vary between 1 and 30 ms depending on the size of the file and the kind of activity. I would assume that 25 ms is about the maximum you should ever see.
MAXIORTM - maximum time spent on a single read, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest read ever on the file. Anything greater than 30 ms would raise alarms.
MAXIOWTM - maximum time spent on a single write, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest write ever on the file. Anything greater than 30 ms would raise alarms.

It is important to look at MAXIORTM and MAXIOWTM as these show the poorest performance for the tempfile.
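A small sketch that joins v$tempstat to v$tempfile and flags temp files whose worst single read or write exceeds the 30 ms rule of thumb above; the join and the centiseconds-to-milliseconds conversion are assumptions:

SELECT t.name,
       s.maxiortm * 10 AS worst_read_ms,
       s.maxiowtm * 10 AS worst_write_ms
FROM   v$tempstat s
       JOIN v$tempfile t ON t.file# = s.file#
WHERE  s.maxiortm * 10 > 30
   OR  s.maxiowtm * 10 > 30;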


    Storage Subsystem Data Collection (Contd.)

dba_hist_sysmetric_summary - data from the AWR snapshots. The accuracy of the data in this table is debatable; I have noticed discrepancies in the data reported in this table.

    The data is reported by snapshot number.

Assume it is the average during the entire snapshot interval.

Physical Read Total IO Requests Per Sec + Physical Write Total IO Requests Per Sec = IOPS
Physical Read Total Bytes Per Sec + Physical Write Total Bytes Per Sec = Throughput
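A minimal sketch listing those four metrics per AWR snapshot; the AVERAGE column is the per-second average over the snapshot interval, and the rounding and ordering are cosmetic choices:

SELECT snap_id,
       metric_name,
       ROUND(average, 1) AS avg_per_sec
FROM   dba_hist_sysmetric_summary
WHERE  metric_name IN ('Physical Read Total IO Requests Per Sec',
                       'Physical Write Total IO Requests Per Sec',
                       'Physical Read Total Bytes Per Sec',
                       'Physical Write Total Bytes Per Sec')
ORDER  BY snap_id, metric_name;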


Storage Subsystem Data Collection (Contd.)

odmstat gives file-level IO details if Veritas ODM is enabled.

[oracle@viveka] $ /opt/VRTS/bin/odmstat *dbf
                        OPERATIONS      FILE BLOCKS     AVG TIME(ms)
 FILE NAME              READ   WRITE    READ    WRITE   READ   WRITE
 APD_01.dbf               36       3    1152       96    2.2     0.0
 ARD_10.dbf               31       9    1056      320    5.8     1.1

Operations refers to the number of IO operations (IOPS).
File Blocks refers to the size of the IO operations; it is reported in sectors (1 sector = 512 bytes).
Avg Time refers to the service time.
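As a worked example from the output above: APD_01.dbf saw 36 reads covering 1152 blocks, i.e. 1152 / 36 = 32 sectors per read, and 32 x 512 bytes = 16 KB per read on average.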


Storage Subsystem Data Correlation

Data (IOPS, service time, etc.) needs to be collected from both the OS and the database perspective and correlated.
Data needs to be analyzed over a period of time to profile the nature of the workload.


    Profile of a typical Datawarehouse system - Storage

It is very difficult to generalize access patterns and the storage profile for a Datawarehouse system. However, the following is probably a good guideline.

For an averagely used Enterprise Datawarehouse for a large company not in the retail industry, configured properly:

The IO profile would show a smaller number of IOPS, but large IO sizes. Typical rates would be in the range of 5K-10K IOPS during heavy usage; during normal hours, one can see around 2K-3K IOPS.
For a block size of 16K and db_file_multiblock_read_count set to 64, you can expect to see IO sizes from 16K up to 1MB and beyond (16K x 64 = 1MB for a full multiblock read). Luns used for redo logs will show significantly larger IO sizes during peak DML activity.
Large numbers of direct reads/writes and multiblock reads/writes.
High numbers of parallel operations and heavy PGA activity.
Average throughput would be in the range of 450-600MB/sec.

Heavy temporary tablespace usage.
Heavy redo log activity during periods of DML. It is important to note that redo log group members are generally small (512M to 1.5GB), so typically the Storage/Unix administrator will re-use luns when creating redo log filesystems. This is not good practice. You will probably notice that the redo log luns see the biggest IO sizes for write operations.


    Storage from an oracle perspective (Datawarehouse)

For datawarehousing, the critical components are:

Storage
Memory
CPU

    Storage plays a critical role in performance.

Do not skimp on storage. Plan ahead for 2 years and lay out the filesystems appropriately. It is normal to have 3-4x overhead for initial sizing.

Appropriate sizing and configuration are very important.

Most operations process huge amounts of data, resulting in considerable IO.

Datawarehousing is more dependent on throughput and less on response time.


Datawarehousing - Array

Cheaper modular arrays (such as the HDS AMS1000) work better for datawarehousing kinds of loads than high-end arrays.

Go for the fastest drives (15K RPM). Use 72GB drives instead of 146GB drives.

Since Oracle does read-ahead into the buffer cache, disable or minimize the read cache on the array. Try and assign the maximum possible cache for staging writes.
Avoid striping on the array (Raid 5, Raid 10). Array-based striping does not offer big stripe widths (1M or greater); most are limited to 384K (HDS). Stripe width refers to the width of the stripe on a single disk.

Go for disk mirroring (Raid 1), preferably 1D+1P as the Raid Group configuration.

    Use the entire Raid Group as a single lun.

    Share the luns across as many controllers as possible.


Datawarehousing - System

There is a lot of sorting/merging of IO requests happening at every layer (Volume Manager, HBA, Array Controller) to minimize head movement. To make the best use of it, ensure that 32M is set as the maximum size of a single IO request that can be passed down from the driver to the HBA.
Configure volumes as stripes with big stripe widths (1M or greater). Stripes can be used for all filesystems (redo, archive, data, temporary).
Use an even number of luns for creating volumes (2, 4, 6 or 8).
If using 146GB drives, then usable space is only about 90-100GB; do not exceed this.
Configure multipathing such that all the active paths to a lun are written to at the same time.
If using vxfs, set the block size to 8K (the maximum).


Datawarehousing - Oracle

Basics

    Veritas ODM is a must (async + direct i/o).

Make sure that all async IO patches specific to the platform are applied.
Do not re-use luns. That is, once a lun is used for a filesystem, it should not be used for any other requirement or other filesystems. There will be considerable wastage, but it is well worth it.
Do not intermix data, redo, archive and index files. Keep them on separate filesystems; this makes them easier to maintain and also to troubleshoot for performance.
Use an appropriate block size as required.

Redo configuration
Use 4 redo log groups with 2 members each. Place the redo logs on dedicated filesystems. Make sure the members are big (> 1500MB).
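For example, one such group could be created as below; the filesystem paths and the exact size are placeholders, not values from the slides (repeat for groups 1 through 4):

ALTER DATABASE ADD LOGFILE GROUP 1
  ('/redo_fs01/redo_g1_m1.log',    -- member 1 on its own dedicated filesystem
   '/redo_fs02/redo_g1_m2.log')    -- member 2 on a separate dedicated filesystem
  SIZE 2000M;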

Temporary Tablespaces
You can use either raw volumes or ODM-enabled datafiles.
Solid-state memory is best suited for temporary tablespaces. However, if it is not available, 72GB 15K RPM drives can be used.
Create temporary tablespace groups and assign the temporary tablespaces to groups accordingly. Hopefully oracle will use the temporary tablespaces without conflict.
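A minimal sketch of the grouping idea, with placeholder tablespace, group, user and file names:

CREATE TEMPORARY TABLESPACE temp01
  TEMPFILE '/temp_fs01/temp01_01.dbf' SIZE 10000M
  TABLESPACE GROUP temp_grp;          -- temp01 becomes a member of group temp_grp

ALTER USER dw_user TEMPORARY TABLESPACE temp_grp;   -- the user is pointed at the group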


Datawarehousing - Oracle (Contd.)

Tablespaces and datafiles
Look ahead for 2 years and plan as below. Beyond 2 years, offload old data to static instances.

Identify the number of tablespaces for a schema, and the number of datafiles for each tablespace (depending on the size of the objects and projected growth). If you know the growth, then pre-create the appropriate datafiles as required.

Oracle round-robins across datafiles when creating extents for objects, so the more datafiles available in a tablespace when creating an object, the better the striping of the extents will be.
Use fixed-size datafiles (10G or 20G). Do not enable auto-extend. Use uniform extent sizing. Disabling auto-extend and using uniform extent sizing reduces fragmentation and is a lot more efficient for the database, especially when doing updates/deletes.
Use multiples of the extent size to match the stripe width on the volume. Especially for big tables (> 15-20GB), use big extents such as 200M or higher as required.
Split the datafiles across all available filesystems.
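A hedged sketch of such a tablespace; the names, paths and sizes are placeholders, and UNIFORM SIZE gives the uniform extent sizing mentioned above:

CREATE TABLESPACE dw_data01
  DATAFILE '/data_fs01/dw_data01_01.dbf' SIZE 10G AUTOEXTEND OFF,
           '/data_fs02/dw_data01_02.dbf' SIZE 10G AUTOEXTEND OFF
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 200M;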


Datawarehousing - Oracle (Contd.)

Multiple block sizes
Multiple block sizes are a mixed bag, useful only when you know your data very well. On Solaris, the maximum block size is 32K. Enabling multiple block sizes requires you to set aside a portion of the SGA for the specific block size.
Writes -
Updates are a very costly operation with bigger block sizes.
Inserts using bulk loads will be very fast. Conventional loading depends on the memory set aside.

Reads -
Index scans would probably give good performance, but is retrieving 32K of data really necessary when you need only 8K?
Direct reads - I guess these would be fast as they bypass the buffer cache.

I do not know the impact of multiple block sizes on the undo tablespace.
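A minimal sketch of what enabling a second block size involves; the 512M cache size and all names are placeholders (DB_32K_CACHE_SIZE is the init parameter that sets aside SGA for 32K buffers):

ALTER SYSTEM SET db_32k_cache_size = 512M;   -- carve out SGA for the 32K buffer pool

CREATE TABLESPACE dw_data_32k
  DATAFILE '/data_fs03/dw_data_32k_01.dbf' SIZE 10G AUTOEXTEND OFF
  BLOCKSIZE 32K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 200M;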


    Oracle and Storage statistics

Help! I have set everything up as discussed. How do I know if Oracle is performing adequately? I do not have access to run privileged commands (vxstat etc.) and I want to see statistics from Oracle's perspective. I am not happy with odmstat. I want more data.
Oracle does collect IO statistics at all levels (object, datafile and instance). I assume most, if not all, are event driven. If they are event driven, then they are extremely accurate; if time sampled, then they are only indicators.
Event-driven statistics are the wait events: sequential reads, scattered reads, log archive IO, etc.
Oracle also captures the number of IOPS, throughput, physical reads, writes, response times, etc.

All this data is stored in the v$ views and the dba_hist tables. As to how accurate the numbers are, we can only guess. I have personally seen discrepancies in Oracle's reporting, so it is best to correlate with OS statistics.


    Conclusion

Get storage right the first time or the datawarehouse solution will fail.
Take the time to do a proper assessment, along with testing, before deploying the instance.

    Do not skimp on storage.

Storage is the most important component and the most easily forgotten too.

    Always correlate oracle statistics with OS statistics.


    Questions?