Analyze Application Impact - Storage Subsystem v2


Transcript of Analyze Application Impact - Storage Subsystem v2


Analyzing Oracle's impact using simple userland tools - Storage subsystem (specific to Datawarehousing)

    Krishna Manoharan

    [email protected]


Introduction

Every application impacts the host operating system and connected sub-systems in a unique way.

In order to profile an application and understand its impact on the environment, there are a number of userland tools provided within the OS.

Many of these tools do not require super-user privileges, giving ordinary users such as DBAs or application developers the ability to see and gauge the impact of the application on the system.


    Subsystems in an environment

One needs to analyze the impact of an application on all the major subsystems in an environment:

CPU
Memory
Storage
Network


    Profiling an application

To profile an application, one needs to know:

What to observe (metrics)
How to observe (the tools to gather these metrics)
And finally, how to interpret the results (correlate, compare and draw conclusions)


Storage Subsystem

The storage subsystem normally consists of the following:

(Slide diagram: the IO path from Oracle on the host, through the filesystem, volume and HBA, to the array controller, cache, luns and the actual disks, with array management alongside.)

Filesystem - created for applications; can have different block sizes.
Volume - created from luns; can be of different layouts (stripe, concat, mirror, etc.).
Luns - carved from Raid Groups on the array and reached through cache; luns are also of different layouts (mirror, concat, stripe, etc.). Cache is the memory (staging area) on the array.
Disks - the actual disks on the array.


Storage Subsystem (Contd.)

Disks refer to the actual hard drives we are all familiar with. Disks come in different capacities (72GB, 146GB, 300GB), different kinds (FC, SATA, SAS) and different speeds (7200 RPM, 10K RPM, 15K RPM).

Cache refers to the memory on the array to which all writes are staged. Cache also contains pre-fetch data. The controller is the intelligence behind the array.

Raid groups are created using the disks on the array. Luns are carved out of the Raid Groups and assigned to the host. Luns can be of any size, as can volumes and filesystems. Raid groups can be of different layouts (mirror, stripe, stripe-mirror, Raid 5) and luns inherit the same layout. Luns are normally multipathed (with 2 or more paths).

Volumes are created from luns. Volumes can be created as concat, mirror, stripe-mirror, mirror-stripe, Raid 5, etc.

Filesystems have different block sizes; for vxfs, the block sizes are 1K to 8K.


    Storage Subsystem Metrics.

An IO request starts with the application issuing an IO system call (read, write).
Based on the current activity of the system, the request may be processed immediately or routed to a queue of requests (similar to a run queue on a CPU - the wait column in iostat).
It waits in the queue for a period of time until it can be dispatched (wait time - the wsvc_t column in iostat).
It then executes on the disk, taking time to complete (response time - the asvc_t column in iostat).

Corresponding to the above activities, there are also the size of the IO operation (computed from the iostat counters), the bandwidth (the kr/s + kw/s columns in iostat) and the number of IO operations (r/s + w/s in iostat).
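As a worked example with illustrative numbers (not from the slides): if iostat reports r/s = 100, w/s = 20, Mr/s = 50.0 and Mw/s = 2.0, then the load is 120 IOPS at 52 MB/sec, and the average IO size is (50.0 + 2.0) / (100 + 20), which is roughly 0.43 MB (about 440 KB) per operation.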


    Storage Subsystem Metrics (Contd.)

The common metrics used to describe storage performance are:

Wait - average number of transactions waiting to be serviced (similar to the run queue on a CPU).
Wait time - average time spent in the wait queue.
Service time - the time in milliseconds which a lun spends on servicing a request.
IOPS - number of IO operations per second.
IO sizes - the average size of an IO operation in KB or MB.
Throughput - the average bandwidth available in MB/sec.


    Storage Subsystem Tools

The following tools are used to capture storage statistics. Both run-time and historical data are very much essential.

Run-time data:

iostat - gives statistics at a lun level (service time, IOPS, IO sizes, throughput).
vxstat - gives statistics at a volume level (assuming we are using Veritas Volume Manager). Statistics available are again service time, IOPS and IO sizes.
vxdmpadm - gives statistics at a lun level (service time, IOPS and IO sizes).
odmstat - gives statistics for oracle datafiles (if using Veritas ODM).
swat (Sun StorEdge Workload Analysis Tool) - gives statistics at a lun level (service time, IOPS, throughput, IO sizes).
Oracle v$ views - historical and run-time data (not at a lun level).


    Storage Subsystem Tools (Contd.)

Historical data capture tools:

swat (Sun StorEdge Workload Analysis Tool) - gives statistics at a lun level (service time, IOPS, throughput, IO sizes).
sar - sar also captures disk stats.
Oracle v$ views - historical and run-time data (not at a lun level).


    Storage Subsystem Tools (Contd.)

Of the tools listed, only iostat, odmstat and swat can be run by a non-privileged user. Oracle v$ views can be viewed by anyone with appropriate oracle privileges.

Normally, luns assigned to a host are small and numerous, so using tools such as iostat is very cumbersome.
The most user-friendly tool is swat; it is ideal for collecting and analyzing data over the long term, since it collects and graphs the data for easy analysis.

Using iostat with the extended options (iostat -xnM) will give the most useful information. The columns to look for are wait, asvc_t, r/s, w/s, Mr/s and Mw/s.


Storage Subsystem Data Collection

mkrishna@viveka:> iostat -xnM 1

                    extended device statistics
 r/s  w/s  Mr/s  Mw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    1.2   0   0 c1t10d0
 0.6  1.6   0.0   0.0  0.0  0.0    0.0   19.2   0   1 c0t8d0
 0.0  0.1   0.0   0.0  0.0  0.0    0.0   11.2   0   0 c8t20d53

r/s (number of reads/second), w/s (number of writes/second), Mr/s (MB read/second) and Mw/s (MB written/second) are indicators of the workload.
wait - this is the run queue. It shows the pending operations waiting to be serviced. Normally, it should be 0.
wsvc_t - the wait time for the above statistic. Should be 0.
asvc_t - the average response time to a lun. It can vary anywhere from 1ms to 10ms. The average response time for a volume in a datawarehouse system should be 20-40ms; during heavy loads, volume response times can vary between 20-100ms.


    Storage Subsystem Data Collection (Contd.)

From an oracle perspective, the views which contain IO data are:

v$filestat - specific to oracle datafiles
v$sysstat - at the instance level
v$segstat - at the segment level
v$tempstat - temporary tablespace file stats
dba_hist_sysmetric_summary - data from the AWR snapshots

It is easy to write a SQL query which groups by mount point or filename and reports:

Number of IOPS
Response time
Throughput

(A sketch of such a query follows below.)

However, Oracle does not report statistics at a lun level. These need to come from the OS.
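A minimal sketch of such a query, joining v$filestat to v$datafile and grouping by the leading directory of the datafile path. The directory parsing, the column aliases and the centiseconds-to-milliseconds conversion are assumptions for illustration, not something the slides prescribe:

-- Per-directory IOPS and average response time from the cumulative v$filestat counters
SELECT SUBSTR(d.name, 1, INSTR(d.name, '/', -1) - 1)   AS mount_point,
       SUM(f.phyrds + f.phywrts)                        AS io_operations,
       ROUND(AVG(f.avgiotim) * 10, 1)                   AS avg_resp_ms
FROM   v$filestat f
       JOIN v$datafile d ON d.file# = f.file#
GROUP  BY SUBSTR(d.name, 1, INSTR(d.name, '/', -1) - 1)
ORDER  BY io_operations DESC;

Remember these are cumulative counters, so run the query before and after the load and work with the differences; throughput additionally needs the block counts (PHYBLKRD, PHYBLKWRT) multiplied by the database block size.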


    Storage Subsystem Data Collection (Contd.)

v$filestat - specific to oracle datafiles only (no redo or temp files).
It appears to be event driven and so should be accurate.
Since it is cumulative, one needs to take a snapshot of v$filestat before and after a load.
The relevant columns are:

PHYRDS + PHYWRTS = IOPS to the file.
AVGIOTIM - average response time of the file, in hundredths of a second; multiply by 10 to report it in ms. These timings can vary between 1 and 30 ms depending on the size of the file and the kind of activity. I would assume that 25 ms is about the maximum you should ever see.
MAXIORTM - maximum time spent on a single read, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest read ever on the file. Anything greater than 30 ms would raise alarms.
MAXIOWTM - maximum time spent on a single write, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest write ever on the file. Anything greater than 30 ms would raise alarms.

It is important to look at MAXIORTM and MAXIOWTM as these show the poorest performance for the datafile.
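One way to take that before/after snapshot is simply to copy the counters into a scratch table and diff them after the load; this is a hedged sketch and the table name filestat_before is a placeholder:

CREATE TABLE filestat_before AS
  SELECT file#, phyrds, phywrts, phyblkrd, phyblkwrt FROM v$filestat;

-- ... run the load under test, then compare ...

SELECT c.file#,
       c.phyrds  - b.phyrds                             AS reads_during_load,
       c.phywrts - b.phywrts                            AS writes_during_load,
       (c.phyrds + c.phywrts) - (b.phyrds + b.phywrts)  AS io_ops_during_load
FROM   v$filestat c
       JOIN filestat_before b ON b.file# = c.file#
ORDER  BY io_ops_during_load DESC;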


    Storage Subsystem Data Collection (Contd.)

v$sysstat - reports cumulative statistics at the instance level. Again, in order to understand the impact of a load, a snapshot of v$sysstat needs to be taken before and after the load.

    It appears to be event driven and so should be accurate.

physical read total IO requests + physical write total IO requests = IOPS

physical read total bytes + physical write total bytes = Throughput
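A minimal sketch, again meant to be run before and after the load so the deltas (divided by the elapsed seconds) give IOPS and throughput; the statistic names are the standard v$sysstat names quoted above:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('physical read total IO requests',
                'physical write total IO requests',
                'physical read total bytes',
                'physical write total bytes');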


    Storage Subsystem Data Collection (Contd.)

v$tempstat - cumulative temporary tablespace file stats. Again, in order to understand the impact of a load, a snapshot of v$tempstat needs to be taken before and after the load.

It appears to be event driven and so should be accurate.
The relevant columns are:

PHYRDS + PHYWRTS = IOPS to the file.
AVGIOTIM - average response time of the file, in hundredths of a second; multiply by 10 to report it in ms. These timings can vary between 1 and 30 ms depending on the size of the file and the kind of activity. I would assume that 25 ms is about the maximum you should ever see.
MAXIORTM - maximum time spent on a single read, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest read ever on the file. Anything greater than 30 ms would raise alarms.
MAXIOWTM - maximum time spent on a single write, in hundredths of a second; multiply by 10 to report it in ms. This shows the slowest write ever on the file. Anything greater than 30 ms would raise alarms.

It is important to look at MAXIORTM and MAXIOWTM as these show the poorest performance for the tempfile.
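A small sketch that joins v$tempstat to v$tempfile and flags temp files whose worst single read or write exceeds the 30 ms rule of thumb above; the join and the centiseconds-to-milliseconds conversion are assumptions:

SELECT t.name,
       s.maxiortm * 10 AS worst_read_ms,
       s.maxiowtm * 10 AS worst_write_ms
FROM   v$tempstat s
       JOIN v$tempfile t ON t.file# = s.file#
WHERE  s.maxiortm * 10 > 30
   OR  s.maxiowtm * 10 > 30;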


    Storage Subsystem Data Collection (Contd.)

dba_hist_sysmetric_summary - data from the AWR snapshots. The accuracy of the data in this table is debatable; I have noticed discrepancies in the data reported in this table.

    The data is reported by snapshot number.

Assume it is the average during the entire snapshot interval.

Physical Read Total IO Requests Per Sec + Physical Write Total IO Requests Per Sec = IOPS
Physical Read Total Bytes Per Sec + Physical Write Total Bytes Per Sec = Throughput
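A minimal sketch listing those four metrics per AWR snapshot; the AVERAGE column is the per-second average over the snapshot interval, and the rounding and ordering are cosmetic choices:

SELECT snap_id,
       metric_name,
       ROUND(average, 1) AS avg_per_sec
FROM   dba_hist_sysmetric_summary
WHERE  metric_name IN ('Physical Read Total IO Requests Per Sec',
                       'Physical Write Total IO Requests Per Sec',
                       'Physical Read Total Bytes Per Sec',
                       'Physical Write Total Bytes Per Sec')
ORDER  BY snap_id, metric_name;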


Storage Subsystem Data Collection (Contd.)

odmstat gives file-level IO details if Veritas ODM is enabled.

[oracle@viveka] $ /opt/VRTS/bin/odmstat *dbf
                        OPERATIONS      FILE BLOCKS     AVG TIME(ms)
 FILE NAME              READ   WRITE    READ    WRITE   READ   WRITE
 APD_01.dbf               36       3    1152       96    2.2     0.0
 ARD_10.dbf               31       9    1056      320    5.8     1.1

Operations refers to the number of IO operations (IOPS).
File Blocks refers to the size of the IO operations; it is reported in sectors (1 sector = 512 bytes).
Avg Time refers to the service time.
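As a worked example from the output above: APD_01.dbf saw 36 reads covering 1152 blocks, i.e. 1152 / 36 = 32 sectors per read, and 32 x 512 bytes = 16 KB per read on average.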


Storage Subsystem Data Correlation

Data (IOPS, service time, etc.) needs to be collected from both the OS and the database perspective and correlated.
Data needs to be analyzed over a period of time to profile the nature of the workload.


    Profile of a typical Datawarehouse system - Storage

It is very difficult to generalize access patterns and the storage profile for a Datawarehouse system. However, the following is probably a good guideline.

For an averagely used Enterprise Datawarehouse for a large company not in the retail industry, configured properly:

The IO profile would show a smaller number of IOPS, but large IO sizes. Typical rates would be in the range of 5K-10K IOPS during heavy usage; during normal hours, one can see around 2K-3K IOPS.
For a block size of 16K and db_file_multiblock_read_count set to 64, you can expect to see IO sizes from 16K up to 1MB and beyond (16K x 64 = 1MB for a full multiblock read). Luns used for redo logs will show significantly larger IO sizes during peak DML activity.
Large numbers of direct reads/writes and multiblock reads/writes.
High numbers of parallel operations and heavy PGA activity.
Average throughput would be in the range of 450-600MB/sec.

Heavy temporary tablespace usage.
Heavy redo log activity during periods of DML. It is important to note that redo log group members are generally small (512M to 1.5GB), so typically the Storage/Unix administrator will re-use luns when creating redo log filesystems. This is not good practice. You will probably notice that the redo log luns see the biggest IO sizes for write operations.


    Storage from an oracle perspective (Datawarehouse)

For datawarehousing, the critical components are:

Storage
Memory
CPU

    Storage plays a critical role in performance.

Do not skimp on storage. Plan ahead for 2 years and lay out the filesystems appropriately. It is normal to have 3-4x overhead for initial sizing.

Appropriate sizing and configuration are very important.

Most operations process huge amounts of data, resulting in considerable IO.

Datawarehousing is more dependent on throughput and less on response time.


Datawarehousing - Array

Cheaper modular arrays (such as the HDS AMS1000) work better for datawarehousing kinds of loads than high-end arrays.

Go for the fastest drives (15K RPM). Use 72GB drives instead of 146GB drives.

Since Oracle does read-ahead into the buffer cache, disable or minimize the read cache on the array. Try and assign the maximum possible cache for staging writes.
Avoid striping on the array (Raid 5, Raid 10). Array-based striping does not offer big stripe widths (1M or greater); most are limited to 384K (HDS). Stripe width refers to the width of the stripe on a single disk.

Go for disk mirroring (Raid 1), preferably 1D+1P as the Raid Group configuration.

    Use the entire Raid Group as a single lun.

    Share the luns across as many controllers as possible.


Datawarehousing - System

There is a lot of sorting/merging of IO requests happening at every layer (Volume Manager, HBA, Array Controller) to minimize head movement. To make the best use of it, ensure that 32M is set as the maximum size of a single IO request that can be passed down from the driver to the HBA.
Configure volumes as stripes with big stripe widths (1M or greater). Stripes can be used for all filesystems (redo, archive, data, temporary).
Use an even number of luns for creating volumes (2, 4, 6 or 8).
If using 146GB drives, then usable space is only about 90-100GB; do not exceed this.
Configure multipathing such that all the active paths to a lun are written to at the same time.
If using vxfs, set the block size to 8K (the maximum).


Datawarehousing - Oracle

Basics

    Veritas ODM is a must (async + direct i/o).

Make sure that all async IO patches specific to the platform are applied.
Do not re-use luns. That is, once a lun is used for a filesystem, it should not be used for any other requirement or other filesystems. There will be considerable wastage, but it is well worth it.
Do not intermix data, redo, archive and index files. Keep them on separate filesystems; this makes them easier to maintain and also to troubleshoot for performance.
Use an appropriate block size as required.

Redo configuration
Use 4 redo log groups with 2 members each. Place the redo logs on dedicated filesystems. Make sure the members are big (> 1500MB).
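For example, one such group could be created as below; the filesystem paths and the exact size are placeholders, not values from the slides (repeat for groups 1 through 4):

ALTER DATABASE ADD LOGFILE GROUP 1
  ('/redo_fs01/redo_g1_m1.log',    -- member 1 on its own dedicated filesystem
   '/redo_fs02/redo_g1_m2.log')    -- member 2 on a separate dedicated filesystem
  SIZE 2000M;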

Temporary Tablespaces
You can use either raw volumes or ODM-enabled datafiles.
Solid-state memory is best suited for temporary tablespaces. However, if it is not available, 72GB 15K RPM drives can be used.
Create temporary tablespace groups and assign the temporary tablespaces to groups accordingly. Hopefully oracle will use the temporary tablespaces without conflict.
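A minimal sketch of the grouping idea, with placeholder tablespace, group, user and file names:

CREATE TEMPORARY TABLESPACE temp01
  TEMPFILE '/temp_fs01/temp01_01.dbf' SIZE 10000M
  TABLESPACE GROUP temp_grp;          -- temp01 becomes a member of group temp_grp

ALTER USER dw_user TEMPORARY TABLESPACE temp_grp;   -- the user is pointed at the group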


Datawarehousing - Oracle (Contd.)

Tablespaces and datafiles
Look ahead for 2 years and plan as below. Beyond 2 years, offload old data to static instances.

Identify the number of tablespaces for a schema, and the number of datafiles for each tablespace (depending on the size of the objects and projected growth). If you know the growth, then pre-create the appropriate datafiles as required.

Oracle round-robins across datafiles when creating extents for objects, so the more datafiles available in a tablespace when creating an object, the better the striping of the extents will be.
Use fixed-size datafiles (10G or 20G). Do not enable auto-extend. Use uniform extent sizing. Disabling auto-extend and using uniform extent sizing reduces fragmentation and is a lot more efficient for the database, especially when doing updates/deletes.
Use multiples of the extent size to match the stripe width on the volume. Especially for big tables (> 15-20GB), use big extents such as 200M or higher as required.
Split the datafiles across all available filesystems.
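A hedged sketch of such a tablespace; the names, paths and sizes are placeholders, and UNIFORM SIZE gives the uniform extent sizing mentioned above:

CREATE TABLESPACE dw_data01
  DATAFILE '/data_fs01/dw_data01_01.dbf' SIZE 10G AUTOEXTEND OFF,
           '/data_fs02/dw_data01_02.dbf' SIZE 10G AUTOEXTEND OFF
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 200M;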


Datawarehousing - Oracle (Contd.)

Multiple block sizes
Multiple block sizes are a mixed bag, useful only when you know your data very well. On Solaris, the maximum block size is 32K. Enabling multiple block sizes requires you to set aside a portion of the SGA for the specific block size.
Writes -
Updates are a very costly operation with bigger block sizes.
Inserts using bulk loads will be very fast. Conventional loading depends on the memory set aside.

Reads -
Index scans would probably give good performance, but is retrieving 32K of data really necessary when you need only 8K?
Direct reads - I guess these would be fast as they bypass the buffer cache.

I do not know the impact of multiple block sizes on the undo tablespace.
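A minimal sketch of what enabling a second block size involves; the 512M cache size and all names are placeholders (DB_32K_CACHE_SIZE is the init parameter that sets aside SGA for 32K buffers):

ALTER SYSTEM SET db_32k_cache_size = 512M;   -- carve out SGA for the 32K buffer pool

CREATE TABLESPACE dw_data_32k
  DATAFILE '/data_fs03/dw_data_32k_01.dbf' SIZE 10G AUTOEXTEND OFF
  BLOCKSIZE 32K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 200M;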


    Oracle and Storage statistics

Help! I have set everything up as discussed. How do I know if Oracle is performing adequately? I do not have access to run privileged commands (vxstat etc.) and I want to see statistics from Oracle's perspective. I am not happy with odmstat. I want more data.
Oracle does collect IO statistics at all levels (object, datafile and instance). I assume most, if not all, are event driven. If they are event driven, then they are extremely accurate; if time sampled, then they are only indicators.
Event-driven statistics are the wait events: sequential reads, scattered reads, log archive IO, etc.
Oracle also captures the number of IOPS, throughput, physical reads, writes, response times, etc.

All this data is stored in the v$ views and the dba_hist tables. As to how accurate the numbers are, we can only guess. I have personally seen discrepancies in Oracle's reporting, so it is best to correlate with OS statistics.


    Conclusion

Get storage right the first time or the datawarehouse solution will fail.
Take the time to do a proper assessment, along with testing, before deploying the instance.

    Do not skimp on storage.

Storage is the most important component and the most easily forgotten too.

    Always correlate oracle statistics with OS statistics.


    Questions?