Post on 22-May-2020
Azure Learning Circles
Presented By:
Shane Creamer
shanec@Microsoft.com
Azure Management Session 1:
Logs, Diagnostics & Metrics
Typical Customer Narratives
• Most customers know how to operate on-premises, but now they need to understand the essentials of Azure management
• They want to leverage Azure native features
• They want to understand who is doing what and when
• They want to know how their systems perform and to collect diagnostics data in case of problems
• They want to be alerted when things don’t go as expected
Session Agenda
Azure Infrastructure as a Service
Azure Management generally includes
• Azure Audit Logs
• Azure Diagnostics
• Azure Alerts
• Analytics (extended feature sets of Azure that also incur additional cost), such as:
• Azure Network Watcher
• Azure Monitor
Overview – Source Types
Activity Logs: The Azure Activity Log provides insight into the operations that were performed on resources in your subscription. Glossary: Operational logs (ASM), Activity logs (ARM)
Diagnostics Logs: Azure Diagnostic Logs are logs emitted by a resource that provide rich, frequent data about the operation of that resource
Metrics: Azure Insight data that lets you gain visibility into the performance and health of your workloads on Azure.
Alerts: Alerts are a method of monitoring Azure resource metrics, events or logs and then being notified when a condition you specify is met.
Diagnostics: Single Server vs. the Cloud
Single Server
Static Environment
• Single well-known instance
• Traceable local transactions
Local Access is Feasible
• All in one TS session
• Data & tools co-located
• In-Place Changes
Cloud
Dynamic Environment
• Multi-instance, elastic capacity
• Distributed workloads
Local Access is Infeasible
• Many instances in different nodes
• Distributed, scaled-out data
• Service Upgrades
Diagnostics
• SDK component providing distributed monitoring & data collection for cloud apps
• Support Standard Diagnostics APIs
• Trace, Debug normally
• Cloud Friendly
• Manage multiple role instances centrally
• Scalable
• Choose what to collect & when to collect it
• Event Logs, Trace/Debug, Performance Counters, IIS Logs, Crash Dumps, Arbitrary log files
• Everything is remotely configurable
How does it work (in a nutshell)?
• Role Instance Starts
• Diagnostic Monitor Starts
• Monitor is configured
• Imperatively at Start time
• Remotely any time
• Monitor buffers data locally
• User can set a quota (FIFO)
• User initiates transfer to storage
• Scheduled or On Demand
[Diagram: Role, Role Instance (VM), Diagnostic Monitor, local directory storage, remote configuration, poll interval]
Windows Azure Diagnostics
[Diagram: Role, Role Instance, Diagnostic Monitor (configuration, quota enforcement), local directory storage, data collection (traces, logs, crash dumps); Windows data sources: IIS Logs & Failed Request Logs, Perf Counters, Windows Event Logs]
Azure Network Watcher*
Microsoft Azure Diagnostics: Collection Process
1. Role Instance starts
2. Diagnostic Agent starts
3. Monitor is configured
   • Configuration & Extension
4. Monitor buffers data locally
   • Users can set a quota to limit log sizes
   • Uses First In / First Out (FIFO)
5. Data is transferred from the local buffer to Azure Storage (Blob Storage, Table Storage)
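The buffering and quota steps above can be sketched in a few lines of Python. This is only an illustration of FIFO quota enforcement, not the Diagnostic Monitor's actual implementation; the class name `LocalDiagnosticBuffer` and its methods are hypothetical.

```python
from collections import deque


class LocalDiagnosticBuffer:
    """Hypothetical stand-in for the monitor's local buffer (not an Azure API)."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self._entries = deque()
        self._size = 0

    def append(self, entry):
        """Buffer one log entry, evicting the oldest entries once over quota."""
        self._entries.append(entry)
        self._size += len(entry.encode("utf-8"))
        # FIFO: the first entries written are the first ones dropped.
        while self._size > self.quota_bytes and self._entries:
            oldest = self._entries.popleft()
            self._size -= len(oldest.encode("utf-8"))

    def transfer(self):
        """Return the buffered entries (as if sent to storage) and empty the buffer."""
        batch = list(self._entries)
        self._entries.clear()
        self._size = 0
        return batch
```

With a 10-byte quota, appending three 4-byte entries drops the first one, so a transfer returns only the two newest entries.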
Microsoft Azure Diagnostics: Data Types
• IIS Logs: Information about IIS web sites.
• Azure infrastructure logs: Information about the Diagnostics agent and other processes involved in data collection.
• IIS Failed Request logs: Information about failed requests to an IIS site or application.
• Windows Event logs: Information sent to the Windows event logging system.
• Performance counters: Operating system and custom performance counters.
• Crash dumps: Information about the state of the process in the event of an application crash.
• Custom error logs: Logs created by your application or service.
• .NET Event Source: Events generated by your code using the .NET EventSource class.
• Manifest based ETW: ETW events generated by any process.
Microsoft Azure Diagnostics: Data Location
Diagnostics Data | Storage | Container Location
Application Logs | Table | WADLogsTable
Event Logs | Table | WADWindowsEventLogsTable
Performance Counters | Table | WADPerformanceCountersTable
Infrastructure Logs | Table | WADDiagnosticInfrastructureLogsTable
Directories monitored by diagnostic monitor | Table | WADDirectoriesTable
IIS Logs | Blob | wad-iis-logfiles
IIS Failed Request Log Files | Blob | wad-iis-failedreqlogfiles
Crash dumps | Blob | wad-crash-dumps
Custom Log Files | Blob | Custom Blob Container
Azure Storage Explorer is very useful here
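When scripting against the storage account, the locations listed on this slide can be kept in a small lookup. The sketch below only restates the slide's table as a Python dictionary; the helper name `where_is` is hypothetical, and custom log files are left out because their container name is user-defined.

```python
# Storage kind and table/container name for each diagnostics data type,
# copied from the Data Location slide.
WAD_LOCATIONS = {
    "Application Logs": ("Table", "WADLogsTable"),
    "Event Logs": ("Table", "WADWindowsEventLogsTable"),
    "Performance Counters": ("Table", "WADPerformanceCountersTable"),
    "Infrastructure Logs": ("Table", "WADDiagnosticInfrastructureLogsTable"),
    "Directories monitored by diagnostic monitor": ("Table", "WADDirectoriesTable"),
    "IIS Logs": ("Blob", "wad-iis-logfiles"),
    "IIS Failed Request Log Files": ("Blob", "wad-iis-failedreqlogfiles"),
    "Crash dumps": ("Blob", "wad-crash-dumps"),
}


def where_is(data_type):
    """Return (storage kind, name) for a data type; raises KeyError if unknown."""
    return WAD_LOCATIONS[data_type]
```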
VM connection issues due to RDP security changes
Scenario affecting dormant-use VMs
In March of 2018 Microsoft released a CredSSP Encryption remediation for RDP.
• This update prevents your updated on-premises workstations from connecting to any cloud VMs that have not been updated (dormant or test VMs)
• There are four approaches to mitigate this:
1. Change the Encryption Oracle Remediation policy to Enabled, and Protection Level to Vulnerable. This requires a reboot of your local workstation to take effect, which means losing a lot of work progress, so it is not recommended.
2. If you have OMS or SCCM agents running on the VMs, consider pushing an update script that forces them to download and install the update and then reboot, which solves the issue.
3. Use the Azure Serial Console to force the VM to install the latest update. Unwieldy, but it allows PowerShell scripts.
4. Configure your local workstation under System\Remote: deselect “Allow connections only from computers running Remote Desktop with Network Level Authentication” and apply. This is the quickest approach.
The fastest workaround
Configure your local workstation under System\Remote: deselect “Allow connections only from computers running Remote Desktop with Network Level Authentication” and apply. This is the quickest approach.
Uncheck and Apply
REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System\CredSSP\Parameters /v AllowEncryptionOracle /t REG_DWORD /d 2
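The value 2 in the registry command corresponds to the "Vulnerable" protection level. As a rough sketch, the helper below builds the same command for a named level; it assumes the usual mapping of Encryption Oracle Remediation levels to `AllowEncryptionOracle` values (Force Updated Clients = 0, Mitigated = 1, Vulnerable = 2), and the function name `reg_add_command` is hypothetical. It only builds the argument list and does not touch the registry.

```python
# Assumed mapping of policy protection levels to AllowEncryptionOracle values.
PROTECTION_LEVELS = {
    "Force Updated Clients": 0,
    "Mitigated": 1,
    "Vulnerable": 2,
}

CREDSSP_KEY = (
    r"HKLM\Software\Microsoft\Windows\CurrentVersion"
    r"\Policies\System\CredSSP\Parameters"
)


def reg_add_command(level):
    """Build the REG ADD argument list for a protection level (does not run it)."""
    value = PROTECTION_LEVELS[level]  # KeyError for an unknown level name
    return [
        "REG", "ADD", CREDSSP_KEY,
        "/v", "AllowEncryptionOracle",
        "/t", "REG_DWORD",
        "/d", str(value),
    ]


if __name__ == "__main__":
    # Show the command for the workaround level used on the slide.
    print(" ".join(reg_add_command("Vulnerable")))
```

Setting the level back to "Mitigated" or "Force Updated Clients" once the VMs are patched uses the same command with a different value.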
An optimal Long Term Approach
Allow Microsoft to enable Update Management on your VMs. This can help prevent these kinds of scenarios, especially for test VMs, which can lie dormant and be affected when there is a significant Microsoft security pivot.
Alerts
• Alerts are a method of monitoring Azure resource metrics, events or logs and then being notified when a condition you specify is met.
• You can receive alerts based on:
• Metric values - The alert triggers when the value of a specified metric crosses a threshold you assign in either direction
• Activity log events - An alert can trigger on every event, or only when a certain number of events occur
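The two alert conditions above can be illustrated with a minimal sketch. This is not how Azure evaluates alert rules internally; the function names and the "above"/"below" direction labels are hypothetical and serve only to show the logic.

```python
def metric_alert_fires(value, threshold, direction):
    """Metric alert: fires when the value is past the threshold in the chosen direction."""
    if direction == "above":
        return value > threshold
    if direction == "below":
        return value < threshold
    raise ValueError("direction must be 'above' or 'below'")


def activity_alert_fires(matching_events, min_count=1):
    """Activity log alert: fires on every event (min_count=1) or after N events."""
    return matching_events >= min_count
```

For example, a CPU metric of 92 against an "above 80" rule fires, while two matching activity log events do not fire a rule that requires five.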
Other Features worth exploring
• Azure Monitor: Provides a single source for monitoring Azure resources. You can visualize, query, route, archive, and take action on the metrics and logs coming from your Azure resources
• Microsoft Azure Log Integration
• Enables you to integrate raw logs from your Azure resources into your on-premises Security Information and Event Management (SIEM) systems
• Current support: Azure Audit Logs, Virtual machine logs and Azure Security Center alerts
• Integration of the Activity Log in OMS: Gives you insight into activities happening in your Azure environment across multiple subscriptions
Metrics
• Azure provides telemetry (performance counters) as metrics for monitoring and troubleshooting your services in the cloud
• Supported Services:
• VMs (Azure Resource Manager based), VM Scale Sets, Batch
• Event Hub namespace, Service Bus namespace (premium SKU only)
• SQL (version 12), Elastic SQL Pool
• Web Sites, Web Server Farms, Logic Apps, IoT Hubs, Redis Cache
• Networking: Application Gateways
• Search
Metrics
• Metrics have the following characteristics:
• All metrics have 1-minute frequency. You receive a metric value every minute from your resource, giving you near real-time visibility into the state and health of your resource.
• Metrics are available out-of-the-box without needing to opt in or set up additional diagnostics*
• You can access 30 days of history for each metric. You can quickly look at the recent and monthly trends in the performance or health of your resource
Metrics - Actions
• View all metrics via the Azure portal
• Alert rules based on metrics
• Autoscale resources.
• Archive metrics for longer or use them for offline reporting.
• Stream metrics to an Event Hub.
• Route all metrics to OMS
Metrics – Storage Metrics
• Azure Storage Metrics allows you to track your aggregated storage usage for Blobs, Tables and Queues
• The metrics data can be categorized as:
• Capacity: Provides information regarding the storage capacity consumed for the blob service, the number of containers and total number of objects stored by the service
• Requests: Provides summary information of requests executed against the service. It provides total number of requests, average E2E latency etc.
Metrics – Storage Metrics
• Storage metrics are stored in hidden tables and can be retrieved using the Query Tables and Query Entities APIs.
• You have the ability to query and/or delete records in the tables. Note that the tables themselves cannot be deleted.
• Capacity information:
• $MetricsCapacityBlob table to store capacity information for blobs
• Transaction metrics:
• $MetricsTransactionsBlob:
• $MetricsTransactionsTable:
• $MetricsTransactionsQueue:
• This information is available for all the services – Blobs, Tables and Queues.
Activity Logs
Activity Log – First Look
• Get this view by clicking on the Dashboard\Activity Log button
• This view has saveable queries that can be pinned to the dashboard
• This view also allows us to select Metrics, Diagnostic Logs, Searches, and manage Alerts
Activity Logs
• Using the Activity Log, you can determine the ‘what, who, and when’ for any write (PUT, POST, DELETE) operations taken on the resources in your subscription
• You can also understand the status of the operation and other relevant properties.
• The Activity Log does not include read (GET) operations
• It reports control-plane events for your subscriptions
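As a small illustration of the rule above, the check below classifies an HTTP method by whether the slide says it is recorded. It covers only the verbs the slide lists (PUT, POST, DELETE as writes; GET as a read); the function name is hypothetical.

```python
# Write verbs named on the slide; read (GET) operations are not recorded.
WRITE_METHODS = {"PUT", "POST", "DELETE"}


def appears_in_activity_log(http_method):
    """Return True if a control-plane call with this verb would be recorded."""
    return http_method.upper() in WRITE_METHODS
```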
Activity Logs - Actions
Here are some of the things you can do with the Activity Log:
• Query and view it in the Azure portal.
• Query it via REST API, PowerShell Cmdlet, or CLI.
• Create an email or webhook alert that triggers off an Activity Log event.
• Export Activity Log
• to a Storage Account for archival or manual inspection
• to an Event Hub for ingestion by third-party services
• Analyze it in PowerBI using the PowerBI content pack.
Activity Logs - Archiving
Export the Activity Log with Log Profiles – 90 Days default storage
• A Log Profile controls how your Activity Log is exported. Using a Log Profile, you can configure:
• Where the Activity Log should be sent (Storage Account or Event Hubs)
• Which event categories (e.g. Write, Delete, Action) should be sent
• Which regions (locations) should be exported
• How long the Activity Log should be retained in a Storage Account
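The four settings a Log Profile controls can be pictured as one small configuration document. The sketch below assembles such a document in Python; the property names (`storageAccountId`, `serviceBusRuleId`, `categories`, `locations`, `retentionPolicy`) are an assumption about the log profile resource shape and should be checked against the current Azure Monitor reference, and `build_log_profile` is a hypothetical helper.

```python
def build_log_profile(storage_account_id, categories, locations,
                      retention_days, service_bus_rule_id=None):
    """Assemble a log-profile-style settings document (field names are assumed)."""
    properties = {
        # Where the Activity Log should be sent.
        "storageAccountId": storage_account_id,
        # Which event categories should be sent.
        "categories": list(categories),
        # Which regions (locations) should be exported.
        "locations": list(locations),
        # How long the Activity Log is retained in the Storage Account.
        "retentionPolicy": {
            "enabled": retention_days > 0,
            "days": retention_days,
        },
    }
    if service_bus_rule_id is not None:
        # Optional Event Hubs destination.
        properties["serviceBusRuleId"] = service_bus_rule_id
    return {"properties": properties}
```

A call such as `build_log_profile("<storage-account-resource-id>", ["Write", "Delete", "Action"], ["westus"], 90)` yields a document that exports those three categories from one region with 90 days of retention.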
Activity Logs – Storage Analytics
• Storage Analytics logs detailed information about successful and failed requests to a storage service (Minimal or Verbose)
• You must enable it individually for each service you want to monitor (Blobs, Tables, Queues)
• Log entries are created only if there are requests made against the service endpoint
• Stored in block blobs in a container named $logs: http://<accountname>.blob.core.windows.net/$logs
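To locate the log blobs, the container URL from the slide can be built per account. The sketch below also adds an hourly prefix of the form `<service>/YYYY/MM/DD/hh00/`, which is an assumption about the Storage Analytics blob naming convention rather than something stated on the slide; the function names and the account name in the usage note are hypothetical.

```python
from datetime import datetime


def logs_container_url(account_name):
    """URL of the $logs container, following the pattern shown on the slide."""
    return f"http://{account_name}.blob.core.windows.net/$logs"


def hourly_log_prefix(account_name, service, when):
    """Prefix for one service's logs in a given hour (assumed naming convention)."""
    # Minutes are written as 00 because logs are grouped by hour.
    return f"{logs_container_url(account_name)}/{service}/{when:%Y/%m/%d/%H}00/"
```

For an account named `myaccount`, `hourly_log_prefix("myaccount", "blob", datetime(2020, 5, 22, 18, 30))` gives `http://myaccount.blob.core.windows.net/$logs/blob/2020/05/22/1800/`.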
Analytics – Initial Configuration
• The management blades allow us to choose the subscription, workspace, and pricing tier we want to use
Activity Metrics – Selection panel
• This allows us to configure what default dashboard data we wish to see
Limitation of Dashboard metrics
▪ The dashboard can only display counters that have the same sample rate. Because CPU is sampled at a different rate than the disks, notice that the disk counters are greyed out.
▪ While selectable, there is typically a 1-minute delta between real-time and graphed data in the management snap-in
Diagnostic Logs
• Provides frequent data about the operation of the resource
• The content of these logs varies by resource type:
• Windows event system logs are one category of Diagnostic Log for VMs
• Blob, table, and queue logs are categories of Diagnostic Logs for storage accounts
• Not all resources support the new type of Diagnostic Logs
• Available: Azure LB, NSG, AppGW, KeyVault, Azure Search, Data Lake
• vNET GW (PS) : Start-AzureVNetGatewayDiagnostics
Diagnostics Logs - Actions
• Here are some of the things you can do with Diagnostic Logs:
• Save them to a Storage Account for auditing or manual inspection.
• Stream them to Event Hubs for ingestion by a third-party service or custom analytics solution such as PowerBI.
• Analyze them with OMS Log Analytics
Diagnostics Logs - Enabling
• Diagnostic Logs for non-Compute resources are configured using Diagnostic Settings.
• Diagnostic Settings for a resource control:
• Where Diagnostic Logs are sent (Storage Account, Event Hubs, and/or OMS Log Analytics).
• Which Log Categories are sent.
• How long each log category should be retained in a Storage Account – a retention of zero days means that logs are kept forever
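The retention rule above, where zero days means "keep forever", is easy to get backwards, so here is a minimal sketch of how it reads. The function name `expiry_date` is hypothetical and the code only illustrates the rule; it does not call any Azure API.

```python
from datetime import date, timedelta


def expiry_date(log_date, retention_days):
    """Date after which a log may be removed, or None if it is kept forever."""
    if retention_days == 0:
        # A retention of zero days means the logs are kept forever.
        return None
    return log_date + timedelta(days=retention_days)
```

A log written on 22 May 2020 with 30 days of retention expires on 21 June 2020; with a retention of 0 it never expires.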
Diagnostics Logs - VMs
• Diagnostic Logs for VMs are enabled using Diagnostic Settings or PowerShell
• This deploys a VM extension (add-on): Windows: Microsoft.Azure.Diagnostics.IaaSDiagnostics; Linux: Microsoft.OSTCExtensions.LinuxDiagnostic
• Metrics, Events, Boot Diagnostic can be enabled
• Configuration is based on specific schema, Azure provides a default config file
• You can provide your own configuration file and enable diagnostics during or after deployment on both ASM and ARM VMs
Diagnostics Logs – Config File
Data Source | Description
IIS Logs | Information about IIS websites.
Azure Diagnostic infrastructure logs | Information about Azure Diagnostics.
IIS Failed Request logs | Information about failed requests to an IIS website or application.
Windows Event logs | Information sent to the Windows event logging system.
Performance counters | Operating system and custom performance counters.
Crash dumps | Information about the state of the process in the event of an application crash.
Custom error logs | Logs created by your application or service.
.NET EventSource | Events generated by your code by using the .NET EventSource class.
Manifest-based ETW | Event Tracing for Windows events generated by any process.
Diagnostics Logs – VMs/Tables
• The WADWindowsEventLogsTable table stores the enabled event logs from the VM
• WADMetrics tables store the metrics (performance data) from the VM, with the following naming convention:
• WADMetrics: Standard prefix for all WADMetrics tables
• PT1H or PT1M: Signifies that the table contains aggregate data over 1 hour or 1 minute
• P10D: Signifies that the table will contain data for 10 days from when the table started collecting data
• V2S: String constant
• yyyymmdd: The date on which the table started collecting data
• *LADMetrics for Linux
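The naming convention above can be decoded mechanically. The sketch below parses a table name such as `WADMetricsPT1HP10DV2S20151108` into its parts; it assumes the parts appear in exactly the order listed (prefix, aggregation, P10D, V2S, date), and the function name is hypothetical.

```python
import re
from datetime import datetime

# Prefix, aggregation period, 10-day window, string constant, start date.
_WADMETRICS_NAME = re.compile(r"^WADMetrics(PT1H|PT1M)P10DV2S(\d{8})$")


def parse_wadmetrics_table(name):
    """Split a WADMetrics table name into aggregation period and start date."""
    match = _WADMETRICS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a WADMetrics table name: {name!r}")
    grain, yyyymmdd = match.groups()
    return {
        "aggregation": "1 hour" if grain == "PT1H" else "1 minute",
        "window_days": 10,
        "start_date": datetime.strptime(yyyymmdd, "%Y%m%d").date(),
    }
```

For the example name, the result is hourly aggregation over a 10-day window starting on 8 November 2015.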
Diagnostics Logs – Storage analytics
• Storage Analytics logs detailed information about successful and failed requests to a storage service
• You must enable it individually for each service you want to monitor
• Log entries are created only if there are requests made against the service endpoint
• Stored in block blobs in a container named $logs: http://<accountname>.blob.core.windows.net/$logs
Diagnostics Logs – Storage analytics
• The following types of authenticated requests are logged:
• Successful requests
• Failed requests, including timeout, throttling, network, authorization, and other errors
• Requests using a Shared Access Signature (SAS), including failed and successful requests
• Requests to analytics data
Azure Essentials
Questions & Discussion