Post on 27-May-2020
Ravi Kishore Valeti
Lead Member of Technical Staff
rvaleti@salesforce.com
Operationalizing Big Data as a Service
Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any
of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking
statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or
service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for
future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer
contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible
mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our
employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com
products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most
recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information
section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not
be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available.
Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Agenda
● What is a Service?
● Big-data As a Service (BDaaS)
● Operational Challenges
● Operational Excellence
Service = Security + HA + DR + Monitoring
Big-data As a Service (BDaaS)
BDaaS = (Security + Multi-tenancy + HA + DR + Monitoring) at scale
Security using Kerberos
● Third-party authentication service
● Provides authentication; authorization is enforced separately (for example, through service ACLs)
● Authenticates User to Application and Application to
Application
● Each Cluster should be configured with multiple KDC
servers in Master/Slave Mode for HA
BDaaS = SECURITY + Multi-tenancy + HA + DR + MONITORING at Scale
Multi-Tenancy
● Security
● Resource Management
● Quota Management
● Permissions and ACLs
Noisy neighbors: multi-tenant
Hadoop clusters
Multi-Tenancy - Security
● Authentication
● Authorization
● Auditing
● Data Protection - Data at-rest and in-motion Encryption
Multi-Tenancy - Resource Management
● YARN Scheduler Framework
● Dynamic Partitioning
○ Regulate each user's access to their allocation/pool of resources
○ Queue configurations per user
● Static Partitioning
○ Use cgroups to partition resources statically, e.g. HBase 50% and YARN 50%
● ACLs
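The static and dynamic partitioning ideas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not YARN or cgroups configuration: the node size, the queue names (`etl`, `adhoc`), and the queue percentages are hypothetical, and the 50/50 split simply mirrors the HBase/YARN example on the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeResources:
    """Resources of one worker node (hypothetical sizes)."""
    vcores: int
    memory_mb: int


def partition(total: NodeResources, shares: Dict[str, int]) -> Dict[str, NodeResources]:
    """Split resources by fixed percentage shares that must add up to 100."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100%")
    return {
        name: NodeResources(
            vcores=total.vcores * pct // 100,
            memory_mb=total.memory_mb * pct // 100,
        )
        for name, pct in shares.items()
    }


# Static partitioning: a fixed split of the node between services.
node = NodeResources(vcores=40, memory_mb=204800)
split = partition(node, {"hbase": 50, "yarn": 50})

# Dynamic partitioning: the YARN share is divided among per-user queues
# (hypothetical queue names and capacities).
queues = partition(split["yarn"], {"etl": 60, "adhoc": 40})
```

With these assumed numbers, each service gets 20 vcores, and the `etl` queue ends up with 12 of YARN's 20 vcores. Rejecting shares that do not add up to 100% catches a common misconfiguration early.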
Multi-Tenancy - Quota Management
Multi-Tenancy - Permissions and ACLs
HA - High Availability
● HA for all Services in the stack
○ Namenode, RM, JHS
○ Hive, Hue, HBase
○ Spark Master
● Fault Tolerance & Mean time to
Recover
● Multi-Rack architecture & Services are
Rack aware
● Continuous synthetic (Synth) tests
● Rolling restarts whenever possible
DR - Disaster Recovery
● Namenode Metadata Backups
● Namenode Snapshots
● Hive Metadata Backups
● HBase Backups
○ Configure Replication to a Buddy Cluster
○ Daily/Continuous Backups using Snapshots/WAL
● Switch to DR site when ready
DR - Disaster Recovery
Site Switching Checklist
● All of the last known checkpointed data is available at the DR site
● For HBase, make sure the HBase Replication queue is empty, i.e. everything has been replicated to DR
● Make sure data checksums (fsck) & Synth tests pass
● Enable traffic to DR site
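The site switching checklist can be treated as a gate: traffic is enabled only when every check passes. A minimal Python sketch follows, assuming the cluster state is available as a simple dictionary. The `state` keys and check names are hypothetical placeholders; real checks would query HDFS, HBase, and the synthetic test harness.

```python
from typing import Callable, Dict, List, Tuple


def ready_to_switch(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every check and return (all_passed, names_of_failed_checks)."""
    failed = [name for name, check in checks.items() if not check()]
    return (not failed, failed)


# Hypothetical snapshot of DR-site state; stands in for real probes.
state = {
    "checkpoint_available": True,
    "replication_queue_size": 0,
    "fsck_healthy": True,
    "synth_tests_passed": True,
}

checks = {
    "checkpointed data available at DR site": lambda: state["checkpoint_available"],
    "HBase replication queue empty": lambda: state["replication_queue_size"] == 0,
    "fsck reports healthy": lambda: state["fsck_healthy"],
    "synthetic tests pass": lambda: state["synth_tests_passed"],
}

ok, failed = ready_to_switch(checks)
if ok:
    print("All checks passed: enable traffic to DR site")
else:
    print("Do not switch; failed checks:", failed)
```

Returning the list of failed checks, rather than a bare boolean, tells the operator what still blocks the switch.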
Monitoring & Alerting
● Centralized Visualization & Alerting
● Monitor User Quotas
● Monitor Resource Utilizations by jobs - Memory/CPU
● Should be a mix of Logs & Metrics
● Should be extensible to on-board the monitoring needs of newly added
services
● Ability to quickly incorporate new rules to alert on newly observed
issues
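One way to keep alerting extensible is to store rules as data, so that a new rule is a one-line registration rather than a code change in the evaluator. The sketch below is a hypothetical in-memory illustration in Python; the metric names and the 90%/80% limits are assumptions, not values from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class AlertRule:
    name: str
    fires: Callable[[Metrics], bool]  # returns True when the alert should fire


RULES: List[AlertRule] = []


def register_rule(name: str, fires: Callable[[Metrics], bool]) -> None:
    """Add a rule; newly observed issues become new registrations."""
    RULES.append(AlertRule(name, fires))


def evaluate(metrics: Metrics) -> List[str]:
    """Return the names of all rules that fire for this metrics sample."""
    return [rule.name for rule in RULES if rule.fires(metrics)]


# Hypothetical rules for user quotas and job resource utilization.
register_rule(
    "user quota above 90%",
    lambda m: m["quota_used_bytes"] / m["quota_limit_bytes"] > 0.9,
)
register_rule(
    "job memory above 80% of container limit",
    lambda m: m["job_memory_mb"] / m["container_limit_mb"] > 0.8,
)

sample = {
    "quota_used_bytes": 950.0,
    "quota_limit_bytes": 1000.0,
    "job_memory_mb": 2048.0,
    "container_limit_mb": 4096.0,
}
fired = evaluate(sample)
```

For the sample above, only the quota rule fires, since the job uses half of its container memory.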
Monitoring & Alerting
Metrics include, but are not limited to
Operational Challenges
● Zero down time
● Mean time to recover from failures
● Optimum utilization of resources
● Capacity Planning
● On-Demand capacity adds/removals
Operational Excellence - Shipping bits
● Maintain “Light” forks for the key services that you run
● Choose an appropriate packaging model, e.g. Bigtop
● Make sure your production services are as close as possible to
stable versions in open source
Operational Excellence - Shipping bits
● Continuous Integration & Deployment pipeline!
● Almost Zero DownTime* - Rack by Rack Rolling Upgrades
○ Block placement policy - All replicas on different racks** can mitigate the
risk of Service disruptions during Rack by Rack Rolling upgrades
● Auto-Restart bots#
* - except some planned major upgrades where downtime might be inevitable!
# - Caution! May cause more damage than healing if not configured properly
** Faster network links are usually preferred to make sure service SLAs are not breached due to
this special block placement policy. Extensive performance testing might be required.
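The effect of the block placement policy on rack by rack upgrades can be checked with a small simulation. This Python sketch assumes a hypothetical map from block IDs to the racks holding their replicas and asks whether every block keeps a minimum number of live replicas while any single rack is down.

```python
from typing import Dict, List


def live_replicas(block_racks: Dict[str, List[str]], rack_down: str) -> Dict[str, int]:
    """Count the replicas of each block that stay reachable while one rack is down."""
    return {
        block: sum(1 for rack in racks if rack != rack_down)
        for block, racks in block_racks.items()
    }


def rolling_upgrade_is_safe(
    block_racks: Dict[str, List[str]], racks: List[str], min_replicas: int
) -> bool:
    """True if every block keeps at least min_replicas as each rack goes down in turn."""
    for rack_down in racks:
        if min(live_replicas(block_racks, rack_down).values()) < min_replicas:
            return False
    return True


racks = ["r1", "r2", "r3"]

# Hypothetical placement with all replicas on different racks.
spread = {"blk_1": ["r1", "r2", "r3"], "blk_2": ["r2", "r3", "r1"]}

# Hypothetical placement where two replicas of a block share a rack.
shared = {"blk_1": ["r1", "r2", "r2"], "blk_2": ["r3", "r3", "r1"]}

print(rolling_upgrade_is_safe(spread, racks, min_replicas=2))  # True
print(rolling_upgrade_is_safe(shared, racks, min_replicas=2))  # False
```

With replicas spread across racks, taking one rack offline removes at most one replica per block. When two replicas share a rack, upgrading that rack leaves the block with a single live copy.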
Operational Excellence - Tuning
● Always keep an eye on the user resource requirements vs reality
○ Update User Quotas/resource configurations based on actual usage
● Automated Daily reports on important events/metrics
● Dynamic Thresholds for Alerting & continuous tuning to make the
alerts meaningful & non-noisy.
● Performance testing & configuration tuning of all services
● Choosing the right GC settings
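As one possible reading of "dynamic thresholds", the alert limit can be derived from recent history instead of being a fixed number. The Python sketch below uses mean plus k standard deviations over a window of samples; this particular formula, the value k = 3, and the sample values are assumptions for illustration, not the method used in the talk.

```python
from statistics import mean, pstdev
from typing import Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Threshold that adapts to recent behavior: mean + k * standard deviation."""
    return mean(history) + k * pstdev(history)


def should_alert(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Alert only when the current value exceeds the adaptive threshold."""
    return current > dynamic_threshold(history, k)


# Hypothetical recent samples of a metric, e.g. request latency in ms.
history = [100.0, 102.0, 98.0, 101.0, 99.0]

print(should_alert(history, 104.0))  # False: within normal variation
print(should_alert(history, 110.0))  # True: well above recent behavior
```

Tuning `k` and the window length is part of the continuous tuning the slide mentions: a smaller `k` catches more anomalies but produces noisier alerts.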
Thank you