Post on 15-Jul-2015
Introductions
www.techferry.com /TechFerry /@techferry
Deepansh Malik, CEO at TechFerry
@DeepanshMalik
https://in.linkedin.com/in/deepanshmalik
TechFerry: Analytics, IT Innovation, R&D Company
Specialization in
o Growth Analytics
o HealthCare Analytics
o Massively Scalable Applications and Rich UI
Massively Scalable Applications
Benchmark: 1 Million TRX per second
1 Million Requests per second
1 Million Messages per second
1 Million DB Transactions per second
1 Million/sec = 1 Billion TRX in about 17 minutes
1 Million/sec = 86.4 Billion TRX a day
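The two figures above follow from simple arithmetic. A minimal sketch that re-derives them, assuming a steady rate of exactly 1 million transactions per second (the variable names are illustrative only):

```python
# Back-of-the-envelope check of the benchmark figures on this slide.
RATE_PER_SEC = 1_000_000  # 1 million transactions per second (from the slide)

seconds_to_one_billion = 1_000_000_000 / RATE_PER_SEC  # 1000 seconds
minutes_to_one_billion = seconds_to_one_billion / 60    # roughly 16.7 minutes
trx_per_day = RATE_PER_SEC * 24 * 60 * 60               # seconds in a day times the rate

print(f"{minutes_to_one_billion:.1f} minutes to reach 1 billion TRX")
print(f"{trx_per_day / 1e9:.1f} billion TRX per day")
```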
Scale out or Scale up?
Scale out -> Add more hardware.
1 CPU Core = 1000 requests/sec
To massively scale (1 Million requests/second), we need 1000 cores:
50 machines with 20 cores each.
Good idea or stupid idea? Costs??
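The sizing above can be written as a small calculation. This is only a sketch: the function name `machines_needed` is hypothetical, and it assumes throughput scales linearly with cores, which real systems rarely achieve.

```python
import math

def machines_needed(target_rps: int, rps_per_core: int, cores_per_machine: int) -> tuple[int, int]:
    """Return (cores, machines) required to reach target_rps by scaling out."""
    cores = math.ceil(target_rps / rps_per_core)
    machines = math.ceil(cores / cores_per_machine)
    return cores, machines

# Figures from the slide: 1M requests/sec, 1000 requests/sec per core, 20 cores per machine.
cores, machines = machines_needed(1_000_000, 1_000, 20)
print(cores, machines)
```

If one core could handle more requests per second, the machine count would drop in proportion, which is the cost argument behind the question on this slide.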
Scale up?
Can one machine scale to a million transactions per second?
The Answer is YES.
Our commodity hardware is very powerful.
What is the bottleneck then? What do we need in order to save the tons of
money wasted on scaling out?
Computing Spectrum
Symmetric Multi Processing
A single problem or a single task (e.g. a DB query) takes
2 milliseconds on one core.
Can I use two cores and complete this single task in 1 ms?
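One way to reason about this question is Amdahl's law, which the slides do not name; it is brought in here as an assumption. The sketch below estimates the run time of a task on several cores when only part of the task can run in parallel. The function name and the 50% figure are illustrative.

```python
def amdahl_time_ms(single_core_ms: float, cores: int, parallel_fraction: float) -> float:
    """Estimated run time when only parallel_fraction of the work can be split across cores."""
    serial_part = single_core_ms * (1.0 - parallel_fraction)
    parallel_part = single_core_ms * parallel_fraction / cores
    return serial_part + parallel_part

print(amdahl_time_ms(2.0, 2, 1.0))  # task is fully parallelizable
print(amdahl_time_ms(2.0, 2, 0.5))  # half of the task is inherently serial
```

Under this model the 2 ms task reaches 1 ms on two cores only if all of it can be split; any serial portion keeps it above 1 ms.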
Distributed Computing
Distribute load on multiple machines.
Make sure there are no bottlenecks or single points of
failure.
Can we achieve End to End Distribution, from
messaging to processing to databases?
Concurrent Programming
One CPU core currently handles 1000 trx/sec.
Can one core handle 1000 trx in a millisecond
instead? That is 1M trx/sec.
Can we remove context switching overheads and
synchronous I/O idling?
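A minimal sketch of the idea, assuming the requests are I/O-bound: a single thread running an event loop overlaps the waits of many requests instead of idling on each one. `asyncio.sleep` stands in for a real database or network call, and `handle_request` and `serve` are hypothetical names.

```python
import asyncio
import time

async def handle_request(request_id: int) -> int:
    # Simulated 10 ms I/O wait; the event loop runs other requests in the meantime.
    await asyncio.sleep(0.01)
    return request_id

async def serve(n: int) -> list[int]:
    # Start all requests and wait for them together on one thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(1000))
elapsed = time.perf_counter() - start
print(f"handled {len(results)} requests in {elapsed:.3f}s")
```

Handled one at a time with blocking waits, the same 1000 requests would need at least 10 seconds; here the waits overlap, so the total is far shorter. This illustrates the principle only and is not a benchmark.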
Parallel Programming
● Throw more CPU cores at different
tasks.
Distributed Computing
Distribute workload between two or more computing devices or machines
connected by some type of network.
● For example, clustered architecture with multiple machines
However, in real-life web applications, we need to distribute the workload across
● application servers,
● database servers,
● real-time computations or analytics.
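One common way to spread a workload across machines without a central coordinator is hash partitioning. The slides do not say which scheme was used, so the following is only an illustrative sketch; the node names and the `node_for` helper are hypothetical.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names

def node_for(key: str, nodes: list[str] = NODES) -> str:
    """Map a key to a node using a stable hash, so any machine can compute the route."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Route 9000 hypothetical user keys and count how many land on each node.
load = Counter(node_for(f"user-{i}") for i in range(9000))
print(load)
```

Because every machine can compute the same mapping locally, no single router or master has to sit in the request path. Plain modulo hashing reshuffles most keys when a node is added or removed, which is why production systems often use consistent hashing instead.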
Distributed Computing
Distributed Storage
Distributed Messaging
Distributed Analytics (Real Time and Batch)
Traditional vs New
Spot the bottleneck node / single point of failure.
Traditional: Load Balancer (L), Master DB (M) | New: ??
[Figure: Traditional architecture (load-balanced app servers, master-slave DB architecture) vs New architecture]
Distributed Computing - Tools
➔ Distributed Messaging
◆ Apache Kafka, RabbitMQ, Apache ActiveMQ
◆ A detailed comparison from LinkedIn is available at
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
➔ Distributed Analytics
◆ Apache Storm (Real Time), Apache Spark (Batch)
➔ Distributed Storage
◆ Cassandra
Use Cases:
Highly Suitable for Real Time analytics of High Velocity Big Data
Machine to Machine (M2M) or Internet of Things (IoT)
M2M, IoT and real time analytics
https://www.linkedin.com/pulse/20141203105632-40354099-m2m-iot-and-real-time-analytics
Concurrent Programming
Concurrent programming is a form of computing in which several computations
execute during overlapping time periods (concurrently) instead of sequentially.
In practice, it is software code that lets multiple computing tasks make
progress at the same time.
Architectural Concepts
Events, Threads or Actors?
Asynchronous Programming
Functional Programming
Concurrent Programming
Events vs Threads, Actors
NodeJS vs J2EE
Performance comparison of
multithreaded, synchronous
technology using Spring/Hibernate
vs
event-based, single-process, asynchronous
technology using NodeJS.
Independent Research Report from TechFerry Innovation Lab
http://www.techferry.com/eBooks/NodeJS-vs-J2EE-Stack.html
Asynchronous Programming
End-to-end asynchronous programming:
non-blocking callbacks,
not just at the application layer
but also at the UI or database layers.
Pick asynchronous programming at application,
database or UI layer based on your use-case.
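A minimal sketch of a non-blocking callback, written with Python's asyncio rather than the NodeJS stack discussed in the slides. `fetch_row` is a hypothetical stand-in for an asynchronous database call, and `on_done` is the callback that fires when the result arrives.

```python
import asyncio

log: list[str] = []

async def fetch_row(row_id: int) -> str:
    await asyncio.sleep(0.01)  # stands in for a non-blocking database call
    return f"row-{row_id}"

def on_done(task: asyncio.Task) -> None:
    # Runs on the event loop once the result is ready; nothing blocked while waiting.
    log.append(f"callback got {task.result()}")

async def main() -> None:
    task = asyncio.create_task(fetch_row(7))
    task.add_done_callback(on_done)
    log.append("request issued, caller keeps working")
    await task
    await asyncio.sleep(0)  # give any pending callbacks a chance to run

asyncio.run(main())
print(log)
```

The caller records its own progress before the result exists, and the callback is invoked later by the event loop. The same pattern applies whether the slow call sits in the UI, the application, or the database layer.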
Functional Programming
A programming paradigm, a style of building the
structure and elements of computer programs, that
treats computation as the evaluation of mathematical
functions and avoids changing state and mutable data.
Routines can easily be moved to a different CPU core.
Scala/Akka Actors
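A minimal sketch of why side-effect-free routines are easy to move between workers. It is written in Python, not Scala/Akka. The 18% rate and the `price_with_tax` function are made up for illustration. The thread pool is only a self-contained stand-in: in standard CPython, CPU-bound work needs a process pool to actually run on several cores.

```python
from concurrent.futures import ThreadPoolExecutor

def price_with_tax(price_cents: int) -> int:
    """Pure function: the result depends only on the argument and no shared state is modified."""
    return price_cents + price_cents * 18 // 100  # hypothetical 18% tax

prices = (1000, 2500, 400)  # immutable input

# Each call is independent, so the scheduler is free to run it on any worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    taxed = tuple(pool.map(price_with_tax, prices))

print(taxed)
```

No locks are needed because the function never writes to shared data, and the input tuple is unchanged afterwards.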
Symmetric Multi Processing
Symmetric Multi Processing (SMP) is the processing of programs by multiple
processors that share a common operating system and memory.
The processors share memory and the I/O bus or data path.
A single copy of the operating system is in charge of all the processors.
Asymmetric vs Symmetric
Asymmetric Multiprocessing
Different CPUs take on different jobs.
Symmetric Multi Processing (SMP)
All CPUs run in parallel, doing the same job.
CPUs share the same memory.
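As a loose software analogy for the SMP model (not a description of the hardware itself), the sketch below runs four identical workers that all read the same in-memory list and each sum a different slice. The operating system decides which core each thread runs on. All names are illustrative.

```python
import threading

WORKERS = 4
data = list(range(1_000))     # one copy of the data, visible to every worker
partial_sums = [0] * WORKERS  # shared result slots, one per worker

def worker(index: int) -> None:
    # Every worker runs the same job on a different strided slice of the shared list.
    partial_sums[index] = sum(data[index::WORKERS])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partial_sums)
print(total)
```

Each worker writes only to its own slot, so no locking is needed even though the memory is shared.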
+1 408-337-6607
info@techferry.com
Contact Information
www.techferry.com
Thank You
/techferry /@techferry