20111104 s4 overview

Apache S4: A Distributed Stream Computing Platform

Presented at Stanford Infolab – Nov 4, 2011

http://incubator.apache.org/projects/s4 (migrating from http://s4.io)

S4 Committers: {fpj, kishoreg, leoneu, mmorel, robbins}@apache.org

Presented by Leo Neumeyer (@leoneu)

About Me

- Born in Buenos Aires, Argentina; studied electrical engineering.
- School and work in Canada (signal processing, speech coding).
- SRI International (Menlo Park) Speech Lab: DARPA benchmarks; the lab founded the speech recognition spin-off Nuance Communications, Inc.
- Mindstech: startup to teach spoken English in Asia using web audio/video (before two-way media was widely available).
- Yahoo! Labs: search advertising (optimization, auctions).
- Quantbench: mission is to create a marketplace for data scientists, data providers, and investment funds.

S4 Project History

- Started as a research project at Yahoo! Labs in August 2008, out of the need to personalize search ads in real time.
- Open sourced in September 2009.
- Moved to the Apache Incubator in October 2011.

Motivation

Given multiple event streams, extract information using data-driven models, in real time, with low latency, at scale.

- Personalized search
- Twitter trends
- Online parameter optimization
- Predicting market prices
- Automatic trading
- Network intrusion detection
- Spam filtering
- Sensor networks
- It's fun!

S4 Architecture

S4 is a general-purpose, real-time, distributed, decentralized, robust, scalable, event-driven, pluggable platform that allows programmers to easily implement applications for processing continuous, unbounded streams of data.

[Figure: S4 architecture diagram with the labels Node, Server, App, Stream, PE Prototype, and PE Instance.]

Apps encapsulate units of work. They can consume and produce event streams.

There is no fixed limit on the number of nodes.

There is one server process per node. The server loads/unloads apps.

An app is a graph composed of processing element (PE) prototypes and streams that produce, consume, and transmit messages.

PE instances are clones of the prototype. They are associated with a unique key and contain the state.
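The prototype/instance relationship can be illustrated with a minimal, single-JVM sketch. The class names (`ProcessingElement`, `WordCountPE`, `PrototypeContainer`) are hypothetical and are not the Apache S4 API; in a real S4 cluster the key also decides which node receives an event, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class (not the Apache S4 API): a PE prototype that can be
// cloned into per-key instances, each holding its own state.
abstract class ProcessingElement implements Cloneable {
    private String key; // unique key this instance is bound to (null for the prototype)

    String getKey() {
        return key;
    }

    // Clone the prototype and bind the copy to `key` with freshly reset state.
    ProcessingElement cloneForKey(String key) {
        try {
            ProcessingElement instance = (ProcessingElement) super.clone();
            instance.key = key;
            instance.reset();
            return instance;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }

    protected abstract void reset();               // clear per-instance state
    protected abstract void onEvent(String event); // handle one event for this key
}

// Example PE: counts how many times its key (a word) has been seen.
class WordCountPE extends ProcessingElement {
    private long count;

    long getCount() {
        return count;
    }

    @Override
    protected void reset() {
        count = 0;
    }

    @Override
    protected void onEvent(String word) {
        count++;
    }
}

// Hypothetical container: routes each event to the instance for its key,
// cloning the prototype the first time a key is seen.
class PrototypeContainer {
    private final ProcessingElement prototype;
    private final Map<String, ProcessingElement> instances = new HashMap<>();

    PrototypeContainer(ProcessingElement prototype) {
        this.prototype = prototype;
    }

    void dispatch(String key, String event) {
        instances.computeIfAbsent(key, prototype::cloneForKey).onEvent(event);
    }

    ProcessingElement instanceFor(String key) {
        return instances.get(key);
    }

    int instanceCount() {
        return instances.size();
    }
}
```

Dispatching each word of "to be or not to be" keyed by itself creates four instances, and the instance for "to" ends with a count of 2. The prototype itself never receives events; it only serves as the template for new keys.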

Latency vs. Accuracy

Zero Errors
- Latency: unconstrained
- Why? Reproducible results.
- Use: debugging; training models.

Real-Time
- Latency: constrained
- Why? Limited control over the inbound data rate and computing complexity.
- Use: processing unstructured data; tolerance to small errors; graceful recovery from inbound data streams.

Design

- Actors programming model.
- Probabilistic thinking in both algorithms and systems.
- Runs on commodity hardware.
- All in-memory: no disk bottlenecks.
- Pluggable (protocols, applications, serialization, etc.).
- Object-oriented design: POJOs (plain old Java objects).
- Static typing: no string literals, minimal type casting.
- Science-friendly: supports constant change and ease of use.
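The actors programming model listed above can be sketched in plain Java, assuming a single-thread executor serves as each actor's mailbox. `WordCountActor` is a hypothetical name, and this is not how S4 itself schedules PEs; the point is only that state confined to one mailbox thread needs no locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor: a private mailbox (single-thread executor) processes
// messages one at a time, so the state below is never shared between threads.
final class WordCountActor implements AutoCloseable {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "word-count-actor");
        t.setDaemon(true); // do not keep the JVM alive if close() is skipped
        return t;
    });
    private final Map<String, Long> counts = new HashMap<>(); // touched only by the mailbox thread

    // Fire-and-forget message: returns immediately; processed later, in order.
    void send(String word) {
        mailbox.execute(() -> counts.merge(word, 1L, Long::sum));
    }

    // Request/reply: the query is queued behind all earlier messages, and
    // Future.get() makes the mailbox thread's writes visible to the caller.
    long count(String word) {
        try {
            return mailbox.submit(() -> counts.getOrDefault(word, 0L)).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        mailbox.shutdown();
    }
}
```

Because messages and queries share one FIFO mailbox, a query always observes every message sent before it, without any synchronization on `counts`.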

Programming Model

Example: estimate click-through rate in a web application after applying a filter to remove bot traffic.
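The click-through-rate example can be sketched as a two-stage pipeline: a stateless bot filter followed by one counting PE per ad. All names here (`AdEvent`, `BotFilter`, `CtrPE`, `CtrApp`) are hypothetical, and the user-agent check is an assumed stand-in for a real bot-detection model; this is not the S4 example application.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical event: an ad impression or click, with the client's user agent.
final class AdEvent {
    enum Type { IMPRESSION, CLICK }

    final String adId;
    final String userAgent;
    final Type type;

    AdEvent(String adId, String userAgent, Type type) {
        this.adId = adId;
        this.userAgent = userAgent;
        this.type = type;
    }
}

// Stage 1: stateless filter. ASSUMPTION: a naive user-agent check stands in
// for a real bot-detection model.
final class BotFilter {
    boolean accept(AdEvent e) {
        return !e.userAgent.toLowerCase(Locale.ROOT).contains("bot");
    }
}

// Stage 2: one instance per ad id, holding that ad's counts.
final class CtrPE {
    private long impressions;
    private long clicks;

    void onEvent(AdEvent e) {
        if (e.type == AdEvent.Type.IMPRESSION) {
            impressions++;
        } else {
            clicks++;
        }
    }

    double ctr() {
        return impressions == 0 ? 0.0 : (double) clicks / impressions;
    }
}

// Wires the two stages together: events that pass the filter are keyed by ad id.
final class CtrApp {
    private final BotFilter filter = new BotFilter();
    private final Map<String, CtrPE> byAd = new HashMap<>();

    void onEvent(AdEvent e) {
        if (filter.accept(e)) {
            byAd.computeIfAbsent(e.adId, id -> new CtrPE()).onEvent(e);
        }
    }

    double ctr(String adId) {
        CtrPE pe = byAd.get(adId);
        return pe == null ? 0.0 : pe.ctr();
    }
}
```

With four human impressions and one human click on the same ad, the estimate is 0.25; three extra clicks from a "Googlebot" user agent are dropped by the filter instead of inflating the estimate to 1.0.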

Coding an App

Research Areas: Systems

- Checkpointing strategies
- Replication strategies
- Dynamic load balancing
- Adaptive load management
- Query languages

Fault Tolerance

Problem / common approaches / S4's approach:

High availability
- Approaches: warm/hot failover; cold failover.
- S4: warm failover; standby nodes + Apache ZooKeeper.

State loss (crashes, system updates)
- Approaches: lossy checkpointing; lossless checkpointing.
- S4: lossy checkpointing.

Low latency
- Approach: decouple stream processing from checkpointing.
- S4: asynchronous writes; uncoordinated checkpointing.

Approach: checkpoints are count- or time-based, a pluggable backend supports any data store, PEs are restored lazily, and tuning is application-dependent.
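The checkpointing approach can be sketched for a single counter PE: it checkpoints after a fixed number of events or after a time interval (whichever comes first), writes through a pluggable store interface, and restores lazily on first use. `CheckpointStore`, `InMemoryCheckpointStore`, and `CheckpointedCounter` are hypothetical names, not the S4 checkpointing API; the slides call for asynchronous writes, which this sketch simplifies to synchronous ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pluggable backend: any data store could implement this.
interface CheckpointStore {
    void save(String key, long state);

    Long load(String key); // null when no checkpoint exists
}

// Simplest backend, for illustration only.
final class InMemoryCheckpointStore implements CheckpointStore {
    private final Map<String, Long> data = new HashMap<>();

    @Override
    public void save(String key, long state) {
        data.put(key, state);
    }

    @Override
    public Long load(String key) {
        return data.get(key);
    }
}

// Counter PE that checkpoints after `everyN` events or once `intervalMillis`
// has elapsed since the last checkpoint, and restores its state lazily.
final class CheckpointedCounter {
    private final String key;
    private final CheckpointStore store;
    private final int everyN;
    private final long intervalMillis;

    private long count;
    private boolean restored;
    private int eventsSinceCheckpoint;
    private long lastCheckpointMillis;

    CheckpointedCounter(String key, CheckpointStore store, int everyN,
                        long intervalMillis, long nowMillis) {
        this.key = key;
        this.store = store;
        this.everyN = everyN;
        this.intervalMillis = intervalMillis;
        this.lastCheckpointMillis = nowMillis;
    }

    // Lazy restore: read the checkpoint only when the state is first needed.
    private void ensureRestored() {
        if (!restored) {
            Long saved = store.load(key);
            if (saved != null) {
                count = saved;
            }
            restored = true;
        }
    }

    void onEvent(long nowMillis) {
        ensureRestored();
        count++;
        eventsSinceCheckpoint++;
        boolean countDue = eventsSinceCheckpoint >= everyN;
        boolean timeDue = nowMillis - lastCheckpointMillis >= intervalMillis;
        if (countDue || timeDue) {
            store.save(key, count); // synchronous here for brevity
            eventsSinceCheckpoint = 0;
            lastCheckpointMillis = nowMillis;
        }
    }

    long count() {
        ensureRestored();
        return count;
    }
}
```

With `everyN = 3`, five events leave a checkpoint of 3; a replacement instance that restores from the store resumes at 3, so the two events since the last checkpoint are lost. That is the lossy trade-off accepted in exchange for keeping checkpoints off the critical path.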

Research by M. Morel, F. Junqueira, Yahoo! Research Europe, 2011.

Resilience in a Distributed Word Count Task

Research Areas: Algorithms

- Self-adaptive models: adaptive language models using small amounts of data.
- Personalization: learn from user feedback (clicks, location, behavior) to deliver relevant information in real time.
- Trend detection: find personal Twitter trends relevant to you.
- Intrusion detection: summarize the high-level state of the network and detect unusual patterns.
- Sensor networks: large amounts of audio, video, and other sensor data require processing, recognition, detection, and tracking; detect events across sensors.

Personalized Search Ads

Goal is to maximize: revenue, click yield, and user experience.

By controlling: ranking, pricing, filtering, and placement.

S. Schroedl, A. Kesari, and L. Neumeyer, “Personalized ad placement in web search,” in ADKDD ’10: Proceedings of the 4th Annual International Workshop on Data Mining and Audience Intelligence for Online Advertising, 2010.

Model ad click intent using recent user activity. If a user is more likely to click, show more North ads (ads placed above the search results).

Example 1: The first query is "digital slr camera"; the next query is "canon slr". The user is more likely than average to click another ad.

Example 2: The same query is repeated without previous clicks. The user is less likely to click another ad.

Modeling user session

Typical features:
- Number of searches/clicks by the user in the past 24 hours
- User COPC: ratio of observed clicks to predicted clicks
- Identical query searched before / clicked before
- Time (seconds) since the last search/click
- Similarity measures: current vs. previous queries
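A per-user PE could maintain several of these features incrementally. The sketch below uses a hypothetical class name (`UserSessionFeatures`) and computes three features: searches in the past 24 hours (a sliding window of timestamps), COPC as observed clicks divided by the sum of predicted click probabilities for the ads shown, and seconds since the last search. How the production system computed them is not described in the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-user feature tracker (one instance per user key).
final class UserSessionFeatures {
    private static final long DAY_MS = 24L * 60 * 60 * 1000;

    private final Deque<Long> searchTimes = new ArrayDeque<>(); // oldest first
    private long observedClicks;
    private double predictedClicks; // sum of model click probabilities for ads shown
    private long lastSearchMs = -1;  // -1 means "no search yet"

    void onSearch(long nowMs) {
        searchTimes.addLast(nowMs);
        lastSearchMs = nowMs;
        evictOlderThanADay(nowMs);
    }

    void onAdShown(double predictedClickProbability) {
        predictedClicks += predictedClickProbability;
    }

    void onClick() {
        observedClicks++;
    }

    int searchesPast24h(long nowMs) {
        evictOlderThanADay(nowMs);
        return searchTimes.size();
    }

    // COPC: observed clicks / predicted clicks; undefined (NaN) before any prediction.
    double copc() {
        return predictedClicks == 0 ? Double.NaN : observedClicks / predictedClicks;
    }

    long secondsSinceLastSearch(long nowMs) {
        return lastSearchMs < 0 ? -1 : (nowMs - lastSearchMs) / 1000;
    }

    private void evictOlderThanADay(long nowMs) {
        while (!searchTimes.isEmpty() && nowMs - searchTimes.peekFirst() >= DAY_MS) {
            searchTimes.pollFirst();
        }
    }
}
```

For example, a user shown two ads with predicted click probability 0.25 each who clicks once has COPC = 1 / 0.5 = 2.0, that is, twice as many clicks as the long-term model expected.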

Modeling technique: stochastic gradient boosted decision trees (GBDT)

Target:

P[CLICK | ad, query, user]

Approximation:

P[CLICK | ad, query] * ucp[user, session]

where P[CLICK | ad, query] is the non-personalized long-term model, computed using Hadoop, and ucp[user, session] is the User Click Propensity (UCP) for the user session, computed using S4.
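The approximation can be written as a one-line function. In this sketch `PersonalizedClickModel` and `personalizedClickProbability` are hypothetical names, both inputs are treated as given (the baseline from the offline model, the UCP from the streaming side), and clamping the product to [0, 1] is an added assumption so the result stays a valid probability; the slides do not say how values above 1 are handled.

```java
// Sketch of the factorized estimate:
// P[CLICK | ad, query, user] ~= P[CLICK | ad, query] * ucp[user, session]
final class PersonalizedClickModel {
    private PersonalizedClickModel() {}

    // baseline: non-personalized long-term click probability (offline model).
    // ucp: user click propensity multiplier for the current session (streaming).
    // ASSUMPTION: the product is clamped to [0, 1] to remain a probability.
    static double personalizedClickProbability(double baseline, double ucp) {
        if (baseline < 0 || baseline > 1) {
            throw new IllegalArgumentException("baseline must be in [0, 1]");
        }
        if (ucp < 0) {
            throw new IllegalArgumentException("ucp must be non-negative");
        }
        return Math.min(1.0, baseline * ucp);
    }
}
```

A UCP above 1 raises the estimate and a UCP below 1 lowers it: a baseline of 0.02 with a UCP of 1.5 gives approximately 0.03.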

Results:

We can reduce the average number of ads (ad footprint) by 7% without decreasing click yield or revenue.

- OR -

For a given ad footprint we can increase click yield by ~2%.

Thank you!

Join the Apache S4 project:

s4-user-subscribe@incubator.apache.org

s4-dev-subscribe@incubator.apache.org
