DISTRIBUTED SYSTEMS
Apache Flume
Muhammad Afaq
OVERVIEW
What is Flume?
Flume Agent
Flume Components
Conf File
Example Configuration
Example: User Trends Retrieval with Flume using Twitter API
WHAT IS FLUME?
A reliable service for collecting and aggregating large amounts of data, especially streaming data such as log data.
Flume is one of the projects in the Hadoop ecosystem.
For Hadoop-based log analysis, Flume can be used to collect the log data, such as website logs or system logs.
FLUME AGENT
The Flume architecture is built around the agent. An agent has a source, which receives data from an external producer (such as a web server, application server, or website).
From the source, data moves to a channel, where the log data is buffered.
From the channel, the log data moves to a sink, which delivers it to storage (for example Hadoop or the local file system).
FLUME COMPONENTS
Source: An active component that receives events and places them in the channel.
Channel: A passive component that buffers events until a sink takes them.
Sink: Takes events from the channel and writes them to the next hop or the final destination.
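The division of roles between the three components can be illustrated with a toy in-process model. This is only a sketch in Python, not Flume's actual Java implementation: the class names `Source`, `Channel`, and `Sink`, their methods, and the `run_demo` helper are hypothetical stand-ins chosen for illustration.

```python
from collections import deque


class Channel:
    """Passive buffer: holds events until a sink takes them."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            raise OverflowError("channel is full")
        self._events.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._events.popleft() if self._events else None


class Source:
    """Active component: receives an event and places it in the channel."""

    def __init__(self, channel):
        self.channel = channel

    def receive(self, event):
        self.channel.put(event)


class Sink:
    """Takes events from the channel and writes them to a destination.

    The destination is a plain list here; in Flume it would be the next
    hop or final storage.
    """

    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination

    def drain(self):
        while (event := self.channel.take()) is not None:
            self.destination.append(event)


def run_demo(lines):
    """Push each line through source -> channel -> sink; return what was stored."""
    channel = Channel(capacity=10)
    stored = []
    source, sink = Source(channel), Sink(channel, stored)
    for line in lines:
        source.receive(line)
    sink.drain()
    return stored
```

Note that the channel never calls the sink: it only buffers, and the sink pulls from it, which is what "passive" means here.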
CONF FILE
Basic Rules
Every agent must have at least one channel.
Every source must have at least one channel.
Every sink must have exactly one channel.
Every component must have a type.
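The four basic rules can be checked mechanically. The following is a minimal sketch in Python, assuming the configuration uses the plain `key = value` layout of a Flume properties file; the helper names `parse_properties` and `check_basic_rules` are hypothetical, and the check covers only the rules listed above (it does not verify, for instance, that a referenced channel is actually declared).

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_basic_rules(props, agent):
    """Return a list of rule violations for one agent (empty if none)."""

    def names(kind):
        # Component names are space-separated, e.g. "a1.sources = r1 r2".
        return props.get(f"{agent}.{kind}", "").split()

    errors = []
    # Rule 1: every agent must have at least one channel.
    if not names("channels"):
        errors.append("agent has no channel")
    # Rule 4: every component must have a type.
    for kind in ("sources", "channels", "sinks"):
        for name in names(kind):
            if f"{agent}.{kind}.{name}.type" not in props:
                errors.append(f"{kind[:-1]} {name} has no type")
    # Rule 2: every source must have at least one channel.
    for name in names("sources"):
        if not props.get(f"{agent}.sources.{name}.channels", "").split():
            errors.append(f"source {name} has no channel")
    # Rule 3: every sink must have exactly one channel.
    for name in names("sinks"):
        if len(props.get(f"{agent}.sinks.{name}.channel", "").split()) != 1:
            errors.append(f"sink {name} must have exactly one channel")
    return errors
```

The asymmetry between the plural `channels` key on a source and the singular `channel` key on a sink mirrors rules 2 and 3: a source may fan out to several channels, while a sink reads from exactly one.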
EXAMPLE CONFIGURATION
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
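Once agent a1 is running with this configuration, the netcat source turns each line of text it receives on port 44444 into an event, and the logger sink prints that event. A small Python client can be used to try this out. This is a sketch under assumptions: the function name `send_line` is hypothetical, and it expects the source to acknowledge each line with a short reply (the netcat source answers `OK` by default).

```python
import socket


def send_line(text, host="localhost", port=44444, timeout=5.0):
    """Send one line of text to the netcat source and return its reply.

    host and port default to the values in example.conf above.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        return conn.recv(1024).decode("utf-8").strip()
```

Calling `send_line("Hello Flume")` while the agent is up should make the event appear in the agent's log output; without a running agent the call raises a connection error.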
USER TRENDS RETRIEVAL WITH FLUME USING TWITTER API
In this example, we retrieve users’ trends as logs from a personal Twitter account using the Twitter API. These trends can then be analyzed further as desired.
Download Flume
Check whether the Flume tar file is present
Create the flume-ng directory
Copy the Flume tar file to the flume-ng directory
Check whether the Flume tar file was copied
Change directory to flume-ng
Extract the files from the Flume tar file
Check whether the Flume files were extracted
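The slides show these setup steps as terminal screenshots. As a rough equivalent, the same sequence (check, create directory, copy, extract, verify) can be sketched in Python with the standard library. The function name `install_flume` is hypothetical, and the tar file name is left as a parameter because the slides do not state it.

```python
import shutil
import tarfile
from pathlib import Path


def install_flume(tar_path, target_dir):
    """Copy the Flume tar into target_dir, extract it, and list the result."""
    tar_path, target_dir = Path(tar_path), Path(target_dir)
    # Check whether the Flume tar file is present.
    if not tar_path.is_file():
        raise FileNotFoundError(tar_path)
    # Create the flume-ng directory (no error if it already exists).
    target_dir.mkdir(parents=True, exist_ok=True)
    # Copy the tar file into the directory.
    copied = Path(shutil.copy(tar_path, target_dir))
    # Extract the files from the tar file (only do this for trusted archives).
    with tarfile.open(copied) as archive:
        archive.extractall(target_dir)
    # Return the directory listing so the caller can check the result.
    return sorted(p.name for p in target_dir.iterdir())
```

The returned listing should contain both the copied tar file and the extracted apache-flume directory.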
Move the flume-sources-1.0-SNAPSHOT.jar file to the ‘lib’ directory of apache-flume and check that it is present there
Create the flume-env.sh file in the ‘conf’ directory of apache-flume
Open flume-env.sh
Edit flume-env.sh according to the snapshot below
Open a browser and go to the URL below:
URL: https://apps.twitter.com
Log in to Twitter
Create a new application
Twitter Apps
The highlighted part will be used in flume.conf
Edit flume.conf
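The contents of flume.conf are not reproduced in the text, so the following is only a sketch of what such a file might look like, written as a small Python helper that fills in the credentials from the Twitter application page. Everything here is an assumption to be checked against your own setup: the agent name `TwitterAgent`, the component names, the source class `com.cloudera.flume.source.TwitterSource` (assuming the flume-sources jar is the one from Cloudera's Twitter example), the channel and roll settings, and the helper name `render_flume_conf`.

```python
# Hypothetical template; adjust names and values to match your installation.
FLUME_CONF_TEMPLATE = """\
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: assumed to be the TwitterSource class shipped in the flume-sources jar
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = {consumer_key}
TwitterAgent.sources.Twitter.consumerSecret = {consumer_secret}
TwitterAgent.sources.Twitter.accessToken = {access_token}
TwitterAgent.sources.Twitter.accessTokenSecret = {access_token_secret}
TwitterAgent.sources.Twitter.keywords = {keywords}

# Sink: write events to HDFS as plain text
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = {hdfs_path}
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

# Channel: buffer events in memory
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
"""


def render_flume_conf(consumer_key, consumer_secret, access_token,
                      access_token_secret, keywords, hdfs_path):
    """Fill the template with Twitter credentials, keywords, and an HDFS path."""
    return FLUME_CONF_TEMPLATE.format(
        consumer_key=consumer_key,
        consumer_secret=consumer_secret,
        access_token=access_token,
        access_token_secret=access_token_secret,
        keywords=", ".join(keywords),
        hdfs_path=hdfs_path,
    )
```

For the HDFS path, a value ending in /user/flume/tweets/ would match the directories browsed later in this example; the NameNode host and port in front of it depend on the cluster.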
Change the directory to the ‘bin’ folder of apache-flume
Start fetching the data from Twitter
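The exact command is only visible in the screenshot, but an agent is normally started with `flume-ng agent`, passing the agent name, the configuration directory, and the configuration file. The sketch below builds that command in Python; the helper names `flume_command` and `start_agent` are hypothetical, and the agent name and paths in the usage note are assumptions.

```python
import subprocess


def flume_command(agent_name, conf_dir, conf_file, flume_ng="./flume-ng"):
    """Build the argument list for starting a Flume agent."""
    return [
        flume_ng, "agent",
        "--name", agent_name,        # must match the agent name in flume.conf
        "--conf", conf_dir,          # directory containing flume-env.sh
        "--conf-file", conf_file,    # the flume.conf edited earlier
        "-Dflume.root.logger=INFO,console",  # log to the terminal
    ]


def start_agent(agent_name, conf_dir, conf_file, flume_ng="./flume-ng"):
    """Start the agent as a child process and return the process handle."""
    return subprocess.Popen(flume_command(agent_name, conf_dir, conf_file, flume_ng))
```

Run from the ‘bin’ folder, a call such as `start_agent("TwitterAgent", "../conf", "../conf/flume.conf")` would correspond to this step, assuming the agent in flume.conf is named TwitterAgent.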
Data being fetched from Twitter
Browse the filesystem
Click on user
Click on flume
Click on tweets
Click on the FlumeData file
This is the data that has been downloaded from Twitter
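As one possible follow-up analysis, the hashtags in the downloaded data can be counted to see what is trending. This is a minimal sketch, assuming each line of a FlumeData file holds one tweet as a JSON object and that hashtags appear under `entities.hashtags[].text`, as in the standard tweet JSON; the function name `top_hashtags` is hypothetical.

```python
import json
from collections import Counter


def top_hashtags(lines, n=10):
    """Count hashtags across tweets given as JSON lines; return the n most common."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(tweet, dict):
            continue
        for tag in tweet.get("entities", {}).get("hashtags", []):
            text = tag.get("text")
            if text:
                counts[text.lower()] += 1  # treat #Hadoop and #hadoop as one tag
    return counts.most_common(n)
```

Because the function accepts any iterable of lines, an open file handle for a local copy of a FlumeData file can be passed to it directly.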