How Hue integrates Hadoop with Django
Django + NoSQL
How Hue Integrates with Hadoop
Abraham Elmahrek
Cloudera - March 5th, 2014
Monday, March 3, 14
What is Hue?
HUE 1
A desktop-like interface in the browser. It did its job, but it was slow, leaked memory, and was not very IE-friendly. Still, it was advanced for its time (2009-2010).
HISTORY
HUE 2
The first flat structure port, with Twitter Bootstrap all over the place.
HISTORY
HUE 2.5
New apps and an improved UX, with new features such as autocomplete and drag & drop.
HISTORY
HUE 3 ALPHA
Proposed design, didn’t make it.
HISTORY
HUE 3
Transition to the new UI, major improvements and new apps.
HISTORY
HUE 3.5+
APPS
PIG
JOB BROWSER
JOB DESIGNER
OOZIE
HIVE
IMPALA
METASTORE BROWSER
SEARCH
HBASE BROWSER
SQOOP
ZOOKEEPER
USER ADMIN
DB QUERY
SPARK
HOME
...
GUI DESIGN
FILE BROWSER
USER
USER WORKFLOWS
USER
YARN
JobTracker
Oozie
Pig
HDFS
HiveServer2
Hive Metastore
Cloudera Impala
Solr
HBase
Sqoop2
ZooKeeper
LDAP
SAML
Hue Plugins
APPS
FAST PACE
LAST MONTH
91 issues created and 90 resolved.
Core team + Community
STACK
BACKEND
Python 2.6+ with Django 1.4.5
FRONTEND
jQuery
Bootstrap
Knockout.js
Love
HADOOP INTERFACES
REST & THRIFT
Many Hadoop interfaces used
WebHDFS
YARN API (RM, NM, MR...)
HiveServer2
Impala
HBase
Oozie
Sqoop2
ZooKeeper
...
CUSTOM CLIENTS
Provide custom clients for more explicit API definitions
PROTOCOLS
REST
Use python-requests and a custom client to streamline RESTful interface calls.
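def make_client(url, security_enabled):  # wrapper name added for illustration
    # Failures from this client surface as WebHdfsException.
    client = http_client.HttpClient(url,
                                    exc_class=WebHdfsException,
                                    logger=LOG)
    if security_enabled:
        client.set_kerberos_auth()
    return client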
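The pattern above (a base URL, a service-specific exception class, and one place to attach authentication) can be sketched in a few lines. This is only an illustration, not Hue's `http_client` module: `SimpleRestClient`, `RestError`, and the injectable `opener` are hypothetical names, and the standard library's `urllib` stands in for python-requests so the sketch runs without extra dependencies.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


class RestError(Exception):
    """Hypothetical service-specific error type."""


class SimpleRestClient(object):
    """Illustrative stand-in for a custom REST client (not Hue's HttpClient)."""

    def __init__(self, base_url, exc_class=RestError, opener=None):
        self.base_url = base_url.rstrip('/')
        self.exc_class = exc_class
        # The opener is injectable so the client can be exercised offline.
        self._open = opener or urllib.request.urlopen

    def build_url(self, path, params=None):
        url = self.base_url + '/' + path.lstrip('/')
        if params:
            url += '?' + urllib.parse.urlencode(sorted(params.items()))
        return url

    def get_json(self, path, params=None):
        url = self.build_url(path, params)
        try:
            response = self._open(url)
            return json.loads(response.read().decode('utf-8'))
        except (urllib.error.URLError, ValueError) as e:
            # Re-raise transport and parsing failures as the caller's error type.
            raise self.exc_class('%s failed: %s' % (url, e))
```

Passing the exception class in at construction time means each service wrapper can raise its own error type while sharing the same transport code.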
Thrift
Custom connection pooling and socket multiplexing to streamline thrift calls.
thrift_util.get_client(TCLIService.Client,
                       query_server['server_host'],
                       query_server['server_port'],
                       service_name=query_server['server_name'],
                       kerberos_principal=kerberos_principal_short_name,
                       use_sasl=use_sasl,
                       mechanism=mechanism,
                       username=user.username,
                       timeout_seconds=conf.SERVER_CONN_TIMEOUT.get(),
                       use_ssl=conf.SSL.ENABLED.get(),
                       ca_certs=conf.SSL.CACERTS.get(),
                       keyfile=conf.SSL.KEY.get(),
                       certfile=conf.SSL.CERT.get(),
                       validate=conf.SSL.VALIDATE.get())
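The connection-pooling idea can be shown without Thrift at all. The following is a simplified sketch under stated assumptions: `ClientPool` and its `factory` callable are hypothetical names rather than the `thrift_util` API, and SASL, SSL, timeouts, and socket multiplexing are deliberately left out.

```python
import queue
import threading


class ClientPool(object):
    """Illustrative pool that reuses clients per (host, port, service) key."""

    def __init__(self, factory, max_idle=4):
        self._factory = factory      # callable(host, port, service) -> client
        self._max_idle = max_idle
        self._idle = {}              # key -> queue of idle clients
        self._lock = threading.Lock()

    def _queue_for(self, key):
        with self._lock:
            if key not in self._idle:
                self._idle[key] = queue.Queue(maxsize=self._max_idle)
            return self._idle[key]

    def acquire(self, host, port, service):
        """Return an idle client if one exists, otherwise create a new one."""
        try:
            return self._queue_for((host, port, service)).get_nowait()
        except queue.Empty:
            return self._factory(host, port, service)

    def release(self, host, port, service, client):
        """Hand a client back; it is dropped if the pool is already full."""
        try:
            self._queue_for((host, port, service)).put_nowait(client)
        except queue.Full:
            pass  # A real pool would close the connection here.
```

Keying the pool on host, port, and service name lets one process talk to several servers without mixing up their connections.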
ACCESSIBILITY
Middleware
Make Hadoop interfaces accessible in request objects
class ClusterMiddleware(object):
    def process_view(self, request, ...):
        # Attach the HDFS client for this request's cluster.
        request.fs = cluster.get_hdfs(request.fs_ref)
        if request.user.is_authenticated():
            if request.fs is not None:
                # Run filesystem calls as the logged-in user.
                request.fs.setuser(request.user.username)

# Views can then use request.fs directly.
def download(request, path):
    if not request.fs.exists(path):
        raise Http404(_("File not found."))
    if not request.fs.isfile(path):
        raise PopupException(_("not a file."))
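The flow of attaching a client in middleware and consuming it in a view can be tried without Django or a cluster. In this minimal sketch, `FakeFs`, `AttachFsMiddleware`, and the `SimpleNamespace` request are hypothetical stand-ins, the view returns plain status codes instead of raising exceptions, and `is_authenticated` is read as an attribute (as in newer Django versions) rather than called as a method.

```python
from types import SimpleNamespace


class FakeFs(object):
    """Hypothetical in-memory stand-in for an HDFS client."""

    def __init__(self, files):
        self._files = set(files)
        self.user = None

    def setuser(self, username):
        self.user = username

    def exists(self, path):
        return path in self._files


class AttachFsMiddleware(object):
    """Attaches a filesystem client to each request before the view runs."""

    def __init__(self, fs_factory):
        self._fs_factory = fs_factory

    def process_view(self, request):
        request.fs = self._fs_factory()
        if request.fs is not None and request.user.is_authenticated:
            # Act on the filesystem as the logged-in user.
            request.fs.setuser(request.user.username)


def download(request, path):
    """The view only touches request.fs; it never builds a client itself."""
    if not request.fs.exists(path):
        return 404
    return 200


# Usage: run the middleware, then call the view with the same request.
request = SimpleNamespace(
    user=SimpleNamespace(username='alice', is_authenticated=True))
AttachFsMiddleware(lambda: FakeFs(['/user/alice/data.csv'])).process_view(request)
print(download(request, '/user/alice/data.csv'))
```

Because views depend only on `request.fs`, tests can swap in a fake filesystem without touching view code.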
HDFS
Goal
Easily browse, create, read, update, and delete files in HDFS
HDFS - Communication
REST
The NameNode provides a RESTful server called WebHDFS
http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
...
Request Accessible
Provide a middleware for populating a request member
def download(request, path):
    if not request.fs.exists(path):
        raise Http404(_("File not found."))
    if not request.fs.isfile(path):
        raise PopupException(_("not a file."))
Explicit Client
Provide an API that is explicit
class WebHdfs(Hdfs):
    def create(self, path, ...):
        ...
    def read(self, path, ...):
        ...
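To make the WebHDFS URL scheme concrete, here is a small sketch of a helper that builds such URLs. The function name `webhdfs_url` and the example host and port are assumptions for illustration; `op`, `user.name`, `offset`, and `length` are query parameters defined by the WebHDFS REST API.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user=None, **params):
    """Build a WebHDFS REST URL for the given operation."""
    query = [('op', op)]
    if user is not None:
        query.append(('user.name', user))
    # Sort extra parameters so the resulting URL is deterministic.
    query.extend(sorted(params.items()))
    return 'http://%s:%s/webhdfs/v1/%s?%s' % (
        host, port, quote(path.lstrip('/')), urlencode(query))


# Hypothetical NameNode address; read the first 1024 bytes of a file.
print(webhdfs_url('namenode.example.com', 50070, '/user/alice/data.csv',
                  'OPEN', user='alice', offset=0, length=1024))
```

An explicit client such as `WebHdfs` can hide this URL construction behind methods like `create` and `read`, so callers never assemble query strings by hand.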
HDFS - Cool Things
MIME Type Detection
Detect the various kinds of files being read: Avro, GZIP, etc.
Pagination
Files are paginated by block size when viewed (soon to behave more like a PDF reader, with content added automatically)
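Both ideas can be sketched briefly. The snippet below is an assumption about how such features might look, not Hue's file browser code: `detect_kind` checks the well-known leading magic bytes of GZIP and Avro container files, and `page_range` turns a page number into the byte offset and length to request.

```python
GZIP_MAGIC = b'\x1f\x8b'   # First two bytes of a gzip stream.
AVRO_MAGIC = b'Obj\x01'    # First four bytes of an Avro object container file.


def detect_kind(head):
    """Guess a file kind from its first bytes, defaulting to plain text."""
    if head.startswith(GZIP_MAGIC):
        return 'gzip'
    if head.startswith(AVRO_MAGIC):
        return 'avro'
    return 'text'


def page_range(page, block_size):
    """Return (offset, length) for a 1-based page of block_size bytes."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return (page - 1) * block_size, block_size
```

Reading only the first few bytes keeps detection cheap on large files, and the `(offset, length)` pair maps directly onto a ranged read of the file.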
HBase
Goal
Make it easy to view and search HBase
HBase - Technical Risk
2 Dimensions
Infinitely many columns and rows
Sparseness
Column names will often differ per row
HBase - Communication
Thrift
Communicate with HBase using Thrift for better filtering
Explicit Client
Provide an API that is explicit
class HBaseApi(object):
    def createTable(self, cluster, tableName, ...):
        ...
    def getRows(self, cluster, tableName, columns, ...):
        ...
HBase - Results
Improved View
Intelligent view that collapses null cells
MIME Type Detection
Able to view documents in HBase: PDF, images, etc
Better Search
Improved searchability of HBase via flexible search
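Collapsing null cells is easy to picture with rows modelled as dictionaries whose keys differ per row. This is a minimal sketch under that assumption; `visible_columns` and `collapse_row` are hypothetical helper names, not part of the HBase browser.

```python
def visible_columns(rows):
    """Return, in first-seen order, the columns that hold a value in some row."""
    seen = []
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in seen:
                seen.append(column)
    return seen


def collapse_row(row):
    """Keep only the cells that actually hold a value."""
    return {column: value for column, value in row.items() if value is not None}


# Two sparse rows with different column names.
rows = [{'cf:a': '1', 'cf:b': None}, {'cf:c': '3'}]
print(visible_columns(rows))
print(collapse_row(rows[0]))
```

Rendering each row from its own collapsed cells, rather than from a fixed grid of every column, avoids a mostly empty table when rows rarely share columns.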
Hive
Goal
Make it easy to run queries in Hive
Hive - Communication
Thrift
Communicate with HiveServer2 using Thrift
DBMS
Further the capacities of the DBMS in Hue
Explicit Client
Provide a higher level API that is explicit and easy to configure
class HiveServerClient:
    HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}

    def __init__(self, query_server, user, ...):
        thrift_util.get_client(TCLIService.Client,
                               query_server['server_host'],
                               query_server['server_port'],
                               service_name=query_server['server_name'],
                               ...)

class HiveServer2Dbms(object):
    def get_databases(self):
        return self.client.get_databases()
    ...
    def select_star_from(self, database, table):
        hql = "SELECT * FROM `%s.%s` %s" % (database, table.name, self._get_browse_limit_clause(table))
        return self.execute_statement(hql)
    ...
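The two layers can be illustrated in isolation: a lookup from the configured authentication mode to a SASL mechanism, and a higher-level helper that hides query construction. The sketch below reuses the `HS2_MECHANISMS` mapping from the slide; `sasl_settings` and `select_star_hql` are hypothetical helpers, and quoting the database and table separately is an assumption that differs from the snippet above.

```python
HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}


def sasl_settings(auth_mode):
    """Return (use_sasl, mechanism) for a configured HiveServer2 auth mode."""
    mechanism = HS2_MECHANISMS[auth_mode.upper()]
    # Assumption: only the NOSASL mode bypasses the SASL transport.
    return mechanism != 'NOSASL', mechanism


def select_star_hql(database, table, limit=None):
    """Build a browse query; the optional LIMIT keeps previews small."""
    hql = 'SELECT * FROM `%s`.`%s`' % (database, table)
    if limit is not None:
        hql += ' LIMIT %d' % limit
    return hql


print(sasl_settings('kerberos'))
print(select_star_hql('default', 'sample_07', limit=100))
```

Keeping query building in the DBMS layer means the UI asks for "the first rows of this table" and never needs to know the HQL dialect or the transport details.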
Hive - Results
One Page App
Intelligent view that lets users focus on their queries
Navigation
Able to navigate databases and tables easily
Secure
Achieved some level of security through SASL, Kerberos, and SSL
DEMO TIME
Missed something?
GET STARTED
Take a closer look at REST and Thrift communication in Hue
The inner workings of the Filebrowser
The fundamentals of the HBase browser
The concepts behind the Beeswax app
What else does Hue do with Django?
Extensible settings
Configuration of settings.py provided through the hue.ini
Testing
Mocked and functional tests via nose + django-nose
Authentication
LDAP, PAM, OAuth, etc. provided through authentication backends
Security
Configurable session timeouts, SAML authentication, etc.
Doc Model
Polymorphic documents via a base document model
Permissions
Per-app permissions configurable in the UserAdmin
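As an illustration of the first item, an ini file can be translated into Django-style settings with the standard library. This is a simplified sketch, not Hue's configuration loader: `settings_from_ini` and the `HTTP_PORT` key are hypothetical, and the real hue.ini uses nested sections that `configparser` does not parse.

```python
import configparser

# A flat, simplified stand-in for a hue.ini fragment.
SAMPLE_INI = """
[desktop]
time_zone = America/Los_Angeles
http_port = 8888
"""


def settings_from_ini(text):
    """Translate a few ini options into Django-style setting names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    desktop = parser['desktop']
    return {
        'TIME_ZONE': desktop.get('time_zone', fallback='UTC'),
        'HTTP_PORT': desktop.getint('http_port', fallback=8000),
    }


print(settings_from_ini(SAMPLE_INI))
```

Driving settings.py from one ini file lets administrators change deployment options without editing Python code.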
GET HUE
TARBALL
Try the latest and greatest ahead of a release, but you’ll have to configure everything on your own.
CLOUDERA’S DEMO VM
Play with Hue and various Hadoop components within 5 minutes. It’s a self-contained, ready-to-use CDH environment.
CLOUDERA’S CDH
Stable, highly tested releases integrated with the Hadoop ecosystem and automatically configured by Cloudera Manager.
HORTONWORKS*
HDP ships an old forked version of Hue 2.3.
MAPR*
A newer version than HDP’s, close to the original 2.5 minus apps like HBase, Impala, Sqoop, and Search.
HP CLOUD*
The newest addition, shipping Hue 3.0 through the GreenButton products.
BIGTOP
EMBEDDED/DEMO IN IND. COMPANIES
* YOUR MILEAGE MAY VARY.
LINKS
WEBSITE
http://gethue.com
BLOG
http://blog.gethue.com
@gethue
USER GROUP
hue-user@
GITHUB
https://github.com/cloudera/hue/
THANKS.
gethue.com
QUESTIONS?