Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup

INTEGRATE HUE WITH YOUR HADOOP CLUSTER Romain Rigaux Y! HUG Apr 16, 2014



WHAT IS HUE?

WEB INTERFACE FOR MAKING HADOOP EASIER TO USE

A suite of apps for each Hadoop component, like Hive, Pig, Impala, Oozie, Solr, Sqoop2, HBase...

VIEW FROM 30K FEET

Hadoop, Web Server, You and even that friend that uses IE9 ;)

YARN

JobTracker

Oozie

Pig

HDFS

HiveServer2

Hive Metastore

Cloudera Impala

Solr

HBase

Sqoop2

Zookeeper

LDAP SAML

Hue Plugins

ECOSYSTEM AND APPS

TARGET OF HUE

GETTING STARTED WITH HADOOP

BEING PRODUCTIVE

EXPLORING DIFFERENT ANGLES OF THE PLATFORM

LET ANY USER FOCUS ON BIG DATA PROCESSING

BEING COMPATIBLE WITH ANY HADOOP VERSION (0.20/1.2.0/2.3.0)

OPEN SOURCE

~3000 COMMITS, 33 CONTRIBUTORS, 648 STARS, 212 FORKS

github.com/cloudera/hue

THE CORE TEAM PLAYERS

team.gethue.com

ABRAHAM ELMAHREK

ROMAIN RIGAUX

ENRICO BERTI

CHANG BEER

TALKS

Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna, San Jose, Singapore… Coming up in London and on the West Coast.

AROUND THE WORLD

RETREATS

Nov '13: Koh Chang, Thailand

May '14: Curaçao, Netherlands Antilles

FAST PACE

LAST 30 DAYS

41 issues created and 38 resolved, by the core team + community.

HISTORY

HUE 1

A desktop-like UI in the browser. It did its job but was pretty slow, had memory leaks and was not very IE-friendly; still, it was definitely advanced for its time (2009-2010).


HUE 2

The first port to a flat structure, with Twitter Bootstrap all over the place.


HUE 2.5

New apps and an improved UX, adding nice new features like autocomplete and drag & drop.


HUE 3 ALPHA

A proposed design that didn't make it.


HUE 3.5+

Where we are now: a new UI, several new apps and the most user-friendly features to date.

WHICH VERSION TO USE?

HUE 2.X: 1-2 YEARS OLD

HUE 3.X

6 MONTHS AND 1K COMMITS LATER: HUE 3.5 + 1/2 3.6

WHICH DISTRIBUTION?

GITHUB (HACKER): the very latest

TARBALL (ADVANCED USER): an advanced preview

CDH / CM (NORMAL USER): the most stable, checked across components

WHERE TO PUT HUE? ON ONE MACHINE

WHERE TO PUT HUE? INSIDE THE CLUSTER

WHERE TO PUT HUE? OUTSIDE THE CLUSTER

WHAT DO YOU NEED?

SERVER: Python 2.4 or 2.6. That's it if you use a packaged version; building from source requires a few extra packages.

CLIENT: Web browser, IE 9+, FF 10+, Chrome, Safari

WHAT DOES THE HUE SERVICE LOOK LIKE?

1 SERVER: a process serving pages and also static content

1 DB: for cookies, saved queries, workflows, …

HOW TO CONFIGURE HUE

HUE.INI

Similar to core-site.xml but with .INI syntax

Where?

/etc/hue/conf/hue.ini

or

$HUE_HOME/desktop/conf/pseudo-distributed.ini

[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
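hue.ini nests sections by repeating brackets ([desktop], [[database]], [[[default]]]), and lines starting with ## are commented-out defaults. To make that nesting concrete, here is a minimal, hypothetical parser `parse_nested_ini`. It is only an illustration under simplifying assumptions (no multi-line values, no type conversion, all values kept as strings); Hue's own loader, which to our knowledge builds on the configobj library, handles much more.

```python
import re

# Matches a section header such as [desktop], [[database]] or [[[default]]].
SECTION_RE = re.compile(r"^(\[+)\s*([^\[\]]+?)\s*(\]+)$")


def parse_nested_ini(text):
    """Parse hue.ini-style text into nested dicts (simplified sketch).

    Lines starting with '#' (including '##' commented-out defaults) are
    ignored, values stay raw strings, and nesting depth = number of '['.
    """
    root = {}
    stack = [root]  # stack[d] is the dict of the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SECTION_RE.match(line)
        if match:
            depth = len(match.group(1))
            # A header may open at most one level deeper than the current one.
            if depth != len(match.group(3)) or depth > len(stack):
                raise ValueError("malformed section header: %r" % line)
            del stack[depth:]  # close any sections at this depth or deeper
            section = stack[-1].setdefault(match.group(2), {})
            stack.append(section)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: %r" % line)
        stack[-1][key.strip()] = value.strip()
    return root
```

Applied to the [desktop] / [[database]] snippet above, this yields `{"desktop": {"database": {"engine": "sqlite3", "name": "desktop/desktop.db"}}}`; the commented host/port/user/password lines are skipped, mirroring how they fall back to defaults.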

AUTHENTICATE / LOGIN

[desktop]
  [[auth]]
    # - django.contrib.auth.backends.ModelBackend (entirely Django backend)
    # - desktop.auth.backend.AllowAllBackend (allows everyone)
    # - desktop.auth.backend.AllowFirstUserDjangoBackend
    # - desktop.auth.backend.LdapBackend
    # - desktop.auth.backend.OAuthBackend
    # ...
    ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend

USERS

Admins can grant and revoke permissions for single users or groups of users.

ADMIN USER

Regular user + permissions

DB BACKEND

LDAP BACKEND

Integrate your employees: see the LDAP how-to guide.

LIST OF GROUPS AND PERMISSIONS

A permission can:
- allow access to one app (e.g. the Hive Editor)
- allow modifying data from the app (e.g. dropping Hive tables or editing cells in HBase Browser)

CONFIGURE APPS AND PERMISSIONS

A list of permissions

PERMISSIONS IN ACTION

User ‘test’ belongs to the group ‘hiveonly’, which has only the ‘hive’ permissions.


HOW HUE INTERACTS WITH HADOOP

YARN

JobTracker

Oozie

Hue Plugins

LDAP SAML

Pig

HDFS

HiveServer2

Hive Metastore

Cloudera Impala

Solr

HBase

Sqoop2

Zookeeper

RPC CALLS TO ALL THE HADOOP COMPONENTS

HDFS EXAMPLE

WebHDFS REST calls to the NameNode (NN) and DataNodes (DN)

http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
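The URL above is a plain HTTP GET, so any client can reproduce what Hue's File Browser does. A minimal sketch using only the Python standard library follows; `webhdfs_url`, `parse_liststatus` and `list_dir` are hypothetical helper names, and `list_dir` assumes a reachable NameNode with WebHDFS enabled on the default port 50070.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(base, path, op, params=None):
    """Build a WebHDFS REST URL, e.g. base='http://localhost:50070'."""
    query = {"op": op}
    query.update(params or {})  # e.g. {"user.name": "hue", "doas": "bob"}
    return "%s/webhdfs/v1%s?%s" % (base.rstrip("/"), quote(path), urlencode(query))


def parse_liststatus(payload):
    """Return (name, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(payload)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_dir(base, path):
    # Network call: only works against a running NameNode with WebHDFS enabled.
    with urlopen(webhdfs_url(base, path, "LISTSTATUS")) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))
```

For example, `webhdfs_url("http://localhost:50070", "/tmp", "LISTSTATUS")` produces `http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS`; the optional `params` dict is where impersonation parameters such as `user.name` and `doas` would go.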

HOW

The host/port of every service API (Oozie, YARN, HDFS, HBase…) is specified in hue.ini: in one section per major service (e.g. [hbase]), in the Hue core section [desktop], or in a Hue lib section (e.g. [liboozie]).

[hbase]
  # Comma-separated list of HBase Thrift servers for
  # clusters in the format of '(name|host:port)'.
  hbase_clusters=(Cluster|localhost:9090)

[liboozie]
  # The URL where the Oozie service runs on.
  # oozie_url=http://hue.ent.cloudera.com:11000/oozie
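The [hbase] comment describes the value format of hbase_clusters: a comma-separated list of '(name|host:port)' entries. A small, hypothetical validator `parse_hbase_clusters` makes that format explicit; it is a sketch for checking a value before restarting Hue, not Hue's own parsing code, and it assumes names and hosts contain no commas, parentheses or pipes.

```python
import re

# One '(name|host:port)' entry, as described in the [hbase] comment.
CLUSTER_RE = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^:()]+):(\d+)\s*\)")


def parse_hbase_clusters(value):
    """Turn '(Cluster|localhost:9090),(Dev|h2:9091)' into (name, host, port) tuples."""
    clusters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = CLUSTER_RE.fullmatch(entry)
        if match is None:
            raise ValueError("expected '(name|host:port)', got %r" % entry)
        name, host, port = match.groups()
        clusters.append((name, host.strip(), int(port)))
    return clusters
```

With the sample above, `parse_hbase_clusters("(Cluster|localhost:9090)")` returns `[("Cluster", "localhost", 9090)]`: one HBase Thrift server named Cluster.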

RPC CALLS TO ALL THE HADOOP COMPONENTS

Full list

KERBEROS

1 Hue ticket/principal, no user tickets

Hue uses its ticket for authenticating to every other service (HDFS, Oozie, …)

Read more in the Hue Security Guide.

HUE KERBEROS TICKET

kadmin: addprinc -randkey hue/FQDN@REALM

Add the Hue principal to Kerberos

$ kinit -k -t /etc/hue/hue.keytab hue/FQDN@REALM

Test

The ticket should be renewable (configured in krb5.conf and kdc.conf).

[desktop]
  [[kerberos]]
    # Path to Hue's Kerberos keytab file
    hue_keytab=/etc/hue/hue.keytab
    # Kerberos principal name for Hue
    hue_principal=hue/FQDN@REALM
    # add kinit path for non root users
    kinit_path=/usr/kerberos/bin/kinit

hue.ini
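The three [[kerberos]] values map directly onto the keytab-based kinit test command shown earlier. As a sketch of that mapping (not Hue's own ticket-renewal code), a hypothetical `kinit_command` builds the argument list and `refresh_ticket` runs it non-interactively; running it assumes the MIT Kerberos `kinit` binary is present at `kinit_path`.

```python
import subprocess


def kinit_command(kinit_path, keytab, principal):
    """Build `kinit -k -t <keytab> <principal>` from the [[kerberos]] settings."""
    return [kinit_path, "-k", "-t", keytab, principal]


def refresh_ticket(kinit_path, keytab, principal):
    # Obtains a ticket from the keytab without a password prompt;
    # raises subprocess.CalledProcessError if kinit exits non-zero.
    subprocess.run(kinit_command(kinit_path, keytab, principal), check=True)
```

Passing the argv as a list (rather than one shell string) avoids quoting problems with principals containing special characters such as @ or /.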

HOW

Hue is a “super proxy”: the client could be on a Windows machine, a phone… and still interact with all the Hadoop services.

http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob

IMPERSONATION

<!-- Hue WebHDFS proxy user setting -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

Call for getting information about an HDFS file (as user bob, through hue)

For WebHDFS, add to core-site.xml:
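The two proxyuser properties decide whether hue may act as another user: the request must come from an allowed host, and the impersonated user (bob in the doas example) must belong to an allowed group, with * matching anything. The hypothetical `proxyuser_allows` below is a deliberately simplified model of that rule to show why * / * lets Hue impersonate everyone; real Hadoop also resolves IP addresses and ranges, which this sketch ignores (it does exact string matching on host names).

```python
def proxyuser_allows(conf, superuser, remote_host, target_groups):
    """Simplified model of hadoop.proxyuser.<user>.hosts / .groups checks.

    conf: core-site.xml properties as a dict (name -> value).
    target_groups: groups of the user being impersonated (e.g. bob's groups).
    '*' matches anything; otherwise values are comma-separated lists.
    """
    def allowed(suffix, candidates):
        value = conf.get("hadoop.proxyuser.%s.%s" % (superuser, suffix))
        if value is None:
            return False  # no rule configured: impersonation is refused
        entries = {v.strip() for v in value.split(",") if v.strip()}
        return "*" in entries or bool(entries & set(candidates))

    return allowed("hosts", [remote_host]) and allowed("groups", target_groups)
```

With the snippet above (`*` for both properties), any request from hue passes; narrowing `hadoop.proxyuser.hue.hosts` to the Hue server's host name is a common way to restrict it.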

HTTPS (SSL), DB SSL, SSL WITH HIVESERVER2

READ MORE … AUDITING

OTHER SECURITY FEATURES

2 Hue instances, HA proxy, multi DB. Performance: like a website, mostly RPC calls.

HIGH AVAILABILITY

HOW

DEMO TIME

CONFIGURATIONS ARE HARD…

…GIVE CLOUDERA MANAGER A TRY!

vimeo.com/91805055

MISSED SOMETHING?

learn.gethue.com

LINKS

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

GET HUE

CLOUDERA'S CDH: Stable and highly tested releases, perfectly integrated with the Hadoop ecosystem and automagically configured by Cloudera Manager.

TARBALL: Try the latest and greatest in advance, but you'll have to configure everything on your own.

CLOUDERA'S DEMO VM: Get to play with Hue and various Hadoop components in 5 minutes. It's a self-contained CDH environment, ready to use.

HORTONWORKS*: HDP ships an old, forked version of Hue 2.3.

MAPR*: A newer version than HDP's, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.

HP CLOUD*: The newest addition, ships Hue 3.0 through the GreenButton products.

BIGTOP

EMBEDDED/DEMO IN IND. COMPANIES

* YOUR MILEAGE MAY VARY.

WHAT ARE YOUR USE CASES?

WHICH COMPONENTS DO YOU USE?

WHAT WOULD YOU LIKE TO SEE IN HUE?

INTERESTED IN CONTRIBUTING? WANNA SAY HELLO? DO YOU WANT A TAILOR-MADE TEAM RETREAT?

QUESTIONS?

TEAM@GETHUE.COM

THANK YOU!

gethue.com

APPENDIX

HOW

Add Hue as a WebHDFS proxy user (same setting as in IMPERSONATION above). Add the property below to hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

HDFS FILE BROWSER

[hadoop]
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      ##fs_defaultfs=hdfs://localhost:8020
      # Use WebHdfs/HttpFs as the communication mechanism.
      ##webhdfs_url=http://localhost:50070/webhdfs/v1

hdfs-site.xml

hue.ini
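In the sample, fs_defaultfs (the RPC address, port 8020) and webhdfs_url (the HTTP address, port 50070) point at the same NameNode host. The hypothetical helper `default_webhdfs_url` derives the second from the first, which can help spot a mismatch between the two settings. It assumes the NameNode HTTP port is the 50070 default from the sample; with HttpFs (the HA option mentioned in the config), webhdfs_url must instead point at the HttpFs server explicitly.

```python
from urllib.parse import urlsplit


def default_webhdfs_url(fs_defaultfs, http_port=50070):
    """Derive a WebHDFS endpoint from fs_defaultfs.

    Assumes the NameNode serves WebHDFS on http_port (50070 by default here);
    HttpFs or HA setups should set webhdfs_url explicitly instead.
    """
    parts = urlsplit(fs_defaultfs)
    if parts.scheme != "hdfs" or not parts.hostname:
        raise ValueError("expected hdfs://host:port, got %r" % fs_defaultfs)
    return "http://%s:%d/webhdfs/v1" % (parts.hostname, http_port)
```

For the sample values, `default_webhdfs_url("hdfs://localhost:8020")` gives back `http://localhost:50070/webhdfs/v1`, matching the commented default.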

HOW

Example config for having Hue interact with YARN

[hadoop]
  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost
      # The port where the ResourceManager IPC listens on
      ## resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True
      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false
      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088
      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088
      # URL of the HistoryServer API
      # history_server_api_url=http://localhost:19888
    [[[ha]]]
      # Enter the host on which you are running the failover Resource Manager
      resourcemanager_api_url=http://localhost:8088
      ## logical_name=
      submit_to=True

YARN / MR2
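resourcemanager_api_url points at the ResourceManager REST API, which is what a job browser reads to list applications. The sketch below queries the Cluster Applications endpoint (/ws/v1/cluster/apps) and, as a naive illustration of the [[[default]]] plus [[[ha]]] pair, tries each configured RM URL in order. The helper names are hypothetical, and this is not how Hue itself implements RM failover.

```python
import json
from urllib.request import urlopen


def apps_url(resourcemanager_api_url):
    # Cluster Applications API of the ResourceManager REST interface.
    return resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/apps"


def parse_apps(payload):
    """Return (id, user, state) for each app in a /ws/v1/cluster/apps response."""
    apps = json.loads(payload).get("apps") or {}  # "apps" can be null when empty
    return [(a["id"], a["user"], a["state"]) for a in apps.get("app", [])]


def first_reachable(rm_urls, timeout=5):
    """Try the [[[default]]] RM, then the [[[ha]]] one (naive failover)."""
    for url in rm_urls:
        try:
            with urlopen(apps_url(url), timeout=timeout) as resp:
                return url, parse_apps(resp.read().decode("utf-8"))
        except OSError:  # URLError, timeouts and connection errors
            continue
    raise RuntimeError("no ResourceManager reachable")
```

Usage would look like `first_reachable(["http://localhost:8088", "http://rm2:8088"])`, where the second host name is a placeholder for the failover ResourceManager.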

HOW

Based on the HiveServer2 interface

Note for Hive:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

Video demo, Setup tutorial

[beeswax]
  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  ## hive_server_host=localhost
  ## hive_server_port=10000
  # Hive configuration directory, where hive-site.xml is located
  ## hive_conf_dir=/etc/hive/conf

HIVE (IMPALA / SHARK)
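A quick way to check that the HiveServer2 configured in [beeswax] is up, independently of Hue, is to connect with Beeline using a HiveServer2 JDBC URL. The hypothetical helper `hive_jdbc_url` assembles that URL from the same host/port defaults (localhost:10000); the optional `principal` is the HiveServer2 service principal on a Kerberized cluster, where, as the config comment says, the host should be the FQDN.

```python
def hive_jdbc_url(host="localhost", port=10000, database="default", principal=None):
    """Build a HiveServer2 JDBC URL for `beeline -u` from [beeswax]-style values."""
    url = "jdbc:hive2://%s:%d/%s" % (host, port, database)
    if principal:
        # Kerberos: append the HiveServer2 service principal.
        url += ";principal=" + principal
    return url
```

The result can then be passed as `beeline -u "jdbc:hive2://localhost:10000/default"`; if Beeline cannot connect, Hue's Hive editor will not either.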

HOW

Make sure the Oozie share lib is installed.

Alternative Dashboard and Editors

[liboozie]
  #oozie_url=http://localhost:11000/oozie

OOZIE
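Before pointing Hue at oozie_url, one can check that the server answers its Web Services API. This sketch calls the admin status endpoint (/v1/admin/status), which reports the system mode (NORMAL when the server accepts jobs). The helper names are hypothetical, and `check_oozie` assumes a running Oozie server at the given URL.

```python
import json
from urllib.request import urlopen


def oozie_status_url(oozie_url):
    # Oozie Web Services API: system status of the server.
    return oozie_url.rstrip("/") + "/v1/admin/status"


def is_oozie_normal(payload):
    """True if a /v1/admin/status response reports systemMode NORMAL."""
    return json.loads(payload).get("systemMode") == "NORMAL"


def check_oozie(oozie_url, timeout=5):
    # Network call: needs a running Oozie server at oozie_url.
    with urlopen(oozie_status_url(oozie_url), timeout=timeout) as resp:
        return is_oozie_normal(resp.read().decode("utf-8"))
```

For the default above, the status URL is `http://localhost:11000/oozie/v1/admin/status`.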

HOW

Comes with Oozie (no PigServer yet). Needs the Oozie sharelib, and Oozie credentials for security.

PIG
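Since Pig scripts run as Oozie actions, it helps to see what such a workflow looks like. Below is a minimal, hand-written sketch (not the XML Hue generates) that builds a one-action workflow.xml with Python's ElementTree, using the workflow 0.4 schema and the usual `${jobTracker}` / `${nameNode}` parameters. It assumes the Pig jars come from the sharelib, typically enabled with `oozie.use.system.libpath=true` in job.properties.

```python
import xml.etree.ElementTree as ET


def pig_workflow(name, script, job_tracker="${jobTracker}", name_node="${nameNode}"):
    """Return a minimal Oozie workflow.xml (as a string) running one Pig script."""
    app = ET.Element("workflow-app", {"xmlns": "uri:oozie:workflow:0.4", "name": name})
    ET.SubElement(app, "start", {"to": "pig-node"})

    # Single Pig action; on success go to "end", on failure to "fail".
    action = ET.SubElement(app, "action", {"name": "pig-node"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = job_tracker
    ET.SubElement(pig, "name-node").text = name_node
    ET.SubElement(pig, "script").text = script
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Pig failed: ${wf:errorMessage(wf:lastErrorNode())}"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")
```

`pig_workflow("demo", "script.pig")` produces a workflow whose only action runs script.pig from the workflow's HDFS directory; on a secured cluster, the Oozie credentials mentioned above would be added to the action.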