Revenue Earned From Students in USA

Posted on 21-Jan-2018

Jongwook Woo

HiPIC

CSULA

Revenue & Employment Analysis of International Students in the USA

A CIS 528 Project by:

Priyanka Kale, Apekshit Bhingardive, Aditya Verma, Prof. Jongwook Woo

High Performance Information Computing Center, Jongwook Woo

CSULA

Contents

Introduction

System Development Cycle

Requirement Analysis

Design

Implementation

Results/Visualization

References

Purpose

To develop a system that helps determine the revenue generated by international students.

To examine the relationship between new international enrollments and institutional income at public colleges, universities, and professional organizations in the US.

Adherence to SDLC

Why?

To understand the effects of increased international student enrollment on net revenue generation in the US

To find out the income universities earn from international students

To predict the impact of international students on revenue generation

To predict employment opportunities in the US

How?

• Basic formula for calculating economic benefit

How?

Estimate the economic benefit: the total dollars imported through international students, without any multiplier effect.

Determine the direct import dollars attributable to international students studying at U.S. institutions of higher education.

Continued...

The analysis is specific to each institution's student expenses and to the type of student (i.e., undergraduate, graduate, or non-degree) reported by each institution.

The analysis is broken down into the tuition and fees at each institution and a derived living expense, based on the institution's reported living expenses plus estimated incidentals.
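The breakdown above amounts to summing, over every institution and student type, the enrollment multiplied by (tuition and fees + reported living expense + estimated incidentals), with no multiplier effect. The sketch below expresses that arithmetic under stated assumptions: the `Cohort` record, its field names, and the sample dollar figures are hypothetical illustrations, not the project's code or data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cohort:
    """Hypothetical record: one institution and one student type."""
    institution: str
    student_type: str        # e.g. "undergraduate", "graduate", "non-degree"
    enrolled: int            # international students of this type
    tuition_and_fees: float  # per student, USD
    living_expense: float    # reported institutional living expense, USD
    incidentals: float       # estimated incidentals per student, USD


def cohort_benefit(c: Cohort) -> float:
    """Direct import dollars for one cohort, with no multiplier effect."""
    derived_living = c.living_expense + c.incidentals
    return c.enrolled * (c.tuition_and_fees + derived_living)


def economic_benefit(cohorts: Iterable[Cohort]) -> float:
    """Sum the direct import dollars over every institution and student type."""
    return sum(cohort_benefit(c) for c in cohorts)


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the project.
    sample = [
        Cohort("Example University", "graduate", 100, 20_000.0, 12_000.0, 1_500.0),
        Cohort("Example University", "undergraduate", 50, 25_000.0, 12_000.0, 1_500.0),
    ]
    print(f"${economic_benefit(sample):,.2f}")  # prints $5,275,000.00
```

Keeping the per-cohort function separate makes it easy to report the fee and living-expense components by student type as well as the overall total.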

Implementation

Analysis of a large data set is required, which is done using:

Hadoop Distributed File System (HDFS)

Hadoop environment using Hortonworks Sandbox on Azure

Python and Hive (PyHive) in an IPython Notebook

HUE

Google Fusion Tables

WEKA framework

GitLab and GitHub

Hortonworks Sandbox configuration

Number of nodes: 3; Size: Basic A4 (8 cores, 14 GB memory)

Loading data into HDFS:
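The screenshot for this step is not part of the transcript. As a minimal sketch, assuming the standard `hdfs dfs` shell is available on the sandbox node, loading a file amounts to creating the target directory and copying the file in; the function names, file name and HDFS path below are hypothetical.

```python
import subprocess
from typing import List


def hdfs_upload_commands(local_file: str, hdfs_dir: str) -> List[List[str]]:
    """Build the HDFS shell commands that create the target directory and copy the file."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],   # -p: create parents, no error if it exists
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],
    ]


def upload_to_hdfs(local_file: str, hdfs_dir: str) -> None:
    """Run the commands on a node that has the Hadoop client installed."""
    for cmd in hdfs_upload_commands(local_file, hdfs_dir):
        # check=True raises CalledProcessError if a command fails
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    for cmd in hdfs_upload_commands("ipeds_data.csv", "/user/example/project528"):
        print(" ".join(cmd))
```

Building the command lists separately from running them keeps the upload easy to inspect before anything touches the cluster.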

Creating tables from the command line
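The actual table definitions are not shown in the transcript. A common pattern for comma-separated files already in HDFS is a Hive external table; the sketch below generates such a `CREATE EXTERNAL TABLE` statement from Python. The function name, table name, columns and location in the usage are hypothetical, and the header-skipping table property assumes the CSV files carry one header line.

```python
from typing import List, Tuple


def external_csv_table_ddl(table: str, columns: List[Tuple[str, str]], location: str,
                           skip_header: bool = True) -> str:
    """Build a HiveQL CREATE EXTERNAL TABLE statement for comma-separated files in HDFS."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
    if skip_header:
        # Hive skips the first line of each data file when this table property is set
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl + ";"


if __name__ == "__main__":
    # Hypothetical table layout, for illustration only.
    print(external_csv_table_ddl(
        "enrollment",
        [("state", "STRING"), ("total_fees", "DOUBLE")],
        "/user/example/project528",
    ))
```

The printed statement can be pasted into the Hive CLI or saved to a `.sql` file; an external table leaves the HDFS files in place if the table is later dropped.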

Creating tables in HUE from existing data

Continued...

Connecting HIVE through Python

• Using an IPython notebook to write the Python code

• Embedding HiveQL inside the Python code
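With PyHive, HiveQL is embedded in Python by opening a HiveServer2 connection and passing query strings to a DB-API cursor. The sketch below assumes PyHive is installed (`pip install 'pyhive[hive]'`) and that HiveServer2 listens on its usual default port 10000; the host, username and helper names are hypothetical. Because `run_hiveql` only relies on the DB-API 2.0 cursor interface, it can also be exercised against any other DB-API driver.

```python
from typing import Any, List, Sequence, Tuple


def run_hiveql(cursor: Any, sql: str) -> Tuple[List[str], List[Sequence[Any]]]:
    """Execute one HiveQL statement on a DB-API cursor and return (column names, rows)."""
    cursor.execute(sql)
    if cursor.description is None:
        return [], []  # DDL and other statements that produce no result set
    columns = [d[0] for d in cursor.description]
    return columns, list(cursor.fetchall())


def connect_to_hive(host: str = "localhost", port: int = 10000, username: str = "hive"):
    """Open a HiveServer2 connection with PyHive."""
    from pyhive import hive  # imported lazily so run_hiveql works without PyHive
    return hive.Connection(host=host, port=port, username=username)
```

In a notebook cell, `cols, rows = run_hiveql(connect_to_hive().cursor(), "SELECT ...")` would return the header and rows of a query, ready for further processing in Python.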

Executing Hive commands through a script:

Example: Input.sql
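The contents of Input.sql are not reproduced in this transcript. As a hedged sketch, a script like it could be split into single statements so that each one can be sent through a DB-API cursor such as PyHive's, one `execute` call per statement. The function names are hypothetical; the splitter assumes full-line `--` comments and `;` terminators, and it does not handle semicolons inside string literals.

```python
from typing import List


def split_hiveql_script(text: str) -> List[str]:
    """Split a HiveQL script into statements (naive: full-line comments, ';' terminators)."""
    # Drop full-line '--' comments before splitting.
    body = "\n".join(line for line in text.splitlines() if not line.strip().startswith("--"))
    # Split on ';' and discard empty fragments such as the one after the last statement.
    return [stmt.strip() for stmt in body.split(";") if stmt.strip()]


def read_hiveql_script(path: str) -> List[str]:
    """Read a script file such as Input.sql and return its statements."""
    with open(path, encoding="utf-8") as f:
        return split_hiveql_script(f.read())
```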

Executing the Hive script from Python code:
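One way to run the whole script from Python is to call the Hive CLI with its `-f` option through `subprocess`. This is a minimal sketch assuming the `hive` command is on the PATH of the sandbox node; the function names are hypothetical, and the project's own code may differ.

```python
import subprocess
from typing import List


def hive_script_command(script_path: str) -> List[str]:
    """Command line for running a HiveQL file with the Hive CLI."""
    return ["hive", "-f", script_path]


def run_hive_script(script_path: str) -> str:
    """Run the script and return what Hive prints to standard output."""
    result = subprocess.run(
        hive_script_command(script_path),
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `output = run_hive_script("Input.sql")` captures the query results as text. Where the Hive CLI is not available, Beeline accepts a similar `-f` option together with a JDBC URL passed via `-u`.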

Attempt at using HiveQL in Spark (future prospect)

Executing Hive queries using Spark:
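Since Spark 2.0, a `SparkSession` built with `enableHiveSupport()` can run HiveQL through `spark.sql()` against tables registered in the Hive metastore. The sketch below assumes PySpark is installed and that Hive's `hive-site.xml` has been placed in Spark's `conf/` directory; the function and application names are hypothetical.

```python
def hive_query_with_spark(query: str, app_name: str = "project528-hive"):
    """Run a HiveQL query through Spark SQL and return the result as a Spark DataFrame."""
    from pyspark.sql import SparkSession  # imported lazily; only needed when called

    spark = (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # lets spark.sql() read tables from the Hive metastore
        .getOrCreate()
    )
    return spark.sql(query)
```

Calling `.show()` on the returned DataFrame prints the first rows, and `.toPandas()` brings a small result into the notebook for charting.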

Fetching output into a CSV file for further visualization

The generated CSV can be used directly for visualization.
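A minimal sketch of this step, assuming the query result is already available in Python as a list of column names and a list of rows (for example from a DB-API cursor); the function names and file path are hypothetical. The standard `csv` module handles quoting, so values such as "New York, NY" stay in one column.

```python
import csv
import io
from typing import Any, Iterable, Sequence


def rows_to_csv_text(columns: Sequence[str], rows: Iterable[Sequence[Any]]) -> str:
    """Render a header row plus data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma-separated, '\r\n' line endings
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(path: str, columns: Sequence[str], rows: Iterable[Sequence[Any]]) -> None:
    """Write query output to a CSV file that charting tools can import directly."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv_text(columns, rows))
```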

Visualizing data with Graphs

[Chart: Total earnings from fees, in billions of US dollars]

Visualizing data with Graphs

[Chart: Total earnings from other expenses, in millions of US dollars]

Visualizing data with Graphs

[Chart: Total earnings, in billions of US dollars: from fees $200.25; from other expenses $0.14]

Major earning states

[Chart: Percentage of total income: California 9.55%, New York 10.84%, Pennsylvania 7.36%]
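The percentages in this chart are each state's share of the total income. A minimal sketch of that arithmetic, assuming the per-state income totals are already available as a dictionary (hypothetical input; no project figures are reproduced here):

```python
from typing import Dict


def percentage_of_total(income_by_state: Dict[str, float]) -> Dict[str, float]:
    """Each state's share of the combined income, as a percentage rounded to 2 decimals."""
    total = sum(income_by_state.values())
    if total == 0:
        raise ValueError("total income is zero; shares are undefined")
    return {state: round(100.0 * income / total, 2) for state, income in income_by_state.items()}
```

Sorting the result, e.g. `sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]`, would list the top earning states.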

Supervised Learning using Classification:

The WEKA framework has been used to classify the states by their total earnings.

The UserClassifier algorithm provided by WEKA has been used to generate the classification graph shown next.

The final output of the Hive script executed in Python was processed using this algorithm.

Classification

The class colors divide the states into categories; for instance, New York lies in the orange zone, being among the top revenue-generating states.

Visualizing data in Google Fusion Tables

Employment Analysis: How?

• Finding where international students work after graduation

• Based on the number of students employed in current and past years

• Number of employers hiring international students in every field of graduate study [job positions]
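The two counts described above can be sketched with the standard library, assuming employment data has been reduced to hypothetical `(graduation_year, field_of_study, employer)` records; the record layout and function names are assumptions, not the project's schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (graduation_year, field_of_study, employer)
Record = Tuple[int, str, str]


def hires_per_year(records: Iterable[Record]) -> Counter:
    """Number of employed international graduates per year."""
    return Counter(year for year, _field, _employer in records)


def employers_per_field(records: Iterable[Record]) -> Dict[str, int]:
    """Number of distinct employers hiring international graduates in each field."""
    employers: Dict[str, Set[str]] = defaultdict(set)
    for _year, field, employer in records:
        employers[field].add(employer)
    return {field: len(names) for field, names in employers.items()}
```

Counting distinct employers with a set avoids counting the same company twice when it hires several graduates in one field.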

Files on GitHub

Coming Next...

Predict future income and revenue patterns and, from them, the different types of employment opportunities in the USA.

References:

• https://nces.ed.gov/ipeds/datacenter/

• https://github.com/priya708/Project-528

• https://gitlab.com/Addylad/Project528BigData/tree/47b3e6469bff4e9b7cbe0d743cb8ad9520dbb786/DataSource

• https://cwiki.apache.org/confluence/display/Hive/Tutorial

• https://hortonworks.com/tutorials

• http://www.nafsa.org/

Thank You!