Revenue Earned From Students in USA

Page 1: Revenue Earned From Students in USA

Jongwook Woo

HiPIC

CSULA

Revenue & Employment Analysis of International Students in USA

A CIS 528 Project by:

Priyanka Kale, Apekshit Bhingardive, Aditya Verma, Prof. Jongwook Woo

Page 2: Revenue Earned From Students in USA

High Performance Information Computing Center, Jongwook Woo

CSULA

Content

Introduction

System Development Cycle

Requirement Analysis

Design

Implementation

Results/Visualization

References

Page 3: Revenue Earned From Students in USA

Purpose

To develop a system that helps determine the revenue generated by international students.

To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.

Page 4: Revenue Earned From Students in USA

Adherence to SDLC

Page 5: Revenue Earned From Students in USA

WHY ?

To understand the effects of increased international student enrollment on net revenue generation in the US

To find out the income from universities

To predict the impact of international students on revenue generation

To predict employment opportunities in the US

Page 6: Revenue Earned From Students in USA

How ?

• Basic formula for calculating economic benefit

Page 7: Revenue Earned From Students in USA

How ?

Estimate of economic benefit: the overall dollars imported by international students, without any multiplier effect

Determine the appropriate direct import dollars from international students studying at U.S. institutions of higher education

Page 8: Revenue Earned From Students in USA

Continued..

The analysis is specific to each institution’s student expenses and the type of student (i.e. undergraduate, graduate, non-degree) reported by each institution.

The analysis is broken down into the tuition and fees at specific institutions and a derived living expense based on the reported institutional living expenses plus estimated incidentals.
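
The estimate described on these slides can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the formula slide itself is not reproduced in this transcript, so the record fields, the function name and the sample figures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EnrollmentRecord:
    """One institution / student-type row (all field names are hypothetical)."""
    institution: str
    student_type: str        # e.g. "undergraduate", "graduate", "non-degree"
    students: int            # number of international students
    tuition_and_fees: float  # per-student tuition and fees, in dollars
    living_expenses: float   # institution-reported living expenses, per student
    incidentals: float       # estimated incidentals, per student


def economic_benefit(records):
    """Sum the direct imported dollars, with no multiplier effect applied."""
    total = 0.0
    for r in records:
        per_student = r.tuition_and_fees + r.living_expenses + r.incidentals
        total += r.students * per_student
    return total


# Made-up numbers, for illustration only.
sample = [
    EnrollmentRecord("Example University", "graduate", 100, 20000.0, 12000.0, 1000.0),
    EnrollmentRecord("Example University", "undergraduate", 50, 25000.0, 12000.0, 1000.0),
]
print(economic_benefit(sample))
```

Keeping one record per institution and student type mirrors the breakdown described above, so the totals can later be grouped by state or by student type.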

Page 9: Revenue Earned From Students in USA

Implementation

Analysis of large data sets is required; it is done on the Hadoop Distributed File System (HDFS)

Hadoop environment using the Hortonworks Sandbox on Azure

Python and Hive [PyHive] in an IPython Notebook

HUE

Google Fusion Tables

WEKA framework

GitLab and GitHub

Page 10: Revenue Earned From Students in USA

Hortonworks Sandbox configuration

Number of nodes: 3

Size: Basic A4 with 8 cores and 14 GB memory

Page 11: Revenue Earned From Students in USA

Loading data into HDFS:
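
The screenshot for this step is not part of the transcript. As a rough sketch, loading a file could look like the following when driven from Python. The file name and the HDFS directory are hypothetical, and the commands only run on a machine where the `hdfs` client is installed.

```python
import subprocess


def hdfs_load_commands(local_file, hdfs_dir):
    """Build the HDFS shell commands that create a directory and upload a file."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],
    ]


def load_into_hdfs(local_file, hdfs_dir):
    """Run the commands in order; requires the `hdfs` client on the PATH."""
    for cmd in hdfs_load_commands(local_file, hdfs_dir):
        subprocess.run(cmd, check=True)


# Hypothetical file and directory names, printed rather than executed.
for cmd in hdfs_load_commands("tuition.csv", "/user/hue/project528"):
    print(" ".join(cmd))
```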

Page 12: Revenue Earned From Students in USA

Creating Tables from command Line
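
The table definition used in the project is not shown in the transcript. The sketch below builds a plausible HiveQL statement for an external table over comma-delimited files. The table name, the columns and the HDFS location are assumptions, not the project's schema.

```python
def create_table_statement(table, columns, hdfs_location):
    """Return HiveQL for an external table over comma-delimited text files."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'"
    )


# Hypothetical schema; the project's real columns are not shown in the slides.
ddl = create_table_statement(
    "tuition",
    [("state", "STRING"), ("institution", "STRING"), ("fees", "DOUBLE")],
    "/user/hue/project528",
)
print(ddl)
# From a shell, the statement could then be passed to the Hive CLI with: hive -e "<statement>"
```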

Page 13: Revenue Earned From Students in USA

Creating tables in HUE from existing data

Page 14: Revenue Earned From Students in USA

Continued..

Page 15: Revenue Earned From Students in USA

Connecting HIVE through Python

• Using IPython Notebook for writing the Python code

• Embedding HiveQL inside Python code
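
A minimal sketch of embedding HiveQL in Python through PyHive is given below. It assumes a HiveServer2 instance reachable on port 10000. The host, user name, table and column names are hypothetical, and the PyHive import is deferred so that the query-building part runs without the package installed.

```python
def revenue_by_state_query(table):
    """HiveQL embedded as a Python string (table and column names are hypothetical)."""
    return (
        f"SELECT state, SUM(fees) AS total_fees "
        f"FROM {table} GROUP BY state ORDER BY total_fees DESC"
    )


def run_query(query, host="localhost", port=10000, username="hive"):
    """Run a query on HiveServer2 through PyHive and return all rows."""
    from pyhive import hive  # third-party package: pip install 'pyhive[hive]'

    conn = hive.Connection(host=host, port=port, username=username)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()


print(revenue_by_state_query("tuition"))
# In a notebook cell connected to the sandbox:
# rows = run_query(revenue_by_state_query("tuition"))
```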

Page 16: Revenue Earned From Students in USA

Executing HIVE commands through script:

Example: Input.sql

Page 17: Revenue Earned From Students in USA

Executing the Hive script from Python code:
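
The original code for this step is not included in the transcript. One common way to do it, sketched here under the assumption that the Hive CLI is on the PATH, is to call `hive -f` through `subprocess` and capture its output. The function names are illustrative.

```python
import subprocess


def hive_script_command(script_path):
    """Build the command line that runs a HiveQL script file with the Hive CLI."""
    return ["hive", "-f", str(script_path)]


def run_hive_script(script_path):
    """Run the script and return its standard output; requires `hive` on the PATH."""
    result = subprocess.run(
        hive_script_command(script_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Printed rather than executed, so the sketch runs without a Hive installation.
print(hive_script_command("Input.sql"))
```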

Page 18: Revenue Earned From Students in USA

Attempt at using HiveQL in Spark (future prospect)

Executing Hive queries using Spark:
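
The Spark code from this slide is not in the transcript. The sketch below shows one way to run a Hive query from PySpark, assuming Spark 2.0 or later with Hive support available. The application name, table and column names are hypothetical, and the PySpark import is deferred so the query helper runs on its own.

```python
def top_states_query(table, limit):
    """Build a HiveQL query for the highest-earning states (hypothetical columns)."""
    return (
        f"SELECT state, SUM(fees) AS total_fees FROM {table} "
        f"GROUP BY state ORDER BY total_fees DESC LIMIT {int(limit)}"
    )


def run_with_spark(query):
    """Execute a Hive query through a Hive-enabled SparkSession and collect the rows."""
    from pyspark.sql import SparkSession  # third-party package: pip install pyspark

    spark = (
        SparkSession.builder
        .appName("project528-hive-on-spark")
        .enableHiveSupport()
        .getOrCreate()
    )
    try:
        return spark.sql(query).collect()
    finally:
        spark.stop()


print(top_states_query("tuition", 3))
```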

Page 19: Revenue Earned From Students in USA

Fetching output into a CSV file for further visualization

The generated CSV can be used directly for visualization
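
As an illustration of this step, the sketch below writes query result rows to a CSV file with Python's standard `csv` module. The column names, the output file name and the sample rows are made up.

```python
import csv
import os
import tempfile


def write_rows_to_csv(path, header, rows):
    """Write a header line and the result rows to a CSV file; return the row count."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Made-up rows standing in for a Hive result set.
rows = [("StateA", 1500.0), ("StateB", 900.5)]
out_path = os.path.join(tempfile.gettempdir(), "revenue_by_state.csv")
count = write_rows_to_csv(out_path, ["state", "total_fees"], rows)
print(count)
```

A file written this way has one header line followed by one line per state, which spreadsheet and charting tools can import directly.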

Page 20: Revenue Earned From Students in USA

Visualizing data with Graphs

[Chart: total earning from fees; axis from $0.00 to $25.00, in billions]

Page 21: Revenue Earned From Students in USA

Visualizing data with Graphs

[Chart: total earning from other expenses; axis from $0.00 to $14.00, in millions]

Page 22: Revenue Earned From Students in USA

Visualizing data with Graphs

[Chart: total, in billions; axis from $0.00 to $200.00]

Total earning from fees: $200.25

Total earning from other expenses: $0.14

Page 23: Revenue Earned From Students in USA

Major earning states

California, 9.55%

New York, 10.84%

Pennsylvania, 7.36%

PERCENTAGE OF TOTAL INCOME
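
Percentages of this kind are each state's income divided by the total across all states. A minimal sketch of that arithmetic follows. The state names and amounts are placeholders chosen for easy arithmetic and are not the project's data.

```python
def percentage_shares(income_by_state):
    """Return each state's income as a percentage of the total, rounded to 2 places."""
    total = sum(income_by_state.values())
    return {
        state: round(100.0 * income / total, 2)
        for state, income in income_by_state.items()
    }


# Placeholder values; they are not the project's data.
shares = percentage_shares({"StateA": 50.0, "StateB": 30.0, "StateC": 20.0})
print(shares)
```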

Page 24: Revenue Earned From Students in USA

Supervised Learning using Classification:

The WEKA framework has been used to classify the states according to their total earnings.

The UserClassifier algorithm provided by WEKA has been used to generate the classification graph.

The final output of the Hive script executed in Python has been processed using this algorithm.
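
To illustrate the idea of grouping states by total earnings, the sketch below uses a simple hand-written threshold rule. It is a stand-in for illustration only and is not WEKA's UserClassifier. The cut-offs, labels and totals are placeholders.

```python
def classify_by_earnings(totals, thresholds):
    """Assign each state the label of the first threshold its total reaches.

    `thresholds` is a list of (minimum_total, label) pairs sorted from the
    highest minimum to the lowest.
    """
    labels = {}
    for state, total in totals.items():
        for minimum, label in thresholds:
            if total >= minimum:
                labels[state] = label
                break
    return labels


# Placeholder totals and cut-offs, for illustration only.
totals = {"StateA": 20.0, "StateB": 7.5, "StateC": 0.4}
thresholds = [(10.0, "high"), (1.0, "medium"), (0.0, "low")]
print(classify_by_earnings(totals, thresholds))
```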

Page 25: Revenue Earned From Students in USA

Classification

The class color differentiates the states into categories. For instance, New York lies in the orange zone, as it is among the top revenue-generating states.

Page 26: Revenue Earned From Students in USA

Visualizing data in Google Fusion Tables

Page 27: Revenue Earned From Students in USA

Employment Analysis – How ?

• Finding data on where international students work after their graduation

• Based on the number of students employed in current and past years

• Number of employers hiring international students in every field of graduate study [job positions]

Page 28: Revenue Earned From Students in USA

Files on GitHub

Page 29: Revenue Earned From Students in USA

COMING NEXT…….

Predict future income and revenue patterns and therefore the different types of employment opportunities in the U.S.A.

Page 30: Revenue Earned From Students in USA

References :

• https://nces.ed.gov/ipeds/datacenter/

• https://github.com/priya708/Project-528

• https://gitlab.com/Addylad/Project528BigData/tree/47b3e6469bff4e9b7cbe0d743cb8ad9520dbb786/DataSource

• https://cwiki.apache.org/confluence/display/Hive/Tutorial

• https://hortonworks.com/tutorials

• http://www.nafsa.org/

Page 31: Revenue Earned From Students in USA

Thank You !