Seniority Classification of Job Titles
INTRODUCTION/BACKGROUND/MOTIVATION RESEARCH METHODOLOGY
CONCLUSIONS
REFERENCE/ACKNOWLEDGEMENTS
The Data Mine Corporate Partners Symposium 2021
• Before feeding the dataset into our models, our team preprocessed the raw data:
• We one-hot encode the classification labels, since this is a multi-class classification problem and numeric vectors are more efficient to compute with.
• Instead of representing classification labels as strings, we represent them as vectors.
• We used two types of encoders to convert raw text data into vectors (the encoding process):
• Universal Sentence Encoder (from Google): encodes text into high-dimensional vectors that can be used for text classification
• spaCy: performs tokenization and encoding with pretrained word vectors
• Below is an example of how text can be encoded into vectors
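The word-vector approach can be illustrated in plain Python. In spaCy, a document's `Doc.vector` defaults to the average of its token vectors; the sketch below mimics that idea with a tiny, made-up vector table (`TOY_VECTORS`, hypothetical 3-dimensional values) and a naive whitespace tokenizer in place of a real pretrained model. It is only an illustration of averaging word vectors and does not reproduce the Universal Sentence Encoder, which is a learned sentence-level model.

```python
# Hypothetical 3-dimensional "pretrained" word vectors, for illustration only.
# Real spaCy models such as en_core_web_md ship 300-dimensional vectors.
TOY_VECTORS = {
    "senior": [0.9, 0.1, 0.0],
    "software": [0.1, 0.8, 0.2],
    "engineer": [0.2, 0.7, 0.3],
    "intern": [-0.8, 0.2, 0.1],
}
DIM = 3


def encode(title):
    """Encode a job title as the average of the vectors of its known words."""
    tokens = title.lower().split()  # naive whitespace tokenizer
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0] * DIM  # no known words: fall back to a zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


print(encode("Senior Software Engineer"))
```

Every title, whatever its length, ends up as a fixed-length vector, which is what lets it be fed to a classifier.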
Thanks to our mentor Reuben Wilson for the help
CATHERINE MAO, EVAN SHAW, ADRIENNE ZHANG, JACOB ZHANG
• These are the six categories our team uses to classify job titles, with a description of each category and some common titles that would be classified under it
• Finally, we moved on to evaluation.
• For our deep neural network model, we obtained an average K-fold cross-validation accuracy of 0.895, with a standard error of 0.003.
• To the left are two graphs of model accuracy and loss during training.
• We went through several rounds of tuning to raise the model's accuracy and lower its loss.
• Considering that we are classifying among 6 labels, where a uniform random guess would be right only about 1/6 (≈ 0.17) of the time, we think this accuracy is fairly high.
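A minimal sketch of how a K-fold accuracy and its standard error can be computed. It assumes the standard error is the sample standard deviation of the per-fold accuracies divided by √K (the poster does not state which definition was used). `kfold_splits` and `mean_and_standard_error` are hypothetical helpers, and the fold accuracies are placeholder numbers, not our results.

```python
import math
import statistics


def kfold_splits(n, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    In practice the data should be shuffled before splitting.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation  # each fold is used once for validation
        start += size


def mean_and_standard_error(scores):
    """Mean of per-fold scores and its standard error (sample std / sqrt(K))."""
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, se


# Placeholder per-fold accuracies (hypothetical numbers, NOT our results).
fold_accuracies = [0.80, 0.85, 0.90]
mean, se = mean_and_standard_error(fold_accuracies)
print(f"mean accuracy = {mean:.3f}, standard error = {se:.3f}")
```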
• Using Azure Machine Learning, we found that the best model was stochastic gradient descent (SGD) with MaxAbsScaler.
• MaxAbsScaler scales each feature by its maximum absolute value, so every feature ends up in the range [-1, 1].
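scikit-learn provides this transformation as `sklearn.preprocessing.MaxAbsScaler`. The plain-Python sketch below (a hypothetical `max_abs_scale` helper, not Azure's or scikit-learn's code) shows what it computes: each column is divided by its largest absolute value, so values land in [-1, 1] and zeros stay zero, which keeps sparse features sparse.

```python
def max_abs_scale(rows):
    """Scale each feature (column) by its maximum absolute value."""
    n_features = len(rows[0])
    max_abs = [max(abs(row[j]) for row in rows) for j in range(n_features)]
    # All-zero columns get a scale of 1 so they are left unchanged
    # instead of causing a division by zero.
    scales = [m if m != 0 else 1.0 for m in max_abs]
    return [[row[j] / scales[j] for j in range(n_features)] for row in rows]


X = [[1.0, -4.0, 0.0],
     [2.0, 2.0, 0.0],
     [-0.5, 1.0, 0.0]]
print(max_abs_scale(X))
```

Unlike standardization, this does not shift the data, so it works well with the sparse, non-negative-heavy vectors that text encodings often produce.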
TMap
• A small company that uses technology and targeted marketing to identify and engage qualified candidates at scale
• The motivation for this project is to classify job titles by seniority to make job titles more accessible
• Match candidates with roles or job opportunities
• This is the resulting confusion matrix. It shows that senior individual contributors (the 5th label) are the hardest category to classify.
• This is because our dataset is imbalanced, so many of these instances are misclassified as individual contributors.
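A minimal sketch of how a confusion matrix and per-class recall point to the hardest category. Labels are integers 0 to 5 in category order, so index 4 is the 5th label (senior individual contributor). Treating index 3 as the individual-contributor class, and the short label lists themselves, are hypothetical assumptions for illustration only.

```python
NUM_CLASSES = 6


def confusion_matrix(y_true, y_pred):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * NUM_CLASSES for _ in range(NUM_CLASSES)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (0.0 if the class is absent)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical labels: index 4 = senior individual contributor (5th label),
# index 3 = individual contributor (assumed index).
y_true = [0, 1, 2, 3, 3, 3, 4, 4, 4, 5]
y_pred = [0, 1, 2, 3, 3, 3, 3, 3, 4, 5]

cm = confusion_matrix(y_true, y_pred)
recall = per_class_recall(cm)
hardest = min(range(NUM_CLASSES), key=lambda i: recall[i])
print("hardest class index:", hardest, "recall:", recall[hardest])
```

The off-diagonal count `cm[4][3]` is where senior individual contributors predicted as individual contributors would show up.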
FUTURE GOALS
• Keep improving the Ontology to recognize skill-related terms
• Keep cleaning up and normalizing the dataset
• This is an example from our dataset.
• We started with 5,000 instances and have expanded to 11K instances so far.
On the left is part of a 2-D array of labels. Each row vector is the one-hot label for one instance; for example, the first row marks the last of our 6 categories, student.
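A minimal sketch of how such a one-hot label array can be built. Only "senior individual contributor" (5th) and "student" (6th) are category names taken from the poster; the first four names are hypothetical placeholders.

```python
# Six seniority categories in order. The first four names are hypothetical
# placeholders; the 5th and 6th come from the poster.
CATEGORIES = [
    "category_1", "category_2", "category_3", "category_4",
    "senior individual contributor", "student",
]


def one_hot(label):
    """Return a vector with a 1 at the label's index and 0 everywhere else."""
    vector = [0] * len(CATEGORIES)
    vector[CATEGORIES.index(label)] = 1
    return vector


labels = ["student", "senior individual contributor"]
label_matrix = [one_hot(label) for label in labels]  # one row per instance
print(label_matrix)
```

The first row has its 1 in the last position, matching a "student" label.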
After preprocessing, we started training models:
• We implemented a simple deep neural network with TensorFlow.
• To the right is a summary of the chosen model after a few rounds of parameter tuning.
• It has 2 layers, with dropout to prevent overfitting.
• The last layer uses softmax activation, since this is a multi-class classification problem.
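In TensorFlow, dropout is the `tf.keras.layers.Dropout` layer. To show what it does, here is a plain-Python sketch of inverted dropout, the scheme Keras uses: during training each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), keeping the expected sum unchanged; at inference the input passes through untouched. The `dropout` function and the rate of 0.5 are illustrative assumptions, not our model's actual settings.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout over a list of activations.

    Training: zero each unit with probability `rate`, scale survivors by 1 / (1 - rate).
    Inference: return the activations unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)  # fixed seed so the example is reproducible
hidden = [0.5, 1.0, 1.5, 2.0]
print(dropout(hidden, rate=0.5, training=True, rng=rng))
print(dropout(hidden, rate=0.5, training=False, rng=rng))
```

Because a different random subset of units is dropped on every training step, the network cannot rely on any single unit, which is what reduces overfitting.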
• We used the Microsoft Azure Machine Learning platform to check which model works best for our input data.
• The platform automatically samples part of the data, then trains and tests each candidate model on it.
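Azure Machine Learning runs this search itself; the plain-Python sketch below only illustrates the underlying idea: hold out part of the data, train each candidate on the rest, and keep the candidate with the best held-out accuracy. The two toy candidates (a majority-class baseline and a 1-nearest-neighbour classifier), the 30% test split, and the data are hypothetical and are not the models Azure tried.

```python
import random
from collections import Counter


def majority_class(train):
    """Candidate 1: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbour(train):
    """Candidate 2: predict the label of the closest training point (1-NN)."""
    def predict(x):
        def squared_distance(item):
            features, _ = item
            return sum((a - b) ** 2 for a, b in zip(features, x))
        return min(train, key=squared_distance)[1]
    return predict


def accuracy(model, data):
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(data, candidates, test_fraction=0.3, seed=0):
    """Hold out a random test split, train every candidate, return the best one."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    scores = {name: accuracy(build(train), test) for name, build in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-D feature vectors with two well-separated classes.
data = [([0.0, 0.1 * i], 0) for i in range(10)] + [([5.0, 0.1 * i], 1) for i in range(10)]
best, scores = select_best(data, {"majority": majority_class, "1-NN": nearest_neighbour})
print(best, scores)
```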
This is the softmax activation function we referred to
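For reference, softmax turns a vector of K scores z into probabilities: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). Below is a minimal plain-Python sketch (the `softmax` function and the six scores are our own illustrative choices); subtracting the maximum score first is a standard trick that prevents overflow without changing the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Six hypothetical output scores, one per seniority category.
probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0, 3.0])
predicted_index = probs.index(max(probs))  # category with the highest probability
print(predicted_index, round(sum(probs), 6))
```

The predicted category is simply the index with the largest probability; the outputs always sum to 1, which is why softmax pairs naturally with one-hot labels.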