Part 2 | How to better understand job descriptions and profiles through Deep Learning
And we are back! Continuing from last week, let’s look at how the Deep Learning model helps us get more accurate results and improves ‘Search and Match.’
This model allows us to pick any section of a resume or a job description (JD) and get the job titles most closely related to that section. To validate the model’s performance on unseen data, we randomly sampled some results from the final test data.
Some of the results even surprised us!
For example, we passed a single-line JD to our prediction algorithm. It not only extracted the correct title from the JD, but its other top predictions were also highly relevant to that title.
Our algorithm was trained not only on IT/software-related titles but also on more traditional occupations. Here is an example where our algorithm failed to extract the actual title, but the titles it did extract are quite close. Even when the JD is noisy, the model was able to recognise the most relevant title for this project description.
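The ranked predictions described above can be pictured as a top-k selection over the model's output scores. The snippet below is only a minimal sketch: the title list, the scores and the function names are made up for illustration and are not taken from our production model.

```python
import numpy as np

# Hypothetical label set; the real model covers many more titles.
TITLES = ["Software Developer", "Java Developer", "Project Manager", "Accountant", "Nurse"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k_titles(logits, titles, k=3):
    """Return the k most probable titles with their probabilities, best first."""
    probs = softmax(np.asarray(logits, dtype=float))
    order = np.argsort(probs)[::-1][:k]
    return [(titles[i], float(probs[i])) for i in order]

# Made-up scores standing in for the model's output on one JD.
example_logits = [2.1, 3.4, 0.3, -1.0, -2.2]
ranked = top_k_titles(example_logits, TITLES, k=3)
for title, prob in ranked:
    print(f"{title}: {prob:.2f}")
```

Looking at the whole ranked list, rather than only the first entry, is what lets us judge whether the runner-up titles are also sensible.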
How can Deep Learning help improve our Search & Match?
Building a model also lets us represent text efficiently as vectors. To understand how our model was able to predict titles accurately, we took a set of JDs and extracted the final vector before the prediction layer (akin to extracting last-layer features from a VGG-16 CNN, if you are familiar with work on ConvNets).
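To make the idea of "the final vector before the prediction layer" concrete, here is a toy sketch in NumPy. It assumes a single hidden layer with randomly initialised weights standing in for a trained network; the layer sizes, the `embed` and `predict_logits` names and the fake input features are all hypothetical, and the actual architecture is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen for illustration only.
INPUT_DIM, HIDDEN_DIM, NUM_TITLES = 50, 200, 5

# Randomly initialised weights stand in for a trained model.
W_hidden = rng.normal(scale=0.1, size=(INPUT_DIM, HIDDEN_DIM))
b_hidden = np.zeros(HIDDEN_DIM)
W_out = rng.normal(scale=0.1, size=(HIDDEN_DIM, NUM_TITLES))
b_out = np.zeros(NUM_TITLES)

def embed(x):
    """Return the activations of the last hidden layer (the 'final vector')."""
    return np.tanh(x @ W_hidden + b_hidden)

def predict_logits(x):
    """Full forward pass: the prediction layer sits on top of embed()."""
    return embed(x) @ W_out + b_out

jd_features = rng.normal(size=(3, INPUT_DIM))  # three fake JD feature rows
jd_vectors = embed(jd_features)                # one vector per JD
print(jd_vectors.shape)
```

The point of the split is that `embed` can be reused on its own: the same vectors that feed the prediction layer can also be stored, compared and visualised.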
For example, here we show the embeddings for three types of JDs: Software Developer, Project Manager and Java Developer. Intuitively, even before building any model, we know that Software Developer and Java Developer are closely related roles with a lot of overlap, whereas a Project Manager role leans more towards the management and execution of projects.
Can our model encode this world knowledge automatically? It can, and it is pretty amazing!
This figure shows a t-SNE projection of the 200-dimensional vectors that we get from our model. As you can see, our model was able to separate most of the Project Manager roles from the Software Developer/Java Developer roles.
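A projection like the one described above can be produced with scikit-learn's `TSNE`. The sketch below runs on synthetic 200-dimensional points rather than real JD vectors, and the cluster layout, perplexity and seed are assumptions picked only to keep the example small.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-ins for JD vectors (not real model output).
roles = ["Software Developer", "Java Developer", "Project Manager"]
dev_center = rng.normal(size=200)
java_center = dev_center + 0.2 * rng.normal(size=200)  # placed near the developer cluster
pm_center = rng.normal(size=200)                       # placed independently
centers = [dev_center, java_center, pm_center]

# 20 noisy points around each center, with a matching role label per point.
vectors = np.vstack([c + 0.1 * rng.normal(size=(20, 200)) for c in centers])
labels = np.repeat(roles, 20)

# Project to 2-D; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
points_2d = tsne.fit_transform(vectors)
print(points_2d.shape)
```

Each row of `points_2d` can then be drawn as a scatter point coloured by its entry in `labels`. Note that t-SNE preserves local neighbourhoods rather than global distances, so the plot is best read as a qualitative check.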
It is indeed fascinating to see that unstructured text can be represented in this low-dimensional space, where the classes become largely separable. As we mentioned before, there is no magic involved. The beauty lies in learning these low-dimensional embeddings efficiently. With this representation, we can search for ideal candidates in embedding space. Earlier approaches such as pattern-matching algorithms and manual skill extraction could be replaced by these learned representations. All this research work helps us find the right candidate for a given job.
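Searching in embedding space can be as simple as ranking candidate vectors by their similarity to a JD vector. The following is a minimal sketch that assumes cosine similarity as the measure and uses tiny hand-made vectors; the `cosine_search` helper is hypothetical, and a production system may well use a different metric or an approximate nearest-neighbour index.

```python
import numpy as np

def cosine_search(query, candidates, top_n=3):
    """Rank candidate vectors by cosine similarity to the query vector."""
    query = query / np.linalg.norm(query)
    normed = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = normed @ query                      # cosine similarity per candidate
    order = np.argsort(scores)[::-1][:top_n]     # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

# Tiny hand-made vectors so the ranking is easy to check by eye.
jd_vector = np.array([1.0, 0.0, 0.0])
candidate_vectors = np.array([
    [0.9, 0.1, 0.0],   # candidate 0: close to the JD
    [0.0, 1.0, 0.0],   # candidate 1: orthogonal to the JD
    [0.5, 0.5, 0.0],   # candidate 2: in between
])
matches = cosine_search(jd_vector, candidate_vectors, top_n=2)
print(matches)
```

Because both JDs and profiles pass through the same model, the same function could rank profiles against a JD or JDs against a profile.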
We hope you found this tech blog series helpful. We would be happy to answer your queries. Please do feel free to share your thoughts, feedback and questions in the comments section below.
Sandeep Tammu is a Data Scientist with EdGE Networks. With a desire to probe and experiment with the real world, Sandeep pursued his education in electronics and physics. Working as a Data Scientist, he strives to bring the same level of enthusiasm to exploring and gaining valuable insights from various kinds of data using machine learning and predictive modelling techniques.