Interview with Rahul Kulhari, Lead Data Scientist, EdGE Networks Pvt. Ltd.
Looking back at his journey, Rahul Kulhari can clearly see it all started with his love for math. Right from his school days, Rahul enjoyed math and problem-solving. In college he was introduced to the Theory of Computation, which deepened his interest and led him to Natural Language Processing (NLP), algorithms and Data Science.
As he was exploring and learning in this field, Rahul Kulhari got the opportunity to join EdGE Networks, where he would get to work on real-life use cases and data. There was no looking back…
Before working, Rahul completed his B.Tech in Computer Science from BK Birla Institute of Engineering and Technology, followed by a Postgraduate Diploma in Advanced Computing from CDAC, Hyderabad.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Rahul Kulhari: What intrigues me the most is finding hidden patterns in structured as well as unstructured data. This is the foundation on which one builds Machine Learning algorithms.
At EdGE, our Artificial Intelligence (AI) and Data Analytics solutions are powered by this foundation. In turn, it helps us go deeper into Data Analytics (hence more inferences and stronger predictive and prescriptive capabilities).
What was the first data set you remember working with? What did you do with it?
Rahul Kulhari: The first data set I worked with was the Kaggle Titanic Survival Analysis set. It’s an incredibly interesting and fun competition wherein the data is used to predict who could have survived the Titanic shipwreck. Although I did this purely for fun, it was my first experience with structured data and the myriad possibilities it holds.
Subsequently, when I joined EdGE, I began to work with unstructured data (large resume data sets of IT/ITeS companies), and this gave me the opportunity to unlock hidden patterns and inferences in it.
Was there a specific “aha” moment when you realized the power of data?
Rahul Kulhari: I would have to say there have been many ‘aha’ moments working with our clients’ data. It is a real high to find answers that were not visible earlier – and that is the real power of data today.
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Rahul Kulhari: No two days are alike when working in a startup environment; there’s excitement every day. I spend most of my time delving into client data and building a platform that solves the toughest HR challenges the industry faces.
In addition, I manage a team of 12 talented Data Scientists, Analysts and Engineers. In my spare time, I like to keep abreast of the latest research in the Deep Learning and NLP domains.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Rahul Kulhari: To stay updated and current with what is happening in the world of Data Science (DS), I closely follow the thought leaders of this space, academic journals, online communities, events and conferences.
Some of the resources I regularly tap into: arXiv.org for research papers, MILA research publications for Deep Learning, and the European Conference on Information Retrieval (ECIR), which takes place every year.
Share the names of 3 people that you follow in the field of Data Science or Big Data Analytics.
- Richard Socher, Chief Scientist at Salesforce
- Yoshua Bengio, Professor at University of Montreal (also heads MILA)
- Andrew Ng, Machine Learning Pioneer and former Chief Scientist at Baidu
Team, Skills and Tools
Which are your favorite Data Analytics tools that you use in your job, and what are the other tools used widely in your team?
Rahul Kulhari: We use and experiment with a wide variety of tools. In addition, we use frameworks such as TensorFlow and PyTorch to develop our AI and Deep Learning algorithms.
What are the different roles and skills within your data team?
Rahul Kulhari: Deep Learning Scientists with expertise in NLP, Machine Learning Engineers with expertise in statistics and Data Engineers with expertise in ETL tools and data pipelines.
Can you describe some examples of the kinds of problems your team is solving this year?
Rahul Kulhari: We work with enterprise clients to help solve their toughest HR challenges. Last year we used Deep Learning to tackle the eternal problem of recruiters and talent acquisition teams: how to find the best talent in the market.
As a product-based HR tech company, we are committed to finding solutions to issues around talent acquisition, workforce optimization, transformation, attrition and more. This is our area of focus this year.
How do you measure the performance of your team?
Rahul Kulhari: In addition to client-related deliverables, we encourage and recognize innovation and out-of-the-box thinking.
Big Data Team, Skills and Tools
In the huge Big Data landscape, skills are changing swiftly. Which technology do you see dominating the ETL and real-time data space?
Rahul Kulhari: Open source technologies will dominate the ETL tools space. Airflow and Luigi are popular in the startup space.
How can aspiring Data Engineers demonstrate their capability in handling tools, technology, data and domain? Is a certificate (Cloudera/Hortonworks) a clear differentiator?
Rahul Kulhari: Under this umbrella, there would be two types of Data Engineers.
Data Engineers with Machine Learning knowledge: for these engineers, certification would not be the differentiator. However, knowledge of ML frameworks and libraries, ETL tools or distributed systems would be a clear advantage.
Big Data Engineers with knowledge of Hadoop/Spark: certifications are very valuable for them. In addition, knowledge of ETL tools and distributed, scalable systems could be leveraged as a differentiator.
Are analytical skills, Statistics and Machine Learning must-have or good-to-have skills for Data Engineers?
Rahul Kulhari: For Data Engineers with Machine Learning knowledge, Statistics and Machine Learning are must-haves. For Big Data Engineers with knowledge of Hadoop/Spark, they fall in the good-to-have rather than the must-have category of skills.
Industry Readiness for Data Science
Are the industries looking to understand what they can do with data? Do they have the required data in place?
Rahul Kulhari: Industries are definitely looking to understand and harness data to enhance their business decisions. However, they face challenges such as data quality, the lack of the right tools or platforms to capture data, disparate systems, and a mix of structured and unstructured data.
The C-suite must engage with experts within their organizations or outside to truly understand the potential of data and how it can be leveraged.
What are the top 3 problems facing Data Science today, whether by industry or by technology area?
- Data quality
- Systems for capture and storage of data
- Unlabeled data
Industry Readiness for Big Data
Is Big Data becoming a reality in the industry beyond the social giants like Facebook, Google, Yahoo? If yes, which industries are actually moving towards the power of Big Data Analytics? If no, what is the outlook for adoption?
Rahul Kulhari: Yes, Big Data has tremendous potential and many industries are looking at adopting it seriously today. Healthcare, HR, agriculture, e-commerce and cybersecurity are some areas that come to mind.
Name 3 Industries and the kind of problems that they are solving using Big Data.
- Healthcare: diagnosis and delivery of healthcare
- Cybersecurity: detecting attacks and protection from hackers
- HR: talent acquisition and workforce optimization
Who in the Industry is your typical client for Big Data? Is it the CTO, CIO, CMO or special data leaders?
Rahul Kulhari: As an HR tech company, our conversations take place mainly with the CHRO and CIO of an organization.
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Rahul Kulhari: Technical skills would include Machine Learning, Deep Learning and visualization. Soft skills would include communication and the ability to work and collaborate in teams.
How much should aspiring data practitioners focus on working with messy, noisy data? What are the other areas in which they must build expertise?
Rahul Kulhari: Sadly, a large proportion of the data that exists today is messy and noisy data. As data practitioners, this is an area of concern – but one that cannot be avoided.
To build expertise, I would first recommend a solid grasp of mathematics, in addition to knowledge of the available and upcoming tools.
What is your advice for newbies, Data Science students or practitioners who are looking at building a career in the Data Analytics industry?
Rahul Kulhari: Build deep knowledge of linear algebra and probability. Also, focus on Machine Learning and NLP. Courses such as CS224n and CS231, and the Machine Learning courses on Coursera, are very valuable too. Most of all, keep the passion in you burning.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Rahul Kulhari: Deep Learning in the field of computer vision and NLP.
Big Data Solution Space
What kinds of structured and unstructured data do companies have? What is the size that we are talking about?
Rahul Kulhari: Companies sit on large amounts of unstructured data (textual data, social data) as well as structured data (transactional data). Both are valuable from a Data Science perspective and can reveal useful insights. The size of the dataset can vary from a few thousand to a few million.
Are there legacy systems that are being replaced? If yes, which legacy skills are being replaced?
Rahul Kulhari: In general, legacy systems are being replaced with newer technologies. If you look at the HR space, legacy systems are definitely being replaced or being integrated with intelligence layers of newer technologies.
What is the size of clusters/environments that are being deployed for the clients? What are the production challenges?
Rahul Kulhari: Our customers are large enterprises and the size of the cluster may vary. Production challenges may include issues related to scalability, performance and the Cloud.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
“Digital Vidya is doing great work in the area of developing Data Analytics talent. The industry is moving towards Artificial Intelligence and Deep Learning – and Digital Vidya can be a valuable resource for everyone and a valuable knowledge sharing community.”
To know more about Rahul Kulhari, you can check out his LinkedIn.