Confidential    ©C-DAC, Mumbai (formerly NCST)
Review and Recommendations on Gurumine and Gurusearch
Disclaimer: This document is based on the author's understanding of the
system, gained from the demos and from interaction with Three P Consultants
Pvt. Ltd.
A. Background
When a corporate needs to recruit people, it places an advertisement in the
paper. Such an advertisement normally draws many applications, sometimes
thousands, and more so when electronic forms are accepted. However, the
response also includes spurious applications, which need to be filtered out.
Given the constraints on time and the huge volume of data, manual filtering is
not feasible. Even after the spurious applications are removed, many candidates
remain to be reviewed and filtered further before they can be called for an
interview.
Sometimes candidates do not wait for an advertisement and periodically drop
off or email their Resumes to corporates in the hope of being called for an
interview whenever a vacancy arises. This adds another problem to managing
Resumes: duplicates of, and updates to, a candidate's Resume must be identified
and handled properly.
All in all, recruitment is a costly process in both time and money. To make
things worse, the same process is carried out over and over again for the same
posts.
Three P Consultants Pvt. Ltd. (3P), a recruitment consultant for the
engineering and management fields, is attempting to simplify problems such as
these by automating the various steps.
In its 13 years of experience, 3P has built a large databank of nearly 3 lakh
(300,000) Resumes. This experience and 3P's anticipation of future trends drive
their website RecruitGuru (http://www.recruitguru.com). The website hosts two
main functionalities that any employer would need: Gurusearch and Gurumine.
This document briefly describes these components, on which 3P seeks guidance
from C-DAC Mumbai (formerly NCST) to improve their performance.
A corporate registering with 3P is given an account. Whenever a Resume is
received, it is first processed by Gurumine and stored in a database for future
consideration. This processing not only identifies the relevant details of a
candidate but also evaluates the candidate's strengths and how he fares against
other applicants. Note that multiple versions of a candidate's Resume, prepared
to suit different job profiles, are not to be treated as duplicates; this is
partially addressed by having a separate account for each employer and limiting
duplicate checking to within an account.
Once the Resumes are stored in
the database, Gurusearch is used to retrieve Resumes of candidates based on a
number of factors such as the estimated aptitude of the candidate,
qualification, age, etc.
B. Gurumine
From a semi-structured Resume, we
need to identify a select set of information for comparison, screening, etc.
These include experience, educational qualification, etc.
3P's database currently provides for 23 such fields. There is no standard
format for Resumes, which makes it challenging to extract information for these
23 fields. The order in which the various fields appear varies from Resume to
Resume. In addition, there are several ways of stating the same thing, which
complicates the task even further.
A functional profile of
the candidate is also prepared to identify the top 3 core areas of the
candidate. Gurumine recognises about 33 core areas. While preparing the
functional profile, a raw score is assigned to the candidate for each
core area. This raw score is used to compute the percentile of the candidate
for each core area.
Basic Workflow
When a Resume is submitted to
Gurumine, the processing involves
- Check for duplicates
- Segmentation
- Information Extraction
- Functional Profiling
Check for duplicates
When a Resume is loaded, it is first checked for duplicates based on the first
name, last name and date of birth. If a duplicate is found, the HR manager can
decide which copy to keep.
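The duplicate check described above can be sketched as follows. This is only an illustration under stated assumptions, not 3P's implementation: the record layout (`id`, `first_name`, `last_name`, `dob`) and the normalisation rules (case folding, whitespace collapsing) are hypothetical.

```python
import unicodedata
from datetime import date


def duplicate_key(first_name: str, last_name: str, dob: date) -> tuple:
    """Build a comparison key from first name, last name and date of birth."""
    def norm(value: str) -> str:
        # Assumed normalisation: Unicode-normalise, ignore case and extra spaces.
        value = unicodedata.normalize("NFKC", value)
        return " ".join(value.casefold().split())

    return (norm(first_name), norm(last_name), dob.isoformat())


def find_duplicates(resumes: list) -> list:
    """Return groups of Resume ids that share the same key (groups of 2+ only)."""
    groups = {}
    for resume in resumes:
        key = duplicate_key(resume["first_name"], resume["last_name"], resume["dob"])
        groups.setdefault(key, []).append(resume["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group would then be shown to the HR manager, who decides which copy to keep. Limiting `resumes` to one employer's account matches the per-account duplicate checking mentioned in the Background.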
Segmentation
Most Resumes have segments like personal details, work experience, technical
skills and education. Segmentation involves identifying these segments and
extracting the relevant information within each segment. Though normally an
easy task for human beings, this is complex to automate. Candidates use
different conventions for separating the segments: the titles/headers are not
standard, nor do they appear in any standard order.
To do segmentation, 3P has identified keywords typical of each segment. They
also use a set of heuristics such as the following to help in this process.
- The name of the candidate is always in the largest font.
Note: This heuristic depends heavily on a particular format and may not always
hold. We need clarification from 3P on what happens when it fails.
- To identify an email address they search for a string having the '@'
character and terminated by a country identifier like 'in' / 'com' / 'au' /
'uk' ...
Note: A generic identification routine is available with C-DAC Mumbai, which
doesn't require the country code for identifying an email address.
- They have a synonym list for segment headers. For example, Relevant
Experience and Job Experience mean the same.
Note that this list is not exhaustive.
This approach works with a fairly good degree of accuracy.
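A minimal sketch of header-based segmentation with a synonym list is given below. The table contents, the `preamble` bucket for text before the first header, and the function names are assumptions made for illustration; 3P's actual keyword lists and heuristics were not available.

```python
import re

# Hypothetical synonym table: canonical segment name -> known header variants.
HEADER_SYNONYMS = {
    "personal": ["personal details", "personal information"],
    "experience": ["experience", "work experience", "relevant experience",
                   "job experience"],
    "skills": ["technical skills", "skills"],
    "education": ["education", "educational qualification"],
}


def canonical_header(line: str):
    """Return the canonical segment name if the line is a known header, else None."""
    # Ignore case, punctuation (e.g. a trailing colon) and repeated spaces.
    text = " ".join(re.sub(r"[^a-z ]", "", line.casefold()).split())
    for segment, variants in HEADER_SYNONYMS.items():
        if text in variants:
            return segment
    return None


def segment_resume(text: str) -> dict:
    """Split Resume text into segments keyed by canonical header name."""
    segments = {"preamble": []}  # lines seen before the first recognised header
    current = "preamble"
    for line in text.splitlines():
        header = canonical_header(line)
        if header is not None:
            current = header
            segments.setdefault(current, [])
        elif line.strip():
            segments[current].append(line.strip())
    return segments
```

Because the headers can appear in any order, the sketch switches segment whenever it sees a known header rather than expecting a fixed sequence. Headers missing from the table fall into the preceding segment, which is why the non-exhaustive synonym list matters.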
Information Extraction
Once the segments have been
identified, the required information is extracted from each of these segments.
Within the personal
information section, they identify the name, age, address, email address
and phone numbers of the candidate.
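As an illustration of identifying an email address without relying on a list of country identifiers, a pattern-based sketch is shown below. It is not the C-DAC Mumbai routine, whose details are not given here; the regular expression is a deliberately simple assumption and does not cover every address allowed by the mail standards.

```python
import re

# Local part, '@', then a domain made of two or more dot-separated labels.
# No fixed list of endings such as 'in', 'com', 'au' or 'uk' is needed.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text: str) -> list:
    """Return all email-like strings in the text, in order of appearance."""
    return EMAIL_RE.findall(text)
```

A sentence-ending full stop after an address is not captured, because every dot in the domain must be followed by another label. Applying `find_emails` only to the personal-details segment would reduce the chance of picking up someone else's address from the body of a mailed Resume.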
The heuristic for identifying the address is that it typically follows the
name; otherwise it is clearly marked by a heading of some kind, such as
'Address'. The extraction is not accurate, and parts of an address are
sometimes omitted. The complete procedure for address identification, and its
nature, is still to be clarified.
The Experience segment, along with technical skills, is one of the relevant
segments in this problem domain. The current job profile details are identified
by looking for keywords like current period, to date, etc. However, this
technique fails if a person has taken a break after his last job.
After the relevant information
has been extracted, the information of the candidate is displayed as a form on
the left side of the screen. The right side displays the Resume of the
candidate from which this information was extracted and hence one can easily
verify and update (if required) the contents in the form.
Functional Profiling
The Resume is once again scanned for keywords, irrespective of the segments,
for functional profiling. For example, a keyword like marketing vice president
clearly indicates that the candidate is from a marketing background, and hence
the weight of this keyword in the marketing profile is very high. Similarly,
all keywords have been assigned weights. These weights are calculated from the
Resume corpus via statistical methods and are updated every month. The raw
score is computed based on the weights of all keywords present in the Resume.
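The raw score computation can be sketched as below. The source only says the score is based on the weights of the keywords present, so summing the weights once per keyword is an assumption, as are the weight values and the `keyword_weights` layout (core area -> keyword -> weight).

```python
import re


def raw_scores(resume_text: str, keyword_weights: dict) -> dict:
    """Sum, per core area, the weights of the keywords found in the Resume."""
    text = resume_text.casefold()
    scores = {}
    for area, weights in keyword_weights.items():
        total = 0.0
        for keyword, weight in weights.items():
            # Whole-word match, counted once however often the keyword occurs.
            pattern = r"\b" + re.escape(keyword.casefold()) + r"\b"
            if re.search(pattern, text):
                total += weight
        scores[area] = total
    return scores


# Hypothetical weights, for illustration only.
WEIGHTS = {
    "marketing": {"regional sales manager": 5.0, "sales": 2.0},
    "finance": {"accounts manager": 4.0},
}
print(raw_scores("Worked as Regional Sales Manager for five years.", WEIGHTS))
```

The three core areas with the highest scores would then form the candidate's functional profile. Note that this sketch matches keywords anywhere in the text, so it shares the context problem described next.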
While extracting keywords, it has
been observed that the experience segment poses problems. For example, there
was a Resume wherein a candidate had written "report to vice president"
and hence the keyword "vice president" was associated with the
candidate!
The raw score computed for each
profile is used to compute the percentile range to give an idea where the
candidate stands compared to other applicants. This is displayed using graphs
for easy comparison.
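One common way to turn a raw score into a percentile is sketched below. The exact formula used by Gurumine is not stated, so the mid-rank definition (scores below plus half of the equal scores) and the band width of 10 are assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, all_scores: list) -> float:
    """Percentage of scores below `score`, counting equal scores as half."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)
    equal = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def percentile_band(percentile: float, width: int = 10) -> tuple:
    """Map a percentile to a range such as (60, 70) for display."""
    low = min(int(percentile // width) * width, 100 - width)
    return (low, low + width)
```

For example, with raw scores of 10, 20, 30 and 40 in one core area, the candidate scoring 30 has a percentile rank of 62.5 and falls in the 60 to 70 band.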
C. Gurusearch
Once the Resumes have been processed and stored in the database, short-listing
candidates for an interview involves specifying the percentile range and
various other criteria like age and qualification. Gurusearch fetches the
Resumes that match the selection criteria for further perusal. 3P wanted to
explore an AI technique such as a self-organising map (SOM) for clustering and
ranking the candidates.
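The criteria-based retrieval can be pictured with the following sketch. The `Candidate` fields, the parameter names and the descending sort are assumptions for illustration; the real Gurusearch works against 3P's database and its actual criteria were not available.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    age: int
    qualification: str
    percentiles: dict = field(default_factory=dict)  # core area -> percentile


def shortlist(candidates, area, min_pct, max_pct=100.0,
              max_age=None, qualifications=None):
    """Return candidates matching all given criteria, best percentile first."""
    matches = []
    for cand in candidates:
        pct = cand.percentiles.get(area)
        if pct is None or not (min_pct <= pct <= max_pct):
            continue  # no profile in this area, or outside the percentile range
        if max_age is not None and cand.age > max_age:
            continue
        if qualifications is not None and cand.qualification not in qualifications:
            continue
        matches.append(cand)
    return sorted(matches, key=lambda c: c.percentiles[area], reverse=True)
```

A call such as `shortlist(pool, "finance", min_pct=80, max_age=35)` would return the candidates in the top fifth of the finance profile who are at most 35 years old. Clustering with a SOM would be a separate step and is not attempted here.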
D. Knowledge Bases
The knowledge bases currently used by the system include
- Keyword list
There are tables of keywords for:
a. Posts like vice president, accounts manager, etc.
b. Headers for segments like experience, technical skill, etc.
c. Qualification
Note: This list is not
exhaustive.
- Synonym list
People have various ways of
stating one thing. For example, a graduate of arts may write B. A. or
Bachelor of Arts. Similarly segment titles may be written as "Experience"
or may be broken into "Relevant Experience" and "Other
Experience", etc.
The synonym lists identify and
map these various forms into one.
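A synonym list of this kind can be sketched as a lookup from normalised variants to one canonical form. The entries below are taken from the examples in this document; the table layout and the normalisation (ignoring case, dots and spaces) are assumptions.

```python
import re

# Canonical form -> variants seen in Resumes (illustrative, not exhaustive).
CANONICAL = {
    "B.A.": ["B.A.", "Bachelor of Arts"],
    "B.E.": ["B.E.", "Bachelor of Engineering"],
    "Experience": ["Experience", "Relevant Experience", "Job Experience"],
}


def _key(term: str) -> str:
    """Normalise a term: lower-case it and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.casefold())


LOOKUP = {_key(variant): canon
          for canon, variants in CANONICAL.items()
          for variant in variants}


def canonical_form(term: str) -> str:
    """Map a term to its canonical form; unknown terms are returned stripped."""
    return LOOKUP.get(_key(term), term.strip())
```

The lookup is meant for already extracted field values or header lines. Applying it to running text would be unsafe, since a short form such as "BE" normalises to the same key as the ordinary word "be".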
Apart from the operational
knowledge bases listed above, 3P has knowledge bases that are currently not in
use. Many of these are experimental systems designed by Mr. Parekh, Executive
Director, Three P Consultants Pvt. Ltd. based on his many years of experience.
Some of these are listed below.
- Probability chart for a candidate to be appointed to various posts depending
on his current qualifications and experience. For example, a candidate who has
been a manager for the last two years has a probability of 0.97 of being
appointed as a manager, 0.95 as a senior manager, 0.40 as a regional manager,
0.20 as a vice president and 0.0 as a trainee.
- An earlier system (no longer in use) had a human expert classify candidates
into a particular sector based on certain keywords. For example, if the job
title of a candidate was regional sales manager, the keyword sales would
indicate that he was from the marketing sector. The expert would then indicate
that the candidate may be put into the marketing sector, with the explanation
that his job title was regional sales manager and the word sales triggered the
categorisation.
E. Problems with the current system
- The detection of duplicate Resumes can be further improved by checking the
most recent experience, or education in the absence of experience, in addition
to name and date of birth.
The scenario where a candidate
applies for multiple posts within a company has not been considered. While
applying for different posts the candidate may tailor the resume for each post.
In this case the Resumes are not to be considered as duplicates as the emphasis
on experience or skill may differ.
- When a Resume is sent by mail there may be many email
ids in the text and this makes picking the candidate's email id
through a generic mail-id identification routine a little tricky. However
if the segmentation module identifies the right segment from which
to pick the email id then this is not an issue.
- When a phone number was split into two lines the
system was not able to capture the phone number as the length of the
number did not meet the criteria for minimum length of a phone number.
Note: A word grouper could
be used to combine successive number blocks into one to deal with this problem.
- The accuracy of address identification can be improved using a named entity
recognition system.
- Replacing various forms in which a given piece of
information is expressed (e.g. B.E., BE, Bachelor of Engineering)
by a single form (B.E.) will simplify processing while identifying
keywords in a document.
- The period of service is not taken into consideration. Consider a scenario
where there is a requirement for a DBA and there are three applicants:
- Candidate A, who has seven years of experience as a DBA and has been working
as a system administrator for the last two years.
- Candidate B, who has been working as a DBA for the last two years.
- Candidate C, who worked as a DBA for four years and took a break for one
year to pursue further education.
The system currently considers only the last experience in detail. For profile
matching, keyword occurrences are used without regard to the place and context
of occurrence. This could result in wrong or poor classifications.
- The list of segment headers is not
standardised and can grow. Segmentation can be improved to some extent
using natural language techniques.
- The knowledge base acquisition needs to be
enhanced to deal with varying styles.
- Profiling can be enhanced with the use of an ontology.
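For the split phone number problem in the list above, the "word grouper" idea can be sketched as follows: successive number blocks separated by spaces, hyphens or line breaks are joined before the minimum-length check is applied. The threshold of 8 digits and the function name are assumptions; 3P's actual criterion is not known.

```python
import re

MIN_DIGITS = 8  # assumed minimum length of a phone number

# An optional '+', then digit blocks joined by spaces, hyphens or line breaks.
NUMBER_RUN = re.compile(r"\+?\d+(?:[\s-]+\d+)*")


def find_phone_numbers(text: str, min_digits: int = MIN_DIGITS) -> list:
    """Group adjacent number blocks and keep the runs that are long enough."""
    numbers = []
    for match in NUMBER_RUN.finditer(text):
        run = match.group()
        digits = re.sub(r"\D", "", run)  # drop separators, keep digits only
        if len(digits) >= min_digits:
            prefix = "+" if run.startswith("+") else ""
            numbers.append(prefix + digits)
    return numbers
```

A number broken across two lines is now seen as one run and passes the length check. The grouping is deliberately naive: a date written as 12-05-1980 would also be accepted, so in practice it would be applied only within the personal-details segment or combined with further checks.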
C-DAC Mumbai's role
C-DAC Mumbai can help address all the problems mentioned above. In addition,
C-DAC Mumbai would help in improving:
- The accuracy of the information extracted.
- Profiling.
C-DAC Mumbai can also help Three P Consultants Pvt. Ltd. build a solution to
rank and/or cluster candidates.
A variety of technologies are available for incorporation into the existing
framework. Many of these can be incorporated as incremental refinements. The
technologies vary in the extent of improvement, availability of attempted
solutions, difficulty of configuration/training, etc. Working out the details
requires a more elaborate analysis of the current system as well as sample
data, and can be taken up once a preliminary MoU is in place.
raju
From: sasi@ncst.ernet.in
Sent: Friday, January 09, 2004 2:50 PM
To: raju
Cc: Kavitha M; sasi@yuga.ncst.ernet.in
Subject: RE: Letter of Interest -> CDAC Response
Dear Shri Raju Kapoor,
Thanks for the feedback. In order to move towards an MoU, we need some
thoughts from you on
1. areas which you would like to take up in decreasing order of priority
2. the model of NCST involvement you have in mind.
Based on this we can work on an MoU. We may need one or two rounds of
discussions before we freeze the items 1 and 2. But based on the broad
description provided in our document, we would like a formal note from you
sharing your thoughts on 1 and 2 above, along with any constraints on time and
budget you would like to mention. We can then discuss internally, and then have
a discussion jointly to finalise the MoU terms.
- Sasi
Quoting Raju Kapoor raju@3pjobs.com:
-> Original message from Raju Kapoor raju@3pjobs.com ->
Date: Wed, 7 Jan 2004 18:42:12 +0530
From: Raju Kapoor raju@3pjobs.com
Reply-To: Raju Kapoor raju@3pjobs.com
To: Kavitha M kavitham@ncst.ernet.in
Hi Kavitha,
I see from the document you sent
that you have a fair understanding of RecruitGuru. Some minor aberrations can
be explained when we meet next. We would be glad to move to the next stage
(MOU). Kindly let me know our input at this stage.
Kind Regards
Raju Kapoor
Principal
3P CONSULTANTS PVT. LTD.
Member of PENRHYN International (http://www.penrhyn.com)
Member AESC (http://www.aesc.org)
+91-22-2850 5800 (Office) Ext.15
+91-22-2850 5656 (Direct)
+91-9821111969 (Mobile)
+91-22-2850 6663 (Fax)
http://www.3pjobs.com/
raju
From: sasi@ncst.ernet.in
Sent: Friday, January 09, 2004 3:05 PM
To: raju
Cc: Kavitha M; sasi@yuga.ncst.ernet.in
Subject: Re: Missed out on one point -> CDAC Response
We will certainly be interested
in incorporating machine learning in the system, and we do believe it will add
a lot of strength to the system.
However, there are various areas
where ML ideas can be incorporated and various tools are likely to be suitable
for these.
Some elements can be built in
during the initial stage itself, and some of them may need some more time and
can be taken as Phase II.
We will proceed with this, as
soon as we get a formal acknowledgement of our earlier proposal and a direction
to go ahead along with thoughts on time frame, model(s) of cooperation, areas
to be taken up, etc.
- Sasi
Quoting Raju Kapoor raju@3pjobs.com:
-> Original message from Raju Kapoor raju@3pjobs.com
Date: Wed, 7 Jan 2004 18:54:13 +0530
From: Raju Kapoor raju@3pjobs.com
Reply-To: Raju Kapoor raju@3pjobs.com
Subject: Missed out on one point
To: Kavitha M kavitham@ncst.ernet.in
Hi Kavitha
We were also looking at the
possibility of building a feedback loop to make the system self learning. If
you think this could be too large an engagement to start with, you may take it
as phase II in the agreement.
Regards
Raju Kapoor
Principal
3P CONSULTANTS PVT. LTD.
Member of PENRHYN International (http://www.penrhyn.com)
Member AESC (http://www.aesc.org)
+91-22-2850 5800 (Office) Ext.15
+91-22-2850 5656 (Direct)
+91-9821111969 (Mobile)
+91-22-2850 6663 (Fax)
http://www.3pjobs.com/






