[A] How to figure out the "INDUSTRY" of the advertising company? Job-advts do not mention this!
This is a critical issue, since all job-seekers are keen on working in 3/4 specific "Industries" of their choice. So they wish to look up "interesting" job-advts from those "Industries" only. Figuring out the advertiser-company's INDUSTRY is therefore a MUST for the purpose of match-making.
This can be done in the following ways:
# Link each "Company Name" with one or more "Industry-Names".
We have already developed such linkages for over 20,000 Companies.
DCA contains the names of 6 Lakh companies, along with their "product/service" categories (similar to Industries). There are only a few hundred UNIQUE categories, which can, in turn, be linked with our own 52 INDUSTRY-NAMES (maybe Inder Sethi already did this!).
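The two-step linkage described above (Company Name to product/service category, then category to Industry-Name) can be sketched as a pair of lookups. This is only an illustration: the company names, categories and industry names below are made-up sample data, and the function name `industries_for_company` is hypothetical.

```python
# Hypothetical sample data; the real linkages would come from the
# company database and the category list mentioned above.
COMPANY_TO_CATEGORIES = {
    "Acme Pumps Ltd": ["Pumps & Valves"],
    "Zenith Pharma": ["Bulk Drugs", "Formulations"],
}

CATEGORY_TO_INDUSTRIES = {
    "Pumps & Valves": ["Engineering"],
    "Bulk Drugs": ["Pharmaceuticals"],
    "Formulations": ["Pharmaceuticals"],
}


def industries_for_company(company_name):
    """Return the sorted Industry-Names linked to a company, or [] if unknown."""
    industries = set()
    for category in COMPANY_TO_CATEGORIES.get(company_name, []):
        industries.update(CATEGORY_TO_INDUSTRIES.get(category, []))
    return sorted(industries)


print(industries_for_company("Zenith Pharma"))
print(industries_for_company("Some Placement Agency"))
```

An advertiser that is not in the company table simply yields an empty list, so it would have to be handled some other way.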
The only hitch is that many job-advts are released by Placement Agencies, who do not reveal the names of their Client-companies!
In respect of resumes, such a Software Tool was developed some 8/10 months ago. Using this tool, Mr. Anjaria had "categorised" some 1000+ resumes as belonging to different Industries. The tool had a drop-list of 52 industries. After reading a resume, Anjaria concluded that it belonged to a particular "Industry", which he clicked.
Then, below the drop-box, there was another box, which said,
"Why I think this resume belongs to this Industry - which keywords in the resume lead me to conclude this?"
Of course, processing 1000 resumes may not have resulted in a sufficiently large knowledge-base! Maybe we need a human expert to process 10,000 resumes like this before we manage to build a respectable Knowledge-Base for 52 different industries.
However, there are 2 problems:
(1) Can we safely assume that
$$\{\text{Industry-specific KEYWORDS of a Resume}\} = \{\text{Industry-specific KEYWORDS of a Job-Advt}\}\ ?$$
Even if we have the slightest doubt, this is not a big issue. In the very same tool which Anjaria used, instead of loading/processing "Resumes", we could simply load/process "Job Advts"! The process remains the same - only the document being processed changes! This is fairly simple.
For processing 1000 resumes, Anjaria took nearly a month! He could barely read/interpret & highlight some 30/40 resumes per day.
Capturing human knowledge and embedding it in software is a slow process! To some extent, this process can be speeded up if, instead of loading the full/entire JOB ADVT into the tool, we were to Index (thru IS4S) the "Words" contained in each job-advt and then load ONLY these words into the tool. Then, perhaps, a human expert can categorise 200 job-advts in a day, thru highlighting of Keywords! So one expert can, perhaps, assign an INDUSTRY-NAME to maybe 5000 job-advts in a month (along with the highlighted Keywords).
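As a rough illustration of "loading only the words", here is a minimal sketch that reduces a job-advt to its distinct words. It is a stand-in written for this note, not a description of how IS4S works; the function name `index_words` and the simple letters-only tokenising rule are assumptions.

```python
import re


def index_words(advt_text):
    """Return the distinct words of a job-advt, lower-cased,
    in order of first appearance (assumed tokenising rule: letters only)."""
    words = re.findall(r"[a-z]+", advt_text.lower())
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(words))


sample = "Wanted: Sales Engineer for pumps. Sales experience in pumps a must."
print(index_words(sample))
```

The expert would then see a short word list instead of the full text, and highlight the industry-specific words in it.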
A still (much) faster
method of developing KNOWLEDGE-BASE OF INDUSTRY-SPECIFIC KEYWORDS,
So, it is obvious that when a Company posts its job-advt on any of these sites, it has to select a given INDUSTRY-NAME (possibly from a dropdown list).
This means all job-advts posted on these 3 sites are tagged with an Industry-Name. And this tagging is done by a human expert (the recruitment manager who filled up the job-advt form online)!
When human-experts have already categorized thousands of job-advts into different INDUSTRY-CATEGORIES, then why re-invent the wheel?
We have already downloaded thousands of job-advts from these jobsites. We can download many thousands more. Probably, we already have more than 100,000 job-advts from these 3 jobsites - each one duly allotted an Industry-Name by a human expert.
All we have to do is to "group" these job-advts by their Industry-Names.
NOW, for each INDUSTRY-NAME-GROUP, create a separate folder & then Index thru IS4S. Then arrange all the words found in descending order of occurrence, and compute their "probabilities" of occurrence.
Perhaps, in each INDUSTRY-NAME-GROUP, the top 50 keywords would add up to maybe 90% of all occurrences.
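The group / count / sort / compute-probability steps can be sketched as below. This is a minimal sketch under assumptions: the job-advts are taken to be available as (Industry-Name, text) pairs, words are split with a simple letters-only rule, and the function names and the tiny sample are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def keyword_probabilities(tagged_advts):
    """tagged_advts: iterable of (industry_name, advt_text) pairs.
    Returns {industry_name: [(word, probability), ...]}, with each list
    sorted in descending order of occurrence."""
    counts = defaultdict(Counter)
    for industry, text in tagged_advts:
        counts[industry].update(re.findall(r"[a-z]+", text.lower()))

    ranked = {}
    for industry, counter in counts.items():
        total = sum(counter.values())
        ranked[industry] = [(word, n / total) for word, n in counter.most_common()]
    return ranked


def top_k_coverage(ranked_words, k):
    """Share of all occurrences accounted for by the top k words."""
    return sum(prob for _, prob in ranked_words[:k])


# Made-up sample, only to show the shape of the input
advts = [
    ("Pharma", "tablets capsules tablets"),
    ("Pharma", "tablets syrup"),
    ("Banking", "loans deposits loans"),
]
ranked = keyword_probabilities(advts)
print(ranked["Pharma"])
print(top_k_coverage(ranked["Pharma"], 1))
```

Running `top_k_coverage` with k = 50 on each real INDUSTRY-NAME-GROUP would show whether the "top 50 keywords cover about 90%" guess actually holds.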
PRESTO!
What a human-expert, using the highlighting tool described earlier, would have taken months to do, the SORT/INDEX/EXTRACT/ARRANGE/COMPUTE PROBABILITY would take one/two days at most!
Remember, several hundred human-experts (HR Mgrs) have embedded/captured their knowledge in 1 LAKH job-advts, while drafting/composing the same & then allotting an INDUSTRY-NAME to each.
Why re-invent the wheel?
Therefore, in order to do an accurate job, we need to separate out the "Industry keywords" from the "Function keywords".
And this separation must be carried out BEFORE we calculate their "occurrence-frequency" and the probabilities.
The reason is obvious.
The "Function"-related keywords are "polluting" the Industry-related keywords!
What we need is a population of "pure" industry-related keywords only. And we need such populations Industry-wise: one population for each "Industry".
To do this separation, we would need a
SEPARATOR OF KEYWORDS
Advt. Source:
Advt. No:
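The order of operations argued for above (separate first, count afterwards) can be sketched as follows. This is only an illustration: it assumes a list of "Function" keywords is already available, and the list shown here, like the function name, is hypothetical.

```python
from collections import Counter

# Hypothetical list of Function-related keywords; the real list would
# have to come from the separation step itself.
FUNCTION_KEYWORDS = {"sales", "marketing", "accounts", "purchase", "hr"}


def industry_keyword_probabilities(words, function_keywords=FUNCTION_KEYWORDS):
    """Drop Function keywords FIRST, then compute occurrence
    probabilities over the remaining ("pure") industry words."""
    pure = [w for w in words if w not in function_keywords]
    counts = Counter(pure)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.most_common()}


words = ["sales", "tablets", "sales", "capsules", "tablets", "marketing"]
print(industry_keyword_probabilities(words))
```

Because the Function words are removed before counting, they do not take up any share of the probabilities, which is the "pollution" the note wants to avoid.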







