Hi Friends,

Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder,"Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now as I approach my 90th birthday ( 27 June 2023 ) , I invite you to visit my Digital Avatar ( www.hemenparekh.ai ) – and continue chatting with me , even when I am no more here physically


Sunday, 22 February 2004

JOB MINER

[A] How to figure-out, the "INDUSTRY" of the advertising company? Job Advts do not mention this!

This is a critical issue - since all job-seekers are keen on working in 3/4 specific "Industries" of their choice. So they wish to look-up "interesting" job-advts from those "Industries" only - so, figuring-out the advertiser-company's INDUSTRY is a MUST, for purpose of match-making.

This can be done in the following ways:

# Link each "Company Name" with one/more "Industry-Names".

We have already developed such linkages for over 20,000 Companies.

DCA contains names of 6 Lakh companies - along with their "product/service" categories (similar to Industries). There are only a few hundred UNIQUE categories which can, in turn, be linked with our own 52 INDUSTRY-NAMES (maybe Inder Sethi already did this!).

The only hitch is that many job-advts are released by Placement Agencies, who do not reveal the names of their Client-companies!
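The linkage described above can be sketched as two lookups: company to product/service categories, then category to one of our INDUSTRY-NAMES. This is only a minimal sketch. The company names, category names and the function name `industries_for_company` are hypothetical samples, not entries from our actual tables.

```python
# Hypothetical sample data; the real linkage tables are not reproduced here.
COMPANY_TO_CATEGORIES = {
    "Acme Pharma Ltd": ["Bulk Drugs", "Formulations"],
    "Zenith Motors": ["Passenger Cars"],
}

CATEGORY_TO_INDUSTRY = {
    "Bulk Drugs": "Pharmaceuticals",
    "Formulations": "Pharmaceuticals",
    "Passenger Cars": "Automobiles",
}


def industries_for_company(company_name):
    """Return the sorted industry names linked to a company, or [] if unknown."""
    categories = COMPANY_TO_CATEGORIES.get(company_name, [])
    # Several categories may map to the same industry, so collect into a set.
    industries = {
        CATEGORY_TO_INDUSTRY[c] for c in categories if c in CATEGORY_TO_INDUSTRY
    }
    return sorted(industries)
```

When the advertiser is a Placement Agency that hides its Client-company, the lookup simply returns an empty list, which is exactly the case where the keyword-based approach is needed.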

In respect of resumes, such a Software Tool was developed some 8/10 months ago. Using this tool, Mr. Anjaria had "categorised" some 1000+ resumes, as belonging to different Industries. In the tool, there was a drop-list of 52 industries. After reading the resume, Anjaria concluded that the Resume belonged to a particular "Industry" - which he clicked. Then, below the drop-box, there was another box, which said,

"Why I think this resume belongs to this Industry - which keywords in the resume lead me to conclude this?"

Of course, processing of 1000 resumes may not have resulted in a sufficiently large knowledge-base! Maybe we need a human expert to process 10,000 resumes like this before we manage to build a respectable Knowledge-Base for 52 different industries.

However, there are 2 problems:

(1) Can we safely assume that

{ Industry-specific KEYWORDS of a Resume } = { Industry-specific KEYWORDS of a Job-Advt } ?

If we have the slightest doubt, this is not a big issue. In the very same tool which Anjaria used, instead of loading/processing "Resumes" we could simply load/process "Job Advts"! The process remains the same - only the document being processed changes! This is fairly simple.

For processing 1000 resumes, Anjaria took nearly a month! He could barely read/interpret & highlight some 30/40 resumes per day.

Capturing human knowledge and embedding it in software is a slow process! To some extent, this process can be speeded up if, instead of loading the entire JOB ADVT into the tool, we were to index (thru IS4S) the "Words" contained in each job-advt and then load only these words into the tool. Then perhaps a human-expert can categorise 200 job-advts in a day, thru highlighting of Keywords! So one expert can perhaps assign an INDUSTRY-NAME to maybe 5000 job-advts in a month (along with the highlighted Keywords).
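The throughput figures above can be checked with a little arithmetic. This is a rough sketch only: the helper names are hypothetical, and the 25 working days per month is an assumption inferred from 5000 / 200, not a figure stated anywhere in these notes.

```python
import math


def days_needed(documents, per_day):
    """Working days needed to process `documents` at `per_day` documents a day."""
    return math.ceil(documents / per_day)


def monthly_output(per_day, working_days=25):
    """Documents one expert can process in a month (working_days is assumed)."""
    return per_day * working_days


# Reading full resumes at 30 to 40 a day, 1000 resumes need 25 to 34 working days.
slow_days = (days_needed(1000, 40), days_needed(1000, 30))

# Highlighting keywords in indexed word-lists at 200 a day.
fast_month = monthly_output(200)
```

At 30/40 documents a day, 1000 resumes work out to between 25 and 34 working days, which agrees with "nearly a month"; at 200 a day, one expert reaches 5000 job-advts in a 25-day month.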

A still (much) faster method of developing KNOWLEDGE-BASE OF INDUSTRY-SPECIFIC KEYWORDS,

So, it is obvious that, when a Company posts its job-advt on any of these sites, it has to select a given INDUSTRY-NAME (possibly from a dropdown list).

This means, all Job-advts posted on these 3 sites are tagged with an Industry-Name. And this tagging is done by a human expert! (- the recruitment manager who filled-up the job-advt form online).

When human-experts have already categorized thousands of job-advts into different INDUSTRY-CATEGORIES, then why re-invent the wheel?

We have already downloaded thousands of job-advts from these jobsites. We can further download many more thousands. Probably, we already have more than 100,000 job-advts from these 3 jobsites - each one duly allotted an Industry-Name by a human expert.

All we have to do is to "group"

NOW, for each INDUSTRY-NAME-GROUP, create a separate folder & then Index thru IS4S. Then arrange all the words found, in descending order of occurrence, and compute their "probabilities" of occurrence.

Perhaps, in each INDUSTRY-NAME-GROUP, the top 50 keywords would add up to, maybe, 90% of all occurrences.
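The counting step for one INDUSTRY-NAME-GROUP might look like the sketch below. Python's `collections.Counter` stands in for the IS4S index here (an assumption, since IS4S itself is not shown), and the function names `keyword_probabilities` and `top_share` are hypothetical.

```python
import re
from collections import Counter


def keyword_probabilities(adverts):
    """Given the advert texts of ONE industry group, return (word, probability)
    pairs in descending order of occurrence."""
    counts = Counter()
    for text in adverts:
        # Crude tokenizer: lower-case runs of letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    # most_common() already sorts by descending count.
    return [(word, n / total) for word, n in counts.most_common()]


def top_share(probabilities, top_n=50):
    """Fraction of all word occurrences covered by the top_n keywords."""
    return sum(p for _, p in probabilities[:top_n])
```

Running `top_share` on each group's result would show whether the top 50 keywords really cover something like 90% of occurrences, or whether that guess needs revising.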

PRESTO!

What a human-expert, using the highlighting tool described earlier, would have taken months to do, the SORT/INDEX/EXTRACT/ARRANGE/COMPUTE PROBABILITY steps would take one/two days at most!

Remember, several hundred human-experts (HR Mgrs) have embedded/captured their knowledge in 1 LAKH job-advts, while drafting (composing) the same & then allotting an INDUSTRY-NAME to each.

Why re-invent the wheel?

Therefore, in order to do an accurate job, we need to separate-out the "Industry keywords" from "Function keywords".

And this separation must be carried out BEFORE we calculate their "occurrence-frequency" - and the probabilities.

The reason is obvious.

The "Function"-related keywords are "polluting" the Industry-related keywords!

What we need is a population of "pure" industry-related keywords only. And then, we need such populations, Industry-wise.

One population for each "Industry".
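One way to picture the separation is to strip out known "Function" keywords before any counting is done, leaving one population of "pure" industry keywords per Industry. The sketch below assumes a ready-made list of function keywords exists. That list, like the names `FUNCTION_KEYWORDS` and `industry_keyword_counts`, is hypothetical, and building it is the hard part this sketch does not solve.

```python
from collections import Counter

# Hypothetical function-related keywords; a real list would have to be built separately.
FUNCTION_KEYWORDS = {"sales", "marketing", "accounts", "finance", "manager"}


def industry_keyword_counts(words_by_industry, function_keywords=FUNCTION_KEYWORDS):
    """Map each industry to a Counter of its words with function keywords
    removed BEFORE counting, so frequencies are not 'polluted'."""
    populations = {}
    for industry, words in words_by_industry.items():
        pure = [w.lower() for w in words if w.lower() not in function_keywords]
        populations[industry] = Counter(pure)
    return populations
```

Because the filter runs before the counts are taken, the probabilities computed afterwards are relative to industry-related occurrences only.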

To do this separation, we would need

SEPARATOR OF KEYWORDS

Advt. Source:

Advt. No:

 









