Hi Friends,

Even as I launch this today (my 80th birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) – and continue chatting with me, even when I am no longer here physically.

Sunday, 30 November 2003

EXTRACTION ACCURACY QA

Kartavya / Abhi / Sanjeev

cc: Rajiv / SriRam / Nirmit

Date: 30-11-03


Accuracy of Extraction

  • We had a long debate on this yesterday with Rajiv, SriRam, and Nirmit.
  • Whereas I agree with them that we have to continuously work on increasing extraction accuracy, it is a long-drawn-out process. Each incremental increase in level of accuracy will take more and more effort. Of course, if we were to design a true Neural Network Software, this accuracy-improvement process would become totally automatic and require no human effort. We are far from that stage.
  • In the meantime, we cannot hold up launch/marketing of Recruitguru.
  • To convince potential clients that we have a damn-good product, we must take the help of the latest management jargon, viz. SIX SIGMA (a philosophy I used on the shop floor of the Switchgear Machine Shop more than 40 years ago!).
  • Sanjeev suggested plotting a graph as shown on the enclosed page (and having it updated automatically on the Recruitguru home screen) as proof that our extraction process not only meets but exceeds what is expected of it!

 

No. of fields that could not be extracted        No. of resumes (processed so far) in this category

0            10,000
1            20,000
2            40,000
3            40,000
4            10,000
5             6,000
6             2,000
7               800
8               500
9               100
10–23        (remaining few)
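Just to illustrate the mechanics (a minimal sketch in Python; the field names below are mine, not the final list), the distribution above could be kept up to date automatically by counting, for each processed resume, how many fields came out blank:

```python
# A minimal sketch, assuming Python and hypothetical field names, of how the
# home-screen graph could be refreshed automatically: for every processed
# resume, count how many fields could not be extracted, then tally the
# distribution shown in the table above.
from collections import Counter

def accuracy_distribution(resumes, expected_fields):
    # resumes: list of dicts mapping field name -> extracted value (None/"" if missing)
    missing_counts = Counter()
    for resume in resumes:
        missing = sum(1 for field in expected_fields if not resume.get(field))
        missing_counts[missing] += 1
    return dict(sorted(missing_counts.items()))

fields = ["Name", "D.O.B.", "Gender", "Current Company", "E-mail Id"]
sample = [
    {"Name": "A", "D.O.B.": "1970", "Gender": "M", "Current Company": "X", "E-mail Id": "a@x.com"},
    {"Name": "B", "Gender": "F", "E-mail Id": "b@y.com"},
]
print(accuracy_distribution(sample, fields))   # -> {0: 1, 2: 1}
```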

 

Kartavya

Date: 22-03-03


Improving Extraction Accuracy

Abhi told me last evening that he has planned to process/convert 1,000 email resumes this morning.

Based on this "experiment", the remaining 26,000 can be processed next week. When this is done, a tabulation such as Annex A should be prepared. It can then be rearranged in descending order of the 2nd column (No. of failure cases out of 27,000).

Descending order will tell us which cases to tackle first (Priorities).
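Just to illustrate (Python; the field names and counts below are made up, not the actual Annex A figures), the rearrangement is a simple descending sort on the 2nd column:

```python
# Hypothetical Annex-A style tabulation: field name -> no. of failure cases out of 27,000
failures = {"Pin code": 4200, "D.O.B.": 9800, "Mobile": 1500, "Current Company": 7300}

# Descending order of failures tells us which cases to tackle first
for field, count in sorted(failures.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{field:20s} {count:6d}")
```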

You may also consider constructing a TOOL screen, as shown in Annex B.

I have already modified the existing Resume Screen. Several human experts could be asked to work on this, with different "failed" cases assigned to different experts.
After studying each failure, the experts should enter their comments into the middle block.

Of course, to speed up the process, Vittal / Santu may go through all the "failed cases" in advance and forward to the experts only those cases where the value does exist in the resume but the software failed to extract it.
This tool will then help us capture the knowledge of several experts.
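Purely as a sketch (Python; the field names below are my assumptions, not the actual Annex B layout), the TOOL screen could capture one record like this per failed case, so the comments of several experts accumulate in one place:

```python
from dataclasses import dataclass

@dataclass
class FailedCaseReview:
    resume_id: str        # which resume failed
    field_name: str       # which field the software could not extract
    value_present: bool   # Vittal / Santu: does the value actually exist in the resume text?
    expert: str           # which human expert examined the case
    comment: str          # entered in the middle block of the tool screen

review = FailedCaseReview("R-00123", "D.O.B.", True, "Expert-1",
                          "Date written as '5th June, 1971' - pattern not recognised")
print(review)
```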

 






Friday, 28 November 2003

JOB ADVTS. DATABASE (JAWS)


ABHI / DEEPA

cc: Kartavya

Date: 28-05-03

Subject: Job Adverts Database (JANS)

Page: 1/3

In my earlier notes (JANS / CYBER etc.), I have explained how we will use a Job Adverts Database to promote our business both online & offline.

In Project Manhattan, we have a proof of how such a database helped.

I strongly believe that our ability to capture 2 million / 5 million / 10 million resumes largely depends upon:

  1. Our ability to download thousands of job adverts from a large number of portals / myriad sources.
  2. Creating a structured database (converter) of these adverts.
  3. Delivering these job adverts, in a dozen different forms (or even streamers), to:
    • Newspapers / Magazines
    • TV Channels / Cable Operators
    • Cyber Cafes
    • Job Portals
    • Placement Agencies
    • Cell-phone Companies
    • Individual Homes
    • etc. etc.

(All forming part of JANS / CYBER network)

 

We have known/discovered the shortcomings of Project Manhattan — in pulling in resumes — viz.:

  • No one can possibly believe that we are working on so many search assignments, so after a while, jobseekers stop responding.
  • Occasionally, in downloading/uploading, we do make mistakes (e.g., in the advertiser's name/identity/product, or in senior-position jobs) which result in phone calls to our consultants and are embarrassing!
  • We do not pass on the resumes received to the original advertisers — so obviously none of the responding jobseekers ever gets an interview call!
  • Genuine adverts, released by our consultants for our live search assignments, get lost in the sea of fictitious adverts — they don’t get noticed, so effort gets wasted and real/genuine business suffers!

Anyway, Manhattan did serve a useful (limited) purpose, but it is time to shut it down.
So, starting July 20th, delete all fictitious job adverts from all 3 divisions by July 31st.

However, the next 2 months (June–July) must be used to prepare ourselves to launch:

  • Project CYBER in late August (this will require some changes to our website, after Project EGO gets launched at the beginning of August)
  • Project JANS in late September (this will require major modifications to our website)

Starting 1st June, and using Myriad,

Deepa will download daily as many job adverts as she can, from as many job sites as she can.
I have listed only a few in the enclosed chart.

If we could download 1000/day, it would be lovely!


Deepa will use the enclosed chart for entering the download statistics.

She should create this chart on the computer (preferably her own), so she can also put a copy on my machine.

She can then click and enter the data easily.

All those downloaded during the day must get converted the same night to create a Structured Database.

If “Auto-converter” cannot convert job adverts downloaded from any particular site, we must modify it.

 






Wednesday, 26 November 2003

BLOCKS

26 Nov 2003

Sanjeev

BLOCKS

·     Enclosed find some UIs which we developed in the early stages of Recruitguru. Then, for some reason or other, we dropped these in favour of what we use currently.

·      Two important UIs you should know about are:


-    "Extraction of entire BLOCKS"

-    Breaking up of keywords into categories.






Fields to be extracted in Resume Extractor

1. Name
2. D.O.B.
3. Gender
4. Current Company
5. Actual Designation
6. Total Experience
7. Education Level
8. Address
9. City
10. Country
11. Pin code
12. Home Phone
13. Mobile
14. Fax
15. Fax
16. E-mail Id

Blocks identified in Resume Extractor

1. Educational
2. Objective
3. Experience
4. Skills
5. Personal Details
6. References

Monday, 17 November 2003

TROUT'S MANTRA : DIFFERENTIATION

17 Nov 2003

Raju/Sanju/Kartavya,
   
TROUT'S MANTRA : DIFFERENTIATION

"Before you break the rules, know the rules,"

First rule, attaining success to be different.

"Effective strategy is all about differentiation,"

brands needed to create a real reason to buy and not a meaningless slogan.

to ensure differentiation by attributes, which was the only characteristic that would make products unique.

How each of them were able to stand for attributes the ultimate driving experience, safety, engineering and styling.

Consumers want to believe that products can contain a magic ingredient that will improve performance.

Where most consumers did not bother to know what Trinitron was all about.

Do not ignore the competition, focus and differentiation are critical in competitive world and CEOs must be willing to encourage sacrifice instead of growth. 

If,

BMW              = Ultimate Driving Experience
Volvo              = Safety
Mercedes         = Engineering
Jaguar            = Styling

Then,

Guru Mine        =
Guru search      = Shortlisting competence, not resumes?
Recruitguru      = The future of web services
3P jobs.com      = For executive search, corporate India's first choice

Remember

Owens Corning (India) Ltd. has even managed to get a trademark (not a patent) on the colour "PINK"!

HEMEN PAREKH

Thursday, 13 November 2003

INBOX READER

IN-BOX READER

Sanjeev – Abhi – Kartavya

Date: 18-11-03

Page: 1/9


Only 2–3 days back, I sent Kartavya a mention from PC World magazine of an anti-spam software which works on the principle of:

Frequency of usage of words in an email.

If a human keeps telling this software:

(A) “This email is a spam.”

(B) “This email is not a spam.”

…it keeps learning!


Simple.

The software simply keeps totaling up the words (and computes their usage frequencies) in each category (A) and (B).

Hence, probabilities of occurrence of any given word keep changing as the word population keeps growing in each population (of good/spam emails).

This is a free software, and Abhi should download it for R&D.

Now, if you simply replace:

  • “not-a-spam” = a resume email,
  • “spam” = any other email,

then get a manual segregation of both types of populations (of emails), which, I think, Reema does on a daily basis, and then ask this software to process/index the words and compute their frequency of usage in both populations; very soon you will get this software to learn to recognize:

a “resume email” = not-a-spam

an “ordinary email” = spam


PRESTO!

You’ve got a ready-made solution (a free software too!) which, as soon as it reads an incoming email, is able to decide accurately:

→ whether that email is a resume or not!

And what is more, this software keeps learning with addition of each incoming email!

Do try.
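For R&D purposes, here is a rough sketch (Python, standard library only; my own illustration, not the actual PC World software) of the word-frequency idea, trained on the manually segregated emails:

```python
import math
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z']+", text.lower())

class ResumeFilter:
    def __init__(self):
        self.counts = {"resume": Counter(), "other": Counter()}
        self.totals = {"resume": 0, "other": 0}

    def learn(self, text, label):
        # (A)/(B) feedback from a human: label is "resume" or "other"
        ws = words(text)
        self.counts[label].update(ws)
        self.totals[label] += len(ws)

    def is_resume(self, text):
        # Compare log-probabilities of the email under the two word populations
        # (add-one smoothing so unseen words do not zero everything out)
        vocab = len(set(self.counts["resume"]) | set(self.counts["other"])) or 1
        scores = {}
        for label in ("resume", "other"):
            total = self.totals[label] + vocab
            scores[label] = sum(math.log((self.counts[label][w] + 1) / total)
                                for w in words(text))
        return scores["resume"] > scores["other"]

# Usage: Reema's daily segregation becomes the training data
f = ResumeFilter()
f.learn("curriculum vitae experience education skills references", "resume")
f.learn("special offer discount invoice meeting agenda", "other")
print(f.is_resume("please find attached my curriculum vitae and skills"))   # True
```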

 

If you wish to learn how precisely this software works, visit

👉 www.paulgraham.com

That brings me to the small news item pasted at the start of this note.

Once again, the researchers are falling back on Theory of Probability

(and ubiquitous Frequency Distribution Curves)

to figure out

Good = Plays written by Shakespeare

Bad = Plays written by someone else

(but being palmed off as original Shakespearean!)

This they are trying to do by simply figuring out the

Patterns of Usage of Words

in each of Shakespeare’s plays.

At a very crude level, it is no more than computing:

a) Probabilities of (usage of) keywords in all the plays written or claimed to have been written by Shakespeare — the entire population of plays.

b) Probability of usage of keywords in some smaller (suspected) sub-set / sub-population of plays.

c) Keywords used in a single play.

Then compute Coefficient of Correlation (like a line-of-best-fit).

  • The higher the coefficient, the greater the probability that a given play was truly/genuinely written by Shakespeare himself.
  • On the other hand, a low coefficient of correlation would indicate that the divergence is so much that it is highly unlikely to have been written by Shakespeare.
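At a very crude level again, a sketch (Python; the file names are placeholders only) of comparing (a) with (c): build keyword-frequency vectors for the whole corpus and for one play, then compute the coefficient of correlation between them:

```python
import math
import re
from collections import Counter

def keyword_freq(text):
    ws = re.findall(r"[a-z']+", text.lower())
    total = len(ws) or 1
    return {w: c / total for w, c in Counter(ws).items()}

def correlation(freq_a, freq_b):
    # Pearson coefficient of correlation over the union of keywords
    keys = sorted(set(freq_a) | set(freq_b))
    xs = [freq_a.get(k, 0.0) for k in keys]
    ys = [freq_b.get(k, 0.0) for k in keys]
    n = len(keys) or 1
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

corpus = keyword_freq(open("all_plays.txt").read())        # placeholder file names
one_play = keyword_freq(open("disputed_play.txt").read())
print(correlation(corpus, one_play))   # high -> consistent with the corpus pattern
```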

The pattern (of usage of words) is at considerable variance (more than ±3σ) from the mean.

Quite like an SQC Control Chart

(Sketch included showing a fluctuating line with “Upper Control Limit (+3 σ), Lower Control Limit (–3 σ), Mean,” and two outlier points beyond limits labeled 99.9 %.)

Obviously these two readings could not have occurred due to chance variation (since they are falling outside the control limits).

So they are not part of a pattern — hence these two plays must have been written by someone other than Shakespeare!
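And the control-chart check itself, as a small sketch (Python; the numbers are invented for illustration): compute the mean and ±3 σ limits from plays of undisputed authorship, then flag anything falling outside:

```python
import statistics

def control_limits(baseline):
    # Mean and +/- 3 sigma limits computed from the "in control" population
    mean = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    return mean - 3 * sigma, mean + 3 * sigma

# Correlation-with-the-corpus scores for plays of undisputed authorship (invented numbers)
baseline = [0.91, 0.93, 0.90, 0.94, 0.92, 0.89, 0.93]
lcl, ucl = control_limits(baseline)

for play, score in [("Play A", 0.92), ("Play B", 0.55)]:
    if lcl <= score <= ucl:
        print(play, "within control limits: consistent with the pattern")
    else:
        print(play, "outside control limits: unlikely to be chance variation")
```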


What, if any, is the significance of:

  • Probabilities
  • Mean / Skewness
  • Standard Deviations
  • Upper Control Limit / Lower Control Limit
  • Line of Best Fit
  • Coefficient of Correlation
  • Patterns
  • etc. etc.

to our business?

Do all these have …

… any practical use for us?

Any specific competitive advantage that we can gain over our competitors (such as Monster / Naukri)?

You bet!

Our Function Profile (with its Raw Score / Percentile / Population-Sample Size) is a proof.
With these profiles, suddenly a resume carries far more meaning compared to a plain email resume.

The ImageBuilder, with its Function Profiles, makes much more sense.
The profile of a jobseeker comes under a much sharper focus; it is no longer diffused.

Now you do not need an experienced or trained HR professional to figure out the relative standing of this jobseeker in a long queue.
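As a rough illustration only (Python; the raw scores are invented, and the computation of the Raw Score itself is assumed to happen elsewhere), the Percentile is simply the jobseeker's relative standing within the function's population:

```python
def percentile(raw_score, population_scores):
    # Percent of the function's population scoring at or below this jobseeker
    below_or_equal = sum(1 for s in population_scores if s <= raw_score)
    return 100.0 * below_or_equal / len(population_scores)

sales_population = [12, 18, 22, 25, 31, 35, 40, 44, 52, 60]   # invented raw scores
print(percentile(35, sales_population))   # -> 60.0 (60th percentile of the Sales population)
```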

Yesterday we discussed how, in future, we will construct Function Profiles according to a Designation Level (within a function) — a truly “Apple-for-Apple” comparison.

Today, when we construct function-wise profiles (Sales / Mktg / Production etc.), all that we are doing is to separate out …

→ Vegetables (Sales)

→ Fruits (Marketing)

…and then trying to say,

“Mango is sweeter than Apple and Apple is sweeter than Banana!”

When we succeed in creating Function Profiles, designation-level-wise, we would have succeeded in separating Mangoes from Apples from Bananas!

So this has to be a definite goal.

This will make more sense to a recruiter (subscriber), and he will be willing to pay a premium price for such a feature.

And then we will do the same thing with:

  • Salary Profiles
  • Designation-Level Profiles
  • Tenure (Length of each job) Profile
  • Education Profile
  • Age Profile
  • Experience (Total Yrs) Profile
  • Actual Designation Profile (Vacancy/Position Name)
  • etc. etc.

 

That brings me to the question of “Duplicate Resumes (and Duplicate Profiles?)”

Initially, when a subscriber is trying to process/extract 50,000 resumes lying on his PC, it is understandable that he may come across identical resumes (same first name / last name / DOB) and may delete one of these (obviously the later-processed one) as a duplicate.

However, if we (i.e., Gururaj) come across such a duplicate resume after a week, a month, or a year, then—

→ we must process/extract it, and

→ replace the old ImageBuilder with the latest ImageBuilder under the same PEN.

It is obvious why we must replace the old ImageBuilder with the new (latest) ImageBuilder.

Even if the latest email resume (of the same candidate) is identical with the old one (i.e., not a single word changed), when we re-process it:

→ Raw Score may remain same,

→ but Percentile will most likely change.

Because, during the intervening period, a lot of resumes (belonging to some function) got added — changing the population/sample size!

So his function-profile will be different!

(even if keywords remain the same)
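Continuing the percentile sketch above (numbers invented), re-ranking the same Raw Score after the population has grown shows exactly this:

```python
raw_score = 35
old_population = [12, 18, 22, 25, 31, 35, 40, 44, 52, 60]
# resumes added to the same function during the intervening period
new_population = old_population + [35, 38, 41, 47, 55, 58, 62, 65, 70, 72]

for label, pop in [("first processing", old_population), ("re-processed later", new_population)]:
    pct = 100.0 * sum(1 for s in pop if s <= raw_score) / len(pop)
    print(label, "-> percentile", pct)   # same Raw Score, different percentile
```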

And it is this new/revised profile that Gumline should store and make available to both the subscriber (during search) and to the candidate (during bounce-back).

Then there is a good probability that in the latest resume, some keywords have also changed.

In that case, both the Raw Score and the Percentile would change, while the identity (i.e., the PEN) remains the same.

It is only with this alive and ever-changing profile (sample size / raw score / percentile) that we can claim that our software is self-learning in the most dramatic possible way!
A subscriber can prove this (self-learning / adaptive) behaviour to himself by simply re-processing the same resume every week for a week or two and seeing a different Function Profile each time.
Then he will never tire of talking about Gumline to his friends!

(Signed / dated: 13-11-03)