Notes: CANDIDATE DATABASE

CORPORATE DATABASE

Designation - Industry - Company

Function - SEARCH-PARAMETER - Product

(Below the diagram for CANDIDATE DATABASE):

Edu. Quali
Age
Exp.
Product/"Service" Exposure/knowledge skills / city

(Below the diagram for CORPORATE DATABASE):

Tech. collaboration
Top Executives
JV
city
Plants / Sales Office

Anjaria

10/7/99

On our website homepage, there is a feature called

Triangle of Industry-Company-Product

This database contains some 4000+ CII Member data received by us on a floppy.

For each of these Companies, database provides

What "products" they are manufacturing
What "Industry" do they belong to.

Using this database, a surfer on our site can

enter "Company Name" / Get "Products" manufactured by the Company
enter "Product Name" / Get list of "Companies" manufacturing that product

enter "Industry Name" / Get list of "Companies" Operating in that Industry.
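The three look-ups above can be sketched roughly as follows. This is only an illustration in Python, not the website's actual code; the sample rows and the names `ROWS` and `lookup` are made up for the purpose.

```python
from collections import defaultdict

# Hypothetical sample rows: (company, industry, product). Not real CII data.
ROWS = [
    ("Company ABC", "Automobile", "Truck"),
    ("Company ABC", "Automobile", "Tractor"),
    ("Company XYZ", "Chemicals", "Paint"),
    ("Company PQR", "Automobile", "Truck"),
]

# One index per side of the triangle that a surfer can query.
products_by_company = defaultdict(set)
companies_by_product = defaultdict(set)
companies_by_industry = defaultdict(set)

for company, industry, product in ROWS:
    products_by_company[company].add(product)
    companies_by_product[product].add(company)
    companies_by_industry[industry].add(company)


def lookup(index, name):
    """Return a sorted list of matches, or an empty list if the name is unknown."""
    return sorted(index.get(name, ()))
```

For example, `lookup(companies_by_product, "Truck")` lists every company in the sample rows that manufactures trucks.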

This is very Convenient.

Now, we are about to upload (into this website-based database) another 32000 records compiled from the KOMPASS-1993 directory. Maybe there will be some duplication (with CII data). And some records may be obsolete because the data is 6 years old. Even then it is quite useful.

I believe, one feature of the database allows you to Click on the name of the Company and get to see its "Contact Address / Phone No." etc.

For MODULE 1, we need to create a MASTER-FILE of

Company Name $\leftrightarrow$ Industry Name (Matching)

Why?

Because, when a Candidate says (writes)

"I am working in Company ABC"
or
"I worked in Company XYZ",

the software figures out (from the MASTER MATCHING LIST) that

Company ABC belongs to "Industry L"
Company XYZ belongs to "Industry M".

We have, perhaps, managed to establish this link for some 15000 + Companies.

Ultimately we must establish this link for 500,000 companies (LTD & PRVT. LTD). A daunting task!

Perhaps there is a "short-cut".

Look at the triangle once again

Every child has to have a Mother - so every product has to belong to an Industry. No orphans please!

Now if one link of the triangle is missing, it would look as follows:

But

If we have created a MASTER whereby

every CHILD'S MOTHER is identified

i.e.

every PRODUCT'S INDUSTRY-NAME is established

then the matter becomes quite SIMPLE (although somewhat circuitous!)

The moment we enter

a Company's Name \& the "Products" which it manufactures (or "Services" it sells),

we can travel "anti-clockwise" along the sides of this triangle and figure out

To which INDUSTRY this COMPANY belongs!

NO need to create "COMPANY VS INDUSTRY" Master!!

Let that remain the "Missing Link".

We travel,

Company $\rightarrow$ Product $\rightarrow$ Industry Name

Of course, if a Company manufactures several "products", it may simultaneously belong to several "Industries". No problem. We should enter all those Industry-Names against the Name of that Company.

Whereas a COMPANY may enter/exit a Product (Service), the relationship between the Product \& Industry will always remain SAME (mother \& child).
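A minimal sketch of this "short-cut", assuming a PRODUCT vs INDUSTRY master already exists. The master entries and the function name `industries_for_company` are hypothetical; the sketch only shows the route Company, then Product, then Industry.

```python
# Hypothetical PRODUCT -> INDUSTRY master: each product has exactly one industry.
PRODUCT_INDUSTRY = {
    "Truck": "Automobile",
    "Tractor": "Automobile",
    "Paint": "Chemicals",
}


def industries_for_company(products, product_industry=PRODUCT_INDUSTRY):
    """Travel Company -> Product -> Industry and collect every industry found.

    Products missing from the master are returned separately so that they
    can be added to the master later (no orphans).
    """
    industries, orphans = set(), []
    for product in products:
        industry = product_industry.get(product)
        if industry is None:
            orphans.append(product)
        else:
            industries.add(industry)
    return sorted(industries), orphans
```

A company making both trucks and paint would come out as belonging to two industries, which is the multi-industry case mentioned above.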

How soon can we build-up an INDUSTRY vs PRODUCT MASTER?

Regards

H. E. P.

cc: Nirav/Cyril

cc: CMT/Nishit

cc: Sajida

Anjaria :

9-7-99

Your task is to ensure that every Resume in our database (on LAN or on INTERNET) is correctly "coded" for

Industry (Category) Name
Function (Category) Name
Designation (Category) Name

For each of the above, I have created MASTERS (MASTER LISTS) which you should thoroughly study before you start work.

Some small MASTER-LISTS did exist when we started the HH3P database created by Mr. Nagle, but I explained to you how the present problem arose: whenever the data-entry operators could not find an appropriate MASTER entry, they simply left the

Industry / Function / Designation FIELDS

blank \& proceeded with the remaining data-entry!

This is why our current Internet-based database of 51000+ resumes contains (maybe)

Some 20000+ resumes without "Industry"
Some (Same/other) 20000 resumes without "Function"

Similar No. of resumes without "Designation".

There can be a lot of overlap in these numbers, because a given resume may have

Only "Industry" missing
Only "Function" missing
Only "Designation" missing
any permutation/combination of above.
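To see how much these numbers actually overlap, the resumes could be counted by the combination of fields that are blank. The sketch below is an assumption about how such a count might be done; the record layout and field names are illustrative, not the real database schema.

```python
from collections import Counter

CODED_FIELDS = ("industry", "function", "designation")


def missing_fields(resume):
    """Return the tuple of coded fields that are blank in one resume record."""
    return tuple(f for f in CODED_FIELDS if not (resume.get(f) or "").strip())


def missing_report(resumes):
    """Count resumes per combination of missing fields (complete ones are skipped)."""
    counts = Counter(missing_fields(r) for r in resumes)
    counts.pop((), None)
    return counts


# Hypothetical records, for illustration only.
sample = [
    {"pen": 1, "industry": "", "function": "Sales", "designation": "Manager"},
    {"pen": 2, "industry": "", "function": "", "designation": "Engineer"},
    {"pen": 3, "industry": "Automobile", "function": "Design", "designation": "Engineer"},
    {"pen": 4, "industry": None, "function": "Sales", "designation": "Officer"},
]
report = missing_report(sample)
```

Here `report` would show two resumes with only "Industry" missing and one with both "Industry" and "Function" missing.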

When CYRIL designed MODULE #1 (Data-Capture \& Query Software), we vastly enlarged the MASTER-LISTS (You must see these first).

Even now, these LISTS may not be exhaustive enough to take care of 95\% of the resumes.

This incompleteness is borne out by the fact that, while recently entering 5000 web-forms (resumes received over the web), the Operators faced problems with some of the web-forms. These are listed \& the list is with Sajida.

So,

The first job is to solve these problems so that these 5000 web-forms

get properly transferred to Module #1.

Sajida can open each of these Resumes (on the screen) and tell you what the problem is; you give the correct answer (by reading the resume on the screen) and Sajida completes the entry.

When this is done, you can turn your attention to

ENLARGING THE MASTER-LISTS

Wherever data-entry operators are facing maximum problems.

I have an intuitive feeling that the

Designation Master List \&
Function Master List

are fairly COMPREHENSIVE and may not need much enlargement/addition. Most likely, it will be the

COMPANY $\leftrightarrow$ INDUSTRY NAME MASTER

which may be posing a problem.

This is obvious because, even if we take LTD \& PRVT. LTD. Companies of India, the list could run into 500,000!

(Later note, dated 17-11-01: Now realize that a drop-down list of "Company-Names" is simply not possible!)

And our Candidate-member (Who has sent his resume by typing \& e-mail) could be working NOW \& could have worked in past, in any of these Companies!

So we need to know

to what "INDUSTRY-CATEGORY" does each of these 500,000 companies belong?

Once, we have found these answers - and created

COMPANY NAME $\leftrightarrow$ INDUSTRY NAME MASTER LIST

for these 500,000 Companies and entered the same in MODULE #1,

we would have solved 99.99\% of the operator's problems.

The moment he highlights \& clicks on a COMPANY-NAME, appearing in a Candidate's resume, against his

Current Employment or
Past Employment,

Module 1 will automatically pick-up \& enter the correct INDUSTRY-NAME from the MASTER-LIST.

Creating/enlarging MASTER is a ONE-TIME job and your focus must be that.

Focus has to be on PREVENTION (of a data-entry error) rather than CURE!

Of course, one Company may be active in several industries simultaneously. In such a case, you should list, in the MASTER, all such "Industries" against that company's name. This can be done by simply adding a "comma" between the names of these industries. Sajida will show you how this is done.
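As a rough illustration of the automatic pick-up, including the comma-separated case, the look-up might work as sketched below. The master entries and the function name `industries_of` are hypothetical; how Module 1 actually stores the master is not shown in these notes.

```python
# Hypothetical master entries: industries for one company separated by commas.
COMPANY_INDUSTRY_MASTER = {
    "company abc": "Automobile, Engineering",
    "company xyz": "Chemicals",
}


def industries_of(company_name, master=COMPANY_INDUSTRY_MASTER):
    """Look up a company (case-insensitively) and split its industry entry."""
    entry = master.get(company_name.strip().lower())
    if entry is None:
        return None  # not in the master yet: a candidate for enlarging it
    return [name.strip() for name in entry.split(",") if name.strip()]
```

A `None` result marks exactly the cases where the operator would otherwise have left the field blank, so these are the names to add to the master first.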

As far as

Structured Web-form \& Floppy

is concerned, the candidate himself selects

Industry
Function
which are most "relevant" to himself \& enters them into the appropriate "FIELDS". So we have nothing to do/worry! It is his funeral if he makes wrong choices!

The problem arises only in "typed" or "e-mailed" resumes, where there is no structured field. This is why we are discouraging this method of submission of resumes.

For covering as many COMPANIES (LTD \& PRVT. LTD) into your INDUSTRY MASTER

you may wish to discuss \& find a solution (of MERGING into one, single, master-list of Companies) with help from Sajida, Soma, Nirav, Cyril etc.

from the following Sources

16494 Companies (LTD) - List compiled by Soma
32000 Companies (LTD \& PRVT. LTD) - List from KOMPASS 93
400,000 Companies in EXPLORE INDIA CD
Several printed directories
CII Membership Floppy (already with us)
ASSOCHAM Membership List (to be obtained) - 60,000 Cos.
Normal Advt. database
Corpo. database
Several other databases on our harddisk
Several India-related Search Engines/websites on Internet (I have a list).
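Merging these sources into one list means spotting the same company under slightly different spellings. A minimal sketch is given below, assuming each source can be reduced to (company, industry) pairs. The suffix list in `_SUFFIXES`, the record layout and the function names are assumptions for illustration only.

```python
import re

# Suffix spellings treated as equivalent; this list is an assumption.
_SUFFIXES = re.compile(r"\b(private|pvt|prvt|limited|ltd)\b\.?", re.IGNORECASE)


def normalize(name):
    """Reduce a company name to a comparison key (lower case, no suffixes/punctuation)."""
    key = _SUFFIXES.sub(" ", name)
    key = re.sub(r"[^a-z0-9]+", " ", key.lower())
    return " ".join(key.split())


def merge_sources(sources):
    """Merge {source_name: [(company, industry_or_None), ...]} into one master.

    The first spelling seen is kept as the display name; industries from all
    sources are pooled; the contributing sources are remembered.
    """
    master = {}
    for source_name, rows in sources.items():
        for company, industry in rows:
            entry = master.setdefault(
                normalize(company),
                {"name": company, "industries": set(), "sources": set()},
            )
            entry["sources"].add(source_name)
            if industry:
                entry["industries"].add(industry)
    return master
```

Remembering the contributing sources for each company also makes it possible to check later how much one list (say KOMPASS 93) duplicates another.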

Although, in itself, such an objective of Creating a MEGA MASTER-LIST of COMPANIES Vs. INDUSTRIES would be highly desirable (from the "prevention of error" angle), the question that we must answer is

how long will it take you to create such a MEGA MASTER?

If it takes one month or more, it may not be worth attempting to create such a list in ONE-GO!

In such a case, it may be much better to take-up the list of (say) 20000 resumes where INDUSTRY-NAME is missing, take up one at a time, and go on adding to INDUSTRY MASTER.

Such an approach has the advantage of

solving our immediate problem (of making that resume COMPLETE \& therefore SEARCHABLE)
gradually/simultaneously enlarging the MASTER.

I feel you should take this approach but, in any case, discuss your strategy with CMT / consultants / Sajida / Nirav first.

9/3/99

cc: CMT

cc: Sajida

cc: Nirav / Cyril

YOGESH

31-07-98

Tasneem: If in doubt (as to what I mean), please consult me. This concerns you.

MODULE #1

DATA CAPTURE \& SEARCH QUERY

While designing/implementing this, please ensure to incorporate

my handwritten comments on your typed note dt. 28/04/98
my handwritten note dt. 30/04/98
my handwritten note dt. 15/07/98
our several discussions (including when you showed me some of the data-capture screens on 20/04/98)

On P-12 of my note dt. 30/04/98, I have mentioned the need to "capture" the "Source". I hope you have made that provision. This feature especially becomes crucial in case of RESUME BUILDER FLOPPY, as you can see from enclosed chart. I suppose, you will provide for "automatic" transfer of relevant SERIAL NO. (M/P/A/D/O) while "automatic" allotment of PEN when floppy gets loaded onto the database.

Before you give the demo on SUNDAY, let someone check-out all points mentioned in all of my notes, to ensure that these have been incorporated.

Regards

H. E. Parul (H. E. P.)

31-07-98

RESUME BUILDER FLOPPY

SOURCES FROM WHICH RECEIVED (RETURNED)

"M" Series 'M' Serial No.
"P" Series from P 'P' Serial No.
"D" Series from Distributor Distributor Code No.

For payment of award of Rs. 10/ per CD (Original or clone)

"A" Series from Associate Associate Code No.

eg: Mankodi - 999.
For payment of associate's share of our Prof. Fees in case of appt of Candidate.
Pl. make provision for this on Floppy generating Software on Tasneem's m/c

"O" (other)

CYRIL / Sajida / Chetan / Nishit / H. C. P. (1)

YOGESH

15-07-98

MODULE 1

DATA CAPTURE \& SEARCH

I refer to our discussion in my office yesterday when you explained how the data-capture process (under Module 1) will work with

Internet / Extranet (both structured EDS as well as e-mailed resumes)
Resume Floppy
Hard copy (Typed resumes).

As soon as we install Voice-Recognition Interface/Software on our E-PABX, we would also be able to capture VOICE-RESUMES which will be, essentially electronic files. I suppose these will be treated as e-mails.

During our discussions, it was also felt that there is a VITAL need to build-up a database of NON-MEMBERS.

We agreed that these persons should not be allotted PEN - also that this database should be maintained Separately and must not be mixed-up with MEMBER database.

The main reason for deciding this was that a person whose data we

capture today (partial data of course), could very well become our MEMBER tomorrow by entering his resume on Internet/Extranet.

So, if we allot him a PEN today (as a NON-MEMBER), we may end-up allotting him another PEN tomorrow (as a MEMBER)! In fact Internet-Extranet will do this even without our knowledge!

To avoid this we decided that we shall create a distinct database for NON-MEMBERS (without PEN).

But this database will still be SEARCHABLE. This is essential because our Consultants could use this database for contacting "prospective-potential" Candidates.

As suggested by you, I have prepared a "tabulation" as enclosed. Although there could be dozens of "SOURCES" for NON-MEMBER data, I have picked some 14 different sources which were readily available in our office.

Against each source, I have ticked $\checkmark$ the data which I found in that source. This does not mean that each \& every "field" is available for each \& every person in that particular source,

But

If a particular field (data) occurs even in a few cases, we must make provision for that, otherwise we will miss-out data on that person.

You may now, use the enclosed sheet to create the necessary data-capture screen for NON-MEMBER DATABASE.

My own observations:

① Although

current job (working since)
current salary
Total years of experience

is to be found in only one SOURCE, Viz. Employee details (Annual Reports),

this is a VERY VERY IMPORTANT data from the viewpoint of

Com.com (websites)
headhunting by consultants

So, we must keep it.

② Again, Only one SOURCE, Viz.: MEMBER DATA UPDATE FORM,

Contains "fields" for

Function
Product/Service exposure
Industry
Specialization
\& Achievements.

I have designed this form and sent to 93 local chapters (Centres) of The Institution of Engineers, with a request to distribute amongst their 65000+ members, who will (hopefully) fill-in and directly return to 3P.

I have promised to enter this data \& create a database and make it available to individual Centre. This service is FREE.

If this experiment succeeds, I propose to repeat it with other Professional Bodies such as

NIPM (National Institute of Personnel Managers) etc.

In which case, I would like to standardize on this form. I am preparing it in block format fields so that we could capture the fields directly.

So we should keep these fields as well.

③ There are a number of SOURCES,

Where SOME or OTHER data is missing/not available.

But, because we are trying to create a generalised/universal data-capture screen, I suppose, we have no choice except to make provision for all the fields listed on the enclosed tabulation.

On the other hand, we must keep the database so flexible that we can add more "fields" in future as new "sources" might dictate.

While on the subject, please seriously consider a drag \& drop type data-capture facility for JOB-ADVERTISEMENTS,

Which we are, in any case, scanning/OCR'ing \& converting to text format. This is because a typical job-advertisement is the other side of the coin (of resume), with almost identical

SEARCH-PARAMETERS.

Also because,

only a few months down the line, when we start work on Module #3 PRO-ACTIVE MARKETING, we have already provided for "Advt. based Query".

But, we must also plan for

PRO-ACTIVELY PROMOTING A CANDIDATE (based on his resume).

Idea is that every new resume received everyday in our database will, automatically be matched with the

JOBS AVAILABLE DATABASE

or even

PAST JOB-ADVT. HISTORY OF EACH \& EVERY CORPORATE

for the last 3 years, to see if that Corporate had ever advertised in the past for such a Candidate,

and especially, whether any Corporation had, in the last 3 years

REPEATEDLY ADVERTISED FOR SUCH CANDIDATE.

If so, it is a proof that they either need such a person NOW or will need him in FUTURE.

And we want to tell such an advertiser that we have their MAN-FRIDAY in our database, whenever they need!

All of this correspondence should get done

automatically (without human intervention)
daily
over email/fax

without revealing the identity of the Candidate or the identity of his present employer.
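The matching idea can be sketched as below. It assumes each advertisement already carries the Industry / Function / Designation codes and a date, and it treats "repeatedly" as two or more matching adverts in the window; both the record layout and that threshold are assumptions, not something fixed in this note.

```python
from collections import Counter
from datetime import date, timedelta


def repeat_advertisers(resume, adverts, today, years=3, min_repeats=2):
    """Return {advertiser: count} for advertisers that sought this profile repeatedly.

    An advert matches when its industry, function and designation all equal
    the resume's. Only adverts from the last `years` years are considered.
    """
    cutoff = today - timedelta(days=365 * years)
    keys = ("industry", "function", "designation")
    counts = Counter(
        ad["advertiser"]
        for ad in adverts
        if ad["date"] >= cutoff and all(ad[k] == resume[k] for k in keys)
    )
    return {name: n for name, n in counts.items() if n >= min_repeats}


def make_ad(advertiser, when, function="Design"):
    """Build one hypothetical advert record for the example below."""
    return {"advertiser": advertiser, "date": when, "industry": "Automobile",
            "function": function, "designation": "Manager"}


# Hypothetical data, for illustration only.
resume = {"industry": "Automobile", "function": "Design", "designation": "Manager"}
adverts = [
    make_ad("Company ABC", date(1996, 3, 1)),
    make_ad("Company ABC", date(1998, 1, 10)),
    make_ad("Company XYZ", date(1997, 6, 5)),            # advertised only once
    make_ad("Company PQR", date(1994, 1, 1)),            # older than 3 years
    make_ad("Company PQR", date(1997, 1, 1), "Sales"),   # different function
]
matches = repeat_advertisers(resume, adverts, today=date(1998, 7, 15))
```

Only the advertiser's name and the count come back; nothing from the resume other than the three codes is used, which fits the requirement of not revealing the Candidate's identity.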

In the last 7 months, we have already scanned/OCR'ed \& converted to text files, over 12000 Job-Advts

and this database is growing at the rate of 1000/month.

For each of these, following fields (Search parameters) are already entered

1. Industry

2. Function

3. Designation

For last 4/5 months, we have also started Keying-in

4. Name of Contact person (or Designation of Person to whom to apply)

5. Company (Advertiser) Name

6. Company Address

7. Phone / Fax / e-mail

Entry of these Fields is currently being done by "keying-in". This is a very time-consuming \& unproductive method.

Considering that

this database creation is our "CORE" activity/business process
this activity will grow a dozen-times or even a hundred-times in the months and years to come (as we cover all magazines/newspapers in India \& abroad - which carry job advertisements),

we must immediately "re-engineer" this process (BPR!).

We have, therefore, no option but to scan/OCR/text and data-capture thru DRAG \& DROP WIZARD, all the

JOB-ADVERTISEMENTS

as well.

I, therefore, earnestly request you to incorporate this in MODULE 1 right now.

In fact, ADVERTISEMENT DATA CAPTURE SCREEN should be so flexible/generalized, that tomorrow we can use it for any other (type) of advts to capture essential data about

"Who" is the advertiser
"What" (product or service) is being advertised
"To whom" it is "targeted" (Readership)

Advts which I can off-hand think about are

Univ/college admissions
Coaching classes
Training Institutions (esp. Computer Training)
Tenders for Supply of goods
Tenders for Construction contracts
Tenders for Consultancy
Tenders for Maintenance
Advt. for Sale of Consumer Goods/durables
Advt. for Sale of Engineering Goods
Advt. for Sale of Software
Advt. for Sale of Services (eg. Telephone)
Advt. for Sale of Services (eg. Hospital)
Advt. for Sale of Services (eg. Advertising)

When we met on 28/5/98 (to discuss your typed proposal on MODULE #1 / WIZARDS / OVERALL STRUCTURES), we agreed upon following SCHEDULE:

Serial No	MODULE DESCRIPTION	Earliest Start Date	Time to Complete	Likely Overflow
1	Data Capture \& Search	28/5/98	6 weeks (2 for design, 2 for coding, 2 for testing)	2 weeks.
2	Order Execution \& All Features of CONTEXT CARTRIDGE	30/7/98	8 weeks.	2 weeks

I would request you to quickly draw-up a most realistic TIME-TABLE for the remaining modules and Send me your proposal. In one of the modules (NO. 2 ?) you must incorporate CORPORATE DATABASE creation as well.

With regards

H. E. P.

16/7/98

COMMENTS ON MODULE 1

30/4/98

"WIZARDS" diagram (Data Processing)

Manual "DB Entry" should be replaced by "Entry thru ARDIS".

If ARDIS cannot be ready (as I suspect) within the next 4 weeks, then, at the least, we should definitely get the Context Cartridge to generate/extract 16 "themes" from each OCR'd resume and put these in respective database fields.

We MUST eliminate manual (Keyboard) entry of DB fields.

A common "PEN Allotment Software" must

ensure avoidance of duplicate Numbers being allotted (we should simply assume that each candidate has earlier sent to us his typed resume and that each of them may have been allotted a PEN already)
A kind of "default" condition.
ensure that the Correct/appropriate "Serial" is used, depending upon

the "Source" of EDS, Viz.

Internet - 3 million Series
Extranet - 2 million Series
EDS on floppy - 1 million Series
Hard Copy - 0 million Series
Non Members - 4 million Series
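A rough sketch of such a "PEN Allotment Software" is given below. The series bases follow the list above; the idea of a `candidate_key` (for example name plus birth date) used to recognise a candidate who already holds a PEN is an assumption, as are the class and method names.

```python
SERIES_BASE = {
    "internet": 3_000_000,
    "extranet": 2_000_000,
    "floppy": 1_000_000,
    "hard_copy": 0,
    "non_member": 4_000_000,
}


class PenAllotter:
    """Allot one PEN per candidate, numbered within the series of the source."""

    def __init__(self):
        self._pen_by_candidate = {}  # identity key -> PEN already allotted
        self._next_offset = {source: 1 for source in SERIES_BASE}

    def allot(self, candidate_key, source):
        # Default condition: assume the candidate may already hold a PEN.
        existing = self._pen_by_candidate.get(candidate_key)
        if existing is not None:
            return existing
        offset = self._next_offset[source]
        if offset >= 1_000_000:
            raise OverflowError(f"series for {source!r} is exhausted")
        pen = SERIES_BASE[source] + offset
        self._next_offset[source] = offset + 1
        self._pen_by_candidate[candidate_key] = pen
        return pen
```

A candidate who first registers over the Internet and later sends a floppy keeps the PEN from the 3 million series; no second number is issued.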

Fields (Themes (as found in Hard Copy of EDS))

(in the order of Importance)

Industry
Function
Education
Product/Service Exposure
Languages
City/Country Preference
Knowledge/Skills. (P-2 "My Search Profile" of EDS)
Birth Date/Age
Designation
Experience
Current Company-Name
Past Employer Name
Gross Annual Salary
Keywords
Name of Executive
PEN
Resume Date.

Image Files

It would be highly desirable to be able to Search Image Files by

Name of Executives
PEN

Within 2/3 years, harddisk (or any other RANDOM ACCESS MEMORY) will be so huge that, instead of storing image files on 50/75 CDs (of 650 MB each), we would be storing all 50,000 image-files on a single storage device, which could be randomly accessed.

When this happens, it would be highly desirable to be able to quickly "locate \& View" the image of any resume in matter of seconds.

This would be extremely useful when a Candidate calls over phone and says, "I am Mr ......."

"My PEN is ......."

At that moment each \& every consultant in our office should be able to instantly view the image (of resume) \& add his remarks/annotations as he is

talking to the Candidate over phone \&

giving him some "feedback"
getting from him some "feedback" (eg. interest)
giving him some "instructions" (eg. interview)

Every telephonic conversation with each \& every Candidate gets "recorded in the image file". This would enable all consultants to know precisely what has transpired (a kind of JANAM-KUNDLI), so that there is no duplicate/wasted effort.

Perhaps, instead of on the image-file, all of these "recordings/annotations" could be made on txt files (which, I suppose, will in any case be on a single hard-disk even now).

Time will soon come, when these "Recordings/Annotations/Remarks" will be carried-out thru SPEECH RECOGNITION SOFTWARE, as a Consultant is talking to a Candidate on phone.

SPELL-CHECK

This is the slowest of the manual processes (database entry by keying-in, is another).

This process must be automated.

OCR software cannot "automate" spell-check, because it has no "knowledge/clue" of the "context" in which a particular word is used/employed in a sentence.

But ARDIS will have this "knowledge".

In fact, I have already categorised over 15000 most commonly used words (in a resume).

Once our 50,000 resumes have been scanned/OCR'ed, then we would perhaps have, with us, a database of over 5 Million English-language sentences.

It would be a simple exercise to find and list

all the sentences in which each of the above-mentioned 15000 words has appeared.

So, let us say, each of these words has appeared in 300 sentences.

And, we anyway know the correct "spelling" of each of these 15000 words.

So next time, ARDIS comes across (in any scanned/OCR'd resume), "any sentence" which closely "resembles" one of the 300 sentences, we know, what precise word, the writer has intended to use.

We establish the "context" statistically.

Having done that, we tell the software to "Correct" the spelling of the words

so we eliminate manual "spell-check".
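One simple way to picture this statistical "context" is sketched below. It is not ARDIS itself, only an illustration: for each known word it counts the other words seen in the same sentences, then corrects a garbled token by choosing, among similarly spelled known words, the one whose usual neighbours best match the sentence. The use of Python's `difflib` for spelling similarity and the tiny vocabulary are assumptions.

```python
import difflib
from collections import Counter, defaultdict


def build_context_index(sentences, vocabulary):
    """For each known word, count the other words seen in the same sentences."""
    vocab = set(vocabulary)
    index = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for word in words:
            if word in vocab:
                index[word].update(w for w in words if w != word)
    return index


def correct(token, sentence, vocabulary, index):
    """Pick the known word that is close in spelling and best fits the context."""
    token = token.lower()
    if token in vocabulary:
        return token
    candidates = difflib.get_close_matches(token, vocabulary, n=5, cutoff=0.6)
    if not candidates:
        return token  # leave unknown words untouched
    context = [w for w in sentence.lower().split() if w != token]
    # max() keeps the first (closest-spelled) candidate when scores tie.
    return max(candidates, key=lambda c: sum(index[c][w] for w in context))


# Hypothetical vocabulary and sentences, for illustration only.
VOCAB = ["plant", "plan", "planned"]
INDEX = build_context_index(
    [
        "managed the plant maintenance team",
        "prepared the annual business plan",
        "erected a new plant at pune",
    ],
    VOCAB,
)
```

With this sample, the garbled token "plam" is resolved differently depending on the sentence it appears in, which is the point of establishing the context statistically.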

Quality Control.

Can we have a "tag" which will tell us the name of the "operator" who scanned/OCR'd/spell-checked a particular resume?

This way, if there are any mistakes of

spell check
Database entry,

we could instantly know whom to catch!

The very knowledge that each resume is being so "tagged" will ensure that the operators are extra careful not to make any mistake!

NON-MEMBERS

These are executives who have NOT registered with us i.e. they have not given us their Resumes.

a. Perhaps they are so old or so senior (e.g. Company Chairman/M.D./Owner etc) that they are never likely to be looking for a job.

b. They are not looking for a job "currently" but, perhaps, would not mind looking at an "opportunity" if presented.

There are hundreds of thousands of executives falling under category (b).

We have bare-minimum information about them. But we still wish to enter this min. information - and their name - in our database, so that over a period of time, we could keep track of their "movements" (from one company to another) and be able to "headhunt" them if \& when need arises.

Therefore we decided to give them 4,000,000 PEN series.

Mr. XXX.

PEN: 4,000,050.

Designation	Company	Source of Info.	Date.	Remarks
1
2
3
4

Available Data

What happens when such an executive (non-member) registers with us at a future date, and becomes a "member"?

I think we should continue with the same PEN, (maybe) with an asterisk (*) to indicate that he has now become a "member" and therefore, we have his full resume somewhere.

QUERY

Industry
Function
Edu. Quali.

List for each of these could contain 200 names.

It would be too tedious \& tiring for any Consultant/Headhunter to scroll thru a long list to locate that "name" which is "most appropriate" to the SEARCH-ON-HAND.

A compromise formula could be:

Consultant/Headhunter manually types/enters a word, eg. Auto

Industry

Explore

Immediately he sees a drop-down list as follows:

Industry
Auto	O
Automobile
Car
Truck
Vehicle
2 Wheeler
"
"
"
Tractor
Scooter
Motor Cycle

An executive could have mentioned in his online/offline resume that he belongs to any one of the above-mentioned "industries".

So a Consultant/Headhunter can "narrow-down" or "enlarge" his search, by clicking on ONE OR MORE of the industry-names appearing in the drop-down list, successively, till he finds his MAN-FRIDAY.

This way, he does not have to scroll thru 200 industry-names before locating the "most appropriate" name.

He starts by typing a broad/generic name and then quickly zeroes in on the most appropriate name from the DROP DOWN LIST.
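The "compromise formula" could be sketched as follows, assuming the MASTER groups related industry names together. The automobile group is taken from the example above; the second group and all function names are made up for illustration.

```python
# Hypothetical groups of related industry names; the real MASTER would supply these.
RELATED_GROUPS = [
    ["Auto", "Automobile", "Car", "Truck", "Vehicle", "2 Wheeler",
     "Tractor", "Scooter", "Motor Cycle"],
    ["Chemicals", "Paint", "Dyes"],
]


def drop_down(typed, groups=RELATED_GROUPS):
    """Return the names of every group containing a name that starts with `typed`."""
    typed = typed.strip().lower()
    if not typed:
        return []
    names = []
    for group in groups:
        if any(name.lower().startswith(typed) for name in group):
            names.extend(n for n in group if n not in names)
    return names


def search(resumes, selected):
    """Return resumes whose industry is one of the names ticked in the drop-down."""
    wanted = {name.lower() for name in selected}
    return [r for r in resumes if r["industry"].lower() in wanted]
```

Typing "Auto" brings up the whole automobile group, and ticking one or more of those names narrows or enlarges the search without scrolling through the full list.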

This process could be "reversed" for an

Executive trying to enter his resume Online

data-entry operator trying to decide THE MOST APPROPRIATE search parameter (Industry/Function/Edu. Quali.) for a candidate.

DATA ENTRY / QUERY

We must enter the "Source" as far as resumes received from "Associates" is concerned. Please ensure a provision for this.

If an associate's candidate gets placed, the associate is entitled to his share of our professional fees. So we must know the "source" in such cases.

QUERY

As far as "Students/Fresh Graduates" are concerned, only EDS on floppy or Extranet EDS will be accepted from such persons. Pl. see my exhaustive notes on the

SEARCH PARAMETERS FOR FRESH GRADUATES.

These will have to be incorporated in your query package.


Saturday, 17 November 2001

CANDIDATE DATABASE
