Hi Friends,
                                              Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do.
                                                  There is just no time to look back, no time to wonder,"Will anyone read these pages?"
                                       With regards,
                                       Hemen Parekh
                                       27 June 2013

Sunday, 24 November 1996

BASIS FOR WORD RECOGNITION SOFTWARE

Any given word (a cluster of character) can be classified (in English) into one of the following "Categories:-
WORD /  Verb/  Adverb/  Preposition / Adjective / Noun /Common Noun/ Proper Noun
So the first task is to create a "directory" of each of this category. Then each "word" must be compared to the words     contained in given directory. If a match occurs then that WORD would get categorized as belonging to that category. The process has to be repeated again and again by trying to match the word with the words contained in each of the categories TILL a match is found. If no "match" is found, that word should be separately stored in a file marked.   "UNMATCHED WORDS" Everyday, an expert would study all the words contained in this file and assign each of these words a definite category, using his "HUMAN INTELLIGENCE" In this way, over a period of time, the human intelligence will identify/ categories’ each and every word contained in ENGLISH LANGUAGE. This will be the process of transferring human intelligence to computer. Essentially the trick lies in getting the computer (Software) to MIMIC the process followed by a human brain while scanning a set of words (i.e. reading) and by analyzing the "Sequence" in which these words are arranged, to assign a MEANING to each word or a string of words (a phrase or a sentence). I cannot believe that no one has attempted this before (especially since it has so much commercial value). We don't   know who has developed this software and where to find it so we must end - up rediscovering the wheel ! Our computer files contain some 900,000 words which have repeatedly occurred in our records - mostly coveted bio - data’s or words captured from bio - dates. We have, in our files, some 3500 Converted bio - data’s. It has taken us about 6 years to accomplish this feat
i.e.  Approx  600 converted biodatas / years  OR  Approx 2 biodatas converted every working day !
Assuming that all those (converted) bio data’s which are older than 2 years are OBSOLUTE, this means that perhaps no more than 1200 are current / valid / useful !
So, one thing becomes clear The "rate of Obsolescence" is faster than the "rate of conversion" !  Of course, we can argue, "Why should we waste / spend our time in "converting" a bio - data ? All we need to do is to Capture the ESSENTIAL  / MINIMUM DATA (from each biodata_ which would qualify that person to get searched / spotted. If he gets short listed, we can always, at that point of time, spend time / effort to fully converted this bio - data .in fact this is what we have done so far - because there was a premium on the time of data - entry operators. That time was best utilized in capturing the essential / minimum data. But if latest technology permits/ enables us to convert 200 biodatas each day (instead of just 2 biodatas with the same effort/ time/ cost, then why not convert 200? why be satisfied with just 2 day ? If this can be made to "happen", we would be in a position to send - out / fax - out e : mail, converted bio - data’s to our clients in matter     of "minutes" instead of "days" - which it takes today !    That is no all A converted bio - data has for more KEYWORDS (Knowledge - skills - attributes - attitudes etc) than the MINIMUM DATA. So there is an improved chance of spotting the RIGHT MAN, using a QUERY which contains a large no. of KEYWORDS. So, to - day, if the clients "likes" only ONE converted bio - data, out of TEN sent to him (a huge waste of  everybody's time/ effort), then under the new situation he should be able to "like" 4 out of every 5 converted bio - data’s sent to him !
This would vastly improve the chance of at least ONE executive getting appointed in such assignment. This should be our goal. This goal could be achieved only if,
 Step  # 1.Each biodata received every day is "scanned" on the same day
 step  # 2. Converted to TEXT (ASCII)
step  # 3. PEN given serially
step  # 4. WORD - RECOGNISED (a step beyond OCR - Optical - CHARACTER recognized)
step  # 5. Each word "categorized" and indexed and stored in appropriate FIELDS of the DATABASE.
step  # 6. Database "reconstituted" to create "converted" biodata as per our standard format
Step 1/ 2/ 3 are not difficult , Step 4 is difficult, Step 5 is more difficult , Step 6 is most difficult  But if we keep working on this problem, it can be solved  50% accurate in 3 months , 70 % accurate in 6 months, 90% accurate in 12 months.
 Even though there are about 900,000 indexed WORDS in our ISYS file, all of these do not occur (in a biodata/ record) with the same frequency. Some occur far more frequently, some frequently some regularly, some occasionally and some rarely. Then the course (in the English language) there must be thousands of other Words, which Love not occurred EVEN ONCE in any of the biodatas. Therefore we won't find them amongst the existing indexed file of 900,000 words. It is quite possible that some of these (so far missing words( may occur if this file (of words) were to grow to 2 million.
 As this file of words grows and grows, the probabilities of :-
·     A words having been left out  and
·     Such a left - out likely to occur (in the next biodata) are "decreasing"
 Meaning, Some 20% of the words (in English language) make - up may be 905 of all the "Occurrences".     This would become clear when we plot the frequency distribution - curve of the 900,000 words which we  have already indexed. And even when this population grows to 2 million, the shape (the nature) of the frequency distribution curve is NOT likely to change! only with a much large WORD - POPULATION, the "accuracy" will marginally increase. So our search is to find, Which are these 20% (20% X 9 Lakh = 180,000) Words which make - up 90% "area under the curve" i.e. POPULATION? Then focus our efforts in "Categorizing" these 180,000 words in the first place If we manage to do this, 90% of our battle is won. Of course this pre - supposes that before we can attempt "Categorization", we must be able to recognize each of them as a "WORD" 6 yrs down the line (Since writing this note), I feel this no. is no more than 30,000 words!
 COMPANY
SIMILAR MEANING WORDS
 Firm/ Corporation/ Organization/ Employer/ Industry (Misnomer)
 ASSOCIATED WORDS
 Name of (Company)/ Company (Profile) /Present/Current/Past /(Company) Products / (Company) Structure/ (Company) Organization.
CAREER
 Career Path/ Career History /Career Achievement/Career Growth/ Career Objective/ Career Progression /  Career Information/ Career Details/ Career Development/ Career Goal/Career Interest/Career Nature/  Career Profile/ Career Record.
 Associated Words
 Past/ Present / Professional/ Academic / Previous/ SIMILAR MEANING WORDS/ SERVICE
CURRICULAM
SIMILAR MEANING WORDS
 Course / Subjects/ Topics
RELATED WORDS
 Academy/ Scholastic / Education/ research / Exam/scholarship/ Graduation/training/  Honors/teaching / Institution/ University/ College/ Degree/ Diploma / Certificate/ Learning / Pass /Passing / Year of passing / Project / Training/ Qualifications
 DEPENDENTS
 Associated Words
Family/ Father / Mother / Brother/ Sister/ Wife / Children/ Son/ Daughter
 EDUCATION
 Education (al)/ Educational Qualifications/ Qualifications/ Academic Qualifications/ Technical Qualifications.
 Associated Words
 Qualification / School/ Degree/ Diploma/university / Graduate/ Graduation/Institution/ Doctorate/ Certificate / Curricular/ Course/ Exam/ Topics/ Subjects/ Electives / Under – Graduate/Fellow/ Honors/ Distinction / First Class/ Grade Point Average (GPA)
 EXPERIENCE
 Employment experience/Work experience / Job experience/ Professional experience/ Current  experience/ Past experience/. Present experience/ Relevant  experience/ Industrial / Industry experience/ Teaching experience / Details of experience /Foreign experience/ Factory experience/ Global experience/ Management experience / Site experience/ Major experience / Practical experience/  Research experience/ Service experience/ Training experience/ Technical experience
 EMPLOYER
 Company/ Firm /Organization/ Corporation
 RELATED WORDS
 Present / Current/ Past/ Career/ Job/ Service/ Name of
 EMPLOYMENT
 Employment Particular / Employment Past / Employment Present/ Employment Current/ Employment Record/ Employment History / Employment Existing /. Employment Data/ Employment Nature/ Employment Period
 FUNCTION
 Responsibility / Duty/Job/ Past / Management/ Present/Description/ Existing / Profile/ Current/ Skills (associated with) /Con – current/ Structure (Functional) / Major / Organization (Functional) / Minor /Technical/ Nature of/ Reports to
 FACTORY
 Plant / Site/ Works /Manufacturing location
 INFOMRATION
DATA / KNOWLEDGE / DATABASE/ DATA SHEET/ Processing/current Collection /Past/ Retrieval/ Personal/ Analysis /job Related/ Category/ Work Related/ Career/ Additional/ Details/ Institutional/ Compilation/ Particular/ Field of/ General/ Industry (IT industry) /Nature of/ Purpose of/ Product / Project related/ Organizational/ Service/ State of / Dissemination/
EXECUTIVE
Employee/ Worker / Work man/ Supervisor/ Officer/ Manager / Data sheet/ Profile/ Staff Company/ Workforce/ Responsibility Position/ Status/ Search /Skills/ Selection/title Placement/designation/ Interview/ Bio Data /Execute/ Exposure  Resume    /Post/ Salary /Compensation/ Training /Experience

h.c.parekh

BASIS FOR WORD RECOGNITION SOFTWARE