·     A large number of words (usually common nouns or adverbs) are accompanied by "adjectives".
·     Normally the adjective precedes the verb or noun.
·     A bio-data (resume) is not a piece of fiction / literature; hence:
a.  The most frequently occurring "common nouns" & "adverbs" are perhaps no more than a few hundred.
b.  The most frequently occurring "adjectives" which precede these common nouns & adverbs are also no more than a few hundred.
c.  The number of probable combinations of these words & these adjectives is also quite limited.
d.  Words, phrases and sentences are largely "statements of fact" and not "figments of imagination", hence repetitive in nature and content.
e.  The "sequences" in which these words and phrases appear (to make up a sentence) are fairly "well-defined", with very little "variation".
All of the above makes it reasonably simple to devise RESUME-GENERATING SOFTWARE. As mentioned in one of my earlier (concept) notes, what we need to do is to:
·     Take a large number of bio-datas
·     Scan / OCR / index all the words appearing in these bio-datas
·     Study & record the "occurrences" (the frequencies) of
        I.     Each word (verb, adverb, adjective, preposition, noun)
        II.    Each set of any two words (prefixed / suffixed to a given word)
        III.   Each set of any three words
        IV.    And so on & so forth.
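
The counting steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the note does not name a programming language, the function names `tokenize` and `ngram_counts` are hypothetical, the two sample sentences are made up, and the text is assumed to be already converted (OCR'd) into plain strings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(documents, max_n=3):
    """Count every run of 1..max_n consecutive words, per run length.

    Returns a dict {n: Counter}, where each Counter maps a tuple of
    n words to the number of times that run occurs. Runs never span
    two different documents.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for text in documents:
        words = tokenize(text)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1
    return counts


# Two made-up resume fragments, for illustration only.
sample_biodatas = [
    "Responsible for overall management of the plant",
    "Responsible for overall planning of production",
]
counts = ngram_counts(sample_biodatas)
```

Here `counts[1]` holds single-word frequencies, `counts[2]` the two-word sets and `counts[3]` the three-word sets; raising `max_n` covers the "so on & so forth" case.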
·     We have already created a "directory" of some 6052 words (out of a total of 1 million words), each of which has occurred more than 10 times in 3500 converted bio-datas.
·     Soon I will send you the (indexed) words contained in 100 originally typed bio-datas, for which scanned bitmap image files have already been given to you on some 15 to 20 floppies.
·     If required, we can go on scanning 50 to 100 typed bio-datas every day (we have some 35000 typed bio-datas available with us) and go on
    - Increasing the "population" of words
    - Improving the "frequency" of occurrence
But the ideal time to launch this massive scanning operation (of 35000 bio-datas) is after ARDIS & ARGIS are ready, even in crude form. Then the massive scanning operation would itself become a "self-learning / improving" exercise for the software, making it truly "intelligent". In the enclosed pages, I have, by studying some 30 to 40 bio-datas manually, tried to figure out:
   - What are the commonly occurring adjectives before some nouns & adverbs?
   - What words appear before (and in some cases "after") the following prepositions?
·     OF
·     WITH
·     FOR
·     AT
·     ON
·     IN
·     AFTER
·     TO
·     BY
·     AS
·     SINCE
·     FROM
·     UNDER
·     OUT
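
The second question (which words appear before and after these prepositions) lends itself to the same kind of counting. The following is a rough Python sketch, not the method actually used for the enclosed pages, which were prepared by hand. The function name `preposition_neighbours` and the two sample sentences are hypothetical, and the text is assumed to be available as plain strings.

```python
import re
from collections import Counter, defaultdict

# The prepositions listed above, in lower case.
PREPOSITIONS = {
    "of", "with", "for", "at", "on", "in", "after",
    "to", "by", "as", "since", "from", "under", "out",
}


def preposition_neighbours(documents):
    """Count the words found immediately before and after each preposition.

    Returns two dicts, (before, after), each mapping a preposition
    to a Counter of its neighbouring words.
    """
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            if word not in PREPOSITIONS:
                continue
            if i > 0:
                before[word][words[i - 1]] += 1
            if i + 1 < len(words):
                after[word][words[i + 1]] += 1
    return before, after


# Two made-up resume fragments, for illustration only.
sample_biodatas = [
    "Worked as manager in charge of sales",
    "Worked as engineer in production",
]
before, after = preposition_neighbours(sample_biodatas)
```

Calling `before["as"].most_common(10)` would then list the ten words most often found in front of "AS". The first question (adjectives before nouns) would additionally need each word's part of speech, which this sketch does not attempt.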
To dissect / analyse even a few hundred bio-datas manually is a Herculean task! To study several thousand bio-datas this way is next to impossible!
But it is precisely in such set and repetitive (logical) activity that computer hardware & software triumph over the human brain. Let us harness these technological advances and create something superior to what RESUMIX has done.
h.c.parekh