· A large number of words (usually common nouns or adverbs) are accompanied by "adjectives".
· Normally the adjective precedes the verb or noun.
· A bio-data (resume) is not a piece of fiction / literature, hence:
a. The most frequently occurring "common nouns" & "adverbs" are perhaps no more than a few hundred.
b. The most frequently occurring "adjectives" which precede these common nouns & adverbs are also no more than a few hundred.
c. The number of probable combinations of these words & these adjectives is also quite limited.
d. Words, phrases and sentences are largely "statements of fact" and not "figments of imagination", hence repetitive in nature and content.
e. The "sequences" in which these words and phrases appear (to make up a sentence) are fairly "well-defined", with very little "variation".
All of the above makes it reasonably simple to devise RESUME-GENERATING SOFTWARE. As mentioned in one of my earlier (concept) notes, what we need to do is to:
· Take a large number of bio-datas
· Scan / OCR / index all the words appearing in these bio-datas
· Study & record the "occurrences" (the frequencies) of
I. Each word (verb, adverb, adjective, preposition, noun)
II. Sets of any two words (prefixed / suffixed to a given word)
III. Sets of any three words
IV. And so on.
· We have already created a "directory" of some 6052 words (out of a total of 1 million words), each of which has occurred more than 10 times in 3500 converted bio-datas.
· Soon I will send you the (indexed) words contained in 100 originally typed bio-datas, for which scanned bitmap image files have already been given to you on some 15/20 floppies.
· If required, we can go on scanning 50 to 100 typed bio-datas every day (we have some 35000 typed bio-datas available with us) and go on
- Increasing the "population" of words
- Improving the "frequency" of occurrence
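The counting steps listed above (single words, sets of two words, sets of three words, and a "directory" of words that occur more than a given number of times) could be sketched roughly as follows. This is only an illustrative sketch in Python: the function names, the simple letters-only tokenizer and the sample texts are all assumptions made for the example, not part of the existing ARDIS / ARGIS work.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(resumes, n):
    """Count every run of n consecutive words across all bio-datas."""
    counts = Counter()
    for text in resumes:
        words = tokenize(text)
        # zip over n shifted copies of the word list yields each n-word window
        counts.update(zip(*(words[i:] for i in range(n))))
    return counts

def frequent_words(resumes, min_count):
    """Build a 'directory' of words that occur more than min_count times."""
    unigrams = ngram_counts(resumes, 1)
    return {w[0]: c for w, c in unigrams.items() if c > min_count}

# Hypothetical sample texts, not taken from the actual bio-data collection.
sample = [
    "Responsible for sales of industrial pumps",
    "Responsible for maintenance of industrial equipment",
]

bigrams = ngram_counts(sample, 2)    # sets of any two words
trigrams = ngram_counts(sample, 3)   # sets of any three words
directory = frequent_words(sample, 1)
```

With the real collection, the threshold would be 10 (as for the 6052-word directory), and each newly scanned batch of bio-datas could simply be added to the same counters, so that both the "population" of words and their frequencies keep improving.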
But the ideal time to launch this massive scanning operation (of 35000 bio-datas) is after ARDIS & ARGIS are ready, even in crude form. Then the massive scanning operation would itself become a "self-learning / improving" exercise for the software, making it truly "intelligent". In the enclosed pages, I have, by studying some 30/40 bio-datas manually, tried to figure out:
- What are the commonly occurring adjectives before some nouns & adverbs?
- What words appear before (and in some cases "after") the following prepositions?
· OF
· WITH
· FOR
· AT
· ON
· IN
· AFTER
· TO
· BY
· AS
· SINCE
· FROM
· UNDER
· OUT
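The manual study of which words appear before and after these prepositions could, in principle, be automated along the following lines. This is a minimal sketch in Python under stated assumptions: the function name, the letters-only tokenizer and the sample texts are hypothetical, and only the list of prepositions comes from the note itself.

```python
import re
from collections import Counter, defaultdict

# The prepositions listed in the note.
PREPOSITIONS = {
    "of", "with", "for", "at", "on", "in", "after",
    "to", "by", "as", "since", "from", "under", "out",
}

def preposition_contexts(resumes):
    """For each preposition, count the words seen just before and just after it."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for text in resumes:
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            if word in PREPOSITIONS:
                if i > 0:
                    before[word][words[i - 1]] += 1
                if i + 1 < len(words):
                    after[word][words[i + 1]] += 1
    return before, after

# Hypothetical sample texts, not taken from the actual bio-data collection.
sample = [
    "In charge of production planning",
    "Worked as manager of production",
]
before, after = preposition_contexts(sample)
```

Calling `after["of"].most_common(10)` on a large collection would then list the ten words that most often follow "OF", which is the kind of table prepared by hand in the enclosed pages.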
To dissect / analyse even a few hundred bio-datas manually is a Herculean task! To study several thousand bio-datas this way is next to impossible! But it is precisely in such set and repetitive (logical) activity that computer hardware & software triumph over the human brain. Let us harness these technological advances and create something superior to what RESUMIX has done.
h.c.parekh