Notes: LOGIC FOR HOW TO CONSTRUCT PHRASES?

Tuesday, 10 December 1996

LOGIC FOR HOW TO CONSTRUCT PHRASES?

· A large number of words (usually common nouns or adverbs) are accompanied by "adjectives".

· Normally the adjective precedes the verb or noun.

· A bio-data (resume) is not a piece of fiction / literature, hence:

a. Most frequently occurring "common nouns" & "adverbs" are perhaps no more than a few hundred

b. Most frequently occurring "adjectives" which precede these common nouns & adverbs are also no more than a few hundred.

c. Number of probable combinations of these words & these adjectives is also quite limited.

d. Words, phrases and sentences are largely "Statements of Fact" and not "figments of imagination", hence repetitive in nature and content.

e. The "sequences" in which these words and phrases appear (to make up a sentence) are fairly "well-defined", with very little "variation".

All of the above makes it reasonably simple to devise RESUME-GENERATING SOFTWARE. As mentioned in one of my earlier (concept) notes, what we need to do is:

· Take a large number of biodatas

· Scan / OCR / index all the words appearing in these bio-datas

· Study & record the "occurrences" (the frequencies) of

I. Each word (verb, adverb, adjective, preposition, noun)

II. Set of any two words (prefixed / suffixed to a given word)

III. Set of any three words

IV. So on & So forth.
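The counting described in I to IV can be sketched roughly as below. This is only a minimal illustration, assuming the bio-datas are already available as plain text after OCR; the names `tokenize`, `ngram_counts` and the two sample sentences are hypothetical and not taken from any existing software.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(documents, n):
    """Count every run of n consecutive words across all documents."""
    counts = Counter()
    for doc in documents:
        words = tokenize(doc)
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Hypothetical sample text standing in for scanned bio-datas.
sample_biodatas = [
    "Responsible for production planning and quality control.",
    "Responsible for quality control and vendor development.",
]

single_words = ngram_counts(sample_biodatas, 1)   # I.   each word
word_pairs = ngram_counts(sample_biodatas, 2)     # II.  sets of two words
word_triples = ngram_counts(sample_biodatas, 3)   # III. sets of three words
```

The same function covers "so on & so forth" by passing a larger `n`. Note that this sketch counts word sequences only; it does not label words as verb, adjective, etc.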

· We have already created a "directory" of some 6052 words (out of a total of 1 million words), each of which has occurred more than 10 times in 3500 converted bio-datas.

· Soon I will send you the (indexed) words contained in 100 originally typed bio-datas, for which scanned bitmap image files have already been given to you on some 15/20 floppies.

· If required, we can go on scanning 50 to 100 typed bio-datas every day (we have some 35000 typed bio-datas available with us) and go on

- Increasing the "Population" of words

- Improving the "frequency" of occurrence
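How a word "directory" could keep growing as each day's batch is scanned might look something like the sketch below. It is an assumption-laden illustration: the function names and the sample batches are hypothetical, and the toy threshold of 1 stands in for the "more than 10 times" rule mentioned above.

```python
from collections import Counter
import re


def count_words(text):
    """Count alphabetic word tokens in one bio-data (case-insensitive)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def add_batch(running_counts, batch):
    """Fold one day's batch of bio-data texts into the running counts."""
    for text in batch:
        running_counts.update(count_words(text))
    return running_counts


def directory(running_counts, min_occurrences):
    """Words seen more than min_occurrences times, most frequent first."""
    return [word for word, count in running_counts.most_common()
            if count > min_occurrences]


# Hypothetical batches standing in for two days of scanning.
running = Counter()
add_batch(running, ["Managed production planning", "Managed quality control"])
add_batch(running, ["Managed vendor development and production"])

frequent = directory(running, 1)   # toy threshold; the note uses 10
```

Each new batch both enlarges the "population" of words and firms up the frequencies, without re-reading the bio-datas already counted.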

But the ideal time to launch this massive scanning operation (of 35000 bio-datas) is after ARDIS & ARGIS are ready, even in crude form. Then the massive scanning operation would itself become a "self-learning / improving" exercise for the software, making it truly "intelligent". In the enclosed pages, I have, by studying some 30/40 bio-datas manually, tried to figure out:

- What are commonly occurring adjectives before some nouns & adverbs?

- What words appear before (and in some cases "after") the following prepositions?

· OF

· WITH

· FOR

· AT

· ON

· IN

· AFTER

· TO

· BY

· AS

· SINCE

· FROM

· UNDER

· OUT
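The manual study of words around these prepositions could, in principle, be automated along the following lines. This is a rough sketch only: `preposition_contexts` and the sample sentences are hypothetical, and it simply looks at the immediate neighbour on each side without regard to sentence boundaries.

```python
from collections import Counter, defaultdict
import re

# The prepositions listed above.
PREPOSITIONS = {
    "of", "with", "for", "at", "on", "in", "after",
    "to", "by", "as", "since", "from", "under", "out",
}


def preposition_contexts(documents):
    """For each preposition, count the words found just before and just after it."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        for i, word in enumerate(words):
            if word in PREPOSITIONS:
                if i > 0:
                    before[word][words[i - 1]] += 1
                if i + 1 < len(words):
                    after[word][words[i + 1]] += 1
    return before, after


# Hypothetical sample text standing in for scanned bio-datas.
sample_biodatas = [
    "Responsible for maintenance of plant",
    "In charge of plant and responsible for purchase",
]

before, after = preposition_contexts(sample_biodatas)
```

Calling `before["for"].most_common(5)` would then list the words most often found just ahead of FOR, which is the kind of table compiled by hand in the enclosed pages.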

To dissect / analyse even a few hundred bio-datas manually is a Herculean task! To study several thousand (bio-datas) this way is next to impossible! But it is precisely in such set and repetitive (logical) activity that computer hardware & software triumph over the human brain. Let us harness these technological advances and create something superior to what RESUMIX has done.

h.c.parekh
