INPUTS
Yogesh / Cyril
· Texts
(A) Printed / Typed Documents
(B) Electronic Documents
Examples of Texts
(A) Printed Documents
· Bio-datas (Typed)
· Job Advts
· Magazine / Newspaper Articles
· Directories (Kothari / IRIS etc.)
· Directories (CA / CS Membership)
· Bulletins (Bapuji Impex / Domex)
(B) Electronic Documents
· Databases
· Bio-datas received over Internet / Extranet
· E-mails
· Files on Floppies
· Files on CDs
· Files on Hard Disks
· Files on Tapes
· Voice / Speech converted to text (through speech-recognition software)
PROCESS
· Scanning of document (in a "scanner", if printed / typed document)
· Electronic Scanning (if in electronic-file format)
· OCR (for printed / typed document)
· Spell-check / Automatic Spelling Correction
· Search / Identify / Pick up "KEYWORDS" from each document / file
· Give each document a unique "number" (this will be the PEN for resumes, Advertisement Nos. etc.)
· Link keywords with document
· Assign "meaning" to each keyword (based on context)
· Store each keyword in the relevant "Meaning-lines / Fields" to create a database
· For each keyword, create a directory of "Synonyms & Antonyms"
· Create and continuously update "tables" re: frequency-of-usage of each keyword
· Create and continuously update "tables" re: frequency-of-usage of all words (not only keywords) used "before & after" each keyword, to create a "CONTEXT PROBABILITY" for each keyword.
· Repeat the above process for all the "Phrases" and "Sentences" in which a given KEYWORD has been used, to establish the "CONTEXT PROBABILITY" of phrases / sentences.
· If a keyword has been used in a phrase / sentence having a very low "Context Probability", then replace that phrase / sentence with a phrase / sentence of the highest "Context Probability".
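The "Spell-check / Automatic Spelling Correction" step can be sketched with Python's standard-library difflib, assuming a fixed reference vocabulary. The word list VOCABULARY and the 0.8 similarity cutoff are hypothetical choices for illustration; the notes do not say which dictionary or matching rule is used.

```python
import difflib
import re

# Hypothetical reference vocabulary; a real system would load a full dictionary.
VOCABULARY = ["engineer", "manager", "marketing", "finance", "accountant", "experience"]


def correct_word(word: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Return the closest vocabulary word (lowercase), or the word unchanged if none is close enough."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text: str, vocabulary: list[str]) -> str:
    """Correct each alphabetic token, leaving digits, punctuation and spacing untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0), vocabulary), text)


print(correct_text("Senior acountant with 10 years experiance", VOCABULARY))
```

Words with no close match (here "Senior", "with", "years") pass through unchanged, so the cutoff controls how aggressive the automatic correction is.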
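The numbering, keyword-linking and frequency-of-usage steps can be sketched as a small in-memory index. This is a minimal sketch, assuming keywords are matched as whole lowercase words against a fixed list; KeywordIndex and its attributes are hypothetical names, and the sequential integer merely stands in for the PEN / Advertisement No.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class KeywordIndex:
    """Hypothetical index: numbers documents, links keywords to them, counts usage."""

    def __init__(self, keywords):
        self.keywords = {k.lower() for k in keywords}
        self.documents = {}                      # document number -> original text
        self.keyword_to_docs = defaultdict(set)  # keyword -> numbers of documents using it
        self.frequency = Counter()               # keyword -> total occurrences so far
        self._next_number = 1                    # stand-in for the PEN / Advertisement No.

    def add(self, text: str) -> int:
        """Give the document the next unique number and update the tables."""
        doc_no = self._next_number
        self._next_number += 1
        self.documents[doc_no] = text
        for word in tokenize(text):
            if word in self.keywords:
                self.keyword_to_docs[word].add(doc_no)
                self.frequency[word] += 1
        return doc_no


index = KeywordIndex(["accountant", "finance", "engineer"])
index.add("Chartered accountant, finance and audit")
index.add("Finance manager; accountant by training")
index.add("Civil engineer")
print(sorted(index.keyword_to_docs["accountant"]))  # [1, 2]
print(index.frequency["finance"])                   # 2
```

Because the tables are updated on every add(), they stay "continuously up-dated" as new documents arrive, without reprocessing earlier ones.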
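One simple reading of "CONTEXT PROBABILITY" is to count the single word immediately before and after each keyword and multiply the two relative frequencies. The sketch below assumes that one-word window and treats the two sides as independent; both are simplifying assumptions, and ContextModel, suggest and the 0.05 threshold are hypothetical. A use whose score falls below the threshold is replaced by the most frequently seen (before, keyword, after) phrase.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class ContextModel:
    """Counts the words immediately before and after each keyword (window of one word)."""

    def __init__(self, keywords):
        self.keywords = set(keywords)
        self.before = defaultdict(Counter)   # keyword -> Counter of preceding words
        self.after = defaultdict(Counter)    # keyword -> Counter of following words
        self.phrases = defaultdict(Counter)  # keyword -> Counter of (before, keyword, after)

    def add(self, text: str) -> None:
        words = tokenize(text)
        for i, word in enumerate(words):
            if word not in self.keywords:
                continue
            prev = words[i - 1] if i > 0 else "<start>"
            nxt = words[i + 1] if i + 1 < len(words) else "<end>"
            self.before[word][prev] += 1
            self.after[word][nxt] += 1
            self.phrases[word][(prev, word, nxt)] += 1

    def context_probability(self, prev: str, keyword: str, nxt: str) -> float:
        """P(prev | keyword) * P(nxt | keyword), assuming the two sides are independent."""
        total = sum(self.before[keyword].values())  # equals the keyword's occurrence count
        if total == 0:
            return 0.0
        return (self.before[keyword][prev] / total) * (self.after[keyword][nxt] / total)

    def suggest(self, prev: str, keyword: str, nxt: str, threshold: float = 0.05):
        """Replace a low-probability use with the most frequent phrase seen so far."""
        if self.context_probability(prev, keyword, nxt) < threshold and self.phrases[keyword]:
            return self.phrases[keyword].most_common(1)[0][0]
        return (prev, keyword, nxt)


model = ContextModel(["accountant"])
for text in ["chartered accountant with ten years",
             "chartered accountant with audit experience",
             "senior accountant in finance"]:
    model.add(text)
print(model.suggest("the", "accountant", "for"))  # ('chartered', 'accountant', 'with')
```

Widening the window from one word to whole phrases or sentences, as the notes propose, changes only what is counted; the "count, normalise, compare against a threshold" pattern stays the same.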
OUTPUTS:
· Databases of:
  · Keywords
  · Synonyms
  · Antonyms
  · Bins / Fields
  · Phrases / Sentences
  · "Frequency of Usage" tables
PRINTED OUTPUTS:
· Converted Bio-datas (Full)
· Brief Bio-datas (brief outline)
· One-line tabulations
· Standard Letters / Responses
· Short-lists of suitable executives.
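The one-line tabulations and short-lists can be sketched from the keyword links built during processing. This assumes each processed bio-data has been reduced to a number, a name and a set of keywords; the Candidate record, its fields, the sample entries and the ranking rule (more matched keywords first, then lower number) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    pen: int                                   # unique number assigned during processing
    name: str
    keywords: set = field(default_factory=set)  # keywords linked to this bio-data


def one_line(candidate: Candidate) -> str:
    """One-line tabulation: number, name and sorted keywords on a single row."""
    return f"{candidate.pen:>6} | {candidate.name:<20} | {', '.join(sorted(candidate.keywords))}"


def short_list(candidates, required, minimum=1):
    """Candidates matching at least `minimum` required keywords, best matches first."""
    required = set(required)
    scored = [(len(c.keywords & required), c) for c in candidates]
    chosen = [(score, c) for score, c in scored if score >= minimum]
    chosen.sort(key=lambda pair: (-pair[0], pair[1].pen))
    return [c for _, c in chosen]


candidates = [
    Candidate(101, "A. Shah", {"accountant", "finance"}),
    Candidate(102, "B. Rao", {"engineer"}),
    Candidate(103, "C. Iyer", {"finance"}),
]
for c in short_list(candidates, {"accountant", "finance"}):
    print(one_line(c))
```

Raising `minimum` tightens the short-list; for example, minimum=2 keeps only candidates linked to both required keywords.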
h.c.parekh