05/09/2003 – 17/09/2003
Kartavya / Abhi / Sanjeev

Self-Learning Software
How does a one-year-old child learn to differentiate between colours Red & Blue, and beyond that between different shades of Red?

This is another way of asking, "How does learning take place? What steps are involved in the learning process?"

There are no foolproof / ironclad / undisputed scientific theories. But the empirical evidence leads us to believe that the process (of learning) occurs somewhat as follows:
A mother points a finger at a colour and speaks aloud "RED". The sound is captured by the child & stored in his memory.

This process is repeated a thousand times, and with each repetition the memory gets etched deeper & deeper.

An "association" develops between the colour & the sound.

Then the process is repeated with the colour BLUE, & another memory gets "etched" deeply.

So, on the 1001st occasion, when a colour patch is shown to the child & the question asked, "What colour is this?", the child says "RED", perhaps even without understanding the question (the meaning of the question).

There is, merely, an "association" between what the child SEES (sight) & what the child HEARS (sound).
The process can be repeated by,

Ø Showing a RED colour patch, and
Ø Showing a placard (flag) with RED written on it in big / bold letters.

Now the child "associates" the patch (SIGHT) with the placard (also another SIGHT). No sound.

So, the next time the child is shown a patch of red colour, he will pick up the sign / placard bearing the word RED.
Input (Sound) → Brain → Association / Memory (stored)
Input (Photo) (SIGHT) → Brain → Association / Memory (stored)

So, next time, what happens?

(First Diagram)
Input (Red patch) (SIGHT) → Brain ← Input (Recall from Memory / compare / database search)
→ Output (SOUND) → "RED"

OR

(Second Diagram)
Input (SIGHT – i.e., RED patch) → Brain ← Input (SIGHT – Recall from Memory / database – word "RED" & match letters)
→ Output (SIGHT) → Pick up flag bearing the letters "RED"
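To make the "association" idea concrete, here is a minimal Python sketch (illustrative only, not our actual code), assuming each repetition is recorded as a (stimulus, label) pair and recall simply returns the most deeply "etched" label:

```python
from collections import defaultdict, Counter

class AssociationMemory:
    """Toy model of the 'etching' process: every repetition of a
    (stimulus, label) pair deepens the stored association."""

    def __init__(self):
        # stimulus -> counts of labels heard along with it
        self.memory = defaultdict(Counter)

    def observe(self, stimulus, label):
        """One repetition: mother shows a colour patch & speaks a word."""
        self.memory[stimulus][label] += 1

    def recall(self, stimulus):
        """On the 1001st occasion: return the most deeply etched label."""
        if stimulus not in self.memory:
            return None
        return self.memory[stimulus].most_common(1)[0][0]

# 1000 repetitions for RED, then 1000 for BLUE
brain = AssociationMemory()
for _ in range(1000):
    brain.observe("red_patch", "RED")
for _ in range(1000):
    brain.observe("blue_patch", "BLUE")

print(brain.recall("red_patch"))   # -> RED
```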
Remember that the two MAIN inputs to the brain are

Ø Sight (Eyes) ----- 80% of learning takes place here
Ø Sound (Ears) ----- 10% of learning takes place here

Of course, there are other, relatively minor inputs of

Ø Touch / Feel (Skin)
Ø Smell (Nose)
Ø Taste (Tongue)

The balance 10% of learning takes place thru these.
INPUT – DEVICES
In the examples listed earlier, the MOTHER acts as a human expert, who initiates the learning process by establishing the "references / bench-marks".

In essence, she uses the process (of showing the patch & speaking aloud, or showing the patch & showing the placard) to transmit her OWN EXPERT KNOWLEDGE to the child.

So, all knowledge flows from a GURU!
You can even watch events & learn – without a single word being uttered!

You can close your eyes & listen to music & learn – without seeing who is singing!

Then there was Beethoven, who was deaf but composed great symphonies which he himself could not hear! But this is an exception.
What is the relevance of all this to "Self-Learning Software"?

Simple.

If we want to develop software which can identify / categorise a "resume" as belonging to VB, C++ etc., then all we need is to "show" the software 1000 resumes and speak aloud,

C++ !

Then the 1001st time, when the software "sees" a similar resume, it will speak out loudly,

C++ !
So, first of all, we need a human expert – a GURU – who, after reading each resume, shouts

C++ or VB or ASP etc. etc.

When the Guru has accurately identified & segregated 1000 resumes each of C++ etc., we take those sub-sets & index their keywords, calculate the "frequency of occurrence" of each of those keywords, & assign them "weightages" (probabilities).

Then we plot the graphs for each subset (i.e. each "skill").
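As a rough Python sketch of this indexing step (illustrative only, assuming the Guru-sorted resumes are available as plain text and a simple tokeniser is good enough for "keywords"; all names here are made up for illustration):

```python
from collections import Counter
import re

def keyword_weightages(resume_texts):
    """Index the keywords of one Guru-labelled subset and convert their
    frequency of occurrence into weightages (share of all occurrences)."""
    counts = Counter()
    for text in resume_texts:
        counts.update(re.findall(r"[A-Za-z+#.]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# One profile ("skill-graph") per subset, e.g.:
# profiles = {
#     "VB":  keyword_weightages(vb_resumes),
#     "C++": keyword_weightages(cpp_resumes),
#     "ASP": keyword_weightages(asp_resumes),
# }
```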
Then, when we present to this software any / next resume, it will try to find the keywords. Let us say it found 40 keywords. Now let us compare this 40-keyword set with

Ø the VB keyword-set
Ø the C++ keyword-set
Ø the ASP keyword-set

& see what happens.
First Scenario (First Match)
(Venn diagram: New Resume ∩ VB Keyword Set) → only 10% match

Second Match
(Venn diagram: New Resume ∩ C++ Keyword Set, overlap about 30%) → 30% match

Third Match
(Venn diagram: New Resume ∩ ASP Keyword Set, overlap about 50%) → 50% match
We (i.e. the software) have to keep repeating this "match-making" exercise for a new resume, with ALL THE KEYWORD-SETS, till it finds the highest / best match.

BINGO!

The new resume belongs to an "ASP" guy!
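A sketch of this match-making loop (again illustrative, assuming the keyword-sets built earlier; here the "match" is simply the share of the new resume's keywords found in each set, as in the Venn diagrams):

```python
import re

def keywords_of(text):
    """Extract the keyword set of a single resume (same tokeniser as above)."""
    return set(re.findall(r"[A-Za-z+#.]+", text.lower()))

def best_skill(resume_text, skill_keyword_sets):
    """Repeat the match-making exercise against ALL the keyword-sets and
    return the skill with the highest percentage match."""
    resume_words = keywords_of(resume_text)
    scores = {}
    for skill, keyword_set in skill_keyword_sets.items():
        overlap = resume_words & set(keyword_set)
        scores[skill] = len(overlap) / len(resume_words) if resume_words else 0.0
    return max(scores, key=scores.get), scores

# best, scores = best_skill(new_resume_text,
#                           {"VB": vb_keywords, "C++": cpp_keywords, "ASP": asp_keywords})
# e.g. scores -> {"VB": 0.10, "C++": 0.30, "ASP": 0.50}  =>  an "ASP" guy!
```

Later, the weightages could be used to weight the overlapping keywords instead of counting each keyword equally.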
(Self-Learning Software – cont.)

That was the FIRST METHOD, where a human expert reads thru 30000 resumes & then regroups these into smaller sub-sets of 1000 resumes each, belonging to different "skill-sets".

This will be a very slow method!
SECOND METHOD

Here, instead of a (one) expert going thru 30000 resumes, we employ 30000 experts – the jobseekers themselves!

Obviously, this METHOD would be very fast!

The underlying premise is this:

No one knows better than the jobseeker himself what precisely is his CORE AREA OF COMPETENCE / SKILL.
Is my skill

· VB
· C++
· ASP
· .Net
So, if I have identified myself as belonging to VB OR C++ OR ASP etc. etc., then you better believe it!

Now, all that we need to do is to take 1000 resumes of all those guys who call themselves

VB

and find "keywords" from their resumes (& of course, weightages).
If there are jobsites where software guys are required to identify themselves by their "skills", then the best course would be to search resumes on these jobsites by skills, then download the search-result resumes! Repeat this search / download exercise for each "skill" for which we want to develop "skill-graphs".

This approach is fairly simple and perhaps more accurate too.

But,

Ø We have to find such jobsites & then satisfy ourselves that "skill-wise" searching of resumes (and downloading too) is possible.
Ø Then subscribe for 1 month / 3 months, by paying Rs. 20000 / 40000! There is a cost factor here.
THIRD METHOD

We have already downloaded 150000 job advts. from various jobsites. For each of these we know the "Actual Designation / Vacancy-Name / Position" (thru Auto-converter).

We can re-group these advts. according to identical / similar vacancy names / actual designations. When we finish, we may get, against each specific "Vacancy-Name",

500 to 5000 job advts.
Call each a sub-set (Vacancy-Name-wise).

Then index the keywords of each subset & calculate frequency-of-usage (weightage).

So, now, we have Profile-Graphs which are not skill-wise, but which are "Vacancy-Name"-wise!

This can be done real fast & cheap! And it may suffice for Software Companies' BROADER needs. A quick beginning can be made & results shown within a week!!!
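A sketch of this third method's grouping step (illustrative only, assuming each downloaded advt. comes out of the Auto-converter as a (vacancy_name, text) pair):

```python
from collections import defaultdict, Counter
import re

def vacancy_profiles(advts):
    """Re-group advts. by Vacancy-Name, then index the keywords of each
    subset and turn frequency-of-usage into weightages."""
    groups = defaultdict(list)
    for vacancy_name, text in advts:
        groups[vacancy_name.strip().lower()].append(text)

    profiles = {}
    for name, texts in groups.items():
        counts = Counter()
        for text in texts:
            counts.update(re.findall(r"[A-Za-z+#.]+", text.lower()))
        total = sum(counts.values())
        profiles[name] = {w: n / total for w, n in counts.items()}
    return profiles

# profiles = vacancy_profiles(downloaded_advts)   # "Vacancy-Name"-wise graphs
```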
Kartavya / Abhi
24-03-03
Self-Learning Software
Our goal is to make ResuMine / ResuSearch self-learn and improve with usage.

One way is for them to capture the knowledge of hundreds of expert users.

One opportunity (for self-learning) is to study the "editing" of structured database fields by subscribers.

When subscribers find a value missing in any field, they would (hopefully) try to go through the entire text of the email resume to find that missing value.

And if it does exist (but ResuMine somehow missed it), there is a good chance that the subscriber would find it, if he has the patience — and need — to capture that value.

Having found the value, he would "insert" it in the appropriate field, via drag & drop or highlight + click (or thru whatever method we specify).

Here is our opportunity to learn.
When any subscriber carries out any such editing, we must capture:

- In what field he inserted the value
- Where exactly in the resume he found the value

And we must aggregate all such "editing" instances.
If we do, the following type of scenario might emerge:

Say, "Edn." (Level-Branch-Degree-Institution) was found missing in 10,000 resume extractions.

When subscribers search for the "Edn." value in resumes, they may find the missing value in 8,000 cases and enter it.

Now, if our software tracks these 8,000 cases, it may discover that in 6,000 (out of 8,000) cases, subscribers located / picked up the value in 'Personal Detail Block 1', whereas the logic we have used looks for "Edn." in 'Edu Qualification Block 1'.

Such "discoveries" add to our learning and improve our extraction logic.
In essence, we must offer the "Edit" feature free to all subscribers (since we cannot guarantee 100% extraction accuracy) — and encourage them to edit.

Since ResuMine is not a network-based software, a human developer would be required to write the code after studying the patterns (of locating missing values) that emerge from aggregation of edits.
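A sketch of the kind of aggregation that developer might run (illustrative only; it assumes each edit is logged as a (field, block-where-the-value-was-found) pair, and the log entries below are placeholders):

```python
from collections import defaultdict, Counter

# Illustrative placeholder entries: which field the subscriber filled in,
# and in which block of the resume he actually located the missing value.
edit_log = [
    ("Edn.", "Personal Detail Block 1"),
    ("Edn.", "Edu Qualification Block 1"),
    ("Edn.", "Personal Detail Block 1"),
    # ... aggregated over all subscribers' edits
]

def aggregate_edits(edits):
    """For every field, count where subscribers actually found the value."""
    by_field = defaultdict(Counter)
    for field, block in edits:
        by_field[field][block] += 1
    return by_field

stats = aggregate_edits(edit_log)
for field, blocks in stats.items():
    block, hits = blocks.most_common(1)[0]
    print(f"{field}: most often found in '{block}' "
          f"({hits} of {sum(blocks.values())} edits)")
```

If such counts show, say, 6,000 of 8,000 missing "Edn." values coming from 'Personal Detail Block 1', then the extraction logic for that field is the obvious place to improve.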