Notes: ACCURACY OF EXTRACTION

Saturday, 30 August 2003

Kartavya / Abhi

cc: Sanjeev

30/09/03

Accuracy of Extraction

I refer to our telecon yesterday.

We should develop an “Extraction Accuracy Index” and plot it as shown in the enclosed graph.

There should be

A separate / individual graph for each subscriber that he alone can see (after log-in), either on the Extraction-page or in the Admin-Tool. I prefer to show it on his Extraction-page itself (by default), so that he is constantly aware of the level of Accuracy of his own PRIVATE database.

Graphs should reset automatically after each batch-extraction is over. We may even set a bottom-limit on batch-size. For example, the graph may be reset only if the batch-size is greater than 1000. A real “bottom-limit”, though, means that the graph will change every time the CUMULATIVE COUNTER of resumes extracted crosses 1000, 2000, 3000, 4000 etc. So the batch-size itself can be any number.

A second graph shall be cumulative / combined for all subscribers put together. This will be a PUBLIC graph which everyone can see, at any time.
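The cumulative-counter rule above can be sketched as follows. This is only an illustration: the function name `crossed_milestone` and the `step` parameter are assumptions made for the example, and it assumes that landing exactly on 1000, 2000 etc. counts as “crossing”.

```python
def crossed_milestone(previous_total: int, batch_size: int, step: int = 1000) -> bool:
    """Return True if this batch takes the cumulative counter across a multiple of `step`.

    Assumption: reaching a multiple exactly (e.g. 999 -> 1000) counts as crossing it.
    """
    if previous_total < 0 or batch_size < 0 or step <= 0:
        raise ValueError("totals and batch size must be non-negative, step positive")
    new_total = previous_total + batch_size
    # Integer division gives the number of milestones passed so far;
    # the graph needs refreshing only when that number increases.
    return new_total // step > previous_total // step


# A small batch that stays within the same thousand does not refresh the graph,
# while a batch of any size that carries the counter past a thousand does.
refresh_small = crossed_milestone(previous_total=1200, batch_size=300)
refresh_cross = crossed_milestone(previous_total=950, batch_size=100)
```

With this rule the batch-size is irrelevant; only the cumulative counter decides when the graph is redrawn.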

Viewing of this COMBINED graph must not be restricted to subscribers only. It should also be made visible to stray / casual visitors (also interested corporates or jobseekers). In fact, we should position it in such a manner that ALL visitors are attracted towards it. We want this graph to “shout from the rooftop!”

This (graph) would be a powerful “marketing-tool”. It should become a conversation piece amongst Recruitment Managers wherever they meet. This graph should show:

Total No. of Subscribers:             265
Highest No. of Resumes / Subscriber:  49,985
Lowest No. of Resumes / Subscriber:   235

[Hand-drawn graph showing Accuracy Index vs Cumulative No. of Resumes Extracted]

We are trying to convey that Resumegum has some very BIG subscribers and some TINY ones too!

Construction Details

As far as the X axis is concerned, its scale will need to change dynamically and automatically. This is because, with each batch, the cumulative number is changing.
As far as the Y axis is concerned, it could either be fixed (0 to 100%) or it, too, could be changing!

At the beginning (say, when the FIRST batch gets processed), the end-accuracy returned may be 60-90%, and as you keep adding batches, this will rise to 65%, 70%, 75%, 80%, 85% etc.

If we keep the Y axis at 0% to 100%, then the graph-line will be as in (A) enclosed. This is not a good image. A better image results with the Y axis as shown below:

[Hand-drawn graph showing Accuracy Index on Y axis (60%, 70%, 80%) vs Cumulative No. of Resumes Processed]

After a while, as the “Cum. No. of resumes processed” grows, the Y axis could well become 70% to 90%, then 80% to 95%, and further to 85% to 100% etc.

This presentation will look much better.
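One possible way to derive such a moving Y axis from the plotted values is sketched below. The rounding step (`tick`) and the margin (`pad`) are assumptions chosen for illustration, not figures from this note, so the exact windows it produces will differ from the ranges listed above.

```python
import math


def y_axis_range(accuracies, tick=5.0, pad=5.0):
    """Return (low, high) Y-axis limits, in percent, that hug the plotted accuracies.

    Assumptions: limits snap to multiples of `tick`, with `pad` percent of
    margin on each side, and never leave the 0-100% band.
    """
    if not accuracies:
        raise ValueError("need at least one accuracy value")
    low = math.floor(min(accuracies) / tick) * tick - pad
    high = math.ceil(max(accuracies) / tick) * tick + pad
    return max(0.0, low), min(100.0, high)


# Early batches sit in the sixties and seventies; later ones near the top.
early_axis = y_axis_range([62, 68, 74])   # (55.0, 80.0)
later_axis = y_axis_range([88, 93, 97])   # (80.0, 100.0)
```

As the accuracy values climb, the window slides upward on its own, so the line keeps a visible slope instead of flattening against a fixed 0 to 100% scale.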

As far as the “weightages” to be allotted to the different fields are concerned, please keep the following in mind:

As “pioneers” in this field, WE must set the “Rules of the Game”.
And we must set these rules in such a way that “the dice is loaded in our favour”. You know that in Casinos all over the world, the slot-machines are so programmed that, in the long term, it is the Casino-owner who always wins!

Once we have set the rules, any new entrant/new competitor would be forced to play the game by OUR rules!

Now, what rules will favour us when it comes to computing / plotting the Extraction Accuracy Index?

This should be a rule in which the competitor should “fail” miserably — and where WE are the Undisputed Winners!

So, how about giving “high” weightages to those fields (which a competitor would find more difficult to extract) and “low” weightages to those fields (which nearly all extraction software — commercially available — will succeed in extracting)?

We may even display these field-wise “weightages” to a subscriber (but not to general public or a potential subscriber).
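To make the idea of field-wise weightages concrete, here is a minimal sketch that computes the index as a weighted average of per-field accuracy. The weighted-average formula, the field names (`email`, `career_history`) and the weight values are all assumptions made for illustration; the actual fields and weightages are still to be decided.

```python
def extraction_accuracy_index(field_accuracy, weights):
    """Weighted average of per-field accuracy (percent), using field-wise weightages.

    field_accuracy: maps field name -> accuracy in percent (0-100)
    weights:        maps field name -> weightage (non-negative number)
    Both mappings are hypothetical inputs for this sketch.
    """
    missing = [f for f in field_accuracy if f not in weights]
    if missing:
        raise KeyError(f"no weightage defined for fields: {missing}")
    total_weight = sum(weights[f] for f in field_accuracy)
    if total_weight <= 0:
        raise ValueError("total weightage must be positive")
    weighted_sum = sum(field_accuracy[f] * weights[f] for f in field_accuracy)
    return weighted_sum / total_weight


# Hypothetical example: an "easy" field gets a low weightage,
# a "difficult" field gets a high one.
accuracy = {"email": 100.0, "career_history": 60.0}
weightage = {"email": 1, "career_history": 3}
index = extraction_accuracy_index(accuracy, weightage)  # (100*1 + 60*3) / 4 = 70.0
```

With equal weightages the same figures would give 80%; loading the difficult field pulls the index towards whatever accuracy is achieved on that field, which is why the choice of weightages decides who looks good on the graph.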

As you said, these “high-weightage” fields could be: