Notes

Sunday, 30 November 2003

EXTRACTION ACCURACY QA

Kartavya / Abhi / Sanjeev

cc: Rajiv / SriRam / Nirmit

Date: 30-11-03

Accuracy of Extraction

We had a long debate on this yesterday with Rajiv, SriRam, and Nirmit.
Whereas I agree with them that we have to continuously work on increasing extraction accuracy, it is a long-drawn-out process. Each incremental increase in level of accuracy will take more and more effort. Of course, if we were to design a true Neural Network Software, this accuracy-improvement process would become totally automatic and require no human effort. We are far from that stage.
In the meantime, we cannot hold up launch/marketing of Recruitguru.
To convince potential clients that we have a damn-good product, we must take help of latest management jargon viz.: SIX SIGMA (a philosophy I used on Shopfloor of Switchgear Machine Shop, more than 40 years ago!)
Sanjeev suggested plotting a graph as shown on enclosed page — (and have it get automatically updated on the Recruitguru home screen) as proof that our extraction process not only meets but exceeds what is expected of it!

No. of Fields that could not be extracted	No. of Resumes (processed so far) falling in this category
0	10,000
1	20,000
2	40,000
3	40,000
4	10,000
5	6,000
6	2,000
7	800
8	500
9	100
10–23	(remaining few)

Kartavya

Date: 22-03-03

Improving Extraction Accuracy

Abhi told me last evening that he has planned to process/convert 1,000 email resumes today morning.

Based on this “experiment”, remaining 26,000 can be processed next week. When this is done, a tabulation such as Annex A should be prepared. It can be rearranged in the descending order of (2nd column) No. of failure cases out of 27,000.

Descending order will tell us which cases to tackle first (Priorities).

You may also consider constructing a TOOL screen, as shown in Annex B.

I have already modified the existing Resume Screen. Several human experts could be asked to work on this by examining different “failed” cases — they may be assigned to different experts.
After studying each failure, these experts should enter their comments into the middle block.

Of course, to speed up the process, Vittal / Santu may go through all “failed cases” in advance and only forward those cases to experts where the value does exist but the software failed to extract the same.
Then this tool will help us to capture the knowledge of several experts.