Hi Friends,

Even as I launch this today (my 80th Birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) – and continue chatting with me, even when I am no more here physically


Monday, 1 January 2001

AGGREGATION

01-01-01

The enclosed note is 2 years old but still valid.

When it was written 2 years ago, the objective was to:

  • Compile in advance (offline), “search-results” for hundreds or thousands of “combinations” of “search-parameters” for a resume-search.

This was expected to result in a FASTER DISPLAY of "search-results" – even if such an advance compilation failed to take into account (i.e. take cognizance of) those resumes submitted during the last 24 hours (since we are not talking of hosting our site on a SUPER COMPUTER!).

If our resume database were to grow to a MILLION RESUMES someday, this objective (of fast display of search-results) would become VERY VALID, since a large resume database would tend to SLOW DOWN the "search-process".

Another alternative is to apply “sequential filtering” whereby the whole database is continuously being filtered:

  • Industry wise
  • Function wise
  • Skills-wise (Especially for IT resumes)

Such continuous (automatic) filtering would create "SUB-SETS" of the main database, i.e. several smaller "databases", so that, when the corresponding/relevant "filter" is applied, the search is conducted ONLY on that SUB-SET.

These SUB-SETS can be pre-organised,

  • Industry wise
  • Function wise
  • Skills wise.
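Pre-organising the SUB-SETS amounts to partitioning the database by the value of each filter field. A minimal sketch, assuming hypothetical field names:

```python
from collections import defaultdict

# Toy resume records with hypothetical fields.
resumes = [
    {"id": 1, "industry": "Automobile", "function": "Sales"},
    {"id": 2, "industry": "Banking",    "function": "Finance"},
    {"id": 3, "industry": "Automobile", "function": "Finance"},
]

def build_subsets(resumes, key):
    """Split the main database into smaller per-value sub-databases."""
    subsets = defaultdict(list)
    for r in resumes:
        subsets[r[key]].append(r)
    return dict(subsets)

# Pre-organised industry-wise; the same call with key="function"
# (or a skills field) gives the other partitions.
by_industry = build_subsets(resumes, "industry")
# A search filtered on Industry = Automobile now scans only that sub-set.
```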

In the current context, a compilation/aggregation of

  • Resume Search Queries (DEMAND SIDE)

would tell us – and the jobseekers – WHAT kind of executives are currently in demand

(a kind of TREND-ANALYSIS of jobs available)

But if we wish to show jobseekers a JOB-TREND, we need to aggregate RESUME-SEARCH QUERIES MONTH-WISE.

Industry Name      % of resume-search Queries, MONTH-WISE
                   Jan
Automobile
Banking
Chemicals

We could repeat such a display-table for

  • Functions &
  • Skills (for IT guys).

Such a trend-analysis would also help “students” in their CAREER PLANNING.

To experienced professionals, it would help in re-orientation of their existing CAREERS by acquiring latest/most-in-demand skills.

Such a “demand-trend” would also help us in focussing our efforts on getting resumes of particular niche/group of jobseekers, who are in high demand.
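A month-wise demand table like the one above can be compiled directly from the stored resume-search queries. A toy sketch, with a hypothetical query log of (industry, month) pairs:

```python
from collections import Counter

# Hypothetical log of stored resume-search queries: (industry, month).
queries = [
    ("Automobile", "Jan"), ("Automobile", "Jan"), ("Banking", "Jan"),
    ("Chemicals", "Feb"), ("Automobile", "Feb"),
]

def monthwise_share(queries, month):
    """% of resume-search queries per industry, for one month."""
    month_queries = [ind for ind, m in queries if m == month]
    counts = Counter(month_queries)
    total = len(month_queries)
    return {ind: round(100 * n / total, 1) for ind, n in counts.items()}

jan = monthwise_share(queries, "Jan")
# In this toy log, Automobile accounts for two of January's three
# queries and Banking for one - the "demand-trend" for that month.
```

Repeating the same aggregation with Function or Skills in place of Industry gives the other two display-tables mentioned above.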

  • Job Search Queries (SUPPLY SIDE)

A similar analysis (as the foregoing) of all the "job-search queries" shot by job-seekers.

This reveals the situation from the SUPPLY SIDE.

Obviously, candidates are searching for jobs in those Industries/Functions/Skills where they consider themselves EXPERTS/SUITABLE, based on their past/current experience.

So such an analysis (of job queries) reveals the AVAILABILITY OF CANDIDATES.

This (knowledge) would help us to promote such candidates PRO-ACTIVELY amongst industries/companies most likely to require such candidates.

It would (indirectly) help RECRUITERS to know what kinds of jobs candidates are searching for and, therefore, the availability of such candidates.

Both types of QUERIES can also be aggregated (& displayed month-wise) for

  • CITY OF POSTING (to guide candidates where most jobs are coming up)
  • CITY OF PREFERENCE (to guide recruiters where to put up their next office/factory!)
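Setting the two city aggregates side by side makes the gap between them visible. A toy sketch, with hypothetical city lists standing in for the two query streams:

```python
from collections import Counter

# Hypothetical query streams: each recruiter query carries a
# city-of-posting; each jobseeker query a city-of-preference.
posting_cities    = ["Mumbai", "Mumbai", "Pune", "Delhi"]
preference_cities = ["Pune", "Pune", "Mumbai"]

demand = Counter(posting_cities)     # where jobs are coming up
supply = Counter(preference_cities)  # where candidates want to be

# A positive gap = more candidate preference than job postings:
# a hint to recruiters about where to put up the next office/factory.
gap = {c: supply[c] - demand[c] for c in set(supply) | set(demand)}
```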

Such a data-mining "BY-PRODUCT" has great utility.

Cypil 9-1-99

Headhunt / Job-Search

Searching results at "aggregate" level (Resume Search)

If we "store" all headhunt & job-search queries, then after a couple of years (maybe even one year), when we have accumulated a "database" of several thousand queries, we could conduct a STATISTICAL ANALYSIS as follows:

Filter          % of times this Filter     % of times it was
                was FIRST CHOICE           SECOND CHOICE

Industry
Function
Designation
Edu. Quali
Age
Exp.

Over a period of time (a sufficiently long time), we obtain a better & better "probability" for each of the above-mentioned events.
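The table above can be filled in mechanically from the stored queries. A minimal sketch, assuming a hypothetical log format in which each stored query records its filters in the order the user chose them:

```python
from collections import Counter

# Hypothetical query log: filters listed in order of choice.
query_log = [
    ["Industry", "Function", "Exp."],
    ["Industry", "Designation"],
    ["Function", "Industry", "Age"],
]

def choice_share(log, position):
    """% of queries in which each filter held the given choice position
    (0 = first choice, 1 = second choice, ...)."""
    picks = Counter(q[position] for q in log if len(q) > position)
    total = len(log)
    return {f: round(100 * n / total, 1) for f, n in picks.items()}

first = choice_share(query_log, 0)   # fills the FIRST CHOICE column
second = choice_share(query_log, 1)  # fills the SECOND CHOICE column
```

As more queries accumulate, these percentages converge toward the underlying probabilities, which is exactly the "better & better" refinement described above.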

Thereafter (or simultaneously),

Within a given filter (say "Industry"), we can build up further probabilities of occurrence:

  • "Auto Industry"
  • FMCG
  • Chemical
  • M/c Tool    etc. etc.

(these individual probabilities adding up to 1.00)

The process (of building up individual probabilities of occurrence) is CONTINUOUS / ON-GOING / EVER-REFINING.
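The continuous, ever-refining build-up can be sketched as a running tally that is updated with every stored query. The class and value names below are illustrative only:

```python
from collections import Counter

class FilterProbability:
    """Running, ever-refining probability of each value within one filter."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, value):
        """Called once for every new stored query - the ON-GOING part."""
        self.counts[value] += 1
        self.total += 1

    def probabilities(self):
        """Current estimates; they always add up to 1.00."""
        return {v: n / self.total for v, n in self.counts.items()}

industry = FilterProbability()
for v in ["Auto", "Auto", "FMCG", "Chemical"]:
    industry.observe(v)
p = industry.probabilities()
# Here Auto gets probability 0.5, FMCG and Chemical 0.25 each -
# and every further observe() call refines these figures.
```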

Having established the probability of the occurrence of a GIVEN "COMBINATION" of FILTERS & SUB-FILTERS,

We can arbitrarily "trigger" searches against each "combination" and keep the search-results "READY/WAITING" for someone to shoot that specific "query".

Then we can flash the results "instantly", processing having been done "offline".

To the Surfer, this would appear miraculously FAST!
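Putting the pieces together: the per-filter probabilities give a joint probability for each combination, and the likeliest combinations are the ones worth pre-running offline. A hypothetical sketch (the probability figures and the `run_search` stand-in are made up for illustration):

```python
# Assumed per-filter probabilities, as built up by the process above.
p_industry = {"Auto": 0.5, "FMCG": 0.3, "Chemical": 0.2}
p_function = {"Sales": 0.6, "Finance": 0.4}

# Rank every combination by its joint probability of being queried.
combos = sorted(
    ((pi * pf, (i, f)) for i, pi in p_industry.items()
                       for f, pf in p_function.items()),
    reverse=True,
)

def run_search(industry, function):
    """Stand-in for the real (slow) search, triggered offline."""
    return f"results for {industry}/{function}"

# OFFLINE: arbitrarily "trigger" searches for, say, the top 3 most
# probable combinations and keep the results READY/WAITING.
cache = {combo: run_search(*combo) for _, combo in combos[:3]}
# ONLINE: a surfer shooting one of these specific queries gets the
# cached answer flashed "instantly" - miraculously FAST.
```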
