Thursday, 7 June 2007

EMAIL-EXTRACTOR

Purpose of tool

The tool was created to act as an email extractor. It can be set up to extract email addresses from files stored in specific folders on one's hard disk.

The path from which the tool should extract emails must be fixed in advance.
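
The extraction step described above can be sketched roughly as follows. This is a minimal Python illustration, not the tool's own VB code: the regular expression, the function names `extract_emails` and `extract_from_folder`, and the decision to read only plain-text and HTML files are all assumptions (Word documents would need a dedicated reader).

```python
import os
import re

# Simplified address pattern; an assumption, not the tool's actual matching rule.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the distinct email addresses in text, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()  # treat addresses as case-insensitive (assumption)
        if email not in seen:
            seen.append(email)
    return seen


def extract_from_folder(root, extensions=(".txt", ".html", ".htm")):
    """Walk the fixed root path and map each matching file to the emails found in it."""
    results = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    results[path] = extract_emails(fh.read())
    return results
```

Calling `extract_from_folder` with the configured folder returns one entry per scanned file, which corresponds to one row per file in the table described under DATABASE.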

This is the screen that we see after the tool is set up.

This dialog box shows the status of the tool.

If the tool is ready for more extraction, it indicates this with the screen above.

If the tool is in the process of extraction, it does not allow the user to set up a new set for extraction.

DATABASE

The database is Access.

It is stored as “email-extract.mdb”.

Table name: EMAILSTORE

Column      Data type
PKID        AutoNumber
FILENAME    Text
EMAILID     Text
UPDATED     Text

Entries are made in this table as follows.

Every file (Word/txt/HTML) is recorded as a row in the EMAILSTORE table of the Access file.

Initially, each entry has status “N” in the “UPDATED” column and a null entry in the “EMAILID” column.

The tool takes the top 500 entries with “UPDATED” status “N”. This set of 500 is put through the extraction logic, and the email ids found are written to the respective rows.

After a set of 500 rows (dataset) is completed, the next set is taken.

If the tool is stopped midway (due to a power problem, a forced stop by the user, or any other reason), it maintains the status in the Access file and can resume the remaining extractions.
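
The batch-and-resume behaviour described above can be illustrated with a small sketch. It is written in Python with SQLite standing in for the Access file, so it is not the tool's actual code. The post only states the initial status “N”; the completed status “Y”, the “;” separator for multiple addresses, and the names `create_store` and `process_pending` are assumptions made for illustration.

```python
import sqlite3

BATCH_SIZE = 500  # batch size stated in the post


def create_store(conn):
    """Create a stand-in for the EMAILSTORE table (SQLite instead of Access)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS EMAILSTORE ("
        "PKID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "FILENAME TEXT, EMAILID TEXT, UPDATED TEXT DEFAULT 'N')"
    )


def process_pending(conn, extract, batch_size=BATCH_SIZE):
    """Process rows with UPDATED = 'N' batch by batch; return how many were handled."""
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT PKID, FILENAME FROM EMAILSTORE "
            "WHERE UPDATED = 'N' ORDER BY PKID LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return processed
        for pkid, filename in rows:
            emails = extract(filename)
            # 'Y' as the finished marker is an assumption; the post names only 'N'.
            conn.execute(
                "UPDATE EMAILSTORE SET EMAILID = ?, UPDATED = 'Y' WHERE PKID = ?",
                (";".join(emails), pkid),
            )
            conn.commit()  # commit per row so an interrupted run resumes where it stopped
            processed += 1
```

Because progress lives in the table itself, restarting after an interruption only requires calling `process_pending` again: rows already marked as done are never selected.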

The “compact db” button is assigned to “Compact and Repair Database”.

COMBINE_ALL_MACHINES.vb

This module was created to merge the full data across the machines.

The 3P folder, which resides on the “D:\” drive of every machine, was selected for extraction.

This created a separate Access file on each of the five machines, i.e.

Sonal
Pranav-swati
Rahul
Yogesh
Sourabh

The respective Access files were then stored on Sourabh’s machine.

The folder is stored on the external backup HDD:
F:\SOURABH\D\email-extract-all-machines-data

The emails from all five machines were picked up and the distinct emails were stored in the TOTAL folder, which gave a total of 2,46,000 distinct emails.
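
The merge described above amounts to taking the union of the per-machine email lists. A minimal Python sketch, assuming the addresses have already been read out of each machine's Access file into a dictionary; the function name `merge_distinct` and the case-insensitive comparison are assumptions, not details from COMBINE_ALL_MACHINES.vb.

```python
def merge_distinct(per_machine):
    """Combine email lists from several machines into one sorted, de-duplicated list."""
    distinct = set()
    for emails in per_machine.values():
        for email in emails:
            cleaned = email.strip().lower()  # ignore case and stray whitespace (assumption)
            if cleaned:
                distinct.add(cleaned)
    return sorted(distinct)
```

The length of the returned list is the distinct-email count for all machines combined.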

get100Resumes.vb

This module was created to save the top 1000 resumes per designation.

It is run without the setup, i.e. in debug mode.

The requirement arose when it was proposed to send some resumes along with resumerater.

The accumulated data from the guru archive was used for this. The two tables are shown below.

Database: RESUME

DESIGNATION_LIST_TWO

COLUMN_NAME    DATA_TYPE    LENGTH
FID            int          NULL
FNAME          nvarchar     255
STATUS         varchar      1

OFFLINE_RESUMES_MATCH_INDEX_TWO

COLUMN_NAME    DATA_TYPE    LENGTH
PKID           numeric      NULL
DESIGID        numeric      NULL
RESUMEHTML     text         2147483647
MATCH_INDEX    numeric      NULL
FILEPATH       varchar      500
EMAIL          varchar      150

The database contains the top 1000 resumes from the accumulated “3P” resumes, selected after indexing. The file names present in the data table help to locate the files (temporarily on the SONAL machine).

The UI lets the user decide on the range of the match index (min-max). Accordingly, the distinct email ids are fetched and the respective files are stored on the HDD where the tool resides.

The tool creates a “RESUMES” folder if one is not present; then, according to the designation the user is working on, it creates a folder with the designation name and copies the resumes into that folder.
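
The selection and copy steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the code of get100Resumes.vb: the function name `copy_resumes`, the `(filepath, email, match_index)` record shape (mirroring the FILEPATH, EMAIL and MATCH_INDEX columns), and keeping only the first file per email are all hypothetical choices.

```python
import os
import shutil


def copy_resumes(records, designation, min_index, max_index, base_dir="."):
    """Copy resumes whose match index is within [min_index, max_index].

    records is an iterable of (filepath, email, match_index) tuples.
    Only the first file for each distinct email is copied (assumption).
    Returns the list of destination paths.
    """
    # Create RESUMES/<designation> if it does not exist yet.
    target = os.path.join(base_dir, "RESUMES", designation)
    os.makedirs(target, exist_ok=True)

    seen = set()
    copied = []
    for filepath, email, match_index in records:
        if not (min_index <= match_index <= max_index):
            continue
        if email in seen:
            continue
        seen.add(email)
        dest = os.path.join(target, os.path.basename(filepath))
        shutil.copy2(filepath, dest)
        copied.append(dest)
    return copied
```

Here `base_dir` plays the role of the location where the tool resides, so the “RESUMES” folder is created next to it.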

Hemen Parekh

