7 Jun 2007
EMAIL-EXTRACTOR
Purpose of tool
The tool was created to act as an email extractor.
It can be set up to extract email addresses from files in specific folders
on one's hard disk.
The path of the folder from which the tool should extract email addresses
has to be set before extraction starts.
This is the screen that we see after the tool is set up.
This dialog box shows the status of the tool.
If the tool is ready for more extraction, it indicates this with the
screen above.
If the tool is in the process of extraction, it does not allow the
user to set up a new set for extraction.
DATABASE
The database is Microsoft Access.
It is stored as “email-extract.mdb”.
Table name --- EMAILSTORE
Column | Data type
PKID | AutoNumber
FILENAME | Text
EMAILID | Text
UPDATED | Text
The entries are made in this table.
Each and every file (Word/txt/html) is recorded in the Access file, in the
EMAILSTORE table.
At the beginning, each entry has status “N” in the “UPDATED” column
and a null entry in the “EMAILID” column.
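The initial state described above can be sketched as follows. This is only an illustration in Python, with SQLite standing in for the Access file; it is not the tool's own VB code. The function names and the extension list are assumptions (the post only says "word/txt/html").

```python
import os
import sqlite3

# Assumed extension list; the post only says "word/txt/html".
EXTENSIONS = (".doc", ".txt", ".htm", ".html")

def create_store(conn):
    """Create an EMAILSTORE table with the four columns listed in the post."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS EMAILSTORE ("
        "PKID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "FILENAME TEXT, EMAILID TEXT, UPDATED TEXT)"
    )

def register_files(conn, root):
    """Insert one pending row per matching file under root; return the count."""
    count = 0
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(EXTENSIONS):
                # New rows start with a null EMAILID and UPDATED = 'N'.
                conn.execute(
                    "INSERT INTO EMAILSTORE (FILENAME, EMAILID, UPDATED) "
                    "VALUES (?, NULL, 'N')",
                    (os.path.join(folder, name),),
                )
                count += 1
    conn.commit()
    return count
```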
The tool is designed to take the top 500 entries with UPDATED status
“N”.
This set of 500 is then put through the extraction logic to extract email
IDs, and the extracted email IDs are updated in the respective rows.
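The post does not show the extraction logic itself. A minimal sketch of one common approach, a regular expression run over the file's text, is given here in Python. The pattern and the function name are assumptions for illustration; the pattern is deliberately simple and does not cover every valid address form.

```python
import re

# Simplified pattern (an assumption, not the tool's actual rule):
# local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return distinct addresses in order of first appearance, lowercased."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```

For example, `extract_emails("Write to A.B@Example.com or a.b@example.com.")` yields a single address, because matches are lowercased before duplicates are dropped.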
After a set of 500 rows (the dataset) is completed, the next set is taken.
If the tool is stopped midway (due to a power problem, a forced stop by
the user, or any other reason), the status is retained in the Access
file, and the tool can resume the remaining extractions.
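The batch-and-resume behaviour can be sketched as below, again in Python with SQLite standing in for Access. The batch size of 500 and the “N” status come from the post; the “Y” status for finished rows, the `;` separator for multiple addresses, and the `extract` callback are assumptions.

```python
import sqlite3

BATCH_SIZE = 500  # from the post: the top 500 pending rows are taken at a time

def process_pending(conn, extract, batch_size=BATCH_SIZE):
    """Process rows with UPDATED = 'N' in batches; return the rows processed.

    extract is a hypothetical callback: file name in, list of addresses out.
    """
    done = 0
    while True:
        batch = conn.execute(
            "SELECT PKID, FILENAME FROM EMAILSTORE "
            "WHERE UPDATED = 'N' ORDER BY PKID LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not batch:
            return done
        for pkid, filename in batch:
            emails = extract(filename)
            # Assumed conventions: addresses joined with ';', status set to 'Y'.
            conn.execute(
                "UPDATE EMAILSTORE SET EMAILID = ?, UPDATED = 'Y' WHERE PKID = ?",
                (";".join(emails), pkid),
            )
            # Commit per row, so an interruption loses at most the current file.
            conn.commit()
            done += 1
```

Because progress lives in the UPDATED column rather than in memory, calling `process_pending` again after an interruption simply picks up the rows still marked “N”.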
The “compact db” button performs “Compact and Repair Database” on the Access file.
COMBINE_ALL_MACHINES.vb
This was created to merge the full data across the machines.
The 3p folder that resides on the “D:\” drive of all the machines was
slated for extraction.
This created a separate Access file on each of the five machines, i.e.
Sonal
Pranav-swati
Rahul
Yogesh
Sourabh
The respective Access files were then stored on Sourabh’s machine.
The folder is stored on the external backup HDD:
F:\SOURABH\D\email-extract-all-machines-data
The emails from all five machines were picked up, and the distinct
emails were stored in the TOTAL folder, which gave a total of
2,46,000 distinct emails.
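The merge step amounts to taking the union of the per-machine lists. A minimal Python sketch follows; the function name is hypothetical, and trimming whitespace and lowercasing before comparison are assumptions, since the post does not say how "distinct" was decided.

```python
def merge_distinct(per_machine):
    """Merge per-machine address lists into one sorted list of distinct emails.

    per_machine maps a machine name to an iterable of addresses.
    """
    merged = set()
    for emails in per_machine.values():
        for email in emails:
            # Assumed normalisation: trim whitespace, compare case-insensitively.
            cleaned = email.strip().lower()
            if cleaned:
                merged.add(cleaned)
    return sorted(merged)
```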
get100Resumes.vb
This module was created to save the top 1000 resumes per designation.
It is run without the setup, i.e. in debug mode.
The requirement was raised when sending some resumes along with
resumerater was being considered.
The accumulated data from the guru archive was used to do this.
The two tables are shown below.
Database: RESUME

DESIGNATION_LIST_TWO
COLUMN_NAME | DATA_TYPE | LENGTH
FID | int | NULL
FNAME | nvarchar | 255
STATUS | varchar | 1

OFFLINE_RESUMES_MATCH_INDEX_TWO
COLUMN_NAME | DATA_TYPE | LENGTH
PKID | numeric | NULL
DESIGID | numeric | NULL
RESUMEHTML | text | 2147483647
MATCH_INDEX | numeric | NULL
FILEPATH | varchar | 500
EMAIL | varchar | 150
The database contains the top 1000 resumes from the “3P” accumulated
resumes, selected after indexing. The file names present in the data table
help to locate each file (temporarily on the SONAL machine).
The UI lets the user decide on the range of the match index (min-max).
Accordingly, the distinct email IDs are fetched and the respective files
are stored on the HDD where the tool resides.
The tool creates a “RESUMES” folder if one is not present; then, according
to the designation the user is working on, it creates a folder with the
designation name and copies the resumes into that folder.
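The filter-and-copy step can be sketched as follows. This is an illustrative Python version, not the module's VB code. The function name and the shape of `rows` (email, match index, file path tuples, as they might be read from OFFLINE_RESUMES_MATCH_INDEX_TWO) are assumptions, as is keeping only the first file seen per email.

```python
import os
import shutil

def copy_resumes(rows, designation, min_index, max_index, base_dir="."):
    """Copy resumes in the match-index range into RESUMES/<designation>.

    rows: iterable of (email, match_index, filepath) tuples.
    Returns the list of destination paths.
    """
    target = os.path.join(base_dir, "RESUMES", designation)
    os.makedirs(target, exist_ok=True)  # creates RESUMES too if it is missing
    seen = set()
    copied = []
    for email, match_index, filepath in rows:
        if not (min_index <= match_index <= max_index):
            continue
        if email in seen:
            continue  # assumed rule: one resume per distinct email
        seen.add(email)
        copied.append(shutil.copy(filepath, target))
    return copied
```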
Hemen Parekh