Monday, May 26, 2008

Project Scope

The goal is to improve the Big-O efficiency of processing large text files (>1 GB). This involves streamlining input/output operations and reducing overhead (avoiding databases), while still being able to perform normal data-processing functions.

Problem #1 Definition: Text File (A) contains a list of 9 million email addresses, and Text File (B) contains a list of 65 email addresses.
Every email in Text File A must be scrubbed against Text File B, using the following algorithm.

For Each Email in A
    If Email exists in B
        Remove from A
    Else
        Keep Email in A
    End If
Next
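
The loop above could be sketched in Python roughly as follows. This is only one possible approach, not a settled design: it assumes one email per line, UTF-8 text, and case-insensitive matching, none of which the problem statement specifies. The function names and the separate output file are hypothetical. Because B is tiny, it is loaded into a set so that "Email exists in B" is an average O(1) lookup, and A is streamed line by line so that memory use depends on the size of B rather than A. Writing the kept emails to a new file stands in for "Remove from A", since deleting lines in place from a large text file is costly.

```python
def load_blocklist(path):
    """Read file B into a set so each membership test is O(1) on average."""
    with open(path, "r", encoding="utf-8") as handle:
        # Lowercasing and stripping whitespace is an assumption, not a stated requirement.
        return {line.strip().lower() for line in handle if line.strip()}


def scrub(path_a, path_b, path_out):
    """Stream file A and write only the emails that do not appear in file B.

    Returns a (kept, removed) tuple of counts.
    """
    blocked = load_blocklist(path_b)
    kept = 0
    removed = 0
    with open(path_a, "r", encoding="utf-8") as source, \
            open(path_out, "w", encoding="utf-8") as target:
        for line in source:  # reads one line at a time, never the whole file
            email = line.strip()
            if not email:
                continue  # skip blank lines
            if email.lower() in blocked:
                removed += 1  # "Remove from A": the email is simply not copied
            else:
                target.write(email + "\n")  # "Keep Email in A"
                kept += 1
    return kept, removed
```

A call such as scrub("a.txt", "b.txt", "a_scrubbed.txt") (file names are placeholders) makes a single pass over A, so the running time grows as O(|A| + |B|) rather than the O(|A| x |B|) of comparing every email in A against every email in B.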

Language of Choice: C, C# or Python.

Maximum Available Processing Time: 24 Hours
Maximum Available System Speed: 4 GHz
Maximum Available RAM: 2 GB
