Approach
2 : Sort file B and do binary search for file A’s contents
Sort file B
Iterate through A.
Read each line
Do a binary search in file B
See if there is a match
Report
Analysis
Big-O time complexity is : n * log(n) + m * log(n)
Big-O space complexity is : n (you need to host the file B in
memory)
Sorting file B ahead will save time.
However m * log(n) is the Big-O time complexity for each search
operation.
will need a lot of space
will still be slow
Will break in future as m and n grow
Approach
3 : Sort file A and B and do file co-parsing by streaming both
files
Sort file A
Sort file B
Iterate through A while simultaneously reading through contents of
B
See if there is a match.
If both are equal report.
If file B’s content is ‘smaller’ than file A. Read next line of file
B. Continue till you find a match.
If file A’s content is ‘smaller’ than file B. Read next line of file
A. Continue till you find a match.
if ‘end of file’ is reached on either A or B then report and
exit
Analysis
Big-O time complexity is : n * log(n) + m * log(m) + m + n.
Sorting file A and B ahead will save considerable time.
Slightly more time consuming than Approach 2 when all steps are done
for the first time. However, if sorting is done apriori subsequent
searches will take linear time : m + n.
No additional space requirements
Approach is future-proof as file size for A and B grows/changes