How To Sort 40 Lakh Records?
So, here is the scenario. I have a CSV (comma-separated values) file with almost 40 lakh (about 4 million) records in it. Records are separated by newlines, i.e. each line contains one record. Each record contains a user id followed by some related data. I need to sort the file by user: for every user I will create a separate file and store that user's related data in it. I wrote a C program for this in which I simply took the user id, opened a file with that name and stored the data in it. It ran continuously for 19 hours, and then I had to shut it down for other reasons. The file is 388 MB, and in those 19 hours I was able to sort only 247 MB of it.
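For reference, the per-record approach described above might look roughly like the minimal C sketch below. It is not the original program. It assumes, since the question does not say, that the user id is the first comma-separated field, that each user's file is named after the id with a .csv suffix inside an output directory, and that every line fits in a fixed buffer. The names split_by_user and MAX_LINE are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit: each line, including its '\n', is assumed to
   fit in MAX_LINE - 1 bytes. */
#define MAX_LINE 4096

/* Sketch of the per-record approach (assumptions: the user id is the
   first comma-separated field, and each user's data goes to
   "<out_dir>/<user_id>.csv"). Returns records written, or -1 on error. */
long split_by_user(const char *in_path, const char *out_dir)
{
    FILE *in = fopen(in_path, "r");
    if (in == NULL)
        return -1;

    char line[MAX_LINE];
    char path[1024];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* A line without '\n' that is not the last one did not fit. */
        if (strchr(line, '\n') == NULL && !feof(in)) {
            fclose(in);
            return -1;
        }

        char *comma = strchr(line, ',');
        if (comma == NULL || comma == line)
            continue;               /* skip lines with no user id */
        *comma = '\0';              /* line now holds only the user id */
        const char *data = comma + 1;

        if (strchr(line, '/') != NULL)
            continue;               /* refuse ids that look like paths */

        int n = snprintf(path, sizeof path, "%s/%s.csv", out_dir, line);
        if (n < 0 || (size_t)n >= sizeof path) {
            fclose(in);             /* path would have been truncated */
            return -1;
        }

        /* One fopen/fclose pair per record: the file is opened in
           append mode so earlier records for this user are kept. */
        FILE *out = fopen(path, "a");
        if (out == NULL) {
            fclose(in);
            return -1;
        }
        fputs(data, out);
        fclose(out);
        count++;
    }

    fclose(in);
    return count;
}
```

Note that this pattern performs one open and one close per record, so 40 lakh records mean roughly 40 lakh file opens, each on a file in a directory that may hold a very large number of user files.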
Now, instead of using the same approach, I am thinking of trying something else. Any ideas for making the algorithm more efficient are welcome. Approaches that use parallelism would be especially appreciated; I have a dual-core processor (second-generation Core i3).