Large arrays with millions of elements and performing integer comparisons - What is the right way?
I'm currently working on a user follow system at CrazyEngineers, which will replace the connections. If you are viewing someone's followers, we want to show how many of those followers you follow and how many follow you, similar to what Twitter does.
The basic system is ready, and I was able to get it done with just a few basic queries, keeping the overall query count per page to 10, which I believe is good!
Currently we do it the simple way: load all the followers into an array and then check whether the authenticated user's ID matches. This should work for users with, say, 5K - 10K or even 30K followers.
Now, I was wondering: what if a user has a million, or say 10 million, followers? Does it make sense to load the IDs of all those followers into memory and perform the check?
Or is there any way to do it more efficiently?
Posted in: #Databases #Programming #Big Data
I'd check something like the array_intersect function to see mutual followers, and the array_diff function to see which ones I don't follow. Combine that into a flag, and use that.
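To make that suggestion concrete, here is a rough sketch of the intersect/diff idea. It is written in Python rather than PHP purely for illustration (set intersection and difference stand in for array_intersect and array_diff), and the IDs and variable names are made up.

```python
# Hypothetical data: IDs of the followers of the profile being viewed,
# and IDs of the users the logged-in viewer follows.
profile_followers = {101, 102, 103, 104}
viewer_following = {102, 104, 999}

# Followers of the profile that the viewer also follows (like array_intersect).
also_followed = profile_followers & viewer_following

# Followers of the profile that the viewer does not follow (like array_diff).
not_followed = profile_followers - viewer_following

# Combine both results into one flag per follower.
flags = {uid: (uid in also_followed) for uid in sorted(profile_followers)}
```

Both sets have to sit in memory for this to work, which is exactly the cost the original question is worried about once the lists get large.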
Yeah, mutual followers can be checked if the list of followers is available in an array. However, consider a person having, say, 10 million followers. In that case, do we need to load all 10 million follower IDs from the database into memory, or is there a better way to achieve this? Right now the array functions in PHP do the job, and I think they should keep working up to a few tens of thousands of users.
I'm considering a hypothetical case where we have to load, say, a million IDs from the database. Would it still work?
No. Not unless you put in pagination :)
We are trying to reduce the memory and computation costs here and pagination does just that. One could always debate the additional clicks the user has to do vs the computing savings. However, at a million followers, you might as well think of a better mechanism for showing followers (like search and categories).
At the end of it, I would ask you this: is all this tam jhaam (fuss and frills) really necessary? Doesn't this seem like death by feature creep?
Not necessary 😀 . I'm just a curious cat!
Pagination is already in place. But at some point there will be a case where the authenticated user's ID has to be checked against millions of records. At least, I think that's how Twitter does it. It could be a different thing with graph databases (and I've no idea how they work). But with an RDBMS, this is definitely something that's been bothering me.
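For what it's worth, with an RDBMS the check does not have to touch every follower row. A minimal sketch, assuming a hypothetical `follows(follower_id, followee_id)` table with a composite primary key: fetch one page of follower IDs, then ask the database which of those few IDs the viewer follows. SQLite (via Python's standard `sqlite3` module) is used only to keep the example self-contained; the table, index, and function names are assumptions, not the actual CrazyEngineers schema.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical follows table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE follows ("
        " follower_id INTEGER NOT NULL,"
        " followee_id INTEGER NOT NULL,"
        " PRIMARY KEY (follower_id, followee_id))"
    )
    # Second index so "who follows user X" can also be answered from an index.
    conn.execute("CREATE INDEX idx_followee ON follows (followee_id, follower_id)")
    rows = [(f, 1) for f in range(2, 1002)]         # users 2..1001 follow user 1
    rows += [(5000, 3), (5000, 10), (5000, 2000)]   # viewer 5000 follows these
    conn.executemany("INSERT INTO follows VALUES (?, ?)", rows)
    return conn

def followers_page(conn, profile_id, after_id=0, page_size=20):
    """Return one page of follower IDs, ordered by ID, starting after after_id."""
    cur = conn.execute(
        "SELECT follower_id FROM follows"
        " WHERE followee_id = ? AND follower_id > ?"
        " ORDER BY follower_id LIMIT ?",
        (profile_id, after_id, page_size),
    )
    return [row[0] for row in cur]

def viewer_follows(conn, viewer_id, candidate_ids):
    """Return the subset of candidate_ids that viewer_id follows."""
    if not candidate_ids:
        return set()
    placeholders = ",".join("?" * len(candidate_ids))
    cur = conn.execute(
        "SELECT followee_id FROM follows"
        f" WHERE follower_id = ? AND followee_id IN ({placeholders})",
        (viewer_id, *candidate_ids),
    )
    return {row[0] for row in cur}

conn = build_demo_db()
page = followers_page(conn, profile_id=1)             # first 20 followers of user 1
followed = viewer_follows(conn, viewer_id=5000, candidate_ids=page)
```

The second query is answered from the primary-key index for the 20 IDs on the page, so its cost depends on the page size rather than on how many followers the profile has.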
I think this problem is better solved when it actually is a problem! Thank you for your response though. 👍