[SMALL PROJECT] Local movie catalog using IMDB

I think everyone is familiar with #-Link-Snipped-#. If not, have a look at it yourself.

I am proposing this small project (of course with the permission of our CE Labs seniors like Biggie and Ash).

All we will do here is discuss the algorithms on how to go about the problem.

The Proposal:


  • I have a list of movies on my hard drive (external, as an example).
  • I want to take their names and go online to IMDB and get vital statistics and store them on a local database to access later.
  • I can also use the database to sort my movies by categories like: Name, year of release, actor names, genre, keywords etc.

The Limitations:

Since this is just an algorithm discussion you can give examples in any programming language.

We will divide the project into smaller parts like: getting movie names from directories, storing them in a file, accessing the internet, accessing IMDB, getting IMDB info, storing in local database, etc.

Lastly, please let me know if this is a useful project for you to use personally. If there are many who will benefit from this, then we can make an actual program.

I have some ideas but will post them after some participation of CE members.

Have fun!

Replies

  • Kaustubh Katdare
    Permission granted 😀 [...on a side note, no permission is needed to start projects here]. I personally find this project useful. Very good small CE Labs project to begin with. 😀
  • xheavenlyx
    Thanks Biggie 😀

    Ok here is a raw break down of the Project

    Phase 1:

    1. Get user input for movies location, check folder for access and existence, get directory structure and file names, store these in a 'raw_movies.txt' file. (This file will be used to get names of movies and process them further.)

    2. Get one name of a movie/folder etc. Process it to extract proper movie name that would be used to search IMDB. (This step can be a bit difficult, and interesting)

    3. Post name of movie on IMDB database, get partial or full information about the movie.

    4. Store information on a local database.

    That's all we will look at for now. This will be Phase 1 of the project. Pour in your thoughts.
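
A minimal Java sketch of step 1 above, assuming the standard java.nio.file API. The class name RawMovieLister and its method names are made up for illustration; only the file name raw_movies.txt comes from the post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists everything below a movie folder.
final class RawMovieLister {

    /** Walks root recursively and returns every file and folder name below it, sorted. */
    static List<String> listNames(Path root) throws IOException {
        // Check existence and access before walking, as step 1 asks.
        if (!Files.isDirectory(root) || !Files.isReadable(root)) {
            throw new IOException("Not a readable directory: " + root);
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> !p.equals(root))           // skip the starting folder itself
                    .map(p -> p.getFileName().toString())   // keep only the last path element
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Writes one name per line to the output file (for example raw_movies.txt). */
    static void writeRawList(Path root, Path output) throws IOException {
        Files.write(output, listNames(root), StandardCharsets.UTF_8);
    }
}
```

Calling RawMovieLister.writeRawList(Paths.get("D:\\Movies"), Paths.get("raw_movies.txt")) would produce the raw list; both paths here are only examples.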
  • Manish Goyal
    I want to take their names and go online to IMDB and get vital statistics and store them on a local database to access later.
    I get the idea of what you actually want, but I am confused: how can these vital statistics be helpful?
  • xheavenlyx
    goyal420
    I get the idea of what you actually want, but I am confused: how can these vital statistics be helpful?
    I may not have been really clear in my first post. This is what I meant:

    I have a folder of movies I keep; presently I have 150+ titles on my external hard drive. We will not talk about how or from where we get the movies: it can be through backups of your old movies, old movies your friends gave you to watch, or movies downloaded from legal sources.

    Now there are about 150 movies, and Windows (for now) allows me to sort these movies just by name and date added. I have no way of sorting them by keywords, how good they are, the actors, the directors etc.

    To do this I will use the Internet Movie Database (IMDb) to get "vital information" like actor names, director names, rating, even reviews. After downloading this "vital information" for a particular title I will store it locally so I can access it on the move.

    Whenever I get new titles I update the local database. Now whenever someone asks if I have a movie with Vin Diesel and Tom Hanks in it, I will look it up in my database and come up with Saving Private Ryan.

    So this is the basic premise. Hope it's clear now.
  • silverscorpion
    I think this project is very interesting.
    And it will be personally very useful to me too, for I have more than 200 titles on my hard drive.

    I think, in the second step, human intervention is required.
    The movie files can be named haphazardly. The filename of a movie can bear no resemblance to the name of the movie.
    So, the first thing would be to properly name the files with the correct movie names.

    Now, we can say that the original haphazard filename itself can be used to search IMDB, but that would entail more difficulties later.
    So, I would say that we must first ensure the movies are aptly named. This is a manual effort and no engineering is involved,
    but still, I would say this is an important preparation step.

    Next is reading these filenames from a program and getting them written into a text file. I don't know how to do it, but I guess it shouldn't be too difficult.
    Then we have to read the movie names one by one from the text file. This can be done easily in any language.

    After getting the movie name, we have to connect to the IMDB website and give the movie name as the search query.
    I have no idea how this can be done. But I'm sure even this will be easy to do.

    The difficult part comes after we have entered the search string and given the search. We have to interpret the resultset correctly. That will be difficult, I guess.

    Once we have interpreted the result and queried the required info from the database, storing it in a local DB in the system wouldn't be an issue at all.

    That's all for now.
  • xheavenlyx
    Excellent @silver!

    silverscorpion
    I think, in the second step, human intervention is required.
    The movie files can be named haphazardly. The filename of a movie can bear no resemblance to the name of the movie.
    So, the first thing would be to properly name the files with the correct movie names.
    This is exactly what I came across, and you know what, over the past few years I have been losing my trust in the future of AI. Not everything, but some parts of it. I don't like the thinking that one day machines will replace humans. Anyway, I digress, it's 3am over here.

    So, as I was saying you are absolutely right, we need human-in-the-loop. This is where it gets interesting.

    Here is one way:

    0. We have raw_movies.txt file from the hard drive.

    1. Take one movie name "The.Lost.Souls.of.Engineering [2022] - DvdBackup blah"

    2. [IMPORTANT] Present this name to the user with a text cursor, where they can edit the file name on the fly and press Enter.

    3. Store it in a new file clean_movies.txt

    What do you think? This step must be efficient and fast. I think human-in-the-loop is a very delicate part of UI. It should not irritate people or they will never use the product.
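
A rough Java sketch of steps 1 to 3 above: guess a clean title from the raw name, show it, and let the user accept it with Enter or type a correction. The class TitleGuesser, its regular expressions, and the extension list are assumptions made for illustration. A plain console cannot pre-fill an editable line, so the user retypes the name instead of editing it in place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the human-in-the-loop cleaning step.
final class TitleGuesser {

    // A four-digit year, optionally wrapped in () or []; everything from the year on is treated as noise.
    private static final Pattern YEAR = Pattern.compile("[\\[(]?\\b(19|20)\\d{2}\\b[\\])]?");

    // Common video extensions (an assumed list).
    private static final Pattern EXTENSION = Pattern.compile("(?i)\\.(avi|mov|mpeg|mpg|mkv)$");

    /** Strips the extension, cuts at the first year, and turns dots/underscores into spaces. */
    static String guessTitle(String rawName) {
        String name = EXTENSION.matcher(rawName).replaceFirst("");
        Matcher year = YEAR.matcher(name);
        if (year.find() && year.start() > 0) {
            name = name.substring(0, year.start());
        }
        return name.replaceAll("[._]+", " ").replaceAll("\\s+", " ").trim();
    }

    /** Shows the guess; an empty line (just Enter) accepts it, anything else replaces it. */
    static String confirm(String rawName, BufferedReader in, PrintStream out) throws IOException {
        String guess = guessTitle(rawName);
        out.print(rawName + " -> [" + guess + "] ");
        String typed = in.readLine();
        return (typed == null || typed.trim().isEmpty()) ? guess : typed.trim();
    }
}
```

The confirmed names would then be appended to clean_movies.txt. Names with no recognisable pattern simply come back almost unchanged, which is where the user's correction matters.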

    -----------------------------------------------------------------------------------

    Important foot note:

    I would like to build this project in Python. 3 reasons why:

    1. It's a really, really good language for scripting (even otherwise it's very handy)

    2. It would be an opportunity for CE members to learn Python via a live project.

    3. I have working knowledge of Python and can help others to learn it really fast.
  • Ashraf HZ
    I was looking around the IMDB website, and it appears they do not allow (legally) tools to extract info off their pages. However, they do provide plain text files of their database here for offline personal use:
    #-Link-Snipped-#

    Unfortunately, the FTP links are super slow.

    Anyhow, while this is an interesting premise as a CE project, I will still consciously feel this encourages (by way of convenience) having a collection of movies obtained shadily.

    Carry on 😀
  • xheavenlyx
    Hey Ash (been so long!),

    Nice info. But since the database would be updated only once in a while, and the Python package I have used (IMDbPY) relies on their HTTP requests, I think there wouldn't be a problem. If it hasn't been a problem for the package developer, why would it be for us? I know accessing a webpage using a script is kind of 'rude', but that's in extreme cases. I will have to read about it more.

    And as for:

    I will still consciously feel this encourages (by way of convenience) having a collection of movies obtained shadily.
    😛

    How would a project to get information from IMDB (by a way of convenience) 'encourage' me to obtain movies shadily? IMDB is just a database.
  • Ashraf HZ
    I suppose looking through the source code of IMDbPY would be a good start then 😉

    How would a project to get information from IMDB (by a way of convenience) 'encourage' me to obtain movies shadily? IMDB is just a database.
    I guess we can debate this somewhere else instead 😛
  • xheavenlyx
    Re: [PROJECT] Java movie catalog using IMDB/Rottentomatoes

    I am reviving this thread with some updates.

    ***************The Project:

    Problem:

    Too many movies on our hard drives without proper information to sort them according to rating, release dates, themes, etc.

    Solution:

    To do so we need software that fetches all the important data from IMDB or Rottentomatoes.com.

    The database will update itself using the name of the movie and fill in important columns like IMDB rating, Starring, Directed by, Written by and Category (Horror, Drama, etc).

    After the database has been populated we can sort them according to any field (Rating, Directed by, Release date, etc) and have a search function.

    *************************

    This would now be written in Java instead of Python as planned before.

    The first version will be a very basic bare-bones database.

    Pour in your suggestions.

    RESOURCES:

    Rotten Tomatoes API for JAVA:
    #-Link-Snipped-#

    IMDB webservice (limited):
    #-Link-Snipped-#

    Java Movie Database:
    www.jmdb.de
  • sookie
    Fine,
    Step # 1: I will have some 'x' directory or drive locations + sorting/search[not clear - Please make it clear.] criteria type say 'abc' as input
    Step # 2: For each 'x' location, I will have 'y' files/directories
    Step # 3: For each 'y'
    Case 1: If File, Go to Step # 4
    Case 2: If directory, repeat step 3 again.
    Step # 4:
    a) Send file-name and hit IMDB/rottentomatoes.com/JAVA IMDB, fetch "vital" details [Please make clear about what all will be "vital"]
    b) Make connection to our movie local Database and save the "vital" details fetched in Step # 4a)
    Step #5: Enter the input sorting/search[not clear - Please make it clear.] criteria type say 'abc' , hit the database with search query and get the sorted results.

    Correct me if wrong!

    Thanks,
    -Sookie
  • xheavenlyx
    Excellent analysis sookie. I will make it clearer or expand on it.

    Step #1:

    'x' is the location of files (directory or file(s)), manually entered by the user, where movies are kept. The software will not look for files or directories on its own.

    Step #2:

    Look for files within the directory

    Step #3:

    If directory goto #2, if file goto #4.

    Step #4:

    Is the file a .avi, .mov, .mpeg, .mpg, .mkv etc?
    Store the name and location of the movie in a temporary database (see Step #5).
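
The extension check in Step #4 could look roughly like this in Java. The class name MovieFileFilter is hypothetical, and the extension set only covers the formats named above; the "etc" would need more entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a file name looks like a movie file.
final class MovieFileFilter {

    // Only the extensions listed in Step #4; extend as needed.
    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("avi", "mov", "mpeg", "mpg", "mkv"));

    static boolean isMovieFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        // Compare case-insensitively so "MOVIE.AVI" is accepted too.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }
}
```

Only names that pass this check would go into the temporary database; everything else (subtitles, covers, text files) is skipped.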

    Step #6:

    Go to IMDB/Rotten Tomatoes and get details for each Actual_Movie_Name:

    Movie Name (done)
    Release date
    IMDB Movie Poster or RT Poster .jpg
    IMDB Rating
    RT Rating
    Plot Summary (60 chars I think)
    Directed by
    Written by
    Starring (First billed actors only, 4 or maybe 5 names)
    Theme group (Horror, Romance, Drama, etc)
    IMDB Link
    RT Link

    Step #7:

    Store in database. Which of course can be sorted by any of the above fields. So if one night I want to watch a 60's Horror movie from my collection, I just refer to my dB and sort it and find the folder it is in.
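
To make the "60's Horror movie" lookup concrete, here is a small in-memory Java sketch. The class MovieCatalog, its Movie type, and the sample rows are invented for illustration (the titles and ratings are placeholders, not real data); a real version would read from the local database instead.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for the local database.
final class MovieCatalog {

    /** One row; only a subset of the fields listed in Step #6. */
    static final class Movie {
        final String title;
        final int year;
        final String genre;
        final double rating;
        final String folder;

        Movie(String title, int year, String genre, double rating, String folder) {
            this.title = title;
            this.year = year;
            this.genre = genre;
            this.rating = rating;
            this.folder = folder;
        }
    }

    /** Movies of one genre released in the decade starting at decadeStart, best rated first. */
    static List<Movie> byGenreAndDecade(List<Movie> all, String genre, int decadeStart) {
        return all.stream()
                .filter(m -> m.genre.equalsIgnoreCase(genre))
                .filter(m -> m.year >= decadeStart && m.year < decadeStart + 10)
                .sorted(Comparator.comparingDouble((Movie m) -> m.rating).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder rows, not real movies or ratings.
        List<Movie> all = Arrays.asList(
                new Movie("Sample Horror A", 1963, "Horror", 6.1, "C:\\Movies\\A"),
                new Movie("Sample Horror B", 1968, "Horror", 7.4, "C:\\Movies\\B"),
                new Movie("Sample Drama C", 1965, "Drama", 8.0, "C:\\Movies\\C"));
        for (Movie m : byGenreAndDecade(all, "Horror", 1960)) {
            System.out.println(m.rating + "  " + m.title + "  " + m.folder);
        }
    }
}
```

Sorting by another field only needs a different Comparator, which is why keeping each detail in its own column pays off.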


    Step #5:

    This is an important step, since it is very difficult to extract the exact movie name from the file name through the software.

    For example I have some movies with the format "Watchmen (2009) DVDRip-Personal" and another "bla.xfm" .

    So an intermediate database will be created first.

    File_Name, Location_PC, Actual_Movie_Name, Select

    After the database has been populated with file names, the user can select which fields he wants to edit or remove from the list. For example, it will detect all the episodes of a series as individual file names:

    File_Name                              Location_PC               Actual_Movie_Name    Select

    bla.xfm.avi                            C:\Movies\Schoolastic\
    Watchmen (2009) DVDRip-Personal.avi    C:\Movies\Watchmen
    House s01e01.avi                       C:\Movies\House s01\
    House s01e03.avi                       C:\Movies\House s01\
    House s01e04.avi                       C:\Movies\House s01\
    House s01e05.avi                       C:\Movies\House s01\
    House s01e06.avi                       C:\Movies\House s01\
    House s01e07.avi                       C:\Movies\House s01\
    House s01e08.avi                       C:\Movies\House s01\


    WILL BE EDITED BY THE USER INTO:

    File_Name                              Location_PC               Actual_Movie_Name    Select

    bla.xfm.avi                            C:\Movies\Schoolastic\    Schoolastic          1
    Watchmen (2009) DVDRip-Personal.avi    C:\Movies\Watchmen        Watchmen             1
    House s01e01.avi                       C:\Movies\House s01\      House M.D.           1


    NOTE: Another small problem that can crop up is movies with the same name. Even then there would be a need for user interaction. But we will deal with that at the right time.
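
One way to cut down the editing in Step #5 is to group episode files before showing them to the user, so a series needs only one confirmation. The Java sketch below assumes episodes follow the "s01e03" naming seen in the table; the class EpisodeGrouper and its pattern are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collapses episode files into one group per series.
final class EpisodeGrouper {

    // Matches names such as "House s01e03.avi": series name, then season and episode numbers.
    private static final Pattern EPISODE =
            Pattern.compile("(?i)^(.*?)[ ._-]*s(\\d{1,2})e(\\d{1,2}).*$");

    /** Maps a proposed name to the files behind it, keeping the original order. */
    static Map<String, List<String>> group(List<String> fileNames) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : fileNames) {
            Matcher m = EPISODE.matcher(name);
            // Episodes share the series name as key; anything else stays on its own.
            String key = (m.matches() && !m.group(1).isEmpty()) ? m.group(1) : name;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

For the table above this yields one "House" group instead of seven rows. The user would still have to type the Actual_Movie_Name ("House M.D."), since the file names do not contain it.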
  • xheavenlyx
    File and Folder Reading Classes

    I am reusing a class to read files and folders from a given location.

    In the next section they will be edited to list Folders and only Movie Files.

    FileListing.java

    package com.moviedb;

    import java.util.*;
    import java.io.*;

    /**
    * Recursive file listing under a specified directory.
    *
    * @author javapractices.com
    * @author Alex Wong
    * @author Anon
    */
    public final class FileListing {

      /**
      * Recursively walk a directory tree and return a List of all
      * files and directories found. The List is not sorted; re-enable
      * Collections.sort(result) to order it using File.compareTo().
      *
      * @param aStartingDir is a valid directory, which can be read.
      */
      public static List<File> getSpecListing(
        File aStartingDir
      ) throws FileNotFoundException {
        validateDirectory(aStartingDir);
        List<File> result = getFileListingNoSort(aStartingDir);
        //Collections.sort(result);
        return result;
      }

      /**
      * Collects every entry below aStartingDir, depth first, without sorting.
      */
      public static List<File> getFileListingNoSort(
        File aStartingDir
      ) throws FileNotFoundException {
        List<File> result = new ArrayList<File>();
        File[] filesAndDirs = aStartingDir.listFiles();
        if (filesAndDirs == null) {
          //listFiles() returns null when the directory cannot be listed
          return result;
        }
        for (File file : filesAndDirs) {
          result.add(file); //always add, even if directory
          if (file.isDirectory()) {
            //recursive call; append the contents of the subdirectory to the results
            List<File> deeperList = getFileListingNoSort(file);
            result.addAll(deeperList);
          }
        }
        return result;
      }

      /**
      * Directory is valid if it exists, does not represent a file, and can be read.
      */
      private static void validateDirectory(
        File aDirectory
      ) throws FileNotFoundException {
        if (aDirectory == null) {
          throw new IllegalArgumentException("Directory should not be null.");
        }
        if (!aDirectory.exists()) {
          throw new FileNotFoundException("Directory does not exist: " + aDirectory);
        }
        if (!aDirectory.isDirectory()) {
          throw new IllegalArgumentException("Is not a directory: " + aDirectory);
        }
        if (!aDirectory.canRead()) {
          throw new IllegalArgumentException("Directory cannot be read: " + aDirectory);
        }
      }
    }
    
    The usage of this class is in ExtractMovieName.java

    
    package com.moviedb;

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.List;

    /**
     * This class is used to get movie names from folders so they can be stored in a "database".
     */
    public class ExtractMovieName {

      /**
       * @param args not used
       * @throws FileNotFoundException if the movie directory does not exist
       */
      public static void main(String[] args) throws FileNotFoundException {
        File filedir = new File("D:\\Movies\\");

        // FileListing lives in the same package, so no import is needed.
        List<File> files = FileListing.getSpecListing(filedir);
        for (File file : files) {
          System.out.println(file);
        }
      }
    }
    
  • xheavenlyx
    [UPDATE: Phase 1] Get movie list from user and store as dsv file

    This is the first part of the LocalMovieDB project. Please try it out and let me know what you think. This will be expanded on for the full project and will support other OS's too in the future.

    If you would like to see a feature, let me know after you test this. Imagine your own movie database, kept locally, so you can sort your collection better! And play it back using your choice of video player.

    #-Link-Snipped-#

    PHASE 1

    Get movie list and store as delimiter separated value file.
    This version of the JAR file supports Windows pathnames only.

    REQUIREMENTS

    Java Runtime Environment (already installed on many systems)

    USAGE

    1. Save the file anywhere on disc.
    2. From your command prompt, go to the file location.
    3. java -jar ExtractMovieName.jar
    4. Type in your movie's directory path. Example: C:\Users\\Videos

    5. Type in the location to store the dsv file. Example: C:\Users\\Videos

    6. If you press Enter on both the prompts, it will work by default on:
    Movie Directory : C:\Users\Public\Videos
    DSV File : C:\Users\Public\Videos

    After you have obtained the .txt list, import it into Excel to have a look or just open it with Notepad.
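
For readers who want to see what a delimiter separated value export could look like, here is a short Java sketch. It is not the code inside the JAR: the class DsvExporter, the '|' delimiter, and the two-column layout (File_Name, Location_PC) are assumptions. '|' is chosen because Windows does not allow it in file names, so it cannot clash with the data.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical exporter: turns a file listing into DSV lines.
final class DsvExporter {

    // Assumed delimiter; not allowed in Windows file names.
    static final char DELIMITER = '|';

    /** One line per file: name, delimiter, containing folder. */
    static String toDsvLine(File file) {
        String parent = file.getParent() == null ? "" : file.getParent();
        return file.getName() + DELIMITER + parent;
    }

    /** Header line followed by one line per file. */
    static List<String> toDsv(List<File> files) {
        List<String> lines = new ArrayList<>();
        lines.add("File_Name" + DELIMITER + "Location_PC");
        for (File f : files) {
            lines.add(toDsvLine(f));
        }
        return lines;
    }
}
```

The resulting lines can be written out with java.nio.file.Files.write and opened in a spreadsheet by choosing '|' as the separator.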
  • sookie
    Hey awesome work man,

    I am not sure if anyone has tried it yet, but I faced a problem with the manifest file [or maybe the way the jar file was created was wrong]. I got the following error message:
    Failed to load Main-Class manifest attribute from
    ExtractMovieName.jar
    I am not sure if anyone else faced the same problem.

    How I made it work for me: I observed that the manifest.mf inside the "com.moviedb" folder needs to be in the META-INF folder, and the Main-Class manifest attribute should include the package name as well; only then did it work for me. The Manifest.mf file inside the "META-INF" folder should look like this:
    Manifest-Version: 0.1
    Created-By: Varun Dhanwantri
    Name: java/MovieDB/
    Implementation-Title: com.moviedb
    Implementation-Version: build57
    Implementation-Vendor: Sun Microsystems, Inc.
    Main-Class: com.moviedb.ExtractMovieName
    and I deleted the manifest.mf from com.moviedb folder.
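
The same requirement (a manifest under META-INF whose Main-Class includes the package) can be seen with the standard java.util.jar API. The Java sketch below is only an illustration: the class ManifestDemo and its methods are made up, it builds a jar in memory with no class files, and it sets just the two attributes needed to make a jar launchable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo: writes and reads back a manifest with a Main-Class entry.
final class ManifestDemo {

    static Manifest manifestFor(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version should be set; without it the main attributes are not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Fully qualified name, package included.
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        return manifest;
    }

    /** Builds a jar in memory; the manifest is stored as META-INF/MANIFEST.MF. */
    static byte[] jarWithManifest(String mainClass) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifestFor(mainClass))) {
            // Class files would be added here via jar.putNextEntry(...).
        }
        return bytes.toByteArray();
    }

    /** Reads the Main-Class attribute back, or null if the jar has no manifest. */
    static String readMainClass(byte[] jarBytes) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest m = jar.getManifest();
            return m == null ? null : m.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```

On the command line, the jar tool's m option (for example jar cfm) is the usual way to supply your own manifest file when creating a jar.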
  • xheavenlyx
    Thanks sookie for the feedback. So the movie listing has worked but my packaging was wrong. Ok, I will change it and update it as soon as I can. If anyone is interested to handle the project collaboratively I will add them as admins to the project page.
  • Manish Goyal
    sookie
    Hey awesome work man,

    I am not sure if anyone has tried it yet, but I faced a problem with the manifest file [or maybe the way the jar file was created was wrong]. I got the following error message:
    I am not sure if anyone else faced the same problem.

    How I made it work for me: I observed that the manifest.mf inside the "com.moviedb" folder needs to be in the META-INF folder, and the Main-Class manifest attribute should include the package name as well; only then did it work for me. The Manifest.mf file inside the "META-INF" folder should look like this:

    and I deleted the manifest.mf from com.moviedb folder.
    I have made the modifications you described in this post, but it is still not working and gives me the same error. Can you please check why it is not working?

    I think that when I create a new jar file, it ignores the manifest.mf file; maybe this is the reason.

    
    F:\Programs\IMDB\com\moviedb>jar cvf ExtractMovieName.jar *
    added manifest
    adding: com/(in = 0) (out= 0)(stored 0%)
    adding: com/moviedb/(in = 0) (out= 0)(stored 0%)
    adding: com/moviedb/ExtractMovieName.class(in = 4639) (out= 2427)(deflated 47%)
    adding: com/moviedb/FileListing.class(in = 2813) (out= 1509)(deflated 46%)
    ignoring entry META-INF/
    ignoring entry META-INF/MANIFEST.MF
    
    F:\Programs\IMDB\com\moviedb>jar xvf ExtractMovieName.jar
      created: META-INF/
     inflated: META-INF/MANIFEST.MF
      created: com/
      created: com/moviedb/
     inflated: com/moviedb/ExtractMovieName.class
     inflated: com/moviedb/FileListing.class
    
    
  • sookie
    IMPOSSIBLE! My guess is that you modified the extracted folder, not the existing jar. You really don't need to create another jar. You can modify the one [the "jar", not the extracted folder] provided by xheavenlyx.
    I have attached the modified working jar. #-Link-Snipped-# . Just rename it from .txt to .jar before running it.
  • sookie
    xheavenlyx
    Thanks sookie for the feedback. So the movie listing has worked but my packaging was wrong. Ok, I will change it and update it as soon as I can. If anyone is interested to handle the project collaboratively I will add them as admins to the project page.
    You can add me.

You are reading an archived discussion.
