CrazyEngineers
  • Create your own search engine!

    Updated: Oct 21, 2024
    Is there anyone who uses the Internet but not Google? When we need to search for anything on the web, the first site that comes to mind is Google. Ever thought of having your own search engine? Not one like Google, but a simple one that runs on your desktop and searches your files. Or what if you need to add search functionality to your website?

    Apache Lucene (https://lucene.apache.org/) is there to serve your needs. Lucene is a rich and powerful full-text search library written in Java.

    In this post, I will briefly explain how indexing and searching with Lucene work.

    The first step in implementing full-text searching with Lucene is to build an index. This is easy - you just specify a directory and an analyzer class. The analyzer breaks text fields into indexable tokens; this is a core part of Lucene.

    Several types of analyzers are provided out of the box. Some of the more interesting ones are listed below.
    Lucene analyzers:
    StandardAnalyzer: A sophisticated general-purpose analyzer.
    WhitespaceAnalyzer: A very simple analyzer that just separates tokens using white space.
    StopAnalyzer: Removes common English words that are not usually useful for indexing.
    SnowballAnalyzer: An interesting experimental analyzer that works on word roots (a search on rain should also return entries with raining, rained, and so on).
    There are even a number of language-specific analyzers, including analyzers for German, Russian, French, Dutch, and others.
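To make the difference between analyzers a bit more concrete, here is a minimal sketch in plain Java (no Lucene needed) that imitates what a whitespace analyzer and a stop-word analyzer roughly do to a piece of text. The class AnalyzerSketch, its method names, and the tiny stop-word list are made up for illustration; Lucene's real analyzers are more sophisticated than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: these helpers mimic the idea behind analyzers; they are not Lucene classes.
class AnalyzerSketch {

    // Hypothetical stop-word list, much shorter than a real one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "is", "of", "the", "to"));

    // Like a whitespace analyzer: split on runs of white space, keep case and punctuation.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Roughly like a stop-word analyzer: split on non-letters, lowercase, drop stop words.
    static List<String> stopTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The rain in Spain is heavy";
        System.out.println(whitespaceTokens(text));
        System.out.println(stopTokens(text));
    }
}
```

For the sample text, the whitespace version keeps all six tokens exactly as typed, while the stop-word version keeps only rain, spain, and heavy.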
  • PraveenKumar Purushothaman

    Member, Jan 12, 2012

    Next, we need to create an IndexWriter object. The IndexWriter object is used to create the index and to add new index entries to this index. You can create an IndexWriter with the StandardAnalyzer analyzer as follows:
    IndexWriter indexWriter = new IndexWriter("index", new StandardAnalyzer(), true);
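    The first argument is the directory location in the file system where the index files should be located. The second argument is a StandardAnalyzer object. The third argument is a boolean parameter set to true, which tells the IndexWriter to rebuild the index from scratch if it already exists. Note that this String-path constructor comes from the older Lucene 2.x API; later releases build the IndexWriter from a Directory (for example FSDirectory) and an IndexWriterConfig instead.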

    The next step is to index the business objects. For this, we use the Document class.

    The document is a container for holding a set of indexed fields.
    Document document = new Document();
    Reader reader = new FileReader(file);
    // FIELD_CONTENTS is a String constant with the value "contents"; it is the name of the field.
    // The field's value is the contents of the file, read through the reader.
    document.add(new Field(FIELD_CONTENTS, reader));
    In the above snippet, a Field is created and added to the Document. A field is made up of a name and a value (the first two parameters of the constructor). The value may take the form of a String, or a Reader if the object to be indexed is a file. Field has a lot of overloaded constructors for various needs. For more details on Field, refer to the #-Link-Snipped-#.
  • PraveenKumar Purushothaman

    Member, Jan 12, 2012

    Now, add the document to the index writer.
    indexWriter.addDocument(document);
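    So far, we have created an index writer and added the document to it. Remember to call indexWriter.close() once all documents have been added, so that the index is written to disk before you search it.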
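If you are wondering what adding a document to an index means conceptually, here is a toy in-memory inverted index in plain Java. It is only a sketch of the idea (a map from each term to the ids of the documents that contain it); the class TinyIndex and its methods are hypothetical names, and Lucene's actual index is far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a toy in-memory inverted index, not Lucene's index format.
class TinyIndex {
    // Original text of each document; the list position is the document id.
    private final List<String> documents = new ArrayList<>();
    // Term -> ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Stand-in for an analyzer: lowercase the text and split on non-letters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Rough analogue of adding a document to the index; returns the new document id.
    int addDocument(String contents) {
        int docId = documents.size();
        documents.add(contents);
        for (String term : analyze(contents)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Ids of the documents containing the given (already analyzed) term.
    SortedSet<Integer> docsContaining(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // Original text of a document, looked up by id.
    String document(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument("Rain is expected today");
        index.addDocument("No rain, just sun");
        index.addDocument("Sunny all week");
        System.out.println(index.docsContaining("rain"));
        System.out.println(index.docsContaining("sun"));
    }
}
```

Looking up a term is then just a map lookup, which is why searching an index is so much faster than scanning every file again.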
    The only step remaining now is to search the indexed values. For this, Lucene provides the IndexSearcher and QueryParser classes. We provide an analyzer object to the QueryParser; note that this must be the same kind of analyzer that was used during indexing. You also specify the field that you want to search, and the (user-provided) full-text query.
    // directory is the Directory (or the path) where the index is stored
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Use the same analyzer that was used during indexing
    Analyzer analyzer = new StandardAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
    Query query = queryParser.parse(searchString); // searchString is the user's query
    Hits hits = indexSearcher.search(query);
    System.out.println("Number of hits: " + hits.length());
    In the above snippet, we use the QueryParser to create a new Query, and then pass this Query object to IndexSearcher’s search() method. The search method returns a Hits object, which contains the documents matching searchString. The length() method gives the number of matches. (Hits belongs to the older Lucene 2.x API; newer releases return a TopDocs from indexSearcher.search(query, n) instead.)
    Voila! Our search engine is ready!
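Why must the query go through the same analyzer as the documents? The sketch below, again plain Java with made-up names (TinySearch, searchWithoutAnalysis), tries to show it: the index only contains lowercased terms, so a query that skips that step finds nothing. Terms are combined with OR here, which is a simplification and not a description of how Lucene scores or ranks results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only: a toy index plus search, not Lucene's IndexSearcher.
class TinySearch {
    // Term -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Stand-in for an analyzer: lowercase the text and split on non-letters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    void addDocument(String contents) {
        int docId = nextDocId++;
        for (String term : analyze(contents)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Query analyzed the same way as the documents were.
    SortedSet<Integer> search(String queryText) {
        return lookup(analyze(queryText));
    }

    // Query terms used exactly as typed, i.e. a mismatched "analyzer".
    SortedSet<Integer> searchWithoutAnalysis(String queryText) {
        return lookup(Arrays.asList(queryText.split(" ")));
    }

    // Union (OR) of the documents containing any of the terms.
    private SortedSet<Integer> lookup(List<String> terms) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinySearch index = new TinySearch();
        index.addDocument("Rain is expected today");
        index.addDocument("No rain, just sun");
        index.addDocument("Sunny all week");
        System.out.println("Analyzed query:   " + index.search("RAIN"));
        System.out.println("Unanalyzed query: " + index.searchWithoutAnalysis("RAIN"));
    }
}
```

With the three sample documents, the analyzed query for RAIN matches the first two documents, while the unanalyzed one matches none, because the term "RAIN" was never put into the index in that form.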
  • PraveenKumar Purushothaman

    Member, Jan 12, 2012

    If you want to see the actual matches, use an Iterator of type Hit on hits and iterate over it to get the documents that matched the search string.

    The code would look somewhat like this:
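    Iterator<Hit> it = hits.iterator();
    while (it.hasNext()) {
        Hit hit = it.next();
        Document document = hit.getDocument();
        // Get the required value from the document and store it in matchedValue.
        // "path" is a hypothetical field name; the field must have been stored at indexing time.
        String matchedValue = document.get("path");
        System.out.println("Hit: " + matchedValue);
    }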
    Simple, isn’t it?

    I know a lot of new terms have come into the picture: Document, Field, IndexWriter, IndexSearcher, and so on. But once you build a sample Java project, things will get simpler.
  • K!r@nS!ngu

    Member, Jan 12, 2012

    Awesome, bro. Let me give it a try...
  • PraveenKumar Purushothaman

    Member, Jan 15, 2012

    K!r@nS!ngu
    Awesome, bro. Let me give it a try...
    Sure, let us know how it went... 😀
  • greatcoder

    Member, Mar 22, 2012

    No need to do such hard work. Just download Google Desktop and you can search your entire computer with any keyword. It will not only search text files (and all sorts of other files, such as ppt and doc), but will also search the content written inside the files. It will search Outlook emails too!

    Best tool to keep with you. After all, you cannot compete with Google in searching 😎
  • PraveenKumar Purushothaman

    Member, Mar 23, 2012

    greatcoder
    No need to do such hard work. Just download Google Desktop and you can search your entire computer with any keyword. It will not only search text files (and all sorts of other files, such as ppt and doc), but will also search the content written inside the files. It will search Outlook emails too!

    Best tool to keep with you. After all, you cannot compete with Google in searching 😎
    Dude, this is about learning to build something yourself. We know such tools already exist, but imagine how it feels when others use something that you made...

    In other words, creating is better than using! 😀