How to build a web crawler and which languages are required?

Guys, please tell me how to build a search engine and which languages I have to learn for this.
Currently I know HTML, JS, and PHP.
Thanks in advance.

Replies

  • Mahesh Dahale
    Re: web crawler

    Hi Goyal,
    we have already discussed this:

    #-Link-Snipped-#

    MaRo
    This is most of my work now 😁

    As you may know, we're indexing media content, so we're building freakishly different indexing techniques.

    For text, we're researching several indexing techniques:

    1- Google's technique.
    2- Keyword analysis: the spider is supposed to understand the content of a single article and rank the page according to previously saved keywords.

    That is, the spider knows a keyword such as 'Engineer' and has a network of keywords related to it. It crawls for the topic's keywords, analyzes the contents of the pages it finds, and ranks them by relevance. For example, an article about engineering job vacancies would rank lower than an article about engineers who invented a tool, with technical details.
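The keyword-analysis idea above can be sketched roughly as follows. This is only a minimal illustration, not MaRo's actual implementation: the `KEYWORD_NETWORK` table, its weights, and the function names are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical keyword network: topic keyword -> related words and weights.
# Technical words get high weights, job-ad words get low ones (assumed values).
KEYWORD_NETWORK = {
    "engineer": {
        "engineer": 1.0, "engineers": 1.0,
        "invented": 2.0, "tool": 1.5, "technical": 1.5, "design": 1.5,
        "vacancy": 0.2, "hiring": 0.2,
    },
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def relevance(text, topic, network=KEYWORD_NETWORK):
    """Weighted count of related keywords, normalized by page length."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    weights = network[topic]
    counts = Counter(tokens)
    score = sum(weights.get(word, 0.0) * n for word, n in counts.items())
    return score / len(tokens)

def rank_pages(pages, topic):
    """Return page ids sorted from most to least relevant for the topic."""
    return sorted(pages, key=lambda pid: relevance(pages[pid], topic), reverse=True)

pages = {
    "jobs": "Engineers wanted: vacancy open, hiring engineers now",
    "invention": "Engineers invented a tool; technical details of the design follow",
}
print(rank_pages(pages, "engineer"))
```

With these assumed weights, the page about the invented tool ranks above the vacancy page, which matches the example in the post. A real spider would need a much larger keyword network and a better tokenizer.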


    mahesh_dahale
    Technology: ASP.NET supports searching files using the Windows Indexing Service; Microsoft .NET Framework or above is required. The code is in C#. Building a word index for a website by using a web crawler is also not dependent on the underlying technology used on the website.

    It has methods like SetQuery(query as string) and GetSearchResults().

    I am still thinking about the indexing methodology and the web crawling product, The Website Utility.
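A word index of the kind described above could look something like the sketch below, in Python. It is not the ASP.NET or Windows Indexing Service API: the `WordIndex` class and `add_page` are hypothetical, and `set_query` / `get_search_results` only borrow their names from the methods mentioned in the post. It assumes the crawler has already fetched each page's text.

```python
import re
from collections import defaultdict

class WordIndex:
    """Tiny in-memory inverted index: word -> set of page URLs."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._query_terms = []

    @staticmethod
    def _tokenize(text):
        # Lowercase and split into alphanumeric words.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        """Record every word of a crawled page under its URL."""
        for word in self._tokenize(text):
            self._postings[word].add(url)

    def set_query(self, query):
        """Store the search terms for the next lookup."""
        self._query_terms = self._tokenize(query)

    def get_search_results(self):
        """Return sorted URLs of pages that contain all query terms."""
        if not self._query_terms:
            return []
        result = None
        for term in self._query_terms:
            urls = self._postings.get(term, set())
            result = set(urls) if result is None else result & urls
        return sorted(result)

index = WordIndex()
index.add_page("/a", "Web crawler in Python")
index.add_page("/b", "Python search engine")
index.set_query("python crawler")
print(index.get_search_results())
```

Because the index only sees text, it does not matter whether the site itself runs on PHP, ASP.NET, or anything else, which is the point made in the post.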
  • Manish Goyal
    Thanks for the information.

    Actually I searched for this, but I didn't find this thread.

    But instead of ASP.NET or VB.NET, should I go for Python or Perl?

    And one more question: guys, how many of you have ever tried to make your own search engine?

