-
MaRo
Member •
Oct 3, 2009
This is most of my work now 😁
As you may know, we're indexing media content, so we're building some freakishly different indexing techniques.
For text, we're researching several indexing techniques:
1- Google's technique.
2- Keyword analysis: the spider is supposed to understand the content of a single article and rank the page according to previously saved keywords.
That means the spider knows, for example, the keyword 'Engineer' and has a network of keywords related to that word. It crawls for the topic's keyword, analyzes the contents of the pages it finds, and ranks them according to their relevance. For example, an article about vacancies for engineers would rank lower than an article about engineers who invented a tool, with technical details.
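To make the keyword-network idea concrete, here is a minimal sketch in Python. It is not the actual spider: the `KEYWORD_NETWORK` table, its weights, and the `relevance` and `rank_pages` helpers are all assumptions made up for illustration. Each related term gets a weight, and a page's score is the weighted count of those terms divided by the page length.

```python
import re
from collections import Counter

# Hypothetical keyword network: each related term carries an assumed weight
# saying how strongly it signals a technical article about the seed keyword.
KEYWORD_NETWORK = {
    "engineer": {
        "engineer": 1.0,
        "engineers": 1.0,
        "tool": 2.0,
        "invented": 2.0,
        "design": 1.5,
        "technical": 1.5,
        "vacancy": 0.2,
        "hiring": 0.2,
    }
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, seed):
    """Weighted count of network terms in the text, divided by its length."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    weights = KEYWORD_NETWORK[seed]
    score = sum(weights[term] * counts[term] for term in weights)
    return score / len(words)


def rank_pages(pages, seed):
    """Return page names sorted from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(pages[name], seed), reverse=True)


pages = {
    "vacancy": "Engineers wanted: vacancy open, we are hiring engineers now",
    "invention": "Engineers invented a tool; the technical design is described here",
}
print(rank_pages(pages, "engineer"))  # ['invention', 'vacancy']
```

With these made-up weights the invention article scores higher than the vacancy article, which matches the example in the post. A real network would need learned or curated weights.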
-
MaRo - I bet you know plenty about search engines. I look forward to your posts in this thread.
-
MaRo
Member •
Oct 3, 2009
I'm not very good at writing, so I'd rather get questions, please.
-
I'm surprised! No one wants to discuss this? Looks like we should rename this section to Computer Troubleshooting section.
-
Technology: ASP.NET supports searching files using the Windows Indexing Service; Microsoft .NET Framework or above is required. The code is in C#. Building a word index for a website by using a web crawler is another option, and it does not depend on the underlying technology used on the website.
It has methods like SetQuery(query as string) and GetSearchResults().
I'm still thinking about the indexing methodology and about a web crawling product, The Website Utility.
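As a rough illustration of building a word index with a crawler, independent of the technology behind the site, here is a small Python sketch. It is not the C# code mentioned above. The `PageParser` class, the `crawl` function, and the in-memory `SITE` dictionary are assumptions for the example; `SITE` stands in for real HTTP fetches so the sketch runs offline.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the text and the href targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(fetch, start_url):
    """Breadth-first crawl from start_url; return {word: set of urls}."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page not available, skip it
            continue
        parser = PageParser()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
        for link in parser.links:
            if link not in seen:  # visit every page only once
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical in-memory site standing in for real HTTP fetches.
SITE = {
    "/": '<a href="/a">Engineers</a> build bridges. <a href="/b">More</a>',
    "/a": "Engineers design tools.",
    "/b": '<a href="/">Home</a> Bridges need tools.',
}

word_index = crawl(SITE.get, "/")
print(sorted(word_index["tools"]))  # ['/a', '/b']
```

Because `crawl` only needs a `fetch` callable that returns HTML or `None`, the same loop could be pointed at a real site by passing a function that downloads pages.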
-
Biggie, can't you see MaRo is hinting for a Small Talk? 😉
-
MaRo
Member •
Oct 4, 2009
The problem isn't with indexing, mahesh; for search engines the problem lies in ranking the cached results.
@Ash: yeah, I'd love to, but not now. If Google goes down, I'll deserve one 😁
-
OK MaRo sir, thanks, but the purpose of an index is to optimize speed and performance in finding relevant documents for a search query. Can you explain in detail?
-
MaRo
Member •
Oct 4, 2009
Spiders fetch webpages for the keywords that appear significant to them and index the webpages against those keywords.
They also use, to a small degree, the HTML meta tag keywords. Search engines have now added a keyword auto-complete feature, which gets results from the index much faster.
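A minimal Python sketch of indexing pages against keywords, with a small share for meta tag keywords and a prefix lookup for auto-complete, could look like this. The `META_WEIGHT` value, the `build_index` and `autocomplete` helpers, and the sample pages are assumptions for illustration, not how any real engine weights things.

```python
import bisect
from collections import defaultdict

META_WEIGHT = 0.1  # assumed small share given to meta tag keywords


def build_index(pages):
    """pages: {url: (body_text, meta_keywords)} -> {keyword: {url: score}}."""
    index = defaultdict(dict)
    for url, (body, meta_keywords) in pages.items():
        words = body.lower().split()
        for word in words:
            # Term frequency: each occurrence adds an equal share of the page.
            index[word][url] = index[word].get(url, 0.0) + 1.0 / len(words)
        for keyword in meta_keywords:
            key = keyword.lower()
            index[key][url] = index[key].get(url, 0.0) + META_WEIGHT
    return index


def autocomplete(sorted_keywords, prefix, limit=5):
    """Return up to `limit` indexed keywords that start with prefix."""
    start = bisect.bisect_left(sorted_keywords, prefix)
    matches = []
    for keyword in sorted_keywords[start:start + limit]:
        if not keyword.startswith(prefix):
            break
        matches.append(keyword)
    return matches


pages = {
    "/tools": ("engineers invented a new tool", ["engineering", "tools"]),
    "/jobs": ("engineers wanted for new jobs", ["jobs"]),
}
index = build_index(pages)
keywords = sorted(index)
print(autocomplete(keywords, "eng"))  # ['engineering', 'engineers']
```

The auto-complete lookup is a binary search over the sorted keyword list, so the user ends up querying a keyword that is already in the index instead of one that has to be matched loosely.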
-
Thank you sir
-
Sir, I need to connect to a given search engine and retrieve the HTML page of that search engine.
I am using Java. Any idea how?
-
Any new algorithm suggestions for ranking the pages? Or are quality inlinks the only way we can achieve better search results?
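For reference, ranking by inlinks can be sketched as a simplified PageRank-style iteration in Python. This is only an illustration of the published idea that a page is important when important pages link to it, not a claim about what any engine runs today. The `pagerank` function, the damping value of 0.85, and the link graph are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]} -> {page: rank}, ranks sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "hub", one links to "lonely".
links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph "hub" comes out on top because three pages link to it, and "a" comes second because its single inlink is from "hub". That is the "quality inlinks" effect: one link from a strong page can count for more than a link from a weak one.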
-
MaRo
Member •
Oct 5, 2009
@mahesh : #-Link-Snipped-#
You have to get the link generated by searching on the search engine you like. For example, this is the link generated from googling "engineer": https://www.google.com/search?hl=en&source=hp&q=engineer&btnG=Google+Search&aq=f&oq=&aqi=g10 (engineer - Google Search). You have to replace the word engineer with a string variable, taking care of spaces.
@Big K: I think the keyword analysis algorithm, even if it is not faster than the present algorithm, will differ in the number of relevant results.
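The URL-building step for mahesh's question can be sketched as follows. The question was about Java, where `java.net.URLEncoder.encode` handles the escaping; the sketch below uses Python's standard `urllib.parse.urlencode` for the same idea. The `build_search_url` helper is a made-up name, and only the `hl` and `q` parameters from the link above are kept, on the assumption that the others are optional.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.google.com/search"


def build_search_url(query, language="en"):
    """Return the search URL for query, with spaces and symbols escaped."""
    params = {"hl": language, "q": query}
    return SEARCH_BASE + "?" + urlencode(params)


print(build_search_url("engineer"))       # https://www.google.com/search?hl=en&q=engineer
print(build_search_url("search engine"))  # https://www.google.com/search?hl=en&q=search+engine
```

Spaces become `+` and characters such as `&` or `#` are percent-encoded, so the query string stays valid. Fetching the resulting page is a separate step and is left out here so the sketch runs offline.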
-
Why does data remain largely hidden from users behind forms or Web service interfaces (the deep web)?
A study by BrightPlanet shows the hidden Web contains 7,500 terabytes of information and is 400 to 500 times larger than the visible Web.
-
Hi! Thanks for the informative post. This is one of the posts that impressed me a lot, and I'd like to create a search engine of my own, so it's very useful. The tips are some of the best. Keep up the nice work.
-
Good, MaRo, but give some more info...
Here's what I know:
I have no time to explain it myself, but google 'WHAT IS GOOGLEBOT?'
I hope it helps Big learn more about his topic...
Till then, keep posting your findings...
-
MaRo
Member •
Oct 8, 2009
Googlebot is the spider, the software Google relies on to crawl the Internet.
-
Suppose I launch a new website: when does it get crawled by Google?
And once crawling and updating have started, what happens if I change the URL? 😒
-
MaRo
Member •
Oct 8, 2009
Google has URL submission, which puts your website on their to-do list.
#-Link-Snipped-#
-
This discussion is going from building a search engine to 'what is a search engine'. Are my expectations of CEans too high?
-
MaRo sir, today I found an article that says:
Google crawls the Web at varying depths and on more than one schedule. The so-called deep crawl occurs roughly once a month. This extensive reconnaissance of Web content requires more than a week to complete and an undisclosed length of time after completion to build the results into the index. For this reason, it can take up to six weeks for a new page to appear in Google. Brand new sites at new domain addresses that have never been crawled before might not even be indexed at first.
-
Yeah, you are right, Mahesh.
As far as I know, there is no such site yet in which deep crawls occur...
but I have read in a magazine that researchers are trying to develop such websites.
Correct me if I am wrong.
-
Hi,
A search engine works on the following technique: 1. Page ranking: the most active, most visited sites are considered rank 1 pages. Whenever a user enters a query in the search box, the query first passes to the query optimizer, which looks up whether the requested page is among the most frequently accessed. If it is, it appears on the first page of results.
We can also optimize the speed of a search engine with techniques such as:
1. Modeling Score Distributions for Combining the Outputs of Search Engines
2. Web Crawling
-
Yes, I agree with all of that. But concepts like web crawlers, paging, indexing, etc. are things I think every CSE knows. HOW TO BUILD IT??? (Hope Big will be satisfied after this.)
-
I'm convinced my expectations of the members were a bit too high. This discussion was meant to be about building a search engine, not "What is a search engine".