Sachin
Member • Mar 18, 2012
Fetch meta tags from crawled HTML docs
I have used Apache Nutch 1.4 to crawl a website. Now I want to fetch the meta tags from every HTML page. Is this possible?
I have just started using Nutch, so I don't even know how to compile the code. For crawling, I downloaded the binary files and ran some very simple commands.
So one question I have is: how do I run Nutch if I modify one of the source files?
And what modification can I make so that it shows me the meta tag info for the URL of each page?