Fetch meta tags from crawled HTML docs

Sachin Jain

@sachin-0wuUmc Oct 26, 2024
I have used Apache Nutch 1.4 to crawl a website. Now I want to fetch the meta tags from every HTML page. Is this possible?
I have only just started using Nutch, so I don't even know how to compile the code. For the crawl I downloaded the binary distribution and ran some very simple commands.
So one thing I'm unsure about is how to run Nutch after modifying one of the source files.

And what modification can I make so that Nutch shows me the meta tag information for each page's URL?
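For illustration, here is a minimal, standalone Java sketch of the extraction step itself, assuming you already have each page's URL and raw HTML (for example, exported from the crawl data). It does not use Nutch's own API; the class and method names (`MetaTagExtractor`, `extractMetaTags`, `extractForPages`) are hypothetical. It uses regular expressions rather than a real HTML parser, so it will miss edge cases such as unquoted attribute values or a `>` inside a value. Within Nutch, a plausible home for this kind of logic is an HTML parse-filter plugin, but that wiring is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified, regex-based meta tag extractor (not Nutch API).
class MetaTagExtractor {

    // Matches a whole <meta ...> tag, case-insensitively; \b keeps <metadata> from matching.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches one quoted attribute inside the tag: key="value" or key='value'.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([a-zA-Z_:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns a lowercased name (or http-equiv / property) -> content map for one page. */
    static Map<String, String> extractMetaTags(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            // Collect the attributes of this single <meta> tag.
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(), value);
            }
            // Prefer name=, then http-equiv=, then property= as the key.
            String key = attrs.getOrDefault("name",
                attrs.getOrDefault("http-equiv", attrs.get("property")));
            String content = attrs.get("content");
            if (key != null && content != null) {
                result.put(key.toLowerCase(), content);
            }
        }
        return result;
    }

    /** Applies extractMetaTags to every (URL -> raw HTML) entry, keeping input order. */
    static Map<String, Map<String, String>> extractForPages(Map<String, String> pages) {
        Map<String, Map<String, String>> byUrl = new LinkedHashMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            byUrl.put(page.getKey(), extractMetaTags(page.getValue()));
        }
        return byUrl;
    }

    public static void main(String[] args) {
        // Made-up sample pages standing in for crawled documents.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/",
            "<html><head><meta name=\"description\" content=\"Home page\">"
                + "<meta name='keywords' content='nutch, crawler'></head></html>");
        pages.put("http://example.com/about",
            "<html><head><META HTTP-EQUIV=\"Content-Type\" content=\"text/html\"></head></html>");
        extractForPages(pages).forEach((url, tags) -> System.out.println(url + " -> " + tags));
    }
}
```

Running `main` prints one line per URL, e.g. `http://example.com/ -> {description=Home page, keywords=nutch, crawler}`. Keying the result by URL is what lets you see the meta tag info for each page.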

Replies

CrazyEngineers powered by Jatra Community Platform

  • Sachin Jain

    @sachin-0wuUmc Mar 19, 2012

    Has anybody worked with Apache Nutch, the Lucene library, and Solr?