Fetch meta tags from crawled HTML docs

@sachin-0wuUmc • Oct 26, 2024



I have used Apache Nutch 1.4 to crawl a website. Now I want to fetch the meta tags from every HTML page. Is this possible?
I have just started using Nutch, so I don't even know how to compile the code. For crawling, I downloaded the binary distribution and ran some very simple commands.
So one of my doubts is how to run Nutch after I modify one of the source files.

And what modification can I make so that it shows the meta tag information for the URL of each page?
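
To make the goal concrete, here is a minimal sketch of the kind of output being asked for: a mapping from each page URL to its list of meta tags. It does not use or modify Nutch at all. It assumes the crawled HTML is already available as plain strings keyed by URL, and the names `MetaTagCollector`, `extract_meta_tags`, `meta_tags_by_url`, and `SAMPLE_PAGES` are hypothetical, made up for this illustration. It relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects the attributes of every <meta> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        # Self-closing tags (<meta ... />) also end up here by default.
        if tag == "meta":
            self.meta_tags.append(dict(attrs))


def extract_meta_tags(html_text):
    """Return a list of dicts, one per <meta> tag found in html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta_tags


def meta_tags_by_url(pages):
    """Map each URL to the meta tags of its page.

    `pages` is assumed to be a dict of {url: html_string}; how the HTML
    gets out of the crawl data is outside the scope of this sketch.
    """
    return {url: extract_meta_tags(html) for url, html in pages.items()}


# Hypothetical sample input standing in for crawled pages.
SAMPLE_PAGES = {
    "http://example.com/": (
        "<html><head>"
        '<meta name="description" content="Demo page">'
        '<meta name="keywords" content="nutch, crawl"/>'
        "</head><body></body></html>"
    ),
}

if __name__ == "__main__":
    for url, tags in meta_tags_by_url(SAMPLE_PAGES).items():
        print(url)
        for tag in tags:
            print("  ", tag)
```

Doing the same thing inside Nutch itself would mean hooking into its parsing step, which this sketch does not attempt.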


Replies


CrazyEngineers powered by Jatra Community Platform

Sachin Jain

@sachin-0wuUmc • Mar 19, 2012

Has anybody worked with Apache Nutch, the Lucene library, and Solr?

About CrazyEngineers

The official CrazyEngineers Community

Founded Nov 26, 2005
