CrazyEngineers
  • Fix Broken Links On Websites Using Algorithm Developed By University of Isfahan

    Ankita Katdare

    @abrakadabra
    Updated: Feb 13, 2014
    Content publishers and website owners across the web have been struggling with broken links for years. A new algorithm developed by computer engineers from the University of Isfahan aims to fix broken links with about 90% accuracy. In the web of data, linked datasets change over time: entities are updated and move to new addresses. When the address of an RDF (Resource Description Framework) entity changes, the links pointing to it break, which users commonly see as a "404 - Page Not Found" error. The current techniques used to fix broken links work at the destination point and have two major problems: a single point of failure, and reliance on the destination data source. Mohammad Pourzaferani and Mohammad Ali Nematbakhsh, the computer engineers from the University of Isfahan, have developed an algorithm that instead works from the source point of a link and discovers the new address of the entity that has been moved or detached.

    The algorithm is based on the observation that entities preserve their structure even after moving to another location. It introduces two datasets, a 'superior' and an 'inferior', which are used to build an exclusive graph structure for each entity that is observed over time. This graph is used to identify and discover the new address of a detached entity. A crawler controller module then searches for the superiors of each entity in the inferior dataset, and vice versa. In this way the search space is narrowed, and the most similar entity, which is the best candidate, is chosen and suggested by the algorithm.
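    The idea of matching a detached entity by its surrounding structure can be illustrated with a minimal Python sketch. This is not the authors' implementation: the article does not say how 'superior' and 'inferior' are defined or which similarity measure the paper uses, so the sketch assumes that superiors are entities linking to the observed entity, that inferiors are entities it links to, and that similarity is a plain Jaccard overlap. The names `EntityGraph`, `suggest_new_address`, the `threshold` parameter and the `ex:` URIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityGraph:
    """Neighbourhood recorded for one entity while its link still worked."""
    uri: str
    # Assumption: superiors link TO the entity, inferiors are linked FROM it.
    superiors: frozenset = field(default_factory=frozenset)
    inferiors: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(old, new):
    """Average overlap of both neighbourhoods (an illustrative measure only)."""
    return (jaccard(old.superiors, new.superiors)
            + jaccard(old.inferiors, new.inferiors)) / 2


def candidates(detached, snapshot):
    """Narrow the search space to entities sharing at least one neighbour."""
    return [
        entity for entity in snapshot.values()
        if entity.superiors & detached.superiors
        or entity.inferiors & detached.inferiors
    ]


def suggest_new_address(detached, snapshot, threshold=0.5):
    """Return the URI of the most similar candidate, or None if none is close."""
    best = max(candidates(detached, snapshot),
               key=lambda entity: similarity(detached, entity),
               default=None)
    if best is None or similarity(detached, best) < threshold:
        return None
    return best.uri


# Made-up data: the graph observed before the entity moved ...
detached = EntityGraph(
    "ex:old/Jane_Doe",
    superiors=frozenset({"ex:Some_Book", "ex:Some_University"}),
    inferiors=frozenset({"ex:Some_City", "ex:Some_Field"}),
)
# ... and the entities found in a newer snapshot.
snapshot = {
    "ex:new/Jane_Doe": EntityGraph(
        "ex:new/Jane_Doe",
        superiors=frozenset({"ex:Some_Book", "ex:Some_University"}),
        inferiors=frozenset({"ex:Some_City", "ex:Some_Field", "ex:Some_Award"}),
    ),
    "ex:John_Roe": EntityGraph(
        "ex:John_Roe",
        superiors=frozenset({"ex:Other_Book"}),
        inferiors=frozenset({"ex:Some_City"}),
    ),
}

print(suggest_new_address(detached, snapshot))  # prints ex:new/Jane_Doe
```

    Filtering candidates by shared neighbours before scoring mirrors the narrowing of the search space described above; a real system would also need to decide what to do when several candidates score equally.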

    To demonstrate their algorithm, the engineers tested it on two DBpedia snapshots containing approximately 300,000 person entities. Results showed that the algorithm was able to identify about 5,000 entities between the two snapshots and successfully relocated 9 out of 10 of the broken links. If employed as a product for websites, it could solve a major part of the broken-link problem. Most of us have followed a link from one website to another only to land on a page displaying an error message. Beyond websites, broken links have wider implications in science, healthcare and other industrial sectors, where machines communicate and expect to find specific resources that turn out to be missing or detached from their identifier.

    The logic is simple: if the resource is still available on the servers, then we should be able to retrieve it with a sufficiently smart algorithm that repairs the broken links. You can read the paper published by the duo in the International Journal of Web Engineering and Technology: <a href="https://www.inderscience.com/info/inarticle.php?artid=59106" target="_blank" rel="noopener noreferrer">Repairing broken RDF links in the web of data, International Journal of Web Engineering and Technology (IJWET), 2013, Vol. 8, No. 4, pp. 395-411</a>.

    Abstract: In the web of data, linked datasets are changed over time. These changes include updating on features and address of entities. The address change in RDF entities causes their corresponding links to be broken. Broken link is one of the major obstacles that the web of data is facing. Most approaches to solve this problem attempt to fix broken links at the destination point. These approaches have two major problems: a single point of failure; and reliance on the destination data source. In this paper, we introduce a method for fixing broken links which is based on the source point of links, and discover the new address of the detached entity. To this end, we introduce two datasets, which we call 'superior' and 'inferior'. Through these datasets, our method creates an exclusive graph structure for each entity that needs to be observed over time. This graph is used to identify and discover the new address of the detached entity. Afterward, the most similar entity, which is candidate for the detached entity, is deduced and suggested by the algorithm. The proposed model is evaluated with DBpedia dataset within the domain of 'person' entities. The result shows that most of the broken links, which had referred to a 'person' entity in DBpedia, had been fixed correctly.

    What do you have to say about that? Share with us in the comments below.
Replies
  • Rajni Jain

    Member, Feb 14, 2014

    After this algorithm the image should be...
