Can I crawl websites to extract candidate terms from there?

PoolParty provides several ways to build a text corpus from which candidate terms can be extracted. Besides uploading files or harvesting content from RSS feeds, you can crawl entire websites. Starting from a given URL, PoolParty traverses the domain and fetches all web pages found there to build a reference text corpus. The built-in entity and phrase extractor then helps you verify whether a thesaurus adequately reflects the content found, and it suggests new candidate terms for integration into the existing knowledge graph. PoolParty's integrated web crawler runs as a 'long-running task' in the background, so you do not need to stay online while it runs.
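
To illustrate the general idea of traversing a domain from a start URL, here is a minimal Python sketch of a breadth-first crawl that stays on the starting host. It is not PoolParty's implementation, which is not described here. The names `crawl_domain`, `fetch_html`, `LinkParser`, and the `max_pages` limit are hypothetical and chosen only for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Hypothetical default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_domain(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    corpus = {}

    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        corpus[url] = html

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            # Follow only unseen links on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

    return corpus
```

Passing the fetcher as a parameter makes it possible to try the traversal on an in-memory set of pages before pointing it at a real site. A real crawl should also respect `robots.txt` and rate limits, which this sketch omits.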