Konan Web Crawler

Intelligent Web Crawling
Konan Web Crawler is highly scalable software for crawling web information.
It collects information not only from bulletin boards, but also from ‘web pages’ and ‘structured pages’ in a single pass. The product offers enhanced features that keep pace with web technology, including support for blogs, comments, Ajax pages, and more. It also improves support for JavaScript, sites that require authentication, and attachment files, and the user interface is more intuitive for beginners.

Strengths

Intelligent Crawler
- Over 95% crawl success rate, compared with JavaScript pattern-based processing, through outstanding JavaScript interpretation
- Collect all types of pages: general web documents, structured/unstructured pages, AJAX pages, RSS/ATOM, multilingual pages, new windows, HTTPS
- Collect bulletin board files, web document images, videos, attachments, and comments

Various Collecting Options
- Improve service quality by checking dead links
- Eliminate URL-based or field-designated duplicate data
- Collect data effectively through incremental crawling and re-crawling
- Collect data automatically using scheduling settings for each source
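
As an illustration of the URL-based and field-designated duplicate elimination listed above, here is a minimal sketch. It is not Konan Web Crawler's implementation: the function names `normalize_url` and `deduplicate`, the record layout (dictionaries with a `url` key), and the normalization rules are all assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment, strip a trailing slash.

    These normalization rules are an assumption for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def deduplicate(records, fields=None):
    """Keep the first record seen for each key.

    With fields=None the key is the normalized URL (URL-based).
    Otherwise the key is the tuple of the named field values
    (field-designated).
    """
    seen = set()
    unique = []
    for record in records:
        if fields is None:
            key = normalize_url(record["url"])
        else:
            key = tuple(record.get(name) for name in fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Calling `deduplicate(records)` compares records by URL, while `deduplicate(records, fields=["title"])` compares them by the designated fields, so two pages with different URLs but the same title would count as duplicates.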

Easy Rule Settings
- Simple and easy registration through the crawler registration wizard
- Step-by-step user guide for beginners
- High usability through easy and intuitive UI

Segmented Statistical Analysis
- Stats in list or graph form
- Stats of crawler attempts, data, errors, and image/attachment files
- Crawler error details: HTTP / network / storage / dead link / information extraction / URL extraction / script execution error

Key Features

Crawler
- Collect structured pages, unstructured pages, and general web pages
- Collect image / music / video files
- Process JavaScript

Classification
- Manage category classification
- View by subject, classification code / details / edit (supports multiple languages)
- Select multiple categories per subject

Manage Crawler Rules
- View dead link
- View crawler contents / view details / edit (supports multiple languages)
- Multi-step view for crawler / crawler group

View Crawler Error
- Analyze crawler status / error
- View crawler statistics and utilize statistical data (view graph)
- View crawler errors (HTTP error, network error, storage error, dead link, info extraction error, URL extraction error, script execution error)

Manage
- Manage keyword
- Set alerts for crawler errors or information source changes
- View crawler agent state, control error, network error, storage error, dead link, info extraction error, URL extraction error, script execution error

Expected Effects

Government Ministries and Public Institutions
- Improve service quality and establish a strategy for integration with organizations
- Increase satisfaction with nationwide services through rich content
- Increase the efficiency of national strategy through quick responses to online public opinion

Private Companies
- Increase work efficiency by shortening data collection time and preventing information omissions
- Respond rapidly to harmful information about the company
- Establish an efficient strategy by leveraging the latest competitor information

E-Business
- Encourage customer revisits and generate publicity through constantly updated content
- Reduce content production costs by replacing deficient content with external content
- Differentiate the company by reprocessing varied external content

Reference

Government Portal of Korea
Safety Executive uses Konan Web Crawler to collect and provide administrative service information from approximately 13,600 sites of 8,200 public institutions.

Needs
- Combine the e-government services of institutions and provide them in one place
- Collect and categorize information needed by the public

Solution
- Collect content from over 13,600 sites through Web Crawler
- Collect and provide administrative information in approximately 700 categories

Effect
- Easily find and use government institutions' services from one place
- Provide life support services, such as weather and lost and found

