OwnSearch: Web Scraper Software Development
Many IT Svit customers needed to find specific information on their corporate websites quickly. Platform-specific search engines fell short, so we decided to create a bespoke web scraper tool that can be added to a particular website and quickly build a custom search index for it.
Project requirements
IT Svit needed to overcome the following challenges:
- Web crawlers must be lightweight and simple, yet efficient
- The search index must be built and processed quickly
- The tools must have a convenient user interface
- The tools must have low hardware requirements
Project results
IT Svit developed the required web scrapers and other Big Data solutions so that our customer could form the data set for training their search engine. Toweya provided the basic specifications, and we helped them create an easy-to-use, performant search engine platform that enables incremental web search and provides precise results.
Location: Kharkiv, Ukraine
Partnership period: August 2015 – February 2018
Team size: 2 – 4 people
Team location: Kharkiv, Ukraine
Services: Cloud infrastructure design and development, Python development, Data Science, Big Data solutions, Machine Learning algorithms
Expertise delivered: Cloud infrastructure design and implementation, Python development, Big Data architecture design and management
Technologies: Python, asyncio, aiohttp
Product Overview
Client’s goals
The main challenge was that built-in search tools were either absent or too rigid. We decided to build the web scraper solution from scratch and ensure it can easily interact with any type of CMS or website builder platform.
We wanted this tool to have the following characteristics:
- High performance
- Low system resource consumption
- Ease of configuration
- Simplicity of usage
Implementation and challenges resolved
The scraper was built in Python using the asyncio and aiohttp libraries, and it meets all the aforementioned requirements:
- The scraper comes with a built-in web server, so it is simple to launch
- The tool can be easily integrated into any website
- The search index results can be viewed through any browser
- The scraper has low hardware requirements
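Because the scraper relies on asynchronous I/O (asyncio and aiohttp), it can fetch many pages concurrently instead of waiting on each request in turn, which keeps crawling fast.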
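To illustrate the general approach, here is a minimal sketch of concurrent page fetching with asyncio feeding a simple word-to-URL search index. It is not IT Svit's actual implementation: the names `extract_text`, `build_index`, `search` and `build_index_with_aiohttp` are hypothetical, and so are the `max_concurrency` parameter and the choice of a plain inverted index. The fetch function is passed in so the indexing logic can run without network access; the aiohttp variant assumes the third-party aiohttp package is installed.

```python
import asyncio
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


async def build_index(urls, fetch, max_concurrency=10):
    """Fetch pages concurrently and map each word to the URLs containing it.

    `fetch` is any coroutine function taking a URL and returning HTML text.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # caps parallel requests
    index = defaultdict(set)

    async def process(url):
        async with semaphore:
            html = await fetch(url)
        for word in re.findall(r"\w+", extract_text(html).lower()):
            index[word].add(url)

    await asyncio.gather(*(process(url) for url in urls))
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


async def build_index_with_aiohttp(urls):
    """Hypothetical wiring to aiohttp; one session is shared by all requests."""
    import aiohttp  # third-party; imported lazily so the rest runs without it

    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()

        return await build_index(urls, fetch)
```

With aiohttp installed, `asyncio.run(build_index_with_aiohttp(urls))` would build the index for a list of page URLs, and `search(index, "some words")` would return the matching pages. The semaphore is a design choice in this sketch to keep resource usage bounded on modest hardware.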