Project Idea
Many businesses struggle to process varied documents, spending months of manual work extracting and cleaning the data they need. This process is not only expensive but also leads to inevitable errors and slows down the company's business processes.
The Document Extraction and Processing System (DEPS) helps businesses reduce the manual effort of document parsing by combining ML capabilities with a wide range of pluggable modules, such as image processing, data extraction, post-processing, review, and validation. The solution is designed to work with both scanned and digitized documents, converting them into searchable and discoverable formats.
Why was DEPS created? How long has it been around?
We've worked on many projects where existing document processing products lacked the functionality needed to achieve the desired outcomes and quality, and offered few or no options for customization. Unlike these products, DEPS is a MOTS (Modified Off-The-Shelf) solution: it ships with source code and can be further modified to fit customer needs.
DEPS was created 3 years ago as an accelerator to minimize the effort of finding and extracting specific information and validating the extracted data. We decided to push the limits of existing market solutions, and we continue to do so now as we move towards making DEPS an open-source product.
What documents can be processed with DEPS?
DEPS works with a variety of formats, including images, spreadsheets, e-mails, PDFs (both searchable and image-based), and machine-readable formats. Support for custom file formats can also be added.
Our solution can parse structured, semi-structured, or completely unstructured documents by applying ML models, heuristics, a template-based approach, or a combination of these, depending on the client's needs.
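One way to picture a combination of approaches is a fallback chain, in which a strict template is tried first and a looser heuristic is used only if the template finds nothing. The sketch below is purely illustrative and is not taken from the DEPS code base: the function names, the "Invoice No" label, and the digit-run heuristic are all assumptions. An ML model could be plugged in as one more callable in the same list.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical extractor signature: takes document text, returns a value or None.
Extractor = Callable[[str], Optional[str]]


def template_extractor(text: str) -> Optional[str]:
    """Template-based: expects a fixed label such as 'Invoice No: A1234'."""
    match = re.search(r"Invoice No:\s*(\w+)", text)
    return match.group(1) if match else None


def heuristic_extractor(text: str) -> Optional[str]:
    """Heuristic: take the first standalone run of five or more digits."""
    match = re.search(r"\b(\d{5,})\b", text)
    return match.group(1) if match else None


def extract_first(text: str, extractors: Sequence[Extractor]) -> Optional[str]:
    """Try each strategy in order and return the first non-empty result."""
    for extractor in extractors:
        value = extractor(text)
        if value:
            return value
    return None


# Order matters: the most precise strategy goes first.
PIPELINE = [template_extractor, heuristic_extractor]
```

Calling extract_first(document_text, PIPELINE) returns the identifier found by the first strategy that succeeds, or None if every strategy fails, which is a natural point to hand the document over to manual review.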
What are the main technologies chosen for the project?
As you can see in the picture below, the DEPS platform offers pluggable sub-systems for additional customization.
The solution works with several OCR engines, both free and paid. Depending on the client's files, we can propose the most suitable extraction engine or combination of engines.
What does DEPS offer that other solutions don't?
DEPS has several characteristics that set it apart from its competitors in the market.
The key differentiators of DEPS are:
- Cloud-agnostic deployment: deploy to the client's existing cloud provider, preserving current benefits, or prepare an optimal cloud resource plan for the client's needs.
- Tools and models selection: a wide range of extraction tools and approaches, including commercial solutions from market leaders and free community tools; in addition to its own ML models, DEPS can use externally trained models (from the customer or the community).
- Support availability: the client can choose to have the DEPS solution supported by its own team or by the vendor.
- Easy customization: the end-to-end customer document workflow can be tailored using a variety of platform services as well as by developing new functionality.
- Quality of extraction: data accuracy is achieved through validation rules, spell checking, comparison with dictionaries, model self-learning, and human-in-the-loop review.
- UI and labeling: a built-in UI review station with a convenient side-by-side document view and extensive labeling capabilities.
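To make the quality-of-extraction point above more concrete, here is a minimal sketch of how validation rules and a dictionary comparison could decide whether a document needs human review. It is an illustration under stated assumptions, not the DEPS implementation: the field names, the currency dictionary, and the ValidationResult type are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical dictionary of accepted values for a "currency" field.
KNOWN_CURRENCIES = {"USD", "EUR", "GBP"}


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Any failed rule routes the document to a human reviewer.
        return bool(self.errors)


def validate(fields: Dict[str, str]) -> ValidationResult:
    """Apply simple rules to extracted fields and collect the failures."""
    result = ValidationResult()
    # Rule 1: the amount must look like a decimal number, e.g. "120.50".
    if not re.fullmatch(r"\d+(\.\d{1,2})?", fields.get("amount", "")):
        result.errors.append("amount: not a valid number")
    # Rule 2: dictionary comparison, case-insensitive.
    if fields.get("currency", "").upper() not in KNOWN_CURRENCIES:
        result.errors.append("currency: not in dictionary")
    return result
```

A typical OCR slip such as the letter "O" in place of a zero ("12O.50") fails the first rule, so needs_review becomes true and the document can be shown to a reviewer instead of being accepted automatically.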
Who uses DEPS?
DEPS is used not only by internal EPAM teams but also by several companies in insurance, medicine, oil and gas, retail, finance, and life sciences. We are working on a DEPS demo stand so that anyone interested can try a free demo. Stay tuned for updates.
What are the future plans for the DEPS solution?
We're constantly adding new features and improving existing ones. To mention a few, we plan to add multitenancy, improve model training capabilities and template-based extraction, allow users to customize validation rules, and provide pre-processing, OCR, table detection, and other services for data scientists.
In addition to developing the solution's core platform, we have ongoing POCs and discovery work for new clients.
DEPS
Document processing, data extraction software