Paxata: from data chaos to clarity with DataRobot Data Prep
Devopsbay helped a multinational manufacturing company cut its data preparation time by 70% by implementing DataRobot Data Prep. The project focused on automating the cleaning and transformation of data from multiple sources, significantly reducing the time required to prepare data for analysis.
- Devops
- MLOps
- Infrastructure
- Helm
- Terraform
- Java
about the project
DataRobot Data Prep (formerly known as Paxata) is an advanced data preparation and management tool.
It allows organisations to efficiently collect, explore and prepare data from multiple sources for analysis and machine learning. The three main capabilities offered by Paxata are:
- Data cleansing automation - the system automatically detects and fixes errors, inconsistencies and gaps in data, significantly speeding up the data preparation process.
- Advanced integration of data sources - the platform enables data from different systems (including databases, local files, the cloud) to be combined into one coherent set.
- Shared and reused workflows - users can save and share both the data and the steps used to prepare it, making teamwork more efficient.
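The idea of saving preparation steps and replaying them on combined data can be illustrated with a minimal sketch. This is not Paxata's or DataRobot Data Prep's actual API; the step functions, the `run_workflow` helper and the sample rows are all hypothetical and only show the general pattern of a reusable cleansing workflow applied to records merged from two sources.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Step = Callable[[List[Record]], List[Record]]


def trim_whitespace(records: List[Record]) -> List[Record]:
    """Strip leading and trailing whitespace from every string value."""
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in record.items()}
        for record in records
    ]


def fill_gaps(records: List[Record], default: str = "unknown") -> List[Record]:
    """Replace missing or empty values with a placeholder."""
    return [
        {key: (value if value not in (None, "") else default)
         for key, value in record.items()}
        for record in records
    ]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


def run_workflow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each saved step in order, feeding the output of one into the next."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical rows from two different sources (a database and a local file).
database_rows = [
    {"customer": " Acme ", "country": "PL"},
    {"customer": "Globex", "country": None},
]
file_rows = [
    {"customer": "Acme", "country": "PL"},
    {"customer": "Initech", "country": ""},
]

# The workflow is just an ordered list of steps, so it can be shared and reused.
WORKFLOW: List[Step] = [trim_whitespace, fill_gaps, drop_duplicates]
prepared = run_workflow(database_rows + file_rows, WORKFLOW)
```

Because the workflow is plain data (an ordered list of steps), the same list can be run against a different dataset without redefining any of the cleansing logic.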
the challenge
Streamline work on large data sets.
technologies
- JDBC (Java Database Connectivity)
A Java API for connecting to different databases, used to import and export data from multiple sources.
- Kerberos
A network authentication protocol used for secure access to Hadoop clusters and Big Data services.
- Apache Hive
A data warehouse system used to query and analyze large data sets in a Hadoop environment.
- Apache Impala
SQL query engine used for fast data processing in Hadoop clusters.
- Azure Data Lake Storage
A data storage solution used in integration with Microsoft Azure services.
- Databricks
A data processing platform used for advanced analytics and machine learning.
- Interactive Mode
Paxata's own solution for working with real-time data as it is loaded.
- Project Flows
A data preparation automation system for creating repeatable workflows.
- Kubernetes
A container orchestration platform used to manage and scale Paxata microservices in a production environment.
- Helm
Kubernetes package manager used to define, install and update Paxata applications.
- NGINX
Reverse proxy server used as an access layer and load balancer for applications.
- Bash Scripting
Scripts that automate deployment and environment management processes.
- Kustomize
Kubernetes manifest customization tool used to manage different environments.
Results
- Enhanced platform performance
- Automated error detection
- Improved team collaboration
Enhanced platform performance
Devopsbay's support has optimized the platform's overall performance, enabling faster data processing and a more robust, reliable system for large-scale data operations.
Automated error detection
The system now automatically detects and corrects data errors, with standardized formats that ensure data consistency across sources, minimizing manual intervention and errors.
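As a rough illustration of what standardizing formats across sources can look like, the sketch below converts dates written in several conventions to ISO 8601 and reports the positions of values it cannot parse. It is an assumption-laden example, not the platform's implementation: the list of accepted formats, the function names and the sample values are all hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical set of date formats seen across the different sources.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]


def standardize_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_column(values: List[str]) -> Tuple[List[Optional[str]], List[int]]:
    """Standardize every value and collect the indexes of unparseable entries."""
    cleaned: List[Optional[str]] = []
    errors: List[int] = []
    for index, raw in enumerate(values):
        iso = standardize_date(raw)
        if iso is None:
            errors.append(index)
        cleaned.append(iso)
    return cleaned, errors


# Hypothetical column with the same date written three ways plus one bad value.
cleaned, errors = standardize_column(
    ["2024-03-01", "01/03/2024", "01.03.2024", "not a date"]
)
```

Returning the error positions alongside the cleaned column keeps the automatic fixes separate from the values that still need manual review.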
Improved team collaboration
With shared workflows and reusable processes, teams can collaborate seamlessly on data preparation and analysis, leading to faster insights and improved efficiency across departments.
Benefits
- 1. Data Integration
Effective integration of multiple data sources into one complete system
- 2. System Reliability
High system availability achieved by using NGINX
- 3. Error Reduction
Elimination of errors resulting from manual data processing
- 4. Team Collaboration
Improved collaboration between analytical teams
- 5. Time Efficiency
70% reduction in the time needed to prepare data for analysis
client's feedback
The team has been creative about meeting our needs as the climate of the project changes. Thanks to the expertise of Devopsbay, the company is able to significantly grow their customer base from 150 to 350. The team excels in communication and project management, but internal stakeholders are particularly impressed with their development flexibility.
Aaron Vitt
Sr. Engineering Manager, Paxata