Data Engineer – Streaming & Real-Time Storage
Level: Intermediate / Senior
Employment Type: Full-time
Role Overview
We are seeking a Data Engineer to own and optimize the data infrastructure that powers our automation and AI ecosystem. You will be responsible for ensuring high-concurrency, low-latency data flow, maintaining data integrity, and designing storage strategies that support both real-time analytics and AI model outputs.
This role requires expertise in streaming architectures, database optimization, and system integration, with a focus on maintaining performance as data volume grows.
Key Responsibilities
Pipeline Optimization & Streaming
Refine and manage Kafka stream consumers and producers for high-throughput, low-latency processing (see the producer tuning sketch after this list)
Ensure timely ingestion of data from RPA (robotic process automation) sources into storage and analytical sinks
Monitor, troubleshoot, and optimize streaming pipelines for reliability and performance
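To ground the Kafka responsibility above, here is a minimal, illustrative producer tuning sketch in Python using the kafka-python client; the broker address, the rpa-events topic, and the specific batching values are assumptions for illustration, not a prescribed setup.

```python
import json

from kafka import KafkaProducer

# Illustrative high-throughput producer: batch small RPA events together
# (linger_ms / batch_size) and compress batches, trading a few milliseconds
# of latency for substantially higher throughput.
producer = KafkaProducer(
    bootstrap_servers="broker1:9092",  # placeholder broker address
    acks="all",                        # favour durability over raw speed
    linger_ms=5,                       # wait briefly so batches can fill
    batch_size=64 * 1024,              # 64 KiB batches
    compression_type="gzip",           # smaller payloads on the wire
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("rpa-events", {"run_id": "r-123", "status": "completed"})
producer.flush()
```

The linger/batch/compression trio is typically the first lever for throughput; the right values depend on message size and the latency budget of downstream consumers.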
Schema & Storage Design
Optimize relational (MySQL) and non-relational storage strategies for high-write environments
Design scalable schemas to support AI/ML outputs and downstream analytics (see the schema sketch after this list)
Implement storage solutions that balance speed, reliability, and query efficiency
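As one purely illustrative take on the schema bullet above, the sketch below creates an append-only MySQL table for model outputs: narrow fixed columns on the hot write path, a JSON payload for model-specific fields, and an index shaped around the dominant analytical read. The table, columns, and connection details are hypothetical.

```python
import pymysql

# Hypothetical append-only table for AI/ML outputs in a high-write environment.
DDL = """
CREATE TABLE IF NOT EXISTS model_outputs (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    model_name  VARCHAR(64) NOT NULL,
    produced_at DATETIME(3) NOT NULL,
    payload     JSON        NOT NULL,
    KEY idx_model_time (model_name, produced_at)  -- matches per-model, time-ordered reads
) ENGINE=InnoDB
"""

conn = pymysql.connect(host="localhost", user="etl", password="***", database="analytics")
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
conn.close()
```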
Data Governance & Quality
Ensure data integrity, consistency, and quality across streaming pipelines
Collaborate with fellow data engineers and analysts to enforce data quality standards and monitoring practices
Implement validation and alerting mechanisms for real-time data
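A minimal sketch of what in-stream validation with alerting could look like, assuming Python, the kafka-python client, and a hypothetical event contract (run_id, status, produced_at); in practice the warning would feed an alerting channel and the bad record would be routed to a dead-letter topic.

```python
import json
import logging

from kafka import KafkaConsumer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stream-validation")

REQUIRED_FIELDS = {"run_id", "status", "produced_at"}  # assumed event contract


def validate(event: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record is clean."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - event.keys()]
    if event.get("status") not in {"completed", "failed", "running", None}:
        problems.append(f"unexpected status: {event.get('status')!r}")
    return problems


consumer = KafkaConsumer(
    "rpa-events",                        # placeholder topic
    bootstrap_servers="broker1:9092",    # placeholder broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for msg in consumer:
    issues = validate(msg.value)
    if issues:
        # Stand-in for real alerting (PagerDuty, Slack, dead-letter topic, etc.)
        log.warning("bad record at offset %s: %s", msg.offset, "; ".join(issues))
```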
Scaling & Performance Strategy
Design high-performance ingestion patterns to replace basic row-by-row database inserts where needed (see the batched-insert sketch after this list)
Support infrastructure growth, ensuring the system scales with increasing data volumes
Provide guidance on architectural improvements and optimization opportunities
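As an illustrative sketch of the ingestion-pattern bullet above: the usual first step beyond naive row-by-row inserts is buffering events and writing them in multi-row batches per transaction. Everything here (table, columns, batch size, the stand-in event generator) is a hypothetical example, not the team's actual pattern.

```python
import pymysql

BATCH_SQL = "INSERT INTO rpa_events (run_id, status, recorded_at) VALUES (%s, %s, %s)"


def stream_of_events():
    # Stand-in for records coming off the Kafka consumer; purely illustrative.
    yield {"run_id": "r-1", "status": "completed", "recorded_at": "2024-01-01 00:00:00"}


def flush_batch(conn, rows):
    if not rows:
        return
    with conn.cursor() as cur:
        cur.executemany(BATCH_SQL, rows)  # PyMySQL rewrites this into one multi-row INSERT
    conn.commit()


conn = pymysql.connect(host="localhost", user="etl", password="***", database="analytics")
buffer = []
for event in stream_of_events():
    buffer.append((event["run_id"], event["status"], event["recorded_at"]))
    if len(buffer) >= 500:               # batch size is a tuning knob, not a fixed rule
        flush_batch(conn, buffer)
        buffer.clear()
flush_batch(conn, buffer)                # drain whatever is left
conn.close()
```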
Technical Requirements
Streaming & Messaging: Expert knowledge of Kafka (Producers/Consumers, Connect, Schema Registry)
Database Engineering: Strong SQL optimization skills; experience in write-heavy, high-concurrency environments
System Integration: Experience building reliable connectors between distributed systems
Architecture: Familiarity with real-time storage patterns and high-availability architectures
Operations: Experience monitoring and troubleshooting production data pipelines
Nice to Have
Experience with NoSQL or in-memory databases (Redis, Cassandra, etc.)
Knowledge of cloud-based streaming platforms (AWS Kinesis, GCP Pub/Sub, Azure Event Hubs)
Exposure to MLOps pipelines or real-time AI deployment scenarios
Familiarity with containerization and orchestration (Docker, Kubernetes)
Soft Skills
Strong problem-solving and analytical skills
Ability to operate effectively in fast-paced environments
Effective collaboration with data scientists, ML engineers, and operations teams
Ownership mentality with a focus on performance, reliability, and scalability
Why Join
Own the data backbone of a cutting-edge automation and AI ecosystem
Shape high-performance streaming pipelines that directly power ML models
Work in a fast-moving, innovative, and distributed environment with strong technical ownership