Data Engineer – Streaming & Real-Time Storage
Level: Intermediate / Senior
Employment Type: Full-time
Role Overview
We are seeking a Data Engineer to own and optimize the data infrastructure that powers our automation and AI ecosystem. You will be responsible for ensuring high-concurrency, low-latency data flow, maintaining data integrity, and designing storage strategies that support both real-time analytics and AI model outputs.
This role requires expertise in streaming architectures, database optimization, and system integration, with a focus on maintaining performance as data volume grows.
Key Responsibilities
Pipeline Optimization & Streaming
Refine and manage Kafka stream consumers and producers for high-throughput, low-latency processing (see the producer tuning sketch after this list)
Ensure timely ingestion of data from RPA (robotic process automation) sources into storage and analytical sinks
Monitor, troubleshoot, and optimize streaming pipelines for reliability and performance
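To ground the Kafka responsibility above, here is a minimal, illustrative producer tuning sketch in Python using the kafka-python client; the broker address, the rpa-events topic, and the specific batching values are assumptions for illustration, not a prescribed setup.

```python
import json

from kafka import KafkaProducer

# Illustrative high-throughput producer: batch small RPA events together
# (linger_ms / batch_size) and compress batches, trading a few milliseconds
# of latency for substantially higher throughput.
producer = KafkaProducer(
    bootstrap_servers="broker1:9092",  # placeholder broker address
    acks="all",                        # favour durability over raw speed
    linger_ms=5,                       # wait briefly so batches can fill
    batch_size=64 * 1024,              # 64 KiB batches
    compression_type="gzip",           # smaller payloads on the wire
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("rpa-events", {"run_id": "r-123", "status": "completed"})
producer.flush()
```

The linger/batch/compression trio is typically the first lever for throughput; the right values depend on message size and the latency budget of downstream consumers.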
Schema & Storage Design
Optimize relational (MySQL) and non-relational storage strategies for high-write environments
Design scalable schemas to support AI/ML outputs and downstream analytics (see the schema sketch after this list)
Implement storage solutions that balance speed, reliability, and query efficiency
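As one purely illustrative take on the schema bullet above, the sketch below creates an append-only MySQL table for model outputs: narrow fixed columns on the hot write path, a JSON payload for model-specific fields, and an index shaped around the dominant analytical read. The table, columns, and connection details are hypothetical.

```python
import pymysql

# Hypothetical append-only table for AI/ML outputs in a high-write environment.
DDL = """
CREATE TABLE IF NOT EXISTS model_outputs (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    model_name  VARCHAR(64) NOT NULL,
    produced_at DATETIME(3) NOT NULL,
    payload     JSON        NOT NULL,
    KEY idx_model_time (model_name, produced_at)  -- matches per-model, time-ordered reads
) ENGINE=InnoDB
"""

conn = pymysql.connect(host="localhost", user="etl", password="***", database="analytics")
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
conn.close()
```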
Data Governance & Quality
Ensure data integrity, consistency, and quality across streaming pipelines
Collaborate with fellow data engineers and analysts to enforce data quality standards and monitoring practices
Implement validation and alerting mechanisms for real-time data
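A minimal sketch of what in-stream validation with alerting could look like, assuming Python, the kafka-python client, and a hypothetical event contract (run_id, status, produced_at); in practice the warning would feed an alerting channel and the bad record would be routed to a dead-letter topic.

```python
import json
import logging

from kafka import KafkaConsumer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stream-validation")

REQUIRED_FIELDS = {"run_id", "status", "produced_at"}  # assumed event contract


def validate(event: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record is clean."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - event.keys()]
    if event.get("status") not in {"completed", "failed", "running", None}:
        problems.append(f"unexpected status: {event.get('status')!r}")
    return problems


consumer = KafkaConsumer(
    "rpa-events",                        # placeholder topic
    bootstrap_servers="broker1:9092",    # placeholder broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for msg in consumer:
    issues = validate(msg.value)
    if issues:
        # Stand-in for real alerting (PagerDuty, Slack, dead-letter topic, etc.)
        log.warning("bad record at offset %s: %s", msg.offset, "; ".join(issues))
```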
Scaling & Performance Strategy
Design high-performance ingestion patterns to replace basic row-by-row database inserts where needed (see the batched-insert sketch after this list)
Support infrastructure growth, ensuring the system scales with increasing data volumes
Provide guidance on architectural improvements and optimization opportunities
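As an illustrative sketch of the ingestion-pattern bullet above: the usual first step beyond naive row-by-row inserts is buffering events and writing them in multi-row batches per transaction. Everything here (table, columns, batch size, the stand-in event generator) is a hypothetical example, not the team's actual pattern.

```python
import pymysql

BATCH_SQL = "INSERT INTO rpa_events (run_id, status, recorded_at) VALUES (%s, %s, %s)"


def stream_of_events():
    # Stand-in for records coming off the Kafka consumer; purely illustrative.
    yield {"run_id": "r-1", "status": "completed", "recorded_at": "2024-01-01 00:00:00"}


def flush_batch(conn, rows):
    if not rows:
        return
    with conn.cursor() as cur:
        cur.executemany(BATCH_SQL, rows)  # PyMySQL rewrites this into one multi-row INSERT
    conn.commit()


conn = pymysql.connect(host="localhost", user="etl", password="***", database="analytics")
buffer = []
for event in stream_of_events():
    buffer.append((event["run_id"], event["status"], event["recorded_at"]))
    if len(buffer) >= 500:               # batch size is a tuning knob, not a fixed rule
        flush_batch(conn, buffer)
        buffer.clear()
flush_batch(conn, buffer)                # drain whatever is left
conn.close()
```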
Technical Requirements
Streaming & Messaging: Expert knowledge of Kafka (Producers/Consumers, Connect, Schema Registry)
Database Engineering: Strong SQL optimization skills; experience in write-heavy, high-concurrency environments
System Integration: Experience building reliable connectors between distributed systems
Architecture: Familiarity with real-time storage patterns and high-availability architectures
Operations: Experience monitoring and troubleshooting production data pipelines
Nice to Have
Experience with NoSQL or in-memory databases (Redis, Cassandra, etc.)
Knowledge of cloud-based streaming platforms (AWS Kinesis, GCP Pub/Sub, Azure Event Hubs)
Exposure to MLOps pipelines or real-time AI deployment scenarios
Familiarity with containerization and orchestration (Docker, Kubernetes)
Soft Skills
Strong problem-solving and analytical skills
Ability to operate effectively in fast-paced environments
Effective collaboration with data scientists, ML engineers, and operations teams
Ownership mentality with a focus on performance, reliability, and scalability
Why Join
Own the data backbone of a cutting-edge automation and AI ecosystem
Shape high-performance streaming pipelines that directly power ML models
Work in a fast-moving, innovative, and distributed environment with strong technical ownership