Key Responsibilities
-
Lead the design, development, and maintenance of scalable and high-performance ETL/ELT data pipelines for batch and real-time processing.
-
Serve as the technical expert for the team, making critical architectural decisions and driving the implementation of best practices in data engineering.
-
Develop hands-on solutions in Python, Scala, and advanced SQL to solve complex data manipulation and transformation challenges.
-
Implement and manage the LakeHouse architecture using the Medallion structure (Bronze, Silver, Gold layers) and ensure the platform can scale.
-
Own data ingestion practices, including Change Data Capture (CDC) mechanisms such as Debezium and OLake, and message brokers such as Kafka and Google Pub/Sub.
-
Collaborate with stakeholders to understand data requirements and translate them into efficient data models (Fact, Dimension, Data Marts).
-
Optimize and tune data processing jobs running on Apache Spark (both batch and Structured Streaming) and cloud-based query engines.
-
Ensure data quality, integrity, and security across all data platforms.
-
Mentor junior and mid-level engineers, conduct code reviews, and foster a culture of technical excellence and continuous improvement.