Responsibilities
• Collaborate with cross-functional teams to understand data requirements and integrate data from multiple sources, ensuring data consistency and quality.
• Implement and maintain robust data pipelines to collect, process, and transport data from various streaming and batch sources to our data storage and analytical systems.
• Clean, preprocess, and transform data as needed, utilizing ETL and ELT processes and tools to prepare data for analysis.
• Develop data validation processes and monitoring to ensure data accuracy and reliability.
• Continuously optimize data pipelines for efficiency, scalability, and cost-effectiveness.
• Work with the data security team to implement and maintain data security measures that protect sensitive information and ensure compliance with data privacy regulations.
• Maintain clear and comprehensive documentation of data pipelines, workflows, and systems.
• Work closely with data scientists, analysts, DevOps engineers, and stakeholders to understand their data needs and provide support for data-related projects.
• Monitor data pipelines and proactively address issues to ensure data availability and reliability.
• Steer technical migrations and upgrades, including migration to the cloud.
• Collaborate with the project management team to implement data governance and embed it into processes.
• Coach and mentor the team to drive effectiveness and collaboration.
• Work with the Engineering Manager and peer Tech Leads to plan, build, and execute the technical sustainability roadmap.
Key technical skills and experience
• Strong knowledge of implementing and maintaining scalable streaming and batch data solutions with high throughput and strict SLAs.
• Extensive experience working in production with Kafka, Spark, and Kafka Streams, including best practices, observability, optimization, and performance tuning.
• Knowledge of Python and Scala, including best practices, architecture, and dependency management.
• Experience working in production with Ansible, Kafka Streams, Kubernetes, Airflow, Mesos, Marathon, Grafana, ELK, Graphite, and PySpark.
• Hands-on experience with Google Cloud data engineering tools, particularly BigQuery, Dataproc, Dataform, Dataflow, Composer, and Google Cloud Storage.
• Experience with Linux.
• Experience with Git and version control.
• Experience with ArgoCD.
• Experience handling a complex range of multi-tenant data products in production.
• Experience leading geo-distributed teams.
• Experience working with Agile, Scrum, and Kanban methodologies.