Data Platform - Migration Engineer

Posted 2026-05-06
Remote, USA | Full-time | Immediate Start

ROLE SUMMARY

We are looking for a Senior Data Platform / Migration Engineer to lead the modernization of an enterprise data ecosystem, including the migration of its data lake from Cloudera (DataIQ, DSS, CDP) to MapR. The role requires deep expertise in large-scale distributed data systems, migration strategy, and performance optimization, with a strong focus on zero data loss, minimal downtime, and production stability.

KEY RESPONSIBILITIES

  • Lead end-to-end migration of enterprise data lake from Cloudera (DataIQ, DSS, CDP) to MapR
  • Define and execute migration strategy ensuring data integrity, minimal downtime, and rollback readiness
  • Design and build scalable, production-grade data pipelines post-migration
  • Optimize cluster performance including compute, storage, and resource utilization
  • Partner with BI/reporting teams to ensure schema consistency and data availability
  • Implement data validation frameworks to ensure accuracy and completeness post-migration
  • Document architecture, runbooks, lineage, and operational procedures
  • Collaborate with governance teams on data quality, lineage, and compliance requirements

REQUIRED SKILLS AND EXPERIENCE

  • 8+ years in Data Engineering / Data Platform Engineering
  • Strong hands-on experience with Cloudera (CDP, DSS, DataIQ) and/or MapR
  • Strong hands-on experience with Apache Spark, Hive, Hadoop, HDFS
  • Proven experience executing large-scale data lake migrations
  • Strong programming skills in Python, Scala, or SQL
  • Deep understanding of distributed data processing and storage systems
  • Experience with ETL/ELT frameworks (Informatica, Talend, dbt, or similar)

PREFERRED QUALIFICATIONS

  • Prior MapR implementation or certification
  • Experience with streaming platforms (Kafka, Pulsar)
  • Exposure to cloud-native data platforms (AWS S3, Azure Data Lake, Google Cloud Platform)
  • Familiarity with data governance, lineage, and catalog tools
  • Experience working in high-scale enterprise environments (multi-terabyte/petabyte)

CORE TECHNOLOGY STACK

Cloudera DSS / DataIQ / CDP, MapR, Apache Spark, Hive, Hadoop, HDFS, Kafka, Python, SQL, dbt, Informatica / Talend
