Lakehouse Platform Software Developer

DataPelago

Software Engineering
Hyderabad, Telangana, India
Posted on Oct 2, 2024

Implement software that performs data processing operations in a lakehouse architecture, where source data is typically maintained in an Open Table Format on cloud object stores. Work closely with technical leads and other developers to build this software and to ensure that data is processed accurately, efficiently, and reliably as part of the overall engine. Investigate and resolve issues in the software, and develop necessary enhancements on an ongoing basis.

The developer will have the following specific responsibilities in achieving these objectives.

Responsibilities

  • Develop platform software for consistent, efficient, and scalable distributed processing on cloud object stores and other data sources.
  • Develop a high-performance cloud object store access layer for the Lakehouse data processing compute platform.
  • Characterize the performance of application workloads and benchmarks and optimize implementations.
  • Research advances in architectures, protocols, APIs, and formats for data lake integration, and enhance the platform to take advantage of advanced capabilities.
  • Develop, maintain, and support production software.

Qualifications

  • BS in EE/CS or equivalent with 5+ years of experience, or MS with 3+ years of experience.
  • Experience developing capabilities for distributed data processing platforms such as Apache Spark, Trino, and Presto.
  • Experience implementing open table formats such as Iceberg, Delta Lake, and Hudi.
  • Experience working with data formats such as Parquet, ORC, and Avro.
  • Experience developing software for scalable access to distributed storage.
  • Experience developing, evaluating, and troubleshooting high-performance software.
  • Strong programming ability in C and C++. Rust experience is a strong plus.