Senior Data Engineer
Job Summary
We are seeking a Senior Data Engineer to join our AWS Cloud team. In this role, you will design and develop data ingestion pipelines from various sources into the cloud. You will lead the delivery of data products using cloud-native strategies and best practices, leveraging your 15+ years of IT experience.
Required Skills & Experience:
1. 15 years of experience in the design and delivery of distributed systems capable of handling petabytes of data.
2. 10 years of experience in the development of Data Lakes with Data Ingestion from disparate data sources, including relational databases, flat files, APIs, and streaming data.
3. Experience in the design and development of data platforms and data ingestion from disparate data sources into the cloud.
4. Expertise in core AWS services, including IAM, VPC, EC2, EKS/ECS, S3, RDS, DMS, Lambda, CloudWatch, CloudFormation, and CloudTrail.
5. Proficiency in programming languages such as Python and PySpark for efficient data processing, preferably Python.
6. Ability to architect and implement robust ETL pipelines using AWS Glue, defining data extraction methods, transformation logic, and data loading procedures across different data sources.
7. 15 years of experience using Infrastructure as Code (IaC) tools such as Terraform.
8. 10 years of experience in the development of CI/CD pipelines (GitHub Actions, Jenkins).
9. Experience in the development of Event-Driven Distributed Systems in the Cloud using Serverless Architecture.
10. Ability to work with the infrastructure team on AWS provisioning of databases, services, network design, IAM roles, and AWS clusters.
11. 2-3 years of experience working with Document DB.
12. Ability to design, orchestrate and schedule jobs using Airflow.
13. Knowledge of AWS AI services such as AWS Entity Resolution and Amazon Comprehend.
14. Ability to run custom LLMs using Amazon SageMaker.
15. Ability to use Large Language Models (LLMs) for data classification and identification of PII entities.
Desired Skills & Experience:
1. 10 years of experience in the development of Data Audit, Compliance and Retention standards for Data Governance, and automation of the governance processes.
2. Experience in data modelling with NoSQL Databases like Document DB.
3. Experience using column-oriented file formats such as Apache Parquet, and Apache Iceberg as the table format for analytical datasets.
4. Expertise in the development of Retrieval-Augmented Generation (RAG) and agentic workflows that provide LLMs with context from proprietary enterprise data.
5. Ability to develop re-ranking strategies over results from index and vector stores to improve the quality of LLM output.
Whether enhancing existing robust platforms like Salesforce and Shopify or creating a new, resilient platform from scratch, Compileinfy is your definitive go-to technology provider, delivering unmatched expertise and results.
contactus@compileinfy.com
© 2025 by Compileinfy. All rights reserved.