The Metromax Group
Data Engineer - AWS Glue/EMR
Job Location
Bangalore, India
Job Description
Key Responsibilities:
- Design and develop data pipelines using AWS Glue, EMR, Spark (Scala), and S3 to support both batch and real-time data processing needs.
- Implement ETL processes to extract, transform, and load data from various sources (structured and unstructured) into the data lake.
- Leverage Apache Spark on EMR for big data processing and transformations, using Scala.
- Manage and optimize data storage on S3, ensuring proper data partitioning, file formats (Parquet, ORC, Avro), and lifecycle policies for cost-effective storage.
- Monitor, troubleshoot, and optimize EMR clusters for performance, scalability, and cost efficiency.
- Collaborate with data architects and analysts to ensure seamless data integration and to support advanced analytics and machine learning models.
- Automate data workflows using AWS Step Functions, Lambda, and other orchestration tools.

Required Qualifications:
- Bachelor's or master's degree in Computer Science, Data Engineering, or a related technical field.
- 3-5 years of experience in data engineering, particularly using AWS services (EMR, Glue, S3, Lambda).
- Strong expertise in Apache Spark for distributed data processing, with hands-on coding experience in Scala and Python.
- Experience building ETL pipelines and working with big data in a cloud-based lakehouse environment.
- Deep understanding of data formats (Parquet, Avro, ORC) and file optimization techniques.
- Familiarity with data modeling principles, including partitioning, bucketing, and schema management in the AWS Glue Data Catalog.
- Strong knowledge of SQL and query optimization for working with large datasets.
- Experience with AWS security services such as IAM and KMS (Key Management Service), and with encryption best practices.
- Proficiency in troubleshooting and performance tuning of Spark and EMR clusters for large-scale data processing.
- Familiarity with CI/CD pipelines and infrastructure-as-code (Terraform, CloudFormation) for managing AWS environments.

Preferred Qualifications:
- AWS Certified Data Analytics, Developer, or Solutions Architect certification.
- Experience with streaming data technologies such as Kinesis or Kafka for real-time data ingestion.
- Knowledge of serverless computing and experience with AWS Lambda, Step Functions, and DynamoDB.
- Familiarity with DevOps and automation tools (e.g., Jenkins, Git, Docker).

Soft Skills:
- Strong problem-solving and analytical thinking skills.
- Ability to work collaboratively in a fast-paced, cross-functional environment.
- Excellent communication skills to explain complex technical issues to both technical and non-technical stakeholders.

(ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 11/21/2024
Contact Information
Contact: Human Resources, The Metromax Group