BlueByte Technologies
Data Engineer - Spark/Hadoop
Job Location
India
Job Description
Main Skills: Apache Airflow, Java, Maven, SQL, and GCP services such as BigQuery, Cloud Composer, Dataproc, and Dataflow.

Design & Implement Data Pipelines:
- Develop, implement, and maintain scalable data pipelines using Google Cloud Dataflow and Apache Beam.
- Ensure pipelines can process large-scale data efficiently, with proper data validation, transformation, and loading.

Cloud Infrastructure & GCP Services:
- Leverage GCP services including BigQuery, Cloud Storage, Pub/Sub, Cloud Functions, and Cloud Composer to build, deploy, and manage data workflows.
- Use the Google Cloud SDK and other cloud tools to manage cloud resources and automate workflows.

Optimize Data Flow & Performance:
- Monitor and optimize pipeline performance so data processing is cost-effective, efficient, and meets service-level agreements (SLAs).
- Troubleshoot and resolve issues related to data quality, pipeline execution failures, and performance bottlenecks.

Data Quality & Transformation:
- Implement data validation and cleaning techniques to ensure the accuracy and consistency of data throughout the pipeline.
- Develop transformation logic to process structured, semi-structured, and unstructured data from various sources.

Collaboration & Documentation:
- Collaborate with data scientists, analysts, and other stakeholders to ensure data flows meet the analytical needs of the business.
- Maintain clear documentation of data pipeline designs, architecture, and operational procedures.

Automation & CI/CD:
- Implement automation for pipeline deployment, testing, and monitoring using CI/CD tools such as Cloud Build, Jenkins, or GitLab CI.

Security & Compliance:
- Follow best practices for securing data and ensuring compliance with industry regulations, including encryption, access control, and auditing.
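To give a concrete sense of the data-validation and transformation duties above, here is a minimal, dependency-free Python sketch. In a real pipeline this logic would typically live inside an Apache Beam DoFn or Dataflow transform; the record schema (`user_id`, `amount`, `ts`) and the business rules are illustrative assumptions, not part of this role's actual codebase:

```python
from datetime import datetime
from typing import Optional

def validate_and_transform(record: dict) -> Optional[dict]:
    """Return a cleaned record, or None if the record fails validation.

    Field names (user_id, amount, ts) are assumed for this sketch.
    """
    # Validation: required fields must be present and non-empty.
    if not record.get("user_id") or record.get("amount") is None:
        return None
    try:
        amount = float(record["amount"])            # coerce string amounts
        ts = datetime.fromisoformat(record["ts"])   # reject malformed timestamps
    except (KeyError, TypeError, ValueError):
        return None
    if amount < 0:  # example business rule: negative amounts are invalid
        return None
    # Transformation: normalize types and derive a partition key
    # (e.g. a DATE column for a partitioned BigQuery table).
    return {
        "user_id": str(record["user_id"]).strip(),
        "amount": round(amount, 2),
        "event_date": ts.date().isoformat(),
    }

raw = [
    {"user_id": " u1 ", "amount": "12.30", "ts": "2025-02-05T10:00:00"},
    {"user_id": "", "amount": "5", "ts": "2025-02-05T10:01:00"},     # missing id
    {"user_id": "u2", "amount": "-3", "ts": "2025-02-05T10:02:00"},  # negative
]
clean = [r for r in (validate_and_transform(x) for x in raw) if r]
```

Keeping validation and transformation in one pure function like this makes the logic unit-testable outside the pipeline runner, which is what keeps large-scale backfills and reprocessing safe.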
Reporting & Monitoring:
- Implement monitoring and alerting for data pipelines using tools such as Cloud Monitoring and Cloud Logging (formerly Stackdriver).
- Generate reports on pipeline health, data quality, and performance for internal stakeholders.

Required Skills and Qualifications

Experience:
- 3 years of experience in data engineering or cloud engineering, specifically with Google Cloud Platform (GCP).
- Proficiency in building data pipelines using Google Dataflow, Apache Beam, or similar tools.
- Strong experience with BigQuery, Cloud Storage, Pub/Sub, and Cloud Functions for data processing and management.

Technical Skills:
- Expertise in SQL and scripting languages (e.g., Python, Java, Scala).
- Experience with distributed data processing and big data technologies such as Apache Hadoop, Spark, or Kafka.
- Understanding of data modeling, ETL processes, and data warehousing.
- Familiarity with cloud security concepts, including IAM roles, encryption, and network security in GCP.

Soft Skills:
- Strong analytical and problem-solving abilities.
- Excellent communication skills for collaborating with cross-functional teams.
- Ability to manage multiple projects and priorities in a fast-paced environment.

Education:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- Relevant certifications, such as Google Cloud Professional Data Engineer, are a plus.

(ref:hirist.tech)
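The pipeline-health reporting described above ultimately reduces to SLA threshold checks over run metadata. In production these would be Cloud Monitoring alert policies rather than hand-rolled code, but a stdlib-only sketch of the underlying idea looks like this (the metric names, thresholds, and run data are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PipelineRun:
    name: str
    succeeded: bool
    latency_s: float  # end-to-end runtime in seconds

def health_report(runs: List[PipelineRun],
                  max_failure_rate: float = 0.05,
                  max_p95_latency_s: float = 900.0) -> dict:
    """Summarize pipeline health and flag SLA breaches."""
    total = len(runs)
    failures = sum(1 for r in runs if not r.succeeded)
    latencies = sorted(r.latency_s for r in runs if r.succeeded)
    # Crude p95: value at the 95th-percentile index of successful runs.
    p95 = latencies[max(0, int(0.95 * len(latencies)) - 1)] if latencies else 0.0
    failure_rate = failures / total if total else 0.0
    alerts = []
    if failure_rate > max_failure_rate:
        alerts.append("failure rate above SLA")
    if p95 > max_p95_latency_s:
        alerts.append("p95 latency above SLA")
    return {
        "runs": total,
        "failure_rate": failure_rate,
        "p95_latency_s": p95,
        "alerts": alerts,
    }

# 19 successful runs at 300 s each, plus one failure: exactly at the 5% SLA.
runs = [PipelineRun("daily_load", True, 300.0) for _ in range(19)]
runs.append(PipelineRun("daily_load", False, 0.0))
report = health_report(runs)
```

The same structure maps directly onto a managed setup: the failure count and latency become Cloud Monitoring metrics, and the threshold comparisons become alert-policy conditions.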
Location: India
Posted Date: 2/5/2025
Contact Information
Contact: Human Resources, BlueByte Technologies