BlueByte Technologies

Data Engineer - Spark/Hadoop


Job Location

India

Job Description

Main Skills: Apache Airflow, Java, Maven, SQL, and GCP services such as BigQuery, Cloud Composer, Dataproc, and Dataflow.

Design & Implement Data Pipelines:
- Develop, implement, and maintain scalable data pipelines using Google Cloud Dataflow and Apache Beam.
- Ensure pipelines can process large-scale data efficiently, with proper data validation, transformation, and loading.

Cloud Infrastructure & GCP Services:
- Leverage a variety of GCP services, including BigQuery, Cloud Storage, Pub/Sub, Cloud Functions, and Cloud Composer, to build, deploy, and manage data workflows.
- Use the Google Cloud SDK and other cloud tools to manage cloud resources and automate workflows.

Optimize Data Flow & Performance:
- Monitor and optimize pipeline performance so that data processing is cost-effective, efficient, and meets service-level agreements (SLAs).
- Troubleshoot and resolve issues related to data quality, pipeline execution failures, and performance bottlenecks.

Data Quality & Transformation:
- Implement data validation and cleaning techniques to ensure the accuracy and consistency of data throughout the pipeline.
- Develop transformation logic to process structured, semi-structured, and unstructured data from various sources.

Collaboration & Documentation:
- Collaborate with data scientists, analysts, and other stakeholders to ensure data flows meet the analytical needs of the business.
- Maintain clear documentation of data pipeline designs, architecture, and operational procedures.

Automation & CI/CD:
- Implement automation strategies for pipeline deployment, testing, and monitoring using CI/CD tools such as Cloud Build, Jenkins, or GitLab CI.

Security & Compliance:
- Follow best practices for securing data and ensuring compliance with industry regulations, including encryption, access control, and auditing.

Reporting & Monitoring:
- Implement monitoring and alerting for data pipelines using tools such as Cloud Monitoring and Cloud Logging (formerly Stackdriver).
- Generate reports on pipeline health, data quality, and performance for internal stakeholders.

Required Skills and Qualifications

Experience:
- 3 years of experience in data engineering or cloud engineering, specifically working with Google Cloud Platform (GCP).
- Proficiency in building data pipelines using Google Cloud Dataflow, Apache Beam, or similar tools.
- Strong experience with BigQuery, Cloud Storage, Pub/Sub, and Cloud Functions for data processing and management.

Technical Skills:
- Expertise in SQL and scripting languages (e.g., Python, Java, Scala).
- Experience with distributed data processing and big data technologies such as Apache Hadoop, Spark, or Kafka.
- Understanding of data modeling, ETL processes, and data warehousing.
- Familiarity with cloud security concepts, including IAM roles, encryption, and network security in GCP.

Soft Skills:
- Strong analytical and problem-solving abilities.
- Excellent communication skills for collaborating with cross-functional teams.
- Ability to manage multiple projects and priorities in a fast-paced environment.

Education:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- Relevant certifications, such as Google Cloud Professional Data Engineer, are a plus.

(ref:hirist.tech)

Posted Date: 2/5/2025

Contact Information

Contact Human Resources
BlueByte Technologies

UID: 4983636859