TransOrg Analytics
Data Engineer - Data Pipeline
Job Location
Delhi, India
Job Description
Job Description: Data Engineer, TransOrg Analytics

Introduction:
TransOrg Analytics specializes in Data Science, Data Engineering and Generative AI, providing advanced analytics solutions to industry leaders and Fortune 500 companies across India, the US, APAC and the Middle East. We leverage data science to streamline, optimize, and accelerate our clients' businesses. Visit www.transorg.com to learn more about us.

Responsibilities:
- Design, develop, and maintain robust data pipelines using Azure Data Factory and Databricks workflows.
- Implement and manage big data solutions using Azure Databricks.
- Design and maintain relational databases using Azure Delta Lake.
- Ensure data quality and integrity through rigorous testing and validation.
- Monitor and troubleshoot data pipelines and workflows to ensure seamless operation.
- Implement data security and compliance measures in line with industry standards.
- Continuously improve data infrastructure (including CI/CD) for scalability and performance.
- Design, develop, and maintain ETL processes to extract, transform, and load data from various sources into Snowflake.
- Utilize ETL tools (e.g., ADF, Talend) to automate and manage data workflows.
- Develop and maintain CI/CD pipelines using GitHub and Jenkins for automated deployment of data models and ETL processes.
- Monitor and troubleshoot pipeline issues to ensure smooth deployment and integration.
- Design and implement scalable and efficient data models in Snowflake.
- Optimize data structures for performance and storage efficiency.
- Collaborate with stakeholders to understand data requirements and ensure data integrity.
- Integrate multiple data sources to create a data lake/data mart.
- Perform data ingestion and ETL processes using SQL, Sqoop, Spark or Hive.
- Monitor job performance; manage file system/disk space, cluster and database connectivity, log files, and backup/security; and troubleshoot various user issues.
- Design, implement, test and document a performance benchmarking strategy for platforms as well as for different use cases.
- Set up, administer, monitor, tune, optimize and govern large-scale implementations.
- Drive customer communication during critical events and participate in/lead various operational improvement initiatives.

Qualifications, Skill Set and Competencies:
- Location: Gurgaon
- Bachelor's in Computer Science, Engineering, Statistics, Mathematics or a related quantitative degree.
- 3-6 years of relevant experience in data engineering.
- Must have worked on at least one of the cloud engineering platforms: Azure, GCP, Cloudera.
- Proven experience as a Data Engineer with a focus on Azure cloud technologies/Snowflake.
- Strong proficiency in Azure Data Factory, Azure Databricks, ADLS, and Azure SQL Database.
- Experience with big data processing frameworks like Apache Spark.
- Expert-level proficiency in SQL and experience with data modeling and database design.
- Knowledge of data warehousing concepts and ETL processes.
- Strong focus on PySpark, Scala and Pandas.
- Proficiency in Python programming and experience with other data processing frameworks.
- Solid understanding of networking concepts and Azure networking solutions.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Azure Data Engineer certification: AZ-900 and DP-203 (mandatory).
- Familiarity with DevOps practices and tools for CI/CD in data engineering.
- Certification: MS Azure / DBR Data Engineer.
- Data ingestion: coding and automating ETL pipelines, both batch and streaming.
- Should have worked on both ETL and ELT methodologies using any of the traditional and new-age tech stacks: SSIS, Informatica, Databricks, Talend, Glue, DMS, ADF, Spark, Kafka, Storm, Flink, etc.
- Data transformation: experience working with MPPs, big data and distributed computing frameworks on a cloud or cloud-agnostic tech stack: Databricks, EMR, Hadoop, DBT, Spark, etc.
- Data storage: experience working on data lakes and lakehouse architectures: S3, ADLS, Blob, HDFS.
- DWH: strong experience modelling and implementing data warehousing on technologies like Redshift, Snowflake, Azure Synapse, BigQuery, Hive.
- Orchestration and lineage: Airflow, Oozie, etc.
Location: Delhi, IN
Posted Date: 11/26/2024
Contact Information
Contact: Human Resources, TransOrg Analytics