Recro
Recro.io - Data Scientist - SQL/Python
Job Location
Bangalore, India
Job Description
Job Purpose: The Data Scientist will play a critical role in enhancing the Single Customer Identifier (SCI) solution by using advanced data analysis techniques to improve data quality and accuracy. The role requires deep expertise in data exploration, analysis, and validation, using Python, SQL, and Jupyter Notebooks to derive insights that drive the optimization of data matching logic, including fuzzy matching, and improve overall data quality.

Key Responsibilities:
- Perform continuous data exploration to identify patterns, anomalies, and potential improvements in data matching logic for SCI.
- Utilize advanced analysis techniques, including fuzzy logic and machine learning models, to enhance the accuracy and efficiency of data matching algorithms.
- Work with large datasets from Excel, databases, and other data sources to support data analysis and validation activities.
- Analyze data to provide actionable insights that help improve data quality and match accuracy in the SCI solution.
- Regularly review and assess data quality metrics and provide recommendations for enhancements to the data matching system.
- Perform data cleansing and validation on large datasets to ensure consistency, accuracy, and completeness before batch ingestion.
- Implement best practices for data governance, ensuring that all datasets comply with internal data quality standards.
- Use Jupyter Notebooks to prepare and present weekly insights, demonstrating data analysis results and proposed improvements to the team and stakeholders.
- Collaborate with the working group to prioritize analysis efforts, incorporate feedback, and showcase ongoing improvements during playback sessions.
- Share findings in a clear, visual format to facilitate decision-making by both technical and non-technical stakeholders.
- Assist with data acquisition, cleansing, and validation as part of the preparation process for batch ingestion of new markets in scope for SCI.
- Collaborate with teams to ensure the successful ingestion of data and troubleshoot any issues that arise during this process.
- Monitor batch processes to validate SCI records post-ingestion, ensuring all data has been processed accurately and efficiently.

Technical Skills:
- Strong proficiency in Python for data analysis, scripting, and automation.
- Experience with libraries such as Pandas, NumPy, and Scikit-learn is essential.
- Advanced knowledge of SQL for querying and manipulating large datasets in relational databases.
- Hands-on experience in using Jupyter Notebooks to demonstrate analysis, visualize data, and share insights interactively with stakeholders.
- Experience in performing data cleansing, validation, and transformation for large datasets (including Excel, CSV, or database-based sources).
- Expertise in fuzzy logic and data matching techniques for identifying, grouping, and linking records.
- Ability to create clear and effective visualizations of complex datasets and insights using tools such as Matplotlib, Seaborn, or Tableau.

Qualifications:
- Bachelor's degree in Computer Science, Data Science, Mathematics, Engineering, or a related field.
- A Master's degree or equivalent experience is a plus.
- Minimum of 5 years of experience as a Data Scientist or in a related data analytics role.
- Prior experience working with large datasets, data quality, data validation, and matching techniques is essential.
- Familiarity with the Single Customer Identifier (SCI) solution or similar data solutions is highly desirable.

(ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 11/28/2024
Contact Information
Contact: Human Resources, Recro