About HouseWorks
Fueled by a real understanding of today’s challenges, HouseWorks is committed to a fundamental re-imagining of what it means to age. With over 20 years of operating experience, we have built a replicable service prototype, developed profitable, long-standing referral relationships, and created an innovative brand that positions us to serve the future customer. HouseWorks has grown to be one of the largest single-site private home care companies in the country and is dedicated to improving the health and well-being of its employees and the people it serves. We are embarking on an exciting new growth chapter that focuses on client service excellence, caregiver engagement, technological innovation, and growth in new markets.
The Opportunity
We are a mission-driven organization dedicated to improving the lives of seniors as they age. We are passionate about what we do: providing seniors and their families with a comprehensive, vetted, and coordinated in-home service network that is high-touch, tech-enabled, compassionate, and extremely well managed. We operate in the exciting and dynamic home healthcare industry. Our market opportunity is large and growing as baby boomers age and the home increasingly becomes the epicenter of care, with consumers demanding convenience and lower-cost solutions.
Job Summary:
We are looking for an ETL Developer with experience working in an Agile/Scrum environment to join our data engineering team. In this role, you will be responsible for designing, developing, and maintaining scalable ETL (Extract, Transform, Load) pipelines using Apache Spark and Python (PySpark) in support of our data infrastructure within the AWS ecosystem. You will work closely with data analysts, data scientists, and other stakeholders to ensure that data is available, reliable, and efficiently transformed for analytical and business purposes.
Job Responsibilities:
Design, develop, and implement ETL workflows, data models, and pipelines using PySpark to meet business requirements.
Extract data from multiple sources, transform it for consistency, and load it into a data lake or repository.
Optimize PySpark scripts for performance, scalability, and efficiency, ensuring minimal resource consumption.
Ensure data accuracy and reliability through quality checks, error handling, and validations within ETL pipelines.
Monitor, troubleshoot, and enhance ETL pipeline performance.
Automate ETL processes and integrate with scheduling tools such as Apache Airflow, AWS EventBridge, or similar.
Manage and optimize data storage, compute resources, and security configurations on cloud platforms (e.g., AWS, Azure, GCP).
Develop and maintain technical documentation for ETL processes, workflows, and data definitions.
Participate in code reviews, provide feedback, and collaborate within an agile development environment.
Job Requirements: To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. HouseWorks will consider requests for reasonable accommodations to enable individuals with disabilities to perform the essential functions.
3+ years of experience in designing and developing ETL pipelines using PySpark.
Bachelor’s degree in Computer Science, Data Analytics, IT, or a related field, strongly preferred.
Strong proficiency with Apache Spark and the Python programming language.
Experience working with distributed data processing systems and big data technologies.
Solid understanding of SQL and experience working with relational and non-relational databases (e.g., PostgreSQL, MySQL, MongoDB).
Experience with cloud platforms such as AWS (e.g., S3, EMR), Azure, or Google Cloud Platform.
Familiarity with workflow orchestration tools and processes.
Strong experience with data modeling, ETL best practices, and handling large-scale data transformations.
Hands-on experience in optimizing and tuning PySpark jobs for better performance and cost efficiency.
Familiarity with version control systems such as Git.
Understanding of data governance, data quality, and data security principles.
Strong problem-solving and analytical skills, with the ability to troubleshoot and identify root causes of data issues.
Knowledge of Kafka, AWS Glue, Databricks, or other real-time and cloud data integration tools.
Understanding of data warehousing concepts and experience with data warehouse solutions such as Snowflake, Redshift, or BigQuery.
Strong interpersonal skills with the ability to communicate effectively with both technical and non-technical stakeholders.
Work Environment and Physical Demands:
This position involves sitting for extended periods of time.
Ability to lift 15+ pounds.
Ability to work on a computer screen for extended periods of time.
HouseWorks is an Equal Opportunity Employer. We do not discriminate on the basis of race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age, disability, or genetic information.