Hitachi Vantara, a wholly owned subsidiary of Hitachi, Ltd., guides our customers from what's now to what's next by solving their digital challenges. Working alongside each customer, we apply our unmatched industrial and digital capabilities to their data and applications to benefit both business and society. More than 80% of the Fortune 100 trust Hitachi Vantara to help them develop new revenue streams, unlock competitive advantages, lower costs, enhance customer experiences, and deliver social and environmental value.

Responsibilities
Required Technical and Professional Expertise
- Build and test optimal data pipeline architectures, preferably in a cloud environment (AWS experience is a must).
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Rich experience in data ingestion from a variety of sources using AWS Lambda functions, Python, AWS Glue, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and cloud-based big data technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Understand and implement practices to protect PHI and comply with GDPR and other emerging data privacy regulations.
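To illustrate the kind of serverless ingestion work described above, here is a minimal sketch of an S3-triggered AWS Lambda handler. The event shape follows the standard S3 notification format; the bucket and object names are hypothetical, and a real pipeline would go on to read the object and hand it to a Glue job or load it into Redshift.

```python
import json


def lambda_handler(event, context):
    """Hypothetical S3-triggered ingestion handler.

    Extracts the bucket and key from each S3 event record and returns
    a summary of what was picked up for ingestion.
    """
    ingested = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        ingested.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
        })
    # In a real deployment, this is where the objects would be read
    # and passed downstream (e.g., to a Glue job or a Redshift COPY).
    return {"statusCode": 200, "body": json.dumps({"ingested": ingested})}


# Example invocation with a sample S3 notification event:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"},
                "object": {"key": "orders/2024/01.csv"}}}
    ]
}
result = lambda_handler(sample_event, None)
```

The handler is pure Python, so the parsing logic can be unit-tested locally without any AWS infrastructure.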
Preferred Technical and Professional Expertise
- 5-6 years of overall experience, including 5 years in a Data Engineer role
- Advanced SQL knowledge and experience with relational databases and query authoring, as well as working familiarity with a variety of databases (SQL Server, Oracle) and AWS services (RDS, Redshift, EC2, EMR, etc.)
- Solid experience building and optimizing big data pipelines, architectures, and data sets.
- Deep knowledge and experience with JSON and XML schemas and documents.
- Experience with Python data structures and data engineering libraries is mandatory
- Working knowledge of REST and implementation patterns pertaining to Data and Analytics.
- Experience building processes supporting data transformation, data structures, metadata, dependency management, and workload management.
- Working knowledge of message queuing, stream processing, and highly scalable big data stores (Kafka, Kinesis, Storm), and of ETL orchestration tools such as Airflow.
- Experience with cloud-based big data automation and orchestration solutions
- Understanding of Business Intelligence and Data Warehousing concepts and methods.
- Fully conversant with big data processing approaches and schema-on-read methodologies. Preference for a deep understanding of Spark, Databricks, and Delta Lake, and for applying them to solve data science and machine learning business problems.
- CI/CD experience with AWS and/or Azure DevOps.
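As a small, self-contained example of the SQL query authoring the role calls for, the sketch below uses Python's stdlib sqlite3 driver as a stand-in for SQL Server, Oracle, or Redshift. The table, columns, and data are hypothetical; the GROUP BY rollup is the kind of BI-style aggregation mentioned under Business Intelligence and Data Warehousing.

```python
import sqlite3

# In-memory database standing in for a warehouse table of orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL, region TEXT);
    INSERT INTO orders VALUES (1, 120.0, 'EMEA'), (2, 80.0, 'EMEA'),
                              (3, 200.0, 'APAC'), (1, 50.0, 'APAC');
""")

# A typical rollup: distinct customers and total revenue per region.
rows = conn.execute("""
    SELECT region,
           COUNT(DISTINCT customer_id) AS customers,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()
# rows -> [('APAC', 2, 250.0), ('EMEA', 2, 200.0)]
```

The same query text runs largely unchanged on Redshift or SQL Server; only the connection layer differs.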
We are an equal opportunity employer. All applicants will be considered for employment without attention to age, race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.