Hitachi Vantara, a wholly owned subsidiary of Hitachi, Ltd., guides our customers from what's now to what's next by solving their digital challenges. Working alongside each customer, we apply our unmatched industrial and digital capabilities to their data and applications to benefit both business and society. More than 80% of the Fortune 100 trust Hitachi Vantara to help them develop new revenue streams, unlock competitive advantages, lower costs, enhance customer experiences, and deliver social and environmental value.

The Role
Total Experience: 12-14 years

Must Have:
- Basics of distributed computing
- Distributed computing vs. RDBMS; scale-up vs. scale-out
- Hands-on experience in at least one programming language (Java, Python, Scala)
- Understanding of Linux and Bash scripting
- Knowledge of SQL
- Basics of the Hadoop framework and the problem patterns it can solve, such as filtering, aggregation, and joins
- Understanding of Spark concepts such as RDDs, DataFrames, and closures; has implemented at least one project using Spark and Scala
- Should have worked on at least 1-2 big data projects (e.g., ingestion or ETL processing) on the Cloudera platform
- Understanding of Hive/Pig concepts such as partitioning, bucketing, the metastore, schema-on-read vs. schema-on-write, and SerDes
- Solid programming fundamentals and design concepts
- In-depth understanding of different batch and stream processing technologies and NoSQL storage
- Demonstrated work experience in a Sr. Developer / Jr. Architect role on big data, cloud, and open-source technology stacks
- Able to recommend the right technology stack for different use cases, with clear reasoning
- Understanding of the Lambda and Kappa architectures
- Has participated in, or can advise on, hardware choices, platform components, distributions, etc.
Good to Have:

- Programming concepts
- Object-oriented vs. functional programming concepts
- Design patterns (Singleton, Immutable, Factory)
- MapReduce programming: Combiner, Partitioner, InputFormat/OutputFormat, serialization
- Distributed Computing
- Scale up vs Scale out
- Hands-on Scala, Spark SQL, DataFrames, etc.
- Understanding of different storage formats: Avro, RCFile, ORC, Parquet
- Has worked, or is working, on any one of the cloud platforms: AWS, Azure, GCP
- Has worked, or is working, on any one of the big data platforms: Hortonworks, Cloudera, DataStax, Databricks
- Aware of the latest trends in streaming, real-time, and batch processing frameworks (Storm, Apache Beam, Flink, Spark, Kafka Connect, etc.)
- Certified in any one of the big data distributions (Hortonworks/Cloudera/Databricks/DataStax)
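The MapReduce problem patterns named above (filtering, aggregation) can be reasoned about without any cluster framework. A minimal, framework-free Python sketch of the map and reduce phases of a word count, with filtering folded into the map step (illustrative only, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: tokenize each line and emit (word, 1) pairs,
    # filtering out empty tokens.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key, then aggregate the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["spark and hadoop", "spark scala"]
print(reduce_phase(map_phase(lines)))
# {'spark': 2, 'and': 1, 'hadoop': 1, 'scala': 1}
```

In a real Hadoop job the shuffle is performed by the framework between the Mapper and Reducer, and a Combiner applies the same aggregation map-side to cut network traffic.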
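For the design-pattern bullet, candidates are typically expected to produce small concrete examples of the three patterns named (Singleton, Immutable, Factory). A brief Python sketch; the class and sink names are purely illustrative:

```python
from dataclasses import dataclass

class Registry:
    """Singleton: every construction returns the same shared instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

@dataclass(frozen=True)
class Offset:
    """Immutable: a frozen dataclass rejects mutation after construction."""
    topic: str
    position: int

def make_sink(kind: str) -> str:
    """Factory: pick a concrete implementation from a config value."""
    sinks = {"hdfs": "HdfsSink", "s3": "S3Sink"}
    if kind not in sinks:
        raise ValueError(f"unknown sink: {kind}")
    return sinks[kind]

assert Registry() is Registry()                       # one shared instance
assert Offset("events", 42) == Offset("events", 42)   # value equality
```

The same three patterns map directly onto the Scala idioms used in Spark code: `object` for singletons, `case class` for immutable values, and companion-object `apply` methods as factories.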
We are an equal opportunity employer. All applicants will be considered for employment without attention to age, race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.