In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Kafka.
Good understanding of Hadoop architecture and its components, such as HDFS, YARN, and MapReduce.
Experience with the practical application of data warehousing concepts, methodologies, and frameworks using traditional (Oracle, Teradata, etc.) and modern (Spark SQL, Hadoop, Kafka) distributed technologies.
Experience with enterprise data management, business intelligence, data integration, and SQL database implementations.
Hands-on experience architecting, designing, and implementing data ingestion pipelines on the Cloudera/Hortonworks platform at scale.
Experience in successfully manipulating, processing, and extracting value from large and disconnected data sets.
Experience architecting and developing data ingestion processes, and working with both structured and unstructured data sets.
Experience with data warehouse implementation, backup, and recovery strategies.
Experience designing reporting applications on big data platforms.
Experience writing complex, high-performance queries, and with distributed query engines such as Spark SQL on Hive.
Minimum 2-3 years of hands-on experience with Spark, including design and performance tuning.
Ability to research and resolve technical performance bottlenecks in Spark implementations.
Experience designing solutions for multiple large data warehouses, with a good understanding of cluster and parallel architectures, high-scale or distributed RDBMSs, and/or NoSQL platforms.
Active participation in high-level engineering team activities, such as suggesting architecture improvements, recommending process improvements, and conducting tool evaluations.
Experience working with complex data models, large databases, extensive reporting and data analysis is a plus.
Solution architecture experience with big data technologies, e.g., Spark, Hadoop, and MapR.
Hands-on experience working in Linux environments (including Kerberos), and the ability to provide profiling, optimization, and tuning guidance.
Data visualization experience. Data catalog and data lineage experience is a plus.
Experience with data reconciliation across multiple sources, data quality tools, and entitlements integration.
Experience providing solutions for handling data encryption, PII, and confidential data sets.
Ability to propose web service and API solutions for data retrieval on Hadoop clusters.