Big Data Engineer

  • Collect, store, process, and analyze huge sets of raw data, and translate the resulting analyses. Select the optimal solutions for these purposes, then implement, maintain, and monitor them. Integrate Big Data tools and frameworks with the architecture used across the company.
  • Implement ETL processes
  • Monitor performance and advise on any necessary infrastructure changes
  • Define data retention policies
  • Communicate with business users and data scientists to understand the business objectives and translate those objectives into data-processing workflows.
  • Strong knowledge of statistics, extensive programming experience (ideally in Python or Java), and the ability to design and implement solutions for big data challenges.
  • Knowledge of and experience in data mining, processing large amounts of raw data, and designing and maintaining relational databases for storage and data acquisition are desired.
  • Design and implement relational databases for storage and processing

 

Qualifications:

  • Bachelor’s degree in a related field and at least 4 years of experience.
  • Proficient understanding of distributed computing principles
  • Experience managing a Hadoop cluster, including all of its services
  • Ability to resolve ongoing issues in operating the cluster
  • Proficiency with Hadoop v2, MapReduce, HDFS
  • Experience building stream-processing systems, using solutions such as Storm or Spark Streaming
  • Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
  • Experience integrating data from multiple data sources
  • Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
  • Knowledge of various ETL techniques and frameworks, such as Flume
  • Experience with various messaging systems, such as Kafka or RabbitMQ
  • Experience with Big Data ML toolkits, such as Mahout, Spark ML, or H2O
  • Good understanding of Lambda Architecture, along with its advantages and drawbacks
  • Experience with a Hadoop distribution such as Cloudera, MapR, or Hortonworks