Educational requirements: Bachelor's degree
English requirements: Competent English
Skilled employment experience required: 1-3 years
Required residence status: Temporary visa, Permanent resident, Citizen
Remote work: not accepted
Experience in data architecture on the cloud.
Experience with one or more GCP data analytics services: BigQuery / Dataproc / Pub/Sub / Dataflow / GCS / Composer (see the BigQuery sketch after this list).
Experience with one or more GCP database services: Cloud SQL (MySQL/Postgres) / Bigtable / Spanner.
Working experience with one or more Big Data/Hadoop distributions (or ecosystems) such as Cloudera/Hortonworks, MapR, Azure HDInsight, IBM Open Platform.
Experience in creating end-to-end data pipelines for both batch and real-time sources.
Experience with either Java or Python.
Working experience with one or more of the following Big Data/Hadoop services: Spark (including streaming), Kafka, Storm, HBase, Ranger, HDFS, Hive, Flink, Druid (see the Kafka-to-Spark streaming sketch after this list).
Working experience with one or more of the following AWS data services: EMR, Redshift, RDS, Athena, SQS/Kinesis.
Working experience handling one or more of the following tasks:
Migrating an analytics platform / data lake from on-prem/AWS to GCP
Knowledge of Spark 2.x / 3.x on YARN with Python, Java, or Scala is a plus
Knowledge of building & operationalizing data pipelines
Exposure to scheduling engines such as Azkaban, Apache Airflow, Oozie (see the Airflow sketch after this list)
Troubleshooting integration / workflow / performance bugs including scalability/reliability issues
Building self-service tools and utilities
Maintaining the security of customer data
An understanding of native and external tables with different file formats (Avro, ORC, Parquet) will be advantageous, as will experience with a scripting language such as Python, database replication, HA setup, and DR setup (see the external-table sketch after this list).
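
For illustration only: a minimal sketch of the kind of BigQuery work referenced above, using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical placeholders, not part of this role's actual environment.

    from google.cloud import bigquery

    # Hypothetical project and table, for illustration only.
    client = bigquery.Client(project="example-project")

    query = """
        SELECT event_date, COUNT(*) AS events
        FROM `example-project.analytics.events`
        GROUP BY event_date
        ORDER BY event_date
    """

    # Run the query and iterate over the result rows.
    for row in client.query(query).result():
        print(row.event_date, row.events)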
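Likewise, a minimal sketch of a real-time pipeline of the sort described above: Spark Structured Streaming (PySpark) reading JSON events from Kafka and landing them as Parquet. The broker, topic, schema, and GCS paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-lake").getOrCreate()

    # Expected JSON payload of each Kafka message (hypothetical schema).
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_time", TimestampType()),
    ])

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "events")                      # hypothetical topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Append parsed events to Parquet, with checkpointing for fault tolerance.
    (events.writeStream
        .format("parquet")
        .option("path", "gs://example-bucket/landing/events")             # hypothetical path
        .option("checkpointLocation", "gs://example-bucket/checkpoints/events")
        .outputMode("append")
        .start()
        .awaitTermination())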
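A minimal Airflow 2.x sketch of scheduling a two-step batch pipeline, illustrating the scheduling-engine exposure mentioned above; the DAG id and commands are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily batch pipeline: extract, then load.
    with DAG(
        dag_id="daily_batch_load",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        load = BashOperator(task_id="load", bash_command="echo load")
        extract >> load  # load runs only after extract succeeds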
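Finally, a minimal sketch of defining an external table over Parquet files in GCS with the same google-cloud-bigquery client, illustrating the native vs. external table point above; the bucket URI and table name are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    # External table: data stays in GCS, only the table definition lives in BigQuery.
    external_config = bigquery.ExternalConfig("PARQUET")
    external_config.source_uris = ["gs://example-bucket/exports/*.parquet"]  # hypothetical URI

    table = bigquery.Table("example-project.analytics.events_external")
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)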