Cannot use Apache Hudi on Dataproc

I am trying to use the Apache Hudi component on a Dataproc cluster.

I ran the example code provided by Google, but it doesn't work.

When I run a Spark query with Hudi, I get the following error:

  Failed to find data source: hudi. Please find packages at

Also, according to the documentation, the executable script should be located in the path below.


But that path doesn't exist.

Below is the cluster creation command I used to enable the Hudi component.

gcloud dataproc clusters create hudi-poc \
  --enable-component-gateway --master-machine-type n2-standard-2 \
  --master-boot-disk-size 200 --num-workers 2 \
  --worker-machine-type e2-standard-2 --worker-boot-disk-size 100 \
  --image-version 2.1.2-ubuntu20  --region us-central1 \
  --scopes '' \
  --optional-components HUDI 

Has anyone had success using the Hudi component on a Dataproc cluster?

Try adding this property to the cluster creation command:

--properties spark:spark.jars.packages="org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0" \
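
For reference, here is a rough sketch of how the property might fit into the command from the question. It only assembles and prints the command so you can review it first. The `HUDI_PACKAGE` and `cmd` variable names are my own, and I left out the machine-type, disk-size and `--scopes` flags for brevity, so treat this as an illustration rather than a verified fix.

```shell
#!/usr/bin/env bash
# Sketch only: the question's command plus the suggested property.
# Machine-type, disk-size and --scopes flags from the question are left out
# here for brevity; add yours back as needed.

# Hudi bundle coordinates taken from the property above.
HUDI_PACKAGE="org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0"

cmd=(
  gcloud dataproc clusters create hudi-poc
  --region us-central1
  --image-version 2.1.2-ubuntu20
  --enable-component-gateway
  --num-workers 2
  --optional-components HUDI
  --properties "spark:spark.jars.packages=${HUDI_PACKAGE}"
)

# Print the assembled command for review instead of running it.
printf '%s ' "${cmd[@]}"
echo
```

To actually create the cluster, replace the `printf` and `echo` lines with `"${cmd[@]}"`. Quoting the `--properties` value keeps the shell from splitting or altering the package coordinates.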