Who sent the job to Dataproc?

Hello community,

I need to know who started a job on a Dataproc cluster: whether it was a service account or a regular user, and exactly which account was used.
Does anyone know a command for this, or how to find it in Logs Explorer?

ACCEPTED SOLUTION

The Dataproc Jobs API does not directly provide information about who submitted the job. However, you can use the following alternative methods to find out who started a job in the Dataproc cluster:

Method 1: Accessing Dataproc Job Logs

  • Logs Explorer:

    1. Open the Logs Explorer in the Google Cloud Console.
    2. In the resource selector, choose Cloud Dataproc Cluster, then the cluster name and the cluster UUID.
    3. Filter the logs using resource.type="cloud_dataproc_cluster" and protoPayload.methodName="google.cloud.dataproc.v1.JobController.SubmitJob".
    4. Look for the SubmitJob method call in the logs.
    5. The principalEmail field within protoPayload.authenticationInfo indicates the identity (user or service account) of the caller who submitted the job.
  • gcloud logging command:

    • Run the following command to list the Dataproc job submission entries:
      gcloud logging read 'resource.type="cloud_dataproc_cluster" AND protoPayload.methodName="google.cloud.dataproc.v1.JobController.SubmitJob"' --project=${PROJECT_ID} --freshness=30d
      Replace ${PROJECT_ID} with your actual project ID. By default, gcloud logging read only returns entries from the last day; --freshness widens that window (30d here is only an example).
    • In the output, look for the principalEmail field under protoPayload.authenticationInfo. This is the identity that was used to submit the job.
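As a rough sketch of the gcloud approach above, the filter can be built by a small shell helper and the output narrowed to the fields of interest. The function names submit_job_filter and list_job_submitters are hypothetical (they are not part of gcloud), the optional restriction on resource.labels.cluster_name is an assumption about how you might want to scope the search, and the 30-day freshness window is an arbitrary choice.

```shell
# Build a Cloud Logging filter for Dataproc SubmitJob audit entries.
# Optional first argument: a cluster name to restrict the search to.
submit_job_filter() {
  cluster_name="${1:-}"
  filter='resource.type="cloud_dataproc_cluster"'
  filter="$filter AND protoPayload.methodName=\"google.cloud.dataproc.v1.JobController.SubmitJob\""
  if [ -n "$cluster_name" ]; then
    filter="$filter AND resource.labels.cluster_name=\"$cluster_name\""
  fi
  printf '%s\n' "$filter"
}

# Print timestamp, caller identity, and resource name for each submission.
# Needs gcloud installed and permission to read logs in the project.
list_job_submitters() {
  project_id="$1"
  cluster_name="${2:-}"
  gcloud logging read "$(submit_job_filter "$cluster_name")" \
    --project="$project_id" \
    --freshness=30d \
    --format='table(timestamp, protoPayload.authenticationInfo.principalEmail, protoPayload.resourceName)'
}
```

Calling list_job_submitters with a project ID (and optionally a cluster name) should print one row per submission, which makes the principalEmail column easy to scan.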

Method 2: Dataproc Job History Server

  • If you have enabled the Dataproc Persistent History Server, you can view the job details, including the submitter.
    1. Go to the Dataproc Jobs section in the Google Cloud console.
    2. Click on the Job ID.
    3. Click on the "YARN Timeline Server" link.
    4. Navigate to the "Flow Activity" tab.
    5. This will display the job details, including the submitter name under the "Submitted by" column.

Method 3: Audit Logs

  • If Cloud Audit Logs are enabled in your project, they can be used to find who started a Dataproc job.
    1. Open the Logs Explorer in the Google Cloud console.
    2. Filter for the job submission call, for example protoPayload.serviceName="dataproc.googleapis.com" and protoPayload.methodName="google.cloud.dataproc.v1.JobController.SubmitJob".
    3. The matching entries are the Dataproc job submissions; protoPayload.authenticationInfo.principalEmail in each entry shows the user or service account that submitted the job.
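Once you have the principalEmail value, telling a service account from a regular user usually comes down to the email domain. The helper below is a minimal sketch of that check; classify_principal is a hypothetical name, and the rule that service account emails end in .gserviceaccount.com is a heuristic rather than an official test. An empty value is reported as unknown, since some callers may not carry a principalEmail at all.

```shell
# Heuristic classification of an audit-log principalEmail value.
# Assumption: service account emails end in ".gserviceaccount.com".
classify_principal() {
  case "$1" in
    "") echo "unknown" ;;
    *.gserviceaccount.com) echo "service_account" ;;
    *) echo "user" ;;
  esac
}

# Example calls (the email addresses are made up):
classify_principal "etl-runner@my-project.iam.gserviceaccount.com"  # prints service_account
classify_principal "alice@example.com"                              # prints user
```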

Additional Notes:

  • The availability of submitter information may vary depending on the method used and the version of Dataproc.
  • Ensure you have the necessary permissions to view logs and audit information.
  • Be aware of the time frame you are investigating: logs are only kept for a limited retention period, so older submissions may no longer be available.
  • There might be a slight delay in log data appearing in the Logs Explorer or Audit Logs.
  • If the submitter information is not readily available, you might have to contact your Google Cloud administrator for further assistance.
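If you know roughly when the job ran, narrowing the query to that window keeps the results manageable. The sketch below appends a timestamp range to an existing Logging filter; with_time_window is a hypothetical helper name, and the timestamps are expected in RFC 3339 format.

```shell
# Append a time window to an existing Cloud Logging filter.
# $1: base filter, $2: start timestamp, $3: end timestamp (RFC 3339).
with_time_window() {
  base_filter="$1"
  start_ts="$2"
  end_ts="$3"
  printf '%s AND timestamp>="%s" AND timestamp<="%s"\n' \
    "$base_filter" "$start_ts" "$end_ts"
}
```

The resulting string can be passed to gcloud logging read as the filter argument, or pasted into the Logs Explorer query box.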


REPLIES

I tried looking here, but it doesn't show who executed the job: https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/jobs/${JOB_ID}

Perfect. Thank you for your help.