@t3t5u t3t5u commented Jun 18, 2025

Changes:

  • Handle auth_method & json_keyfile properly.
    • Added support for authorized_user to auth_method.
    • Fixed json_keyfile to use LocalFile instead of String to support content notation.
  • Support for project & destination_project.
    • Fixed to use project explicitly.
    • Fixed to use destination_project overall.

Confirmation script:

  • Confirmation for all of the following combinations:
    • type:
      • bigquery_java
      • bigquery
    • auth_method:
      • authorized_user
      • service_account
    • destination_project:
      • systemn-playground
      • test-by-sano
    • mode:
      • append
      • append_direct
      • replace
      • delete_in_advance
      • merge
run.sh
#!/bin/sh
cd `dirname ${0}`
DIR=`basename ${0} .sh`
rm -rf ${DIR}
mkdir ${DIR}
sh ../embulk-output-bigquery/rebuild.sh
sh ../embulk-output-bigquery_java/rebuild.sh
LANGUAGES='
java
ruby
'
AUTH_METHODS='
authorized_user
service_account
'
DESTINATION_PROJECTS='
systemn-playground
test-by-sano
'
MODES='
append
append_direct
replace
delete_in_advance
merge
'
DATASET=test_by_sano
TABLE=test
for LANGUAGE in ${LANGUAGES}; do
for AUTH_METHOD in ${AUTH_METHODS}; do
for DESTINATION_PROJECT in ${DESTINATION_PROJECTS}; do
for MODE in ${MODES}; do
if [ "${LANGUAGE}" = "java" ]; then
TYPE=bigquery_java
else
TYPE=bigquery
fi
RUN=${DIR}/${LANGUAGE}_${AUTH_METHOD}_${DESTINATION_PROJECT}_${MODE}
printf '\n##### %s #####\n\n' "${RUN}"
> ${RUN}.log
sh ../gcloud.sh config configurations activate ${DESTINATION_PROJECT} 2>&1 | tee -a ${RUN}.log
sh ../bq.sh --location=US rm --project_id=${DESTINATION_PROJECT} --recursive=true --force=true --dataset=true ${DATASET} 2>&1 | tee -a ${RUN}.log
sh ../bq.sh --location=US show --project_id=${DESTINATION_PROJECT} --format=prettyjson --dataset=true ${DATASET} 2>&1 | tee -a ${RUN}.log
cat << EOD > .config.yml
in:
  type: file
  path_prefix: .${TABLE}.jsonl
  parser:
    type: jsonl
    columns:
    - {name: key_long, type: long}
    - {name: key_string, type: string}
    - {name: test_boolean, type: boolean}
    - {name: test_long, type: long}
    - {name: test_double, type: double}
    - {name: test_string, type: string}
    - {name: test_timestamp, type: timestamp}
    - {name: test_json, type: json}
out:
  type: ${TYPE}
  auth_method: ${AUTH_METHOD}
  json_keyfile: ${AUTH_METHOD}.json
  project: systemn-playground
  destination_project: ${DESTINATION_PROJECT}
  dataset: ${DATASET}
  table: ${TABLE}
  auto_create_dataset: true
  auto_create_table: true
  location: US
  compression: GZIP
  source_format: NEWLINE_DELIMITED_JSON
  mode: ${MODE}
  merge_keys: [key_long, key_string]
EOD
QUERY='SELECT * FROM `'"${DESTINATION_PROJECT}"'`.`'"${DATASET}"'`.`'"${TABLE}"'` ORDER BY `key_long`, `key_string`'
for N in `seq 2`; do
cp -f ${TABLE}_${N}.jsonl .${TABLE}.jsonl
embulk run .config.yml 2>&1 | tee -a ${RUN}.log
sh ../bq.sh --location=US show --project_id=${DESTINATION_PROJECT} --format=prettyjson --dataset=true ${DATASET} 2>&1 | tee -a ${RUN}.log
sh ../bq.sh --location=US show --project_id=${DESTINATION_PROJECT} --format=prettyjson ${DATASET}.${TABLE} 2>&1 | tee -a ${RUN}.log
sh ../bq.sh --location=US query --use_legacy_sql=false "${QUERY}" 2>&1 | tee -a ${RUN}.log
cat ${RUN}.log | grep -e '^+-' -e '^| ' > ${RUN}.txt
done
rm -f .${TABLE}.jsonl .config.yml
done
done
done
done

Comment on lines +75 to +76
public final String destinationProject; // FIXME: should be private
public final String destinationDataset; // FIXME: should be private
Contributor

Could you tell me why these properties can't be private?

Contributor Author

These fields are referenced from outside this class for log output.
I think it would be better to refactor the handling of global variables as a whole.

Contributor

OK, thanks. That seems better done separately.
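
For reference, one possible shape for the deferred refactor is sketched below: the fields stay private and outside code reads them through accessors. This is only an illustration. The class name `BigqueryDestination` and the `describe()` helper are hypothetical; only the two field names come from the diff under review.

```java
// Hypothetical class; only the two field names come from the diff under review.
final class BigqueryDestination {
    private final String destinationProject;
    private final String destinationDataset;

    BigqueryDestination(String destinationProject, String destinationDataset) {
        this.destinationProject = destinationProject;
        this.destinationDataset = destinationDataset;
    }

    // Accessors replace direct field access from other classes.
    String getDestinationProject() {
        return destinationProject;
    }

    String getDestinationDataset() {
        return destinationDataset;
    }

    // Convenience for log messages, so callers need not read the fields directly.
    String describe() {
        return destinationProject + "." + destinationDataset;
    }

    public static void main(String[] args) {
        BigqueryDestination d = new BigqueryDestination("example-project", "example_dataset");
        System.out.println("Loading into " + d.describe());
    }
}
```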

Comment on lines 104 to 108
private String getProjectIdFromJsonKeyfile() {
return new JSONObject(
new JSONTokener(new ByteArrayInputStream(task.getJsonKeyfile().getContent())))
.getString("project_id");
}
Contributor

It would be better to add error handling here, e.g. returning null when an exception is caught.

Contributor Author

Fixed.
1c6215f
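
The content of commit 1c6215f is not shown here, so the following is only a minimal sketch of the pattern the reviewer asked for: wrap the keyfile parsing in a try/catch and return null on failure. To stay dependency-free it takes the parser as a parameter and uses a naive regex as a stand-in for the `JSONObject`/`JSONTokener` parsing in the diff. `KeyfileProjectId`, `extractOrNull`, and `naiveProjectId` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the plugin's actual class.
final class KeyfileProjectId {
    private static final Pattern PROJECT_ID =
            Pattern.compile("\"project_id\"\\s*:\\s*\"([^\"]*)\"");

    private KeyfileProjectId() {}

    // Applies the parser to the keyfile bytes; returns null if the parser throws.
    static String extractOrNull(byte[] keyfileContent, Function<byte[], String> parser) {
        try {
            return parser.apply(keyfileContent);
        } catch (RuntimeException e) {
            // Parsing failed (malformed content, missing key, ...): report "no project ID".
            return null;
        }
    }

    // Stand-in parser (assumption): a regex lookup instead of a real JSON parser.
    // Throws when the key is absent, mimicking a parser that fails on a missing field.
    static String naiveProjectId(byte[] content) {
        Matcher m = PROJECT_ID.matcher(new String(content, StandardCharsets.UTF_8));
        if (!m.find()) {
            throw new IllegalArgumentException("project_id not found");
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        byte[] ok = "{\"project_id\": \"example-project\"}".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "{}".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractOrNull(ok, KeyfileProjectId::naiveProjectId));
        System.out.println(extractOrNull(bad, KeyfileProjectId::naiveProjectId));
    }
}
```

Returning null keeps the helper simple, but it moves the decision to the caller, which then has to handle the missing value explicitly.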

this.schema = schema;
this.dataset = task.getDataset();
project = task.getProject().orElse(getProjectIdFromJsonKeyfile());
destinationProject = task.getDestinationProject().orElse(project);
Contributor

If destinationProject could be null here, shouldn't we throw an error?

Contributor Author

@t3t5u t3t5u Jul 18, 2025

Fixed project to never be null, so destinationProject will also never be null.
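
The thread does not show how `project` was made non-null, so the sketch below is just one way to get that guarantee: prefer the configured `project`, fall back to the project ID from the keyfile, and fail fast if neither is available. `destination_project` then falls back to `project`, as in the diff. The class and method names and the choice of `IllegalStateException` are assumptions for illustration.

```java
import java.util.Optional;

// Hypothetical helper illustrating the fallback chain discussed above.
final class ProjectResolution {
    private ProjectResolution() {}

    // project: explicit config value first, then the keyfile's project ID; never null.
    static String resolveProject(Optional<String> configuredProject, String keyfileProjectId) {
        if (configuredProject.isPresent()) {
            return configuredProject.get();
        }
        if (keyfileProjectId != null) {
            return keyfileProjectId;
        }
        // Assumption: failing fast here is what keeps project from ever being null.
        throw new IllegalStateException(
                "project is not set and no project_id could be read from json_keyfile");
    }

    // destination_project: explicit config value first, then the resolved project.
    static String resolveDestinationProject(Optional<String> configured, String project) {
        return configured.orElse(project);
    }

    public static void main(String[] args) {
        String project = resolveProject(Optional.empty(), "keyfile-project");
        String destination = resolveDestinationProject(Optional.of("other-project"), project);
        System.out.println(project + " -> " + destination);
    }
}
```

Because `resolveProject` either returns a non-null value or throws, `resolveDestinationProject` needs no null check of its own.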

@NamedPython NamedPython merged commit 440a01b into master Jul 22, 2025
1 check passed
@NamedPython NamedPython deleted the destination_project branch July 22, 2025 04:40