Skip to content

tici: integrate with TiCI import into interfaces | tidb-test=ef726505f262e22c6ec9c8951273bb5ee20abefb#62820

Merged
ti-chi-bot[bot] merged 15 commits intopingcap:feature/ftsfrom
JaySon-Huang:jayson_import_adapt
Oct 10, 2025
Merged

tici: integrate with TiCI import into interfaces | tidb-test=ef726505f262e22c6ec9c8951273bb5ee20abefb#62820
ti-chi-bot[bot] merged 15 commits intopingcap:feature/ftsfrom
JaySon-Huang:jayson_import_adapt

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Aug 5, 2025

What problem does this PR solve?

Issue Number: ref #61759

Problem Summary:

What changed and how does it work?

  1. If the IMPORT INTO is failed, the user may run the IMPORT INTO again. So table_id + index_id can not consider as a unique_id. Add task_id to the request so TiCI can use tidb_task_id + table_id + index_id as a unique_id.
  • TiCI need this unique id to represent a "Job" in the TiCI side. TiCI rely on the TiCI job status to decide whether to cleanup the cloud storage path, update the compaction strategy in the TiCI side, and get the IMPORT INTO job status from tidb-server
  1. Update tici.proto
  • Use enum class ErrorCode for the status field in response.
  • Rename MarkPartitionUploadFinished -> FinishImportPartitionUpload. Add some fields (tidb_task_id and so on) in the request so that TiCI can retrieve the TiCI job.
  • Rename MarkTableUploadFinished -> FinishImportIndexUpload. Add tidb_task_id so that TiCI can retrieve the TiCI job.
  • Rename GetCloudStoragePath -> GetImportStoragePrefix
    • This API now support represent an IMPORT INTO job with multiple IndexIDs
    • This API now return the remote storage prefix instead of a remote storage filename. The remote storage prefix is kept in the DataWriter
    • For each partition, TiDB generate a uuid with the remote storage prefix and upload the file to the remote storage. The logic is written in DataWriter::InitTICIFileWriter
  • Generate using kvproto scripts: [WIP] tici proto kvproto#1335. But now keep a copy of tici.pb.go
    inside tidb-server code. Will move the proto into kvproto repo later.
  1. Refactor the codes on ticiWriteGroup
  • ticiWriteGroup is shared for multiple region_job so it is not correct to init TICIFileWriters inside ticiWriteGroup.
  • Move the TiCIFileWriter as a local variable in region_job
  • Only write the table data once when the IMPORT INTO job is running on a table with multiple fulltext index
  1. Fix a typo MetaServiceEelectionKey -> MetaServiceElectionKey

Check List

Tests

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 5, 2025
@tiprow
Copy link

tiprow bot commented Aug 5, 2025

Hi @JaySon-Huang. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov
Copy link

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 41.36364% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.5503%. Comparing base (f1da580) to head (d7e26d3).
⚠️ Report is 6 commits behind head on feature/fts.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/fts     #62820        +/-   ##
===================================================
- Coverage      74.4116%   71.5503%   -2.8613%     
===================================================
  Files             1889       1875        -14     
  Lines           508453     542144     +33691     
===================================================
+ Hits            378348     387906      +9558     
- Misses          106590     130905     +24315     
+ Partials         23515      23333       -182     
Flag Coverage Δ
integration 43.3960% <0.4545%> (-4.7287%) ⬇️
unit 70.4449% <41.3636%> (-1.5521%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.8700% <ø> (ø)
parser ∅ <ø> (∅)
br 46.4636% <ø> (-16.8456%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@JaySon-Huang JaySon-Huang force-pushed the jayson_import_adapt branch 3 times, most recently from 40e6daa to 6f4e3a1 Compare August 10, 2025 07:23
@ti-chi-bot ti-chi-bot bot added component/dumpling This is related to Dumpling of TiDB. sig/planner SIG: Planner labels Aug 10, 2025
@JaySon-Huang JaySon-Huang changed the title [WIP] tici: Adapt to tici proto tici: integrate with TiCI import into interfaces Aug 10, 2025
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 10, 2025
@JaySon-Huang JaySon-Huang force-pushed the jayson_import_adapt branch 3 times, most recently from a621fcb to 7d800d1 Compare August 11, 2025 07:20
@OliverS929
Copy link
Contributor

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2025
Copy link
Contributor Author

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Update the proto according to the design doc after discussion

Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Sep 23, 2025

@D3Hunter here is the comment for running IMPORT INTO test. However, the script of transforming the JSON to CSV input is in the tici repo

# download the JSON input if not exists in your env
wget -q https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz
gzip -d hdfs-logs-multitenants.json.gz

# Transform the JSON to CSV input. The csv file size is about 3.6GB
ASSET_DIR=./ python3 ci/read_hdfs.py --out hdfs-logs-multitenants.csv hdfs-log
# Upload the CSV files to cloud storage. Here use minio for example
mc cp ./hdfs-logs-multitenants.csv k79/jayson/import-src/hdfs-logs-multitenants.csv

# Create the table and fulltext index
DROP TABLE IF EXISTS test.hdfs_log;
CREATE TABLE IF NOT EXISTS test.hdfs_log (id BIGINT AUTO_INCREMENT,timestamp BIGINT,severity_text VARCHAR(50),body TEXT,tenant_id INT,PRIMARY KEY (tenant_id, id));
ALTER TABLE test.hdfs_log ADD FULLTEXT INDEX ft_idx(severity_text, body);

# Run import into
IMPORT INTO test.hdfs_log (`timestamp`, severity_text, body, tenant_id)
  FROM 's3://jayson/import-src/hdfs-logs-multitenants.csv?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://10.2.12.79:9000'
  WITH CLOUD_STORAGE_URI='s3://jayson/import-sort?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://10.2.12.79:9000'

@JaySon-Huang JaySon-Huang changed the title tici: integrate with TiCI import into interfaces | tidb-test=3b2adda7436d5e712bffaa859f679c5ca370811b tici: integrate with TiCI import into interfaces Sep 23, 2025
@JaySon-Huang
Copy link
Contributor Author

/test mysql-test

@tiprow
Copy link

tiprow bot commented Sep 23, 2025

@JaySon-Huang: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test mysql-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@JaySon-Huang JaySon-Huang changed the title tici: integrate with TiCI import into interfaces tici: integrate with TiCI import into interfaces | tidb-test=ef726505f262e22c6ec9c8951273bb5ee20abefb Sep 23, 2025
@JaySon-Huang
Copy link
Contributor Author

/test mysql-test

@tiprow
Copy link

tiprow bot commented Sep 23, 2025

@JaySon-Huang: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test mysql-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

@OliverS929 OliverS929 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a few comments remaining. The rest looks good to me.

if err != nil {
return nil, err
}
ectdEndpoints, err := util.ParseHostPortAddr(tidbCfg.Path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving getEtcdClient() out of the current file and into another file (e.g., tici_manager_client.go) might be a good idea, since it is not directly related to writing index data to TiCI. cc @Lloyd-Pottiger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it to be resolve later

@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 24, 2025

@OliverS929: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

There are still a few comments remaining. The rest looks good to me.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Oct 9, 2025
@JaySon-Huang
Copy link
Contributor Author

/test unit-test

@tiprow
Copy link

tiprow bot commented Oct 9, 2025

@JaySon-Huang: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Oct 9, 2025

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test-ddlv1 6f4e3a1 link true /test pull-unit-test-ddlv1
pull-br-integration-test 6f4e3a1 link true /test pull-br-integration-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JaySon-Huang
Copy link
Contributor Author

/test unit-test

@tiprow
Copy link

tiprow bot commented Oct 9, 2025

@JaySon-Huang: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added the lgtm label Oct 10, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Oct 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, OliverS929, windtalker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Oct 10, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Oct 10, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-10-09 04:37:13.814593075 +0000 UTC m=+328622.845692393: ☑️ agreed by D3Hunter.
  • 2025-10-10 01:43:16.171272632 +0000 UTC m=+404585.202371960: ☑️ agreed by windtalker.

@JaySon-Huang
Copy link
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2025
@ti-chi-bot ti-chi-bot bot merged commit a462f31 into pingcap:feature/fts Oct 10, 2025
22 checks passed
@JaySon-Huang JaySon-Huang deleted the jayson_import_adapt branch October 10, 2025 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved component/dumpling This is related to Dumpling of TiDB. lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants