Add explicit page cache control (hot, cold, lukewarm) to presto-velox benchmarks by kingcrimsontianyu · Pull Request #265 · rapidsai/velox-testing

kingcrimsontianyu · 2026-03-11T15:50:25Z

No description provided.

kingcrimsontianyu · 2026-03-11T21:33:24Z

common/testing/performance_benchmarks/cache_utils.py

        "sh",
        "-c",
-        "free; echo drop_caches; echo 3 > /proc/sys/vm/drop_caches; free",
+        "free; echo drop_caches; sync; echo 3 > /proc/sys/vm/drop_caches; free",


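We need to flush dirty pages to disk with sync first. Writing to drop_caches only frees clean pages, so unsynced data would otherwise stay in the page cache.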
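For illustration, the same sync-then-drop ordering can be sketched in Python rather than in the `sh -c` string from the diff. This is a minimal sketch, not the PR's implementation: the function name `drop_system_page_cache` and its `drop_caches_path` parameter are hypothetical, and the parameter exists only so the ordering can be exercised against an ordinary file. Writing to the real `/proc/sys/vm/drop_caches` requires root.

```python
import os

# Kernel control file used by the shell command in the diff above.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_system_page_cache(drop_caches_path=DROP_CACHES_PATH, mode="3"):
    """Flush dirty pages, then ask the kernel to drop clean caches.

    Hypothetical helper: the name and the overridable path are assumptions
    made for this sketch, not part of the PR.
    """
    # Dirty pages are not freeable, so write them back to disk first.
    os.sync()
    # "3" requests dropping the page cache plus dentries and inodes.
    with open(drop_caches_path, "w") as f:
        f.write(mode + "\n")
```

Calling `drop_system_page_cache()` with no arguments targets the real control file and will raise `PermissionError` unless the process runs as root.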

paul-aiyedun

Changes overall look good to me. I had a few suggestions for simplifying the cache clearing interface and getting the benchmark dataset path.

paul-aiyedun · 2026-03-11T22:11:41Z

common/testing/performance_benchmarks/cache_utils.py

+@functools.cache
+def _libc():
+    return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


I believe we can avoid needing this function by using os.posix_fadvise().
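As a rough sketch of this suggestion, the standard library call below replaces the `ctypes` lookup of libc. The function name `drop_file_cache` is hypothetical and the PR's actual signature is not shown here. `os.posix_fadvise` is only available on Unix platforms that provide `posix_fadvise` (Linux does).

```python
import os


def drop_file_cache(path):
    """Advise the kernel that cached pages of one file are no longer needed.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 together mean "the whole file".
        # DONTNEED does not evict dirty pages, so files that were recently
        # written should be synced before this call.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

The advice is a hint to the kernel, so eviction is not guaranteed, but no root privileges are needed and other files' cached pages are left alone.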

paul-aiyedun · 2026-03-11T22:18:10Z

common/testing/performance_benchmarks/cache_utils.py

+    return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
+
+
+def _drop_file_cache(


I believe we expect the data files to consistently be in directories. So, we should be able to simplify the interface to one that accepts a directory Path and clears the cache for all files in the directory. Also, with this implementation in place, we can remove _drop_system_cache.
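One possible shape for the directory-based interface is sketched below. The name `drop_directory_cache`, the recursive walk, and the returned file count are all assumptions made for this example. The comment only asks for an interface that accepts a directory Path and clears the cache for its files.

```python
import os
from pathlib import Path


def drop_directory_cache(data_dir):
    """Drop cached pages for every regular file under data_dir.

    Hypothetical interface: walks subdirectories too (an assumption) and
    returns the number of files visited so callers can log or assert on it.
    """
    count = 0
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        fd = os.open(path, os.O_RDONLY)
        try:
            # len=0 covers the file from offset 0 to its end.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
        count += 1
    return count
```

Because this only touches the benchmark data files, it needs no root access, which is what would make a system-wide drop unnecessary.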

paul-aiyedun · 2026-03-11T22:24:29Z

presto/scripts/run_benchmark.sh

-    --skip-drop-cache       Skip dropping system caches before each benchmark query (dropped by default).
+    --skip-drop-cache       Skip dropping system caches before each benchmark query. This option is only effective
+                            when --cache-mode is not specified.
+    -c, --cache-mode        Cache mode for benchmark queries. Controls page cache state between query iterations.


The current default does lukewarm + hot. I think we would want to keep this default behavior. It probably makes sense to have 3 modes: lukewarm_and_hot (default), cold, and none. The none mode effectively replaces --skip-drop-cache, so we can remove that option.
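The three-mode option could be expressed as follows. This is a Python `argparse` sketch for illustration only: the real option lives in the bash script run_benchmark.sh, and `build_parser` and the help text are hypothetical. Only the flag names and mode names come from the discussion above.

```python
import argparse

# Mode names as proposed in the review comment; the first one is the default.
CACHE_MODES = ("lukewarm_and_hot", "cold", "none")


def build_parser():
    """Build a parser exposing a single cache-mode option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Benchmark runner sketch")
    parser.add_argument(
        "-c",
        "--cache-mode",
        choices=CACHE_MODES,
        default="lukewarm_and_hot",
        help="Page cache state between query iterations; "
        "'none' leaves the cache untouched.",
    )
    return parser
```

With a single option, `none` covers what `--skip-drop-cache` did, and the two flags can no longer interact.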

paul-aiyedun · 2026-03-11T22:49:30Z

presto/testing/performance_benchmarks/common_fixtures.py



+@pytest.fixture(scope="session")
+def benchmark_data_dir(request):


The benchmark dataset directory should be one level lower. See repository_path in velox-testing/presto/testing/common/test_utils.py (lines 46 to 56 in f8d9835) for an example of how we get this path:

    if bool(schema_name):
        # If a schema name is specified, get the scale factor from the metadata file located
        # where the table are fetching data from.
        table = presto_cursor.execute(f"SHOW TABLES in {schema_name}").fetchone()[0]
        location = get_table_external_location(schema_name, table, presto_cursor)
        repository_path = os.path.dirname(location)
    else:
        # default assumed location for metadata file.
        repository_path = get_abs_file_path(
            __file__, f"../../../common/testing/integration_tests/data/{benchmark_type}"
        )

Please also do the same for Spark Gluten (see velox-testing/spark_gluten/testing/common/test_utils.py, line 14 in f8d9835):

    dataset_dir = get_dataset_dir(benchmark_type, dataset_name)

paul-aiyedun · 2026-03-11T23:09:10Z

common/testing/performance_benchmarks/cache_utils.py

+
+
+def cache_setup_per_iteration(cache_mode, data_dir):
+    if cache_mode == "cold":


Do we need to do more than one iteration for a cold run?

kingcrimsontianyu added 6 commits March 11, 2026 15:49

Add per file cache dropping method

3bc954d

Update bash script. Add lukewarm and none modes

762aa07

Update bash script

ac740a5

Add cold cache mode

01a17d7

Add hot cache mode

79f64be

Add back the legacy skip-drop-cache option for spark gluten

d09649b

kingcrimsontianyu wants to merge 6 commits into rapidsai:main from kingcrimsontianyu:improve-cache-dropping
