Skip to content

Add explicit page cache control (hot, cold, lukewarm) to presto-velox benchmarks#265

Draft
kingcrimsontianyu wants to merge 6 commits intorapidsai:mainfrom
kingcrimsontianyu:improve-cache-dropping
Draft

Add explicit page cache control (hot, cold, lukewarm) to presto-velox benchmarks#265
kingcrimsontianyu wants to merge 6 commits intorapidsai:mainfrom
kingcrimsontianyu:improve-cache-dropping

Conversation

@kingcrimsontianyu
Copy link

No description provided.

"sh",
"-c",
"free; echo drop_caches; echo 3 > /proc/sys/vm/drop_caches; free",
"free; echo drop_caches; sync; echo 3 > /proc/sys/vm/drop_caches; free",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to flush the dirty pages to disk using sync.

Copy link
Contributor

@paul-aiyedun paul-aiyedun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes overall look good to me. I had a few suggestions for simplifying the cache clearing interface and getting the benchmark dataset path.

Comment on lines +14 to +16
@functools.cache
def _libc():
return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can avoid needing this function by using os.posix_fadvise().

return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def _drop_file_cache(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we expect the data files to consistently be in directories. So, we should be able to simplify the interface to one that accepts a directory Path and clears the cache for all files in the directory. Also, with this implementation in place, we can remove _drop_system_cache.

--skip-drop-cache Skip dropping system caches before each benchmark query (dropped by default).
--skip-drop-cache Skip dropping system caches before each benchmark query. This option is only effective
when --cache-mode is not specified.
-c, --cache-mode Cache mode for benchmark queries. Controls page cache state between query iterations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current default does lukewarm + hot. I think we would want to keep this default behavior. It probably makes sense to have 3 modes lukewarm_and_hot (default), cold, and none. The none mode effectively replaces --skip-drop-cache , so we can remove that option.



@pytest.fixture(scope="session")
def benchmark_data_dir(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark dataset directory should be one level lower. See repository_path in

if bool(schema_name):
# If a schema name is specified, get the scale factor from the metadata file located
# where the table are fetching data from.
table = presto_cursor.execute(f"SHOW TABLES in {schema_name}").fetchone()[0]
location = get_table_external_location(schema_name, table, presto_cursor)
repository_path = os.path.dirname(location)
else:
# default assumed location for metadata file.
repository_path = get_abs_file_path(
__file__, f"../../../common/testing/integration_tests/data/{benchmark_type}"
)
for an example of how we get this path.

Please also do the same for Spark Gluten (see

dataset_dir = get_dataset_dir(benchmark_type, dataset_name)
).



def cache_setup_per_iteration(cache_mode, data_dir):
if cache_mode == "cold":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do more than one iteration for cold run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants