Add explicit page cache control (hot, cold, lukewarm) to presto-velox benchmarks#265
Add explicit page cache control (hot, cold, lukewarm) to presto-velox benchmarks#265kingcrimsontianyu wants to merge 6 commits intorapidsai:mainfrom
Conversation
| "sh", | ||
| "-c", | ||
| "free; echo drop_caches; echo 3 > /proc/sys/vm/drop_caches; free", | ||
| "free; echo drop_caches; sync; echo 3 > /proc/sys/vm/drop_caches; free", |
There was a problem hiding this comment.
Need to flush the dirty pages to disk using sync.
paul-aiyedun
left a comment
There was a problem hiding this comment.
Changes overall look good to me. I had a few suggestions for simplifying the cache clearing interface and getting the benchmark dataset path.
| @functools.cache | ||
| def _libc(): | ||
| return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True) |
There was a problem hiding this comment.
I believe we can avoid needing this function by using os.posix_fadvise().
| return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True) | ||
|
|
||
|
|
||
| def _drop_file_cache( |
There was a problem hiding this comment.
I believe we expect the data files to consistently be in directories. So, we should be able to simplify the interface to one that accepts a directory Path and clears the cache for all files in the directory. Also, with this implementation in place, we can remove _drop_system_cache.
| --skip-drop-cache Skip dropping system caches before each benchmark query (dropped by default). | ||
| --skip-drop-cache Skip dropping system caches before each benchmark query. This option is only effective | ||
| when --cache-mode is not specified. | ||
| -c, --cache-mode Cache mode for benchmark queries. Controls page cache state between query iterations. |
There was a problem hiding this comment.
The current default does lukewarm + hot. I think we would want to keep this default behavior. It probably makes sense to have 3 modes lukewarm_and_hot (default), cold, and none. The none mode effectively replaces --skip-drop-cache , so we can remove that option.
|
|
||
|
|
||
| @pytest.fixture(scope="session") | ||
| def benchmark_data_dir(request): |
There was a problem hiding this comment.
The benchmark dataset directory should be one level lower. See repository_path in
velox-testing/presto/testing/common/test_utils.py
Lines 46 to 56 in f8d9835
Please also do the same for Spark Gluten (see
).|
|
||
|
|
||
| def cache_setup_per_iteration(cache_mode, data_dir): | ||
| if cache_mode == "cold": |
There was a problem hiding this comment.
Do we need to do more than one iteration for cold run?
No description provided.