CLI improvement to TensorParallelism/02_matmul_tp/matmul_tp_big.cu #17

@dkennetzoracle

The code could benefit from the following changes:

A CLI that replaces the following hard-coded definitions:

#define M 4096  // Rows of A, Rows of C
#define N 8192  // Columns of B, Columns of C
#define K 1024  // Columns of A, Rows of B
#define NGPUS 2 // Number of GPUs to use for computation
The CLI should support:

  • variable sizes for M, N, and K
  • the number of GPUs to use
  • error checking of the matrix dimensions passed on the command line
  • device detection to check that the requested number of GPUs is available (e.g., 4 are requested but only 2 are present)
  • a boolean flag for whether to run the host matmul; when it is off, skip the correctness assertion and the CPU timing. The host check should be pulled into a function.
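
As a starting point, here is a minimal sketch of what the argument parsing and validation could look like. It is plain host-side C++ that could be dropped into the .cu file. Everything named here is hypothetical and not in the current code: the `Options` struct, `parse_positive`, `parse_args`, and the flag names `--m`, `--n`, `--k`, `--ngpus`, and `--no-host-check`. The divisibility check assumes N is split evenly across GPUs, which may not match how the kernel actually partitions work, so adjust as needed. The available GPU count is passed in as a parameter; in the real program it would presumably come from `cudaGetDeviceCount`.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical options struct; defaults mirror the current #define values.
struct Options {
    long m = 4096;           // Rows of A, rows of C
    long n = 8192;           // Columns of B, columns of C
    long k = 1024;           // Columns of A, rows of B
    int ngpus = 2;           // Number of GPUs to use for computation
    bool host_check = true;  // Run the host matmul, assert correctness, time the CPU
};

// Parses a strictly positive base-10 integer.
// Returns false on non-numeric input, trailing characters, overflow, or values <= 0.
static bool parse_positive(const char* s, long* out) {
    char* end = nullptr;
    errno = 0;
    const long v = std::strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0) return false;
    *out = v;
    return true;
}

// Fills `opt` from argv and validates it.
// `available_gpus` is supplied by the caller (assumed to come from
// cudaGetDeviceCount in the real program) so the function needs no CUDA runtime.
// Returns an empty string on success, otherwise an error message.
std::string parse_args(int argc, const char* const* argv,
                       int available_gpus, Options* opt) {
    long ngpus = opt->ngpus;
    for (int i = 1; i < argc; ++i) {
        const std::string flag = argv[i];
        if (flag == "--no-host-check") {
            opt->host_check = false;
            continue;
        }
        long* target = nullptr;
        if (flag == "--m") target = &opt->m;
        else if (flag == "--n") target = &opt->n;
        else if (flag == "--k") target = &opt->k;
        else if (flag == "--ngpus") target = &ngpus;
        else return "unknown flag: " + flag;

        if (i + 1 >= argc) return "missing value for " + flag;
        if (!parse_positive(argv[++i], target)) return "invalid value for " + flag;
    }
    // Device check: reject a GPU count larger than what was detected.
    if (ngpus > available_gpus) {
        return "requested " + std::to_string(ngpus) + " GPUs, but only " +
               std::to_string(available_gpus) + " detected";
    }
    // Assumption: N is partitioned evenly across GPUs.
    if (opt->n % ngpus != 0) return "N must be divisible by the number of GPUs";
    opt->ngpus = static_cast<int>(ngpus);
    return "";
}
```

In `main`, the idea would be to query the device count, call `parse_args(argc, argv, count, &opt)`, print the message and exit if the returned string is non-empty, and only call the host matmul function when `opt.host_check` is true.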
