Skip to content

Add AVX512 operations#13

Closed
dong0321 wants to merge 13 commits intobosilca:masterfrom
dong0321:avx512_reduction
Closed

Add AVX512 operations#13
dong0321 wants to merge 13 commits intobosilca:masterfrom
dong0321:avx512_reduction

Conversation

@dong0321
Copy link

@dong0321 dong0321 commented Oct 21, 2019

Status : Checked all operations with all types.

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Copy link
Owner

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good. Do the cleanup and then let's look at it again.

You should run some performance tests with your local_reduction tester, and see how this compares with the default implementation.

*/
static int avx_component_open(void)
{
opal_output(ompi_op_base_framework.framework_output, "avx component open");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove.

Note that if this function returns non-OMPI_SUCCESS, then this
component won't even be shown in ompi_info output (which is
probably not what you want).
*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the check ?

*/
static int avx_component_close(void)
{
opal_output(ompi_op_base_framework.framework_output, "avx component close");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact remove all opal_output that are for debugging.

return OMPI_SUCCESS;
}

static char *avx_component_version;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this ?

types are supported. This allows you to change the behavior of
this component at run-time (by setting these MCA params at
run-time), simulating different kinds of hardware. */
mca_op_avx_component.hardware_available = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the final version right ?

module->opm_3buff_fns[i] = ompi_op_avx_3buff_functions[OMPI_OP_BASE_FORTRAN_BXOR][i];
}
break;
case OMPI_OP_BASE_FORTRAN_LAND:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logical AND ?

struct ompi_op_base_module_1_0_0_t *module) \
{ \
int step; \
switch(type_size) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there is a better way to decide the step.

int size = *count/step; \
int i; \
int round = size*64; \
for (i = 0; i < round; i+=64) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these operations supposed to apply on cache-line aligned elements ?

struct ompi_op_base_module_1_0_0_t *module) \
{ \
int step; \
switch(type_size) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get this out and make a macro.

int size = *count/step; \
int i; \
int round = size*64; \
for (i = 0; i < round; i+=64) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be a macro.

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
static int avx_component_init_query(bool enable_progress_threads,
bool enable_mpi_thread_multiple)
{
if (mca_op_avx_component.hardware_available && !enable_mpi_thread_multiple) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you disabling the component if threading is enabled ?

The float add operator is now validated.
Trying to figure out the fastest implementation.

Add default avx512 flags to compile seamlesly

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
bosilca and others added 5 commits November 7, 2019 18:39
During configure it detects if the target architecture is x86_64 to
enable itself. Then during query it detects processor capabilities using
cpuid and disable itself if AVX2 support is not found.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
for (i = 0; i < round; i+=64) { \
__m512i vecA = _mm512_loadu_si512((in+i));\
__m512i vecB = _mm512_loadu_si512((out+i));\
__m512i vecA = _mm512_loadu_si512((_in+i));\
Copy link
Author

@dong0321 dong0321 Nov 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definition for _mm512_load_si512:
__m512i _mm512_load_si512 (void const* mem_addr)

I am not sure we need to do type convert. The types_per_step will be different based on the type, I tried to calculate the correct step for each type, but it seems not working.

For correctness purpose I will remove type convertion now. If we really need to specific type. I will fix later.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is plain ugly and highly non portable. I suggest you put back the typecast and add not 64 to i but step. In fact you should model this on the float version below, without using the mask for the last set of operations but instead falling back to a lower vector op and then a Duff device.

…device code

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
@bosilca bosilca closed this Feb 18, 2020
@bosilca
Copy link
Owner

bosilca commented Feb 18, 2020

This has been moved to OMPI master in 7419

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants