Add AVX512 operations by dong0321 · Pull Request #13 · bosilca/ompi

dong0321 · 2019-10-21T17:34:50Z

Status: checked all operations with all types.

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca

Overall looking good. Do the cleanup and then let's look at it again.

You should run some performance tests with your local_reduction tester, and see how this compares with the default implementation.

bosilca · 2019-10-21T18:05:42Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+ */
+static int avx_component_open(void)
+{
+    opal_output(ompi_op_base_framework.framework_output, "avx component open");


bosilca · 2019-10-21T18:05:56Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+       Note that if this function returns non-OMPI_SUCCESS, then this
+       component won't even be shown in ompi_info output (which is
+       probably not what you want).
+    */


Where is the check?

bosilca · 2019-10-21T18:06:16Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+ */
+static int avx_component_close(void)
+{
+    opal_output(ompi_op_base_framework.framework_output, "avx component close");


In fact, remove all opal_output calls that are only there for debugging.

bosilca · 2019-10-21T18:06:29Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+    return OMPI_SUCCESS;
+}
+
+static char *avx_component_version;


What is this?

bosilca · 2019-10-21T18:08:28Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+       types are supported.  This allows you to change the behavior of
+       this component at run-time (by setting these MCA params at
+       run-time), simulating different kinds of hardware. */
+    mca_op_avx_component.hardware_available = true;


This is not the final version, right?

bosilca · 2019-10-21T18:09:09Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+            module->opm_3buff_fns[i] = ompi_op_avx_3buff_functions[OMPI_OP_BASE_FORTRAN_BXOR][i];
+        }
+        break;
+    case OMPI_OP_BASE_FORTRAN_LAND:


No logical AND?

bosilca · 2019-10-21T18:09:59Z

ompi/mca/op/intel_avx_op/op_avx_functions.c

+            struct ompi_op_base_module_1_0_0_t *module) \
+{                                                                      \
+    int step;                                                          \
+    switch(type_size) {                                                \


I'm sure there is a better way to decide the step.
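One way to read this suggestion: the step is the number of elements that fit in one 512-bit register, so it can be computed as 64 / type_size instead of being selected with a switch. A minimal sketch, assuming hypothetical names (`VECTOR_BYTES`, `ELEMS_PER_VECTOR`, `split_count`) that do not appear in the PR:

```c
#include <assert.h>
#include <stddef.h>

/* Width of one AVX-512 register in bytes (512 bits / 8). */
#define VECTOR_BYTES 64

/* Hypothetical helper: how many elements of size type_size fit in one
 * 512-bit register. Intended to replace a switch over type_size. */
#define ELEMS_PER_VECTOR(type_size) (VECTOR_BYTES / (type_size))

/* Split count elements into full-vector iterations plus a leftover
 * that has to be handled by a narrower or scalar path. */
static void split_count(int count, size_t type_size,
                        int *full_vectors, int *leftover)
{
    assert(type_size > 0 && type_size <= VECTOR_BYTES);
    int step = (int)ELEMS_PER_VECTOR(type_size);
    *full_vectors = count / step;
    *leftover = count % step;
}
```

With 4-byte elements the step is 16, so 100 elements give 6 full vectors and 4 leftover elements.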

bosilca · 2019-10-21T18:10:55Z

ompi/mca/op/intel_avx_op/op_avx_functions.c

+    int size = *count/step; \
+    int i; \
+    int round = size*64; \
+    for (i = 0; i < round; i+=64) { \


Are these operations supposed to apply to cache-line-aligned elements?

bosilca · 2019-10-21T18:11:32Z

ompi/mca/op/intel_avx_op/op_avx_functions.c

+            struct ompi_op_base_module_1_0_0_t *module) \
+{                                                                      \
+    int step;                                                          \
+    switch(type_size) {                                                \


Get this out and make it a macro.

bosilca · 2019-10-21T18:11:47Z

ompi/mca/op/intel_avx_op/op_avx_functions.c

+    int size = *count/step; \
+    int i; \
+    int round = size*64; \
+    for (i = 0; i < round; i+=64) { \


This should also be a macro.

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca · 2019-11-01T00:46:37Z

ompi/mca/op/intel_avx_op/op_avx_component.c

+static int avx_component_init_query(bool enable_progress_threads,
+                                        bool enable_mpi_thread_multiple)
+{
+    if (mca_op_avx_component.hardware_available && !enable_mpi_thread_multiple) {


Why are you disabling the component if threading is enabled?

The float add operator is now validated. Trying to figure out the fastest implementation. Add default avx512 flags to compile seamlessly. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

During configure it detects whether the target architecture is x86_64 and, if so, enables itself. Then during query it detects processor capabilities using cpuid and disables itself if AVX2 support is not found. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
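The query-time check described in this commit message can be sketched as follows. This is an illustration only: it uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` rather than a hand-written cpuid call, and the names `cpu_caps_t`, `detect_cpu_caps` and `component_usable` are hypothetical, not the component's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags; not the component's real fields. */
typedef struct {
    bool has_avx2;
    bool has_avx512f;
} cpu_caps_t;

/* Fill caps from the running processor. On non-x86 targets, or with a
 * compiler lacking the builtins, everything is reported as absent. */
static void detect_cpu_caps(cpu_caps_t *caps)
{
    assert(caps != NULL);
    caps->has_avx2 = false;
    caps->has_avx512f = false;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    caps->has_avx2 = __builtin_cpu_supports("avx2") != 0;
    caps->has_avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
}

/* Mirror the rule in the commit message: stay enabled only if AVX2
 * support is found. */
static bool component_usable(const cpu_caps_t *caps)
{
    return caps->has_avx2;
}
```

Keeping detection separate from the enable decision makes it possible to exercise the decision with synthetic capability flags on any machine.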

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

dong0321 · 2019-11-10T20:39:31Z

ompi/mca/op/avx/op_avx_functions.c

    for (i = 0; i < round; i+=64) { \
-        __m512i vecA =  _mm512_loadu_si512((in+i));\
-        __m512i vecB =  _mm512_loadu_si512((out+i));\
+        __m512i vecA =  _mm512_loadu_si512((_in+i));\


Definition for _mm512_load_si512:
__m512i _mm512_load_si512 (void const* mem_addr)

I am not sure we need the type conversion. types_per_step differs depending on the type; I tried to calculate the correct step for each type, but it does not seem to work.

For correctness, I will remove the type conversion for now. If we really need a specific type, I will fix it later.

This is plain ugly and highly non-portable. I suggest you put back the typecast and add step, not 64, to i. In fact, you should model this on the float version below, without using the mask for the last set of operations, but instead falling back to a lower vector op and then a Duff device.
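The structure suggested here (widest vector loop first, then a narrower vector op, then a Duff device for the remainder) might look like the sketch below for a float sum. It is not the PR's code: `sum_float` is a hypothetical name, and the vector tiers are selected with compile-time `__AVX512F__` / `__AVX__` guards as a simplifying assumption, so without those flags only the Duff device runs.

```c
#include <assert.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* inout[i] += in[i] for count floats (hypothetical helper). */
static void sum_float(const float *in, float *inout, int count)
{
    assert(count >= 0);
    int left = count;
#if defined(__AVX512F__)
    /* Tier 1: 16 floats per 512-bit register, unaligned loads. */
    for (; left >= 16; left -= 16, in += 16, inout += 16) {
        __m512 a = _mm512_loadu_ps(in);
        __m512 b = _mm512_loadu_ps(inout);
        _mm512_storeu_ps(inout, _mm512_add_ps(a, b));
    }
#endif
#if defined(__AVX__)
    /* Tier 2: fall back to 8 floats per 256-bit register. */
    for (; left >= 8; left -= 8, in += 8, inout += 8) {
        __m256 a = _mm256_loadu_ps(in);
        __m256 b = _mm256_loadu_ps(inout);
        _mm256_storeu_ps(inout, _mm256_add_ps(a, b));
    }
#endif
    /* Tier 3: Duff device for whatever is left, no mask needed. */
    if (left > 0) {
        int n = (left + 7) / 8;
        switch (left % 8) {
        case 0: do { *inout++ += *in++;
        case 7:      *inout++ += *in++;
        case 6:      *inout++ += *in++;
        case 5:      *inout++ += *in++;
        case 4:      *inout++ += *in++;
        case 3:      *inout++ += *in++;
        case 2:      *inout++ += *in++;
        case 1:      *inout++ += *in++;
                } while (--n > 0);
        }
    }
}
```

Advancing the typed pointers by the element count per register, rather than adding 64 to a byte index, keeps the typecast and works for any element type once the per-register count is adjusted.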

…device code Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca · 2020-02-18T21:19:52Z

This has been moved to OMPI master in 7419

add AVX512 op

f7a900b

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca reviewed Oct 21, 2019


dong0321 added 3 commits October 25, 2019 16:57

clean the code and fixed module OBJ_RETAIN bug

db633a2

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

minor clean

721e352

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

update test example

fbc8305

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca reviewed Nov 1, 2019


Small improvement to the AVX code, for small cases.

65644d7

The float add operator is now validated. Trying to figure out the fastest implementation. Add default avx512 flags to compile seamlessly. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

bosilca force-pushed the avx512_reduction branch from ca1b5de to 65644d7 Compare November 6, 2019 00:37

bosilca and others added 5 commits November 7, 2019 18:39

Fix a typo in the detection of rdtsc.

d279b49

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

Integrate the AVX component into the OMPI build system.

8a28dee

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

Cleanups on the initialization path.

615692f

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

Correctness check for all types and all operations

85ee66e

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

dong0321 commented Nov 10, 2019


dong0321 added 3 commits November 15, 2019 18:56

Correctness check for all types and all operations with mm256 & duff …

7c1c063

…device code Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

add special case for mul int8

e60af1e

Update all 3buff functions

54a9ea0

Signed-off-by: dongzhong <zhongdong0321@hotmail.com>

bosilca closed this Feb 18, 2020
