
Add static-dispatch versions #13

Open
digikar99 wants to merge 2 commits into Lisp-Stat:master from digikar99:static-dispatch


@digikar99

This is still of limited use though.

  1. It does not emit compiler notes when static-dispatch fails. This will require changes in static-dispatch itself.
  2. It still needs a fair amount of declarations and inlining to get much benefit out of this approach.

In the upcoming commits, I will attempt the second task. For example, the disassembly of the following still shows a generic-+:

(disassemble
 (lambda (x y)
   (declare (optimize speed)
            (type (simple-array single-float 1) x y))
   (num-utils:e2+ x y)))

Getting rid of it also requires inlining the ref inside mapping-array. But after doing both of these, we get a roughly 2.5x performance boost on SBCL 2.3.4:

CL-USER> (let ((x (aops:rand* 'single-float 1000))
               (y (aops:rand* 'single-float 1000)))
          (declare (notinline num-utils:e2+))
          (time (loop repeat 10000 do (num-utils:e2+ x y))))
Evaluation took:
  0.252 seconds of real time
  0.250618 seconds of total run time (0.250618 user, 0.000000 system)
  99.60% CPU
  553,358,300 processor cycles
  40,972,272 bytes consed
NIL
CL-USER> (let ((x (aops:rand* 'single-float 1000))
               (y (aops:rand* 'single-float 1000)))
           (declare (optimize speed)
                    (type (simple-array single-float 1) x y))
           (time (loop repeat 10000 do (num-utils:e2+ x y))))
Evaluation took:
  0.104 seconds of real time
  0.103522 seconds of total run time (0.103522 user, 0.000000 system)
  100.00% CPU
  228,570,630 processor cycles
  40,938,752 bytes consed
NIL
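The effect behind this speedup can be reproduced in miniature without num-utils. The sketch below uses two hypothetical functions, MAP-ADD and SINGLE-FLOAT-ADD, which are not part of num-utils. MAP-ADD is untyped, so when compiled on its own its + goes through GENERIC-+. Declaiming it INLINE lets SBCL expand its body inside SINGLE-FLOAT-ADD, where the declared argument types allow a single-float addition. Comparing (disassemble 'map-add) with (disassemble 'single-float-add) on SBCL should show the difference.

```lisp
;; Hypothetical illustration, not the num-utils implementation.
(declaim (inline map-add))
(defun map-add (x y)
  "Elementwise X + Y for two vectors of the same length.
Compiled on its own, nothing is known about the element types,
so + is a call to GENERIC-+ on SBCL."
  (let ((out (make-array (length x) :element-type (array-element-type x))))
    (dotimes (i (length x) out)
      (setf (aref out i) (+ (aref x i) (aref y i))))))

(defun single-float-add (x y)
  (declare (optimize speed)
           (type (simple-array single-float (*)) x y))
  ;; MAP-ADD is declaimed INLINE, so its body is expanded here, where X and Y
  ;; have known types; + can then be compiled as a single-float addition.
  (map-add x y))
```

Without the INLINE declamation, the caller's declarations would stop at the function-call boundary, which is why both the declarations and the inlining are needed.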

This can be optimized further if we allow for an optional out argument to the e2+ function.
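One way to read the optional out argument suggestion is sketched below with a hypothetical SF-ADD-INTO function (an assumption, not the num-utils API). The caller may pass a preallocated OUT vector, and a fresh result is allocated only when OUT is omitted, so a timing loop like the one above need not cons a new 1000-element vector on every iteration.

```lisp
;; Hypothetical sketch of an "optional out argument"; not the num-utils API.
(defun sf-add-into (x y &optional (out (make-array (length x)
                                                   :element-type 'single-float)))
  "Store X + Y elementwise into OUT and return OUT.
OUT is freshly allocated only when the caller does not supply one."
  (declare (optimize speed)
           (type (simple-array single-float (*)) x y out))
  ;; Guard against vectors of mismatched length.
  (assert (= (length x) (length y) (length out)))
  (dotimes (i (length x) out)
    (setf (aref out i) (+ (aref x i) (aref y i)))))

;; Usage: reuse one OUT vector across all iterations.
(let ((x (make-array 1000 :element-type 'single-float :initial-element 1.0))
      (y (make-array 1000 :element-type 'single-float :initial-element 2.0))
      (out (make-array 1000 :element-type 'single-float)))
  (time (loop repeat 10000 do (sf-add-into x y out))))
```

The default keeps the convenient allocating behavior, while hot loops can opt into reuse.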

num-utils.asd (outdated diff)
 :license "Same as NUM-UTILS -- this is part of the NUM-UTILS library."
 #+asdf-unicode :encoding #+asdf-unicode :utf-8
-:depends-on (#:num-utils
+:depends-on (#:num-utils/static-dispatch

Is this temporary? It looks like this replaces all the num-utils tests with the ones from static-dispatch. Ideally we could keep two test systems until static-dispatch is ready to be folded into the main system.

Author

Oops, yes, my bad! I will fix it.

@digikar99
Author

I think I can relate to tpapp now. Optimizing Common Lisp, even with SBCL, is not trivial. There are many gotchas. For instance, take sequence-minimum and suppose we also inline it:

(declaim (inline sequence-minimum))
(defun sequence-minimum (x)
  "Return the minimum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (cond ((listp x) (apply 'min x))
        ((vectorp x) (reduce #'min x))))

First of all, as of SBCL 2.3.4, reduce is not inlined. So we might attempt to write it using a loop, expecting that types will be propagated appropriately during inlining, as they should be.

(defun sequence-minimum (x)
  "Return the minimum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (cond ((listp x)
         (cond ((null x)
                (error "Empty sequence"))
               ((null (cdr x))
                (car x))
               (t
                (let ((min (car x)))
                  (dolist (elt x)
                    (setf min (min min elt)))
                  min))))
        ((vectorp x)
         (case (length x)
           (0 (error "Empty sequence"))
           (1 (row-major-aref x 0))
           (t (let ((min (row-major-aref x 0)))
                (dotimes (index (1- (length x)))
                  (let ((elt (row-major-aref x (1+ index))))
                    (setf min (if (< elt min) elt min))))
                min))))))

However, even this is not enough: a generic-< is still left in the disassembly of

(disassemble
 (lambda (x)
   (declare (optimize speed)
            (type (simple-array single-float 1) x))
   (num-utils:sequence-minimum x)))
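Checking for a leftover generic-< by reading disassembly by eye gets tedious, so a small helper can search the printed disassembly for a routine name. A minimal sketch, assuming SBCL (whose disassembler annotates calls with names such as GENERIC-<); DISASSEMBLY-MENTIONS-P is a hypothetical helper, and a string search is a heuristic rather than a guarantee:

```lisp
;; Hypothetical helper; relies on DISASSEMBLE printing to *STANDARD-OUTPUT*.
(defun disassembly-mentions-p (function name)
  "Return T if the printed disassembly of FUNCTION contains NAME,
compared case-insensitively, e.g. NAME = \"GENERIC-<\"."
  (let ((text (with-output-to-string (*standard-output*)
                (disassemble function))))
    (and (search name text :test #'char-equal) t)))

;; Untyped comparison: on SBCL this goes through GENERIC-<.
(disassembly-mentions-p (compile nil '(lambda (a b) (< a b))) "GENERIC-<")

;; Fixnum-typed comparison: the compiler can emit a direct comparison instead.
(disassembly-mentions-p (compile nil '(lambda (a b)
                                        (declare (fixnum a b))
                                        (< a b)))
                        "GENERIC-<")
```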

To avoid generic-< completely (and we get a 10-15x speed boost if we do this), we need something like the following:

(defun sequence-minimum (x)
  "Return the minimum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (etypecase x
    (list
     (cond ((null x)
            (error "Empty sequence"))
           ((null (cdr x))
            (car x))
           (t
            (let ((min (car x)))
              (dolist (elt x)
                (setf min (min min elt)))
              min))))
    (vector
     (case (length x)
       (0 (error "Empty sequence"))
       (1 (row-major-aref x 0))
       (t (let ((min (row-major-aref x 0)))
            (declare (type (simple-array single-float 1) x)
                     (type single-float min))     ; <== THIS TYPE DECLARATION
            (dotimes (index (1- (array-total-size x)))
              (let ((elt (row-major-aref x (1+ index))))
                (setf min (if (< elt min) elt min))))
            min))))))

But that makes the code non-generic with respect to vector element types :/. In this particular case, polymorphic-functions is also only helpful if we write a polymorph-compiler-macro, which really should not be necessary at all. To actually speed up the generic versions, we need better type inference.

So, I'm unsure how much benefit static-dispatch alone could get us. Maybe 1.5-2x, but so long as generic-+, generic-<, etc. remain, we are missing out on a 10x performance boost.
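A common workaround in standard CL (not something proposed in this thread) is to keep one generic loop body as a macro and dispatch on concrete array types with ETYPECASE, so each branch receives the explicit element-type declaration that removed generic-< above. A minimal sketch, assuming an implementation such as SBCL that specializes single-float and double-float arrays; MINIMUM-LOOP and VECTOR-MINIMUM are hypothetical names:

```lisp
;; Hypothetical sketch: one loop body, specialized per element type.
(defmacro minimum-loop (x element-type)
  "Expand into a loop computing the minimum of the non-empty vector X
(a variable), declaring the running minimum and each element ELEMENT-TYPE."
  `(let ((min (aref ,x 0)))
     (declare (type ,element-type min))
     (loop for i of-type fixnum from 1 below (length ,x)
           do (let ((elt (aref ,x i)))
                (declare (type ,element-type elt))
                (when (< elt min)
                  (setf min elt))))
     min))

(defun vector-minimum (x)
  "Return the minimum value in the non-empty vector X."
  (declare (optimize speed))
  (check-type x vector)
  (when (zerop (length x))
    (error "Empty sequence"))
  ;; ETYPECASE narrows X to a concrete array type in each branch, and the
  ;; macro writes the matching element-type declaration, so each branch can
  ;; use a specialized < while the function as a whole stays generic.
  (etypecase x
    ((simple-array single-float (*)) (minimum-loop x single-float))
    ((simple-array double-float (*)) (minimum-loop x double-float))
    (vector (minimum-loop x real))))
```

The trade-off is code size and a closed set of fast paths: every listed element type gets its own copy of the loop, and unlisted types fall back to the slower generic branch.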

@Symbolics
Collaborator

@digikar99 I'm so sorry for the delay in this. Life gets in the way sometimes. What is your current opinion on static dispatch for these operations? I recently completed some additional work for vector operations (basically completed coverage of all operators) and have started looking at how to optimise them.

@digikar99
Author

No worries!

I have also gone down a few rabbit holes in the meantime.

I see two ways forward.

  1. Use coalton. It has inlining now. In particular, it has a better notion of function types, and it has typeclasses (which I understand are similar to interfaces). There is the overhead of learning the basics of the ML type system, but that's a one-time cost. The other friction might be Lisp-coalton interop, which exists, but I find it less than ideal. I'd guess coalton is also fairly well tested, and it is definitely more actively developed than my second suggestion below. One downside is that it can be tricky to express (i) array dimensions and (ii) that simple-array is a subtype of array.

  2. Use peltadot, which I hacked together over the past year or two from polymorphic-functions and extensible-compound-types, which I was working on previously. It reimplements the Lisp type system, enables type-based dispatch, provides compiler-macro-expansion-time type propagation, supports optional static dispatch with dynamic dispatch as the default, and also has traits (which are again similar to interfaces and typeclasses). I find its integration with standard CL to be neat. The downside is that it is restricted to standard CL function types, which are less powerful than what coalton can express. The upside is that it can express simple-array as a subtype of array.

Illustration with peltadot:

(defpackage :peltadot-user
  (:use :peltadot)
  (:local-nicknames (:traits :peltadot-traits-library)))

(in-package :peltadot-user)

(defpolymorph sequence-minimum ((s sequence)) t
  (case (traits:len s)
    (0 (error "Empty sequence"))
    (1 (traits:seq-ref s 0))
    (t (let ((min (traits:seq-ref s 0)))
         (dotimes (index (1- (traits:len s)))
           (setf min (min min (traits:seq-ref s (1+ index)))))
         min))))

(defun type-parameter-p (s) (member s '(<t>)))
(pushnew 'type-parameter-p peltadot:*parametric-type-symbol-predicates*)

(defpolymorph sequence-minimum ((s (simple-array <t> 1))) <t>
  (case (traits:len s)
    (0 (error "Empty sequence"))
    (1 (traits:seq-ref s 0))
    (t (pflet ((min (traits:seq-ref s 0))
               (elt (traits:seq-ref s 0)))
         (declare (type <t> min elt))
         (dotimes (index (1- (traits:len s)))
           (setf elt (traits:seq-ref s (1+ index)))
           (setf min (min min elt)))
         min))))

(defun list-min (list)
  (declare (type list list)
           (optimize speed))
  (sequence-minimum list))

(defun array-sf-min (x)
  (declare (type (simple-array single-float 1) x)
           (optimize speed))
  (sequence-minimum x))

(defun array-df-min (x)
  (declare (type (simple-array double-float 1) x)
           (optimize speed))
  (sequence-minimum x))

@snunez1

snunez1 commented Mar 4, 2026 via email

@digikar99
Author

Yes, I think we had some reddit discussions about this.

Forgoing multi-implementation support and focusing solely on, say, SBCL eliminates the need for compiler-macro-expansion-time type propagation and dispatch. The alternative (that I'm aware of) is writing deftransforms, lots of them.

However, after looking at the above sequence-minimum example, I realized there were two other problems: function types and typeclasses. Coalton handles them well, although it presents the other problems mentioned above.

3 participants