Every WarpForth program begins with header directives prefixed by \!. The header is required because Forth's stack-passing convention doesn't provide typed parameter declarations for the kernel interface. The header also captures kernel information that is better stated declaratively than through Forth semantics, such as which shared memory buffers the kernel uses.
Kernel Declaration
\! kernel main
Required. Must appear first. Names the GPU kernel entry point.
Parameters
\! param DATA i64[256] \ array of 256 i64 → memref<256xi64>
\! param N i64 \ scalar i64
\! param WEIGHTS f64[128] \ array of 128 f64 → memref<128xf64>
\! param SCALE f64 \ scalar f64 (bitcast to i64 on stack)
Array parameters become memref arguments. Using the name as a word pushes the base address.
Scalar parameters become value arguments. Using the name as a word pushes the value.
f64 scalars are bitcast to i64 when pushed to the stack; use F-prefixed words to operate on them.
Shared Memory
A shared-memory declaration in the header reserves GPU shared memory. Using the declared name as a word pushes its base address. Access it with S@/S! (i64) or SF@/SF! (f64). The name cannot be referenced inside word definitions.
Literals
Integer Literals
Plain numbers are parsed as i64:
42
-10
255
Float Literals
Numbers containing . or e/E are parsed as f64 and stored on the stack as i64 bit patterns:
3.14
-2.0
1.0e-5
1e3
Use F-prefixed words (F+, F*, etc.) to operate on float values.
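The bit-pattern convention can be illustrated outside WarpForth. The Python sketch below is a model only, not WarpForth's implementation; the helper names `f64_to_cell`, `cell_to_f64` and `f_add` are hypothetical. It shows why an F+-style operation must reinterpret the bits before adding, and why integer addition on two float cells gives a meaningless result.

```python
import struct

def f64_to_cell(x: float) -> int:
    """Reinterpret an f64 as the signed 64-bit integer with the same bits."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def cell_to_f64(cell: int) -> float:
    """Reinterpret a signed 64-bit integer as the f64 with the same bits."""
    return struct.unpack("<d", struct.pack("<q", cell))[0]

def f_add(a: int, b: int) -> int:
    """Model of F+: decode both cells, add as floats, re-encode."""
    return f64_to_cell(cell_to_f64(a) + cell_to_f64(b))

a = f64_to_cell(1.5)
b = f64_to_cell(2.25)
print(hex(a))                    # 0x3ff8000000000000
print(cell_to_f64(f_add(a, b)))  # 3.75
print(cell_to_f64(a + b))        # integer add of the bit patterns: not 3.75
```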
Stack Operations
| Word | Stack Effect | Description |
| --- | --- | --- |
| `DUP` | `( a -- a a )` | Duplicate top |
| `DROP` | `( a -- )` | Discard top |
| `SWAP` | `( a b -- b a )` | Swap top two |
| `OVER` | `( a b -- a b a )` | Copy second to top |
| `ROT` | `( a b c -- b c a )` | Rotate third to top |
| `NIP` | `( a b -- b )` | Drop second |
| `TUCK` | `( a b -- b a b )` | Copy top below second |
| `PICK` | `( xn ... x0 n -- xn ... x0 xn )` | Copy nth item to top |
| `ROLL` | `( xn ... x0 n -- xn-1 ... x0 xn )` | Move nth item to top |
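The indexed words PICK and ROLL are the least obvious entries in the table. The following Python sketch models their stack effects with a list whose last element is the top of the stack. It is an illustration of the stack-effect notation only; the function names are hypothetical and nothing here reflects how WarpForth lowers these words.

```python
def pick(stack: list) -> None:
    """( xn ... x0 n -- xn ... x0 xn ): copy the nth item (0 = top) to the top."""
    n = stack.pop()
    stack.append(stack[-1 - n])

def roll(stack: list) -> None:
    """( xn ... x0 n -- xn-1 ... x0 xn ): move the nth item (0 = top) to the top."""
    n = stack.pop()
    stack.append(stack.pop(-1 - n))

s = [10, 20, 30, 1]   # top of stack is the last element; n = 1
pick(s)
print(s)              # [10, 20, 30, 20]  (1 PICK behaves like OVER)

s = [10, 20, 30, 2]   # n = 2
roll(s)
print(s)              # [20, 30, 10]      (2 ROLL behaves like ROT)
```

Under this reading, `0 PICK` matches DUP and `1 ROLL` matches SWAP.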
Arithmetic
Integer Arithmetic
| Word | Stack Effect | Description |
| --- | --- | --- |
| `+` | `( a b -- a+b )` | Add |
| `-` | `( a b -- a-b )` | Subtract |
| `*` | `( a b -- a*b )` | Multiply |
| `/` | `( a b -- a/b )` | Divide |
| `MOD` | `( a b -- a%b )` | Modulo |
Float Arithmetic
| Word | Stack Effect | Description |
| --- | --- | --- |
| `F+` | `( a b -- a+b )` | Float add |
| `F-` | `( a b -- a-b )` | Float subtract |
| `F*` | `( a b -- a*b )` | Float multiply |
| `F/` | `( a b -- a/b )` | Float divide |
Float Math Intrinsics
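| Word | Stack Effect | Description |
| --- | --- | --- |
| `FEXP` | `( a -- exp(a) )` | Exponential |
| `FSQRT` | `( a -- sqrt(a) )` | Square root |
| `FLOG` | `( a -- log(a) )` | Natural logarithm |
| `FABS` | `( a -- \|a\| )` | Absolute value |
| `FNEG` | `( a -- -a )` | Negate |
| `FMAX` | `( a b -- max(a,b) )` | Maximum |
| `FMIN` | `( a b -- min(a,b) )` | Minimum |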
Bitwise Operations
| Word | Stack Effect | Description |
| --- | --- | --- |
| `AND` | `( a b -- a&b )` | Bitwise AND |
| `OR` | `( a b -- a\|b )` | Bitwise OR |
| `XOR` | `( a b -- a^b )` | Bitwise XOR |
| `NOT` | `( a -- ~a )` | Bitwise NOT |
| `LSHIFT` | `( a n -- a<<n )` | Left shift |
| `RSHIFT` | `( a n -- a>>n )` | Right shift |
Comparison
Integer Comparison
All comparisons push 1 (true) or 0 (false).
| Word | Stack Effect | Description |
| --- | --- | --- |
| `=` | `( a b -- flag )` | Equal |
| `<` | `( a b -- flag )` | Less than |
| `>` | `( a b -- flag )` | Greater than |
| `<>` | `( a b -- flag )` | Not equal |
| `<=` | `( a b -- flag )` | Less or equal |
| `>=` | `( a b -- flag )` | Greater or equal |
| `0=` | `( a -- flag )` | Equal to zero |
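Because true is 1 and NOT is a bitwise complement, applying NOT to a true flag yields -2, which is still nonzero. Inverting a flag therefore calls for `0=` rather than NOT. The Python sketch below models this on 64-bit cells; the function names are hypothetical stand-ins for `<`, `NOT` and `0=`.

```python
MASK = (1 << 64) - 1

def to_i64(x: int) -> int:
    """Wrap a Python int to a signed 64-bit value, as an i64 cell would hold it."""
    x &= MASK
    return x - (1 << 64) if x >= (1 << 63) else x

def less_than(a: int, b: int) -> int:
    """Model of < : push 1 for true, 0 for false."""
    return 1 if a < b else 0

def bit_not(a: int) -> int:
    """Model of NOT: bitwise complement of the whole cell."""
    return to_i64(~a)

def zero_equal(a: int) -> int:
    """Model of 0= : 1 if the cell is zero, else 0."""
    return 1 if a == 0 else 0

flag = less_than(3, 5)
print(flag)              # 1
print(bit_not(flag))     # -2
print(zero_equal(flag))  # 0
```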
Float Comparison
| Word | Stack Effect | Description |
| --- | --- | --- |
| `F=` | `( a b -- flag )` | Float equal |
| `F<` | `( a b -- flag )` | Float less than |
| `F>` | `( a b -- flag )` | Float greater than |
| `F<>` | `( a b -- flag )` | Float not equal |
| `F<=` | `( a b -- flag )` | Float less or equal |
| `F>=` | `( a b -- flag )` | Float greater or equal |
Type Conversion
| Word | Stack Effect | Description |
| --- | --- | --- |
| `S>F` | `( n -- f )` | Integer to float (i64 → f64 bit pattern) |
| `F>S` | `( f -- n )` | Float to integer (f64 bit pattern → i64) |
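These words convert the value, which is different from reinterpreting the bits: the integer 3 and the float 3.0 occupy very different cells. The Python sketch below models that distinction. The names `s_to_f` and `f_to_s` are hypothetical, and truncation toward zero in `f_to_s` is an assumption, since the rounding mode of F>S is not stated here.

```python
import struct

def s_to_f(n: int) -> int:
    """Model of S>F: convert the integer value to f64, return its bit pattern."""
    return struct.unpack("<q", struct.pack("<d", float(n)))[0]

def f_to_s(cell: int) -> int:
    """Model of F>S: decode the f64 bit pattern, convert to an integer.
    Truncation toward zero is an assumption of this sketch."""
    return int(struct.unpack("<d", struct.pack("<q", cell))[0])

print(hex(s_to_f(3)))       # 0x4008000000000000
print(f_to_s(s_to_f(3)))    # 3
print(f_to_s(s_to_f(-7)))   # -7
```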
Memory Access
Address Arithmetic
| Word | Stack Effect | Description |
| --- | --- | --- |
| `CELLS` | `( n -- n*8 )` | Convert cell index to byte offset (8 bytes per cell) |
Global Memory (i64)
| Word | Stack Effect | Description |
| --- | --- | --- |
| `@` | `( addr -- value )` | Load i64 from global memory |
| `!` | `( value addr -- )` | Store i64 to global memory |
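Element access combines a base address, CELLS, and a load or store. The Python sketch below models that arithmetic over a byte buffer. It is a stand-in for illustration only: the buffer, the base address of 0, the little-endian layout and the function names are all assumptions of the sketch.

```python
import struct

CELL_BYTES = 8

def cells(n: int) -> int:
    """Model of CELLS: cell index -> byte offset."""
    return n * CELL_BYTES

def store(mem: bytearray, value: int, addr: int) -> None:
    """Model of ! ( value addr -- ): write one i64 at byte address addr."""
    struct.pack_into("<q", mem, addr, value)

def fetch(mem: bytearray, addr: int) -> int:
    """Model of @ ( addr -- value ): read one i64 at byte address addr."""
    return struct.unpack_from("<q", mem, addr)[0]

mem = bytearray(cells(4))   # stand-in for a 4-element i64 array
base = 0                    # hypothetical base address of that array
store(mem, -42, base + cells(2))
print(cells(2))                     # 16
print(fetch(mem, base + cells(2)))  # -42
print(fetch(mem, base + cells(1)))  # 0
```

The address expression `base + cells(2)` mirrors the pattern `2 CELLS base +` in Forth word order.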
Global Memory (f64)
| Word | Stack Effect | Description |
| --- | --- | --- |
| `F@` | `( addr -- value )` | Load f64 from global memory (as i64 bit pattern) |
| `F!` | `( value addr -- )` | Store f64 to global memory (from i64 bit pattern) |
Shared Memory (i64)
| Word | Stack Effect | Description |
| --- | --- | --- |
| `S@` | `( addr -- value )` | Load i64 from shared memory |
| `S!` | `( value addr -- )` | Store i64 to shared memory |
Shared Memory (f64)
| Word | Stack Effect | Description |
| --- | --- | --- |
| `SF@` | `( addr -- value )` | Load f64 from shared memory (as i64 bit pattern) |
| `SF!` | `( value addr -- )` | Store f64 to shared memory (from i64 bit pattern) |
Reduced-Width Memory
These words load/store narrower types, converting to/from the stack's native i64.
Integer types — load sign-extends to i64, store truncates from i64:
| Word | Width | Memory | Description |
| --- | --- | --- | --- |
| `I8@` / `I8!` | 8-bit | Global | Load/store i8 |
| `SI8@` / `SI8!` | 8-bit | Shared | Load/store i8 (shared) |
| `I16@` / `I16!` | 16-bit | Global | Load/store i16 |
| `SI16@` / `SI16!` | 16-bit | Shared | Load/store i16 (shared) |
| `I32@` / `I32!` | 32-bit | Global | Load/store i32 |
| `SI32@` / `SI32!` | 32-bit | Shared | Load/store i32 (shared) |
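Truncating on store and sign-extending on load means a value that does not fit the narrow type reads back changed. The Python sketch below models that round trip on a byte buffer; the function names and the little-endian layout are assumptions of the sketch, not part of WarpForth.

```python
def store_narrow(mem: bytearray, value: int, addr: int, width_bytes: int) -> None:
    """Model of I8!/I16!/I32!: keep only the low bits of the i64 value."""
    mask = (1 << (8 * width_bytes)) - 1
    mem[addr:addr + width_bytes] = (value & mask).to_bytes(width_bytes, "little")

def load_narrow(mem: bytearray, addr: int, width_bytes: int) -> int:
    """Model of I8@/I16@/I32@: read the narrow value and sign-extend it."""
    return int.from_bytes(mem[addr:addr + width_bytes], "little", signed=True)

mem = bytearray(8)
store_narrow(mem, 300, 0, 1)    # 300 = 0x12C, low byte is 0x2C
print(load_narrow(mem, 0, 1))   # 44
store_narrow(mem, 255, 0, 1)    # 0xFF reads back as a negative i8
print(load_narrow(mem, 0, 1))   # -1
store_narrow(mem, -2, 4, 4)     # fits in i32, so it survives the round trip
print(load_narrow(mem, 4, 4))   # -2
```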
Float types — load extends to f64 then bitcasts to i64, store bitcasts i64 to f64 then truncates:
| Word | Width | Memory | Description |
| --- | --- | --- | --- |
| `HF@` / `HF!` | 16-bit | Global | Load/store f16 |
| `SHF@` / `SHF!` | 16-bit | Shared | Load/store f16 (shared) |
| `BF@` / `BF!` | 16-bit | Global | Load/store bf16 |
| `SBF@` / `SBF!` | 16-bit | Shared | Load/store bf16 (shared) |
| `F32@` / `F32!` | 32-bit | Global | Load/store f32 |
| `SF32@` / `SF32!` | 32-bit | Shared | Load/store f32 (shared) |
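Narrow float stores lose precision, and the loaded value comes back as an f64 bit pattern like any other float on the stack. The Python sketch below models the f16 case using the `struct` module's half-precision format. The function names are hypothetical, the little-endian layout is an assumption, and bf16 is not modelled because `struct` has no format code for it.

```python
import struct

def f64_to_cell(x: float) -> int:
    """Reinterpret an f64 as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def cell_to_f64(cell: int) -> float:
    """Reinterpret a signed 64-bit integer as an f64."""
    return struct.unpack("<d", struct.pack("<q", cell))[0]

def store_f16(mem: bytearray, cell: int, addr: int) -> None:
    """Model of HF!: bitcast the cell to f64, then narrow to f16."""
    struct.pack_into("<e", mem, addr, cell_to_f64(cell))

def load_f16(mem: bytearray, addr: int) -> int:
    """Model of HF@: read f16, widen to f64, return the i64 bit pattern."""
    return f64_to_cell(struct.unpack_from("<e", mem, addr)[0])

mem = bytearray(2)
store_f16(mem, f64_to_cell(3.14159), 0)
print(cell_to_f64(load_f16(mem, 0)))   # 3.140625
```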
Control Flow
Conditionals
condition IF
  \ executed when condition is nonzero
THEN

condition IF
  \ true branch
ELSE
  \ false branch
THEN
Post-Test Loop
BEGIN
  \ loop body
  condition
UNTIL  \ exits when condition is nonzero
Pre-Test Loop
BEGIN condition WHILE
  \ loop body
REPEAT
Counted Loop
limit start DO
  \ loop body; I is the current index
LOOP

limit start DO
  \ loop body
n +LOOP  \ increment index by n instead of 1
| Word | Description |
| --- | --- |
| `I` | Current loop index (innermost loop) |
| `J` | Index of next outer loop |
| `K` | Index of second outer loop |
| `LEAVE` | Exit the innermost loop immediately |
| `UNLOOP` | Discard loop parameters before EXIT |
| `EXIT` | Return from the current word |
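The values that I takes in a counted loop can be sketched in Python. This model assumes the conventional Forth reading in which the limit is exclusive, and it only covers `start < limit` with a positive increment. Behaviour outside that range is not described here, so the sketch does not attempt it. The generator name `do_loop` is hypothetical.

```python
def do_loop(limit: int, start: int, step: int = 1):
    """Yield the values of I for `limit start DO ... LOOP` (step 1)
    or `limit start DO ... n +LOOP` (step n).
    Assumes an exclusive limit, start < limit and step > 0."""
    i = start
    while i < limit:
        yield i
        i += step

print(list(do_loop(5, 0)))       # [0, 1, 2, 3, 4]
print(list(do_loop(10, 0, 3)))   # [0, 3, 6, 9]
```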
User-Defined Words
: square DUP * ;
: add3 + + ;
Local Variables
: dot-product { a-addr b-addr n -- }
  0                        \ running sum
  n 0 DO
    I CELLS a-addr + @     \ load a[I]
    I CELLS b-addr + @     \ load b[I]
    * +                    \ multiply and accumulate
  LOOP
;
{ name1 name2 ... -- } at the start of a word definition binds read-only locals. Values are popped from the stack in reverse name order, so the last name receives the top of the stack. Locals work across all control flow structures and map directly to GPU registers.
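The binding order can be sketched in Python. The model below pops one value per name, last name first, and then mirrors the dot-product word. It is an illustration under simplifying assumptions: memory is a plain list indexed by cell rather than by byte, and the function names are hypothetical.

```python
def bind_locals(stack: list, names: list) -> dict:
    """Pop one value per name, last name first, as described for { ... -- }."""
    bound = {}
    for name in reversed(names):
        bound[name] = stack.pop()
    return bound

def dot_product(mem: list, stack: list) -> None:
    """Model of the dot-product word; mem is indexed by cell, not by byte."""
    loc = bind_locals(stack, ["a_addr", "b_addr", "n"])
    acc = 0
    for i in range(loc["n"]):
        acc += mem[loc["a_addr"] + i] * mem[loc["b_addr"] + i]
    stack.append(acc)

mem = [1, 2, 3, 4, 5, 6]   # a = [1, 2, 3] at cell 0, b = [4, 5, 6] at cell 3
stack = [0, 3, 3]          # a-addr b-addr n, with n on top
dot_product(mem, stack)
print(stack)               # [32]
```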