Installation

To quickly get started using SH4ZAM within your existing KallistiOS environment, run git pull from within your kos-ports git repository, which is typically installed to /opt/toolchains/dc/kos-ports, to fetch the latest ports, including SH4ZAM.

Next, cd into the sh4zam folder and run make install to build and install the latest version of SH4ZAM as a static library. Once this succeeds, your KOS application can begin leveraging SH4ZAM to make big performance gainz.

Examples

At this point, it is advisable to check out and run the examples provided with SH4ZAM, which are installed to the examples/sh4zam directory within kos-ports. You should be able to build any example by simply typing make. You can quickly verify the integrity of your SH4ZAM install by playing with Bruce's Balls.

Using Within a Project

In order to use SH4ZAM within your KOS-based Dreamcast project, you must do two things:

- Include the header file(s) within your code:
  - #include <sh4zam/shz_sh4zam.h> includes the whole SH4ZAM C API.
  - #include <sh4zam/shz_sh4zam.hpp> includes the whole SH4ZAM C++ API.
- Link to the static library, libsh4zam.a, within your build system.

Linking to SH4ZAM

If you do not link to SH4ZAM, your project will still compile, but it will fail to link.

Makefile

To link to SH4ZAM using a standard KallistiOS-style Makefile, pass the -lsh4zam flag to the compiler along with any other libraries you are using. This typically looks something like the following:

kos-cc -o $(TARGET) $(OBJS) -lsh4zam

CMake

To link to SH4ZAM using CMake as your build system, add the following line to your CMakeLists.txt, replacing MyProject with the name of your target:

target_link_libraries(MyProject PUBLIC -lsh4zam)

Interop with Existing APIs

One of the most common and important use cases of SH4ZAM is to pull it into an existing codebase that has its own vector math types and abstractions, then use SH4ZAM to accelerate the back-end implementation without breaking the API or changing client code.

SH4ZAM goes to great lengths to make such interoperability as seamless, ergonomic, and noninvasive as possible. It also holds these niceties to a zero-overhead rule, rigorously validating that any conversion or adaptation between SH4ZAM's types and existing types is optimized away completely.

SH4ZAMifying an existing codebase usually involves the following steps, which deliver the gainz SH4ZAM has to offer without breaking the existing code or its math interface:

- Convert incoming arguments from the existing math types to SH4ZAM's math types.
- Forward the arguments to the equivalent accelerated SH4ZAM routine.
- Convert the return value from SH4ZAM's math types back to the existing math types.
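The three steps can be sketched in plain, self-contained C for a pointer-based API of the kind many C math libraries expose. This is a minimal sketch, not SH4ZAM code: AccelVec4 and accel_vec4_add() are hypothetical stand-ins for an accelerated library's type and routine, and the sketch assumes both vector types share an identical layout of four tightly packed floats.

```c
#include <string.h>

/* Existing codebase type, as used by its pointer-based math API. */
typedef struct {
    float value[4];
} Vec4;

/* Hypothetical stand-in for an accelerated library's vector type.
   This is NOT SH4ZAM's API; it only illustrates the conversion pattern. */
typedef struct {
    float x, y, z, w;
} AccelVec4;

/* The pattern relies on both types having the same size and layout. */
_Static_assert(sizeof(AccelVec4) == sizeof(Vec4), "Vec4 and AccelVec4 layouts differ");

/* Hypothetical stand-in for an accelerated routine. */
static AccelVec4 accel_vec4_add(AccelVec4 a, AccelVec4 b) {
    AccelVec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Step 1: convert an existing value into the accelerated type. A fixed-size
   memcpy avoids strict-aliasing problems, and optimizing compilers typically
   lower it to plain register moves. */
static AccelVec4 to_accel(const Vec4* v) {
    AccelVec4 out;
    memcpy(&out, v, sizeof out);
    return out;
}

/* Step 3: convert the accelerated result back into the existing type. */
static void from_accel(Vec4* dst, AccelVec4 v) {
    memcpy(dst, &v, sizeof *dst);
}

/* Existing public API: the signature is unchanged, so client code is unaffected.
   Both inputs are copied before dst is written, so dst may alias lhs or rhs. */
void Vec4Add(const Vec4* lhs, const Vec4* rhs, Vec4* dst) {
    AccelVec4 a = to_accel(lhs);           /* 1) convert the arguments */
    AccelVec4 b = to_accel(rhs);
    AccelVec4 sum = accel_vec4_add(a, b);  /* 2) forward to the accelerated routine */
    from_accel(dst, sum);                  /* 3) convert the result back */
}
```

Because the conversions are fixed-size copies between layout-compatible types, an optimizing compiler can typically eliminate them, which is the zero-overhead property SH4ZAM's own conversion helpers aim for. The _Static_assert turns a layout mismatch into a compile-time error instead of silent corruption.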

Adapting C Codebases

The following provides an example of adapting SH4ZAM to accelerate the math behind an existing, idiomatic C API with support for multiple targets, such as CGLM or raymath:

// Only introduce SH4ZAM for supported targets within cross-platform codebases.
#ifdef GAINZ
     // Include all of SH4ZAM, using the .h header for the C APIs.
#    include <sh4zam/shz_sh4zam.h>
#endif
 
// The existing 4D vector structure used within the codebase.
typedef struct {
    // Type-compatible layout with SH4ZAM's shz_vec4_t.
    float value[4];
} Vec4;
 
// 4D vector addition operation provided by the existing math API.
Vec4 AddVec4(Vec4 lhs, Vec4 rhs) {
#ifdef GAINZ
    /* 1) Convert incoming arguments to equivalent SH4ZAM types using shz_vec4_from().
       2) Call SH4ZAM's equivalent routine, shz_vec4_add(), with the converted arguments.
       3) Convert SH4ZAM's return value back to existing API type with shz_vec4_to(). */
    return shz_vec4_to(Vec4, shz_vec4_add(shz_vec4_from(lhs), shz_vec4_from(rhs)));
#else
    // The original, unaccelerated path is here for platforms without SH4ZAM.
    return (Vec4){{ lhs.value[0] + rhs.value[0], lhs.value[1] + rhs.value[1],
                    lhs.value[2] + rhs.value[2], lhs.value[3] + rhs.value[3] }};
#endif
}
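The snippet above relies on a GAINZ macro without showing where it comes from. One option, sketched here under stated assumptions, is a small project-wide configuration header that enables the accelerated path only for Dreamcast builds. It assumes the KallistiOS build flags define _arch_dreamcast (check your KOS_CFLAGS), and GAINZ_ENABLED is a hypothetical helper macro, not part of SH4ZAM.

```c
/* Hypothetical project-wide configuration header. The Dreamcast build may
   also pass -DGAINZ explicitly, in which case this block does nothing. */
#if !defined(GAINZ) && defined(_arch_dreamcast)
/* Assumption: the KallistiOS build flags define _arch_dreamcast. */
#    define GAINZ 1
#endif

/* Hypothetical helper: expose the decision as a constant expression so that
   ordinary code (and tests) can inspect it without another #ifdef. */
#ifdef GAINZ
#    define GAINZ_ENABLED 1
#else
#    define GAINZ_ENABLED 0
#endif
```

Keeping this decision in one header means every adapted function can use the same #ifdef GAINZ guard, and host builds fall back to the portable path automatically.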

Adapting C++ Codebases

The following provides an example of adapting SH4ZAM to accelerate the math behind an existing, idiomatic C++ API with support for multiple targets, such as GLM or the Simulant engine:

// Only introduce SH4ZAM for supported targets within cross-platform codebases.
#ifdef GAINZ
     // Include all of SH4ZAM, using the .hpp header for the C++ APIs.
#    include <sh4zam/shz_sh4zam.hpp>
#endif
 
namespace Math {
    // The existing 4D vector structure used within the codebase.
    struct Vec4 {
        // Type-compatible layout with SH4ZAM's shz::vec4.
        float value[4];
        // Overloaded operator providing 4D vector addition operation.
        friend Vec4 operator+(Vec4 lhs, Vec4 rhs);
    };
}
 
// Implementation of 4D vector addition operator. It is declared as a friend, not a
// member, so it is defined at namespace scope within Math rather than as Vec4::operator+.
namespace Math {
    Vec4 operator+(Vec4 lhs, Vec4 rhs) {
#ifdef GAINZ
        /* 1) Convert incoming arguments to SH4ZAM types using shz::vec4::from().
           2) Call SH4ZAM's equivalent addition operator with the converted arguments.
           3) Convert SH4ZAM's return value back with shz::vec4::to<>(). */
        return (shz::vec4::from(lhs) + shz::vec4::from(rhs)).to<Vec4>();
#else
        // The original, unaccelerated path is here for platforms without SH4ZAM.
        return Vec4{{ lhs.value[0] + rhs.value[0], lhs.value[1] + rhs.value[1],
                      lhs.value[2] + rhs.value[2], lhs.value[3] + rhs.value[3] }};
#endif
    }
}

Matrix Transforms

Some of the largest and easiest-to-exploit gainz in a project lie within its matrix multiplication and transformation code, so it is important to understand how to SH4ZAMify such code optimally in order to achieve the highest gainz.

The following code snippet was taken from a real-world application that used the CGLM library to build a model matrix:

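void updateModel(mat4 model, const Transform* transform) {
    glm_mat4_identity(model);
    // Assumes transform->pos is a CGLM vec3s (x/y/z plus .raw) and scale is a uniform float.
    glm_translate(model, transform->pos.raw);
    // CGLM's glm_rotate_x/y/z() take a destination matrix; here it is model itself.
    glm_rotate_x(model, transform->xRot, model);
    glm_rotate_y(model, transform->yRot, model);
    glm_rotate_z(model, transform->zRot, model);
    glm_scale_uni(model, transform->scale);
}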

The most straightforward way to accelerate such a routine is to simply do a direct, 1:1 translation between the CGLM and SH4ZAM APIs:

void updateModel(shz_mat4x4_t* model, const Transform* transform) {
    shz_mat4x4_init_identity(model);
    shz_mat4x4_translate(model, transform->pos.x, transform->pos.y, transform->pos.z);
    shz_mat4x4_rotate_x(model, transform->xRot);
    shz_mat4x4_rotate_y(model, transform->yRot);
    shz_mat4x4_rotate_z(model, transform->zRot);
    shz_mat4x4_scale(model, transform->scale, transform->scale, transform->scale);
}

While this will work, it's still leaving a MASSIVE amount of gainz on the table. The following will perform far better:

void updateModel(shz_mat4x4_t* model, const Transform* transform) {
    /* Don't waste time initializing to the identity matrix just to overwrite it.
       Initialize directly to a compound rotation matrix. */
    shz_xmtrx_init_rotation_xyz(transform->xRot, transform->yRot, transform->zRot);
    // Only "apply" scale to the inner 3x3 submatrix with scaling components.
    shz_xmtrx_apply_scale(transform->scale, transform->scale, transform->scale);
    // Directly set the translational component values of XMTRX.
    shz_xmtrx_set_translation(transform->pos.x, transform->pos.y, transform->pos.z);
    // Only write to our in-memory matrix after we're done operating within XMTRX.
    shz_xmtrx_store_4x4(model);
}

We are now leveraging the following:

- All matrix operations are performed within the XMTRX registers rather than in memory.
- We directly initialize XMTRX to the first transform rather than to the identity matrix.
- We use "apply" operations when a transform only needs to be applied to a submatrix.
- We directly set the translational component rather than applying it as a transform.
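To see why these shortcuts are safe, the following plain-C reference sketch (not SH4ZAM code) builds the same model matrix two ways: by chaining full 4x4 products in the CGLM call order, M = T * Rx * Ry * Rz * S, and directly, by writing the closed-form compound rotation, scaling only its 3x3 columns, and setting the translation column. It assumes column-major storage and a uniform scale, and takes each angle as a precomputed sine/cosine pair through a hypothetical SinCos type (on SH4, the FSCA instruction yields both values at once).

```c
/* Plain-C reference sketch, not SH4ZAM code. Matrices are column-major,
   m[col][row], as in CGLM and OpenGL, and act on column vectors. */
typedef struct { float m[4][4]; } Mat4;

/* Hypothetical helper type: a precomputed sine/cosine pair for one angle. */
typedef struct { float s, c; } SinCos;

static Mat4 mat4_identity(void) {
    Mat4 r = {{{0.0f}}};
    for (int i = 0; i < 4; ++i)
        r.m[i][i] = 1.0f;
    return r;
}

/* Full 4x4 product r = a * b: 64 multiplies. */
static Mat4 mat4_mul(Mat4 a, Mat4 b) {
    Mat4 r;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[k][row] * b.m[col][k];
            r.m[col][row] = sum;
        }
    return r;
}

/* Naive path mirroring the CGLM call order: M = I * T * Rx * Ry * Rz * S. */
static Mat4 model_chained(const float t[3], SinCos x, SinCos y, SinCos z, float scale) {
    Mat4 T = mat4_identity(), S = mat4_identity();
    Mat4 Rx = mat4_identity(), Ry = mat4_identity(), Rz = mat4_identity();
    for (int i = 0; i < 3; ++i) {
        T.m[3][i] = t[i];   /* translation lives in column 3 */
        S.m[i][i] = scale;  /* uniform scale on the diagonal */
    }
    Rx.m[1][1] = x.c;  Rx.m[2][1] = -x.s; Rx.m[1][2] = x.s;  Rx.m[2][2] = x.c;
    Ry.m[0][0] = y.c;  Ry.m[2][0] = y.s;  Ry.m[0][2] = -y.s; Ry.m[2][2] = y.c;
    Rz.m[0][0] = z.c;  Rz.m[1][0] = -z.s; Rz.m[0][1] = z.s;  Rz.m[1][1] = z.c;
    Mat4 M = mat4_mul(mat4_identity(), T);
    M = mat4_mul(M, Rx);
    M = mat4_mul(M, Ry);
    M = mat4_mul(M, Rz);
    return mat4_mul(M, S);
}

/* Direct path: write the closed-form rotation R = Rx * Ry * Rz, scale only its
   3x3 block, and set the translation column. No 4x4 products are needed. */
static Mat4 model_direct(const float t[3], SinCos x, SinCos y, SinCos z, float scale) {
    const float sxsy = x.s * y.s, cxsy = x.c * y.s;
    Mat4 M;
    M.m[0][0] = scale * (y.c * z.c);               /* column 0 of R * S */
    M.m[0][1] = scale * (x.c * z.s + sxsy * z.c);
    M.m[0][2] = scale * (x.s * z.s - cxsy * z.c);
    M.m[1][0] = scale * (-y.c * z.s);              /* column 1 of R * S */
    M.m[1][1] = scale * (x.c * z.c - sxsy * z.s);
    M.m[1][2] = scale * (x.s * z.c + cxsy * z.s);
    M.m[2][0] = scale * y.s;                       /* column 2 of R * S */
    M.m[2][1] = scale * (-x.s * y.c);
    M.m[2][2] = scale * (x.c * y.c);
    M.m[0][3] = M.m[1][3] = M.m[2][3] = 0.0f;      /* bottom row */
    M.m[3][0] = t[0];                              /* translation column */
    M.m[3][1] = t[1];
    M.m[3][2] = t[2];
    M.m[3][3] = 1.0f;
    return M;
}
```

In this sketch the chained path spends five 4x4 products (320 multiplies), while the direct path needs 23 multiplies, and the two results agree up to floating-point rounding. SH4ZAM's XMTRX routines are a separate, hardware-specific implementation; the sketch only illustrates why skipping the identity, applying scale to the 3x3 block, and setting translation directly produce the same matrix.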
