SH4ZAM! 0.1.0
Fast math library for the Sega Dreamcast's SH4 CPU

Users Guide

The following guide will walk you through installing SH4ZAM, pulling it into your project, and leveraging it to make gainz.

Installation

To quickly get started using SH4ZAM within your preexisting KallistiOS environment, simply run git pull from within your kos-ports git repository, which is typically located at /opt/toolchains/dc/kos-ports, to fetch the latest ports, including SH4ZAM.

Next, cd into the sh4zam folder and run make install to build and install the latest version of SH4ZAM as a statically linked library. Once this succeeds, your KOS application can begin leveraging SH4ZAM to make big performance gainz.

Examples

At this point it is advised that you check out and attempt to run the examples provided with sh4zam, which get installed to the examples/sh4zam directory within kos-ports. You should be able to simply type make to build any example. You can quickly verify the integrity of your SH4ZAM install by playing with Bruce's Balls.

Using Within a Project

In order to use SH4ZAM within your KOS-based Dreamcast project, you must do two things:

  1. Include the header file(s) within your code.
  2. Link to the static library, libsh4zam.a, within your build system.

Linking to SH4ZAM

Unless you link to SH4ZAM, your project will compile correctly, yet fail to link.

Makefile

To link to SH4ZAM using a standard, KallistiOS-style Makefile, simply pass the -lsh4zam flag to the compiler along with any other libraries you are using, which typically looks something like this:

kos-cc -o $(TARGET) $(OBJS) -lsh4zam

CMake

To link to SH4ZAM using CMake as your build system, you must add the following line to your CMakeLists.txt, replacing MyProject with the name of your target:

target_link_libraries(MyProject PUBLIC -lsh4zam)

Interop with Existing APIs

One of the most common and important use cases for SH4ZAM is pulling it into an existing codebase that already has its own vector math types and abstractions, then using SH4ZAM to accelerate the back-end implementation without breaking the API or requiring changes to client code.

SH4ZAM goes to great lengths not only to make such interoperability as seamless, ergonomic, and noninvasive as possible, but also to provide these niceties under a zero-overhead rule: it rigorously validates that any conversion or adaptation between SH4ZAM's types and existing types gets completely optimized away.

SH4ZAMifying an existing codebase usually involves the following steps, which let you leverage the gainz it has to offer without breaking existing code or its math interface:

  1. Convert incoming arguments from existing math types to SH4ZAM's math types.
  2. Forward arguments to the equivalent accelerated SH4ZAM routine.
  3. Convert the return value from SH4ZAM's math types back to the existing math types.
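The three steps above can be sketched in plain C that compiles on any host. In this minimal sketch, `LegacyVec3` plays the role of an existing API type, while `accel_vec3_t` and `accel_vec3_cross()` are hypothetical stand-ins for an accelerated library type and routine (they are not SH4ZAM's API, and the routine body is ordinary portable C). Only the convert, forward, and convert-back shape is meant to carry over.

```c
/* Existing API type used throughout the (hypothetical) codebase. */
typedef struct {
    float x, y, z;
} LegacyVec3;

/* Hypothetical stand-in for an accelerated library's vector type. */
typedef struct {
    float v[3];
} accel_vec3_t;

/* Hypothetical stand-in for an accelerated routine (plain C here). */
static accel_vec3_t accel_vec3_cross(accel_vec3_t a, accel_vec3_t b) {
    accel_vec3_t r = {{ a.v[1] * b.v[2] - a.v[2] * b.v[1],
                        a.v[2] * b.v[0] - a.v[0] * b.v[2],
                        a.v[0] * b.v[1] - a.v[1] * b.v[0] }};
    return r;
}

/* Step 1 helper: existing type -> accelerated type. */
static accel_vec3_t to_accel(LegacyVec3 in) {
    accel_vec3_t out = {{ in.x, in.y, in.z }};
    return out;
}

/* Step 3 helper: accelerated type -> existing type. */
static LegacyVec3 from_accel(accel_vec3_t in) {
    LegacyVec3 out = { in.v[0], in.v[1], in.v[2] };
    return out;
}

/* The existing public function keeps its signature; only its body changes. */
LegacyVec3 CrossVec3(LegacyVec3 lhs, LegacyVec3 rhs) {
    /* 1) convert the arguments, 2) forward them to the accelerated routine,
       3) convert the result back to the existing type. */
    return from_accel(accel_vec3_cross(to_accel(lhs), to_accel(rhs)));
}
```

Because `CrossVec3()` keeps its original signature, callers are untouched: crossing `{ 1, 0, 0 }` with `{ 0, 1, 0 }` still returns `{ 0, 0, 1 }`.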

Adapting C Codebases

The following provides an example of adapting SH4ZAM to accelerate the math behind an existing, idiomatic C API with support for multiple targets, such as CGLM or raymath:

// Only introduce SH4ZAM for supported targets within cross-platform codebases.
#ifdef GAINZ
// Include all of SH4ZAM, using the .h header for the C APIs.
#endif

// The existing 4D vector structure used within the codebase.
typedef struct {
    // Type-compatible layout with SH4ZAM's shz_vec4_t.
    float value[4];
} Vec4;

// 4D vector addition operation provided by the existing math API.
Vec4 AddVec4(Vec4 lhs, Vec4 rhs) {
#ifdef GAINZ
    /* 1) Convert incoming arguments to equivalent SH4ZAM types using shz_vec4_from().
       2) Call SH4ZAM's equivalent routine, shz_vec4_add(), with the converted arguments.
       3) Convert SH4ZAM's return value back to the existing API type with shz_vec4_to(). */
    return shz_vec4_to(Vec4, shz_vec4_add(shz_vec4_from(lhs), shz_vec4_from(rhs)));
#else
    // The original, unaccelerated path is here for platforms without SH4ZAM.
    return (Vec4){{ lhs.value[0] + rhs.value[0], lhs.value[1] + rhs.value[1],
                    lhs.value[2] + rhs.value[2], lhs.value[3] + rhs.value[3] }};
#endif
}
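The C example above relies on `Vec4` sharing its layout with `shz_vec4_t`. A minimal way to keep that assumption honest is a compile-time guard using C11 `_Static_assert`, sketched below. Because this sketch does not assume how `shz_vec4_t` is actually declared, it checks against a hypothetical stand-in, `accel_vec4_t`; in a real port you would assert against the library's own type. The memcpy-based helpers are likewise hypothetical and are not how `shz_vec4_from()` or `shz_vec4_to()` are implemented.

```c
#include <stddef.h>
#include <string.h>

/* Existing codebase type, as in the example above. */
typedef struct {
    float value[4];
} Vec4;

/* Hypothetical stand-in for the library's 4D vector type. */
typedef struct {
    float x, y, z, w;
} accel_vec4_t;

/* Fail the build if the two layouts ever drift apart. */
_Static_assert(sizeof(Vec4) == sizeof(accel_vec4_t), "Vec4 size mismatch");
_Static_assert(_Alignof(Vec4) == _Alignof(accel_vec4_t), "Vec4 alignment mismatch");
_Static_assert(offsetof(accel_vec4_t, w) == 3 * sizeof(float), "unexpected padding");

/* With identical layouts, a memcpy-based conversion is well-defined C, and
   optimizing compilers typically reduce it to plain register moves. */
static inline accel_vec4_t vec4_to_accel(Vec4 in) {
    accel_vec4_t out;
    memcpy(&out, &in, sizeof out);
    return out;
}

static inline Vec4 vec4_from_accel(accel_vec4_t in) {
    Vec4 out;
    memcpy(&out, &in, sizeof out);
    return out;
}
```

If someone later adds a field to `Vec4` or changes its element type, the build stops at the assertion instead of silently producing garbage at run time.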

Adapting C++ Codebases

The following provides an example of adapting SH4ZAM to accelerate the math behind an existing, idiomatic C++ API with support for multiple targets, such as GLM or the Simulant engine:

// Only introduce SH4ZAM for supported targets within cross-platform codebases.
#ifdef GAINZ
// Include all of SH4ZAM, using the .hpp header for the C++ APIs.
#endif

namespace Math {
    // The existing 4D vector structure used within the codebase.
    struct Vec4 {
        // Type-compatible layout with SH4ZAM's shz::vec4.
        float value[4];

        // Overloaded operator providing 4D vector addition operation.
        friend Vec4 operator+(Vec4 lhs, Vec4 rhs);
    };

    // Implementation of the 4D vector addition operator declared as a friend above.
    Vec4 operator+(Vec4 lhs, Vec4 rhs) {
#ifdef GAINZ
        /* 1) Convert incoming arguments to SH4ZAM types using shz::vec4::from().
           2) Call SH4ZAM's equivalent addition operator with the converted arguments.
           3) Convert SH4ZAM's return value back with shz::vec4::to<>(). */
        return (shz::vec4::from(lhs) + shz::vec4::from(rhs)).to<Vec4>();
#else
        // The original, unaccelerated path is here for platforms without SH4ZAM.
        return Vec4{{ lhs.value[0] + rhs.value[0], lhs.value[1] + rhs.value[1],
                      lhs.value[2] + rhs.value[2], lhs.value[3] + rhs.value[3] }};
#endif
    }
}

Matrix Transforms

Some of the largest, easiest-to-exploit gainz for a project lie within its matrix multiplication and transformation code. It is important to understand how to SH4ZAMify such code optimally in order to achieve the highest gainz.

The following code snippet was taken from a real-world application which was using the CGLM library to create a model-view matrix:

void updateModel(mat4 model, const Transform* transform) {
    glm_mat4_identity(model);
    glm_translate(model, (vec3){ transform->pos.x, transform->pos.y, transform->pos.z });
    glm_rotate_x(model, transform->xRot, model);
    glm_rotate_y(model, transform->yRot, model);
    glm_rotate_z(model, transform->zRot, model);
    glm_scale_uni(model, transform->scale);
}
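Each CGLM call here post-multiplies `model` by one more matrix, so the routine accumulates M = T · Rx · Ry · Rz · S; applied to a column vector, the scale takes effect first and the translation last. The portable sketch below illustrates that accumulate order with a plain column-major 4x4 type. `Mat4` and the `mat4_*` helpers are hypothetical, not CGLM's or SH4ZAM's code, and rotations are omitted to keep it short.

```c
#include <string.h>

/* Column-major 4x4 matrix: m[column][row], the same convention CGLM uses. */
typedef struct { float m[4][4]; } Mat4;

static void mat4_identity(Mat4 *out) {
    memset(out, 0, sizeof *out);
    for (int i = 0; i < 4; ++i)
        out->m[i][i] = 1.0f;
}

/* acc = acc * rhs: the "multiply and accumulate" step each call performs. */
static void mat4_mul_accum(Mat4 *acc, const Mat4 *rhs) {
    Mat4 r;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += acc->m[k][row] * rhs->m[col][k];
            r.m[col][row] = sum;
        }
    *acc = r;
}

static void mat4_translate(Mat4 *acc, float x, float y, float z) {
    Mat4 t;
    mat4_identity(&t);
    t.m[3][0] = x; t.m[3][1] = y; t.m[3][2] = z;  /* fourth column */
    mat4_mul_accum(acc, &t);
}

static void mat4_scale(Mat4 *acc, float x, float y, float z) {
    Mat4 s;
    mat4_identity(&s);
    s.m[0][0] = x; s.m[1][1] = y; s.m[2][2] = z;
    mat4_mul_accum(acc, &s);
}

/* Transform a point (w = 1) by a column-major matrix. */
static void mat4_transform_point(const Mat4 *a, const float p[3], float out[3]) {
    for (int row = 0; row < 3; ++row)
        out[row] = a->m[0][row] * p[0] + a->m[1][row] * p[1]
                 + a->m[2][row] * p[2] + a->m[3][row];
}
```

Calling `mat4_translate(&m, 10, 0, 0)` and then `mat4_scale(&m, 2, 2, 2)` on an identity matrix maps the point (1, 0, 0) to (12, 0, 0): the point is scaled to (2, 0, 0) before being translated, even though translate was called first.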

The most straightforward way to accelerate such a routine is to simply do a direct, 1:1 translation between the CGLM and SH4ZAM APIs:

void updateModel(shz_mat4x4_t* model, const Transform* transform) {
    shz_mat4x4_init_identity(model);
    shz_mat4x4_translate(model, transform->pos.x, transform->pos.y, transform->pos.z);
    shz_mat4x4_rotate_x(model, transform->xRot);
    shz_mat4x4_rotate_y(model, transform->yRot);
    shz_mat4x4_rotate_z(model, transform->zRot);
    shz_mat4x4_scale(model, transform->scale, transform->scale, transform->scale);
}

While this will work, it's still leaving a MASSIVE amount of gainz on the table. The following will perform far better:

void updateModel(shz_mat4x4_t* model, const Transform* transform) {
    /* Don't waste time initializing to the identity matrix just to overwrite it.
       Initialize directly to a compound rotation matrix. */
    shz_xmtrx_init_rotation_xyz(transform->xRot, transform->yRot, transform->zRot);
    // Only "apply" scale to the inner 3x3 submatrix with scaling components.
    shz_xmtrx_apply_scale(transform->scale, transform->scale, transform->scale);
    // Directly set the translational component values of XMTRX.
    shz_xmtrx_set_translation(transform->pos.x, transform->pos.y, transform->pos.z);
    // Only write to our in-memory matrix after we're done operating within XMTRX.
    shz_xmtrx_store_4x4(model);
}

We are now leveraging the following:

  1. All matrix operations are performed within XMTRX (the SH4's 4x4 matrix held in the back-bank floating-point registers XF0 through XF15), rather than in memory.
  2. We directly initialize XMTRX to the first transform, rather than to identity.
  3. We use apply operations when a transform only needs to be applied to a submatrix.
  4. We directly set the translational component rather than applying it as a transform.
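Points 2 through 4 rest on a simple property of M = T · R · S: its upper-left 3x3 block is R · S (each column of R scaled by the matching scale factor), and its fourth column is just the translation. The portable sketch below checks that on the host rather than in XMTRX: `build_direct()` writes the matrix the direct way, and `transform_stepwise()` applies scale, rotation, and translation to a point one at a time. All names are hypothetical, and, as assumptions that keep it short and free of libm, rotation is limited to the Z axis and passed in as a precomputed cosine/sine pair.

```c
/* Column-major 4x4 matrix: m[column][row]. */
typedef struct { float m[4][4]; } Mat4;

/* Direct construction: start from R (rotation about Z, given as cos/sin),
   scale only the inner 3x3 columns, then write the translation column. */
static void build_direct(Mat4 *out, const float t[3], float c, float s, float scale) {
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out->m[col][row] = (col == row) ? 1.0f : 0.0f;
    out->m[0][0] = c;  out->m[0][1] = s;     /* initialize straight to R */
    out->m[1][0] = -s; out->m[1][1] = c;
    for (int col = 0; col < 3; ++col)        /* "apply" S to the 3x3 only */
        for (int row = 0; row < 3; ++row)
            out->m[col][row] *= scale;
    out->m[3][0] = t[0];                     /* set T directly */
    out->m[3][1] = t[1];
    out->m[3][2] = t[2];
}

/* Transform a point (w = 1) by a column-major matrix. */
static void transform_point(const Mat4 *a, const float p[3], float out[3]) {
    for (int row = 0; row < 3; ++row)
        out[row] = a->m[0][row] * p[0] + a->m[1][row] * p[1]
                 + a->m[2][row] * p[2] + a->m[3][row];
}

/* Reference: apply S, then Rz, then T to the point one step at a time,
   which is what M = T * Rz * S means for a column vector. */
static void transform_stepwise(const float t[3], float c, float s, float scale,
                               const float p[3], float out[3]) {
    float x = p[0] * scale, y = p[1] * scale, z = p[2] * scale;  /* S  */
    float rx = c * x - s * y, ry = s * x + c * y;                /* Rz */
    out[0] = rx + t[0];                                          /* T  */
    out[1] = ry + t[1];
    out[2] = z + t[2];
}
```

Both paths agree, yet the direct build performs no 4x4 matrix multiplications at all, whereas a naive chain performs a full 4x4 multiply for every transform on top of initializing the identity.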