The Pixomatic 3D software rasterizer supports the following features:
Pixel shading
- Three fully-programmable stages, supporting two textures
- Supported stage operations: modulate, add, add signed, lerp, multiply-add, select, and dot3
- Supported stage inputs: texture, diffuse, specular, stage constant, global constant, zero, temp, and current
- Stage inputs may be inverted and/or replicated
- Stage output may be scaled and/or directed to temp
- Independently programmable RGB and alpha channels for each stage, except for dot3
- Additional specular additive stage
- Flat or Gouraud diffuse color modulation
- Linearly-interpolated fog
- Full standard set of independent source and destination frame buffer blends, the results of which can be combined via add, subtract, reverse subtract, min, or max; alternatively, user-defined blending code may be welded directly into the pixel pipeline
- All pixel calculations are 32-bit (8 bits per color component)
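As a rough illustration of what several of the stage operations listed above compute on 8-bit color components, here is a minimal sketch. The formulas follow the conventional fixed-function definitions of modulate, add, add signed, and lerp; the function names, the rounding, and the divide-by-255 normalization are assumptions made for this example, not a description of Pixomatic's generated MMX code.

```python
def clamp8(value):
    """Clamp an integer to the 0..255 range of one 8-bit color component."""
    return max(0, min(255, value))


def modulate(a, b):
    """Multiply two components, treating 255 as 1.0 (rounding is an assumption)."""
    return (a * b + 127) // 255


def add(a, b):
    """Saturating add."""
    return clamp8(a + b)


def add_signed(a, b):
    """Add with a -0.5 bias, so that 128 acts as the neutral value."""
    return clamp8(a + b - 128)


def lerp(a, b, t):
    """Blend from b (t = 0) to a (t = 255)."""
    return (a * t + b * (255 - t) + 127) // 255


def invert(a):
    """Inverted stage input: 1 - a in 8-bit terms."""
    return 255 - a


def apply_stage(op, arg1, arg2, scale=1):
    """Apply a two-input op to each RGB component, then scale and clamp the output."""
    return tuple(clamp8(op(x, y) * scale) for x, y in zip(arg1, arg2))


texture = (200, 100, 50)   # hypothetical texel color
diffuse = (255, 128, 0)    # hypothetical interpolated vertex color
print(apply_stage(modulate, texture, diffuse))  # (200, 50, 0)
```

Chaining three such calls, with each result fed in as the "current" input of the next, mirrors the three-stage structure described in the list.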
Other per-pixel
- 16- or 24-bit z buffering, with seven compare modes and z scaling
- Full set of alpha tests against alpharef value
- Stencil test, with seven compare modes
- Stencil rendering, with six z-pass modes and six z-fail modes, and support for single-pass stencil shadows
- Fast early-out z and/or stencil visibility determination (useful for bounding box pretests of complex objects)
- Pixel counting
- Subpixel correct rasterization
- Features work in any combination, except that stencil rendering can't be combined with pixel rendering
- Z bias
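The z-buffer features above (compare modes, z write, z bias, pixel counting, and visibility pretests) can be sketched in a few lines. The seven mode names below are a guess at what the seven compare modes might be, and the function names and the 16-bit clamp are assumptions for illustration only.

```python
# Hypothetical names for seven compare modes; the source does not list them.
COMPARE = {
    "less": lambda new, old: new < old,
    "less_equal": lambda new, old: new <= old,
    "equal": lambda new, old: new == old,
    "not_equal": lambda new, old: new != old,
    "greater_equal": lambda new, old: new >= old,
    "greater": lambda new, old: new > old,
    "always": lambda new, old: True,
}


def depth_test(zbuffer, index, z, mode="less_equal", write=True, bias=0):
    """Test one fragment against a 16-bit z buffer, optionally storing its depth."""
    z = max(0, min(0xFFFF, z + bias))  # apply z bias, stay within 16 bits
    if not COMPARE[mode](z, zbuffer[index]):
        return False
    if write:
        zbuffer[index] = z
    return True


def count_visible(zbuffer, fragments, mode="less_equal"):
    """Count fragments that would pass the z test, without modifying the buffer.

    A nonzero count for an object's bounding box suggests the object itself
    may be visible and is worth drawing.
    """
    return sum(
        depth_test(zbuffer, index, z, mode, write=False) for index, z in fragments
    )


zbuffer = [0xFFFF] * 4                      # cleared to the far plane
depth_test(zbuffer, 0, 100)                 # passes and writes 100
print(count_visible(zbuffer, [(0, 50), (0, 200), (1, 300)]))  # 2
```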
Texture mapping
- 8888 and palettized formats
- Wrap and clamp border modes
- Texture sizes up to 4096x4096
- Subtexel-correct
- Perspective-correct
- Point, bilinear, and fast two- and four-point averaging filters
- Per-triangle mipmapping
- Mipmap level-of-detail bias
- Texture transforms
- Projected and unprojected screen texgen
- Projected textures, independently enabled and specified for each texture stage
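To make the difference between the wrap and clamp border modes and between point and bilinear filtering concrete, here is a small sketch on a single-channel texture. The texel-center convention (sample positions offset by half a texel) and all names are assumptions for this example; Pixomatic's exact sampling rules are not given here.

```python
import math


def wrap(i, size):
    """Wrap border mode: coordinates repeat across the texture."""
    return i % size


def clamp(i, size):
    """Clamp border mode: coordinates stick to the edge texels."""
    return max(0, min(size - 1, i))


def sample_point(texture, u, v, address=wrap):
    """Point sampling: return the texel that contains (u, v)."""
    height, width = len(texture), len(texture[0])
    x = address(math.floor(u * width), width)
    y = address(math.floor(v * height), height)
    return texture[y][x]


def sample_bilinear(texture, u, v, address=wrap):
    """Bilinear filtering: blend the four texels nearest to (u, v)."""
    height, width = len(texture), len(texture[0])
    x = u * width - 0.5            # assume texel centers sit at +0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(xi, yi):
        return texture[address(yi, height)][address(xi, width)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


checker = [[0, 100],
           [100, 200]]             # tiny 2x2 luminance texture
print(sample_point(checker, 1.25, 0.25, wrap))    # 0: u wraps back to column 0
print(sample_point(checker, 1.25, 0.25, clamp))   # 100: u sticks to column 1
print(sample_bilinear(checker, 0.5, 0.5))         # 100.0: average of all four
```

The bilinear path reads four texels and performs three blends per sample, which is why it costs noticeably more than point sampling in a software rasterizer.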
Primitives
- Trilists, tristrips, trifans, quadlists, polygons, pointsprites, linelists, and linestrips
- Pointsprites can be automatically scaled by depth
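For readers less familiar with the primitive types above, the following sketch shows how strips, fans, and quads are commonly expanded into plain triangle lists. The winding rule for strips (flipping every odd triangle) follows the usual Direct3D convention and is an assumption here, as are the function names.

```python
def tristrip_to_trilist(indices):
    """Expand a triangle strip: each new vertex forms a triangle with the previous two."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i:i + 3]
        if i % 2:
            a, b = b, a            # flip odd triangles to keep a consistent winding
        triangles.append((a, b, c))
    return triangles


def trifan_to_trilist(indices):
    """Expand a triangle fan: every triangle shares the first vertex."""
    return [
        (indices[0], indices[i], indices[i + 1])
        for i in range(1, len(indices) - 1)
    ]


def quadlist_to_trilist(indices):
    """Split each group of four vertices into two triangles."""
    triangles = []
    for i in range(0, len(indices) - 3, 4):
        a, b, c, d = indices[i:i + 4]
        triangles += [(a, b, c), (a, c, d)]
    return triangles


print(tristrip_to_trilist([0, 1, 2, 3, 4]))  # [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
print(trifan_to_trilist([0, 1, 2, 3]))       # [(0, 1, 2), (0, 2, 3)]
```

A strip or fan of n vertices yields n - 2 triangles, which is why these forms need fewer vertices per triangle than a trilist.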
Drawing interfaces
- Indexed streams, non-indexed streams, and begin/end primitive interfaces
- Up to 8 streams, all fully configurable
- Custom API designed for maximum efficiency while allowing easy porting from DX and OpenGL
Other 3D features
- Transformation, projection, and scaling to the viewport
- Per-triangle callback
- Per-vertex callback allows user-defined vertex shading, such as texgen and lighting
- Top-down or bottom-up screen coordinate system
- Backface culling
- Vertex fog (device space, camera space, and w camera space calculations supported)
- Dynamically-settable viewport
- Homogeneous clipping
- Wireframe
- Billboarded text, drawn through the full rasterization pipeline
- 4X bilinear filtered antialiasing
- 2X or 4X bilinear filtered or point sampled zoom
- Bilinear filtered or point sampled stretch blt
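The projection, viewport scaling, coordinate-system, and backface-culling items above fit together roughly as follows. This is a generic sketch: it assumes the vertex has already been transformed to clip space and clipped (so w is positive), and the counterclockwise-is-front rule and all names are assumptions rather than Pixomatic's documented behavior.

```python
def project_to_viewport(clip, width, height, top_down=True):
    """Perspective-divide a clip-space vertex and scale it to the viewport."""
    x, y, _z, w = clip
    ndc_x, ndc_y = x / w, y / w                 # now in the range -1..1
    screen_x = (ndc_x + 1) * 0.5 * width
    if top_down:
        screen_y = (1 - ndc_y) * 0.5 * height   # y grows downward
    else:
        screen_y = (ndc_y + 1) * 0.5 * height   # y grows upward
    return screen_x, screen_y


def signed_area2(p0, p1, p2):
    """Twice the signed area of a 2D triangle (positive for one winding order)."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


def is_culled(p0, p1, p2):
    """Cull zero-area triangles and those wound the 'wrong' way (assumed rule)."""
    return signed_area2(p0, p1, p2) <= 0


center = project_to_viewport((0.0, 0.0, 1.0, 1.0), 640, 480)
print(center)                                      # (320.0, 240.0)
print(is_culled((0, 0), (1, 0), (0, 1)))           # False
print(is_culled((0, 0), (0, 1), (1, 0)))           # True
```

Note that switching between top-down and bottom-up coordinates mirrors the image vertically, which also reverses the apparent winding of every triangle, so a real implementation has to keep the culling test consistent with the chosen coordinate system.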
Other features
- Optimized blt to front buffer, detecting whether GDI or DirectDraw is faster
- Support for 32-, 24-, and 16-bit front buffers, with dithering to 16-bit (565 and 1555)
- Fill and clear support
- Fills and blts can alpha-blend
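Converting the 32-bit pixels the pipeline produces to the 16-bit 565 and 1555 front-buffer formats mentioned above amounts to dropping low bits from each component. The sketch below shows the packing, plus a simple 2x2 ordered dither that spreads the truncation error over neighboring pixels. The dither matrix and offsets are assumptions chosen for illustration; the pattern Pixomatic actually uses is not described here.

```python
def pack_565(r, g, b):
    """Pack 8-bit RGB into 16 bits: 5 bits red, 6 bits green, 5 bits blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def pack_1555(r, g, b, a=255):
    """Pack 8-bit RGBA into 16 bits: 1 bit alpha, 5 bits per color component."""
    return ((a >> 7) << 15) | ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)


BAYER_2X2 = ((0, 2),
             (3, 1))   # threshold order for a 2x2 ordered dither


def dither_565(r, g, b, x, y):
    """Add a position-dependent offset before truncating to 565."""
    t = BAYER_2X2[y & 1][x & 1]
    r = min(255, r + t * 2)   # red drops 3 bits: offsets 0, 2, 4, 6
    g = min(255, g + t)       # green drops 2 bits: offsets 0, 1, 2, 3
    b = min(255, b + t * 2)   # blue drops 3 bits: offsets 0, 2, 4, 6
    return pack_565(r, g, b)


print(hex(pack_565(255, 0, 0)))        # 0xf800
print(hex(pack_1555(0, 0, 255, 0)))    # 0x1f
# A dark red of 4 falls between two 5-bit levels, so half of the pixels
# in each 2x2 block round up and half round down:
print([dither_565(4, 0, 0, x, y) >> 11 for y in (0, 1) for x in (0, 1)])  # [0, 1, 1, 0]
```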
Optimization
- The pixel-shading pipeline is optimized MMX code, compiled on the fly, with enregistration of virtually all interpolants and intermediate values
- The geometry and vertex-shading pipelines are MMX, SSE, and 3DNow optimized
- SSE and 3DNow are automatically used if present, but are not required
Performance
It's difficult to characterize rasterizer performance, because there are so
many possible configurations and data sets, but the following numbers provide
a ballpark idea of Pixomatic's performance.

Fill rate

| Processor | Fill rate |
|---|---|
| PIII/733 | 24.54 MPix/sec |
| Athlon/1.8 | 24.64 MPix/sec |
| P4/2.2 | 54.78 MPix/sec |
| P4/3.3 | 86.54 MPix/sec |

Fill rate test case: multitexture with two 256x256 textures, Gouraud shading, 16-bit z buffer with z compare and z write enabled, drawing one quad (two triangles) to fill a 640x480, 32-bpp window.

Triangle rate

| Processor | Triangle rate |
|---|---|
| PIII/733 | 1.39 MTri/sec |
| Athlon/1.8 | 3.17 MTri/sec |
| P4/2.2 | 3.16 MTri/sec |
| P4/3.3 | 4.86 MTri/sec |

Triangle rate test case: one 256x256 texture with texcoords generated by PIXO_TEXGEN_PROJECTED_XYZ1, Gouraud shading, 16-bit z buffer with z compare and z write enabled, drawing a model consisting of 77 indexed meshes that contain 27,296 triangles in total, 12,841 of which are front-facing with a non-zero area; culled triangles are counted for purposes of triangle rate calculations. Target window is 640x480, 32-bpp; bounding box of model covers about 5 percent of the window; only rendering time is counted, not the blt to the screen; no triangles are clipped.

Transform & project rate

| Processor | Transform & project rate |
|---|---|
| PIII/733 | 5.04 MTri/sec |
| Athlon/1.8 | 12.82 MTri/sec |
| P4/2.2 | 15.33 MTri/sec |
| P4/3.3 | 23.33 MTri/sec |

Transform & project rate test case: same as triangle rate test, except that the model is moved far enough away from the viewer so that all triangles are culled and no rasterization is performed.

Quake II timedemo at 640x480, 32-bpp, point sampling

| Processor | Frame rate |
|---|---|
| PIII/733 | 28.4 frames/second |
| Athlon/1.8 | 37.5 frames/second |
| P4/2.2 | 67.2 frames/second |
| P4/3.3 | 108.8 frames/second |

Includes the blt to the screen, as well as all drawing time.

Quake II timedemo at 640x480, 32-bpp, bilinear base filtering, fast two-point lightmap filtering

| Processor | Frame rate |
|---|---|
| PIII/733 | 20.3 frames/second |
| Athlon/1.8 | 26.7 frames/second |
| P4/2.2 | 47.4 frames/second |
| P4/3.3 | 75.1 frames/second |

Includes the blt to the screen, as well as all drawing time.

Quake II timedemo at 640x480, 32-bpp, full bilinear filtering

| Processor | Frame rate |
|---|---|
| PIII/733 | 17.7 frames/second |
| Athlon/1.8 | 25.1 frames/second |
| P4/2.2 | 40.4 frames/second |
| P4/3.3 | 63.5 frames/second |

Includes the blt to the screen, as well as all drawing time.

Platforms used for performance tests:
- PIII/733MHz: 830 MB/sec memory bandwidth and 2.2 GB/sec bandwidth to 256KB L2 cache
- Athlon/1.8GHz (3DNow and SSE): 1.3 GB/sec memory bandwidth and 3.5 GB/sec bandwidth to 256KB L2 cache
- P4/2.2GHz: 2.0 GB/sec memory bandwidth and 9.3 GB/sec bandwidth to 512KB L2 cache
- P4/3.3GHz: 3.0 GB/sec memory bandwidth and 13.9 GB/sec bandwidth to 512KB L2 cache
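One way to get a feel for the fill rate figures above is to convert them into full-window fills per second for the 640x480 test window. The short calculation below does only that arithmetic; it is a rough upper bound on frame rate for this specific test configuration, since real scenes add overdraw, geometry work, and the blt to the screen.

```python
# Fill rates quoted in the performance section, in millions of pixels per second.
FILL_RATE_MPIX = {
    "PIII/733": 24.54,
    "Athlon/1.8": 24.64,
    "P4/2.2": 54.78,
    "P4/3.3": 86.54,
}

WINDOW_PIXELS = 640 * 480   # 307,200 pixels in the test window


def full_window_fills_per_second(mpix_per_second):
    """How many times per second the whole test window could be filled."""
    return mpix_per_second * 1_000_000 / WINDOW_PIXELS


for cpu, rate in FILL_RATE_MPIX.items():
    print(f"{cpu}: {full_window_fills_per_second(rate):.1f} fills/sec")
```

For example, 24.54 MPix/sec works out to roughly 80 full-window fills per second, and 86.54 MPix/sec to roughly 282.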
Size
- The Pixomatic DLL is 255KB in size. Pixomatic additionally performs one 4KB allocation; all other memory, such as pixel buffers, z buffers, and textures, is allocated by the calling application.
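Because the calling application allocates the pixel buffers, z buffers, and textures, it can help to estimate their sizes up front. The sketch below assumes tightly packed rows with no padding or alignment, which is a simplification; the helper name is hypothetical and actual layout requirements are not stated here.

```python
def buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of a tightly packed width x height buffer."""
    return width * height * bits_per_pixel // 8


color = buffer_bytes(640, 480, 32)      # 32-bpp pixel buffer
depth = buffer_bytes(640, 480, 16)      # 16-bit z buffer
texture = buffer_bytes(256, 256, 32)    # one 8888 texture, top mip level only

print(color, depth, texture)            # 1228800 614400 262144
```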
Requirements
- An x86-compatible processor with MMX
- Microsoft Windows or Linux
DX-compatible wrapper
In addition to the native API, a DX9-compatible wrapper is available for Pixomatic. This wrapper supports a broad subset of Pixomatic’s functionality, including two textures, three stages, mipmapping, full sets of stage ops and frame buffer blends, and much more. However, since Pixomatic does not support Vertex Shaders, Pixel Shaders, Cube Maps, or DXT textures, the wrapper does not support those features.
The wrapper has been tested against the D3D samples and in several games and other 3D applications (Medal of Honor, Backyard Baseball 2005, Backyard Hockey, Backyard Skateboarding, Logitech and more) but it has not been run through WHQL tests. If you do run into any implementation/compatibility issues or bugs, please contact mitchs@radgametools.com.
The wrapper does not require any versions of DX to be installed, since it can use GDI to blt to the screen. Alternatively, if DDraw3 is available, the wrapper can use that to lock the primary buffer, in which case it tries both GDI blts and direct copying to the primary buffer, and uses whichever is faster.
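The "try both and use whichever is faster" approach described above can be sketched generically: time each candidate copy routine on a sample frame and keep the winner. Everything in this example is a stand-in (the strategy names, the trial count, and the copy functions are hypothetical); it illustrates the selection logic only and does not call GDI or DirectDraw.

```python
import time


def pick_faster(strategies, frame, trials=3, clock=time.perf_counter):
    """Time each blit strategy on the same frame and return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, blit in strategies.items():
        start = clock()
        for _ in range(trials):
            blit(frame)
        elapsed = clock() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


# Stand-in copy routines; a real program would wrap its actual blt paths.
def copy_whole(frame):
    return bytes(frame)


def copy_by_rows(frame, row_bytes=640 * 4):
    return b"".join(frame[i:i + row_bytes] for i in range(0, len(frame), row_bytes))


frame = bytearray(640 * 480 * 4)
choice = pick_faster({"whole": copy_whole, "rows": copy_by_rows}, frame)
print("using", choice)
```

Passing the clock in as a parameter keeps the selection logic testable without depending on real timings.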
Typically, using the wrapper will require adding/changing about four lines of code, in order to implement the following:
1. Add a prototype for the Pixomatic Direct3D Wrapper create function: PixoDirectCreate9().
2. Replace the Direct3DCreate9() call with PixoDirectCreate9().
3. Link with dx9pixo.lib or dx9pixo_debug.lib.
The PixoDirect3DCreate9() call returns an IDirect3D9 interface pointer, which you should then be able to use just like the one returned by the standard Direct3DCreate9() call.
There is also a Direct3D 8.1 wrapper.
Pixomatic features that are not exposed through the wrapper can be accessed either by calling Pixomatic directly or, in some cases, by making slight modifications to the wrapper, source for which is available. The most significant such case is that while the Pixomatic software renderer supports bilinear texture filtering, the DX texture filter state is ignored and point filtering is always used in the DX-compatible wrapper. This is because most games just turn on bilinear filtering and leave it on, which would often result in unacceptable performance levels for Pixomatic, especially on low-end machines. If you do want to run Pixomatic with bilinear filtering enabled, you can simply call Pixomatic directly to enable bilinear when desired, or you can add code to the wrapper to turn on bilinear (or 2- or 4-point) filtering as needed.