Jul 11, 2022


The X Protocol team is using a new technique, Tiny CUDA Neural Networks, to build metaverse scenes. The team also expects this technique to reduce in-game lag.

What is Tiny CUDA Neural Networks?

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast “fully fused” multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers.


Fully fused networks vs. TensorFlow v2.5.0 w/ XLA. Measured on 64 (solid line) and 128 (dashed line) neurons wide multi-layer perceptrons on an RTX 3090. Generated by benchmarks/bench_ours.cu and benchmarks/bench_tensorflow.py using data/config_oneblob.json.
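For readers new to multi-layer perceptrons, the computation being benchmarked can be sketched as a plain CPU reference. This is only an illustration of what such a network computes, not how the framework implements it: the `DenseLayer` struct and `mlp_forward` function are hypothetical names, the layers are assumed to have no bias term, and nothing here is fused or run on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dense layer for this sketch: `weights` is row-major with
// shape (n_out, n_in). Bias terms are omitted (an assumption of the sketch).
struct DenseLayer {
    std::size_t n_in;
    std::size_t n_out;
    std::vector<float> weights;
};

// Reference forward pass: ReLU after every hidden layer, while the last
// layer is left linear (comparable to an output activation of "None").
std::vector<float> mlp_forward(const std::vector<DenseLayer>& layers,
                               std::vector<float> x) {
    for (std::size_t l = 0; l < layers.size(); ++l) {
        const DenseLayer& layer = layers[l];
        assert(x.size() == layer.n_in);
        assert(layer.weights.size() == layer.n_in * layer.n_out);

        const bool is_last = (l + 1 == layers.size());
        std::vector<float> y(layer.n_out, 0.0f);
        for (std::size_t o = 0; o < layer.n_out; ++o) {
            // Dot product of one weight row with the layer input.
            for (std::size_t i = 0; i < layer.n_in; ++i) {
                y[o] += layer.weights[o * layer.n_in + i] * x[i];
            }
            // ReLU on hidden layers only.
            if (!is_last && y[o] < 0.0f) {
                y[o] = 0.0f;
            }
        }
        x = std::move(y);
    }
    return x;
}
```

A network "64 neurons wide" in this picture is one whose hidden layers each have `n_out` equal to 64. A fully fused implementation evaluates the same arithmetic, but is organized to keep intermediate results on the GPU chip between layers.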


Tiny CUDA Neural Networks has a simple C++/CUDA API:

    #include <iostream>

    #include <tiny-cuda-nn/common.h>

    // Configure the model
    nlohmann::json config = {
        {"loss", {
            {"otype", "L2"}
        }},
        {"optimizer", {
            {"otype", "Adam"},
            {"learning_rate", 1e-3},
        }},
        {"encoding", {
            {"otype", "HashGrid"},
            {"n_levels", 16},
            {"n_features_per_level", 2},
            {"log2_hashmap_size", 19},
            {"base_resolution", 16},
            {"per_level_scale", 2.0},
        }},
        {"network", {
            {"otype", "FullyFusedMLP"},
            {"activation", "ReLU"},
            {"output_activation", "None"},
            {"n_neurons", 64},
            {"n_hidden_layers", 2},
        }},
    };

    using namespace tcnn;

    auto model = create_from_config(n_input_dims, n_output_dims, config);

    // Train the model
    GPUMatrix<float> training_batch_inputs(n_input_dims, batch_size);
    GPUMatrix<float> training_batch_targets(n_output_dims, batch_size);

    for (int i = 0; i < n_training_steps; ++i) {
        generate_training_batch(&training_batch_inputs, &training_batch_targets); // ← your code

        float loss;
        model.trainer->training_step(training_batch_inputs, training_batch_targets, &loss);
        std::cout << "iteration=" << i << " loss=" << loss << std::endl;
    }

    // Use the model
    GPUMatrix<float> inference_inputs(n_input_dims, batch_size);
    generate_inputs(&inference_inputs); // ← your code

    GPUMatrix<float> inference_outputs(n_output_dims, batch_size);
    model.network->inference(inference_inputs, inference_outputs);
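To make the "encoding" settings more concrete, here is a minimal CPU-side sketch of the arithmetic they imply. It assumes the level-resolution rule N_l = floor(N_min * b^l) and the XOR-of-primes spatial hash described in the multiresolution hash encoding paper. The `HashGridConfig` struct and the function names are hypothetical and exist only for this illustration; the framework's actual CUDA kernels may differ in details such as rounding and how coarse levels are indexed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the "encoding" block of the configuration.
struct HashGridConfig {
    std::uint32_t n_levels = 16;
    std::uint32_t n_features_per_level = 2;
    std::uint32_t log2_hashmap_size = 19;
    std::uint32_t base_resolution = 16;
    double per_level_scale = 2.0;
};

// Grid resolution of every level, assuming N_l = floor(N_min * b^l).
std::vector<std::uint64_t> level_resolutions(const HashGridConfig& c) {
    std::vector<std::uint64_t> resolutions;
    for (std::uint32_t l = 0; l < c.n_levels; ++l) {
        const double r = c.base_resolution *
                         std::pow(c.per_level_scale, static_cast<double>(l));
        resolutions.push_back(static_cast<std::uint64_t>(std::floor(r)));
    }
    return resolutions;
}

// Number of values the encoding produces per input point:
// one feature vector per level, concatenated.
std::uint32_t encoded_dims(const HashGridConfig& c) {
    return c.n_levels * c.n_features_per_level;
}

// Spatial hash in the style of the paper: XOR of the integer grid
// coordinates multiplied by per-dimension primes (the first one is 1),
// reduced to a table of 2^log2_size entries.
std::uint32_t spatial_hash(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                           std::uint32_t log2_size) {
    assert(log2_size > 0 && log2_size < 32);
    const std::uint32_t h = x ^ (y * 2654435761u) ^ (z * 805459861u);
    return h & ((1u << log2_size) - 1u);  // modulo a power-of-two table size
}
```

Under these assumptions, the configuration shown above yields levels ranging from a resolution of 16 up to 524288, each level's hash table holds at most 2^19 entries of 2 features, and the network receives 16 * 2 = 32 encoded values per input point.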

Example: learning a 2D image

The framework includes a sample application that learns an image function (x,y) -> (R,G,B). It can be run via

tiny-cuda-nn/build$ ./mlp_learning_an_image ../data/images/albert.jpg ../data/config_hash.json

producing an image every 1000 training steps. Each 1000 steps should take roughly 0.42 seconds with the default configuration on an RTX 3090.
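To illustrate what learning (x,y) -> (R,G,B) means in terms of training data, the following sketch builds one batch on the host: random coordinates in the unit square as inputs and the colour of the nearest pixel as targets. It is an assumption-laden illustration, not the sample application's code. `Image`, `Batch`, and `sample_training_batch` are hypothetical names, the buffers are assumed to be column-major with one sample per column, and the upload to GPU memory is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical host-side image: row-major, 3 floats (R, G, B) per pixel.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> rgb;  // size = width * height * 3
};

// Column-major batch: each sample occupies one column.
struct Batch {
    std::vector<float> inputs;   // 2 x batch_size, (x, y) per sample
    std::vector<float> targets;  // 3 x batch_size, (R, G, B) per sample
};

// Draws random positions in the unit square and looks up the nearest pixel.
Batch sample_training_batch(const Image& img, std::size_t batch_size,
                            std::mt19937& rng) {
    assert(img.width > 0 && img.height > 0);
    assert(img.rgb.size() == img.width * img.height * 3);

    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    Batch batch;
    batch.inputs.resize(2 * batch_size);
    batch.targets.resize(3 * batch_size);

    for (std::size_t i = 0; i < batch_size; ++i) {
        const float x = uniform(rng);
        const float y = uniform(rng);

        // Map [0, 1] to a pixel index, clamped to stay inside the image.
        const std::size_t px = std::min(
            static_cast<std::size_t>(x * img.width), img.width - 1);
        const std::size_t py = std::min(
            static_cast<std::size_t>(y * img.height), img.height - 1);
        const std::size_t pixel = (py * img.width + px) * 3;

        batch.inputs[2 * i + 0] = x;
        batch.inputs[2 * i + 1] = y;
        for (std::size_t ch = 0; ch < 3; ++ch) {
            batch.targets[3 * i + ch] = img.rgb[pixel + ch];
        }
    }
    return batch;
}
```

A function like this would play the role of the `generate_training_batch` placeholder in the API example once its output is copied to the GPU. A real application might interpolate between neighbouring pixels instead of taking the nearest one.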

With this technique, all nodes (users) can upload photos, and the algorithm will keep learning from them and combine them into a new 3D object. For example, the photos can show a building from different angles, such as the front, back, sides, top, and bottom, and a complete three-dimensional building will eventually be obtained.

The rapid development of artificial intelligence technology supports this research. With neural networks at the core of that technology, the X Metaverse scenes will be continuously refined and evolved, and all users will be able to take part in the building process.

Stay tuned! We believe that this technique will bring you a better Metaverse experience!

