Santa also happened to pass by the Techlabs this winter, and we are very happy to announce two new high-performance GPUs that are ready to go on Nebula.
The RTX Pro 6000 GPUs support:
– Larger models, such as GPT-OSS:120B;
– More models in parallel: up to five small models running simultaneously on a single GPU;
– Better inference performance: average throughput goes from around 30 tokens per second to 100 tokens per second.
Curious? Log in to Nebula and try the models on the new GPUs by selecting the ones with “FAST” in their name.