Moore Threads Unveils S4000 GPU: A Major Improvement over Previous Models

Moore Threads has revealed its latest GPU, the S4000, which marks a significant improvement over its previous S2000 and S3000 models. The S4000 boasts superior performance, more memory capacity, and increased memory bandwidth. It also features the second generation of Moore Threads' Unified System Architecture (MUSA), unlike its predecessors. While the S4000 may not match up to Nvidia's latest offerings, it excels in memory capacity and bandwidth, making it suitable for AI and large language model workloads.


Improved Performance and Architecture

Moore Threads has introduced its latest GPU, the S4000, which represents a major improvement over its previous S2000 and S3000 models. While specific details about the S4000 are still limited, the company has revealed that it offers over twice the FP32 performance and five times the INT8 performance of the S2000. The S4000 also carries 50% more VRAM and higher memory bandwidth, and it is based on the second generation of the Moore Threads Unified System Architecture (MUSA), a significant leap forward from the first generation used in the S2000/S3000.

It is worth noting that, despite being presented as first-generation MUSA, the S2000/S3000 may actually be second generation, since Moore Threads refers to the S4000 as its third-generation model. More information is needed to confirm this.

Comparison to Nvidia Models

In terms of performance, the S4000 outperforms Nvidia's 2018-era GPUs but lags behind the Ampere and Ada Lovelace generations, released in 2020 and 2022 respectively. While the S4000 may be lacking in raw horsepower, it makes up for it with ample memory capacity and bandwidth. These features are particularly beneficial for AI and large language model workloads, which is where Moore Threads intends its flagship GPU to excel.

GPU-to-GPU Data Capabilities and KUAE Intelligent Computing Center

Aside from the S4000 GPU, Moore Threads has also unveiled its KUAE Intelligent Computing Center. Described as a "full-stack solution integrating software and hardware," the center is built around the S4000. KUAE clusters are composed of MCCX D800 GPU servers, each housing eight S4000 cards, and provide direct GPU-to-GPU data transfers along with RDMA support. While the S4000's 240 GB/s data link falls short of NVLink's bandwidth, it is sufficient for a GPU in this performance class.

Furthermore, Moore Threads claims that KUAE supports mainstream large language models such as GPT and frameworks like DeepSpeed. In addition, the company's MUSIFY tool lets software written for Nvidia's CUDA ecosystem run on the S4000, eliminating the need to develop new software from scratch.