Welcome to Hitech Semiconductor Co., Limited

BittWare GroqCard™ Accelerator

2024-02-20

The BittWare GroqCard™ Accelerator is a double-width PCIe ML accelerator designed for easy integration. The GroqWare™ suite takes a software-defined hardware approach, providing straightforward deployment paths for deep learning models trained in PyTorch or TensorFlow or exported to ONNX.

The BittWare GroqCard Accelerator scales through nine RealScale™ chip-to-chip connections, which allow multiple cards to be deployed as efficiently as one. In addition, an internal software-defined network delivers predictable, repeatable performance with no run-to-run variation.

GROQCHIP™ PROCESSOR

The fully deterministic GroqChip processor is the core of the card's scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip reduces data movement to deliver predictable, low-latency performance without bottlenecks. This standalone chip provides flexible integration into compute-intensive applications.

The architecture is much simpler than that of a GPU and is designed with a software-first focus, which makes it easier to program and gives it predictable performance with lower latency.

GROQWARE™ SUITE

GroqWare Suite is a comprehensive and versatile software stack designed to accelerate a variety of HPC and ML workloads. Composed of the Groq™ Compiler, Groq API, and utilities, the suite eases deployment with an open-source driver/runtime and support for industry-standard AI/ML frameworks.

The GroqFlow™ Tool Chain (included in the GroqWare Suite) lets a single line of PyTorch or TensorFlow code import an existing model and transform it, through a fully automated toolchain, to run on Groq hardware.

FEATURES

  • Fully deterministic processor
    Predictable and repeatable performance with no run-to-run variation
  • End-to-end on-chip protection
    Improves uptime and reliability with error-correction code (ECC) protection throughout the entire GroqChip™ data path
  • 230MB of on-die memory
    Large globally sharable SRAM for high-bandwidth, low-latency access to model parameters without the need for external memory
  • 9x RealScale chip-to-chip connectors
    Near-linear multi-server and multi-rack scalability without the need for external switches
  • Up to 80TB/s on-die memory bandwidth
    Massive concurrency and data parallelism for bandwidth-sensitive applications
  • PCIe Gen4 x16 interface
    Up to 31.5GB/s of bi-directional bandwidth in an industry-standard interface for fast device and network connections
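
The 31.5GB/s figure quoted for the PCIe Gen4 x16 interface can be reproduced from the link parameters. The sketch below assumes the standard PCIe 4.0 values of 16 GT/s per lane and 128b/130b line encoding, which are not stated on this page; the function name is illustrative only. The result is the raw rate in one direction after line encoding and ignores packet and protocol overhead, so real transfers will be somewhat slower.

```python
# Back-of-the-envelope PCIe link bandwidth.
# Assumed (not from this page): PCIe 4.0 signals at 16 GT/s per lane
# and uses 128b/130b line encoding.
GEN4_GT_PER_S = 16.0
ENCODING_EFFICIENCY = 128 / 130
LANES = 16


def pcie_bandwidth_gb_per_s(gt_per_s, lanes, efficiency):
    """Return raw one-direction bandwidth in GB/s (10^9 bytes per second)."""
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    bw = pcie_bandwidth_gb_per_s(GEN4_GT_PER_S, LANES, ENCODING_EFFICIENCY)
    print(f"PCIe Gen4 x16: {bw:.1f} GB/s per direction")
```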

APPLICATIONS

  • Financial
  • Science and government
  • Generative AI
  • Industrial
  • Oil and gas

SPECIFICATIONS

  • Dual-width, full-height, 3/4-length PCI Express Gen4 x16 adapter form factor
  • Performance of up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900MHz
  • Memory
    • 230MB SRAM per chip
    • Up to 80TB/s on-die memory bandwidth
  • Chip scaling up to 9x RealScale chip-to-chip connectors
  • Numerics
    • INT8, INT16, INT32 and TruePoint™ technology
    • MXM: FP32
    • VXM: FP16, FP32
  • Power - Max: 375W; TDP: 275W; Typical: 240W
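
As a rough illustration of what 230MB of SRAM per chip means for model size, the sketch below estimates how many parameters fit on one chip in a given numeric format and the minimum number of chips needed to hold a model's weights. It assumes decimal megabytes and counts weights only; activations, instructions, and compiler placement are ignored, so the results are lower bounds rather than sizing guidance. The function names are hypothetical and are not part of the GroqWare Suite.

```python
# Lower-bound estimate of on-chip weight capacity.
# 230MB per chip is from the specification list; treating MB as 10^6 bytes
# is an assumption.
SRAM_BYTES_PER_CHIP = 230 * 10**6

# Storage size per parameter for a few common formats.
BYTES_PER_PARAM = {"INT8": 1, "FP16": 2, "FP32": 4}


def params_per_chip(fmt):
    """Maximum number of parameters that fit in one chip's SRAM."""
    return SRAM_BYTES_PER_CHIP // BYTES_PER_PARAM[fmt]


def min_chips_for_model(num_params, fmt):
    """Minimum chips needed to hold the weights alone (ceiling division)."""
    total_bytes = num_params * BYTES_PER_PARAM[fmt]
    return -(-total_bytes // SRAM_BYTES_PER_CHIP)


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: up to {params_per_chip(fmt):,} parameters per chip")
    print("1B-parameter FP16 model:",
          min_chips_for_model(1_000_000_000, "FP16"), "chips minimum")
```

Under these assumptions a model with one billion FP16 parameters needs at least nine chips for its weights, which is where the chip-to-chip connections come into play.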

GROQCHIP OVERVIEW

Block Diagram - BittWare GroqCard™ Accelerator