TSMC and Graphcore Prepare for AI Acceleration on 3nm

One of the side announcements made during TSMC’s Technology Symposium was that it already has customers on hand, with product development progressing, for its future 3nm process node technology. As we’ve reported previously, TSMC is developing its 3nm node for risk production next year and high volume manufacturing in the second half of 2022, so at this time TSMC’s lead partners are already developing their future silicon on the initial versions of the 3nm PDKs.

One company highlighted during TSMC’s presentations was Graphcore. Graphcore is an AI silicon company that makes the IPU, an ‘Intelligence Processing Unit’, to accelerate ‘machine intelligence’. It recently announced its second generation Colossus Mk2 IPU, built on TSMC’s N7 manufacturing process and featuring 59.4 billion transistors. The Mk2 has an effective core count of 1472 cores, which can run ~9000 threads for 250 Teraflops of FP16 AI training compute. The company puts four of these chips together in a single 1U system to enable 1 Petaflop, along with 450 GB of memory and a custom low-latency fabric design between the IPUs.
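For a sense of how the quoted per-chip figures roll up into the 1U system numbers, here is a quick back-of-the-envelope sketch (the variable names are ours; the figures are only those quoted above):

```python
# Back-of-the-envelope roll-up of the 1U figures quoted above.
CHIPS_PER_1U = 4            # four Colossus Mk2 IPUs per 1U system
FP16_TFLOPS_PER_CHIP = 250  # quoted FP16 AI compute per IPU
CORES_PER_CHIP = 1472
THREADS_PER_CHIP = 9000     # approximate hardware thread count per IPU

petaflops_per_1u = CHIPS_PER_1U * FP16_TFLOPS_PER_CHIP / 1000
total_cores = CHIPS_PER_1U * CORES_PER_CHIP
total_threads = CHIPS_PER_1U * THREADS_PER_CHIP

print(f"FP16 compute per 1U: {petaflops_per_1u:.0f} PFLOPS")     # -> 1 PFLOPS
print(f"Cores per 1U: {total_cores}, threads: ~{total_threads}")  # -> 5888 cores, ~36000 threads
```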

A future generation of products from Graphcore, according to the TSMC presentation, is set to be developed with the TSMC 3nm process in mind, skipping TSMC’s 5nm. No exact timescale was presented, nor any indication of Graphcore’s strategy. As we can see from the slide, the Colossus IPU line involves big, high-transistor-count chips, using the extra transistor budget afforded by the denser process node.

We reached out to Graphcore for a statement, and received the following:

Nigel Toon, CEO & co-founder at Graphcore said: “Graphcore was first to build a completely new kind of fully programmable processor, designed from the ground up for machine intelligence. Many of the innovative features of our IPU architecture and the high yields we see even at the cutting edge of the latest process node, are testament to the close technology partnership we enjoy with TSMC. With 59.4Bn transistors, and built using the latest TSMC 7nm technology, the MK2 IPU, which we announced in July, is the world’s most sophisticated processor. Each GC200 IPU has 1472 independent processor cores and an unprecedented 900MB of In-Processor memory delivering an 8x step up in real world performance vs. our MK1 products. We continue to work closely with TSMC as one of their technology innovation partners to explore the advantages of new process nodes and techniques, including N3, so we can continue to deliver more performance improvements to enable our customers to make new breakthroughs in AI.”

PCIe Accelerator with two IPUs

As it stands, Graphcore has a number of products built on its Mk1 and Mk2 IPUs, including systems in partnership with Dell. Graphcore went through an extended Series D funding round in Q1 2020, raising $450 million and valuing the company at $1.95 billion, with investors including BMW, Microsoft, the CEO of DeepMind, and a number of VC firms. According to TechCrunch, which reported this in February, the company still has $300m in cash reserves. As the cost of developing new silicon on the latest manufacturing node increases, it will be interesting to see at what point Graphcore places an order for TSMC’s 3nm, or whether TSMC and Graphcore are working together to optimize the process for large-scale chips, with TSMC bearing some of that cost.

Related Reading

HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k

It was recently announced that the Fugaku supercomputer, located at Riken in Japan, has scored the #1 position on the TOP500 supercomputer list, as well as #1 positions in a number of key supercomputer benchmarks. At the heart of Fugaku isn’t any standard x86 processor, but one based on Arm – specifically, the A64FX 48+4-core processor, which uses Arm’s Scalable Vector Extensions (SVE) to enable high-throughput FP64 compute. At 435 PetaFLOPs and 7.3 million cores, Fugaku beat the former #1 system by 2.8x in performance. Fugaku has already been used for COVID-19 related research, such as modelling infection tracking rates and virus spread in liquid droplet dispersion.

The Fujitsu A64FX card is a unique piece of kit, offering 48 compute cores and 4 control cores, each with monumental bandwidth to keep the 512-bit wide SVE units fed. The chip runs at 2.2 GHz, and can operate in FP64, FP32, FP16 and INT8 modes for a variety of AI applications. There is 1 TB/sec of bandwidth from the 32 GB of HBM2 on each card, and because there are four control cores per chip, it runs by itself without needing an external host in a host/device arrangement.
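Taken at face value, those figures put per-card peak throughput in the low-TFLOPS range. Below is a minimal sketch of that arithmetic, assuming two 512-bit SVE fused multiply-add pipelines per compute core and simple lane scaling across data types (assumptions on our part, not details from the announcement):

```python
# Rough peak-FLOPS estimate for a single A64FX at 2.2 GHz.
# Assumes two 512-bit SVE FMA pipelines per compute core; only the
# 48 compute cores are counted, not the 4 control cores.
COMPUTE_CORES = 48
CLOCK_GHZ = 2.2
SVE_WIDTH_BITS = 512
FMA_PIPES_PER_CORE = 2      # assumption for this sketch
FLOPS_PER_FMA = 2           # one multiply + one add per lane

def peak_tflops(bits_per_element: int) -> float:
    lanes = SVE_WIDTH_BITS // bits_per_element
    flops_per_cycle = lanes * FMA_PIPES_PER_CORE * FLOPS_PER_FMA
    return COMPUTE_CORES * CLOCK_GHZ * flops_per_cycle / 1000

print(f"FP64 peak: ~{peak_tflops(64):.1f} TFLOPS")   # ~3.4 TFLOPS
print(f"FP32 peak: ~{peak_tflops(32):.1f} TFLOPS")   # ~6.8 TFLOPS
print(f"FP16 peak: ~{peak_tflops(16):.1f} TFLOPS")   # ~13.5 TFLOPS
```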

It was never clear if the A64FX module would be available on a wider scale beyond supercomputer sales, however today’s announcement confirms that it is, with the Japan-based HPC Systems set to offer a Fujitsu PrimeHPC FX700 server that contains up to eight A64FX nodes (at 1.8 GHz) within a 2U form factor. Each node is paired with 512 GB of SSD storage and gigabit Ethernet capabilities, with room for expansion (InfiniBand EDR, etc.). The current deal at HPC Systems is for a 2-node implementation, at a price of ¥4,155,330 (~$39,000 USD), with the deal running to the end of the year.
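For reference, the simple arithmetic behind the headline figure, using only the numbers quoted above (the implied exchange rate is whatever falls out of the quoted conversion):

```python
# Per-node pricing for the 2-node FX700 deal, using the figures quoted above.
NODES_IN_DEAL = 2
PRICE_JPY = 4_155_330
PRICE_USD_APPROX = 39_000   # approximate USD conversion quoted above

price_per_node_jpy = PRICE_JPY / NODES_IN_DEAL
implied_jpy_per_usd = PRICE_JPY / PRICE_USD_APPROX

print(f"Per node: ¥{price_per_node_jpy:,.0f} (~${PRICE_USD_APPROX / NODES_IN_DEAL:,.0f} USD)")
print(f"Implied exchange rate: ~{implied_jpy_per_usd:.0f} JPY/USD")
```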

The A64FX card already has listed support for the quantum chemical calculation software Gaussian16, the molecular dynamics software AMBER, and the non-linear structural analysis software LS-DYNA. Other commercial packages in the structure and fluid analysis fields will be coming on board in due course. There is also Fujitsu’s Software Compiler Package v1.0 to enable developers to build their own software.

Source: HPC Systems, PDF Flyer

Related Reading
