The Semiconductor Roadmap Embraces Innovative Thermal Management Technologies

15 Jul, 2022 Source: Data Center Frontier

In this edition of Voices of the Industry, Paul Hofemann, Chief Strategy Officer of JetCool, outlines the semiconductor roadmap progression and explores critical thermal challenges that the industry must address.

Paul Hofemann, Chief Strategy Officer, JetCool Technologies

Today we are witnessing another great transformation in semiconductor technology. The recent emergence of Heterogeneous System Architectures (HSA), which integrate different chiplets into a single package, is picking up the slack of a slowing Moore’s Law. Semiconductor performance improvements historically relied on increasingly difficult nanoscale shrinks of the integrated circuit’s (IC) architecture. New Heterogeneous Integration (HI) schemes, often referred to as ‘more than Moore,’ offer an alternative path to increased device transistor density, fortifying the semiconductor industry’s ability to continue advances in performance, cost, and form factor. But transitioning to HI does not eliminate all scaling roadblocks. One familiar challenge rapidly becoming a gating item is heat. This article reviews the semiconductor roadmap progression and explores the critical thermal challenges that must be addressed before HI innovation can reach its full potential.

Semiconductor Roadmaps: Our History

The semiconductor industry has gone through multiple waves of innovation over the last 60 years guided by a roadmap that has no destination but helps orchestrate perennial improvements in device performance, costs, and form factors (Figure 1). Before the turn of the century, the industry roadmap focused on incremental improvements in manufacturing equipment, materials, and process steps for chip fabrication. The United States was the initial custodian of a National Technology Roadmap for Semiconductors (NTRS) until the global International Technology Roadmap for Semiconductors (ITRS) was created in the late 1990s. These first roadmaps mapped themselves to Moore’s Law to set the pace of technology innovation that ensured a viable industry.

Figure 1: Semiconductor Industry Transformation Graphic; Source: DARPA Microelectronics Technology Office, 2020

The photolithography process in semiconductor fabrication is responsible for defining the smallest feature patterns on a silicon wafer. Each new technology node would usually shrink linear dimensions by 30%, roughly doubling the number of transistors per unit area. It didn’t take long before the minimum feature size was smaller than the wavelength of light used to image it. By 2010, even an advanced 193-nanometer wavelength lithography system, with a $30M price tag, struggled to keep up with the ITRS roadmap and needed clever, albeit costly, multiple-patterning schemes to support new semiconductor technology nodes. Moore’s Law reached an economic end for most semiconductor manufacturers before the 10nm tech node. Only a handful of companies (such as Samsung, TSMC, SK Hynix, Micron, and Intel) can afford the latest EUV lithography tools, at more than $150M each, and continue pushing toward the ~1-nanometer physical limit expected early next decade. Solving these nanoscale patterning issues does not guarantee that the semiconductor industry’s perennial performance improvements will be realized. Thermal management of increasingly power-hungry chips will become a gating challenge.
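
The shrink arithmetic above can be checked with a short Python sketch. The function name `density_gain` is hypothetical and chosen only for illustration; the calculation assumes transistor area scales with the square of the linear feature size and nothing else changes.

```python
def density_gain(linear_shrink: float) -> float:
    """Return the transistor-density multiplier for a given linear shrink.

    linear_shrink is the fractional reduction in feature size per node
    (0.30 means features become 70% of their previous length).
    """
    if not 0.0 <= linear_shrink < 1.0:
        raise ValueError("linear_shrink must be in [0, 1)")
    scale = 1.0 - linear_shrink   # new length / old length
    area_scale = scale ** 2       # area shrinks with the square of length
    return 1.0 / area_scale       # devices per unit area, relative to before


if __name__ == "__main__":
    gain = density_gain(0.30)
    print(f"30% linear shrink -> {gain:.2f}x transistors per unit area")
```

A 30% linear shrink leaves each device with 0.7 × 0.7 = 0.49 of its former area, which is where the familiar doubling per node comes from.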

Dennard Scaling is a corollary to Moore’s Law that states that, as device dimensions shrink, each transistor consumes less power and runs at a faster clock rate. Packing more transistors into a chip resulted in better performance at the same power density. Unfortunately, around 2005, Dennard Scaling broke down when minimum transistor voltage thresholds were reached and leakage currents became more significant. This has caused undesirable power density increases with each new Moore’s Law transistor shrink. These rising power densities are exacerbated by high-performance AI compute chips, making thermal management a top design constraint for next-generation semiconductors.
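
A rough way to see why the breakdown matters is the textbook dynamic-power relation P ∝ C·V²·f. The Python sketch below is an idealized illustration, not a device model: the function name `scaled_power_density` and the scaling factor `k` are assumptions introduced here, leakage is ignored, and clock frequency is assumed to keep scaling even when voltage does not.

```python
def scaled_power_density(k: float, voltage_scales: bool) -> float:
    """Relative power density after shrinking linear dimensions by 1/k.

    Uses the dynamic-power relation P ~ C * V**2 * f per transistor.
    Capacitance scales as 1/k and frequency as k; supply voltage scales
    as 1/k only while Dennard scaling holds. Leakage is ignored.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    capacitance = 1.0 / k
    frequency = k
    voltage = 1.0 / k if voltage_scales else 1.0
    power_per_transistor = capacitance * voltage ** 2 * frequency
    area_per_transistor = 1.0 / k ** 2
    return power_per_transistor / area_per_transistor


if __name__ == "__main__":
    k = 1.4  # hypothetical per-node scaling factor (about a 30% linear shrink)
    print("Dennard scaling holds:", scaled_power_density(k, voltage_scales=True))
    print("Voltage stuck        :", scaled_power_density(k, voltage_scales=False))
```

With voltage scaling, power per transistor falls as fast as area does, so power density stays flat. With voltage held fixed, the same shrink raises power density by about k², which is the trend the paragraph above describes.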

International Roadmap for Devices and Systems (IRDS): Our Future

With the slowdown of transistor scaling and the paradigm shift to HSA designs, the ITRS roadmap became insufficient to provide industry direction. In 2015 it was replaced by today’s International Roadmap for Devices and Systems (IRDS). This new roadmap looks beyond traditional planar die scaling to satisfy the exploding demand for modern applications in AI, HPC, IoT, and 5G/6G. Future improvements will come from beyond single-chip advances as the heterogeneous integration of chiplets, sensors, etc., into a system-level package (SiP) takes center stage. This point is driven home when you consider that today’s semiconductor chips may have as many as 50 billion transistors, but an advanced HSA device can provide a staggering 300 billion transistors in a single package (Figure 2).

Figure 2: Innovation Beyond Chip Level Graphic; Source: TSMC (Mark Liu, Chairman), 2021 IEEE International Solid-State Circuits Conference

The HIR Future Roadmap Requires Next-Generation Thermal Management

Modern HSA systems have become such a large undertaking that they need their own Heterogeneous Integration Roadmap (HIR) that is coordinated with the overall IRDS roadmap. HIR focuses on the integration challenges of multiple dissimilar chiplets (logic, memory, sensors) to be interwoven into one functioning device, creating a high-performance System in Package (SiP) (Figure 3).

Figure 3: Heterogeneous architecture of SiP device; Source: HIR Roadmap

With traditional transistor scaling combined with HSA packaging, semiconductor devices have a ballooning thermal management challenge that warrants its own dedicated chapter in the HIR. Many high-performance SiPs have reached a Thermal Design Power (TDP) requirement that cannot be cooled with traditional forced air convection. This heat burden can increase temperatures, compromising processor performance, causing memory loss, and accelerating chip aging. For the full potential of HSA to be realized, the semiconductor industry must improve the thermal management of these high-power-density devices in a post-Dennard scaling world.

Competitive Cooling Technology Heats Up

In a typical data center environment, heat transfer efficiency favors convection over conduction. Conduction transfers energy through atomic vibrations from the heat source to a heat sink, while convection employs an efficient ‘sweeping’ by a mobile, heat-laden fluid. Forced air convection cooling for devices having TDPs up to 200 Watts has been the ‘go-to’ technology with benefits that include scalability, low cost, and easy implementation. The major drawback of air cooling is that it becomes less effective, given practical airflow limitations, for chip power densities beyond 250 W/cm². As the semiconductor industry develops high power density devices of 500-1000 W/cm² and beyond over the next few years, there will be a shift from traditional airflow solutions to more efficient liquid flow cooling.
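
The pressure that rising power density puts on any convective cooling scheme can be illustrated with Newton’s law of cooling, q'' = h·ΔT. The Python sketch below is illustrative only: the function name `required_htc` and the 50 K allowable surface-to-coolant temperature difference are assumptions made here, not values from this article, and the calculation ignores heat spreading and fin area, which real heat sinks rely on.

```python
def required_htc(heat_flux_w_per_cm2: float, delta_t_k: float) -> float:
    """Heat transfer coefficient h (W/m^2.K) needed so that q'' = h * dT.

    heat_flux_w_per_cm2: heat flux leaving the cooled surface, in W/cm^2.
    delta_t_k: allowed temperature difference between surface and coolant, in K.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    heat_flux_w_per_m2 = heat_flux_w_per_cm2 * 1e4  # 1 cm^2 = 1e-4 m^2
    return heat_flux_w_per_m2 / delta_t_k


if __name__ == "__main__":
    assumed_delta_t = 50.0  # assumed allowable temperature difference, in K
    for flux in (250.0, 500.0, 1000.0):
        h = required_htc(flux, assumed_delta_t)
        print(f"{flux:6.0f} W/cm^2 -> h = {h:,.0f} W/m^2.K")
```

For a fixed temperature budget, the required heat transfer coefficient grows in direct proportion to heat flux, so a fourfold rise in power density demands a fourfold improvement in convective performance.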

It is a busy time in the thermal management community as multiple liquid cooling technologies jockey to lead this market transition. However, given the highly segmented end-markets for data centers, HPC, and AI applications, it is likely that there will be several renditions of liquid cooling technologies adopted, each with its unique advantages.

Liquid Cooling: The Way Forward

The most popular near-term liquid cooling solutions include some form of active cold plates or dielectric immersion systems. Long-term, leading semiconductor manufacturers are developing micro-cooling technology that integrates liquid cooling channels directly into the silicon substrate. The merits and challenges of each of these cooling technologies are discussed below.

Cold Plates

The cold plate’s primary job is to act as a heat exchanger between the coolant and high-power devices with TDPs greater than 250 Watts. The cold plate is attached to the heat source processor package and absorbs the heat via conduction. The heat from the inside walls of the cold plate is removed by convection from the cooler fluid pumped through the cold plate body. The warmed fluid leaving the cold plate is typically sent to a heat exchanger, where it is once again cooled and ready to repeat the cycle.
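
The loop described above obeys a simple energy balance, Q = ṁ·c_p·ΔT, which gives the coolant temperature rise across the cold plate. The Python sketch below is a minimal illustration under stated assumptions: the function name `coolant_temp_rise`, the 500 W load, and the flow rates are hypothetical, and the water properties are approximate room-temperature values.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of liquid water
WATER_DENSITY_KG_PER_L = 1.0   # approximate density near room temperature


def coolant_temp_rise(
    heat_w: float,
    flow_l_per_min: float,
    cp_j_per_kg_k: float = WATER_CP_J_PER_KG_K,
    density_kg_per_l: float = WATER_DENSITY_KG_PER_L,
) -> float:
    """Coolant temperature rise (K) across a cold plate from Q = m_dot * cp * dT."""
    if flow_l_per_min <= 0:
        raise ValueError("flow_l_per_min must be positive")
    mass_flow_kg_per_s = flow_l_per_min * density_kg_per_l / 60.0
    return heat_w / (mass_flow_kg_per_s * cp_j_per_kg_k)


if __name__ == "__main__":
    # Hypothetical 500 W device cooled with water at two flow rates.
    for flow in (1.0, 2.0):
        rise = coolant_temp_rise(500.0, flow)
        print(f"{flow:.1f} L/min -> coolant warms by {rise:.1f} K")
```

Doubling the flow rate halves the coolant temperature rise, at the cost of more pumping power and a higher pressure drop through the plate.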

A key parameter for selecting an effective cold plate is its material’s thermal impedance, which varies inversely with thermal conductivity. The lower the impedance, the greater the cooling performance. Typically, cold plates are manufactured from extruded aluminum or copper to meet this demand for high heat transfer.

In multi-socket applications where a liquid cooling solution is required, aluminum cold plates are a popular option because they are low-cost, lightweight, and corrosion-resistant. Aluminum cold plates can be manufactured in various sizes and shapes to meet high-power demands; however, they tend to have inferior thermal performance compared with copper alternatives. Copper cold plates are better for high-power applications due to their lower thermal impedance. However, copper is more expensive and heavier, making it unsuitable for many applications.
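
The aluminum-versus-copper trade-off can be made concrete with the one-dimensional conduction relation R = t / (k·A). The Python sketch below is illustrative only: the function name `conduction_resistance`, the 3 mm base thickness, the 40 mm × 40 mm footprint, and the 500 W load are hypothetical, and the conductivities are representative round numbers that vary with alloy and temperature.

```python
# Representative thermal conductivities in W/(m.K); actual values depend on alloy.
K_ALUMINUM = 200.0
K_COPPER = 390.0


def conduction_resistance(thickness_m: float, area_m2: float, k_w_per_m_k: float) -> float:
    """Conductive thermal resistance (K/W) of a flat slab: R = t / (k * A)."""
    if thickness_m <= 0 or area_m2 <= 0 or k_w_per_m_k <= 0:
        raise ValueError("all inputs must be positive")
    return thickness_m / (k_w_per_m_k * area_m2)


if __name__ == "__main__":
    thickness = 0.003       # hypothetical 3 mm cold plate base
    area = 0.040 * 0.040    # hypothetical 40 mm x 40 mm contact footprint
    power = 500.0           # hypothetical device power in watts
    for name, k in (("aluminum", K_ALUMINUM), ("copper", K_COPPER)):
        r = conduction_resistance(thickness, area, k)
        print(f"{name:8s}: R = {r:.4f} K/W, drop across base = {r * power:.1f} K")
```

Under these assumptions the copper base has roughly half the conductive resistance of the aluminum one. This covers only conduction through the base; the convective resistance inside the plate is a separate term.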

Material selection is not the only determinant of a cold plate’s performance. The internals of cold plates also make a difference. Design parameters such as flow rates, pressure drops, single/two-phase liquid, and flow channel patterns can significantly affect performance in any given application.

Immersion Cooling

Another cooling technology being adopted for high power density devices is immersion cooling, which involves submerging server electronics in a dielectric fluid. The dielectric fluid absorbs the heat from the device and then transfers it to a radiator or chiller for disposal.

Although gaining popularity, immersion does have a few challenges. The dielectric fluid is expensive and must be kept in contamination-free containers. While the dielectric fluid has a much better heat capacity than air, it is still not as efficient as water. Also, extra pumping is required to circulate the viscous, oily dielectric fluid through the tanks.

On the plus side, immersion creates an environment that protects the electronics from the harsh ambient environment and dust debris. Additionally, the ancillary cooling hardware can be greatly simplified when the entire electronic board is submerged in a dielectric fluid. Rather than using large and expensive heat sinks and fans, you can use a much smaller radiator to dissipate the heat from the dielectric fluid.

Microconvective Liquid Cooling

Another innovative solution quickly gaining popularity is microconvective liquid cooling. This technology differs from conventional cold plates in that it directs fluid through an array of small jetting nozzles straight to the hot surface, resulting in up to a tenfold increase in heat transfer compared to forced air cooling. These jet arrays are optimized to provide maximum convective performance only where it’s needed. The nozzle patterns preferentially target the hotspots, ensuring efficient overall cooling at the package’s TDP.

There are no special requirements for the cooling module’s materials, and the coolant can be any liquid, including water, glycols, dielectrics, and refrigerants. Due to the inherently low thermal resistance of direct impingement, these jet array modules can perform at elevated inlet temperatures and still remove the heat dissipated from a high-power device. Every 1°C increase in the inlet temperature translates to approximately 2% in cooling cost savings. It’s a scalable technology that can cool large or small data centers, HPC, and AI systems with improved energy sustainability.
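
The inlet-temperature rule of thumb above can be turned into a quick estimate. The Python sketch below is a back-of-the-envelope illustration: the function name `estimated_cooling_cost` and the baseline cost are hypothetical, and applying the 2% figure linearly across several degrees is an assumption made here that the article does not state.

```python
SAVINGS_PER_DEG_C = 0.02  # rule of thumb quoted above: about 2% per 1 degree C


def estimated_cooling_cost(
    baseline_cost: float,
    inlet_increase_c: float,
    rate: float = SAVINGS_PER_DEG_C,
) -> float:
    """Estimate cooling cost after raising the coolant inlet temperature.

    Assumes savings accumulate linearly with the temperature increase and
    clamps the saved fraction to the range [0, 1].
    """
    fraction_saved = min(max(rate * inlet_increase_c, 0.0), 1.0)
    return baseline_cost * (1.0 - fraction_saved)


if __name__ == "__main__":
    baseline = 100_000.0  # hypothetical annual cooling cost
    for increase in (1.0, 5.0, 10.0):
        cost = estimated_cooling_cost(baseline, increase)
        print(f"+{increase:.0f} C inlet -> estimated cost {cost:,.0f}")
```

The linear form is reasonable only for modest inlet temperature changes; actual savings depend on the facility’s chiller plant and climate.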

In-Silicon Liquid Cooling

The cooling technologies discussed above are available in some form today and continue to improve in terms of cooling, energy sustainability, ease of adoption, and overall costs. However, the electronics market’s insatiable desire for higher performance will soon require 3D-integrated SiPs having TDPs beyond 1,000 Watts. To meet this challenge, many companies have begun to investigate in-silicon micro-cooling. In-silicon cooling will integrate the fluid cooling channels directly into the semiconductor’s silicon substrate or interposer layer. This close-contact cooling has shown great initial results in laboratory settings and is now part of R&D programs at most leading semiconductor fabricators.

Conclusion

During the first several decades of Moore’s Law transistor shrinks, the semiconductor industry benefited from Dennard scaling to keep power densities constant. Unfortunately, around 2005, this power scaling benefit reached its natural limit, and a proactive heat dissipation strategy became necessary. More recently, heterogeneous system architectures (HSA) have provided a path for continued performance improvements; however, they have accelerated the need for cooling technologies beyond forced air convection. This growing thermal management challenge is now a focus topic in the most recent IRDS and HIR roadmaps.

Data centers, HPC, and AI system providers are beginning to vet and embrace several new liquid cooling technologies, emphasizing simplicity, scalability, and sustainability. The age of liquid cooling is upon us and will require a more active role from chip designers and fabricators for seamless integration and maximum performance.

Paul Hofemann is a 30-year veteran of the semiconductor fabrication industry with experience spanning from large capital equipment OEMs like Applied Materials and KLA to smaller startups such as Molecular Imprints. His career focus has been in product management and business development in Asia Pacific, North American, and European markets. He has a master’s degree in Mechanical Engineering and is currently the Chief Strategy Officer at JetCool Technologies.
