5 solutions to control energy consumption in data centers

Written by Daniel Vanzo, on 17 June 2021

Optimizing Data Center Consumption: Both Ecological and Economic Interests

The global energy consumption of IT equipment was estimated at 2.7% of total energy consumption in 2017, and it is projected to reach between 4.7% and 6% of the total by 2025, according to the working paper "Mastering Digital Consumption." Given this growth, controlling IT consumption is a significant environmental issue. Large data centers account for a sizeable share of it, around 30% according to an article in CNRS Le journal. It is therefore worth looking at ways to reduce their consumption by acting at the level of the data center, the software, and hardware/software combinations.

The question of digital equipment consumption was being addressed long before the environmental footprint of digital technology became a mainstream issue. Beyond the environmental argument, there are other reasons why optimizing consumption is worthwhile:

  • Density: Reducing consumption limits the heat produced by machines, allowing more functionalities and computing power to be concentrated in the same space.
  • Power: The computing power of a large data center is sometimes constrained by the power available from the electrical grid. Reducing per-machine consumption therefore allows more machines to run on the same power supply.
  • Costs: For equipment that consumes a lot of energy, the penalty is twofold: you pay both for the energy consumed and for the cooling system that removes the resulting heat. In large data centers, a study by the Technical Association for Energy and the Environment (ATEE) estimates that about 50% of energy consumption goes to running the IT equipment and 40% to cooling. Reducing consumption cuts costs on both fronts.
  • Autonomy: For equipment that operates on battery power, lower consumption automatically leads to better autonomy.

Optimizing equipment consumption thus combines economic logic with environmental concerns. Here are 5 solutions for optimizing data center consumption.

Solution 1: Powering Off Unused Machines to Optimize Data Center Consumption

Data centers are sized to support the maximum load they may face during peak activity. Peak loads are occasional, however, and most of the time the machines are underutilized.

Consider a video streaming service, for example: users mainly use it in the evening and much less during the day. According to the ATEE study, data centers typically run at between 40% and 60% of their capacity, and network infrastructure utilization is of the same order (between 30% and 60%). When these machines are not in use, they still consume energy.

According to the thesis "Data centers energy optimization," a server's idle consumption is about 65% of its maximum consumption: energy spent doing nothing useful whenever the machine sits unused.
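
To give an order of magnitude, here is a minimal back-of-the-envelope sketch in Python. Only the 65% idle ratio comes from the thesis cited above; the per-server power, fleet size, and idle hours are illustrative assumptions.

    # Back-of-the-envelope estimate of energy wasted by idle servers.
    # Only the 65% idle ratio comes from the thesis cited above; the other
    # figures are illustrative assumptions.
    P_MAX_W = 400            # assumed per-server power draw at full load, in watts
    IDLE_RATIO = 0.65        # idle power as a fraction of maximum (from the thesis)
    SERVERS = 1_000          # illustrative fleet size
    IDLE_HOURS_PER_DAY = 12  # illustrative: half the day spent doing nothing useful

    idle_power_w = P_MAX_W * IDLE_RATIO
    wasted_kwh_per_day = SERVERS * idle_power_w * IDLE_HOURS_PER_DAY / 1_000

    print(f"Idle power per server: {idle_power_w:.0f} W")
    print(f"Energy spent idling each day: {wasted_kwh_per_day:,.0f} kWh")
    # -> 260 W per server, i.e. 3,120 kWh per day for this hypothetical fleet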

The most obvious solution would be simply to turn off unused machines... Simply? These operations involve hundreds or even thousands of machines, which must also be turned back on in time when they are needed again. When a shutdown/restart strategy is implemented, quality of service (QoS) degrades: when demand for machines rises, it takes some time before they are restarted and available.

A trade-off therefore has to be made between, on the one hand, the acceptable degradation of QoS and, on the other, the energy savings such a sacrifice makes possible. Several studies address this problem, including the thesis mentioned above, and propose methods for shutting down and restarting machines while disturbing the services they provide as little as possible. The potential reduction in consumption is around 40%.
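
As an illustration of what such a strategy might look like, here is a minimal sketch of a threshold-based policy in Python: it keeps a safety margin of powered-on servers above current demand and switches the rest off. The margin, fleet size, demand figures, and the print statements standing in for real power-on/power-off hooks are all assumptions; the studies cited above rely on load forecasting rather than a fixed margin.

    # Minimal sketch of a shutdown/restart policy with a fixed safety margin.
    # `demand` is the number of servers' worth of load right now; the margin
    # absorbs load increases while powered-off machines are being restarted.

    def servers_needed(demand: int, margin: float = 0.2) -> int:
        """Servers to keep powered on: current demand plus a safety margin."""
        return max(1, int(demand * (1 + margin)) + 1)

    def rebalance(demand: int, powered_on: int, total: int) -> int:
        """Return the new number of powered-on servers for the current demand."""
        target = min(total, servers_needed(demand))
        if target > powered_on:
            print(f"powering on {target - powered_on} server(s)")   # hypothetical hook
        elif target < powered_on:
            print(f"powering off {powered_on - target} server(s)")  # hypothetical hook
        return target

    # Illustrative day: quiet daytime, evening peak, night-time trough.
    powered_on, total = 100, 100
    for demand in [40, 45, 80, 95, 60, 20]:
        powered_on = rebalance(demand, powered_on, total)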

Solution 2: Sharing Machine Usage

A second solution is to "flatten the curve", not of hospital admissions this time, but of machine usage. What should be done with all this equipment when it is not in use? Share it with another business whose peak load falls at a different time of day.

This pooling of machines can happen at two levels. Within a company that runs a private data center, it is worth asking, when launching a new service or application: can it be executed at times when the machines are not heavily used, for example at night or on weekends? If so, and if the quality of service provided to users is not degraded, the new service can be delivered without increasing the capacity of the IT infrastructure.
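
A minimal sketch of this reasoning in Python: given an existing daily load profile, check whether a new batch job could run during off-peak hours without pushing the infrastructure past its capacity. The load profile and the job's requirements are illustrative assumptions.

    # Sketch: can a new batch job run in off-peak hours without new hardware?
    # hourly_load is the existing utilization (fraction of capacity) for each
    # hour of the day; all figures are illustrative.

    hourly_load = [0.35] * 7 + [0.55] * 10 + [0.80] * 5 + [0.45] * 2   # 24 hours
    job_extra_load = 0.30        # fraction of capacity the new job would add
    job_duration_hours = 6

    # Hours where adding the job stays within capacity (simplified: the hours
    # are not required to be consecutive).
    candidate_hours = [hour for hour, load in enumerate(hourly_load)
                       if load + job_extra_load <= 1.0]

    print("hours with spare capacity:", candidate_hours)
    print("job fits off-peak:", len(candidate_hours) >= job_duration_hours)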

Between companies, pooling means turning to the cloud. By offering resources on demand, the cloud allows IT resources to be shared. A study of cloud usage published in 2017 shows that pooling resources via the cloud leads to a relatively constant utilization rate of the physical machines: for a region gathering several hundred users, utilization varies little over time.

Solution 3: Investing in Code Quality

Is using the machines as much as possible enough? If the software running on them is not efficient, the goal of optimizing consumption is not achieved. This leads to a new avenue: software optimization.

Every piece of software must meet performance criteria, for example the time a user waits for a result or the storage space required. When it comes to improving performance (getting a result more quickly, say), two options are possible: invest in hardware, or invest in code quality. Which should be prioritized? If cost is the only decision criterion, the company will likely opt for hardware. If, on the other hand, the environmental criterion is also taken into account, investing in the quality of development becomes more attractive.

One could object that code optimizations, by using resources more effectively to execute instructions more quickly, generally increase the chip's power draw. On one hand, consumption falls because the machine is busy for a shorter time; on the other, it rises because the machine works harder while it is busy.
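
The trade-off becomes concrete with the relation energy = power × time. The following figures are purely illustrative and are not taken from the studies cited below; they simply show how a shorter run time can outweigh a higher power draw.

    # Energy = average power draw x execution time.
    # Illustrative assumption: an optimization makes the code 40% faster but
    # the chip draws 15% more power while it runs.

    baseline_power_w = 200.0
    baseline_time_s = 100.0
    baseline_energy_j = baseline_power_w * baseline_time_s        # 20,000 J

    optimized_power_w = baseline_power_w * 1.15                    # runs "hotter"...
    optimized_time_s = baseline_time_s * 0.60                      # ...but much shorter
    optimized_energy_j = optimized_power_w * optimized_time_s      # 13,800 J

    saving = 1 - optimized_energy_j / baseline_energy_j
    print(f"energy saving: {saving:.0%}")   # -> 31%: the shorter run time wins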

Two studies, one from the University of Orléans and the other involving the University of Dortmund, show that in practice, in all the situations tested, the reduction outweighs the increase: code optimization therefore saves energy. It is also interesting to note that in the University of Orléans study, some optimizations affect only energy consumption, not performance: optimizing consumption does not necessarily require sacrificing performance.

However, some barriers still have to be overcome before code optimization for energy purposes becomes widespread. The first is tooling: development tools for performance optimization are varied and fairly mature, whereas tools dedicated to studying consumption are scarcer. The second is cultural: beyond tools, both the business side and the IT side need to start treating energy efficiency as a criterion alongside performance.

Solution 4: Using Accelerators (Application-Specific Components)

So far, hardware and software have been addressed separately, leaving aside a whole class of optimizations for performance and consumption. At the cost of additional complexity, a service can be designed by considering hardware and software together, which opens up new possibilities.

The Green500 is a ranking of the world's best computers by energy efficiency. It was introduced as an alternative to the Top500 ranking, which focuses solely on raw computing power without considering energy consumption. Unsurprisingly, the top of both rankings is occupied by recent machines: the latest technologies deliver both greater raw computing power and better energy efficiency.

Looking at the Green500, the top of the ranking is mainly made up of machines with accelerators. Accelerators are components designed for specific types of applications, as opposed to general-purpose components. The most common examples are General-Purpose Graphics Processing Units (GPGPUs), designed for massively parallelizable applications. Designing a chip for a specific category of use cases allows optimizations on almost every front: raw power, energy efficiency, and density.
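
As a minimal illustration of offloading a massively parallel computation to a GPGPU, here is a sketch that runs the same matrix multiplication on the CPU with NumPy and on a GPU with the CuPy library. It assumes CuPy and an NVIDIA GPU are available; the matrix size is arbitrary, and real speed-ups depend heavily on the hardware and the workload.

    # Sketch: the same matrix multiplication on a general-purpose CPU (NumPy)
    # and on a GPU accelerator (CuPy). Assumes an NVIDIA GPU and CuPy installed.
    import time
    import numpy as np
    import cupy as cp

    n = 4096
    a_cpu = np.random.rand(n, n).astype(np.float32)
    b_cpu = np.random.rand(n, n).astype(np.float32)

    t0 = time.perf_counter()
    np.matmul(a_cpu, b_cpu)
    print(f"CPU: {time.perf_counter() - t0:.2f} s")

    a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)
    t0 = time.perf_counter()
    cp.matmul(a_gpu, b_gpu)
    cp.cuda.Stream.null.synchronize()   # wait for the GPU to finish before timing
    print(f"GPU: {time.perf_counter() - t0:.2f} s")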

So why not use these accelerators systematically? Because they require time and specialized skills to put to work. Using an accelerator often means rewriting software extensively, usually by specialized developers, which represents a significant cost in human resources.

But even before the software can be ported, these accelerators must exist! Some companies have the ability to design and produce chips specialized for their needs, for example Google with its TPUs. Others do not, and only have access to chips produced by third parties, which may not suit their needs.

This does not mean all is lost. With a bit of ingenuity, it is sometimes possible to reuse a specialized chip for a different use case while still benefiting from its efficiency. This is how components originally designed for graphics applications are now used in banking, aerospace, construction, and scientific research.

This accelerator-based optimization strategy will therefore depend partly on the company's sector of activity and partly on its willingness to invest in highly specialized developers.

Solution 5: Frugality

After ensuring that hardware is properly utilized, optimizing applications, and designing hardware and software together, there is still another essential lever of action to control consumption: frugality.

Driving a little slower can significantly reduce a car's fuel consumption: an article on the Caradisiac website calculates that lowering the speed from 130 to 110 km/h can cut consumption by 25%. Consumption is not proportional to speed, and the few minutes saved cost a great deal of energy. The same goes for computing equipment. Tests at one of our clients show that a 4% reduction in performance can come with a 30% reduction in consumption. Other tests conducted at North Carolina State University show, for one of the applications tested, a 20% reduction in consumption at the cost of only a 6% loss of performance. In short, a small performance loss in exchange for a large energy gain.

Where do these energy savings come from? Several mechanisms are at play, and one of the main ones is the following. Machines are made up of several components: processor, RAM, storage, interconnections with other machines, and so on. One of these components is always the bottleneck for application performance, so the others are underutilized and consume energy unnecessarily. They can therefore be throttled to align with the weakest component, with little impact on performance. The challenge is to analyze applications case by case in order to identify the bottleneck and apply the right limitations.
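
A minimal sketch of this reasoning in Python: given the measured utilization of each component, identify the bottleneck and estimate how much the other components could, in principle, be slowed down. The component names and utilization figures are illustrative, and the proportional-scaling assumption is a deliberate simplification.

    # Sketch: find the bottleneck component and the slack of the others.
    # Utilization = fraction of each component's peak capability used by the
    # application; figures are illustrative.

    utilization = {
        "cpu": 0.45,
        "memory_bandwidth": 0.92,   # <- likely bottleneck
        "storage": 0.15,
        "network": 0.30,
    }

    bottleneck = max(utilization, key=utilization.get)
    print(f"bottleneck: {bottleneck}")

    for component, used in utilization.items():
        if component != bottleneck:
            # Rough slack estimate, assuming performance scales linearly with
            # the component's capability.
            slack = 1 - used / utilization[bottleneck]
            print(f"{component}: could be slowed by up to {slack:.0%}")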

Finally, do our applications really need to run as fast as possible? Take a use case that requires delivering the result of a calculation by a specific deadline: there is no advantage in finishing earlier. Machine performance can therefore be reduced, and consumption with it, without compromising business requirements; a sketch after the list below illustrates this. The two main challenges are:

  1. Defining performance goals based on business constraints.
  2. Mastering application performance to have predictable and repeatable execution times.
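
Here is the sketch announced above, illustrating the first challenge: pick the lowest performance level that still meets the business deadline. The performance levels, power figures, and deadline are assumptions, and execution time is assumed to scale inversely with the performance level, which only holds for well-behaved, compute-bound workloads.

    # Sketch: choose the lowest performance level that still meets a deadline.
    # Each level is (relative speed, power draw in watts); figures illustrative.
    # Assumes execution time = baseline_time / relative_speed.

    levels = [(0.6, 220.0), (0.8, 300.0), (1.0, 400.0)]   # slowest first
    baseline_time_s = 3_000.0    # run time at full speed (relative speed 1.0)
    deadline_s = 4_200.0         # business constraint: result needed by then

    def pick_level(levels, baseline_time_s, deadline_s):
        for speed, power in levels:                 # cheapest level first
            if baseline_time_s / speed <= deadline_s:
                return speed, power
        return levels[-1]                           # fall back to full speed

    speed, power = pick_level(levels, baseline_time_s, deadline_s)
    print(f"run at {speed:.0%} of full speed ({power:.0f} W), "
          f"finishing in {baseline_time_s / speed:.0f} s (deadline {deadline_s:.0f} s)")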

Conclusion

The consumption of IT equipment is becoming increasingly significant and must be brought under control. Several proven solutions exist to reduce it, acting at several levels: the data center, the software, and hardware/software combinations.

Some solutions involve more or less significant compromises, whether in performance or in the quality of the service delivered. In those cases, implementing them requires aligning business and IT in order to define the right balance between service quality and energy efficiency. Some solutions, however, are entirely "free": consumption goes down without any sacrifice elsewhere!