MASTER: A Multicore Cache Energy Saving Technique using Dynamic Cache Reconfiguration

October 5, 2017 | Author: Sparsh Mittal | Category: Computer Architecture, Computer Engineering, Cache Memory, SRAM design, Last Level Cache

MASTER: A Technique for Improving Energy Efficiency of Caches in Multicore Processors
Sparsh Mittal, Zhao Zhang and Yanan Cao
Published in IEEE Transactions on VLSI Systems, 2014

Presentation Plan  Motivation For Cache Power Management  Existing Techniques And Their Limitations  MASTER: Overall Approach

 Cache Coloring and Reconfigurable Cache Emulator  Marginal Gain Computation and Algorithm  Overall Flow-Diagram  Simulation Experiments and Results

Motivation: Power Management Is Crucial
- Power is a major driver of design decisions.
- Modern data centers consume megawatts of peak power, equal to the needs of a city of thousands of people!
- An exascale machine built with the technology used in today's supercomputers would consume gigawatts of power!

Motivation: Increasing Cache Sizes  Size of last level

cache is increasing (e.g. 15MB in Intel Core i7-3960X processor)!  Caches consume

huge chip area!

Intel’s 32nm Sandy Bridge Core i7-3960X

Motivation: Increasing Leakage Energy  Leakage energy is increasing dramatically with recent

CMOS technology generations  Leakage energy has become a major source of energy

consumption in last level caches (LLC)  Large power consumption increases cooling cost also

We need effective approaches for saving cache leakage energy!

Cache Reconfiguration Approach: Main Idea
- Cache requirements vary both across programs (inter-program variation) and within a program over time (intra-program variation).
- By allocating just the right amount of cache to each program, the rest of the cache can be turned off.
- Leakage savings can be obtained with minimal performance loss.

Examples of Existing Cache Reconfiguration Approaches

[Figure: a 4 MB, 4-way base cache built from four 1 MB ways, and configurations reachable from it with selective-ways, selective-sets, way concatenation and configurable line size, e.g. 2 MB, 2-way; 1 MB, 1-way; 2 MB, 4-way (512 KB per way); 1 MB, 4-way (256 KB per way); 4 MB, 2-way; 4 MB, 1-way.]

Limitations of Existing Cache Energy Saving Techniques
- They provide coarse-grain allocation granularity.
- They require offline analysis, which is difficult to scale.
- They cannot take components other than the cache into account.
- In multicore systems:
  - the option space becomes huge!
  - the locality of the memory access stream is reduced, so locality-based techniques don't work well.

MASTER: A Microarchitectural Cache Leakage Energy Saving Technique Using Dynamic Cache Reconfiguration

MASTER: Overall Approach

Cache Partitioning and Cache Quota Enforcement
- Collect profiling information.
- Compute marginal gain.
- Use the energy saving algorithm to decide the cache quota of each application.
- Use cache coloring to enforce the quotas.

Cache Turn-off for Saving Leakage Energy
- Transition unused blocks to a low-power state. Options:
  - state-preserving techniques
  - state-destroying techniques (gated-Vdd technique)

Collecting Profiling Information  For each application, we want to find its miss-rate for

different cache sizes  So we use auxiliary tags, called profiling unit.  One profiling unit for each size and each core  Overhead is low since  We store only tags and not data  We use set-sampling (sampling ratio = 64 or more)  We call the structure RCE (Reconfigurable Cache

Emulator).

Reconfigurable Cache Emulator (RCE) Design

[Figure: RCE design. Each L2 access (address and core ID) is filtered by set sampling, which keeps the overhead low; address mappers A1 to A7, a queue and finite-state control feed seven profiling units that emulate 64X/64, 32X/64, 16X/64, 8X/64, 4X/64, 2X/64 and X/64 of the L2 size X, giving profiling data for 7 different sizes. Separate storage is kept for each of the N cores (0 to N-1), so each application is profiled individually.]

RCE dynamically profiles each application for 7 different cache sizes.
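The RCE behavior described above can be sketched in software. This is a minimal Python model, not the paper's hardware: it assumes LRU replacement, 8-way emulated caches, 64 B blocks, an 8 MB L2 and a sampling ratio of 64 (the values used in the overhead example), and the class and method names are hypothetical. The address mappers, queue and finite-state control are not modeled.

```python
from collections import OrderedDict
from typing import List


class ProfilingUnit:
    """Tag-only, set-associative LRU emulator of one cache size (no data stored)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets = {}  # set index -> OrderedDict of tags, least recently used first
        self.misses = 0

    def access(self, block: int) -> None:
        idx = block % self.num_sets
        tag = block // self.num_sets
        lru = self.sets.setdefault(idx, OrderedDict())
        if tag in lru:
            lru.move_to_end(tag)  # hit: tag becomes most recently used
        else:
            self.misses += 1
            if len(lru) >= self.ways:
                lru.popitem(last=False)  # evict the least recently used tag
            lru[tag] = None


class RCE:
    """Per-core profiling units for L2 sizes X, X/2, ..., X/64, with set sampling."""

    def __init__(self, num_cores: int, l2_bytes: int = 8 << 20, ways: int = 8,
                 block_bytes: int = 64, ratio: int = 64, num_sizes: int = 7) -> None:
        self.block_bits = block_bytes.bit_length() - 1  # log2 of the block size
        self.ratio = ratio
        self.units: List[List[ProfilingUnit]] = [
            [ProfilingUnit((l2_bytes >> k) // (ways * block_bytes), ways)
             for k in range(num_sizes)]
            for _ in range(num_cores)
        ]

    def access(self, addr: int, core_id: int) -> None:
        block = addr >> self.block_bits
        # Sampling filter: only blocks in every `ratio`-th set are emulated. As long
        # as every emulated size has a multiple of `ratio` sets (true for the
        # defaults), this single check selects the sampled sets of all units at once.
        if block % self.ratio != 0:
            return
        for unit in self.units[core_id]:
            unit.access(block)

    def estimated_misses(self, core_id: int) -> List[int]:
        """Scale sampled misses to whole-cache estimates, largest size first."""
        return [unit.misses * self.ratio for unit in self.units[core_id]]
```

In use, one would create `RCE(num_cores=4)`, call `access(addr, core_id)` on every L2 access, and read `estimated_misses(core_id)` at the end of an interval to obtain a per-core miss curve over the seven sizes.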

Profiling Unit Overhead
- L2 cache: 8 MB, 8-way, 64 B block size
- Number of L2 sets: 8 MB / (8 × 64 B) = 16384
- Sampling ratio: 64
- Number of sets in profiling unit: 16384 / 64 = 256
- Tag size: 24 bits
- Profiling unit size: (256 × 8 × 24) / 8 = 6144 bytes = 6 KB

Reconfigurable Cache Emulator (RCE) Overhead (assuming an 8 MB L2 cache)

Region    L2 Size Profiled    Profiling Unit Size
64X/64    8 MB                6 KB
32X/64    4 MB                3 KB
16X/64    2 MB                1.5 KB
8X/64     1 MB                768 B
4X/64     512 KB              384 B
2X/64     256 KB              192 B
X/64      128 KB              96 B

This storage is replicated for each of the N cores (0 to N-1).

Even for 4 cores, the size and energy overhead of the RCE is below 0.8% of the L2 cache.
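The table's arithmetic can be checked with a short Python calculation. Like the table, it assumes every profiling unit is 8-way with 24-bit tags, counts only tag storage (no valid or LRU bits), and compares against the 8 MB data array; the function name is hypothetical.

```python
def profiling_unit_bytes(cache_bytes: int, ways: int = 8, block_bytes: int = 64,
                         ratio: int = 64, tag_bits: int = 24) -> int:
    """Tag storage of one set-sampled profiling unit emulating `cache_bytes`."""
    sets = cache_bytes // (ways * block_bytes)
    sampled_sets = sets // ratio
    return sampled_sets * ways * tag_bits // 8


L2_BYTES = 8 * 1024 * 1024
NUM_CORES = 4

# Seven emulated sizes: 8 MB (64X/64) down to 128 KB (X/64).
unit_sizes = [profiling_unit_bytes(L2_BYTES >> k) for k in range(7)]
per_core = sum(unit_sizes)
total = NUM_CORES * per_core
fraction = total / L2_BYTES

print("per-unit bytes:", unit_sizes)
print(f"per core: {per_core} B, {NUM_CORES} cores: {total} B ({fraction:.2%} of L2 data)")
```

Under these assumptions each core needs 12192 bytes of tags and four cores need 48768 bytes, roughly 0.58% of the L2 data array, which is consistent with the stated bound of below 0.8%.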

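Cache Coloring

[Figure: conventional set decoding vs. cache coloring. The physical address consists of the physical page number and the page offset, which ends in the block offset. With conventional set decoding, the set index is taken directly from address bits. With cache coloring, the memory region ID in the physical address indexes a mapping table that returns the cache color, and the set number inside the color is taken from the page offset. The selective-sets approach can only size the L2 cache to full, half, quarter or eighth, whereas the cache coloring approach allocates in units of 1/128 of the total size.]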
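The coloring address path can be sketched in Python. This assumes 64 B blocks and 4 KB pages, so the 6 bits that pick one of 64 sets inside a color coincide with the block index within a page and each page lands in a single color. The region size (256 pages per region), the use of high page-number bits as the region ID, and the function names are hypothetical; the mapping table is modeled as a dictionary that the OS would fill in.

```python
from typing import Dict, List

BLOCK_OFFSET_BITS = 6   # 64 B blocks (assumed)
SET_IN_COLOR_BITS = 6   # 64 sets per color
PAGE_OFFSET_BITS = 12   # 4 KB pages (assumed) = block offset + set-in-color bits
REGION_SHIFT = 8        # hypothetical: 256 consecutive pages form one memory region
NUM_COLORS = 128


def l2_set_index(paddr: int, mapping_table: Dict[int, int]) -> int:
    """Map a physical address to an L2 set under cache coloring."""
    # Set number inside the color comes from the page offset, above the block offset.
    set_in_color = (paddr >> BLOCK_OFFSET_BITS) & ((1 << SET_IN_COLOR_BITS) - 1)
    region_id = (paddr >> PAGE_OFFSET_BITS) >> REGION_SHIFT
    color = mapping_table[region_id]  # OS-managed table chooses the color
    return (color << SET_IN_COLOR_BITS) | set_in_color


def colors_to_turn_off(mapping_table: Dict[int, int]) -> List[int]:
    """Colors that no region maps to; their sets can be put in a low-power state."""
    used = set(mapping_table.values())
    return [c for c in range(NUM_COLORS) if c not in used]
```

Because a core's cache quota is just the set of colors its regions map to, resizing it only means rewriting mapping-table entries, and any color left unmapped can be turned off.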

Examples of Different Possible Configurations

[Figure: example partitions of colors 0 to 127 among Core 0, Core 1 and turned-off colors.]

Marginal Gain Computation  Marginal gain shows reduction in cache miss on

increasing unit cache color Colors 16 8

Misses 14561 20786

Marginal Gain=

(20786−14561) (16−8)

=778.1

Insight  Large marginal gain: Program needs more cache => allocating cache can improve performance and hence energy efficiency.  Small marginal gain: Program’s cache needs are small => turning off cache can save energy.

MASTER: Energy Saving Algorithm
Here Cij denotes the jth candidate color value for the ith core (4 cores, numbered 0 to 3).
- Step 1: Select 4 color values for each core (core i gets Ci1, Ci2, Ci3, Ci4).
- Step 2: Choose the 2 most energy-efficient color values for each core.
- Step 3: Form 4-core configurations from the chosen values (2^4 = 16 in total):
  1. C01 C11 C22 C33
  2. C01 C11 C22 C34
  3. C01 C11 C23 C33
  ...
  16. C03 C13 C23 C34
- Step 4: Find the most energy-efficient configuration; this is the answer.
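Steps 2 to 4 can be sketched in Python. The excerpt does not say how Step 1 picks the four candidate color values or how energy is estimated, so `core_energy` and `system_energy` are hypothetical callables supplied by the caller, and the check that a configuration uses at most 128 colors is an assumption based on the 128-color L2. The toy energy model in the usage part is made up purely for illustration.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Config = Tuple[int, ...]


def choose_configuration(
    candidates: Sequence[Sequence[int]],
    core_energy: Callable[[int, int], float],
    system_energy: Callable[[Config], float],
    total_colors: int = 128,
    keep_per_core: int = 2,
) -> Optional[Config]:
    """Steps 2-4, given the Step 1 candidates (color counts) for each core."""
    # Step 2: keep the `keep_per_core` lowest-energy candidates of each core.
    shortlist: List[List[int]] = []
    for core, values in enumerate(candidates):
        ranked = sorted(values, key=lambda c, core=core: core_energy(core, c))
        shortlist.append(ranked[:keep_per_core])
    # Step 3: form every multicore configuration (2**4 = 16 for 4 cores, 2 values
    # each), discarding any that needs more colors than the cache has (assumed).
    configs = [cfg for cfg in product(*shortlist) if sum(cfg) <= total_colors]
    # Step 4: return the configuration with the lowest estimated energy.
    return min(configs, key=system_energy, default=None)


# Toy usage with a hypothetical energy model: each core has a preferred size,
# and its energy grows with the distance from that size.
preferred = [16, 8, 32, 64]


def toy_core_energy(core: int, colors: int) -> float:
    return abs(colors - preferred[core])


def toy_system_energy(cfg: Config) -> float:
    return sum(toy_core_energy(core, c) for core, c in enumerate(cfg))


best = choose_configuration([[8, 16, 32, 64]] * 4, toy_core_energy, toy_system_energy)
print(best)  # (16, 8, 32, 64)
```

Pruning to two values per core keeps Step 3 at 16 combinations instead of 4^4 = 256, which is what makes an exhaustive Step 4 cheap enough to run in software.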

MASTER: Overall Flow Diagram

[Figure: MASTER flow. In the processor, each L2 access (address and core ID) updates the RCE and per-core counters. The physical address is divided into region ID, page number in region and page offset (set number inside color, block offset); mapping tables translate the region ID into a color index, which together with the set number inside the color selects the L2 set (colors 0 to 127, 64 sets per color). In software (OS), the energy saving algorithm reads the RCE and counters and remaps/controls the hardware through the mapping tables.]

Salient Features of MASTER  Software-based approach, with light hardware support  Overhead of RCE and mapping tables