HAL Id: inria-00493906
https://hal.inria.fr/inria-00493906
Submitted on 21 Jun 2010
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Control and Power Management in Presence of Workload Variations
Radu Marculescu
To cite this version:
Radu Marculescu. Control and Power Management in Presence of Workload Variations. ISCA tutorial on "Multi-domain Processors: Challenges, Design Methods, and Recent Developments", Jun 2010, Saint Malo, France. ⟨inria-00493906⟩
Outline
Part I: Multi-Domain Processors Design Overview (2:00-2:45PM)
  - Multi-domain server, cell phone, and media processors
  - Power management techniques
Part II: Router Design and Synchronization Issues (2:45-3:30PM)
  - Asynchronous router design
  - Quality of Service and virtual channels in QNoC
Part III: Control and Power Management in Presence of Workload Variations (4:00-4:45PM)
  - VFI partitioning and voltage assignment
  - Workload modeling and dynamic control of multi-VFI designs
Part IV: DVFS in Presence of Process Variations (4:45-5:30PM)
  - Impact of process variations on DVFS controller performance
  - Technology-driven limits on DVFS controllability
RGM2– ISCA’10
ISCA-2010 Tutorial #2
Control and Power Management in Presence of Workload Variations
Radu Marculescu
Carnegie Mellon University
radum@cmu.edu
Power Management Unit
[Figure: eight cores (Core 0 to Core 7), each with its own sensors and power gates. A power control unit reads the sensors, drives the power-gate control, and controls an external voltage regulator.]
[Rusu, A-SSCC 2009]
Outline
VFI partitioning
  - Multi-VFI NoC designs
  - Partitioning and voltage assignment
  - Examples
On-line control
  - State-based model construction
  - Feedback control architecture
  - Stability issues
Summary
SoC VFI Partitioning
What is the most efficient partitioning and what are the corresponding voltage/frequency assignments?
[Figure: application energy consumption and area/energy overhead plotted against an increasing level of granularity.]
Design Methodology for Multi-VFI NoCs
Inputs: NoC architecture (topology, routing, etc.), application workload characterization, scheduling
Off-line
  - VFI partitioning and static voltage-frequency assignment
  - Interface design for voltage-frequency islands
On-line
  - Dynamic Voltage and Frequency Scaling (DVFS)
VFI Partitioning Problem
Given
  - NoC architecture and a schedule for the driver application
  - Maximum number of allowed VFIs and physical constraints
Find
  - VFI partitioning (i.e., optimum number of VFIs, n ≤ N)
  - Assignment of the supply and threshold voltages to each island
Such that the total energy consumption is minimized:
$$E_{Total} = E_{App} + \sum_{i=1}^{n} E_{VFI}(i)$$
  - E_App: application (useful) energy consumption (comp+comm)
  - E_VFI(i): overhead of the i-th VFI
  - n: number of VFIs
Voltage/Frequency Assignment Problem
Given a VFI partitioning
Find supply (V_i) and threshold (V_ti) voltage assignments
Such that application energy consumption is minimized:
$$\min E_{App} = \sum_{\forall i \in T} E_i(V_i, V_{ti}) + \sum_{\forall i \in T} \sum_{\forall j \in T} vol(i,j)\, E_{bit}(i,j)$$
  - First term: energy consumed when the task is executed at (V_i, V_ti)
  - Second term: communication energy consumption
Subject to the following deadline constraints per task t:
$$\frac{x_t}{f_t} + t_{Comm,t} \le deadline_t - start\_time_t$$
  - x_t / f_t: execution time
  - t_Comm,t: communication delay
VFI Partitioning and Voltage Assignment Algorithm
1. Given an initial partitioning with N islands, find the static voltages (solve the static voltage/frequency assignment problem)
2. For all pairs of neighboring islands (i, j):
     - Merge VFIs i and j
     - Solve static VF assignment problem
     - Compute the energy consumption
3. Merge the pair of islands that provides the minimum energy
4. Update the VFI configuration and repeat from step 2
This can also be implemented as a branch & bound algorithm.
We can obtain exact results for small examples.
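The merge loop can be sketched in a few lines of Python. This is only an illustration under loud assumptions that are not in the slides: one task per tile, a linear voltage-frequency relation, energy proportional to cycles × V², a fixed per-island overhead `E_VFI`, and the names `island_voltage`, `total_energy`, and `greedy_partition` are all hypothetical. The real static assignment step is a nonlinear program; here it collapses to picking the lowest voltage that satisfies the fastest tile in the island.

```python
from itertools import combinations

# Hypothetical toy model (not from the slides).
V_MIN, V_MAX, F_MAX = 0.4, 1.2, 1.0   # volts, volts, GHz
E_VFI = 0.05                           # assumed fixed energy overhead per island

def island_voltage(tiles, req_freq):
    """Lowest voltage that meets the fastest tile (linear V-f relation assumed)."""
    f = max(req_freq[t] for t in tiles)
    return V_MIN + (V_MAX - V_MIN) * f / F_MAX

def total_energy(partition, cycles, req_freq):
    """E_Total = E_App + sum of island overheads, with E_App ~ cycles * V^2."""
    e_app = sum(cycles[t] * island_voltage(isl, req_freq) ** 2
                for isl in partition for t in isl)
    return e_app + E_VFI * len(partition)

def adjacent(a, b):
    """Two islands are neighbors if any of their tiles are mesh neighbors."""
    return any(abs(r1 - r2) + abs(c1 - c2) == 1
               for (r1, c1) in a for (r2, c2) in b)

def greedy_partition(cycles, req_freq):
    """Start with one island per tile, repeatedly apply the cheapest merge,
    and remember the best configuration seen along the way."""
    partition = [frozenset([t]) for t in cycles]
    best = (total_energy(partition, cycles, req_freq), partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(partition, 2):
            if adjacent(a, b):
                merged = [p for p in partition if p not in (a, b)] + [a | b]
                candidates.append((total_energy(merged, cycles, req_freq), merged))
        energy, partition = min(candidates, key=lambda c: c[0])
        if energy < best[0]:
            best = (energy, partition)
    return best

# 2x2 mesh: the top row is busy, the bottom row is lightly loaded.
cycles = {(1, 1): 1.0, (1, 2): 1.0, (2, 1): 0.2, (2, 2): 0.2}
req_freq = {(1, 1): 0.9, (1, 2): 0.9, (2, 1): 0.3, (2, 2): 0.3}
best_energy, best_partition = greedy_partition(cycles, req_freq)
```

In this toy instance the search settles on two islands (top row and bottom row): merging tiles with similar speed requirements removes overhead at no cost, while merging a slow tile into a fast island forces it to a higher voltage.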
Voltage Assignment Algorithm
Within the partitioning algorithm, each "Solve static VF assignment problem" step solves:
$$\min E_{App} = \sum_{\forall i \in T} E_i(V_i, V_{ti}) + \sum_{\forall i \in T} \sum_{\forall j \in T} vol(i,j)\, E_{bit}(i,j)$$
subject to
$$\frac{x_t}{f_t} + t_{Comm,t} \le deadline_t - start\_time_t$$
Voltage Assignment Algorithm
The static VF assignment problem is a constrained nonlinear optimization (nonlinear programming) problem
  - Finds a constrained minimum of a scalar function of several variables
  - Use Matlab nonlinear solver (fmincon)
Why Does VFI Partitioning Matter?
  - Small benchmark scheduled on a 2×2 network using EDF
  - BPM 70 nm used for the technology parameters
  - Energy consumption
      1-VFI: 10.5 mJ
      2-VFI: 7.5 mJ (29% below 1-VFI)
      3-VFI: 7.6 mJ
[Figure: three bar plots of supply voltage (V) for tiles (1,1), (1,2), (2,1), and (2,2). Panels: Single VFI, Two VFIs, Three VFIs.]
Experiments with Realistic Benchmarks
Several E3S benchmarks (consumer, network, auto-industry, telecom)
Applications scheduled to NoCs ranging from 3 × 3 to 5 × 5
[Figure: total energy consumption (mJ) of the 1-VFI, 2-VFI, and 3-VFI solutions for each benchmark, with the minimum-energy solution marked.]
Improvement over 1-VFI solution:
  Benchmark        Improvement
  consumer         36%
  network          49%
  auto-industry    80%
  telecom          78%
Why On-line Control?
Cannot rely on nominal values because they vary
  - Sources of concern are workload, process, voltage, and temperature variations
  - Cope with the parameter variations which cannot be predicted or accurately modeled at design time
Heuristic techniques and manual tuning won’t work!
[Figure: a controller compares target and actual queue utilizations and sets the voltage and frequency of the system under control. The diagram also labels temperature, delay, dynamic power, and static power.]
Distributed Power Management in Magali
A 477mW NoC-based digital baseband for MIMO 4G SDR chip organized around a 15-router asynchronous NoC that connects 22 processing units.
[Clermidy et al, ISSCC’10]
Local Control in Multi Clock Domain Processors
[Figure: the clock domain partitions in an MCD processor [Semeraro, et al, HPCA’02]; interface model between the domains [Wu, et al, ASPLOS’04].]
PID controller for voltage/frequency control proposed previously using only local queue information
  - Ignores interactions among multiple queues
  - Works fine if frequency change in one clock domain has negligible impact on other domains
For an MCD processor with arbitrary partitions and strong interactions among multiple queues, a centralized online DVFS scheme may be needed
Design Methodology for Multi-VFI NoCs
Traditionally, PID controllers are used due to simplicity. However, state-space modeling brings new opportunities
  - Precise controllability and stability analysis
  - Pole placement, linear quadratic regulator, robust controller
[Figure: feedback loop. The desired utilizations for the interface FIFOs are compared with the actual utilizations of the interface FIFOs (state feedback); the voltage-frequency controller sets (V1, f1), (V2, f2), …, (VN, fN) for the NoC under control.]
Design Methodology for Multi-VFI NoCs
Inputs: parameter variations, workload, VFI configuration
Flow: State-space Model Construction → Feedback Control Design
[Figure: feedback loop. The desired utilizations for the interface FIFOs are compared with the actual utilizations of the interface FIFOs (state feedback); the voltage-frequency controller sets (V1, f1), (V2, f2), …, (VN, fN) for the NoC under control.]
Formal Feedback Control
[Figure: multi-VFI Network-on-Chip of processing elements (PEs) with an interface FIFO between two islands. The FIFO is written at λ packets/sec and read at μ packets/sec; the two sides run at operating frequencies f1 and f2.]
  - Interface queue utilizations are the states of the system
  - State feedback for voltage-frequency control
  - Control interval is T μsec
State (queue utilizations):
$$Q = [q_1, q_2, \ldots, q_N]$$
Input (clock speeds):
$$F = [f_1, f_2, \ldots, f_M]$$
Step-by-step Model Construction (one queue)
[Figure: queue q1 between islands (V1, f1) and (V2, f2), written at rate λ1 and read at rate μ1; a neighboring queue q2 has rates λ2 and μ2.]
Average utilization in the k-th control interval:
$$q(k) = q(k-1) + T\lambda_1(k-1) - T\mu_1(k-1)$$
  - T λ1(k-1): amount of data (packets) written to the queue
  - T μ1(k-1): amount of data (packets) read from the queue
If data read/write rates are proportional to the frequency of the VFI:
$$\lambda_1(k-1) = \lambda_1 f_1(k-1), \qquad \mu_1(k-1) = \mu_1 f_2(k-1)$$
The state-space equation can be written as
$$q(k) = q(k-1) + T \underbrace{\begin{bmatrix} \lambda_1 & -\mu_1 \end{bmatrix}}_{B} \begin{bmatrix} f_1(k-1) \\ f_2(k-1) \end{bmatrix}$$
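To make the recurrence concrete, here is a minimal Python sketch that iterates the one-queue update q(k) = q(k-1) + T(λ1 f1(k-1) - μ1 f2(k-1)). The function name `simulate_queue` and the numeric values are illustrative assumptions, and the sketch does not clip the utilization to the physical FIFO depth.

```python
def simulate_queue(q0, T, lam, mu, f1, f2):
    """Iterate q(k) = q(k-1) + T * (lam * f1(k-1) - mu * f2(k-1)).

    f1 and f2 are sequences of per-interval clock speeds for the writer
    and reader islands; lam and mu are the proportionality coefficients.
    """
    q = [q0]
    for writer_f, reader_f in zip(f1, f2):
        q.append(q[-1] + T * (lam * writer_f - mu * reader_f))
    return q

# Writer twice as fast as reader: the queue fills by 0.5 per interval.
filling = simulate_queue(0.0, 1.0, 1.0, 1.0, [1.0] * 3, [0.5] * 3)

# Matched rates (lam * f1 == mu * f2): the utilization stays put.
steady = simulate_queue(2.0, 1.0, 1.0, 2.0, [1.0] * 3, [0.5] * 3)
```

The queue only holds its level when the write and read terms cancel, which is why the controller has to adjust f1 and f2 jointly rather than independently.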
Step-by-step Model Construction (three queues)
[Figure: three islands (V1, f1), (V2, f2), (V3, f3) connected by queues q1, q2, q3.]
$$B = \begin{bmatrix} \lambda_1 & -\mu_1 & 0 \\ -\mu_2 & 0 & \lambda_2 \\ 0 & \lambda_3 & -\mu_3 \end{bmatrix}$$
First row → q1, second row → q2, third row → q3; the columns correspond to f1, f2, f3
The topology of the VFIs determines the matrix B
An algorithm automatically constructs B
The structure of the model is the same regardless of B:
$$Q(k)_{N \times 1} = Q(k-1)_{N \times 1} + T\, B_{N \times M}\, F(k-1)_{M \times 1}$$
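The slides do not give the construction algorithm, so the following Python sketch is only one plausible way to build B from the topology: each queue contributes +λ in the column of the island that writes it and -μ in the column of the island that reads it. The function name `build_b`, the tuple layout, and the 0-based island indices are assumptions made for illustration.

```python
def build_b(queues, num_islands):
    """Build the N x M matrix B from a list of interface queues.

    queues: list of (writer_island, reader_island, lam, mu) tuples,
            with islands numbered from 0 (assumed encoding).
    Row n describes queue n: +lam under the writer, -mu under the reader.
    """
    B = [[0.0] * num_islands for _ in queues]
    for row, (writer, reader, lam, mu) in enumerate(queues):
        B[row][writer] += lam
        B[row][reader] -= mu
    return B

# Three-queue example with illustrative rates:
# q1: island 1 -> island 2, q2: island 3 -> island 1, q3: island 2 -> island 3.
queues = [(0, 1, 1.0, 2.0), (2, 0, 3.0, 4.0), (1, 2, 5.0, 6.0)]
B = build_b(queues, num_islands=3)
```

With these inputs the result has the same sparsity pattern as the three-queue B matrix: the first row has entries under f1 and f2, the second under f1 and f3, and the third under f2 and f3.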
System Controllability
In the multiple voltage-frequency island system with M islands, utilization of at most M queues can be controlled.
The system is controllable iff rank(B) = N (i.e., number of controlled queues)
[Figure: MPEG-2 encoder split into VFI-1 and VFI-2, connected through Queue 1 and Queue 2. Blocks shown: Input Buffer, Motion Estimation, Motion Compensation, DCT & Quantization, Variable Length Encoder, Inv. Quantization, Inv. DCT, Frame Buffer.]
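The rank condition is easy to check numerically. The sketch below is a plain Gaussian-elimination rank test in standard-library Python; the helper names `matrix_rank` and `is_controllable`, the tolerance, and the example matrices are illustrative assumptions rather than anything taken from the tutorial.

```python
def matrix_rank(M, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with pivoting."""
    A = [list(map(float, row)) for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][c] / A[rank][c]
            A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank

def is_controllable(B):
    """Controllable iff rank(B) equals N, the number of queues (rows of B)."""
    return matrix_rank(B) == len(B)

# Ring of three queues with all rates equal: the rows sum to zero, rank is 2.
ring_equal = [[1, -1, 0], [-1, 0, 1], [0, 1, -1]]
# Same ring with unequal rates: full rank.
ring_unequal = [[1, -2, 0], [-4, 0, 3], [0, 5, -6]]
# Two queues in a chain of three islands: rank 2 = N.
chain = [[1, -1, 0], [0, 1, -1]]
```

The equal-rate ring is an instructive corner case: it has as many islands as queues, yet one queue's utilization is fixed by the other two, so all three cannot be steered independently.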
Feedback Control Architecture
[Figure: block diagram. The desired queue utilizations R(k) pass through the gain matrix K0 to a summing node that outputs F(k); F(k) drives B, whose output is accumulated through z^{-1} I_{N×N} to give the utilizations Q(k) of the queues under control; Q(k) is fed back to the summing node through the state feedback matrix K.]
Open loop system:
$$Q(k) = Q(k-1) + T B F(k-1)$$
Closed loop system, obtained by substituting the control law F(k-1) = K_0 R - K Q(k-1):
$$Q(k) = (I - TBK)\,Q(k-1) + T B K_0 R$$
For the three-queue example:
$$B = \begin{bmatrix} \lambda_1 & -\mu_1 & 0 \\ -\mu_2 & 0 & \lambda_2 \\ 0 & \lambda_3 & -\mu_3 \end{bmatrix}$$
Feedback Control Architecture
Closed loop system (K: state feedback matrix, K0: gain matrix):
$$Q(k) = (I - TBK)\,Q(k-1) + T B K_0 R$$
Design of the state feedback matrix K
  - Find K such that the eigenvalues of the closed loop system are inside the unit circle despite the workload variations
  - Eigenvalue placement, linear quadratic regulator (LQR) design
Feedback Control Architecture
Closed loop system (K: state feedback matrix, K0: gain matrix):
$$Q(k) = (I - TBK)\,Q(k-1) + T B K_0 R$$
By the final value theorem, gain matrix K0 = K, so that Q settles at the desired utilizations R in steady state
Possible extensions
  - Adaptive techniques, such as gain scheduling
  - Monitor the workload and compute K or use values computed off-line
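A small simulation illustrates how the closed loop behaves. The Python sketch below uses a hypothetical two-queue, two-island system with a square, invertible B, and picks K = (α/T) B⁻¹ so that I - TBK = (1 - α)I, which places both closed-loop eigenvalues at 1 - α inside the unit circle. This particular choice of K, the helper names, and all numbers are assumptions for illustration; the slides name eigenvalue placement and LQR as the actual design methods. F is treated as an unconstrained control signal and is not clipped to a realizable clock range.

```python
def mat_vec(A, x):
    """Matrix-vector product for small dense matrices stored as lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("B is singular: the queues are not controllable")
    return [[d / det, -b / det], [-c / det, a / det]]

def simulate_closed_loop(B, K, K0, R, Q0, T, steps):
    """Iterate Q(k) = Q(k-1) + T*B*F(k-1) with F(k-1) = K0*R - K*Q(k-1)."""
    Q = list(Q0)
    for _ in range(steps):
        F = [u - v for u, v in zip(mat_vec(K0, R), mat_vec(K, Q))]
        Q = [q + T * dq for q, dq in zip(Q, mat_vec(B, F))]
    return Q

T = 0.1                              # control interval (illustrative units)
ALPHA = 0.5                          # closed-loop eigenvalues sit at 1 - ALPHA
B = [[2.0, -1.0], [-1.0, 2.0]]       # hypothetical two-queue topology
K = [[ALPHA / T * v for v in row] for row in inverse_2x2(B)]
K0 = K                               # gain matrix chosen as K0 = K
R = [0.5, 0.3]                       # desired queue utilizations

Q_first = simulate_closed_loop(B, K, K0, R, [0.0, 0.0], T, steps=1)
Q_final = simulate_closed_loop(B, K, K0, R, [0.0, 0.0], T, steps=30)
```

With K0 = K the tracking error halves every interval in this example, so the utilizations converge to R; a different K0 would leave a steady-state offset.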
Experiments with MPEG-2 Encoder
The encoder is divided into three VFIs, and mixed-clock FIFOs are used at the interfaces
The frequency of the Variable Length Encoder is set to achieve the desired encoding rate
[Figure: MPEG-2 encoder partitioned into three islands (V1, f1), (V2, f2), (V3, f3). Blocks shown: Input Buffer, Motion Estimation, Motion Compensation, DCT & Quantization, Variable Length Encoder, Inv. Quantization, Inv. DCT, Frame Buffer.]
Frequency Tracking Capabilities
50 Frames/sec for 352×288 CIF frames
f1 is set to meet the target; f2 and f3 follow f1
46% power savings
[Figure: two plots of clock frequency (GHz) versus control interval (0 to 100), comparing the nominal and actual clock frequency for f2 and for f3.]
Results on FPGA prototyping
Work on FPGA prototype using Virtex-II Pro FPGA from Xilinx
Inter-domain communication
  - Delay Locked Loops (DLLs) used to generate individual clock signals
  - Block-RAM based mixed-clock FIFOs
  - Voltage conversion not supported yet by Xilinx boards
MPEG-2 encoder design divided into three VFIs
  - Synchronous design utilizes 16966 LUTs
  - Design with three VFIs utilizes 19161 LUTs (13% overhead)
  - Power consumption obtained using XPower
Without voltage scaling, power drops from 277W to 259W
Consistent with simulations
Clock Control Architecture
[Figure: a clock source feeds DLLs that generate four basic clocks (12.5, 15, 17.5, and 20 MHz). The clock control block, a PicoBlaze microprocessor with an algorithm ROM and a frequency table ROM, searches the frequency table and drives the clock dividers that produce Clock 1, Clock 2, and Clock 3. Each clock drives one processing element (PE) with a MicroBlaze microprocessor; FIFO 1 and FIFO 2 connect neighboring PEs.]
The voltage/frequency control circuitry can be implemented in HW
The control algorithm can be implemented in firmware / OS
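The frequency-table search can be sketched as follows. The four basic clocks come from the slide; the divider settings, the "lowest available frequency that still meets the request" policy, and the function names are assumptions for illustration, since the slides do not describe the table contents or the search rule.

```python
BASIC_CLOCKS_MHZ = (12.5, 15.0, 17.5, 20.0)   # four basic clocks from the slide
DIVIDERS = (1, 2, 4, 8)                        # assumed clock-divider settings

def build_frequency_table(basic=BASIC_CLOCKS_MHZ, dividers=DIVIDERS):
    """All frequencies reachable as basic clock / divider, sorted ascending."""
    return sorted({clock / div for clock in basic for div in dividers})

def select_clock(requested_mhz, table):
    """Lowest table entry that meets the request; saturate at the maximum."""
    for freq in table:
        if freq >= requested_mhz:
            return freq
    return table[-1]

table = build_frequency_table()
chosen = select_clock(9.0, table)       # controller asks for 9 MHz
saturated = select_clock(25.0, table)   # request above the fastest clock
```

Rounding up rather than to the nearest entry keeps the island at least as fast as the controller requested, at the cost of slightly higher power; a real implementation might choose differently.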
Summary
Energy issues in multi-VFI NoCs are crucial
  - VFI synthesis via partitioning and voltage allocation
  - Other formulations are possible
Dynamic V/F control yields significant power savings over static approaches while being robust to workload variations
  - DVFS controller smoothes out variations in workload characteristics
  - Precise controllability and stability conditions can be defined
More work needed to address
  - Adaptive techniques for VFI control
  - Run-time optimizations for multiple applications
  - Impact of dynamic traffic on overall DVFS-based power management
References (Part III)
U. Y. Ogras, et al., ‘Design and Management of Voltage-Frequency Island Partitioned Networks-on-Chip,’ in IEEE Trans. VLSI, March 2009.
P. Choudhary, D. Marculescu, ‘Power Management of Voltage/Frequency Island-Based Systems Using Hardware-Based Methods,’ in IEEE Trans. on VLSI, March 2009.
T. Simunic, S. P. Boyd, P. Glynn, ‘Managing power consumption in networks on chips,’ in IEEE Trans. on VLSI, Jan. 2004.
A. Alimonda, et al., ‘Feedback-Based Approach to DVFS in Data-Flow Applications,’ in IEEE Trans. on CAD of Integrated Circuits and Systems 28(11): 1691-1704 (2009).
E. Beigne, et al., ‘Dynamic voltage and frequency scaling architecture for units integration within a GALS NoC,’ in Proc. Int. Symp. Netw. Chip, 2008, pp. 129–138.
C.-L. Chou and R. Marculescu, ‘Energy- and performance-aware incremental mapping for networks on chip with multiple voltage levels,’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 10, pp. 1866–1879, Oct. 2008.
Q. Wu, et al., ‘Formal Online Methods for Voltage/Frequency Control in Multiple Clock Domain Microprocessors,’ in Proc. ASPLOS 2004.
G. Semeraro, et al., ‘Energy efficient processor design using multiple clock domains with dynamic voltage and frequency scaling,’ in Proc. HPCA 2002.
U. Y. Ogras, et al., ‘Challenges and Promising Results in NoC Prototyping Using FPGAs,’ in IEEE Micro, September/October 2007.
Clermidy et al., ‘A 477mW NoC-Based Digital Baseband for MIMO 4G SDR,’ IEEE ISSCC, February 2010.
P. Juang, et al., ‘Coordinated, distributed, formal energy management of chip multiprocessors,’ Proc. ISLPED 2005.
This list of references is NOT exhaustive. There are many good contributions not mentioned here due to involuntary omissions or space limitations.