The Computer Journal Advance Access originally published online on September 4, 2007
The Computer Journal 2008 51(2):192-206; doi:10.1093/comjnl/bxm061

© The Author 2007. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Accurate and Efficient Cache Warmup for Sampled Processor Simulation Through NSL–BLRL

Luk Van Ertvelde, Filip Hellebaut and Lieven Eeckhout*

ELIS Department, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium

* Corresponding author: leeckhou{at}elis.ugent.be

Received 16 June 2006; revised 13 July 2007

Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation, which selects a number of samples from the complete benchmark execution, yields substantial speedups. However, one major issue must be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper, we propose NSL–BLRL, which combines two previously proposed cache hierarchy warmup approaches: no-state-loss (NSL) and boundary line reuse latency (BLRL). The idea of NSL–BLRL is to warm up the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a least-recently-used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL–BLRL warmup checkpoint; this is done by inspecting the sample to determine how far back in the pre-sample one needs to go to accurately warm up the hardware state for the given sample. Using SPEC CPU2000 benchmarks, we show that NSL–BLRL (i) is nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL and (iii) is more space-efficient than NSL. As such, we conclude that NSL–BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.

Key Words: computer architecture • sampled simulation • cold-start problem • warmup
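
The two steps described in the abstract, building an LRU-ordered stream of unique pre-sample references and then truncating it based on what the sample touches, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nsl_stream` and `truncate_nsl` are hypothetical, memory references are modelled as plain integer block addresses, and the truncation rule (keep everything from the oldest stream entry whose block is reused in the sample) is a simplifying assumption. The abstract does not specify the exact cutoff criterion.

```python
from collections import OrderedDict


def nsl_stream(pre_sample):
    """Build an NSL-style stream from the pre-sample references.

    Returns a list of (position, block) pairs, one per unique block,
    holding the position of its LAST reference in the pre-sample and
    ordered from least recently used to most recently used.
    """
    last_use = OrderedDict()
    for pos, block in enumerate(pre_sample):
        # Re-inserting moves the block to the most-recently-used end.
        last_use.pop(block, None)
        last_use[block] = pos
    return [(pos, block) for block, pos in last_use.items()]


def truncate_nsl(stream, sample):
    """Truncate the stream by inspecting the sample (assumed rule).

    Assumption for illustration: keep the suffix starting at the oldest
    entry whose block is referenced again in the sample. Older entries
    are dropped from the warmup checkpoint.
    """
    reused = set(sample)
    for i, (_pos, block) in enumerate(stream):
        if block in reused:
            return stream[i:]
    return []  # the sample reuses nothing from the pre-sample


# Hypothetical block addresses, for illustration only.
pre_sample = [1, 2, 1, 3, 4, 2, 5]
sample = [4, 5, 7]

full = nsl_stream(pre_sample)
checkpoint = truncate_nsl(full, sample)
```

In this toy trace the full stream holds five unique blocks, while the truncated checkpoint keeps only the three most recently used entries, since block 4 is the oldest pre-sample block that the sample touches again. Replaying the checkpoint's blocks in order into an initially empty cache model would then approximate the warmed-up state at the start of the sample.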

