Locality & utility co-optimization for practical capacity management of shared last level caches

Proceedings of the 26th ACM International Conference on Supercomputing, pages 279--290. New York, NY, USA, ACM, 2012.
DOI: 10.1145/2304576.2304615

Abstract

Shared last-level caches (SLLCs) on chip-multiprocessors play an important role in bridging the performance gap between processing cores and main memory. Although there are already many proposals targeted at overcoming the weaknesses of the least-recently-used (LRU) replacement policy by optimizing either locality or utility for heterogeneous workloads, very few of them are suitable for practical SLLC designs due to their large overhead of log2(associativity) bits per cache line for re-reference interval prediction. The two recently proposed practical replacement policies, TA-DRRIP and SHiP, have significantly reduced the overhead by relying on just 2 bits per line for prediction, but they are oriented towards managing locality only, missing the opportunity provided by utility optimization.

This paper is motivated by our two key experimental observations: (i) the not-recently-used (NRU) replacement policy, which entails only one bit per line for prediction, can satisfactorily approximate LRU performance; (ii) since locality and utility optimization opportunities are concurrently present in heterogeneous workloads, the co-optimization of both is indispensable to higher performance but is missing in existing practical SLLC schemes. Therefore, we propose a novel practical SLLC design, called COOP, which needs just one bit per line for re-reference interval prediction, and leverages lightweight per-core locality & utility monitors that profile sample SLLC sets to guide the co-optimization. COOP improves throughput over LRU by 7.67% on a quad-core CMP with a 4MB SLLC for 200 random workloads, outperforming both of the recent practical replacement policies at the in-between cost of 17.74KB storage overhead (TA-DRRIP: 4.53% performance improvement with 16KB storage cost; SHiP: 6.00% performance improvement with 25.75KB storage overhead).
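To make the one-bit prediction concrete, the following is a minimal C sketch of the baseline not-recently-used (NRU) replacement policy that the abstract says COOP builds on, not of COOP itself. All names (cache_set_t, nru_bit, find_victim, on_hit, on_miss) and the 16-way associativity are illustrative assumptions, not taken from the paper.

/*
 * Sketch of NRU replacement for a single cache set, assuming the common
 * 1-bit-per-line formulation referenced in the abstract.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 16  /* assumed set associativity */

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     nru_bit;  /* 1 = "not recently used": eviction candidate */
} cache_line_t;

typedef struct {
    cache_line_t line[WAYS];
} cache_set_t;

/* On a hit, mark the line as recently used (clear its NRU bit). */
static void on_hit(cache_set_t *set, size_t way)
{
    set->line[way].nru_bit = false;
}

/* Pick a victim: the first invalid line or line whose NRU bit is set.
 * If none exists, reset all NRU bits to 1 and retry; this coarse aging
 * is what makes NRU a 1-bit approximation of LRU. */
static size_t find_victim(cache_set_t *set)
{
    for (;;) {
        for (size_t w = 0; w < WAYS; w++) {
            if (!set->line[w].valid || set->line[w].nru_bit)
                return w;
        }
        for (size_t w = 0; w < WAYS; w++)
            set->line[w].nru_bit = true;
    }
}

/* On a miss, install the new tag in the victim way and mark it recently used. */
static void on_miss(cache_set_t *set, uint64_t tag)
{
    size_t w = find_victim(set);
    set->line[w].tag = tag;
    set->line[w].valid = true;
    set->line[w].nru_bit = false;
}

Compared with true LRU, which needs log2(associativity) bits per line to maintain a full recency ordering, this scheme keeps only a single bit per line, which is the storage argument the abstract makes for building a practical SLLC policy on top of NRU.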
