Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor

Nan Zhang

doi:10.1109/ICCET.2010.5485345

Nan Zhang^*

^*Corresponding author for this work

School of Advanced Technology

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Performance degradation is a common problem in parallel computing: when a workload is parallelised, the speedup gained is notably lower than the factor predicted by Amdahl's law. This work examines such a case, in which image pixels are summed along the rows in parallel by two threads, but no speedup over the sequential summation is gained on images whose sizes exceed the capacity of the processor's L2 cache. Counts collected by the Intel VTune™ Performance Analyser on relevant performance events show that this nonscaling problem is not caused by well-understood pitfalls such as cache contention, bus overloading and unbalanced workload. Instead, during the parallel summation the L2 prefetchers of the processor are less effective at bringing in data before it is needed. Consequently, considerably more cache lines are brought into the L2 cache by demand requests originating from the L1 data caches, and the parallel computation pays the penalty for these accesses.
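The workload described in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's code: it assumes per-row sums and a contiguous block split of the rows between two threads, neither of which the abstract specifies, and the names `sum_rows` and `parallel_row_sums` are hypothetical. It shows the work partitioning, not the timing behaviour; CPython threads do not run this loop in parallel, so it cannot reproduce the prefetcher effect.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_rows(image, start, stop):
    """Return the per-row pixel sums for rows start..stop-1."""
    return [sum(row) for row in image[start:stop]]


def parallel_row_sums(image, workers=2):
    """Split the rows into contiguous blocks, one per worker thread (assumed scheme)."""
    n = len(image)
    # Block boundaries: worker k handles rows bounds[k]..bounds[k+1]-1.
    bounds = [n * k // workers for k in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda k: sum_rows(image, bounds[k], bounds[k + 1]),
            range(workers),
        )
        # map() yields results in submission order, so row order is preserved.
        return [s for part in parts for s in part]


# Small synthetic "image": 6 rows of 4 pixels each.
image = [[r * 4 + c for c in range(4)] for r in range(6)]
sequential = sum_rows(image, 0, len(image))
parallel = parallel_row_sums(image)
```

Both versions visit the same rows and produce the same sums; in the case the abstract examines, the difference lies in how the memory accesses of the two threads interact with the L2 prefetchers.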

Original language	English
Title of host publication	ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings
Pages	V225-V229
DOIs	https://doi.org/10.1109/ICCET.2010.5485345
Publication status	Published - 2010
Event	2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010 - Chengdu, China; duration: 16 Apr 2010 → 18 Apr 2010

Publication series

Name	ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings
Volume	2

Conference

Conference	2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010
Country/Territory	China
City	Chengdu
Period	16/04/10 → 18/04/10

Keywords

Hardware prefetching
Parallel nonscaling analysis
Parallel performance degradation
Parallel scalability

Access to Document

10.1109/ICCET.2010.5485345

Cite this

Zhang, N. (2010). Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor. In ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings (pp. V225-V229). Article 5485345 (ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings; Vol. 2). https://doi.org/10.1109/ICCET.2010.5485345

@inproceedings{8254e6d312424f569596251f2f459b16,

title = "Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor",

abstract = "Performance degradation is a common problem in parallel computing: when a workload is parallelised, the speedup gained is notably lower than the factor predicted by Amdahl's law. This work examines such a case, in which image pixels are summed along the rows in parallel by two threads, but no speedup over the sequential summation is gained on images whose sizes exceed the capacity of the processor's L2 cache. Counts collected by the Intel VTune{\texttrademark} Performance Analyser on relevant performance events show that this nonscaling problem is not caused by well-understood pitfalls such as cache contention, bus overloading and unbalanced workload. Instead, during the parallel summation the L2 prefetchers of the processor are less effective at bringing in data before it is needed. Consequently, considerably more cache lines are brought into the L2 cache by demand requests originating from the L1 data caches, and the parallel computation pays the penalty for these accesses.",

keywords = "Hardware prefetching, Parallel nonscaling analysis, Parallel performance degradation, Parallel scalability",

author = "Nan Zhang",

year = "2010",

doi = "10.1109/ICCET.2010.5485345",

language = "English",

isbn = "9781424463503",

series = "ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings",

pages = "V225--V229",

booktitle = "ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings",

note = "2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010 ; Conference date: 16-04-2010 Through 18-04-2010",

}

Zhang, N 2010, Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor. in ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings., 5485345, ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings, vol. 2, pp. V225-V229, 2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010, Chengdu, China, 16/04/10. https://doi.org/10.1109/ICCET.2010.5485345

Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor. / Zhang, Nan.
ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings. 2010. p. V225-V229 5485345 (ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings; Vol. 2).

TY - GEN

T1 - Event-based analysis of a L2 prefetch related parallel nonscaling on intel dual core processor

AU - Zhang, Nan

PY - 2010

Y1 - 2010

AB - Performance degradation is a common problem in parallel computing: when a workload is parallelised, the speedup gained is notably lower than the factor predicted by Amdahl's law. This work examines such a case, in which image pixels are summed along the rows in parallel by two threads, but no speedup over the sequential summation is gained on images whose sizes exceed the capacity of the processor's L2 cache. Counts collected by the Intel VTune™ Performance Analyser on relevant performance events show that this nonscaling problem is not caused by well-understood pitfalls such as cache contention, bus overloading and unbalanced workload. Instead, during the parallel summation the L2 prefetchers of the processor are less effective at bringing in data before it is needed. Consequently, considerably more cache lines are brought into the L2 cache by demand requests originating from the L1 data caches, and the parallel computation pays the penalty for these accesses.

KW - Hardware prefetching

KW - Parallel nonscaling analysis

KW - Parallel performance degradation

KW - Parallel scalability

UR - http://www.scopus.com/inward/record.url?scp=77958071519&partnerID=8YFLogxK

U2 - 10.1109/ICCET.2010.5485345

DO - 10.1109/ICCET.2010.5485345

M3 - Conference Proceeding

AN - SCOPUS:77958071519

SN - 9781424463503

T3 - ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings

SP - V225-V229

BT - ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings

T2 - 2010 2nd International Conference on Computer Engineering and Technology, ICCET 2010

Y2 - 16 April 2010 through 18 April 2010

ER -
