research-article

PGZ: automatic zero-value code specialization

Published: 27 February 2021

Abstract

In prior work we proposed Zeroploit, a transform that duplicates code, specializes one path assuming certain key program operands, called versioning variables, are zero, and leaves the other path unspecialized. Dynamically, depending on the versioning variable’s value, either the specialized fast path or the default slow path will execute. We evaluated Zeroploit with hand-optimized codes in that work.
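As a rough illustration of the transform described above (not code from the paper), the following Python sketch versions a toy shading function on a hypothetical `visibility` operand. The function and parameter names are assumptions made for this example, and folding `visibility * x` to zero assumes `x` is finite.

```python
def shade_default(ambient, diffuse, specular, visibility):
    """Unspecialized slow path: evaluates the full expression."""
    return ambient + visibility * (diffuse + specular)


def shade_versioned(ambient, diffuse, specular, visibility):
    """Versioned code: a run-time check on the versioning variable picks a path."""
    if visibility == 0.0:
        # Specialized fast path: with visibility == 0 the product folds to 0
        # (assuming finite diffuse and specular), so the add and the multiply
        # become dead and only ambient remains.
        return ambient
    # Default slow path, left unspecialized.
    return shade_default(ambient, diffuse, specular, visibility)
```

Both paths return the same value for any finite input; the versioned form only pays off when the zero case is common enough to outweigh the extra check.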
In this paper, we present PGZ, a fully automated, profile-guided compiler approach for Zeroploit. Our compiler automatically determines which versioning variables, or combinations thereof, are profitable, and which code region to duplicate and specialize. PGZ’s heuristic takes operand zero-value probabilities as input and then uses classical techniques such as constant folding and dead-code elimination to estimate the potential savings of specializing a versioning variable. PGZ transforms profitable candidates, yielding an average speedup of 21.2% for targeted shader programs, and an average frame-rate speedup of 4.4% across a collection of modern gaming applications on an NVIDIA GeForce RTX 2080 GPU.
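To make the idea of estimating savings concrete, here is a minimal sketch under stated assumptions; it is not PGZ's actual heuristic. It models a shader as a list of hypothetical `(dest, op, a, b)` instructions, folds `mul` and `add` under the assumption that one operand is zero (finite values assumed), removes dead instructions, and weights the instructions saved by the profiled zero probability. The instruction format, the unit cost per instruction, and the `check_cost` parameter are all invented for illustration.

```python
def count_after_zero_specialization(instrs, outputs, zero_var):
    """Count instructions that survive when zero_var is assumed to be 0."""
    zero = {zero_var}   # names known to be zero
    alias = {}          # names that became plain copies of another name

    def resolve(name):
        while name in alias:
            name = alias[name]
        return name

    kept = []
    for dest, op, a, b in instrs:
        a, b = resolve(a), resolve(b)
        if op == "mul" and (a in zero or b in zero):
            zero.add(dest)              # x * 0 folds to 0 (finite x assumed)
        elif op == "add" and a in zero and b in zero:
            zero.add(dest)              # 0 + 0 folds to 0
        elif op == "add" and a in zero:
            alias[dest] = b             # 0 + x is just x
        elif op == "add" and b in zero:
            alias[dest] = a             # x + 0 is just x
        else:
            kept.append((dest, op, a, b))

    # Dead-code elimination: walk backward from the live outputs.
    live = {resolve(out) for out in outputs}
    remaining = 0
    for dest, op, a, b in reversed(kept):
        if dest in live:
            live.update((a, b))
            remaining += 1
    return remaining


def expected_savings(instrs, outputs, zero_var, p_zero, check_cost=1):
    """Probability-weighted instructions saved, minus the run-time check."""
    saved = len(instrs) - count_after_zero_specialization(instrs, outputs, zero_var)
    return p_zero * saved - check_cost


# Toy shader: out = ambient + ((n*l)*albedo + h*h) * shadow
SHADER = [
    ("t0", "mul", "n", "l"),
    ("t1", "mul", "t0", "albedo"),
    ("t2", "mul", "h", "h"),
    ("t3", "add", "t1", "t2"),
    ("t4", "mul", "t3", "shadow"),
    ("out", "add", "ambient", "t4"),
]
```

In this toy model, assuming `shadow` is zero removes all six instructions, so `expected_savings(SHADER, ["out"], "shadow", 0.5)` is positive, while a zero probability of 0.1 gives a negative estimate and the candidate would be rejected.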


    Published In

    CC 2021: Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction
    March 2021
    164 pages
    ISBN:9781450383257
    DOI:10.1145/3446804
    General Chair: Aaron Smith
    Program Chairs: Delphine Demange, Rajiv Gupta

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPUs
    2. gaming applications
    3. profile guided optimization
    4. shader programs
    5. value specialization
