bcm5719-llvm/llvm/test/Transforms/PGOProfile, branch meklort-10.0.1

bcm5719-llvm/llvm/test/Transforms/PGOProfile, branch meklort-10.0.1 Project Ortega BCM5719 LLVM https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1 2020-01-13T22:38:58+00:00 [PGO][CHR] Guard against 0-to-0 branch weight and avoid division by zero crash. 2020-01-13T22:38:58+00:00 Hiroshi Yamauchi yamauchi@google.com 2020-01-13T22:19:45+00:00 urn:sha1:7b9f8e17d15d7516b186c0a85de71133b780f939 Summary: This fixes a crash in internal builds under SamplePGO. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72653 [PGO] Don't group COMDAT variables for compiler generated profile variables in ELF 2019-09-30T18:11:22+00:00 Rong Xu xur@google.com 2019-09-30T18:11:22+00:00 urn:sha1:367405008755640eac6114b18ec8c98be0cf5392 With this patch, compiler generated profile variables will have its own COMDAT name for ELF format, which syncs the behavior with COFF. Tested with clang PGO bootstrap. This shows a modest reduction in object sizes in ELF format. Differential Revision: https://reviews.llvm.org/D68041 llvm-svn: 373241 [PGO] Change hardcoded thresholds for cold/inlinehint to use summary 2019-09-17T23:12:13+00:00 Teresa Johnson tejohnson@google.com 2019-09-17T23:12:13+00:00 urn:sha1:fd2044f29993ea014898cf0a21a03b2a147e6340 Summary: The PGO counter reading will add cold and inlinehint (hot) attributes to functions that are very cold or hot. This was using hardcoded thresholds, instead of the profile summary cutoffs which are used in other hot/cold detection and are more dynamic and adaptable. Switch to using the summary-based cold/hot detection. The hardcoded limits were causing some code that had a medium level of hotness (per the summary) to be incorrectly marked with a cold attribute, blocking inlining. Reviewers: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67673 llvm-svn: 372189 [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost 2019-09-16T16:18:24+00:00 Roman Lebedev lebedev.ri@gmail.com 2019-09-16T16:18:24+00:00 urn:sha1:10151f661854e3ee4922662f1d0f62b327cbfa8c Summary: Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and other one had no instructions, even thought that would have total cost of 3. This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of it's inputs are available: https://godbolt.org/z/zgHePf So i don't see why the existing behavior is the correct one. Also, let's add it's own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: will allow to fold when there are 2 BB's each with cost=2. And since the logic has changed, it will also allow to fold when one BB has cost=3 and other cost=1, or there is only one BB with cost=4. This is an alternative solution to D65148: This fix is mainly motivated by `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial. This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240) | metric | old | new | delta | % | | x86-mi-counting.NumMachineFunctions | 37720 | 37721 | 1 | 0.00% | | x86-mi-counting.NumMachineBasicBlocks | 773545 | 771181 | -2364 | -0.31% | | x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% | | x86-mi-counting.NumUncondBR | 135770 | 135543 | -227 | -0.17% | | x86-mi-counting.NumCondBR | 423753 | 422187 | -1566 | -0.37% | | x86-mi-counting.NumCMOV | 24815 | 25731 | 916 | 3.69% | | x86-mi-counting.NumVecBlend | 17 | 17 | 0 | 0.00% | We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count. Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 **major** improvements: ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681 ``` That being said, it's always best-effort, so there will likely be cases where this worsens things. Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67318 llvm-svn: 372009 Reland "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" 2019-09-11T16:19:50+00:00 Petr Hosek phosek@chromium.org 2019-09-11T16:19:50+00:00 urn:sha1:7bdad08429411e7d0ecd58cd696b1efe3cff309e This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371635 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" 2019-09-11T09:16:17+00:00 Dmitri Gribenko gribozavr@gmail.com 2019-09-11T09:16:17+00:00 urn:sha1:57256af307ab7bdf42c47da019a1a288b9f9451a This reverts commit r371584. It introduced a dependency from compiler-rt to llvm/include/ADT, which is problematic for multiple reasons. One is that it is a novel dependency edge, which needs cross-compliation machinery for llvm/include/ADT (yes, it is true that right now compiler-rt included only header-only libraries, however, if we allow compiler-rt to depend on anything from ADT, other libraries will eventually get used). Secondly, depending on ADT from compiler-rt exposes ADT symbols from compiler-rt, which would cause ODR violations when Clang is built with the profile library. llvm-svn: 371598 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM 2019-09-11T01:09:16+00:00 Petr Hosek phosek@chromium.org 2019-09-11T01:09:16+00:00 urn:sha1:394a8ed8f1adb405b55eb5189620da4f62127ac7 This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371584 Reland "Change the X86 datalayout to add three address spaces 2019-09-10T23:15:38+00:00 Amy Huang akhuang@google.com 2019-09-10T23:15:38+00:00 urn:sha1:7b1d793713cf9ed9ab719f33b332f9c66a1fc5cc for 32 bit signed, 32 bit unsigned, and 64 bit pointers." This reverts 57076d3199fc2b0af4a3736b7749dd5462cacda5. Original review at https://reviews.llvm.org/D64931. Review for added fix at https://reviews.llvm.org/D66843. llvm-svn: 371568 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" 2019-09-10T06:25:13+00:00 Petr Hosek phosek@chromium.org 2019-09-10T06:25:13+00:00 urn:sha1:7d1757aba808db3d42362176862fe4e60bf3840d This reverts commit r371484: this broke sanitizer-x86_64-linux-fast bot. llvm-svn: 371488 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM 2019-09-10T03:11:39+00:00 Petr Hosek phosek@chromium.org 2019-09-10T03:11:39+00:00 urn:sha1:a10802fd73f937cab8cfd448d9e9ff283f75b191 This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371484