path: root/llvm/lib
Commit log (author, date, files changed, lines -removed/+added per entry)
* [ARM] VQADD instructions (David Green, 2019-10-10, 2 files, -20/+36)
  This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
  intrinsics.

  Differential Revision: https://reviews.llvm.org/D68566
  llvm-svn: 374336
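  As a hedged illustration (not part of the patch), this C++ sketch emits the
  saturating-add intrinsics these patterns select from, using IRBuilder; the
  function name and types are illustrative only.

  ```
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Intrinsics.h"

  using namespace llvm;

  // Emit llvm.sadd.sat / llvm.uadd.sat; applied to vector types such as
  // <8 x i16>, these are the nodes the MVE VQADD patterns match.
  Value *emitSaturatingAdd(IRBuilder<> &B, Value *LHS, Value *RHS, bool Signed) {
    return B.CreateBinaryIntrinsic(
        Signed ? Intrinsic::sadd_sat : Intrinsic::uadd_sat, LHS, RHS);
  }
  ```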
* [Alignment][NFC] Make VectorUtils use llvm::Align (Guillaume Chatelet, 2019-10-10, 2 files, -8/+9)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: hiraditya, rogfer01, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68784
  llvm-svn: 374330
* Fix -Wparentheses warning. NFCI. (Simon Pilgrim, 2019-10-10, 1 file, -2/+2)
  llvm-svn: 374326
* [Mips] Fix r374055 (Mirko Brkusanin, 2019-10-10, 1 file, -3/+3)
  The EXPENSIVE_CHECKS build was failing on the new test. This is fixed by
  marking the $ra register as undef. The test now has -verify-machineinstrs
  to check for operand flags.
  llvm-svn: 374320
* [IfCvt][ARM] Optimise diamond if-conversion for code size (Oliver Stannard, 2019-10-10, 3 files, -17/+152)
  Currently, the heuristics the if-conversion pass uses for diamond
  if-conversion are based on execution time, with no consideration for code
  size. This adds a new set of heuristics to be used when optimising for
  code size.

  This is mostly target-independent, because the if-conversion pass can see
  the code size of the instructions which it is removing. For Thumb, there
  are a few passes (insertion of IT instructions, selection of narrow
  branches, and selection of CBZ instructions) which are run after
  if-conversion and affect these heuristics, so I've added target hooks to
  better predict the code-size effect of a proposed if-conversion.

  Differential revision: https://reviews.llvm.org/D67350
  llvm-svn: 374301
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregs (Matt Arsenault, 2019-10-10, 8 files, -21/+25)
  SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the
  additional non-allocatable TTMP registers. There's no point in allocating
  SReg_128 vregs. This shrinks the size of the classes regalloc needs to
  consider, which is usually good.
  llvm-svn: 374284
* [Attributor][NFC] clang format (Johannes Doerfert, 2019-10-10, 1 file, -2/+2)
  llvm-svn: 374281
* [Attributor] Handle `null` differently in capture and alias logic (Johannes Doerfert, 2019-10-10, 1 file, -4/+21)
  Summary: `null` in the default address space (=AS 0) cannot be captured,
  nor can it alias anything. We make this clear now as it can be important
  for callbacks and other cases later on.

  In addition, this patch improves the debug output for noalias deduction.

  Reviewers: sstefan1, uenoku
  Subscribers: hiraditya, bollu, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68624
  llvm-svn: 374280
* Reland "[TextAPI] Introduce TBDv4" (Cyndy Ishida, 2019-10-10, 3 files, -32/+525)
  The original patch broke compilation with gcc and exposed an asan failure.
  This reland repairs those bugs.

  Differential Revision: https://reviews.llvm.org/D67529
  llvm-svn: 374277
* [codeview] Try to avoid emitting .cv_loc with line zero (Reid Kleckner, 2019-10-10, 1 file, -3/+13)
  Summary: Visual Studio doesn't like it while stepping. It kicks you out of
  the source view of the file being stepped through and tries to fall back
  to the disassembly view.

  Fixes PR43530

  The fix is incomplete, because it's possible to have a basic block with no
  source locations at all. In this case, we don't emit a .cv_loc, but that
  will result in wrong stepping behavior in the debugger if the layout
  predecessor of the location-less BB has an unrelated source location. We
  could try harder to find a valid location that dominates or post-dominates
  the current BB, but in general it's a dataflow problem, and one still
  might not exist. I left a FIXME about this.

  As an alternative, we might want to consider having the middle-end check
  if it's emitting codeview and get it to stop using line zero.

  Reviewers: akhuang
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68747
  llvm-svn: 374267
* Conservatively add volatility and atomic checks in a few places (Philip Reames, 2019-10-09, 2 files, -4/+14)
  As background, starting in D66309, I'm working on supporting unordered
  atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for
  X86. As part of that, I spent some time going through usages of LoadSDNode
  and StoreSDNode looking for cases where we might have missed a volatility
  check or need an atomic check. I couldn't find any cases that clearly
  miscompile - i.e. no test cases - but a couple of pieces of code look
  suspicious, though I can't figure out how to exercise them.

  This patch adds defensive checks and asserts in the places my manual audit
  found. If anyone has any ideas on how to either a) disprove any of the
  checks, or b) hit the bug they might be fixing, I welcome suggestions.

  Differential Revision: https://reviews.llvm.org/D68419
  llvm-svn: 374261
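  The added guards follow a common shape; here is a minimal hypothetical
  sketch (not the code from D68419), using the MemSDNode queries for
  volatility and atomicity.

  ```
  #include "llvm/CodeGen/SelectionDAGNodes.h"

  // Defensive legality check before a hypothetical load-rewriting combine:
  // refuse to touch volatile or atomic (e.g. unordered) accesses.
  static bool canRewriteLoad(const llvm::LoadSDNode *LD) {
    return !LD->isVolatile() && !LD->isAtomic();
  }
  ```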
* AMDGPU: Don't fold copies to physregs (Matt Arsenault, 2019-10-09, 1 file, -5/+9)
  In a future patch, this will help clean up m0 handling.

  The register coalescer handles copies from a register that materializes an
  immediate, but doesn't handle move immediates itself. The virtual register
  uses will often be allocated to the same register, so there ends up being
  no real copy.
  llvm-svn: 374257
* AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer (Matt Arsenault, 2019-10-09, 1 file, -4/+14)
  This was ignoring the register bank of the input pointer, and isUniformMMO
  seems overly aggressive. This will now conservatively assume a VGPR in
  cases where the incoming bank hasn't been determined yet (i.e. is from a
  loop phi).
  llvm-svn: 374255
* AMDGPU: Relax register classes used (Matt Arsenault, 2019-10-09, 1 file, -2/+2)
  llvm-svn: 374254
* AMDGPU: Fix typos (Matt Arsenault, 2019-10-09, 1 file, -2/+2)
  llvm-svn: 374253
* GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR (Matt Arsenault, 2019-10-09, 2 files, -1/+71)
  Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR.
  llvm-svn: 374252
* [InstCombine] Fix PR43617 (Evandro Menezes, 2019-10-09, 1 file, -4/+5)
  Check for `nullptr` before inspecting the composite function.
  llvm-svn: 374243
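  A minimal sketch of the defensive pattern (hypothetical names, not the
  actual PR43617 fix): check the result of a dyn_cast before using it.

  ```
  #include "llvm/IR/Function.h"
  #include "llvm/Support/Casting.h"

  using namespace llvm;

  // dyn_cast<> returns nullptr on type mismatch, so test before dereferencing.
  static Function *getDefinedCallee(Value *Callee) {
    Function *F = dyn_cast<Function>(Callee->stripPointerCasts());
    if (!F)
      return nullptr;
    return F->isDeclaration() ? nullptr : F;
  }
  ```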
* [AMDGPU] Fixed dpp combine of VOP1 (Stanislav Mekhanoshin, 2019-10-09, 1 file, -0/+8)
  If the original instruction did not have source modifiers, they were not
  added to the new DPP instruction either, even when needed.

  Differential Revision: https://reviews.llvm.org/D68729
  llvm-svn: 374241
* [WebAssembly] Make returns variadic (Thomas Lively, 2019-10-09, 9 files, -168/+73)
  Summary: This is necessary and sufficient to get simple cases of multiple
  return working with multivalue enabled. More complex cases will require
  block and loop signatures to be generalized to potentially be type indices
  as well.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68684
  llvm-svn: 374235
* [SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format (Wei Mi, 2019-10-09, 3 files, -35/+126)
  Currently for Text, Binary and ExtBinary format profiles, when we compile
  a module with samplefdo, even if no function in the module shows up in the
  profile, we have to load all the function profiles from the profile input.
  That is a waste of compile time.

  The CompactBinary format profile already supports loading function
  profiles on demand. In this patch, we add that support for the ExtBinary
  format. It works whether or not the sections in the ExtBinary profile are
  compressed. Experiment shows it reduces the time to compile a server
  benchmark by 30%.

  When profile remapping and loading function profiles on demand are both
  used, extra work needs to be done so that the on-demand loading takes the
  name remapping into consideration. That will be addressed in a follow-up
  patch.

  Differential Revision: https://reviews.llvm.org/D68601
  llvm-svn: 374233
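  As a hypothetical sketch of the on-demand scheme (not the actual ExtBinary
  reader; the names are invented): keep a function-name-to-offset index and
  only deserialize profiles for functions that appear in the module.

  ```
  #include <cstdint>
  #include <map>
  #include <string>

  struct FuncProfileIndex {
    // Built once from the profile's index section.
    std::map<std::string, uint64_t> NameToOffset;

    // Section offset of a function's profile, or UINT64_MAX when the function
    // has no profile, in which case nothing is read at all.
    uint64_t lookup(const std::string &FName) const {
      auto It = NameToOffset.find(FName);
      return It == NameToOffset.end() ? UINT64_MAX : It->second;
    }
  };
  ```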
* llvm-dwarfdump: Support multiple debug_loclists contributions (David Blaikie, 2019-10-09, 2 files, -15/+18)
  Also fixing the incorrect "offset" field being computed/printed for each
  location list.
  llvm-svn: 374232
* [System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLI (Vitaly Buka, 2019-10-09, 1 file, -2/+10)
  To fix an "infinite recursion" warning.
  llvm-svn: 374222
* [AMDGPU] Use math constants defined in MathExtras (NFC) (Evandro Menezes, 2019-10-09, 3 files, -45/+19)
  Use the new math constants in `MathExtras.h`.

  Differential revision: https://reviews.llvm.org/D68285
  llvm-svn: 374208
* [Support] Add mathematical constants (Evandro Menezes, 2019-10-09, 2 files, -10/+8)
  Add our own version of the mathematical constants from the upcoming C++20
  `std::numbers`.

  Differential revision: https://reviews.llvm.org/D68257
  llvm-svn: 374207
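  A short usage sketch, assuming the constants land in an llvm::numbers
  namespace in `MathExtras.h` with names mirroring C++20 std::numbers (e.g.
  pi, e); treat the exact names as an assumption.

  ```
  #include "llvm/Support/MathExtras.h"

  // Replaces ad-hoc M_PI fallbacks, which are not portable C++.
  double circleArea(double Radius) {
    return llvm::numbers::pi * Radius * Radius;
  }
  ```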
* [System Model] [TTI] Update cache and prefetch TTI interfaces (David Greene, 2019-10-09, 9 files, -37/+66)
  Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This
  involved moving defaults from TargetTransformInfoImplBase to
  MCSubtargetInfo.

  Rework the TTI cache and software prefetching APIs to prepare for the
  introduction of a general system model. Changes include:

  - Marking existing interfaces const and/or override as appropriate
  - Adding comments
  - Adding BasicTTIImpl interfaces that delegate to a subtarget
    implementation
  - Moving the default TargetTransformInfoImplBase implementation to a
    default MCSubtarget implementation

  Only a handful of targets use these interfaces currently: AArch64,
  Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
  implementation, so its custom TTI implementation is migrated to use the
  new facilities in BasicTTIImpl to invoke its custom subtarget
  implementation. The custom TTI implementations continue to exist for the
  other targets with this change. They are not moved over to subtarget-based
  implementations.

  The end goal is to have the default subtarget implementation defer to the
  system model defined by the target. With this change, the default
  MCSubtargetInfo implementation essentially returns the defaults
  TargetTransformInfoImplBase used to return. Existing users of TTI defaults
  will hit the defaults now in MCSubtargetInfo. Targets that define their
  own custom TTI implementations won't use the BasicTTIImpl implementations
  that route to the subtarget.

  Once system models are in place for the targets that use these interfaces,
  their custom TTI implementations can be removed.

  Differential Revision: https://reviews.llvm.org/D63614
  llvm-svn: 374205
* DebugInfo: Shot in the dark attempt to fix ubsan error from r374122 (David Blaikie, 2019-10-09, 1 file, -1/+1)
  (Specifying an underlying type for the enum might also be suitable - but
  this seems better/as good, since there's a clear expectation this can
  contain values other than the actual enumerators of this enum.)
  llvm-svn: 374196
* [WebAssembly] Add builtin and intrinsic for v8x16.swizzle (Thomas Lively, 2019-10-09, 1 file, -0/+4)
  Summary: This clang builtin and corresponding LLVM intrinsic are necessary
  to expose the exact semantics of the underlying WebAssembly instruction to
  users. LLVM produces a poison value if the dynamic swizzle indices are
  greater than the vector size, but the WebAssembly instruction sets the
  corresponding output lane to zero. Users who depend on this behavior can
  safely use this builtin.

  Depends on D68527.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits,
  llvm-commits
  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D68531
  llvm-svn: 374189
* [WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering (Thomas Lively, 2019-10-09, 3 files, -74/+132)
  Summary: Adds the new v8x16.swizzle SIMD instruction as specified at
  https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices.
  In addition to adding swizzles as a candidate lowering in
  LowerBUILD_VECTOR, also rewrites and simplifies the lowering to minimize
  the number of replace_lanes necessary rather than trying to minimize code
  size. This leads to more uses of v128.const instead of splats, which is
  expected to increase performance. The new code will be easier to tune once
  V8 implements all the vector construction operations, and it will also be
  easier to add new candidate instructions in the future if necessary.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68527
  llvm-svn: 374188
* [SLP] respect target register width for GEP vectorization (PR43578) (Sanjay Patel, 2019-10-09, 1 file, -4/+10)
  We failed to account for the target register width (max vector factor)
  when vectorizing starting from GEPs. This causes vectorization to proceed
  to obviously illegal widths as in:
  https://bugs.llvm.org/show_bug.cgi?id=43578

  For x86, this also means that SLP can produce rogue AVX or AVX512 code
  even when the user specifies a narrower vector width.

  The AArch64 test in ext-trunc.ll appears to be better using the narrower
  width. I'm not exactly sure what getelementptr.ll is trying to do, but
  it's testing with "-slp-threshold=-18", so I'm not worried about those
  diffs. The x86 test is an over-reduction from SPEC h264; this patch
  appears to restore the perf loss caused by SLP when using -march=haswell.

  Differential Revision: https://reviews.llvm.org/D68667
  llvm-svn: 374183
* [AArch64] Ensure no tagged memory is left in the unallocated portion of the stack (Momchil Velikov, 2019-10-09, 1 file, -15/+68)
  This patch makes sure that if we tag some memory, we untag that memory
  before the function returns/throws via any exit reachable from the tag
  operation. For that we place the untag operation either at:

  a) the lifetime end call for the alloca, if that call post-dominates the
     lifetime start call (where the tag operation is placed), or it (the
     lifetime end call) dominates all reachable exits; otherwise
  b) the reachable exits

  Differential Revision: https://reviews.llvm.org/D68469
  llvm-svn: 374182
* [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent. (Alina Sbirlea, 2019-10-09, 4 files, -29/+64)
  Summary: The rule for the moveAllAfterMergeBlocks API is that all
  instructions from `From` must have been moved to `To`, while keeping the
  CFG edges (and block terminators) unchanged.

  Update all the call sites of moveAllAfterMergeBlocks to follow this.

  Pending follow-up: since the same behavior is needed every time, merge all
  call sites into one. The common denominator may be the call to
  `MergeBlockIntoPredecessor`.

  Resolves PR43569.

  Reviewers: george.burgess.iv
  Subscribers: Prazek, sanjoy.google, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68659
  llvm-svn: 374177
* [LV] Emitting SCEV checks with OptForSize (Sjoerd Meijer, 2019-10-09, 1 file, -1/+2)
  When optimising for size and SCEV runtime checks need to be emitted to
  check overflow behaviour, the loop vectorizer can run into this assert:

    LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
        llvm::Loop *, llvm::BasicBlock *):
    Assertion `!BB->getParent()->hasOptSize() &&
        "Cannot SCEV check stride or overflow when optimizing for size"' failed.

  We should not generate predicates while optimising for size, because code
  will be generated for predicates such as these SCEV overflow runtime
  checks.

  This should fix PR43371.

  Differential Revision: https://reviews.llvm.org/D68082
  llvm-svn: 374166
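  A sketch of the guard this change effectively adds (illustrative, not the
  exact LoopVectorize.cpp code):

  ```
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"

  // Only plan vectorization variants that need SCEV runtime checks when the
  // enclosing function is not being optimized for size.
  static bool mayEmitSCEVChecks(const llvm::BasicBlock *BB) {
    return !BB->getParent()->hasOptSize();
  }
  ```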
* [mips] Rename local variable. NFC (Simon Atanasyan, 2019-10-09, 1 file, -19/+19)
  llvm-svn: 374165
* [mips] Split expandLoadImmReal into multiple methods. NFC (Simon Atanasyan, 2019-10-09, 1 file, -154/+205)
  The `expandLoadImmReal` method handles four different and almost
  non-overlapping cases: loading a "single" float immediate into a GPR,
  loading a "single" float immediate into an FPR, and the same pair of cases
  for a "double" float immediate. It's better to move each `else if` branch
  into a separate method.
  llvm-svn: 374164
* Unify the two CRC implementations (Hans Wennborg, 2019-10-09, 9 files, -154/+84)
  David added the JamCRC implementation in r246590. More recently, Eugene
  added a CRC-32 implementation in r357901, which falls back to zlib's crc32
  function if present.

  These checksums are essentially the same, so having multiple
  implementations seems unnecessary. This replaces the CRC-32 implementation
  with the simpler one from JamCRC, and implements the JamCRC interface in
  terms of CRC-32 since this means it can use zlib's implementation when
  available, saving a few bytes and potentially making it faster.

  JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef. This
  patch changes it to ArrayRef<uint8_t>, which I think is the best choice,
  and simplifies a few of the callers nicely.

  Differential revision: https://reviews.llvm.org/D68570
  llvm-svn: 374148
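  A usage sketch under the post-patch interface as described above
  (llvm::crc32 over ArrayRef<uint8_t>, with JamCRC layered on top); treat
  the exact header contents as an assumption.

  ```
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/Support/CRC.h"

  using namespace llvm;

  uint32_t checksums(ArrayRef<uint8_t> Data) {
    // Standard CRC-32; may be backed by zlib's crc32 when available.
    uint32_t C = crc32(Data);
    // JamCRC is the same polynomial without the final bit-complement.
    JamCRC J;
    J.update(Data);
    return C ^ J.getCRC();
  }
  ```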
* DebugInfo: Move LLE enum handling to .def to match RLE handling (David Blaikie, 2019-10-08, 2 files, -1/+12)
  llvm-svn: 374122
* [CVP] Replace SExt with ZExt if the input is known-non-negative (Roman Lebedev, 2019-10-08, 1 file, -0/+25)
  Summary: Zero-extension is far more friendly for further analysis. While
  this doesn't directly help with the shift-by-signext problem, it is not
  unrelated.

  This has the following effect on test-suite (numbers collected after the
  finish of the middle-end module pass manager):

  | Statistic                            | old     | new     | delta | percent change |
  | correlated-value-propagation.NumSExt | 0       | 6026    | 6026  | +100.00% |
  | instcount.NumAddInst                 | 272860  | 271283  | -1577 | -0.58% |
  | instcount.NumAllocaInst              | 27227   | 27226   | -1    | 0.00% |
  | instcount.NumAndInst                 | 63502   | 63320   | -182  | -0.29% |
  | instcount.NumAShrInst                | 13498   | 13407   | -91   | -0.67% |
  | instcount.NumAtomicCmpXchgInst       | 1159    | 1159    | 0     | 0.00% |
  | instcount.NumAtomicRMWInst           | 5036    | 5036    | 0     | 0.00% |
  | instcount.NumBitCastInst             | 672482  | 672353  | -129  | -0.02% |
  | instcount.NumBrInst                  | 702768  | 702195  | -573  | -0.08% |
  | instcount.NumCallInst                | 518285  | 518205  | -80   | -0.02% |
  | instcount.NumExtractElementInst      | 18481   | 18482   | 1     | 0.01% |
  | instcount.NumExtractValueInst        | 18290   | 18288   | -2    | -0.01% |
  | instcount.NumFAddInst                | 139035  | 138963  | -72   | -0.05% |
  | instcount.NumFCmpInst                | 10358   | 10348   | -10   | -0.10% |
  | instcount.NumFDivInst                | 30310   | 30302   | -8    | -0.03% |
  | instcount.NumFenceInst               | 387     | 387     | 0     | 0.00% |
  | instcount.NumFMulInst                | 93873   | 93806   | -67   | -0.07% |
  | instcount.NumFPExtInst               | 7148    | 7144    | -4    | -0.06% |
  | instcount.NumFPToSIInst              | 2823    | 2838    | 15    | 0.53% |
  | instcount.NumFPToUIInst              | 1251    | 1251    | 0     | 0.00% |
  | instcount.NumFPTruncInst             | 2195    | 2191    | -4    | -0.18% |
  | instcount.NumFSubInst                | 92109   | 92103   | -6    | -0.01% |
  | instcount.NumGetElementPtrInst       | 1221423 | 1219157 | -2266 | -0.19% |
  | instcount.NumICmpInst                | 479140  | 478929  | -211  | -0.04% |
  | instcount.NumIndirectBrInst          | 2       | 2       | 0     | 0.00% |
  | instcount.NumInsertElementInst       | 66089   | 66094   | 5     | 0.01% |
  | instcount.NumInsertValueInst         | 2032    | 2030    | -2    | -0.10% |
  | instcount.NumIntToPtrInst            | 19641   | 19641   | 0     | 0.00% |
  | instcount.NumInvokeInst              | 21789   | 21788   | -1    | 0.00% |
  | instcount.NumLandingPadInst          | 12051   | 12051   | 0     | 0.00% |
  | instcount.NumLoadInst                | 880079  | 878673  | -1406 | -0.16% |
  | instcount.NumLShrInst                | 25919   | 25921   | 2     | 0.01% |
  | instcount.NumMulInst                 | 42416   | 42417   | 1     | 0.00% |
  | instcount.NumOrInst                  | 100826  | 100576  | -250  | -0.25% |
  | instcount.NumPHIInst                 | 315118  | 314092  | -1026 | -0.33% |
  | instcount.NumPtrToIntInst            | 15933   | 15939   | 6     | 0.04% |
  | instcount.NumResumeInst              | 2156    | 2156    | 0     | 0.00% |
  | instcount.NumRetInst                 | 84485   | 84484   | -1    | 0.00% |
  | instcount.NumSDivInst                | 8599    | 8597    | -2    | -0.02% |
  | instcount.NumSelectInst              | 45577   | 45913   | 336   | 0.74% |
  | instcount.NumSExtInst                | 84026   | 78278   | -5748 | -6.84% |
  | instcount.NumShlInst                 | 39796   | 39726   | -70   | -0.18% |
  | instcount.NumShuffleVectorInst       | 100272  | 100292  | 20    | 0.02% |
  | instcount.NumSIToFPInst              | 29131   | 29113   | -18   | -0.06% |
  | instcount.NumSRemInst                | 1543    | 1543    | 0     | 0.00% |
  | instcount.NumStoreInst               | 805394  | 804351  | -1043 | -0.13% |
  | instcount.NumSubInst                 | 61337   | 61414   | 77    | 0.13% |
  | instcount.NumSwitchInst              | 8527    | 8524    | -3    | -0.04% |
  | instcount.NumTruncInst               | 60523   | 60484   | -39   | -0.06% |
  | instcount.NumUDivInst                | 2381    | 2381    | 0     | 0.00% |
  | instcount.NumUIToFPInst              | 5549    | 5549    | 0     | 0.00% |
  | instcount.NumUnreachableInst         | 9855    | 9855    | 0     | 0.00% |
  | instcount.NumURemInst                | 1305    | 1305    | 0     | 0.00% |
  | instcount.NumXorInst                 | 10230   | 10081   | -149  | -1.46% |
  | instcount.NumZExtInst                | 60353   | 66840   | 6487  | 10.75% |
  | instcount.TotalBlocks                | 829582  | 829004  | -578  | -0.07% |
  | instcount.TotalFuncs                 | 83818   | 83817   | -1    | 0.00% |
  | instcount.TotalInsts                 | 7316574 | 7308483 | -8091 | -0.11% |

  TL;DR: we produce 0.11% fewer instructions overall, 6.84% fewer `sext`,
  and 10.75% more `zext`. To be noted, clearly, not all new `zext`'s are
  produced by this fold. (And now I guess it might have been interesting to
  measure this for D68103 :S)

  Reviewers: nikic, spatel, reames, dberlin
  Reviewed By: nikic
  Subscribers: hiraditya, jfb, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68654
  llvm-svn: 374112
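  A conceptual C++ sketch of the transform (the pass itself uses
  LazyValueInfo; this version uses ValueTracking's isKnownNonNegative for
  brevity, so it is an approximation, not the actual code):

  ```
  #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // If the sext's operand cannot be negative, sign- and zero-extension agree,
  // and the zext form is friendlier to later analyses.
  static bool replaceSExtWithZExt(SExtInst *SDI, const DataLayout &DL) {
    if (!isKnownNonNegative(SDI->getOperand(0), DL))
      return false;
    IRBuilder<> B(SDI);
    Value *ZExt = B.CreateZExt(SDI->getOperand(0), SDI->getType());
    SDI->replaceAllUsesWith(ZExt);
    SDI->eraseFromParent();
    return true;
  }
  ```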
* [tblgen] Add getOperatorAsDef() to Record (Daniel Sanders, 2019-10-08, 1 file, -0/+7)
  Summary: While working with DagInit's, it's often the case that you expect
  the operator to be a reference to a def. This patch adds a wrapper for
  this common case to reduce the amount of boilerplate callers need to
  duplicate repeatedly.

  getOperatorAsDef() returns the record if the DagInit has an operator that
  is a DefInit. Otherwise, it prints a fatal error.

  There's only a few pre-existing examples in LLVM at the moment and I've
  left a few instances of the code this simplifies as they had more specific
  error messages than the generic one this produces. I'm going to be using
  this a fair bit in my subsequent patches.

  Reviewers: bogner, volkan, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos,
  sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01,
  MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton,
  pzheng, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68424
  llvm-svn: 374101
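  A before/after sketch of a call site (assuming the helper takes the
  source-location list used for the fatal-error message, as described in the
  review):

  ```
  #include "llvm/TableGen/Record.h"

  using namespace llvm;

  Record *getDagOperatorRecord(const DagInit *Dag, const Record &Origin) {
    // Before: hand-rolled cast plus a custom error message on failure:
    //   Record *Op = cast<DefInit>(Dag->getOperator())->getDef();
    // After: one call that emits a fatal error at Origin's location.
    return Dag->getOperatorAsDef(Origin.getLoc());
  }
  ```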
* [BPF] do compile-once run-everywhere relocation for bitfields (Yonghong Song, 2019-10-08, 7 files, -96/+371)
  A bpf-specific clang intrinsic is introduced:

    u32 __builtin_preserve_field_info(member_access, info_kind)

  Depending on info_kind, different information will be returned to the
  program. A relocation is also recorded for this builtin so that the bpf
  loader can patch the instruction on the target host. This clang intrinsic
  is used to get certain information to facilitate struct/union member
  relocations.

  The offset relocation is extended by 4 bytes to include the relocation
  kind. The currently supported relocation kinds for
  __builtin_preserve_field_info are:

    enum {
      FIELD_BYTE_OFFSET = 0,
      FIELD_BYTE_SIZE,
      FIELD_EXISTENCE,
      FIELD_SIGNEDNESS,
      FIELD_LSHIFT_U64,
      FIELD_RSHIFT_U64,
    };

  The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0.

  An example:

    struct s {
      int a;
      int b1:9;
      int b2:4;
    };
    enum {
      FIELD_BYTE_OFFSET = 0,
      FIELD_BYTE_SIZE,
      FIELD_EXISTENCE,
      FIELD_SIGNEDNESS,
      FIELD_LSHIFT_U64,
      FIELD_RSHIFT_U64,
    };

    void bpf_probe_read(void *, unsigned, const void *);
    int field_read(struct s *arg) {
      unsigned long long ull = 0;
      unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
      unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
    #ifdef USE_PROBE_READ
      bpf_probe_read(&ull, size, (const void *)arg + offset);
      unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
    #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
      lshift = lshift + (size << 3) - 64;
    #endif
    #else
      switch(size) {
      case 1:
        ull = *(unsigned char *)((void *)arg + offset);
        break;
      case 2:
        ull = *(unsigned short *)((void *)arg + offset);
        break;
      case 4:
        ull = *(unsigned int *)((void *)arg + offset);
        break;
      case 8:
        ull = *(unsigned long long *)((void *)arg + offset);
        break;
      }
      unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
    #endif
      ull <<= lshift;
      if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
        return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
      return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
    }

  There is a minor overhead for bpf_probe_read() on big endian.

  The code and relocations generated for field_read where bpf_probe_read()
  is used to access the argument data, in little endian mode:

    r3 = r1
    r1 = 0
    r1 = 4   <=== relocation (FIELD_BYTE_OFFSET)
    r3 += r1
    r1 = r10
    r1 += -8
    r2 = 4   <=== relocation (FIELD_BYTE_SIZE)
    call bpf_probe_read
    r2 = 51  <=== relocation (FIELD_LSHIFT_U64)
    r1 = *(u64 *)(r10 - 8)
    r1 <<= r2
    r2 = 60  <=== relocation (FIELD_RSHIFT_U64)
    r0 = r1
    r0 >>= r2
    r3 = 1   <=== relocation (FIELD_SIGNEDNESS)
    if r3 == 0 goto LBB0_2
    r1 s>>= r2
    r0 = r1
  LBB0_2:
    exit

  Compared to the above code between the FIELD_LSHIFT_U64 and
  FIELD_RSHIFT_U64 relocations, big endian mode has four more instructions:

    r1 = 41  <=== relocation (FIELD_LSHIFT_U64)
    r6 += r1
    r6 += -64
    r6 <<= 32
    r6 >>= 32
    r1 = *(u64 *)(r10 - 8)
    r1 <<= r6
    r2 = 60  <=== relocation (FIELD_RSHIFT_U64)

  The code and relocations generated when using direct loads:

    r2 = 0
    r3 = 4
    r4 = 4
    if r4 s> 3 goto LBB0_3
    if r4 == 1 goto LBB0_5
    if r4 == 2 goto LBB0_6
    goto LBB0_9
  LBB0_6:                # %sw.bb1
    r1 += r3
    r2 = *(u16 *)(r1 + 0)
    goto LBB0_9
  LBB0_3:                # %entry
    if r4 == 4 goto LBB0_7
    if r4 == 8 goto LBB0_8
    goto LBB0_9
  LBB0_8:                # %sw.bb9
    r1 += r3
    r2 = *(u64 *)(r1 + 0)
    goto LBB0_9
  LBB0_5:                # %sw.bb
    r1 += r3
    r2 = *(u8 *)(r1 + 0)
    goto LBB0_9
  LBB0_7:                # %sw.bb5
    r1 += r3
    r2 = *(u32 *)(r1 + 0)
  LBB0_9:                # %sw.epilog
    r1 = 51
    r2 <<= r1
    r1 = 60
    r0 = r2
    r0 >>= r1
    r3 = 1
    if r3 == 0 goto LBB0_11
    r2 s>>= r1
    r0 = r2
  LBB0_11:               # %sw.epilog
    exit

  The verifier is able to do limited constant propagation following
  branches, so the following is the code actually traversed:

    r2 = 0
    r3 = 4   <=== relocation
    r4 = 4   <=== relocation
    if r4 s> 3 goto LBB0_3
  LBB0_3:                # %entry
    if r4 == 4 goto LBB0_7
  LBB0_7:                # %sw.bb5
    r1 += r3
    r2 = *(u32 *)(r1 + 0)
  LBB0_9:                # %sw.epilog
    r1 = 51  <=== relocation
    r2 <<= r1
    r1 = 60  <=== relocation
    r0 = r2
    r0 >>= r1
    r3 = 1
    if r3 == 0 goto LBB0_11
    r2 s>>= r1
    r0 = r2
  LBB0_11:               # %sw.epilog
    exit

  For the native load case, the load size is calculated to be the same width
  LLVM would otherwise use to load the value that the bitfield is then
  extracted from.

  Differential Revision: https://reviews.llvm.org/D67980
  llvm-svn: 374099
* AMDGPU: Fix i16 arithmetic pattern redundancy (Matt Arsenault, 2019-10-08, 1 file, -78/+23)
  There were two problems here. First, these patterns were duplicated to
  handle the inverted shift operands instead of using the commuted PatFrags.
  Second, the zext folding patterns don't apply to the subtargets that don't
  zero the high bits; they should be skipped instead of inserting the
  extension. The zeroing-high code would be emitted when necessary anyway.
  This was also emitting unnecessary zexts in cases where the high bits were
  undefined.
  llvm-svn: 374092
* Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" (Jinsong Ji, 2019-10-08, 15 files, -174/+66)
  Also revert "[LoopVectorize] Fix non-debug builds after rL374017".

  This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
  This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.

  The patch is breaking the PowerPC internal build. Checked with the author;
  reverting on his behalf for now due to time zones.
  llvm-svn: 374091
* [CodeExtractor] Factor out and reuse shrinkwrap analysis (Vedant Kumar, 2019-10-08, 5 files, -106/+149)
  Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
  purposes), and allow the analysis to be reused.

  This resolves a quadratic compile-time bug observed when compiling
  AMDGPUDisassembler.cpp.o.

  Pre-patch (Release + LTO clang):

    ---User Time---    --System Time--   --User+System--    ---Wall Time---   --- Name ---
    176.5278 ( 57.8%)  0.4915 ( 18.5%)   177.0192 ( 57.4%)  177.4112 ( 57.3%) Hot Cold Splitting

  Post-patch (ReleaseAsserts clang):

    ---User Time---    --System Time--   --User+System--    ---Wall Time---   --- Name ---
    1.4051 (  3.3%)    0.0079 (  0.3%)   1.4129 (  3.2%)    1.4129 (  3.2%)   Hot Cold Splitting

  Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
  pre- vs. post-patch.

  An alternate approach is to hide CodeExtractorAnalysisCache from clients
  of CodeExtractor, and to recompute the analysis from scratch inside of
  CodeExtractor::extractCodeRegion(). This eliminates some redundant work in
  the shrinkwrapping legality check. However, some clients continue to
  exhibit O(n^2) compile time behavior as computing the analysis is O(n).

  rdar://55912966

  Differential Revision: https://reviews.llvm.org/D68616
  llvm-svn: 374089
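  The shape of the compute-once, reuse-many pattern involved (a hypothetical
  simplification, not the actual CodeExtractorAnalysisCache):

  ```
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  #include <vector>

  using namespace llvm;

  struct AllocaScan {
    std::vector<AllocaInst *> Allocas; // computed once in O(n)

    explicit AllocaScan(Function &F) {
      for (Instruction &I : F.getEntryBlock())
        if (auto *AI = dyn_cast<AllocaInst>(&I))
          Allocas.push_back(AI);
    }
    // Each extraction candidate queries this scan instead of re-walking F,
    // so the aggregate cost over many regions stays linear in function size.
  };
  ```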
* AMDGPU: Add offsets to MMO when lowering buffer intrinsics (Tom Stellard, 2019-10-08, 2 files, -11/+73)
  Summary: Without offsets on the MachineMemOperands (MMOs),
  MachineInstr::mayAlias() will return true for all reads and writes to the
  same resource descriptor. This leads to O(N^2) complexity in the
  MachineScheduler when analyzing dependencies of buffer loads and stores.
  It also limits the SILoadStoreOptimizer from merging more instructions.

  This patch reduces the compile time of one pathological compute shader
  from 12 seconds to 1 second.

  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye,
  hiraditya, jfb, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D65097
  llvm-svn: 374087
* CodeGenPrepare - silence static analyzer dyn_cast<> null dereference warnings. NFCI. (Simon Pilgrim, 2019-10-08, 1 file, -11/+8)
  The static analyzer is warning about potential null dereferences, but in
  these cases we should be able to use cast<> directly; if not, the assert
  will fire for us.
  llvm-svn: 374085
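  The idiom in question, as a hedged illustration (hypothetical function,
  not the CodeGenPrepare code): when the dynamic type is an invariant,
  cast<> documents and asserts it, instead of leaving a dyn_cast<> result
  that the analyzer must assume may be null.

  ```
  #include "llvm/IR/Instructions.h"
  #include "llvm/Support/Casting.h"

  using namespace llvm;

  static Value *storedPointer(Instruction *I) {
    // We only reach here for stores, so cast<> is safe; it asserts on a type
    // mismatch rather than silently returning nullptr like dyn_cast<>.
    return cast<StoreInst>(I)->getPointerOperand();
  }
  ```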
* [AMDGPU] Disable unused gfx10 dpp instructions (Stanislav Mekhanoshin, 2019-10-08, 2 files, -0/+8)
  Inhibit generation of unused real dpp instructions on gfx10, just like it
  is done on other subtargets. This does not change anything because these
  are illegal anyway and not accepted, but it does reduce the number of
  instruction definitions generated.

  Differential Revision: https://reviews.llvm.org/D68607
  llvm-svn: 374083
* [WebAssembly] Fix a bug in 'try' placement (Heejin Ahn, 2019-10-08, 1 file, -13/+23)
  Summary: When searching for a local expression tree created by stackified
  registers, for 'block' placement we start the search from the previous
  instruction of a BB's terminator. But in 'try''s case, we should start
  from the previous instruction of a call that can throw, or from an
  EH_LABEL that precedes the call, because the return values of the call's
  previous instructions can be stackified and consumed by the throwing call.

  For example,

  ```
  i32.call @foo
  call @bar     ; may throw
  br $label0
  ```

  In this case, if we start the search from the previous instruction of the
  terminator (`br` here), we end up stopping at `call @bar` and place a
  'try' between `i32.call @foo` and `call @bar`, because `call @bar` does
  not have a return value so it is not a local expression tree of `br`. But
  in this case, unlike when placing 'block's, we should start the search
  from `call @bar`, because the return value of `i32.call @foo` is
  stackified and used by `call @bar`.

  Reviewers: dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68619
  llvm-svn: 374073
* [DebugInfo][If-Converter] Update call site info during the optimization (Nikola Prica, 2019-10-08, 12 files, -19/+56)
  During the If-Converter optimization, pay attention when copying or
  deleting call instructions in order to keep the call site information in a
  valid state.

  Reviewers: aprantl, vsk, efriedma
  Reviewed By: vsk, efriedma

  Differential Revision: https://reviews.llvm.org/D66955
  llvm-svn: 374068
* [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer (Hideto Ueno, 2019-10-08, 1 file, -24/+209)
  Summary: In D65186 and related patches, MustBeExecutedContextExplorer is
  introduced. This enables us to traverse instructions guaranteed to execute
  from the function entry. If we can know the argument is used as
  `dereferenceable` or `nonnull` in these instructions, we can mark
  `dereferenceable` or `nonnull` in the argument definition:

  1. Memory instruction (similar to D64258)
     Trace the memory instruction's pointer operand. Currently, only
     inbounds GEPs are traced.

  ```
  define i64* @f(i64* %a) {
  entry:
    %add.ptr = getelementptr inbounds i64, i64* %a, i64 1
    ; (because of the inbounds GEP we can know that %a is at least
    ; dereferenceable(16))
    store i64 1, i64* %add.ptr, align 8
    ret i64* %add.ptr ; dereferenceable 8 (because the above instruction
                      ; stores into it)
  }
  ```

  2. Propagation from call site (similar to D27855)
     If `deref` or `nonnull` is known in the call site parameter attributes,
     we can mark the argument with that attribute as well.

  ```
  declare void @use3(i8* %x, i8* %y, i8* %z);
  declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);

  define void @parent1(i8* %a, i8* %b, i8* %c) {
    call void @use3nonnull(i8* %b, i8* %c, i8* %a)
    ; the above instruction is always executed, so we can deduce
    ; @parent1(i8* nonnull %a, i8* nonnull %b, i8* nonnull %c)
    call void @use3(i8* %c, i8* %a, i8* %b)
    ret void
  }
  ```

  Reviewers: jdoerfert, sstefan1, spatel, reames
  Reviewed By: jdoerfert
  Subscribers: xbolva00, hiraditya, jfb, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D65402
  llvm-svn: 374063
* Revert [TextAPI] Introduce TBDv4 (Cyndy Ishida, 2019-10-08, 3 files, -531/+32)
  This reverts r374058 (git commit 5d566c5a46aeaa1fa0e5c0b823c9d5f84036dc9a).
  llvm-svn: 374062
* [Attributor] Add helper class to compose two structured deductions (Hideto Ueno, 2019-10-08, 1 file, -0/+15)
  Summary: This patch introduces a generic way to compose two structured
  deductions. This will be used for composing generic deduction with
  `MustBeExecutedExplorer` and other existing generic deductions.

  Reviewers: jdoerfert, sstefan1
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D66645
  llvm-svn: 374060