summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [Coroutines] const-ify internal helpers (NFC)Brian Gesiak2020-01-013-11/+13
| | | | | Several helpers internal to llvm/Transforms/Coroutines do not use 'const' for parameters that are not modified. Add const where possible.
* [RegisterClassInfo] Use SmallVector::assign instead of resize to make sure ↵Craig Topper2020-01-011-1/+1
| | | | | | | | | | | | we erase previous contents from all entries of the vector. resize only writes to elements that get added. Any elements that already existed maintain their previous value. In this case we're trying to erase cached information so we should use assign which will write to every element. Found while trying to add new tests to an existing X86 test and noticed register allocation changing in other functions.
* [Coroutines] Rename "legacy" passes (NFC)Brian Gesiak2020-01-016-51/+52
| | | | | | | | | | | | | | | | | | A series of patches beginning with https://reviews.llvm.org/D71898 propose to add an implementation of the coroutine passes to the new pass manager. As part of these changes, the coroutine passes that implement the legacy pass manager interface are renamed, to `<PassName>Legacy`. This mirrors similar changes that have been made to many other passes in LLVM as they've been transitioned to support both old and new pass managers. This commit splits out the renaming portion of that patch and commits it in advance as an NFC (no functional change intended) commit. It renames: * `CoroEarly` => `CoroEarlyLegacy` * `CoroSplit` => `CoroSplitLegacy` * `CoroElide` => `CoroElideLegacy` * `CoroCleanup` => `CoroCleanupLegacy`
* build: make `LLVM_ENABLE_ZLIB` a tri-bool for usersSaleem Abdulrasool2020-01-011-1/+4
| | | | | | Treat the flag `LLVM_ENABLE_ZLIB` as a tri-bool, `FORCE_ON` being `ON`, and `ON` being an auto-detect. This is needed as many of the builders enable the flag without having zlib available.
* build: reduce CMake handling for zlibSaleem Abdulrasool2020-01-013-7/+4
| | | | | | | | | Rather than handling zlib handling manually, use `find_package` from CMake to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`, `HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is set to `YES`, which requires the distributor to explicitly select whether zlib is enabled or not. This simplifies the CMake handling and usage in the rest of the tooling.
* [polly][Support] Un-break polly testsAlexandre Ganea2020-01-011-1/+1
| | | | | | Previously, the polly unit tests were stuck in a infinite loop. There was an edge case in StringRef::count() introduced by 9f6b13e5cce96066d7262d224c971d93c2724795, where an empty 'Str' would cause the function to never exit. Also fixed usage in polly.
* [InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423)Nikita Popov2020-01-011-2/+11
| | | | | | | | This addresses https://bugs.llvm.org/show_bug.cgi?id=44423. If one of the GEPs is inbounds and the other is zero-index, we can also preserve inbounds. Differential Revision: https://reviews.llvm.org/D72060
* [InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425)Nikita Popov2020-01-011-0/+1
| | | | | | | | This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to drop inbounds if one of the GEPs is not inbounds. This was already done when creating a new GEP, but not when modifying in place. Differential Revision: https://reviews.llvm.org/D72059
* [MachineScheduler] improve reuse of 'releaseNode'methodLorenzo Casalino2020-01-011-17/+21
| | | | | | | | | | | | | | | | The 'SchedBoundary::releaseNode' is merely invoked for releasing the Top/Bottom root nodes. However, 'SchedBoundary::releasePending' uses its same logic to check if the Pending queue has any releasable SUnit. It is possible to slightly modify the body of the two, allowing re-use of the former ('releaseNode') in the latter. Patch by Lorenzo Casalino <lorenzo.casalino93@gmail.com> Reviewers: MatzeB, fhahn, atrick Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65506
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if ↵Craig Topper2020-01-011-24/+42
| | | | | | | | the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984
* [NFC] Fixes -Wrange-loop-analysis warningsMark de Wever2020-01-0120-34/+35
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71857
* add strict float for round operationLiu, Chen32020-01-016-41/+86
| | | | Differential Revision: https://reviews.llvm.org/D72026
* [MC][TargetMachine] Delete MCTargetOptions::MCPIECopyRelocationsFangrui Song2020-01-012-10/+9
| | | | | | | | | | | | clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations check and sets dso_local on applicable global variables. We don't need to duplicate the work in TargetMachine shouldAssumeDSOLocal. Verified that -mpie-copy-relocations can still emit PC relative relocations for external variable accesses. clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32 clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC
* [Attributor] AAValueConstantRange: Value range analysis using constant rangeHideto Ueno2020-01-011-7/+499
| | | | | | | | | | | | | | | | | | | | | This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point. One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument). The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty). Currently, AAValueConstantRange is created when AAValueSimplify cannot simplify the value. Supported - BinaryOperator(add, sub, ...) - CmpInst(icmp eq, ...) - !range metadata `AAValueConstantRange` is not intended to extend to polyhedral range value analysis. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D71620
* [X86] Fix typo in getCMovOpcode.Craig Topper2019-12-311-1/+1
| | | | | | The 64-bit HasMemoryOperand line was using CMOV32rm instead of CMOV64rm. Not sure how to test this. We have no test coverage that passes true for HasMemoryOperand.
* [X86] Add X87 FCMOV support to X86FlagsCopyLowering.Craig Topper2019-12-311-0/+73
| | | | Fixes PR44396
* DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nszMatt Arsenault2019-12-312-5/+10
| | | | | | | | | | | | | | This was increasing the number of instructions when fsub was legalized on AMDGPU with no signed zeros enabled. This fold should be guarded by hasOneUse, and I don't think getNode should be doing that. The same fold is already done as a regular combine through isNegatibleForFree. This does require duplicating, even though isNegatibleForFree does this combine already (and properly checks hasOneUse) to avoid one PPC regression. In the regression, the outer fneg has nsz but the fsub operand does not. isNegatibleForFree only sees the operand, and doesn't see it's used from a nsz context. A nsz parameter needs to be added and threaded through isNegatibleForFree to avoid this.
* [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.Craig Topper2019-12-311-0/+3
|
* [X86][InstCombine] Add constant folding and simplification support for pdep ↵Craig Topper2019-12-311-0/+58
| | | | | | | | | | | | and pext The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result. There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted. Fixes PR44389 Differential Revision: https://reviews.llvm.org/D71952
* [X86] Use carry flag from add for (seteq (add X, -1), -1).Craig Topper2019-12-311-10/+31
| | | | | | | | If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest. Fixes one case from PR44412 Differential Revision: https://reviews.llvm.org/D72019
* [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to ↵Craig Topper2019-12-312-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | be promoted. These operations are needed as building blocks for promoting so they can't be promoted themselves. This appeared to work because the fp_extend query type for operation actions is the result type, not the input type so it never triggered in the legalizer. For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to vf416 which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler so just skipped it with a debug message saying it wasn't promoted. This patch just removes the operation actions to avoid this non-sense. Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.
* [amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`.Michael Liao2019-12-311-1/+1
| | | | | | | | | | Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866
* [InstCombine] fold zext of masked bit set/clearSanjay Patel2019-12-311-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1
* Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"Nikita Popov2019-12-311-6/+0
| | | | | | This reverts commit 27a0795943fee0f30b995fe5165428afc2dfd402. Seems to break test-suite.
* [PowerPC][NFC] Fix clang-tidy warningJinsong Ji2019-12-311-5/+5
| | | | | | | | | | | | | | | | | | Reported by https://results.llvm-merge-guard.org/amd64_debian_testing_clang8-726/clang-tidy.txt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11672:10: warning: invalid case style for variable 'isEQ' [readability-identifier-naming] bool isEQ = (MI.getOpcode() == PPC::ANDI_rec_1_EQ_BIT || ^~~~ IsEq /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11679:14: warning: invalid case style for variable 'dl' [readability-identifier-naming] DebugLoc dl = MI.getDebugLoc(); ^~ Dl
* [InstCombine] Fix infinite loop due to bitcast <-> phi transformsNikita Popov2019-12-311-0/+6
| | | | | | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by adding manually removing the old phis. Differential Revision: https://reviews.llvm.org/D71164
* [ARM][TypePromotion] Re-enable by defaultSam Parker2019-12-311-1/+1
| | | | Re-enable the pass after it was reverted and the bug fixed.
* [InstCombine] Don't rewrite phi-of-bitcast when the phi has other usersConnor Abbott2019-12-311-14/+43
| | | | | | | | | Judging by the existing comments, this was the intention, but the transform never actually checked if the existing phi's would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU. Differential Revision: https://reviews.llvm.org/D71209
* [Attributor] Suppress unused warnings when assertions are disabled. NFCIlya Biryukov2019-12-311-0/+2
|
* [Attributor] Function signature rewrite infrastructureJohannes Doerfert2019-12-311-0/+259
| | | | | | | | | | | | | | | As part of the Attributor manifest we want to change the signature of functions. This patch introduces a fairly generic interface to do so. As a first, very simple, use case, we remove unused arguments. A second use case, pointer privatization, will be committed with this patch as well. A lot of the code and ideas are taken from argument promotion and we run all argument promotion tests through this framework as well. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68765
* [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in ↵Craig Topper2019-12-311-2/+8
| | | | | | | | | | | | | | | LowerCallResult to use FP1 if there are two return values. If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead. This seems to fix the assertion/crash in PR44413.
* [NFC] Style cleanupShengchen Kan2019-12-311-28/+29
| | | | | | 1. make function Is16BitMemOperand static 2. Use Doxygen features in comment 3. Rename functions to make them start with a lower case letter
* [Attributor] Propagate known align from arguments to call sites argumentsJohannes Doerfert2019-12-311-0/+12
| | | | | | | Since the information is known we can simply use it at the call site. This is especially useful for callbacks but also helps regular calls. The test changes are mechanical.
* [Attributor] Use abstract call sites to determine associated argumentsJohannes Doerfert2019-12-312-2/+111
| | | | | | | | | | | | | | | | | | | | | | This is the second step after D67871 to make use of abstract call sites. In this patch the argument we associate with a abstract call site argument can be the one in the callback callee instead of the one in the callback broker. Caveat: We cannot allow no-alias arguments for problematic callbacks: As described in [1], adding no-alias (or restrict) to arguments could break synchronization as the synchronization effect, e.g., a barrier, does not "alias" with the pointer anymore. This disables no-alias annotation for potentially problematic arguments until we implement the fix described in [1]. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68008 [1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel, International Workshop on OpenMP 2018, http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
* [Attributor] Annotate the memory behavior of call site argumentsJohannes Doerfert2019-12-311-2/+7
| | | | | | | | | Especially for callbacks, annotating the call site arguments is important. Doing so exposed a too strong dependence of AAMemoryBehavior on AANoCapture since we handle the case of potentially captured pointers explicitly. The changes to the tests are all mechanical.
* [NFC] Make X86MCCodeEmitter::isPCRel32Branch staticShengchen Kan2019-12-311-4/+2
|
* Revert "DebugInfo: Fix rangesBaseAddress DICompileUnit bitcode ↵David Blaikie2019-12-303-4/+3
| | | | | | | | | | | | serialization/deserialization" Seeing some curious CFI failures internally - which makes little sense to me, as I don't think anyone is using this flag (even us, internally)... so sounds like a bug in my code somewhere (possibly a latent one that propagating this flag exposed, not sure). Reverting while I investigate. This reverts commit c51b45e32ef7f35c11891f60871aa9c2c04cd991.
* [NFC] Style cleanupShengchen Kan2019-12-311-389/+479
| | | | | | | 1. Remove function is64BitMode() and use STI.hasFeature(X86::Mode16Bit) directly 2. Use Doxygen features in comment 3. Rename functions to make them start with a lower case letter 4. Format the code with clang-format
* [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues ↵Craig Topper2019-12-305-25/+25
| | | | | | | | | | | instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
* Remove a redundant `default:` on an exhaustive switch(enum).Eric Astor2019-12-301-2/+0
|
* [OpenMP] Use the OpenMPIRBuilder for `omp parallel`Johannes Doerfert2019-12-302-1/+30
| | | | | | | | | | | | This allows to use the OpenMPIRBuilder for parallel regions. Code was extracted from D61953 and adapted to work with the new version (D70109). All but one feature should be supported. An update of this patch will provide test coverage and privatization other than shared. Reviewed By: fghanim Differential Revision: https://reviews.llvm.org/D70290
* [OpenMP] Use the OpenMPIRBuilder for `omp cancel`Johannes Doerfert2019-12-301-30/+79
| | | | | | | | | | | | An `omp cancel parallel` needs to be emitted by the OpenMPIRBuilder if the `parallel` was emitted by the OpenMPIRBuilder. This patch makes this possible. The cancel logic is shared with the cancel barriers. Testing is done via unit tests and the clang cancel_codegen.cpp file once D70290 lands. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D71948
* [X86][AsmParser] re-introduce 'offset' operatorEric Astor2019-12-304-98/+200
| | | | | | | | | | | | | | | | | | | | | | | Summary: Amend MS offset operator implementation, to more closely fit with its MS counterpart: 1. InlineAsm: evaluate non-local source entities to their (address) location 2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous 3. address PR32530 Based on http://llvm.org/D37461 Fix broken test where the break appears unrelated. - Set up appropriate memory-input rewrites for variable references. - Intel-dialect assembly printing now correctly handles addresses by adding "offset". - Pass offsets as immediate operands (using "r" constraint for offsets of locals). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D71436
* AMDGPU/GlobalISel: Select mul24 intrinsicsMatt Arsenault2019-12-302-4/+12
|
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode.Craig Topper2019-12-301-0/+7
| | | | | If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.
* AMDGPU/GlobalISel: Re-use MRI available in selectorMatt Arsenault2019-12-301-9/+7
|
* Ignore "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" in favor ↵Fangrui Song2019-12-303-13/+22
| | | | | | | | | | | | | of "frame-pointer" D56351 (included in LLVM 8.0.0) introduced "frame-pointer". All tests which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf" have been migrated to use "frame-pointer". Implement UpgradeFramePointerAttributes to upgrade the two obsoleted function attributes for bitcode. Their semantics are ignored. Differential Revision: https://reviews.llvm.org/D71863
* [MIPS GlobalISel] Select bitreverse. RecommitPetar Avramovic2019-12-302-1/+50
| | | | | | | | | | | | | | | G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang genrates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32. Recommit notes: Introduce temporary variables in order to make sure instructions get inserted into MachineFunction in same order regardless of compiler used to build llvm. Differential Revision: https://reviews.llvm.org/D71363
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftzMatt Arsenault2019-12-302-4/+9
|
* [InstCombine] propagate sign argument through nested copysignsSanjay Patel2019-12-301-0/+10
| | | | | This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
OpenPOWER on IntegriCloud