path: root/llvm/lib/CodeGen/SelectionDAG
Commit message | Author | Age | Files | Lines
...
* DAG: Consider nnan in isKnownNeverNaNMatt Arsenault2017-01-181-0/+3
| | | | llvm-svn: 292328
* Revert "[TLI] Robustize SDAG proto checking by merging it into TLI."Ahmed Bougacha2017-01-171-8/+70
| | | | | | This reverts commit r292189, as it causes issues on SystemZ bots. llvm-svn: 292191
* [TLI] Robustize SDAG proto checking by merging it into TLI.Ahmed Bougacha2017-01-171-70/+8
| | | | | | | | | | | | | | | | | | | SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to two tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. I don't think it's worth supporting either of these (IMO) broken testcases. Instead, fix them to be more correct. llvm-svn: 292189
* [SelectionDAG] Add knownbits support for BITREVERSE Simon Pilgrim2017-01-161-0/+7
| | | | llvm-svn: 292130
* [SelectionDAG] Add support for BITREVERSE constant foldingSimon Pilgrim2017-01-162-0/+8
| | | | | | We were relying on constant folding of the legalized instructions to do what constant folding we had previously llvm-svn: 292114
* Remove unused lambda captures. NFCMalcolm Parsons2017-01-131-1/+1
| | | | llvm-svn: 291916
* Apply clang-tidy's performance-unnecessary-value-param to LLVM.Benjamin Kramer2017-01-133-7/+7
| | | | | | | With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFCDiana Picus2017-01-132-4/+4
| | | | | | | | | | | Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
* Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor ↵Craig Topper2017-01-111-9/+0
| | | | | | | | AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent." Some test appears to be hanging on the build bots. llvm-svn: 291650
* [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) ↵Craig Topper2017-01-111-0/+9
| | | | | | -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent. llvm-svn: 291645
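The fold above can be checked lane by lane. The following is a minimal sketch, assuming every mask lane is either 0 or all-ones (the ZeroOrNegativeOneBooleanContent case). `vselect` and `invert` are hypothetical helpers written for this illustration, not LLVM APIs.

```python
ALL_ONES = -1  # an all-ones lane in two's complement

def vselect(mask, a, b):
    """Per-lane select: take a[i] where mask[i] is all-ones, else b[i]."""
    return [x if m == ALL_ONES else y for m, x, y in zip(mask, a, b)]

def invert(mask):
    """Lane-wise (mask xor AllOnes)."""
    return [m ^ ALL_ONES for m in mask]

n0 = [ALL_ONES, 0, 0, ALL_ONES]
n1 = [1, 2, 3, 4]
n2 = [10, 20, 30, 40]

before = vselect(invert(n0), n1, n2)  # (vselect (N0 xor AllOnes), N1, N2)
after = vselect(n0, n2, n1)           # (vselect N0, N2, N1)
assert before == after
```

With 0/1 booleans the inverted lanes would be -1 and -2, so the xor would no longer produce a valid mask, which is presumably why the fold is restricted to the 0/-1 representation.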
* DAGCombiner: Add hasOneUse checks to fadd/fma combineMatt Arsenault2017-01-111-3/+6
| | | | | | | | Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma which is not an improvement. llvm-svn: 291642
* [Target] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-111-3/+16
| | | | | | other minor fixes (NFC). llvm-svn: 291641
* Remove unused CONVERT_RNDSAT intrinsicsMatt Arsenault2017-01-106-213/+2
| | | | llvm-svn: 291607
* DAG: Avoid OOB when legalizing vector indexingMatt Arsenault2017-01-105-52/+57
| | | | | | | | | If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out of bounds index does not create an out of bounds memory access. llvm-svn: 291604
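One way to keep an out-of-bounds index from turning into an out-of-bounds memory access is to clamp the index before forming the stack address. The sketch below only illustrates that idea; the clamping strategy (mask for power-of-two element counts, unsigned min otherwise) and the helper names are assumptions, and the actual legalizer code may differ.

```python
def clamp_index(idx, num_elts):
    """Map any non-negative index to a valid slot in [0, num_elts)."""
    if (num_elts & (num_elts - 1)) == 0:   # power of two: a cheap mask suffices
        return idx & (num_elts - 1)
    return min(idx, num_elts - 1)          # otherwise: unsigned min

def extract_via_stack(vec, idx):
    """Model of 'spill the vector, then load one element'."""
    stack_slot = list(vec)                 # stand-in for the stack temporary
    return stack_slot[clamp_index(idx, len(stack_slot))]
```

The element returned for an out-of-range index is arbitrary, which is acceptable because the result is undefined anyway; what matters is that the access stays inside the stack slot.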
* [mips] Fix Mips MSA intrinsicsSimon Dardis2017-01-101-2/+2
| | | | | | | | | | | | | | | | The usage of some MIPS MSA intrinsics that took immediates could crash LLVM during lowering. This patch addresses that behaviour. Crucially, this patch also makes the use of intrinsics with out-of-range immediates produce an internal error. The ld and st intrinsics would trigger an assertion failure for MIPS64, as their lowering would attempt to add an i32 offset to an i64 pointer. Reviewers: vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D25438 llvm-svn: 291571
* [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, ↵Craig Topper2017-01-101-10/+4
| | | | | | | | X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC I guess previously we just assumed if the result type was i1, then the condition type must also be i1? llvm-svn: 291548
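The two select folds can be verified exhaustively for i1 values. A small sketch, with `select` as a hypothetical stand-in for the DAG node:

```python
from itertools import product

def select(c, t, f):
    """Scalar select: t if the condition is true, else f."""
    return t if c else f

def folds_hold_for_i1():
    """Check both folds over every combination of i1 inputs."""
    return all(
        select(c, 1, x) == (c | x) and select(c, x, 0) == (c & x)
        for c, x in product([0, 1], repeat=2)
    )

# With a wider result type the 'or' fold breaks: select gives 1, or gives 3.
wide_counterexample = select(1, 1, 2) != (1 | 2)
```

The counterexample shows why checking that the result type is i1 matters in addition to the condition type.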
* [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> ↵Craig Topper2017-01-101-4/+0
| | | | | | select Cond, X, Y. Just let combine on the xor itself take care of it. llvm-svn: 291534
* [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.Bjorn Pettersson2017-01-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Originally i64 = umax t8, Constant:i64<4> was expanded into i32,i32 = umax Constant:i32<0>, Constant:i32<0> i32,i32 = umax t7, Constant:i32<4> Now instead the two produced umax:es return i32 instead of i32, i32. Thanks to Jan Vesely for help with the test case. Patch by mikael.holmen at ericsson.com Reviewers: bogner, jvesely, tstellarAMD, arsenm Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D28135 llvm-svn: 291441
* [SelectionDAG] Rework lowerRangeToAssertZExtDavid Majnemer2017-01-061-6/+11
| | | | | | Utilize ConstantRange to make it easier to interpret range metadata. llvm-svn: 291211
* [SelectionDAG] Correctly transform range metadata to AssertZExtDavid Majnemer2017-01-061-1/+1
| | | | | | | | We used the logBase2 of the high instead of the ceilLogBase2 resulting in the wrong result for certain values. For example, it resulted in an i1 AssertZExt when the exclusive portion of the range was 3. llvm-svn: 291196
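The arithmetic behind this fix is easy to reproduce. In the sketch below, plain Python integers stand in for APInt, and the two helper functions are written for illustration only.

```python
def log_base2(n):
    """floor(log2(n)) for n > 0."""
    return n.bit_length() - 1

def ceil_log_base2(n):
    """ceil(log2(n)) for n > 0."""
    return (n - 1).bit_length()

hi = 3                                 # range [0, 3): possible values 0, 1, 2
bits_needed = (hi - 1).bit_length()    # bits required to hold the largest value

wrong_width = log_base2(hi)            # 1, i.e. an i1 AssertZExt, too narrow
right_width = ceil_log_base2(hi)       # 2, enough for the value 2
```

When the exclusive bound is itself a power of two, both functions agree, which is why only certain values exposed the bug.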
* [Legalizer] Fix fp-to-uint to fp-to-sint promotion assertion.Tim Shen2017-01-041-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero extended. For example, given double 65534.0, without legalization: fp-to-uint16: 65534.0 -> 0xfffe With the legalization: fp-to-sint32: 65534.0 -> 0x0000fffe Without this patch, legalization wrongly emits a sign-extend assertion, which is consumed by a later icmp instruction and causes a miscompile. Note that the floating point value must be in [0, 65535), otherwise the behavior is undefined. This patch reverts the r279223 behavior and adds more tests and documentation. In PR29041's context, James Molloy mentioned that: We don't need to mask because conversion from float->uint8_t is undefined if the integer part of the float value is not representable in uint8_t. Therefore we can assume this doesn't happen! which is totally true and good, because fptoui is documented clearly to have undefined behavior when overflow/underflow happens. We should take advantage of this behavior so that we can save unnecessary mask instructions. Reviewers: jmolloy, nadav, echristo, kbarton Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28284 llvm-svn: 291015
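The 65534.0 example can be replayed with plain integers. This is a sketch only; the helpers model 32-bit patterns in Python and are not LLVM code.

```python
def fp_to_sint32(x):
    """Truncating float-to-int conversion, returned as a 32-bit pattern."""
    return int(x) & 0xFFFFFFFF

def zext16_to_32(v):
    """Value implied by a zero-extend assertion on the low 16 bits."""
    return v & 0xFFFF

def sext16_to_32(v):
    """Value implied by a sign-extend assertion on the low 16 bits."""
    low = v & 0xFFFF
    return (low | 0xFFFF0000) if (low & 0x8000) else low

promoted = fp_to_sint32(65534.0)   # 0x0000FFFE
```

Here `zext16_to_32(promoted)` equals `promoted`, while `sext16_to_32(promoted)` is 0xFFFFFFFE, so a later comparison that trusts a sign-extend assertion would see the wrong value.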
* The patch fixes (base, index, offset) match.Evgeny Stupachenko2017-01-041-9/+9
| | | | | | | | | | | | | | | Summary: Instead of matching: (a + i) + 1 -> (a + i, undef, 1) Now it matches: (a + i) + 1 -> (a, i, 1) Reviewers: rengolin Differential Revision: http://reviews.llvm.org/D26367 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 291012
* [selectiondag] Check PromotedFloats map during expensive checks.Florian Hahn2017-01-011-0/+4
| | | | | | | | | | | | | | | Summary: `PromotedFloats` needs to be checked in `DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type legalization failures with expensive checks for ARM fp16 tests. Reviewers: baldrick, bogner, arsenm Subscribers: arsenm, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28187 llvm-svn: 290796
* Simplify FunctionLoweringInfo.cpp with range for loopsReid Kleckner2016-12-301-40/+31
| | | | | | | I'm preparing to add some pattern matching code here, so simplify the code before I do. NFC llvm-svn: 290731
* Introduce element-wise atomic memcpy intrinsicIgor Laevsky2016-12-291-0/+45
| | | | | | | | | | This change adds a new intrinsic which is intended to provide memcpy functionality with additional atomicity guarantees. Please refer to the review thread or language reference for further details. Differential Revision: https://reviews.llvm.org/D27133 llvm-svn: 290708
* [SelectionDAG] Early out from computeKnownBits when we know we will have no ↵Simon Pilgrim2016-12-241-8/+26
| | | | | | | | common bits. Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits. llvm-svn: 290490
* Make the canonicalisation on shifts benefit more cases.Zijiao Ma2016-12-231-9/+13
| | | | | | | | | | | 1. Fix pessimized case in FIXME. 2. Add tests for it. 3. The canonicalisation on shifts results in a different sequence for the machine-licm tests. Correct some check lines. Differential Revision: https://reviews.llvm.org/D27916 llvm-svn: 290410
* Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.Wei Mi2016-12-221-2/+9
| | | | | | | | | This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363
* DAG: Add helper for testing constant valuesMatt Arsenault2016-12-221-0/+10
| | | | | | | | There are helpers for testing for constant or constant build_vector, and for splat ConstantFP vectors, but not for a constantfp or non-splat ConstantFP vector. llvm-svn: 290317
* [X86] Vectorcall Calling Convention - Adding CodeGen Complete SupportOren Ben Simhon2016-12-211-2/+24
| | | | | | | | | | | The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This submission also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240
* Fix name typo in SelectonDAGJoel Jones2016-12-161-4/+4
| | | | llvm-svn: 289969
* Add extra headers that got deleted by my revert in r289916 but for whichChandler Carruth2016-12-161-1/+2
| | | | | | new usage had already grown in the file. llvm-svn: 289917
* Revert patch series introducing the DAG combine to match a load-by-bytesChandler Carruth2016-12-161-283/+0
| | | | | | | | | | | | | | | | | | | | | | | | idiom. r289538: Match load by bytes idiom and fold it into a single load r289540: Fix a buildbot failure introduced by r289538 r289545: Use more detailed assertion messages in the code ... r289646: Add a couple of assertions to the load combine code ... This DAG combine has a bad crash in it that is quite hard to trigger sadly -- it relies on sneaking code with UB through the SDAG build and into this particular combine. I've responded to the original commit with a test case that reproduces it. However, the code also has other problems that will require substantial changes to address and so I'm going ahead and reverting it for now. This should unblock us and perhaps others that are hitting the crash in the wild and will let a fresh patch with updated approach come in cleanly afterward. Sorry for any trouble or disruption! llvm-svn: 289916
* Don't combine splats with other shuffles.Eli Friedman2016-12-151-0/+5
| | | | | | | | | | | We sometimes end up creating shuffles which are worse than the obvious translation of the IR. Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 . Differential Revision: https://reviews.llvm.org/D27793 llvm-svn: 289882
* Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.Eli Friedman2016-12-151-10/+23
| | | | | | | | | | | | | Targets can't handle this case well in general; we often transform a shuffle of two cheap BUILD_VECTORs to element-by-element insertion, which is very inefficient. Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially fixes https://llvm.org/bugs/show_bug.cgi?id=31301. Differential Revision: https://reviews.llvm.org/D27787 llvm-svn: 289874
* [DAG] allow more select folding for targets that have 'and not' (PR31175)Sanjay Patel2016-12-141-6/+26
| | | | | | | | | | | | | | The original motivation for this patch comes from wanting to canonicalize more IR to selects and also canonicalizing min/max. If we're going to do that, we need more backend fixups to undo select codegen when simpler ops will do. I chose AArch64 for the tests because that shows the difference in the simplest way. This should fix: https://llvm.org/bugs/show_bug.cgi?id=31175 Differential Revision: https://reviews.llvm.org/D27489 llvm-svn: 289738
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-141-228/+278
| | | | | | | | | | UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2016-12-141-278/+228
| | | | enabled. Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning. Simplify Consecutive Merge Store Candidate Search: Now that address aliasing is much less conservative, push through a simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we don't do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores. CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls. CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores. CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But it looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
* [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of ↵Simon Pilgrim2016-12-142-30/+63
| | | | | | | | | | | | just APInt::isPowerOf2 Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases. Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value. Differential Revision: https://reviews.llvm.org/D27714 llvm-svn: 289654
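The payoff of knowing that a divisor is a power of two, even when it is not a constant, is that division and remainder reduce to a shift and a mask. A minimal sketch with Python integers standing in for unsigned DAG values; the function names are illustrative and do not mirror the DAGCombiner helpers.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def log_base2(n):
    """floor(log2(n)) for n > 0; exact when n is a power of two."""
    return n.bit_length() - 1

def udiv_pow2(x, d):
    """x udiv d, rewritten as a logical shift right by log2(d)."""
    assert is_power_of_two(d)
    return x >> log_base2(d)

def urem_pow2(x, d):
    """x urem d, rewritten as a mask with d - 1."""
    assert is_power_of_two(d)
    return x & (d - 1)

def rewrites_match(max_x=200, max_shift=6):
    """Divisors of the form (1 << y) are powers of two for any y,
    even though y is not a compile-time constant."""
    return all(
        udiv_pow2(x, 1 << y) == x // (1 << y) and
        urem_pow2(x, 1 << y) == x % (1 << y)
        for x in range(max_x) for y in range(max_shift)
    )
```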
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-144-9/+9
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* Add a couple of assertions to the load combine code introduced by r289538Artur Pilipenko2016-12-141-1/+5
| | | | llvm-svn: 289646
* Use more detailed assertion messages in the code introduced by r289538Artur Pilipenko2016-12-131-4/+8
| | | | llvm-svn: 289545
* Fix a buildbot failure introduced by r289538Artur Pilipenko2016-12-131-2/+1
| | | | | | Build failed because of unused variable in product mode. llvm-svn: 289540
* [DAGCombiner] Match load by bytes idiom and fold it into a single loadArtur Pilipenko2016-12-131-0/+276
| | | | | | Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming little endian target: i8 *a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32)a) i8 *a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32)a)) This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because, in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory, verify that they are adjacent and match the little or big endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed. Reviewed By: hfinkel, RKSimon, filcab Differential Revision: https://reviews.llvm.org/D26149 llvm-svn: 289538
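The equivalence this combine relies on can be demonstrated outside the compiler. The sketch below models a little-endian target in Python; all function names are made up for the illustration.

```python
import struct

def load_le_by_bytes(a):
    """a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)"""
    return a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)

def load_be_by_bytes(a):
    """(a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]"""
    return (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]

def load_i32(a):
    """A single 32-bit load as a little-endian target would perform it."""
    return struct.unpack_from("<I", a)[0]

def bswap32(v):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(v.to_bytes(4, "little"), "big")

data = bytes([0x11, 0x22, 0x33, 0x44])
assert load_le_by_bytes(data) == load_i32(data)
assert load_be_by_bytes(data) == bswap32(load_i32(data))
```

On a big-endian target the roles swap: the second pattern becomes the plain load and the first one needs the bswap.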
* Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the ↵Artur Pilipenko2016-12-131-104/+104
| | | | | | upcoming user llvm-svn: 289537
* [SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI.Simon Pilgrim2016-12-131-13/+4
| | | | | | We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this. llvm-svn: 289534
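Why a plain sext on the known-bits masks is enough: sign-extending the known-zero mask fills the upper bits exactly when the sign bit is known zero, and likewise for the known-one mask. A sketch with integer masks; the helper names are illustrative and not the SelectionDAG API.

```python
def sext(v, from_bits, to_bits):
    """Sign-extend the low from_bits of v to a to_bits-wide pattern."""
    sign = 1 << (from_bits - 1)
    v &= (1 << from_bits) - 1
    return ((v ^ sign) - sign) & ((1 << to_bits) - 1)

def known_bits_sext(known_zero, known_one, from_bits, to_bits):
    """Known bits of sext(x), given the known bits of x."""
    return (sext(known_zero, from_bits, to_bits),
            sext(known_one, from_bits, to_bits))

# i8 value with sign bit known zero and bit 0 known one, extended to i16:
kz, ko = known_bits_sext(0x80, 0x01, 8, 16)
```

In this example `kz` is 0xFF80, so all new upper bits are known zero, and `ko` stays 0x0001. If the sign bit is unknown, neither mask has it set and the extended bits stay unknown.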
* [Statepoints] Reuse stack slots more than once within a basic blockPhilip Reames2016-12-131-4/+9
| | | | | | | | The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exactly once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block do. Net result: as we optimize invokes into calls, lowering gets worse. The root error here is that the bitmap used by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again. Differential Revision: https://reviews.llvm.org/D25243 llvm-svn: 289509
* [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBitsSimon Pilgrim2016-12-121-0/+2
| | | | | | Pre-commit as discussed on D27657 llvm-svn: 289425
* [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts ↵Simon Pilgrim2016-12-101-1/+23
| | | | | | | | from 'large element' scalar/vector to 'small element' vector. Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types. llvm-svn: 289329
* [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)Simon Pilgrim2016-12-091-0/+36
| | | | | | Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232