summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
* Converted back to Unix format (after my last commit 222632)Elena Demikhovsky2014-11-231-3241/+3241
| | | | llvm-svn: 222636
* Masked Vector Load and Store Intrinsics.Elena Demikhovsky2014-11-238-3127/+3557
| | | | | | | | | | | | | | Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632
* Don't repeat class/function/variable names in comments. NFC.Sanjay Patel2014-11-211-47/+35
| | | | llvm-svn: 222555
* Less space; NFCSanjay Patel2014-11-211-8/+4
| | | | llvm-svn: 222546
* [DAG] Teach how to turn a build_vector into a shuffle if some of the ↵Andrea Di Biagio2014-11-211-11/+39
| | | | | | | | | | | | | | operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536
* [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.Andrea Di Biagio2014-11-211-153/+73
| | | | | | | | This patch simplifies the logic that combines a pair of shuffle nodes into a single shuffle if there is a legal mask. Also added comments to better describe the algorithm. No functional change intended. llvm-svn: 222522
* DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same ↵Hao Liu2014-11-211-0/+38
| | | | | | | | | | | | divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510
* [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2Simon Pilgrim2014-11-191-2/+2
| | | | | | | | | | This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-1910-19/+22
| | | | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
* Fix an incorrect chain operand when expanding INSERT_VECTOR operations ↵Owen Anderson2014-11-181-1/+1
| | | | | | | | through the stack. Patch by Daniil Troshkov! llvm-svn: 222254
* Fix optimisations of SELECT_CC which assumed result is booleanOliver Stannard2014-11-171-2/+5
| | | | | | | | | | | | Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent. This is a follow-up to D6210/r221693. llvm-svn: 222123
* Move register class name strings to a single array in MCRegisterInfo to ↵Craig Topper2014-11-171-2/+2
| | | | | | | | reduce static table size and number of relocation entries. Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table. llvm-svn: 222118
* Replace a couple asserts with static_asserts.Craig Topper2014-11-171-2/+2
| | | | llvm-svn: 222114
* Convert some EVTs to MVTs where only a SimpleValueType is needed.Craig Topper2014-11-163-12/+12
| | | | llvm-svn: 222109
* [DAG] Improved target independent vector shuffle folding logic.Andrea Di Biagio2014-11-151-0/+20
| | | | | | | | | This patch teaches the DAGCombiner how to combine shuffles according to rules: shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2) llvm-svn: 222090
* Allow the use of functions as typeinfo in landingpad clausesReid Kleckner2014-11-142-5/+5
| | | | | | This is one step towards supporting SEH filter functions in LLVM. llvm-svn: 221954
* We can get the TLOF from the TargetMachine - so constructor no longer ↵Aditya Nandakumar2014-11-131-4/+3
| | | | | | requires TargetLoweringObjectFile to be passed. llvm-svn: 221926
* Add an assert and a test that verify r221709's fix.Frederic Riss2014-11-131-2/+4
| | | | llvm-svn: 221854
* Revert "IR: MDNode => Value"Duncan P. N. Exon Smith2014-11-112-8/+8
| | | | | | | | | | | | | | | | | Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711
* Totally forget deallocated SDNodes in SDDbgInfo.Frederic Riss2014-11-111-4/+12
| | | | | | | | | | | | | | | | What would happen before that commit is that the SDDbgValues associated with a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep a map entry keyed by the SDNode pointer pointing to this list of invalidated SDDbgNodes. As the memory gets reused, the list might get wrongly associated with another new SDNode. As the SDDbgValues are cloned when they are transfered, this can lead to an exponential number of SDDbgValues being produced during DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893 Note that the previous behavior wasn't really buggy as the invalidation made sure that the SDDbgValues won't be used. This commit can be considered a memory optimization and as such is really hard to validate in a unit-test. llvm-svn: 221709
* LLVM incorrectly folds xor into selectOliver Stannard2014-11-111-1/+2
| | | | | | | | | LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with (set_cc !cc x y), which is only correct when the xor has type i1. Instead, we should check that the constant operand to the xor is all ones. llvm-svn: 221693
* [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.Andrea Di Biagio2014-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves the folding of vector AND nodes into blend operations for targets that feature SSE4.1. A vector AND node where one of the operands is a constant build_vector with elements that are either zero or all-ones can be converted into a blend. This allows for example to simplify the following code: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1> %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0> %3 = or <4 x i32> %1, %2 ret <4 x i32> %3 } Before this patch llc (-mcpu=corei7) generated: andps LCPI1_0(%rip), %xmm0, %xmm0 andps LCPI1_1(%rip), %xmm1, %xmm1 orps %xmm1, %xmm0, %xmm0 retq With this patch we generate a single 'vpblendw'. llvm-svn: 221343
* Normally an 'optnone' function goes through fast-isel, which does notPaul Robinson2014-11-031-0/+7
| | | | | | | | | | | | call DAGCombiner. But we ran into a case (on Windows) where the calling convention causes argument lowering to bail out of fast-isel, and we end up in CodeGenAndEmitDAG() which does run DAGCombiner. So, we need to make DAGCombiner check for 'optnone' after all. Commit includes the test that found this, plus another one that got missed in the original optnone work. llvm-svn: 221168
* IR: MDNode => Value: Instruction::getMetadata()Duncan P. N. Exon Smith2014-11-012-8/+8
| | | | | | | | | | Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024
* [SelectionDAG] When scalarizing trunc, don't assert for legal operands.Ahmed Bougacha2014-10-301-1/+17
| | | | | | | | | | | | | | | | | | | | | r212242 introduced a legalizer hook, originally to let AArch64 widen v1i{32,16,8} rather than scalarize, because the legalizer expected, when scalarizing the result of a conversion operation, to already have scalarized the operands. On AArch64, v1i64 is legal, so that commit ensured operations such as v1i32 = trunc v1i64 wouldn't assert. It did that by choosing to widen v1 types whenever possible. However, v1i1 types, for which there's no legal widened type, would still trigger the assert. This commit fixes that, by only scalarizing a trunc's result when the operand has already been scalarized, and introducing an extract_elt otherwise. This is similar to r205625. Fixes PR20777. llvm-svn: 220937
* Fix incorrect invariant check in DAG CombineLouis Gerbarg2014-10-301-1/+1
| | | | | | | | | | | | | Earlier this summer I fixed an issue where we were incorrectly combining multiple loads that had different constraints such alignment, invariance, temporality, etc. Apparently in one case I made copt paste error and swapped alignment and invariance. Tests included. rdar://18816719 llvm-svn: 220933
* Whitespace.NAKAMURA Takumi2014-10-293-33/+33
| | | | llvm-svn: 220857
* Fix copy paste commentMatt Arsenault2014-10-241-2/+2
| | | | llvm-svn: 220581
* Use rsqrt (X86) to speed up reciprocal square root calcsSanjay Patel2014-10-241-40/+77
| | | | | | | | | | | | | | | | | | | | | This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570
* [SelectionDAG] Teach the vector scalarizer about FP conversions.Ahmed Bougacha2014-10-231-0/+4
| | | | | | | | | | | | | | | | | This adds support for legalization of instructions of the form: [fp_conv] <1 x i1> %op to <1 x double> where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer. A similar problem arose in r190830, with trunc instead. Fixes PR20778. Differential Revision: http://reviews.llvm.org/D5810 llvm-svn: 220533
* Update comment and fix typos in assert message. (NFC)Ahmed Bougacha2014-10-231-3/+3
| | | | llvm-svn: 220531
* ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodesTim Northover2014-10-233-19/+36
| | | | | | | | | | | | | | | x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely. The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86. Should fix PR20376. llvm-svn: 220529
* Strength reduce constant-sized vectors into arrays. No functionality change.Benjamin Kramer2014-10-221-2/+2
| | | | llvm-svn: 220412
* Add minnum / maxnum codegenMatt Arsenault2014-10-219-0/+166
| | | | llvm-svn: 220342
* Introduce enum values for previously defined metadata types. (NFC)Philip Reames2014-10-212-5/+5
| | | | | | | | | | | Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248
* Check for dynamic alloca's when selecting lifetime intrinsics.Pete Cooper2014-10-171-1/+7
| | | | | | | | | | | | | | | | | | TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099
* [Stackmaps] Enable invoking the patchpoint intrinsic.Juergen Ributzka2014-10-172-51/+64
| | | | | | | | | | | Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055
* SelectionDAG: Add sext_inreg optimizationsJan Vesely2014-10-171-0/+22
| | | | | | | | | | v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044
* Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.Juergen Ributzka2014-10-162-44/+58
| | | | | | | | | | | | This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC)Robin Morisset2014-10-161-67/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that montonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound ! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the litterature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erase this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more agressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957
* constify the getters in SDNodeDbgValue.Adrian Prantl2014-10-131-12/+12
| | | | llvm-svn: 219627
* Refactor debug statement and remove dead argument. NFC.Chad Rosier2014-10-132-18/+13
| | | | llvm-svn: 219626
* Modernize old-style static asserts. NFC.Benjamin Kramer2014-10-121-1/+1
| | | | llvm-svn: 219588
* Improve sqrt estimate algorithm (fast-math)Sanjay Patel2014-10-091-17/+16
| | | | | | | | | | | | | | | | | | | This patch changes the fast-math implementation for calculating sqrt(x) from: y = 1 / (1 / sqrt(x)) to: y = x * (1 / sqrt(x)) This has 2 benefits: less code / faster code and one less estimate instruction that may lose precision. The only target that will be affected (until http://reviews.llvm.org/D5658 is approved) is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf or vector sqrtf and 4 less flops for a double-precision sqrt. We also eliminate a constant load and extra register usage. Differential Revision: http://reviews.llvm.org/D5682 llvm-svn: 219445
* Remove more calls to getSubtargetImpl from the schedulers andEric Christopher2014-10-093-24/+17
| | | | | | remove cached or unnecessary TargetMachines. llvm-svn: 219387
* Remove unused argument to CreateTargetScheduleState and changeEric Christopher2014-10-091-1/+1
| | | | | | | the TargetMachine to a TargetSubtargetInfo since everything we wanted is off of that. llvm-svn: 219382
* Remove uses of getSubtargetImpl from ResourcePriorityQueue andEric Christopher2014-10-091-7/+5
| | | | | | replace them with calls off of the MachineFuncton. llvm-svn: 219381
* Remove the uses of getSubtargetImpl from InstrEmitter and removeEric Christopher2014-10-092-9/+6
| | | | | | the now unused TargetMachine variable. llvm-svn: 219379
* Use the subtarget on the dag to get TargetFrameLowering ratherEric Christopher2014-10-092-2/+2
| | | | | | than off the target machine. llvm-svn: 219378
* Remove uses of the TargetMachine from FunctionLoweringInfoEric Christopher2014-10-092-15/+11
| | | | | | via caching TargetLowering and using the MachineFunction. llvm-svn: 219375
OpenPOWER on IntegriCloud