path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Author, Age; Files, Lines)
* InstrProf: An intrinsic and lowering for instrumentation based profiling (Justin Bogner, 2014-12-08; 1 file, -0/+2)

    Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``.

    The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC.

    Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to. Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place.

    llvm-svn: 223672
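    A minimal sketch of the IR a frontend would emit per counter under this scheme. The intrinsic signature and the name-variable convention below are assumptions for illustration, not taken from the commit:

        @__prof_name_foo = private constant [3 x i8] c"foo"

        define void @foo() {
        entry:
          ; arguments: pointer to the function name, function hash, number of counters, counter index
          call void @llvm.instrprof_increment(i8* getelementptr inbounds ([3 x i8]* @__prof_name_foo, i32 0, i32 0), i64 1234, i32 1, i32 0)
          ret void
        }

        declare void @llvm.instrprof_increment(i8*, i64, i32, i32)

    The -instrprof pass then lowers each such call into an ordinary load/add/store on a per-function counter array.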
* SelectionDAG switch lowering: Replace unreachable default with most popular case (Hans Wennborg, 2014-12-06; 1 file, -17/+40)

    This can significantly reduce the size of the switch, allowing for more efficient lowering.

    I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-negligible binary size increase. It might be worth looking into some more.

    SimplifyCFG currently does this transformation, but I'm working towards changing that so we can optimize harder based on unreachable defaults.

    Differential Revision: http://reviews.llvm.org/D6510

    llvm-svn: 223566
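    An illustrative switch (names made up, not from the commit) where the default is unreachable, so lowering can treat the most frequently targeted destination as the new default and drop its explicit comparisons:

        define i32 @classify(i32 %x) {
        entry:
          switch i32 %x, label %dead [
            i32 0, label %common
            i32 1, label %common
            i32 2, label %common
            i32 3, label %rare
          ]
        dead:                 ; the frontend has proven %x is always in 0..3
          unreachable
        common:
          ret i32 10
        rare:
          ret i32 20
        }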
* [InstCombine] Minor optimization for bswap with binary ops (Simon Pilgrim, 2014-12-04; 1 file, -0/+2)

    Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

        OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
        OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

    Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent as well:

        fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

    Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

    Differential Revision: http://reviews.llvm.org/D6407

    llvm-svn: 223349
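    A small example of IR the new fold applies to (illustrative only); both operands of the 'or' are byte-swapped, so the swap can be hoisted past the bitwise op:

        declare i32 @llvm.bswap.i32(i32)

        define i32 @or_of_bswaps(i32 %x, i32 %y) {
          %bx = call i32 @llvm.bswap.i32(i32 %x)
          %by = call i32 @llvm.bswap.i32(i32 %y)
          %r  = or i32 %bx, %by
          ; expected result of the fold: a single bswap of (or %x, %y)
          ret i32 %r
        }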
* Masked Load / Store Intrinsics - the CodeGen part. (Elena Demikhovsky, 2014-12-04; 8 files, -0/+429)

    I'm recommitting the codegen part of the patch. The vectorizer part will be sent for review again.

    Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator.

    Examples:

        <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
        declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

    Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

    http://reviews.llvm.org/D6191

    llvm-svn: 223348
* [PowerPC] Implement readcyclecounter for PPC32 (Hal Finkel, 2014-12-02; 1 file, -0/+11)

    We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read).

    This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch).

    Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

    llvm-svn: 223161
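    For context, the IR entry point is the existing intrinsic; what this change adds is the PPC32 legalization of the resulting i64 READCYCLECOUNTER node (example is illustrative, not from the commit):

        declare i64 @llvm.readcyclecounter()

        define i64 @cycles() {
          %c = call i64 @llvm.readcyclecounter()
          ret i64 %c
        }

    On PPC32 the i64 result has to be split into two 32-bit halves read from separate SPRs, with the upper half re-read and the sequence retried if the counter wrapped between the reads.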
* Restructure some assertion checking based on post commit feedback by Aaron and Tom. (Philip Reames, 2014-12-02; 1 file, -7/+7)

    llvm-svn: 223150
* Appease a build bot complaining about an unused variable that's used in an assertion. (Philip Reames, 2014-12-02; 1 file, -0/+1)

    llvm-svn: 223142
* [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder (Philip Reames, 2014-12-02; 6 files, -0/+814)

    This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.

    With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

    I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it.

    During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.

    In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principle, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.

    Reviewed by: atrick, ributzka

    llvm-svn: 223137
* Revert r223049, r223050 and r223051 while investigating test failures. (Hans Wennborg, 2014-12-01; 1 file, -42/+17)

    I didn't foresee this affecting the Clang test suite :/

    llvm-svn: 223054
* SelectionDAG switch lowering: Replace unreachable default with most popular case (Hans Wennborg, 2014-12-01; 1 file, -17/+42)

    This can significantly reduce the size of the switch, allowing for more efficient lowering.

    I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-negligible binary size increase. It might be worth looking into some more.

    llvm-svn: 223049
* [stack protector] Set edge weights for newly created basic blocks. (Akira Hatanaka, 2014-12-01; 2 files, -4/+7)

    This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks.

    Differential Revision: http://reviews.llvm.org/D5766

    llvm-svn: 222987
* Switch lowering: reformat some for loops etc. NFC (Hans Wennborg, 2014-11-29; 1 file, -7/+5)

    llvm-svn: 222962
* Switch lowering: Fix broken 'Figure out which block is next' code (Hans Wennborg, 2014-11-29; 1 file, -0/+3)

    This doesn't seem to have worked in a long time, but other optimizations would clean it up.

    llvm-svn: 222961
* Revert "Masked Vector Load and Store Intrinsics." (Duncan P. N. Exon Smith, 2014-11-28; 8 files, -430/+0)

    This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures.

    Conflicts:
        lib/Target/X86/X86TargetTransformInfo.cpp

    llvm-svn: 222936
* Converted back to Unix format (after my last commit 222632) (Elena Demikhovsky, 2014-11-23; 1 file, -3241/+3241)

    llvm-svn: 222636
* Masked Vector Load and Store Intrinsics. (Elena Demikhovsky, 2014-11-23; 8 files, -3127/+3557)

    Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator.

    Examples:

        <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
        declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

    Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

    http://reviews.llvm.org/D6191

    llvm-svn: 222632
* Don't repeat class/function/variable names in comments. NFC. (Sanjay Patel, 2014-11-21; 1 file, -47/+35)

    llvm-svn: 222555
* Less space; NFC (Sanjay Patel, 2014-11-21; 1 file, -8/+4)

    llvm-svn: 222546
* [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. (Andrea Di Biagio, 2014-11-21; 1 file, -11/+39)

    Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef.

    This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target.

    llvm-svn: 222536
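    A hedged illustration of IR that lowers to such a build_vector: two lanes come from extracts of %a and two lanes are the constant zero, so the node can become a single shuffle against a zero vector when the target accepts the resulting mask (names are made up):

        define <4 x i32> @two_lanes_plus_zeros(<4 x i32> %a) {
          %e0 = extractelement <4 x i32> %a, i32 0
          %e1 = extractelement <4 x i32> %a, i32 1
          %v0 = insertelement <4 x i32> undef, i32 %e0, i32 0
          %v1 = insertelement <4 x i32> %v0, i32 %e1, i32 1
          %v2 = insertelement <4 x i32> %v1, i32 0, i32 2
          %v3 = insertelement <4 x i32> %v2, i32 0, i32 3
          ; conceptually: shufflevector %a, zeroinitializer, <0, 1, 4, 5>
          ret <4 x i32> %v3
        }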
* [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC. (Andrea Di Biagio, 2014-11-21; 1 file, -153/+73)

    This patch simplifies the logic that combines a pair of shuffle nodes into a single shuffle if there is a legal mask. Also added comments to better describe the algorithm. No functional change intended.

    llvm-svn: 222522
* DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal. (Hao Liu, 2014-11-21; 1 file, -0/+38)

    E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

    A hook is added to allow the target to control whether it needs to do such a combine.

    Reviewed in http://reviews.llvm.org/D6334

    llvm-svn: 222510
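    A sketch of IR shaped like the pattern this combine targets (fast-math flags shown because the rewrite changes rounding; names are illustrative):

        define void @two_divs(float %a, float %b, float %D, float* %p, float* %q) {
          %x = fdiv fast float %a, %D
          %y = fdiv fast float %b, %D
          ; with the target hook enabled, this becomes one reciprocal
          ; (1.0 / %D) followed by two fmuls, as described above
          store float %x, float* %p
          store float %y, float* %q
          ret void
        }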
* [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 (Simon Pilgrim, 2014-11-19; 1 file, -2/+2)

    This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

    I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

    Differential Revision: http://reviews.llvm.org/D5699

    llvm-svn: 222340
* Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> (David Blaikie, 2014-11-19; 10 files, -19/+22)

    This is to be consistent with StringSet and ultimately with the standard library's associative container insert function.

    This led to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions...

    llvm-svn: 222334
* Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack. (Owen Anderson, 2014-11-18; 1 file, -1/+1)

    Patch by Daniil Troshkov!

    llvm-svn: 222254
* Fix optimisations of SELECT_CC which assumed result is boolean (Oliver Stannard, 2014-11-17; 1 file, -2/+5)

    Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

    This is a follow-up to D6210/r221693.

    llvm-svn: 222123
* Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries. (Craig Topper, 2014-11-17; 1 file, -2/+2)

    Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.

    llvm-svn: 222118
* Replace a couple asserts with static_asserts. (Craig Topper, 2014-11-17; 1 file, -2/+2)

    llvm-svn: 222114
* Convert some EVTs to MVTs where only a SimpleValueType is needed. (Craig Topper, 2014-11-16; 3 files, -12/+12)

    llvm-svn: 222109
* [DAG] Improved target independent vector shuffle folding logic. (Andrea Di Biagio, 2014-11-15; 1 file, -0/+20)

    This patch teaches the DAGCombiner how to combine shuffles according to rules:

        shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
        shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
        shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

    llvm-svn: 222090
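    A concrete instance of the first rule, as an illustration (not from the commit): the inner shuffle rotates %A, the outer one takes two lanes from it and two from %B, and the pair can collapse to one shuffle of (%B, %A) with a recomputed mask:

        define <4 x i32> @fold_pair(<4 x i32> %A, <4 x i32> %B) {
          %s0 = shufflevector <4 x i32> %A, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
          %s1 = shufflevector <4 x i32> %s0, <4 x i32> %B, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
          ; equivalent single shuffle: shufflevector %B, %A, <6, 7, 0, 1>
          ret <4 x i32> %s1
        }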
* Allow the use of functions as typeinfo in landingpad clauses (Reid Kleckner, 2014-11-14; 2 files, -5/+5)

    This is one step towards supporting SEH filter functions in LLVM.

    llvm-svn: 221954
* We can get the TLOF from the TargetMachine, so the constructor no longer requires TargetLoweringObjectFile to be passed. (Aditya Nandakumar, 2014-11-13; 1 file, -4/+3)

    llvm-svn: 221926
* Add an assert and a test that verify r221709's fix. (Frederic Riss, 2014-11-13; 1 file, -2/+4)

    llvm-svn: 221854
* Revert "IR: MDNode => Value" (Duncan P. N. Exon Smith, 2014-11-11; 2 files, -8/+8)

    Instead, we're going to separate metadata from the Value hierarchy. See PR21532.

    This reverts commit r221375.
    This reverts commit r221373.
    This reverts commit r221359.
    This reverts commit r221167.
    This reverts commit r221027.
    This reverts commit r221024.
    This reverts commit r221023.
    This reverts commit r220995.
    This reverts commit r220994.

    llvm-svn: 221711
* Totally forget deallocated SDNodes in SDDbgInfo. (Frederic Riss, 2014-11-11; 1 file, -4/+12)

    What would happen before that commit is that the SDDbgValues associated with a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep a map entry keyed by the SDNode pointer pointing to this list of invalidated SDDbgNodes. As the memory gets reused, the list might get wrongly associated with another new SDNode. As the SDDbgValues are cloned when they are transferred, this can lead to an exponential number of SDDbgValues being produced during DAGCombine, like in http://llvm.org/bugs/show_bug.cgi?id=20893

    Note that the previous behavior wasn't really buggy as the invalidation made sure that the SDDbgValues won't be used. This commit can be considered a memory optimization and as such is really hard to validate in a unit-test.

    llvm-svn: 221709
* LLVM incorrectly folds xor into select (Oliver Stannard, 2014-11-11; 1 file, -1/+2)

    LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with (set_cc !cc x y), which is only correct when the xor has type i1. Instead, we should check that the constant operand to the xor is all ones.

    llvm-svn: 221693
* [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks. (Andrea Di Biagio, 2014-11-05; 1 file, -1/+1)

    This patch improves the folding of vector AND nodes into blend operations for targets that feature SSE4.1. A vector AND node where one of the operands is a constant build_vector with elements that are either zero or all-ones can be converted into a blend.

    This allows for example to simplify the following code:

        define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
          %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
          %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
          %3 = or <4 x i32> %1, %2
          ret <4 x i32> %3
        }

    Before this patch llc (-mcpu=corei7) generated:

        andps LCPI1_0(%rip), %xmm0, %xmm0
        andps LCPI1_1(%rip), %xmm1, %xmm1
        orps %xmm1, %xmm0, %xmm0
        retq

    With this patch we generate a single 'vpblendw'.

    llvm-svn: 221343
* Normally an 'optnone' function goes through fast-isel, which does not call DAGCombiner. (Paul Robinson, 2014-11-03; 1 file, -0/+7)

    But we ran into a case (on Windows) where the calling convention causes argument lowering to bail out of fast-isel, and we end up in CodeGenAndEmitDAG() which does run DAGCombiner. So, we need to make DAGCombiner check for 'optnone' after all.

    Commit includes the test that found this, plus another one that got missed in the original optnone work.

    llvm-svn: 221168
* IR: MDNode => Value: Instruction::getMetadata() (Duncan P. N. Exon Smith, 2014-11-01; 2 files, -8/+8)

    Change `Instruction::getMetadata()` to return `Value` as part of PR21433.

    Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`.

    llvm-svn: 221024
* [SelectionDAG] When scalarizing trunc, don't assert for legal operands. (Ahmed Bougacha, 2014-10-30; 1 file, -1/+17)

    r212242 introduced a legalizer hook, originally to let AArch64 widen v1i{32,16,8} rather than scalarize, because the legalizer expected, when scalarizing the result of a conversion operation, to already have scalarized the operands. On AArch64, v1i64 is legal, so that commit ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

    It did that by choosing to widen v1 types whenever possible. However, v1i1 types, for which there's no legal widened type, would still trigger the assert.

    This commit fixes that, by only scalarizing a trunc's result when the operand has already been scalarized, and introducing an extract_elt otherwise. This is similar to r205625.

    Fixes PR20777.

    llvm-svn: 220937
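    A sketch of the kind of function that used to hit the assert on AArch64 (illustrative, not taken from the PR): the <1 x i64> operand is legal and thus never scalarized, while the <1 x i1> result must be, which is where the new extract_elt path comes in:

        define <1 x i1> @trunc_v1i64(<1 x i64> %v) {
          %t = trunc <1 x i64> %v to <1 x i1>
          ret <1 x i1> %t
        }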
* Fix incorrect invariant check in DAG Combine (Louis Gerbarg, 2014-10-30; 1 file, -1/+1)

    Earlier this summer I fixed an issue where we were incorrectly combining multiple loads that had different constraints such as alignment, invariance, temporality, etc. Apparently in one case I made a copy-paste error and swapped alignment and invariance.

    Tests included.

    rdar://18816719

    llvm-svn: 220933
* Whitespace. (NAKAMURA Takumi, 2014-10-29; 3 files, -33/+33)

    llvm-svn: 220857
* Fix copy paste comment (Matt Arsenault, 2014-10-24; 1 file, -2/+2)

    llvm-svn: 220581
* Use rsqrt (X86) to speed up reciprocal square root calcs (Sanjay Patel, 2014-10-24; 1 file, -40/+77)

    This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float).

    This patch adds a two-constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().

    See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900

    Differential Revision: http://reviews.llvm.org/D5658

    llvm-svn: 220570
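    The scalar single-precision shape of the computation being targeted, as a hedged sketch: with fast-math allowed, 1/sqrt(x) can be selected as a hardware rsqrt estimate refined by Newton-Raphson steps of the form x1 = x0 * (1.5 - 0.5 * a * x0 * x0):

        declare float @llvm.sqrt.f32(float)

        define float @recip_sqrt(float %a) {
          %s = call float @llvm.sqrt.f32(float %a)
          %r = fdiv fast float 1.0, %s
          ret float %r
        }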
* [SelectionDAG] Teach the vector scalarizer about FP conversions. (Ahmed Bougacha, 2014-10-23; 1 file, -0/+4)

    This adds support for legalization of instructions of the form:

        [fp_conv] <1 x i1> %op to <1 x double>

    where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer.

    A similar problem arose in r190830, with trunc instead.

    Fixes PR20778.

    Differential Revision: http://reviews.llvm.org/D5810

    llvm-svn: 220533
* Update comment and fix typos in assert message. (NFC) (Ahmed Bougacha, 2014-10-23; 1 file, -3/+3)

    llvm-svn: 220531
* ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes (Tim Northover, 2014-10-23; 3 files, -19/+36)

    x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely.

    The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86.

    Should fix PR20376.

    llvm-svn: 220529
* Strength reduce constant-sized vectors into arrays. No functionality change. (Benjamin Kramer, 2014-10-22; 1 file, -2/+2)

    llvm-svn: 220412
* Add minnum / maxnum codegen (Matt Arsenault, 2014-10-21; 9 files, -0/+166)

    llvm-svn: 220342
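    The corresponding IR-level intrinsics follow the IEEE-754 minNum/maxNum semantics; a small illustrative use (names made up):

        declare float @llvm.minnum.f32(float, float)
        declare float @llvm.maxnum.f32(float, float)

        define float @clamp(float %x, float %lo, float %hi) {
          %a = call float @llvm.maxnum.f32(float %x, float %lo)
          %b = call float @llvm.minnum.f32(float %a, float %hi)
          ret float %b
        }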
* Introduce enum values for previously defined metadata types. (NFC) (Philip Reames, 2014-10-21; 2 files, -5/+5)

    Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum values for three existing metadata types:

        + MD_nontemporal = 9,                // "nontemporal"
        + MD_mem_parallel_loop_access = 10,  // "llvm.mem.parallel_loop_access"
        + MD_nonnull = 11                    // "nonnull"

    I went through and updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easy to translate. For example, there were several items in LoopInfo.cpp I chose not to update.

    llvm-svn: 220248
* Check for dynamic allocas when selecting lifetime intrinsics. (Pete Cooper, 2014-10-17; 1 file, -1/+7)

    TL;DR: Indexing maps with [] creates missing entries.

    The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca.

    On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which, given that it now has a lifetime, is coloured and merged with other stack slots.

    PEI would later trigger an assert because it expects the stack protector to not be dead.

    This fix ensures that we only get frame indices for static allocas, i.e., those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.

    rdar://problem/18672951

    llvm-svn: 220099
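    A hedged illustration of the problematic shape: lifetime markers attached to a dynamically sized alloca, which has no entry in the static frame-index map and so must not be looked up (or default-created) there:

        declare void @llvm.lifetime.start(i64, i8* nocapture)
        declare void @llvm.lifetime.end(i64, i8* nocapture)

        define void @f(i32 %n) {
        entry:
          %buf = alloca i8, i32 %n          ; dynamic alloca: size unknown at compile time
          call void @llvm.lifetime.start(i64 -1, i8* %buf)
          ; ... use %buf ...
          call void @llvm.lifetime.end(i64 -1, i8* %buf)
          ret void
        }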