path: root/llvm/lib/Target
* R600/SI: Remove v_sub_f64 pseudo (Matt Arsenault, 2015-02-20, 3 files, -24/+5)
  The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding, since the expanded pseudo would never use SGPRs or inline immediates.
  llvm-svn: 230072
* R600: Use new fmad node (Matt Arsenault, 2015-02-20, 7 files, -41/+28)
  This enables a few useful combines that used to only use fma. Also, since v_mad_f32 apparently does not support denormals, disable the existing custom-handled cases when denormals are requested.
  llvm-svn: 230071
* Revert r229706 due to a regression (Jozef Kolek, 2015-02-20, 2 files, -6/+3)
  The regression was caused by CodeGen's use of the ADDU16 instruction: an improper register was allocated for it, i.e. a register outside the register set defined for the instruction.
  llvm-svn: 230053
* Fix an asan use-after-free bug introduced by the asm printer changes to remove non-Function based subtargets out of the asm printer (Eric Christopher, 2015-02-20, 1 file, -1/+11)
  For module-level emission we'll need to construct an MCSubtargetInfo so that we can encode instructions for emission.
  llvm-svn: 230050
* [X86][FastIsel] Teach how to select float-half conversion intrinsics (Andrea Di Biagio, 2015-02-20, 1 file, -0/+62)
  This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'. If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion, and VCVTPH2PSrr for a half-float conversion.
  Differential Revision: http://reviews.llvm.org/D7673
  llvm-svn: 230043
* Remove a use of the Subtarget in the darwin ppc asm printer (Eric Christopher, 2015-02-20, 1 file, -5/+4)
  EmitFunctionStubs is called from doFinalization and so can't depend on the Subtarget existing. It's also irrelevant, as we know we're on darwin since we're in the darwin asm printer.
  llvm-svn: 230039
* Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine (Eric Christopher, 2015-02-20, 1 file, -4/+2)
  llvm-svn: 230037
* Canonicalize a v2f64 blendi of 2 registers (Sanjay Patel, 2015-02-20, 2 files, -23/+29)
  This canonicalization step saves us 3 pattern-matching possibilities * 4 math ops for scalar FP math that uses xmm regs. The backend can re-commute the operands post-instruction-selection if that makes register allocation better. The tests in llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll already cover this scenario, so there are no new tests with this patch.
  Differential Revision: http://reviews.llvm.org/D7777
  llvm-svn: 230024
* Remove the incorrect isCommutable marking from the VORC instruction (Kit Barton, 2015-02-20, 1 file, -1/+2)
  The VORC instruction was incorrectly marked as isCommutable when it was added. This fix removes the VORC instruction definition from the isCommutable block.
  Phabricator review: http://reviews.llvm.org/D7772
  llvm-svn: 230020
* [x86] Switching the shuffle equivalence test to a variadic template was the wrong answer (Chandler Carruth, 2015-02-20, 1 file, -115/+108)
  We also got initializer lists, which are *way* cleaner for this kind of thing. Let's use those and make this a normal, boring function accepting ArrayRef.
  llvm-svn: 230004
* Fix wording and grammar in Mips subtarget options (Eric Christopher, 2015-02-20, 1 file, -23/+18)
  llvm-svn: 230001
* Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine (Eric Christopher, 2015-02-20, 2 files, -6/+5)
  llvm-svn: 230000
* Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine (Eric Christopher, 2015-02-20, 8 files, -16/+17)
  llvm-svn: 229999
* Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine (Eric Christopher, 2015-02-20, 1 file, -4/+4)
  llvm-svn: 229998
* Save the MachineFunction in startFunction so that we can use it for lookups of the subtarget later (Eric Christopher, 2015-02-20, 2 files, -4/+6)
  llvm-svn: 229996
* Use the cached subtarget from the MachineFunction rather than doing a lookup on the TargetMachine (Eric Christopher, 2015-02-20, 2 files, -6/+4)
  llvm-svn: 229995
* Make the TargetMachine::getSubtarget that takes a Function argument take a reference, to match the getSubtargetImpl that takes a Function argument (Eric Christopher, 2015-02-20, 2 files, -2/+2)
  llvm-svn: 229994
* Fix build in release mode: -Wunused-variable on a lambda function used only in an assert (Nick Lewycky, 2015-02-20, 1 file, -0/+1)
  llvm-svn: 229977
* Fix -Wunused-variable warning in non-asserts build, and optimize a little bit while I'm here (David Blaikie, 2015-02-20, 1 file, -3/+3)
  llvm-svn: 229970
* [PowerPC] Loop Data Prefetching for the BG/Q (Hal Finkel, 2015-02-20, 4 files, -0/+248)
  The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it prefetches into its own L1P buffer, and the latency to access that buffer is significantly higher than that to the L1 cache (although smaller than the latency to the L2 cache). As a result, especially when multiple hardware threads are not actively busy, explicitly prefetching data into the L1 cache is advantageous.

  I've been using this pass out-of-tree for data prefetching on the BG/Q for well over a year, and it has worked quite well. It is enabled by default only for the BG/Q, but can be enabled for other cores as well via a command-line option.

  Eventually, we might want to add some TTI interfaces and move this into Transforms/Scalar (there is nothing particularly target-dependent about it, although only machines like the BG/Q will benefit from its simplistic strategy).
  llvm-svn: 229966
* [x86] Remove the old vector shuffle lowering code and its flag (Chandler Carruth, 2015-02-20, 1 file, -2940/+22)
  The new shuffle lowering has been the default for some time. I've enabled the new legality testing by default with no really blocking regressions. I've fuzz tested this very heavily (many millions of fuzz test cases have passed at this point). And this cleans up a ton of code. =]

  Thanks again to the many folks that helped with this transition. There was a lot of work by others that went into the new shuffle lowering to make it really excellent.

  In case you aren't using a diff algorithm that can handle this: X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)
  llvm-svn: 229964
* [x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests (Chandler Carruth, 2015-02-20, 1 file, -77/+5)
  This is the first step toward removing the entire old vector shuffle lowering. *Much* more code to delete coming up next.
  llvm-svn: 229963
* [x86] Make the new vector shuffle legality test on by default (Chandler Carruth, 2015-02-20, 1 file, -1/+1)
  This reflects the fact that the x86 backend can in fact lower any shuffle you want it to, with reasonably high code quality. My recent work on the new vector shuffle has made this regress *very* little. The diff in the test cases makes me very, very happy.
  llvm-svn: 229958
* Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." (Eric Christopher, 2015-02-20, 5 files, -85/+48)
  The instructions were being generated on architectures that don't support avx512. This reverts commit r229837.
  llvm-svn: 229942
* Add a license header to the AVX512 file (Eric Christopher, 2015-02-20, 1 file, -0/+15)
  llvm-svn: 229941
* [ARM] Re-re-apply VLD1/VST1 base-update combine (Ahmed Bougacha, 2015-02-19, 2 files, -19/+128)
  This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting.

  The combine caused the crashes because we turned ISD::LOAD/STORE nodes into ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes into ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?).

  To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned.

  Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load, for instance), because other combines turn the base-pointer addition chain (each computing the address of the next load from the address of the last load) into independent additions (common base pointer + this load's offset).

  rdar://19717869, rdar://14062261.
  llvm-svn: 229932
* [ARM] Minor cleanup to CombineBaseUpdate. NFC. (Ahmed Bougacha, 2015-02-19, 1 file, -20/+22)
  In preparation for a future patch:
  - rename isLoad to isLoadOp: the former is confusing, and can be taken to refer to the fact that the node is an ISD::LOAD (it isn't, yet);
  - change formatting here and there;
  - add some comments;
  - const-ify bools.
  llvm-svn: 229929
* [CodeGen] Use ArrayRef instead of std::vector&. NFC. (Ahmed Bougacha, 2015-02-19, 2 files, -2/+2)
  The former lets us use SmallVectors. Do so in ARM and AArch64.
  llvm-svn: 229925
* [Hexagon] Moving remaining methods off of HexagonMCInst into HexagonMCInstrInfo and eliminating the HexagonMCInst class (Colin LeMahieu, 2015-02-19, 13 files, -173/+113)
  llvm-svn: 229914
* Remove unused argument from emitInlineAsmStart (Eric Christopher, 2015-02-19, 2 files, -3/+2)
  llvm-svn: 229907
* [Hexagon] Moving more functions off of HexagonMCInst into HexagonMCInstrInfo (Colin LeMahieu, 2015-02-19, 4 files, -174/+191)
  llvm-svn: 229903
* [Hexagon] Creating HexagonMCInstrInfo namespace as a landing zone for static functions detached from HexagonMCInst (Colin LeMahieu, 2015-02-19, 7 files, -40/+87)
  llvm-svn: 229885
* [Hexagon] Removing static variable holding MCInstrInfo (Colin LeMahieu, 2015-02-19, 4 files, -8/+7)
  llvm-svn: 229872
* Demote vectors to arrays. No functionality change. (Benjamin Kramer, 2015-02-19, 6 files, -134/+68)
  llvm-svn: 229861
* [x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16 (Chandler Carruth, 2015-02-19, 1 file, -74/+9)
  This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement.
  llvm-svn: 229859
* [x86] Teach the unpack lowering how to lower with an initial unpack, in addition to lowering to trees rooted in an unpack (Chandler Carruth, 2015-02-19, 1 file, -1/+36)
  This saves shuffles and/or registers in many various ways, lets us handle another class of v4i32 shuffles pre-SSE4.1 without domain crosses, etc.
  llvm-svn: 229856
* [x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic (Chandler Carruth, 2015-02-19, 1 file, -119/+0)
  This code path was one of the more complex and bug-prone when it first went in, and it hasn't fared much better since. Ultimately, with the simpler basis for unpack lowering and support for bit-math blending, it is completely obsolete. In the worst case without it, we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb are available.

  This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits.
  llvm-svn: 229853
* [x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path (Chandler Carruth, 2015-02-19, 1 file, -57/+72)
  This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we *know* we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends, because we do at least have a bit-math-based blend if we need to use that.

  This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends.
  llvm-svn: 229851
* [x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing into permutes and a blend with the generic decomposition logic (Chandler Carruth, 2015-02-19, 1 file, -75/+71)
  This works really well in almost every case and lets the code manage only the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack-based merging that this replaces. The only place where it isn't is when we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands.

  With this we're down to only really random permutations that cause an explosion of instructions.
  llvm-svn: 229849
* [x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities (Chandler Carruth, 2015-02-19, 1 file, -38/+30)
  This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyway and they terminate in exactly a blend.

  This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes *sooooo* much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is *tiny*. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now.

  For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering.
  llvm-svn: 229846
* [mips][microMIPS] Enable usage of AND16, OR16 and XOR16 by the code generator (Jozef Kolek, 2015-02-19, 1 file, -0/+2)
  Differential Revision: http://reviews.llvm.org/D7611
  llvm-svn: 229845
* [x86] The SELECT x86 DAG combine also does legalization (Chandler Carruth, 2015-02-19, 1 file, -6/+6)
  It used to rely on things not being marked as either custom or legal, but we now do custom lowering of more VSELECT nodes. To cope with this, manually replicate the legality tests here. These have to stay in sync with the set of tests used in the custom lowering of VSELECT. Ideally, we wouldn't do any of this combine-based legalization when we have an actual custom legalization step for VSELECT, but I'm not going to be able to rewrite all of that today.

  I don't have a test case for this currently, but it was found when compiling a number of the test-suite benchmarks. I'll try to reduce a test case and add it. This should at least fix the test-suite fallout on build bots.
  llvm-svn: 229844
* Revert r229831 due to multiple ARM/PPC/MIPS build-bot failures (Michael Kuperstein, 2015-02-19, 23 files, -248/+231)
  llvm-svn: 229841
* AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics (Elena Demikhovsky, 2015-02-19, 5 files, -48/+85)
  llvm-svn: 229837
* [x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths (Chandler Carruth, 2015-02-19, 1 file, -1/+39)
  I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support, and this ceases to be relevant.

  Also, some of the test cases look odd because the domain fixer canonicalizes these to the floating-point domain. That's OK; it'll use the integer domain when it matters, and some day I may be able to update enough of LLVM to canonicalize the other way.

  This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here.
  llvm-svn: 229836
* [x86,sdag] Two interrelated changes to the x86 and sdag code (Chandler Carruth, 2015-02-19, 1 file, -48/+31)
  First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true), but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization.

  However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing.

  Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as *legal* so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, I know, this is confusing; but that's how the patterns are written).

  This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the *hilarious* deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win!

  There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect, because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct, if horribly naive, cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of the 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise.
  llvm-svn: 229835
* Use std::bitset for SubtargetFeatures (Michael Kuperstein, 2015-02-19, 23 files, -231/+248)
  Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switch the features to use a bitset. No functional change.
  Differential Revision: http://reviews.llvm.org/D7065
  llvm-svn: 229831
* Remove the local subtarget variable from the SystemZ asm printer and update the two calls accordingly (Eric Christopher, 2015-02-19, 2 files, -8/+3)
  llvm-svn: 229805
* Remove a few more calls to TargetMachine::getSubtarget from the R600 port (Eric Christopher, 2015-02-19, 2 files, -4/+4)
  llvm-svn: 229804
* Grab the subtarget off of the machine function for the R600 asm printer and clean up a bunch of uses (Eric Christopher, 2015-02-19, 2 files, -15/+14)
  llvm-svn: 229803