path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][SSE] Improve legal SHUFP and PSHUFD shuffle matchingSimon Pilgrim2014-11-151-8/+19
| | | | | | | | | | Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 llvm-svn: 222087
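The idea of matching a single-input shuffle mask against either input can be illustrated with a small standalone sketch. This is not the LLVM code: the helper names `isSingleInputMask` and `isPshufdLikeMask`, and the mask encoding (-1 for undef, 0..3 for the first input, 4..7 for the second), are assumptions made for illustration on a 4-element vector.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not the LLVM implementation.
// Mask encoding assumed here: -1 = undef, 0..3 = lane of the first input,
// 4..7 = lane of the second input.
bool isSingleInputMask(const std::array<int, 4> &mask, bool secondInput) {
  const int lo = secondInput ? 4 : 0;
  const int hi = lo + 4;
  for (int m : mask) {
    assert(m >= -1 && m < 8 && "mask index out of range");
    if (m == -1)
      continue;  // undef lanes match any input
    if (m < lo || m >= hi)
      return false;  // lane comes from the other input
  }
  return true;
}

// A mask is PSHUFD-like in this sketch if every defined lane reads from the
// same input, whichever input that is.
bool isPshufdLikeMask(const std::array<int, 4> &mask) {
  return isSingleInputMask(mask, /*secondInput=*/false) ||
         isSingleInputMask(mask, /*secondInput=*/true);
}
```

Under these assumptions, a mask such as {7, 6, 5, 4} is accepted because it only reads the second input, while {0, 5, 2, 7} mixes both inputs and is rejected.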
* R600: Permute operands when selecting legacy min/maxMatt Arsenault2014-11-151-6/+9
| | | | | | | | | | This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI. Add stricter tests for the operand order. llvm-svn: 222079
* R600: Fix 64-bit integer divisionTom Stellard2014-11-151-2/+2
| | | | | | | | This fixes a failure in one of the oclconform tests. Patch by: Jan Vesely llvm-svn: 222073
* R600: Factor i64 UDIVREM lowering into its own functionTom Stellard2014-11-153-68/+84
| | | | | | | | This is so it could potentially be used by SI. However, the current implementation does not always produce correct results, so the IntegerDivisionPass is being used instead. llvm-svn: 222072
* Rename EH related stuff to be more preciseReid Kleckner2014-11-142-8/+7
| | | | | | | | | | | | | | | | | | | | Summary: The current "WinEH" exception handling type is more about Itanium-style LSDA tables layered on top of the Windows native unwind info format instead of .eh_frame tables or EHABI unwind info. Use the name "ItaniumWinEH" to better reflect the hybrid nature of the design. Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions, since the LSDA is part of the Itanium C++ ABI document, and not the DWARF standard. Reviewers: echristo Subscribers: llvm-commits, compnerd Differential Revision: http://reviews.llvm.org/D6279 llvm-svn: 222062
* ARM: refactor .cfi_def_cfa_offset emission.Tim Northover2014-11-141-97/+126
| | | | | | | | | | | | We used to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057
* ARM: correctly calculate the offset of FP in its push.Tim Northover2014-11-141-2/+7
| | | | | | | | | | | When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056
* R600/SI: Mark s_movk_i32 as rematerializableTom Stellard2014-11-141-0/+2
| | | | llvm-svn: 222037
* R600/SI: Fix spilling of m0 registerTom Stellard2014-11-141-1/+9
| | | | | | | | | | If we have spilled the value of the m0 register, then we need to restore it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't write to m0. llvm-svn: 222036
* R600/SI: Combine min3/max3 instructionsMatt Arsenault2014-11-146-10/+135
| | | | llvm-svn: 222032
* R600/SI: Fix verifier error from a branch on IMPLICIT_DEFMatt Arsenault2014-11-141-0/+8
| | | | | | SIILowerI1Copies wasn't correctly handling this case. llvm-svn: 222020
* Fix unused variable warning without assertsMatt Arsenault2014-11-141-0/+2
| | | | llvm-svn: 222017
* R600/SI: Match integer min / max instructionsMatt Arsenault2014-11-142-29/+86
| | | | llvm-svn: 222015
* R600/SI: Use S_BFE_I64 for 64-bit sext_inregMatt Arsenault2014-11-145-17/+82
| | | | llvm-svn: 222012
* [AVX512] Add 512b masked integer shift by immediate patterns.Cameron McInally2014-11-142-29/+21
| | | | llvm-svn: 222002
* R600/SI: Fix assembly names for exec_hi and exec_loTom Stellard2014-11-141-2/+2
| | | | llvm-svn: 221995
* R600/SI: Start implementing an assemblerTom Stellard2014-11-1413-30/+413
| | | | | | | This was done using the Sparc and PowerPC AsmParsers as guides. So far it is very simple and only supports sopp instructions. llvm-svn: 221994
* [PowerPC] Add VSX builtins for vec_divBill Schmidt2014-11-141-0/+6
| | | | | | | | | This patch adds builtin support for xvdivdp and xvdivsp, along with a test case. Straightforward stuff. There's a companion patch for Clang. llvm-svn: 221983
* R600/SI: Make constant array staticMatt Arsenault2014-11-141-1/+1
| | | | llvm-svn: 221965
* X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR.Tim Northover2014-11-141-7/+7
| | | | | | | | | | | | getTargetConstant should only be used when you can guarantee the instruction selected will be able to cope with the raw value. BUILD_VECTOR is rather too generic for this so we should use getConstant instead. In that case, an instruction can still consume the constant, but if it doesn't it'll be materialised through its own round of ISel. Should fix PR21352. llvm-svn: 221961
* Fix build of Mips code with MSVC by using our macro instead of __attribute__((unused)) directlyReid Kleckner2014-11-142-5/+4
| | | | | | llvm-svn: 221956
* First stage of call lowering for Mips fast-iselReed Kotler2014-11-133-2/+319
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This has most of what is needed for mips fast-isel call lowering for O32. What is missing I will add on the next patch because this patch is already too large. It should not be doing anything wrong but it will punt on some cases that it is basically capable of doing. The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues. The Mips O32 ABI rules are very complicated as far as how data is passed in floating and integer registers. However there is a way to think about this all very simply and this implementation reflects that. Basically, the ABI rules are written as if everything is passed on the stack and aligned as such. Once that is conceptually done, it is nearly trivial to reassign those locations to registers and then all the complexity disappears. So I have told tablegen that all the data is passed on the stack and during the lowering I fix this by assigning to registers as per the ABI doc. This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what is going on. Test Plan: callabi.ll Reviewers: dsanders Reviewed By: dsanders Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5714 llvm-svn: 221948
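The "lay everything out on the stack first, then map the first slots to registers" view described above can be sketched in a few lines. This is a simplified illustration, not the patch's code: it assumes only 4- and 8-byte integer arguments (the O32 floating-point rules are deliberately left out), and the names `ArgLoc` and `assignO32IntegerArgs` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of where one argument ends up.
struct ArgLoc {
  unsigned stackOffset;  // conceptual offset in the outgoing argument area
  std::string where;     // "a0".."a3", a pair such as "a2:a3", or "stack"
};

// Simplified O32-style assignment for integer arguments only.
// Step 1: give every argument a naturally aligned stack offset.
// Step 2: any argument lying entirely in the first 16 bytes is reassigned
//         to the register(s) that shadow those bytes (a0..a3, 4 bytes each).
std::vector<ArgLoc> assignO32IntegerArgs(const std::vector<unsigned> &sizes) {
  static const char *const regs[] = {"a0", "a1", "a2", "a3"};
  std::vector<ArgLoc> locs;
  unsigned offset = 0;
  for (unsigned size : sizes) {
    assert((size == 4 || size == 8) && "sketch handles 4- and 8-byte integers");
    offset = (offset + size - 1) / size * size;  // align to the argument size
    std::string where = "stack";
    if (offset + size <= 16) {
      where = regs[offset / 4];
      if (size == 8)
        where += std::string(":") + regs[offset / 4 + 1];
    }
    locs.push_back({offset, where});
    offset += size;
  }
  return locs;
}
```

For the argument list (i32, i64, i32) this sketch places the first argument in a0, skips a1 because the 64-bit value must be 8-byte aligned, puts that value in a2:a3, and leaves the last argument on the stack at offset 16.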
* R600/SI: Fix fmin_legacy / fmax_legacy matching for SIMatt Arsenault2014-11-135-33/+75
| | | | | | select_cc is expanded on SI, so this was never matched. llvm-svn: 221941
* We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.Aditya Nandakumar2014-11-1312-12/+12
| | | | | | llvm-svn: 221926
* [FastISel][AArch64] Don't bail during simple GEP instruction selection.Juergen Ributzka2014-11-131-0/+23
| | | | | | | | | | | | | | | The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64 specific emit functions. This is not ideal and 'computeAddress' should handle this, so it can fold the address computation into the memory operation. I plan to clean up 'computeAddress' anyway, so I will add that in a future commit. Related to rdar://problem/18962471. llvm-svn: 221923
* R600/SI: Use s_movk_i32Matt Arsenault2014-11-133-2/+17
| | | | llvm-svn: 221922
* R600/SI: Fix definition for s_cselect_b32Matt Arsenault2014-11-132-3/+7
| | | | | | | | | | | These were directly using the old base instruction class, and specifying the wrong register classes for operands. The operands can be the other special inputs besides SGPRs. The op name was also being directly used for the asm string, so this was printed without any operands. llvm-svn: 221921
* R600: Fix assert on empty functionMatt Arsenault2014-11-131-1/+0
| | | | | | | | If a function is just an unreachable, this would hit a "this is not a MachO target" assertion because of setting HasSubsectionViaSymbols. llvm-svn: 221920
* R600: Error on initializer for LDS.Matt Arsenault2014-11-131-2/+21
| | | | | | Also give a proper error for other address spaces. llvm-svn: 221917
* R600/SI: Get rid of FCLAMP_SI pseudoMatt Arsenault2014-11-134-25/+16
| | | | | | | It's not necessary. Also use complex patterns to allow src modifier usage. llvm-svn: 221916
* R600/SI: Allow commuting with src2_modifiersMatt Arsenault2014-11-131-5/+0
| | | | llvm-svn: 221911
* R600/SI: Allow commuting some 3 op instructionsMatt Arsenault2014-11-131-3/+27
| | | | | | | | | | | | | e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c This simplifies matching v_madmk_f32. This looks somewhat surprising, but it appears to be OK to do this. We can commute src0 and src1 in all of these instructions, and that's all that appears to matter. llvm-svn: 221910
* ARM: allow constpool entry to be moved to the user's block in all cases.Tim Northover2014-11-131-1/+7
| | | | | | | | | | | | | | | Normally entries can only move to a lower address, but when that wasn't viable, the user's block was considered anyway. Unfortunately, it went via createNewWater which wasn't designed to handle the case where there's already an island after the block. Unfortunately, the test we have is slow and fragile, and I couldn't reduce it to anything sane even with the @llvm.arm.space intrinsic. The test change here is recreating the previous one after the change. rdar://problem/18545506 llvm-svn: 221905
* ARM: avoid duplicating branches during constant islands.Tim Northover2014-11-131-6/+10
| | | | | | | | | We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but could go wrong. llvm-svn: 221904
* ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.Tim Northover2014-11-133-0/+10
| | | | | | | | Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly. llvm-svn: 221903
* [Hexagon]Colin LeMahieu2014-11-131-4/+4
| | | | | | NFC Renaming reserved identifier. llvm-svn: 221898
* AVX-512: SINT_TO_FP cost model and some bugfixesElena Demikhovsky2014-11-132-4/+25
| | | | | | | Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883
* This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectivelyAditya Nandakumar2014-11-1336-59/+136
| | | | | | llvm-svn: 221878
* [x86] Teach the vector shuffle lowering to make a more nuanced decisionChandler Carruth2014-11-131-12/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is *always* better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. llvm-svn: 221861
* [x86] Don't form overly fragmented blends when splitting andChandler Carruth2014-11-131-2/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring *many* times in common vector processing code. There are still more problems here sadly, but I'm trying to incrementally tease them apart, and it looks like this is the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. llvm-svn: 221852
* [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value.Juergen Ributzka2014-11-131-0/+61
| | | | | | | | | Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848
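The commit does not spell out which logic operations are used, but the underlying boolean identities are standard and can be checked exhaustively. The sketch below assumes nothing about the AArch64 instructions actually emitted; the function names are hypothetical and only demonstrate that each constant-operand i1 select reduces to a single logic operation.

```cpp
#include <cassert>

// Reference semantics of an i1 select: c ? t : f.
bool selectI1(bool c, bool t, bool f) { return c ? t : f; }

// Folded forms when one operand is a known constant (hypothetical names).
bool foldSelectTrueFirst(bool c, bool f) { return c || f; }    // select c, true,  f
bool foldSelectFalseFirst(bool c, bool f) { return !c && f; }  // select c, false, f
bool foldSelectTrueSecond(bool c, bool t) { return !c || t; }  // select c, t, true
bool foldSelectFalseSecond(bool c, bool t) { return c && t; }  // select c, t, false

// Exhaustively compare every folded form against the reference select.
void checkAllFolds() {
  for (int ci = 0; ci < 2; ++ci) {
    for (int xi = 0; xi < 2; ++xi) {
      const bool c = ci != 0, x = xi != 0;
      assert(foldSelectTrueFirst(c, x) == selectI1(c, true, x));
      assert(foldSelectFalseFirst(c, x) == selectI1(c, false, x));
      assert(foldSelectTrueSecond(c, x) == selectI1(c, x, true));
      assert(foldSelectFalseSecond(c, x) == selectI1(c, x, false));
    }
  }
}
```

Since an i1 has only two values, the four combinations covered by `checkAllFolds` are a complete proof of each identity.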
* [FastISel][AArch64] Fold the cmp into the select when possible.Juergen Ributzka2014-11-131-0/+54
| | | | | | | | | This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847
* [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.Juergen Ributzka2014-11-131-34/+46
| | | | | | Related to rdar://problem/18960150. llvm-svn: 221846
* Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86).Sanjay Patel2014-11-121-3/+7
| | | | | | | | | | | | This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385. This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests. llvm-svn: 221819
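For readers unfamiliar with the technique, a Newton-Raphson refinement of a reciprocal estimate can be sketched as follows. This is a plain scalar illustration, not the x86 lowering: `refineReciprocal` is a hypothetical name, and the starting estimate is passed in by the caller rather than produced by a hardware instruction. The update is written as x + x * (1 - a * x), which is one common way to express the step and uses the constant 1.0.

```cpp
#include <cassert>
#include <cmath>

// Refine an approximation of 1/a with a configurable number of
// Newton-Raphson steps. Each step roughly doubles the number of correct bits.
float refineReciprocal(float a, float estimate, int iterations) {
  assert(a != 0.0f && "reciprocal of zero is undefined");
  float x = estimate;
  for (int i = 0; i < iterations; ++i)
    x = x + x * (1.0f - a * x);  // algebraically equal to x * (2 - a * x)
  return x;
}
```

With a = 3 and a starting estimate of 0.3, one step gives about 0.33 and a second step about 0.3333, which shows why the iteration count is a useful accuracy/cost knob.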
* Add fortified (__*_chk) library functions to TLI (NFC)Ahmed Bougacha2014-11-121-0/+6
| | | | | | | | | | | | One of them (__memcpy_chk) was already there, the others were checked by comparing function names. Note that the fortified libfuncs are now part of TLI, but are always available, because they aren't generated, only optimized into the non-checking versions. Differential Revision: http://reviews.llvm.org/D6179 llvm-svn: 221817
* [AVX512] Add integer shift by immediate intrinsics.Cameron McInally2014-11-122-1/+12
| | | | llvm-svn: 221811
* Fix broken doxygen annotations, NFCJingyue Wu2014-11-122-4/+0
| | | | llvm-svn: 221801
* Disable indvar widening if arithmetics on the wider type are more expensiveJingyue Wu2014-11-121-2/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799
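The cost argument above (a 64-bit ADD simulated with two 32-bit integers) can be made concrete with a small sketch. It is not NVPTX's actual lowering; `U64Parts`, `splitU64`, `joinU64`, and `add64ViaTwo32` are hypothetical names used only to show that one wide add turns into two narrow adds plus carry handling.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value represented as two 32-bit halves.
struct U64Parts {
  uint32_t lo;
  uint32_t hi;
};

U64Parts splitU64(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

uint64_t joinU64(U64Parts p) {
  return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit addition using only 32-bit arithmetic: add the low halves,
// detect the carry from unsigned wraparound, then add the high halves.
U64Parts add64ViaTwo32(U64Parts a, U64Parts b) {
  U64Parts r;
  r.lo = a.lo + b.lo;
  const uint32_t carry = r.lo < a.lo ? 1u : 0u;
  r.hi = a.hi + b.hi + carry;
  return r;
}

// Compare the emulated add against the native 64-bit add.
void checkAgainstNativeAdd(uint64_t a, uint64_t b) {
  assert(joinU64(add64ViaTwo32(splitU64(a), splitU64(b))) == a + b);
}
```

An induction variable that is incremented every iteration pays this extra work on each step if it is widened to 64 bits, which is the situation the cost check is meant to avoid.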
* Add support for small-model PIC for PowerPC.Justin Hibbits2014-11-125-63/+116
| | | | | | | | | | | | | | | | | | | | Summary: Large-model was added first. With the addition of support for multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This generates more optimal code, for shared libraries with less than about 16380 data objects. Test Plan: Test cases added or updated Reviewers: joerg, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, mcrosier, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D5399 llvm-svn: 221791
* [mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructionsZoran Jovanovic2014-11-121-1/+2
| | | | | | | | Differential Revision: http://reviews.llvm.org/D6198 llvm-svn: 221780