path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* Fix the build with MSVC 2013 after new shuffle code | Reid Kleckner | 2014-08-15 | 1 | -2/+8
  MSVC gives this awesome diagnostic:
    ..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument
    ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1'
    ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl'
  Using an anonymous namespace makes the problem go away.
  llvm-svn: 215744
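The workaround described in this entry can be illustrated with a minimal sketch. The names `isEvenImpl`, `Predicate`, and `isEven` are hypothetical and do not come from the LLVM sources; the sketch only shows a function placed in an anonymous namespace being passed as a non-type template argument, which is the pattern the commit message says MSVC 2013 accepts.

```cpp
#include <cassert>

namespace {
// Hypothetical predicate. The anonymous namespace keeps it private to this
// translation unit while still letting it serve as a template argument.
bool isEvenImpl(int Value) { return Value % 2 == 0; }
} // end anonymous namespace

// Hypothetical wrapper that takes a function pointer as a non-type template
// argument and forwards calls to it.
template <bool (*Func)(int)> struct Predicate {
  bool operator()(int Value) const { return Func(Value); }
};

// A callable object bound to isEvenImpl at compile time.
const Predicate<isEvenImpl> isEven = {};
```

Calling `isEven(4)` forwards to `isEvenImpl(4)`. Whether the original code triggered the diagnostic because of a `static` declaration is an assumption; the commit message only states that the anonymous namespace fixed it.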
* R600/SI: Fix offset folding in some cases with shifted pointers. | Matt Arsenault | 2014-08-15 | 4 | -1/+137
  Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand.
  This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, accessing the first element of the second object is directly indexed by a shifted pointer.
  llvm-svn: 215739
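The rewrite in this entry relies on left shift distributing over addition in modular arithmetic. A minimal sketch of that identity on 32-bit unsigned values follows; the function names are hypothetical and are not part of the R600 backend.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (shl (add x, c1), c2).
uint32_t shiftOfAdd(uint32_t X, uint32_t C1, uint32_t C2) {
  assert(C2 < 32 && "shift amount must be smaller than the bit width");
  return (X + C1) << C2;
}

// Rewritten form: (add (shl x, c2), c1 << c2). The constant term c1 << c2
// can then be folded into an addressing-mode immediate.
uint32_t addOfShift(uint32_t X, uint32_t C1, uint32_t C2) {
  assert(C2 < 32 && "shift amount must be smaller than the bit width");
  return (X << C2) + (C1 << C2);
}
```

Both forms agree for every input because unsigned arithmetic wraps modulo 2^32, so the transformation is safe even when the addition overflows.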
* [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions | Chandler Carruth | 2014-08-15 | 1 | -0/+42
  where applicable for blending.
  llvm-svn: 215737
* R600/SI: Add intrinsic for ldexp | Matt Arsenault | 2014-08-15 | 4 | -2/+14
  llvm-svn: 215734
* R600/SI: Implement isLegalAddressingMode | Matt Arsenault | 2014-08-15 | 2 | -0/+47
  The default assumes that a 16-bit signed offset is used. LDS instructions use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used.
  More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change.
  llvm-svn: 215732
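The difference between the two offset encodings mentioned in this entry can be sketched as a pair of range checks. The helper names below are hypothetical; they are not the functions used in the R600 backend.

```cpp
#include <cassert> // assert, for callers that want to check the results
#include <cstdint>

// True if Offset fits a 16-bit signed immediate (the default assumption
// described in the commit message).
bool fitsSigned16(int64_t Offset) {
  return Offset >= -32768 && Offset <= 32767;
}

// True if Offset fits a 16-bit unsigned immediate (the encoding the commit
// message attributes to LDS instructions).
bool fitsUnsigned16(int64_t Offset) {
  return Offset >= 0 && Offset <= 65535;
}
```

A negative offset such as -4 passes the signed check but not the unsigned one, while 40000 passes only the unsigned check. That mismatch is the kind of case the commit message describes.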
* ARM: Fix and re-enable load/store optimizer for Thumb1. | Moritz Roth | 2014-08-15 | 1 | -111/+8
  In a previous iteration of the pass, we would try to compensate for writeback by updating later instructions and/or inserting a SUBS to reset the base register if necessary. Since such a SUBS sets the condition flags it's not generally safe to do this.
  For now, only merge LDR/STRs if there is no writeback to the base register (LDM that loads into the base register) or the base register is killed by one of the merged instructions. These cases are clear wins both in terms of instruction count and performance.
  Also add three new test cases, and update the existing ones accordingly.
  llvm-svn: 215729
* ARM load/store optimizer: Compute BaseKill correctly. | Moritz Roth | 2014-08-15 | 1 | -5/+11
  This adds some code back that was deleted in r92053. The location of the last merged memory operation needs to be kept up-to-date since MemOps may be in a different order to the original instruction stream to allow merging (since registers need to be in ascending order).
  Also simplify the logic that determines BaseKill, replacing the use of findRegisterUseOperandIdx with an equivalent function call.
  llvm-svn: 215728
* [FastISel][ARM] Fix a think-o in my previous commit (r215682). | Juergen Ributzka | 2014-08-15 | 1 | -15/+15
  We actually need to return the register into which we materialized the constant and not just "true" for success.
  This code is currently partially dead, which is why it hasn't triggered any failures yet. Once I change the order of the constant materialization this code will be fully exercised.
  llvm-svn: 215727
* [AArch64] Narrow arguments passed in wrong position on the stack in | Amara Emerson | 2014-08-15 | 1 | -2/+2
  big-endian mode.
  Patch by Asiri Rathnayake.
  Differential Revision: http://reviews.llvm.org/D4922
  llvm-svn: 215716
* Remove HasLEB128. | Rafael Espindola | 2014-08-15 | 10 | -13/+0
  We already require CFI, so it should be safe to require .leb128 and .uleb128.
  llvm-svn: 215712
* PPC: Clean up pointer casting, no functionality change. | Benjamin Kramer | 2014-08-15 | 1 | -2/+2
  Silences GCC's -Wcast-qual.
  llvm-svn: 215703
* [x86] Add the initial skeleton of type-based dispatch for AVX vectors in | Chandler Carruth | 2014-08-15 | 1 | -9/+125
  the new shuffle lowering and an implementation for v4 shuffles.
  This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates code for the included test cases that is as good as or better than the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;]
  llvm-svn: 215702
* [x86] Teach the instruction printer to decode immediate operands to | Chandler Carruth | 2014-08-15 | 3 | -0/+74
  BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.
  These will be used in my next commit as part of test cases for AVX shuffles which can directly use blend in more places.
  llvm-svn: 215701
* ARM: implement MRS/MSR (banked reg) system instructions. | Tim Northover | 2014-08-15 | 7 | -4/+241
  These are system-only instructions for CPUs with virtualization extensions, allowing a hypervisor easy access to all of the various different AArch32 registers.
  rdar://problem/17861345
  llvm-svn: 215700
* Remove testcase from README which we didn't get. We do get it now. | Erik Verbruggen | 2014-08-15 | 1 | -1/+1
  llvm-svn: 215699
* Current implementation of c.cond.fmt instructions only accept default cc0 | Vladimir Medic | 2014-08-15 | 2 | -14/+51
  register. This patch enables the instruction to accept other fcc registers. The aliases with default fcc0 registers are also defined.
  llvm-svn: 215698
* [x86] Remove the duplicated code for testing whether we can widen the | Chandler Carruth | 2014-08-15 | 1 | -12/+4
  elements of a shuffle mask and simplify how it works. No functionality changed now that the bug that was here has been fixed.
  llvm-svn: 215696
* [x86] Fix the very broken formation of vpunpck instructions in the | Chandler Carruth | 2014-08-15 | 1 | -1/+1
  target-specific shuffle DAG combines.
  We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up; this is the minimal fix to the behavior.
  In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch.
  llvm-svn: 215690
* [x86] Fix PR20540 where the x86 shuffle DAG combiner had completely | Chandler Carruth | 2014-08-15 | 1 | -23/+42
  broken logic for merging shuffle masks in the face of SM_SentinelZero mask operands.
  While these are '-1' they don't mean 'undef' the way '-1' does in the pre-legalized shuffle masks. Instead, they mean that the shuffle operation is forcibly zeroing that lane. Reflect this and explicitly handle it in a bunch of places. In one place the effect is equivalent but much more clear. In the rest it was really weirdly broken.
  Also, rewrite the entire merging thing to be a more direct operation with a single loop and just doing math to map the indices through the various masks. Also add a bunch of asserts to try to make it extremely clear what the different masks can possibly look like.
  Finally, add some comments to clarify that we're merging shuffle masks *up* here rather than *down* as we do everywhere else, and thus the logic is quite confusing.
  Thanks to several different people for sending test cases, and to Robert Khasanov for an initial attempt at fixing.
  llvm-svn: 215687
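The single-loop merge described in this entry can be sketched as a composition of two single-input shuffle masks. This is a generic illustration, not the code from X86ISelLowering.cpp: the function name is hypothetical, and the two sentinels are given distinct hypothetical values here purely so the sketch can tell "undef" and "zeroed" lanes apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel values chosen for this sketch only.
constexpr int SentinelUndef = -1; // lane value does not matter
constexpr int SentinelZero = -2;  // lane is forcibly zeroed

// Composes two single-input shuffle masks: Outer is applied to the result of
// Inner. Each merged index is found by mapping Outer's index through Inner.
std::vector<int> mergeMasks(const std::vector<int> &Inner,
                            const std::vector<int> &Outer) {
  std::vector<int> Merged;
  Merged.reserve(Outer.size());
  for (int Idx : Outer) {
    // A sentinel in the outer mask wins regardless of the inner mask.
    if (Idx == SentinelUndef || Idx == SentinelZero) {
      Merged.push_back(Idx);
      continue;
    }
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Inner.size() &&
           "outer mask index must select a lane of the inner shuffle");
    // Otherwise take whatever the inner shuffle put in that lane, which may
    // itself be a sentinel.
    Merged.push_back(Inner[Idx]);
  }
  return Merged;
}
```

Treating the zero sentinel as if it were undef would let a later combine pick any value for that lane, which is the class of bug the commit message describes.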
* [PPC64] Add missing dependency on X2 to LDinto_toc. | Bill Schmidt | 2014-08-15 | 1 | -1/+1
  The LDinto_toc pattern has been part of 64-bit PowerPC for a long time, and represents loading from a memory location into the TOC register (X2). However, this pattern doesn't explicitly record that it modifies that register. This patch adds the missing dependency.
  It was very surprising to me that this has never shown up as a problem in the past, and that we only saw this problem recently in a single scenario when building a self-hosted clang. It turns out that in most cases we have another dependency present that keeps the LDinto_toc instruction tied in place. LDinto_toc is used for TOC restore following a call site, so this is a typical sequence:
    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
  Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there is a natural anti-dependency between the two that keeps it in place. Therefore we don't usually see a problem. However, in one particular case, one call is followed immediately by another call, and the second call requires a parameter that is a TOC-relative address. This is the code sequence:
    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
    ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
  Note that the back-to-back stack adjustments are the same size! The back end is smart enough to recognize this and optimize them away:
    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
  Now there is nothing to prevent the ADDIStocHA instruction from moving ahead of the LDinto_toc instruction, and because of the longest-path heuristic, this is what happens.
  With the accompanying patch, %X2 is represented as an implicit def:
    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1, %X2<imp-def,dead>
    ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
    ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
  So now when the two stack adjustments are removed, ADDIStocHA is prevented from being moved above LDinto_toc.
  I have not yet created a test case for this, because the original failure occurs on a relatively large function that needs reduction. However, this is a fairly serious bug, despite its infrequency, and I wanted to get this patch onto the list as soon as possible so that it can be considered for a 3.5 backport. I'll work on whittling down a test case.
  Have we missed the boat for 3.5 at this point?
  Thanks, Bill
  llvm-svn: 215685
* [FastISel][ARM] Fall-back to constant pool loads when materializing an i32 | Juergen Ributzka | 2014-08-14 | 1 | -1/+2
  constant.
  FastEmit_i won't always succeed in materializing an i32 constant and may simply fail. This would trigger a fall-back to SelectionDAG, which is really not necessary.
  This fix will first fall back to a constant pool load to materialize the constant before giving up for good.
  This fixes <rdar://problem/18022633>.
  llvm-svn: 215682
* Revert several FastISel commits to track down a buildbot error. | Juergen Ributzka | 2014-08-14 | 2 | -110/+28
  This reverts:
    r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
    r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
    r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
    r215591 "[FastISel][AArch64] Make use of the zero register when possible."
    r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
    r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
  llvm-svn: 215673
* Fix whitespace error from r215279, NFC | Duncan P. N. Exon Smith | 2014-08-14 | 1 | -1/+1
  llvm-svn: 215667
* [AVX512] Switch FMA intrinsics to the masking version | Adam Nemet | 2014-08-14 | 1 | -24/+37
  This does the renaming and updates the lowering logic.
  Part of <rdar://problem/17688758>
  llvm-svn: 215664
* [X86] Break out logic to map FMA Intrinsic number to Opcode | Adam Nemet | 2014-08-14 | 1 | -57/+51
  No functional change. Will be used to lower AVX512 masking FMA intrinsics.
  llvm-svn: 215663
* [AVX512] Add enum for the static rounding types | Adam Nemet | 2014-08-14 | 2 | -1/+13
  No functional change. This will be used by the new FMA intrinsic lowering code.
  We can probably add NO_EXC here as well, I am just not too familiar with this part of AVX512 yet. We can add that later.
  llvm-svn: 215662
* [AVX512] Break out the logic to lower masking intrinsics | Adam Nemet | 2014-08-14 | 1 | -13/+21
  No functional change. This will be used by the FMA intrinsic lowering as well and hopefully many more.
  llvm-svn: 215661
* [AVX512] Add masking variant for the FMA instructions | Adam Nemet | 2014-08-14 | 2 | -33/+73
  This change further evolves the base class AVX512_masking in order to make it suitable for the masking variants of the FMA instructions.
  Besides AVX512_masking there is now a new base class that instructions including FMAs can use: AVX512_masking_3src. With three-source (destructive) instructions one of the sources is already tied to the destination. This difference from AVX512_masking is captured by this new class. The common bits between _masking and _masking_3src are broken out into a new super class called AVX512_masking_common.
  As with valign, there is some corresponding restructuring of the underlying format classes. The idea is the same: we essentially want to derive from two classes, one providing the format bits and another format-independent multiclass supplying the various masking and non-masking instruction variants.
  Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking variants. For masking, the next patches in the series will add intrinsics and intrinsic tests.
  For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without* the leading source operand that is tied to dst ($src1). This is necessary to properly construct the (ins ...) for the different variants. For the record, I did check that if $src is mistakenly included, you do get a fairly intuitive error message from the tablegen backend.
  Part of <rdar://problem/17688758>
  llvm-svn: 215660
* Revert "[FastISel][AArch64] Add support for more addressing modes." | Juergen Ributzka | 2014-08-14 | 1 | -289/+168
  This reverts commit r215597, because it might have broken the build bots.
  llvm-svn: 215659
* Testing commit access. | Moritz Roth | 2014-08-14 | 1 | -1/+1
  Remove a trailing whitespace.
  llvm-svn: 215653
* Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly | Aaron Ballman | 2014-08-14 | 1 | -1/+1
  converted to 64 bits (was 64-bit shift intended?)). NFC.
  llvm-svn: 215642
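The warning quoted in this entry fires when a shift is evaluated at 32 bits and the result is then widened. A minimal sketch of the usual remedy follows; the function name `bitMask64` is hypothetical and the actual one-line change in the commit is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Builds a 64-bit mask with a single bit set. Using a 64-bit literal makes
// the shift itself 64 bits wide; writing `1 << Bit` would shift a 32-bit int
// and only then convert the result to 64 bits.
uint64_t bitMask64(unsigned Bit) {
  assert(Bit < 64 && "bit index must be smaller than the mask width");
  return UINT64_C(1) << Bit;
}
```

With the 64-bit literal, `bitMask64(40)` yields a value with bit 40 set, which a 32-bit shift cannot represent.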
* [x86] Begin stubbing out the AVX support in the new vector shuffle | Chandler Carruth | 2014-08-14 | 1 | -0/+88
  lowering scheme. Currently, this just directly bails to the fallback path of splitting the 256-bit vector into two 128-bit vectors, operating there, and then joining the results back together. While the results are far from perfect, they are *shockingly* good for what we're doing here. I'll be layering the rest of the functionality on top of this piece by piece and updating tests as I go.
  Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions that I'll need to stomp out as I march through the functional spread here.
  llvm-svn: 215637
* [mips][microMIPS] MicroMIPS Compact Branch Instructions BEQZC and BNEZC | Zoran Jovanovic | 2014-08-14 | 2 | -0/+28
  Differential Revision: http://reviews.llvm.org/D3545
  llvm-svn: 215636
* [mips] Add assembler support for the "la $reg,symbol" pseudo-instruction. | Toma Tabacu | 2014-08-14 | 1 | -6/+91
  Summary:
  This pseudo-instruction allows the programmer to load an address from a symbolic expression into a register.
  Patch by David Chisnall.
  His work was sponsored by: DARPA, AFRL
  I've made some minor changes to the original, such as improving the formatting and adding some comments, and I've also added a test case.
  Reviewers: dsanders
  Reviewed By: dsanders
  Differential Revision: http://reviews.llvm.org/D4808
  llvm-svn: 215630
* [mips] Rename [gs]etCanHaveModuleDir to more natural names | Daniel Sanders | 2014-08-14 | 4 | -59/+42
  Summary:
  getCanHaveModuleDir() is renamed to isModuleDirectiveAllowed(), and setCanHaveModuleDir() is renamed to forbidModuleDirective() since it is only ever given a false argument.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4885
  llvm-svn: 215628
* AArch64: Silence warning in AArch64FastISel | David Majnemer | 2014-08-14 | 1 | -1/+1
  GCC was emitting a signed vs unsigned comparison warning.
  llvm-svn: 215620
* [X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32. | Quentin Colombet | 2014-08-13 | 1 | -1/+1
  Found by code inspection.
  llvm-svn: 215604
* [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls. | Akira Hatanaka | 2014-08-13 | 1 | -0/+5
  Certain functions such as objc_autoreleaseReturnValue have to be called as tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls, we have to fall back to SelectionDAG to select calls that are marked as tail.
  <rdar://problem/17991614>
  llvm-svn: 215600
* [FastISel][AArch64] Add support for more addressing modes. | Juergen Ributzka | 2014-08-13 | 1 | -168/+289
  FastISel didn't take much advantage of the different addressing modes available to it on AArch64.
  This commit allows the ComputeAddress method to recognize more addressing modes that allow shifts and sign-/zero-extensions to be folded into the memory operation itself.
  For example:
    lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
    ldr x0, [x0, x1]

    sxtw x1, w1
    lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
    ldr x0, [x0, x1]
  llvm-svn: 215597
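The two folded addressing forms in this entry compute the same effective address as the unfolded instruction sequences. A minimal sketch of that arithmetic follows; the function names are hypothetical and model only the address calculation, not FastISel itself.

```cpp
#include <cassert>
#include <cstdint>

// Models [base, index, lsl #shift]: the 64-bit index is shifted, then added.
uint64_t addrShiftedIndex(uint64_t Base, uint64_t Index, unsigned Shift) {
  assert(Shift < 64 && "shift amount must be smaller than the bit width");
  return Base + (Index << Shift);
}

// Models [base, index, sxtw #shift]: the low 32 bits of the index are
// sign-extended to 64 bits, shifted, then added.
uint64_t addrSignExtendedIndex(uint64_t Base, uint32_t Index32,
                               unsigned Shift) {
  assert(Shift < 64 && "shift amount must be smaller than the bit width");
  const int64_t Extended = static_cast<int32_t>(Index32);
  return Base + (static_cast<uint64_t>(Extended) << Shift);
}
```

For example, a 32-bit index holding -1 with a shift of 3 moves the address back by 8 bytes, which is what the separate sxtw and lsl instructions would have produced.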
* [FastISel][X86] Add large code model support for materializing | Juergen Ributzka | 2014-08-13 | 1 | -1/+17
  floating-point constants.
  In the large code model for X86, floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC-relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load the floating-point value from there.
  Fixes <rdar://problem/17674628>.
  llvm-svn: 215595
* [FastISel][X86] Use XOR to materialize the "0" value. | Juergen Ributzka | 2014-08-13 | 1 | -0/+23
  llvm-svn: 215594
* [FastISel][X86] Emit more efficient instructions for integer constant | Juergen Ributzka | 2014-08-13 | 1 | -1/+28
  materialization.
  This mostly affects the i64 value type, which always resulted in a 10-byte movabsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible.
  This fixes <rdar://problem/17420988>.
  llvm-svn: 215593
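The immediate check described in this entry can be sketched as a three-way classification. The enum and function names are hypothetical, and the thresholds reflect general x86-64 encoding rules (a 32-bit mov zero-extends into the full register, and a 64-bit mov can take a sign-extended 32-bit immediate) rather than the exact logic of the commit, which is not shown in this log.

```cpp
#include <cassert> // assert, for callers that want to check the results
#include <cstdint>

// Hypothetical labels for the three encodings considered in this sketch.
enum class MovKind { Mov32ZeroExtended, Mov64SignExtended32, MovAbs64 };

// Picks the smallest mov form able to materialize a 64-bit immediate.
MovKind selectMovForImm(uint64_t Imm) {
  // Upper 32 bits are zero: a 32-bit mov implicitly zero-extends.
  if (Imm <= UINT32_MAX)
    return MovKind::Mov32ZeroExtended;
  // Value is a sign-extended 32-bit quantity: use the imm32 form.
  const int64_t Signed = static_cast<int64_t>(Imm);
  if (Signed >= INT32_MIN && Signed <= INT32_MAX)
    return MovKind::Mov64SignExtended32;
  // Anything else needs the full 64-bit immediate.
  return MovKind::MovAbs64;
}
```

Small positive constants take the shortest form, negative constants that fit in 32 bits take the middle form, and only genuinely wide values fall through to movabsq.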
* [FastISel][AArch64] Make use of the zero register when possible. | Juergen Ributzka | 2014-08-13 | 1 | -1/+13
  This change now materializes the value "0" from the zero register. The zero register can be folded by several instructions, so no materialization is needed at all.
  Fixes <rdar://problem/17924413>.
  llvm-svn: 215591
* [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598] | Gerolf Hoflehner | 2014-08-13 | 1 | -2/+1
  This is a cleaner solution to the problem described in r215431. When instructions are combined a dangling DBG_VALUE is removed. This resolves bug 20598.
  llvm-svn: 215587
* [FastISel][X86] Refactor constant materialization. NFCI. | Juergen Ributzka | 2014-08-13 | 1 | -54/+67
  Split the constant materialization code into three separate helper functions for Integer-, Floating-Point-, and GlobalValue-Constants.
  llvm-svn: 215586
* [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it. | Juergen Ributzka | 2014-08-13 | 1 | -0/+3
  This change is also in preparation for a future change to make sure that the constant materialization uses MOVT/MOVW when available and not a load from the constant pool.
  llvm-svn: 215584
* [FastISel][ARM] Fix a bug in the integer materialization code. | Juergen Ributzka | 2014-08-13 | 1 | -1/+3
  getRegClassFor returns the incorrect register class when in Thumb2 mode. This fix simply manually selects the register class as in the code just a few lines above.
  There is no test case for this code, because the code is currently unreachable. This will be changed in a future commit and existing test cases will exercise this code.
  llvm-svn: 215583
* [FastISel][AArch64] Cleanup constant materialization code. NFCI. | Juergen Ributzka | 2014-08-13 | 1 | -26/+30
  Cleanup and prepare constant materialization code for future commits.
  llvm-svn: 215582
* R600: Correctly set the src value offset for scalarized kernel args | Matt Arsenault | 2014-08-13 | 1 | -11/+29
  This for some reason fixes v1i64 kernel arguments on pre-SI.
  This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me.
  llvm-svn: 215564
* Canonicalize header guards into a common format. | Benjamin Kramer | 2014-08-13 | 284 | -622/+644
  Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful).
  Changes made by clang-tidy with minor tweaks.
  llvm-svn: 215558