path: root/llvm/lib/Target
Commit message (Author, Age, Files, Lines)
* [X86] Remove some seemingly unnecessary patterns that supported vector zext/sext with 256-bit source types producing a 256-bit result. (Craig Topper, 2016-07-13, 2 files, -33/+0)
| These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases.
| llvm-svn: 275240
* AMDGPU: Follow up to r275203 (Matt Arsenault, 2016-07-12, 5 files, -33/+101)
| I meant to squash this into it.
| llvm-svn: 275220
* [Power9] Add codegen for VSX word insert/extract instructions (Nemanja Ivanovic, 2016-07-12, 5 files, -8/+220)
| This patch corresponds to review: http://reviews.llvm.org/D20239
| It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that are useful in some cases for inserting and extracting vector elements of v4[if]32 vectors.
| llvm-svn: 275215
* [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128 (Simon Pilgrim, 2016-07-12, 1 file, -4/+28)
| llvm-svn: 275212
* X86FixupBWInsts: No need for forward liveness analysis. (Matthias Braun, 2016-07-12, 1 file, -35/+0)
| With r274952 and r275201 in place there are no cases left where a forward liveness analysis yields different results than a backward one. So we can remove the forward stepping logic.
| Differential Revision: http://reviews.llvm.org/D22083
| llvm-svn: 275204
* AMDGPU: Fix verifier error with kill intrinsic (Matt Arsenault, 2016-07-12, 1 file, -65/+122)
| Don't create a terminator in the middle of the block.
| We should probably get rid of this intrinsic.
| llvm-svn: 275203
* AMDGPU: Set isConvergent on v_cmpx* instructions (Matt Arsenault, 2016-07-12, 1 file, -2/+3)
| No test since these aren't used now, except for one place in a pre-emit pass.
| llvm-svn: 275200
* AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8 (Wei Ding, 2016-07-12, 1 file, -0/+4)
| Differential Revision: http://reviews.llvm.org/D22239
| llvm-svn: 275197
* [AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed. (Haicheng Wu, 2016-07-12, 1 file, -0/+6)
| If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.
| Differential Revision: http://reviews.llvm.org/D22256
| llvm-svn: 275178
* [PowerPC] Canonicalize applicable vector shift immediates as swaps (Nemanja Ivanovic, 2016-07-12, 3 files, -3/+18)
| This patch corresponds to review: http://reviews.llvm.org/D21358
| Vector shifts that have the same semantics as a vector swap are canonicalized as such to provide additional opportunities for swap removal optimization to remove unnecessary swaps.
| llvm-svn: 275168
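The equivalence this commit relies on can be sketched outside LLVM. The commit message does not say which shift instruction is involved, so the model below is an assumption: a word-granular "shift left double" over the concatenation of two 4-word vectors, compared against a doubleword swap. All names (`V4`, `shiftLeftDoubleWords`, `swapDoublewords`, `shiftIsSwap`) are hypothetical and are not taken from the PowerPC backend.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;  // hypothetical 4 x 32-bit vector

// Assumed shift model: take 4 consecutive words, starting at word index
// `shift` (0..3), from the 8-word concatenation a0..a3,b0..b3.
V4 shiftLeftDoubleWords(const V4 &a, const V4 &b, unsigned shift) {
  V4 out{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned idx = shift + i;  // position within the concatenation
    out[i] = idx < 4 ? a[idx] : b[idx - 4];
  }
  return out;
}

// Doubleword swap: exchange the upper and lower 64-bit halves.
V4 swapDoublewords(const V4 &a) { return {a[2], a[3], a[0], a[1]}; }

// Hypothetical predicate: under the model above, the shift behaves like a
// swap when both inputs are the same vector and the shift is two words.
bool shiftIsSwap(bool sameInput, unsigned shift) {
  return sameInput && shift == 2;
}
```

Under this model, `shiftLeftDoubleWords(v, v, 2)` and `swapDoublewords(v)` produce the same vector, which is the kind of case a swap-removal pass could then treat uniformly.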
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffset (Nicolai Haehnle, 2016-07-12, 3 files, -34/+9)
| Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it misled the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
| Reviewers: arsenm, tstellarAMD
| Subscribers: arsenm, kzhuravl, llvm-commits
| Differential Revision: http://reviews.llvm.org/D22217
| llvm-svn: 275160
* [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR. (Craig Topper, 2016-07-12, 1 file, -24/+0)
| llvm-svn: 275155
* X86: Avoid implicit iterator conversions, NFC (Duncan P. N. Exon Smith, 2016-07-12, 10 files, -220/+220)
| Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr*, mainly by preferring MachineInstr& over MachineInstr* and using range-based for loops.
| llvm-svn: 275149
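The style change described here can be illustrated with a standalone sketch. It does not use LLVM's types: `Instr` and `Block` are hypothetical stand-ins for MachineInstr and a basic block, and the functions are invented for illustration only.

```cpp
#include <list>
#include <string>

// Hypothetical stand-ins; these are not LLVM classes.
struct Instr {
  std::string opcode;
  bool isBranch;
};
using Block = std::list<Instr>;

// Taking a reference documents that the argument is never null, and a
// caller cannot pass an end() iterator that silently decays to a pointer.
bool isBranch(const Instr &I) { return I.isBranch; }

// The range-based loop yields references directly, so no conversion from
// an iterator to a pointer is needed anywhere in the body.
unsigned countBranches(const Block &B) {
  unsigned Count = 0;
  for (const Instr &I : B)
    if (isBranch(I))
      ++Count;
  return Count;
}
```

A predicate that really does accept "no instruction" would keep a pointer parameter and check it explicitly.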
* [Kryo] Enable ZCZeroing feature (Haicheng Wu, 2016-07-12, 1 file, -1/+2)
| This feature uses immediate #0 to zero a register.
| Differential Revision: http://reviews.llvm.org/D19985
| llvm-svn: 275143
* Hexagon: Avoid implicit iterator conversions, NFC (Duncan P. N. Exon Smith, 2016-07-12, 15 files, -442/+426)
| Avoid implicit iterator conversions from MachineInstrBundleIterator to MachineInstr* in the Hexagon backend, mostly by preferring MachineInstr& over MachineInstr* and switching to range-based for loops.
| There's a long tail of API cleanup here, but I'm planning to leave the rest to the Hexagon maintainers. HexagonInstrInfo defines many of its own predicates, and most of them still take MachineInstr*. Some of those actually check for nullptr, so I didn't feel comfortable changing them to MachineInstr& en masse.
| llvm-svn: 275142
* Mips: Avoid implicit iterator conversions, NFC (Duncan P. N. Exon Smith, 2016-07-12, 6 files, -57/+51)
| Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr* in the Mips backend, mainly by preferring MachineInstr& over MachineInstr* when a pointer isn't nullable and using range-based for loops.
| llvm-svn: 275141
* SystemZ: Avoid implicit iterator conversions, NFC (Duncan P. N. Exon Smith, 2016-07-12, 3 files, -81/+75)
| Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr* in the SystemZ backend, mainly by preferring MachineInstr& over MachineInstr* and using range-based for loops.
| llvm-svn: 275137
* Teach FastISel about thiscall (and, hence, about callee-pop). (Nico Weber, 2016-07-12, 1 file, -5/+12)
| http://reviews.llvm.org/D22115
| llvm-svn: 275135
* AMDGPU: Cleanup pseudoinstructions (Matt Arsenault, 2016-07-12, 3 files, -58/+55)
| llvm-svn: 275133
* AMDGPU: Fix missing scc def on control flow pseudos (Matt Arsenault, 2016-07-12, 1 file, -2/+2)
| These are all expanded to instructions that include an scc def.
| llvm-svn: 275132
* AMDGPU: Enable trackLivenessAfterRegAlloc (Matt Arsenault, 2016-07-11, 2 files, -0/+6)
| This has caught a number of bugs.
| llvm-svn: 275131
* ARM: validate immediate branch targets in AsmParser. (Tim Northover, 2016-07-11, 5 files, -51/+93)
| Immediate branch targets aren't commonly used, but if they are we should make sure they can actually be encoded. This means they must be divisible by 2 when targeting Thumb mode, and by 4 when targeting ARM mode.
| Also do a little naming cleanup while I was changing everything around anyway.
| llvm-svn: 275116
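The alignment rule stated in this commit can be sketched as a small predicate. This covers only the divisibility requirement from the message; any range limits of particular branch encodings are not mentioned there and are left out. The names `Mode` and `isAlignedBranchTarget` are hypothetical and are not the AsmParser's own.

```cpp
#include <cstdint>

enum class Mode { ARM, Thumb };  // hypothetical names for the two modes

// An immediate branch target must be a multiple of 2 in Thumb mode and a
// multiple of 4 in ARM mode, per the commit message. Negative (backward)
// targets follow the same rule.
bool isAlignedBranchTarget(std::int64_t Target, Mode M) {
  const std::int64_t Align = (M == Mode::Thumb) ? 2 : 4;
  return Target % Align == 0;
}
```

For example, a target of 6 passes in Thumb mode and is rejected in ARM mode.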
* AMDGPU: Treat texture gather instructions more like other MIMG instructions (Nicolai Haehnle, 2016-07-11, 5 files, -4/+17)
| Summary: Setting MIMG to 0 has a bunch of unexpected side effects, including that isVMEM returns false which leads to incorrect treatment in the hazard recognizer. The reason I noticed it is that it also leads to incorrect treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.
| The only reason why MIMG was set to 0 is to signal the special handling of dmasks, but that can be checked differently.
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877
| Reviewers: arsenm, tstellarAMD
| Subscribers: arsenm, kzhuravl, llvm-commits
| Differential Revision: http://reviews.llvm.org/D22210
| llvm-svn: 275113
* AMDGPU: fix local stack slot allocation bugs (Nicolai Haehnle, 2016-07-11, 1 file, -2/+8)
| Summary: The main bug fix here is using the 32-bit encoding of V_ADD_I32 in materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary immediates work.
| The second part is that we may now require the SegmentWaveByteOffset even when there are initially no stack objects and VGPR spilling isn't enabled, for stack slots that are allocated later. This means that some bits become effectively dead and can be cleaned up.
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
| Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
| Reviewers: arsenm, tstellarAMD
| Subscribers: arsenm, llvm-commits, kzhuravl
| Differential Revision: http://reviews.llvm.org/D21551
| llvm-svn: 275108
* [X86] Make some cast costs more precise (Michael Kuperstein, 2016-07-11, 1 file, -3/+16)
| Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604).
| Differential Revision: http://reviews.llvm.org/D22064
| llvm-svn: 275106
* [X86] Fix tailcall return address clobber bug. (Quentin Colombet, 2016-07-11, 2 files, -11/+30)
| This bug (llvm.org/PR28124) was introduced by r237977, which refactored the tail call sequence to be generated in two passes instead of one.
| Unfortunately, the stack adjustment produced by the first pass was not recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing code such as the following, which clobbers the return address, to be generated:
|     popl %edi
|     popl %edi
|     pushl %eax
|     jmp tailcallee # TAILCALL
| To fix the problem, the entire stack adjustment is performed in X86ExpandPseudo::ExpandMI() for tail calls.
| Patch by Magnus Lång <margnus1@gmail.com>
| Differential Revision: http://reviews.llvm.org/D21325
| llvm-svn: 275103
* [X86] Disable FixupSetCC for CodeGenOpt::None (Michael Kuperstein, 2016-07-11, 1 file, -4/+4)
| It is an optimization pass, and should not run at -O0. Especially since Fast RA will not do the required register coalescing anyway, so it's a loss even from the optimization standpoint.
| This also works around (but doesn't quite fix) PR28489.
| llvm-svn: 275099
* [SystemZ] Recognize Load On Condition Immediate (LOCHI/LOCGHI) opportunities (Zhan Jun Liau, 2016-07-11, 6 files, -2/+73)
| Summary: Add support for the z13 instructions LOCHI and LOCGHI which conditionally load immediate values. Add target instruction info hooks so that if conversion will allow predication of LHI/LGHI.
| Author: RolandF
| Reviewers: uweigand
| Subscribers: zhanjunl
| Committing on behalf of Roland.
| Differential Revision: http://reviews.llvm.org/D22117
| llvm-svn: 275086
* Add missing include from previous commit (Nirav Dave, 2016-07-11, 1 file, -0/+1)
| llvm-svn: 275069
* Fix branch relaxation in 16-bit mode. (Nirav Dave, 2016-07-11, 13 files, -44/+78)
| Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation to generate jumps with 16-bit sized immediates in 16-bit mode.
| This fixes PR22097.
| Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight
| Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders
| Differential Revision: http://reviews.llvm.org/D20830
| llvm-svn: 275068
* [X86][SSE] Generalise target shuffle combine of shuffles using variable masks (Simon Pilgrim, 2016-07-11, 1 file, -13/+21)
| At present the only shuffle with a variable mask we recognise is PSHUFB, which influences whether it's worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.
| llvm-svn: 275059
* [AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions. (Artem Tamazov, 2016-07-11, 1 file, -0/+2)
| Fixes issue mentioned at: https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
| Lit tests added.
| Differential Revision: http://reviews.llvm.org/D22133
| llvm-svn: 275054
* [mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support (Zlatko Buljan, 2016-07-11, 15 files, -58/+324)
| Differential Revision: http://reviews.llvm.org/D18824
| llvm-svn: 275050
* AVX-512: DAG lowering for scalar MIN/MAX commutable ops (Elena Demikhovsky, 2016-07-11, 1 file, -3/+36)
| DAG lowering was missing for the scalar FMINC, FMAXC nodes. The nodes are generated only in the "unsafe-fp-math" mode. Added tests.
| llvm-svn: 275048
* [AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching. (Craig Topper, 2016-07-11, 1 file, -1/+2)
| llvm-svn: 275046
* [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. (Craig Topper, 2016-07-11, 3 files, -5/+19)
| llvm-svn: 275045
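Why an immediate of 0xff yields all ones can be seen from a bitwise model of a three-input ternary-logic operation. This is a sketch, not the backend's code: it assumes the usual definition in which the three input bits at each position form a 3-bit index into the 8-bit immediate, and it works on one 64-bit lane rather than a full 512-bit register. The function name `ternlog` is hypothetical.

```cpp
#include <cstdint>

// Bitwise ternary logic over one 64-bit lane. For every bit position the
// bits of a, b and c select one bit of the 8-bit truth table `imm`
// (assumed index order: a is the high bit, c the low bit).
std::uint64_t ternlog(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                      std::uint8_t imm) {
  std::uint64_t result = 0;
  for (unsigned bit = 0; bit < 64; ++bit) {
    unsigned idx = static_cast<unsigned>((((a >> bit) & 1) << 2) |
                                         (((b >> bit) & 1) << 1) |
                                         ((c >> bit) & 1));
    std::uint64_t out = (imm >> idx) & 1;  // truth-table lookup
    result |= out << bit;
  }
  return result;
}
```

With `imm == 0xff` every truth-table entry is 1, so the result is all ones whatever the inputs hold, which is why the source registers need no initialization for this idiom.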
* [X86] Add the AVX512 SET0 pseudos to foldMemoryOperandImpl since they are marked for CanFoldAsLoad. (Craig Topper, 2016-07-11, 2 files, -3/+14)
| I don't really know how to test this.
| llvm-svn: 275044
* [X86][SSE] Relax type assertions for matchVectorShuffleAsInsertPS (Simon Pilgrim, 2016-07-10, 1 file, -2/+4)
| Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.
| llvm-svn: 275028
* AMDGPU/R600: Add implicitarg.ptr intrinsic (Jan Vesely, 2016-07-10, 3 files, -6/+12)
| Differential Revision: http://reviews.llvm.org/D21622
| llvm-svn: 275024
* [X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW (Simon Pilgrim, 2016-07-10, 1 file, -3/+48)
| llvm-svn: 275022
* [SystemZ] Utilize Test Data Class instructions. (Marcin Koscielnicki, 2016-07-10, 6 files, -4/+432)
| This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128), which maps straight to the test data class instructions. A new IR pass is added to recognize instructions that can be converted to TDC and perform the necessary replacements.
| Differential Revision: http://reviews.llvm.org/D21949
| llvm-svn: 275016
* Give helper classes/functions internal linkage. NFC. (Benjamin Kramer, 2016-07-10, 1 file, -9/+14)
| llvm-svn: 275014
* [AVX512] Add support for lowering to 512-bit SHUFPS. (Craig Topper, 2016-07-10, 1 file, -4/+8)
| llvm-svn: 275011
* [X86][SSE] Add support for target shuffle combining to INSERTPS (Simon Pilgrim, 2016-07-09, 1 file, -10/+48)
| llvm-svn: 274990
* [X86][SSE] Use scaleShuffleMask helper. NFCI. (Simon Pilgrim, 2016-07-09, 1 file, -11/+2)
| llvm-svn: 274988
* [lanai] Treat .t as optional in assembly parser for RR operands and add predicate operand to ShiftRR (Jacques Pienaar, 2016-07-09, 2 files, -9/+38)
| llvm-svn: 274980
* AMDGPU: Move R600 only pieces into R600 classes (Matt Arsenault, 2016-07-09, 11 files, -109/+81)
| llvm-svn: 274979
* Revert "AMDGPU: Remove unused control flow intrinsic" (Matt Arsenault, 2016-07-09, 7 files, -0/+32)
| llvm-svn: 274978
* AMDGPU: Prune AMDGPUAsmParser in libdeps. (NAKAMURA Takumi, 2016-07-09, 1 file, -1/+1)
| llvm-svn: 274970
* AMDGPU: Fix fdiv lowering when f32 denormals supported (Matt Arsenault, 2016-07-09, 1 file, -5/+3)
| Also fix test not actually using function labels.
| llvm-svn: 274969