path: root/llvm/lib/Target
Commit message  Author  Age  Files  Lines
* livePhysRegs: Pass MBB by reference in addLive{Ins|Outs}(); NFCMatthias Braun2016-05-038-10/+10
| | | | | | | The block must not be nullptr for the addLiveIns()/addLiveOuts() functions. llvm-svn: 268340
* LivePhysRegs: Automatically determine presence of pristine regs.Matthias Braun2016-05-036-8/+8
| | | | | | | | | | | | | | | | | | | | | | Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion. Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers, so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug; it should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour, as a change may need to get coordinated with GC runtimes. llvm-svn: 268336
* [X86] Model FAULTING_LOAD_OP as a terminator and branch.Quentin Colombet2016-05-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and, moreover, other optimizations could have performed wrong transformations! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2-byte offset in implicit-null-check.ll comes from the fact that the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequent commit. llvm-svn: 268327
* [X86][SSE] Added placeholder for 128/256-bit wide shuffle combinesSimon Pilgrim2016-05-021-6/+14
| | | | | | Began adding a placeholder for future support for vperm2f128/vshuff64x2 style 128/256-bit wide shuffles. llvm-svn: 268306
* AMDGPU: Custom lower v2i32 loads and storesMatt Arsenault2016-05-021-7/+39
| | | | | | | This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratchTom Stellard2016-05-021-2/+1
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
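The difference between the two reads can be illustrated with a toy model. This is a minimal sketch, not the real ISA or any LLVM code: the 4-lane width and the names `Vgpr`, `ExecMask`, `maskedLoad`, `readLane` and `readFirstLane` are all assumptions made for illustration. It assumes a load from scratch only writes the lanes that are active, so a fixed read of lane 0 can return a stale value when lane 0 is inactive.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model (hypothetical): a 4-lane "wave" with one VGPR and an EXEC mask.
constexpr std::size_t kLanes = 4;
using Vgpr = std::array<std::uint32_t, kLanes>;
using ExecMask = std::bitset<kLanes>;

// A vector load only writes the lanes that are active in the mask.
void maskedLoad(Vgpr &dst, const ExecMask &exec, std::uint32_t value) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (exec[lane])
      dst[lane] = value;
}

// Reads a fixed lane, whether or not that lane took part in the load.
std::uint32_t readLane(const Vgpr &src, std::size_t lane) {
  assert(lane < kLanes);
  return src[lane];
}

// Reads the lowest-numbered active lane.
std::uint32_t readFirstLane(const Vgpr &src, const ExecMask &exec) {
  assert(exec.any() && "toy model requires at least one active lane");
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (exec[lane])
      return src[lane];
  return 0; // not reached when the assertion above holds
}
```

With lanes 1 and 3 active and lane 0 inactive, `maskedLoad` leaves lane 0 untouched, so `readLane(v, 0)` yields whatever was there before while `readFirstLane(v, exec)` yields the restored value.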
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-022-55/+12
| | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components to be folded with the base pointer arithmetic. llvm-svn: 268293
* Fix instance of -Winconsistent-missing-override in AMDGPU codeReid Kleckner2016-05-021-1/+1
| | | | llvm-svn: 268289
* AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratchTom Stellard2016-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287
* ARM: fix handling of SUB immediates in peephole opt.Tim Northover2016-05-021-12/+30
| | | | | | | | | | | We were negating an immediate that was going to be used in a SUBri form unnecessarily. Since ADD/SUB are very similar we *can* do that, but we have to change the SUB to an ADD at the same time. This also applies to ADD, and allows us to handle a slightly larger range of immediates for those two operations. rdar://25992245 llvm-svn: 268276
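The arithmetic behind this fix can be sketched in a few lines. This is an illustrative model only, not the peephole code from the commit: `Opcode`, `ImmOp`, `flipNegativeImm` and `apply` are hypothetical names, and 32-bit wrapping arithmetic is assumed. The point it shows is that negating the immediate preserves the result only if the opcode is flipped at the same time.

```cpp
#include <cstdint>

// Hypothetical model of an add/sub-with-immediate instruction.
enum class Opcode { Add, Sub };

struct ImmOp {
  Opcode op;
  std::int32_t imm;
};

// If the immediate is negative, negate it AND swap ADD <-> SUB.
// INT32_MIN is left alone because its negation does not fit in int32_t.
ImmOp flipNegativeImm(ImmOp in) {
  if (in.imm >= 0 || in.imm == INT32_MIN)
    return in;
  Opcode flipped = (in.op == Opcode::Add) ? Opcode::Sub : Opcode::Add;
  return ImmOp{flipped, -in.imm};
}

// Evaluates the operation with 32-bit wrapping arithmetic.
std::uint32_t apply(std::uint32_t x, ImmOp o) {
  std::uint32_t imm = static_cast<std::uint32_t>(o.imm);
  return o.op == Opcode::Add ? x + imm : x - imm;
}
```

For example, `SUB x, -5` and `ADD x, 5` evaluate to the same value under `apply`, whereas negating the immediate alone (`SUB x, 5`) would not.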
* [NVPTX] Fix sign/zero-extending ldg/ldu instruction selectionJustin Holewinski2016-05-023-48/+77
| | | | | | | | | | | | | | | | | Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272
* AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cppTom Stellard2016-05-023-39/+51
| | | | | | | | | | Reviewers: arsenm Subscribers: jvesely, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19736 llvm-svn: 268267
* AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260Tom Stellard2016-05-021-1/+2
| | | | | | | We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265
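The class of bug described here is dereferencing an iterator that may be the end sentinel. A minimal sketch using a standard container as a stand-in for a basic block follows; `Instr`, `Block` and `debugLocAt` are hypothetical names, and this is not the change made in SIInstrInfo::insertWaitStates().

```cpp
#include <list>
#include <string>

// Hypothetical stand-ins for an instruction and a basic block.
struct Instr {
  std::string debugLoc;
};
using Block = std::list<Instr>;

// Returns the debug location at `it`, or an empty location when `it`
// is the end iterator, which must never be dereferenced.
std::string debugLocAt(const Block &block, Block::const_iterator it) {
  if (it == block.end())
    return std::string();
  return it->debugLoc;
}
```

Calling `debugLocAt(block, block.end())` returns an empty string instead of invoking undefined behaviour.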
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-023-4/+72
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* AMDGPU: llvm.SI.fs.constant is a source of divergenceNicolai Haehnle2016-05-021-0/+1
| | | | | | | | | | | | | | | | Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259
* [WebAssembly] Rename memory_size intrinsic to current_memoryDerek Schuff2016-05-021-9/+9
| | | | | | This follows the recent renaming in the wasm spec. llvm-svn: 268255
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-023-55/+27
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* [X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN ↵Simon Pilgrim2016-05-024-37/+12
| | | | | | | | lowering movmsk.ll tests are unchanged. llvm-svn: 268237
* Cleanup comments. NFC.Chad Rosier2016-05-022-3/+4
| | | | llvm-svn: 268236
* Cleanup comments. NFC.Chad Rosier2016-05-021-4/+3
| | | | llvm-svn: 268235
* Silence unused variable warnings; NFC.Aaron Ballman2016-05-021-9/+4
| | | | llvm-svn: 268234
* Enable the X86 call frame optimization for the 64-bit targets that allow it.David L Kreitzer2016-05-022-16/+36
| | | | | | | | Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227
* [SystemZ] Fix in restoreCalleeSavedRegisters()Jonas Paulsson2016-05-021-1/+2
| | | | | | | | Only add operands for GRs to the LMG. Reviewed by Ulrich Weigand. llvm-svn: 268216
* [SystemZ] Mark CC defs as dead whenever possible.Jonas Paulsson2016-05-023-5/+25
| | | | | | | | | | | | Marking implicit CC defs as dead everywhere except when CC is actually defined and used explicitly is important since the post-ra scheduler will otherwise insert edges between instructions unnecessarily. Also temporarily disable LA(Y) -> AGSI optimization in foldMemoryOperandImpl(), since this introduces a def of the CC reg, which is illegal unless it is known to be dead. Reviewed by Ulrich Weigand. llvm-svn: 268215
* [X86] Fix a bug in LOCK arithmetic operation pattern matching where the ↵Craig Topper2016-05-021-1/+1
| | | | | | | | wrong immediate predicate check was being used for 64-bit instructions with 8-bit immediates. This didn't cause a bug because the order of the patterns ensured that the 64-bit instructions with 32-bit immediates were selected first. llvm-svn: 268212
* [AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While ↵Craig Topper2016-05-011-4/+4
| | | | | | there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200
* Change AVX512 broadcastsd/ss patterns interaction with spilling. New ↵Igor Breger2016-05-013-110/+98
| | | | | | | | implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (turning it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190
* [AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when ↵Craig Topper2016-05-011-3/+3
| | | | | | VLX and BWI are supported. llvm-svn: 268189
* [AVX512] Add HasVLX to the 128/256-bit versions of VPACKSSDW/USDW/SSWB/USWB ↵Craig Topper2016-05-011-13/+14
| | | | | | and VPMADDUBSW/VPMADDWD. llvm-svn: 268188
* [AVX512] Make sure 128/256-bit DQI versions of VAND/VANDN/VOR/VXOR are also ↵Craig Topper2016-05-011-16/+16
| | | | | | marked as requiring VLX. llvm-svn: 268186
* [X86] Add an AddedComplexity to another pattern to put it near similar patterns in ↵Craig Topper2016-05-011-2/+1
| | | | | | the output file. llvm-svn: 268184
* [X86] Remove a seemingly unused pattern. The same pattern appears elsewhere ↵Craig Topper2016-05-011-2/+0
| | | | | | with an AddedComplexity that made this unreachable. llvm-svn: 268183
* [X86] Add AddedComplexity to keep some similar patterns near each other in ↵Craig Topper2016-05-011-0/+1
| | | | | | the output file. llvm-svn: 268181
* [X86] Remove some redundant selection patterns.Craig Topper2016-05-012-11/+0
| | | | llvm-svn: 268180
* [AVX512] Replace vector_extract with extractelt in some patterns. They mean ↵Craig Topper2016-05-011-5/+5
| | | | | | the same thing but vector_extract is deprecated. NFC llvm-svn: 268179
* [AVX512] Add hasSideEffects/mayLoad/mayStore flags to some instructions.Craig Topper2016-05-011-4/+7
| | | | llvm-svn: 268174
* [X86] Reduce memory usage of MemOp2RegOp and RegOp2MemOp folding maps.Craig Topper2016-04-302-13/+9
| | | | llvm-svn: 268164
* Add missing override.Rafael Espindola2016-04-301-1/+2
| | | | llvm-svn: 268163
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-301-6/+0
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
* [PowerPC/QPX] Fix the load/splat peephole with overlapping readsHal Finkel2016-04-301-1/+9
| | | | | | | | | | | If, in between the splat and the load (which does an implicit splat), there is a read of the splat register, then that register must have another earlier definition. In that case, we can't replace the load's destination register with the splat's destination register. Unfortunately, I don't have a small or non-fragile test case. llvm-svn: 268152
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-309-18/+324
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* AMDGPU: Fix crash with unreachable terminators.Matt Arsenault2016-04-291-12/+27
| | | | | | | | | | If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
* Differential Revision: http://reviews.llvm.org/D19733Sriraman Tallam2016-04-293-4/+3
| | | | llvm-svn: 268106
* AMDGPU: Add kernarg.segment.ptr intrinsicMatt Arsenault2016-04-291-0/+5
| | | | llvm-svn: 268105
* AMDGPU/SI: Move post regalloc run of SIShrinkInstructionsMatt Arsenault2016-04-291-5/+1
| | | | | | | | Move to addPreEmitPass. This is so it runs after post-RA scheduling so we can merge s_nops emitted by the scheduler and hazard recognizer. llvm-svn: 268095
* Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. ↵Artem Tamazov2016-04-296-19/+39
| | | | | | | | | | | Rework M0 exclusion for SMRD." Previously reverted by r267752. r267733 review: Differential Revision: http://reviews.llvm.org/D19342 llvm-svn: 268066
* [PPC] Enable shuffling of VSX vectorsGuozhi Wei2016-04-291-4/+2
| | | | | | This patch fixes PR27078 by enabling shuffling of vectors if VSX is available. llvm-svn: 268064
* [mips][ias] Move createCpRestoreMemOp to MipsTargetStreamer. NFC.Daniel Sanders2016-04-293-38/+57
| | | | | | | | | | | | | Summary: This removes the temporary call to isIntegratedAssemblerRequired() which was added recently. Its effect is now achieved directly in the MipsTargetStreamer hierarchy. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D19715 llvm-svn: 268058
* Fix NDEBUG build: variables used only in debug code causing compile errorKrzysztof Parzyszek2016-04-291-4/+8
| | | | llvm-svn: 268057
* [mips][FastISel] A store is not a load.Simon Dardis2016-04-291-1/+1
| | | | | | | | | | Correct trivial error. One of the failing tests from PR/27458. Reviewers: dsanders, vkalintiris, mcrosier Differential Review: http://reviews.llvm.org/D19726 llvm-svn: 268053