summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [MachineBlockPlacement] Let the target optimize the branches at the end.Quentin Colombet2016-05-021-4/+4
| | | | | | | | | | | | | | | After the layout of the basic blocks is set, the target may be able to get rid of unconditional branches to fallthrough blocks that the generic code does not catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to analyze all the branches involved in the terminators sequence, while still understanding a few of them. In such situation, AnalyzeBranch can directly modify the branches if it has been instructed to do so. This patch takes advantage of that. llvm-svn: 268328
* [X86] Model FAULTING_LOAD_OP as a terminator and branch.Quentin Colombet2016-05-021-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and moreover, other optimizations could have done wrong transformation! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2 bytes offset in implicit-null-check.ll comes from the fact the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequence commit. llvm-svn: 268327
* AMDGPU: Custom lower v2i32 loads and storesMatt Arsenault2016-05-021-1/+98
| | | | | | | This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratchTom Stellard2016-05-021-1/+1
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-024-18/+14
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* [X86][AVX2] Added 128-bit wide shuffle testSimon Pilgrim2016-05-021-0/+14
| | | | | | Demonstrate missing 128-bit wide shuffle combine support llvm-svn: 268290
* ARM: fix handling of SUB immediates in peephole opt.Tim Northover2016-05-022-0/+119
| | | | | | | | | | | We were negating an immediate that was going to be used in a SUBri form unnecessarily. Since ADD/SUB are very similar we *can* do that, but we have to change the SUB to an ADD at the same time. This also applies to ADD, and allows us to handle a slightly larger range of immediates for those two operations. rdar://25992245 llvm-svn: 268276
* [NVPTX] Fix sign/zero-extending ldg/ldu instruction selectionJustin Holewinski2016-05-021-0/+34
| | | | | | | | | | | | | | | | | Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-023-39/+37
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* [WebAssembly] Rename memory_size intrinsic to current_memoryDerek Schuff2016-05-022-10/+10
| | | | | | This follows the recent renaming in the wasm spec. llvm-svn: 268255
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-021-2/+6
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* Enable the X86 call frame optimization for the 64-bit targets that allow it.David L Kreitzer2016-05-021-0/+193
| | | | | | | | Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227
* [SystemZ] Temporarily disable codegen test int-add-12.ll.Jonas Paulsson2016-05-021-1/+1
| | | | | | This checks for AGSI transformation, which is temporarily disabled. llvm-svn: 268219
* [AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While ↵Craig Topper2016-05-011-12/+12
| | | | | | there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200
* getelementptr instruction, support index vector of EVT.Igor Breger2016-05-011-0/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D19775 llvm-svn: 268195
* Change AVX512 braodcastsd/ss patterns interaction with spilling . New ↵Igor Breger2016-05-013-0/+231
| | | | | | | | implementation take a scalar register and generate a vector without COPY_TO_REGCLASS (turn it into a VR128 register ) .The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190
* [AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when ↵Craig Topper2016-05-011-8/+8
| | | | | | VLX and BWI are supported. llvm-svn: 268189
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-302-2/+4
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-3026-99/+102
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.Haicheng Wu2016-04-292-0/+70
| | | | | | Fix a FIXME. Disable loop alignment if compiled with -Oz now. llvm-svn: 268121
* AMDGPU: Fix crash with unreachable terminators.Matt Arsenault2016-04-291-0/+56
| | | | | | | | | | If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
* Differential Revision: http://reviews.llvm.org/D19733Sriraman Tallam2016-04-294-12/+29
| | | | llvm-svn: 268106
* AMDGPU: Add kernarg.segment.ptr intrinsicMatt Arsenault2016-04-291-0/+21
| | | | llvm-svn: 268105
* DAGCombiner: Reduce truncated shl widthMatt Arsenault2016-04-2920-278/+414
| | | | llvm-svn: 268094
* [PPC] Enable shuffling of VSX vectorsGuozhi Wei2016-04-291-0/+15
| | | | | | This patch fixes PR27078 by enabling shuffling of vectors if VSX is available. llvm-svn: 268064
* [mips][FastISel] A store is not a load.Simon Dardis2016-04-291-1/+1
| | | | | | | | | | Correct trivial error. One of the failing tests from PR/27458. Reviewers: dsanders, vkalintiris, mcrosier Differential Review: http://reviews.llvm.org/D19726 llvm-svn: 268053
* [Hexagon] Optimize addressing modes for load/storeKrzysztof Parzyszek2016-04-294-12/+124
| | | | | | Patch by Jyotsna Verma. llvm-svn: 268051
* AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructionsTom Stellard2016-04-292-1/+21
| | | | | | | | | | | | | | Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
* AMDGPU/SI: Assembler: Unify parsing/printing of operands.Nikolay Haustov2016-04-2927-210/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015
* RegisterPressure: Fix default lanemask for missing regunit intervalsMatthias Braun2016-04-292-4/+4
| | | | | | | | | | | | | | In case of missing live intervals for a physical registers getLanesWithProperty() would report 0 which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also cleanup the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977
* [PowerPC] Fix the EH_SjLj_Setup pseudo.Marcin Koscielnicki2016-04-281-0/+48
| | | | | | | | | | | | | | | | | | | This instruction is just a control flow marker - it should not actually exist in the object file. Unfortunately, nothing catches it before it gets to AsmPrinter. If integrated assembler is used, it's considered to be a normal 4-byte instruction, and emitted as an all-0 word, crashing the program. With external assembler, a comment is emitted. Fixed by setting Size to 0 and handling it in MCCodeEmitter - this means the comment will still be emitted if integrated assembler is not used. This broke an ASan test, which has been disabled for a long time as a result (see the discussion on D19657). We can reenable it once this lands. llvm-svn: 267943
* [RDF] Improve handling of inline-asmKrzysztof Parzyszek2016-04-282-0/+73
| | | | | | | - Keep implicit defs from inline-asm instructions. - Treat register references from inline-asm as fixed. llvm-svn: 267936
* AMDGPU: Emit error if too much LDS is usedMatt Arsenault2016-04-282-3/+17
| | | | llvm-svn: 267922
* Reset the TopRPTracker's position in ScheduleDAGMILive::initQueuesKrzysztof Parzyszek2016-04-281-0/+151
| | | | | | | | | | | | | | | | | | | ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug instruction. Since it does not track register pressure, it does not affect any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI, and it does reset the TopTPTracker in its schedule method. Any derived, target-specific scheduler will need to do it as well, but the TopRPTracker is only exposed as a "const" object to derived classes. Without the ability to modify the tracker directly, this leaves a derived scheduler with a potential of having the TopRPTracker out-of-sync with the CurrentTop. The symptom of the problem: void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool): Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed. Differential Revision: http://reviews.llvm.org/D19438 llvm-svn: 267918
* AMDGPU: Fix mishandling array allocations when promoting allocaMatt Arsenault2016-04-283-16/+66
| | | | | | | | The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
* [mips][atomics] Fix partword atomic binary operation implementationSimon Dardis2016-04-281-67/+90
| | | | | | | | | | | | | | | Currently Mips::emitAtomicBinaryPartword() does not properly respect the width of pointers. For MIPS64 this causes the memory address that the ll/sc sequence uses to be truncated. At runtime this causes a segmentation fault. This can be fixed by applying similar changes as r266204, so that a full 64bit pointer is loaded. Reviewers: dsanders Differential Review: http://reviews.llvm.org/D19651 llvm-svn: 267900
* [RDF] Handle undefined registers in RDF copy propagationKrzysztof Parzyszek2016-04-281-0/+55
| | | | | | | When updating the graph, make sure that new uses without reaching defs are handled correctly. llvm-svn: 267891
* CodeGen: Add DetectDeadLanes pass.Matthias Braun2016-04-281-0/+408
| | | | | | | | | | | | | | | | | | | | The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851
* [SystemZ] Support Swift Calling ConventionBryan Chan2016-04-283-0/+627
| | | | | | | | | | | | | | | | Summary: Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see: RFC: Implementing the Swift calling convention in LLVM and Clang https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0 Reviewers: kbarton, manmanren, rjmccall, uweigand Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19414 llvm-svn: 267823
* [X86] Enable the post-RA-scheduler for clang's default 32-bit cpu.Mitch Bodart2016-04-271-0/+40
| | | | | | | | | For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809
* [X86][FastISel] Make sure we use the right register class when we select stores.Quentin Colombet2016-04-271-3/+3
| | | | llvm-svn: 267806
* [X86] Fix the lowering of TLS calls.Quentin Colombet2016-04-272-4/+9
| | | | | | | | | | | The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI, the pseudo uses the symbol address at this point not RDI and the lowering will do the right thing. llvm-svn: 267797
* AMDGPU: Account for globals in AMDGPUPromoteAlloca passMatt Arsenault2016-04-271-0/+32
| | | | | | Patch by Bas Nieuwenhuizen llvm-svn: 267791
* [AArch64] Set correct successors in CMPXCHG pseudo expansion.Ahmed Bougacha2016-04-271-1/+1
| | | | | | | | | | | transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267779
* [ARM] Set correct successors in CMPXCHG pseudo expansion.Ahmed Bougacha2016-04-271-3/+3
| | | | | | | | | | | | | | transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. The testcase changes are caused by Thumb2SizeReduction, which was previously confused by the broken CFG. Follow-up to r266679. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267778
* [X86]: Quit promoting 16 bit loads to 32 bit.Kevin B. Smith2016-04-271-1/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D19592 llvm-svn: 267773
* [Mips] Add support for llvm.thread.pointer intrinsic.Marcin Koscielnicki2016-04-271-0/+12
| | | | | | | | This will be used to implement __builtin_thread_pointer in clang. Differential Revision: http://reviews.llvm.org/D19569 llvm-svn: 267743
* AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsicNicolai Haehnle2016-04-271-0/+38
| | | | | | | | | | | | | | | | | Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729
* [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware ↵Artem Tamazov2016-04-271-1/+1
| | | | | | | | | | | | registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724
* Revert r267649, it caused PR27539.Nico Weber2016-04-271-973/+0
| | | | llvm-svn: 267723
OpenPOWER on IntegriCloud