path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Fix broken condition in hazard recognizer | Matt Arsenault | 2017-03-17 | 3 | -17/+25
  Fixes bug 32248.
  llvm-svn: 298125
* AMDGPU: Fix handling of constant phi input loop conditions | Matt Arsenault | 2017-03-17 | 1 | -5/+8
  If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header.
  llvm-svn: 298121
* AMDGPU: Cleanup control flow intrinsics | Matt Arsenault | 2017-03-17 | 10 | -106/+80
  Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.
  It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those.
  llvm-svn: 298119
* Only unswitch loops with uniform conditions | Stanislav Mekhanoshin | 2017-03-17 | 1 | -0/+2
  Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent.
  Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available, so we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior, and this field is set from a target.
  Differential Revision: https://reviews.llvm.org/D30796
  llvm-svn: 298104
* [AMDGPU] Run always inliner early in opt | Stanislav Mekhanoshin | 2017-03-16 | 1 | -0/+1
  We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise.
  Differential Revision: https://reviews.llvm.org/D31016
  llvm-svn: 297958
* AMDGPU: Allow sinking of addressing modes for atomic_inc/dec | Matt Arsenault | 2017-03-15 | 2 | -7/+28
  llvm-svn: 297913
* AMDGPU: Fix unnecessary ands when packing f16 vectors | Matt Arsenault | 2017-03-15 | 6 | -6/+25
  computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0.
  llvm-svn: 297873
* AMDGPU: Minor SIAnnotateControlFlow cleanups | Matt Arsenault | 2017-03-15 | 1 | -31/+35
  Newline fixes, early return, range loops.
  llvm-svn: 297865
* Cyle -> Cycle; NFCI | Sanjay Patel | 2017-03-15 | 2 | -4/+4
  llvm-svn: 297846
* Reverted unintended commit | Simon Pilgrim | 2017-03-15 | 1 | -2/+2
  llvm-svn: 297841
* Fix Wint-in-bool-context warning (PR32248) | Simon Pilgrim | 2017-03-15 | 1 | -2/+2
  llvm-svn: 297840
* AMDGPU: Re-use TM.getNullPointerValue | Matt Arsenault | 2017-03-13 | 1 | -10/+8
  llvm-svn: 297662
* AMDGPU: Treat 0 as private null pointer in addrspacecast lowering | Matt Arsenault | 2017-03-13 | 2 | -8/+14
  llvm-svn: 297658
* AMDGPU: Remove packf16 intrinsic | Matt Arsenault | 2017-03-11 | 2 | -7/+0
  llvm-svn: 297557
* AMDGPU: Keep track of modifiers when converting v_mac to v_mad | Matt Arsenault | 2017-03-11 | 1 | -4/+10
  Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad.
  This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion.
  Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
  Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
  llvm-svn: 297556
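  The kind of conversion described above can be illustrated with a small sketch. The types below (`Modifiers`, `Inst`, `convertMacToMad`) are hypothetical stand-ins invented for this example, not the backend's MachineInstr API; the `OMod` field is likewise an assumption. The only point is that the modifier state has to be copied to the new instruction rather than reset.

```cpp
// Hypothetical instruction model; field and function names are
// illustrative assumptions, not the AMDGPU backend's real types.
struct Modifiers {
  bool Clamp = false;  // clamp bit folded from a v_max
  unsigned OMod = 0;   // assumed output-modifier field
};

struct Inst {
  enum Opcode { V_MAC_F32, V_MAD_F32 } Op = V_MAC_F32;
  Modifiers Mods;
};

// Build the v_mad replacement and carry the modifiers over, so a clamp
// bit set on the v_mac is still set after the conversion.
Inst convertMacToMad(const Inst &Mac) {
  Inst Mad;
  Mad.Op = Inst::V_MAD_F32;
  Mad.Mods = Mac.Mods;  // dropping this copy would lose the clamp bit
  return Mad;
}
```

  In this model, leaving out the `Mad.Mods = Mac.Mods` line reproduces the class of bug the commit describes: the converted instruction silently loses its clamp.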
* [AMDGPU] Remove getBidirectionalReasonRank | Stanislav Mekhanoshin | 2017-03-11 | 1 | -13/+1
  This method inverts the Reason field of a scheduling candidate. It compares RegCritical and RegExcess correctly, but everything else is broken. In fact it can prefer a weaker reason such as Weak over RegCritical because Weak > -RegCritical.
  The CandReason enum is properly sorted, so just remove the artificial ranking.
  Differential Revision: https://reviews.llvm.org/D30557
  llvm-svn: 297536
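  A minimal sketch of the ranking problem, under stated assumptions: the enum below is a shortened, hypothetical stand-in for CandReason (sorted strongest first), `reasonRank` only approximates the shape of the removed helper, and the "higher rank wins" comparison is an assumption made to match the commit message.

```cpp
#include <cstdint>

// Hypothetical, shortened stand-in for the scheduler's reason enum.
// Assumption: sorted strongest first, so a smaller value is stronger.
enum CandReason : std::uint8_t { NoCand, RegExcess, RegCritical, Stall, Weak };

// Artificial ranking in the spirit of the removed helper (assumed shape):
// the two register-pressure reasons are negated, the rest are unchanged.
int reasonRank(CandReason R) {
  switch (R) {
  case RegExcess:
  case RegCritical:
    return -static_cast<int>(R);
  default:
    return static_cast<int>(R);
  }
}

// Assumed comparison: the candidate with the higher rank wins.
CandReason pickByRank(CandReason A, CandReason B) {
  return reasonRank(A) > reasonRank(B) ? A : B;
}

// Using the sorted enum directly: the smaller (stronger) reason wins.
CandReason pickByEnum(CandReason A, CandReason B) { return A < B ? A : B; }
```

  With these definitions `pickByRank(RegExcess, RegCritical)` still picks RegExcess, but `pickByRank(Weak, RegCritical)` picks Weak because its rank is positive while RegCritical's is negative. `pickByEnum(Weak, RegCritical)` picks RegCritical.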
* [AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets for SI | Konstantin Zhuravlyov | 2017-03-10 | 6 | -39/+48
  Differential Revision: https://reviews.llvm.org/D29674
  llvm-svn: 297499
* Rename PT_NOTE namespace name used in AMDGPUPTNote.h | Yaxun Liu | 2017-03-10 | 3 | -10/+11
  Patch by Guansong Zhang.
  Differential Revision: https://reviews.llvm.org/D30750
  llvm-svn: 297498
* AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. | Changpeng Fang | 2017-03-09 | 1 | -0/+4
  Reviewers: arsenm
  Differential Revision: http://reviews.llvm.org/D30719
  llvm-svn: 297328
* AMDGPU: Don't wait at end of block with a trivial successor | Matt Arsenault | 2017-03-08 | 1 | -2/+14
  If there is only one successor, and that successor only has one predecessor, the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization.
  llvm-svn: 297251
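  The condition in this message can be written as a small predicate. The `Block` struct and `canDeferWaitAtEnd` below are hypothetical names for illustration and do not reflect the backend's MachineBasicBlock API; this is only a sketch of the "single successor with a single predecessor" test.

```cpp
#include <vector>

// Hypothetical, minimal CFG node; not the backend's MachineBasicBlock.
struct Block {
  std::vector<const Block *> Preds;
  std::vector<const Block *> Succs;
};

// True when the wait at the end of B may be deferred: B has exactly one
// successor, and that successor has exactly one predecessor.
bool canDeferWaitAtEnd(const Block &B) {
  if (B.Succs.size() != 1)
    return false;
  const Block *Succ = B.Succs.front();
  return Succ->Preds.size() == 1;
}
```

  A fallthrough block inserted for structurization typically fits this shape, which is why the wait can move into it without affecting any other path.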
* AMDGPU: Constant fold rcp node | Matt Arsenault | 2017-03-08 | 1 | -2/+12
  When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs.
  llvm-svn: 297248
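  As a rough illustration of the fold, assuming C++17 and a hypothetical helper `foldRcp` that is not part of the SelectionDAG API: an rcp whose input is a known constant C can be replaced by the constant 1.0 / C, while a non-constant input leaves the node alone. Whether the hardware rcp result matches an exact division bit for bit is outside what this sketch models.

```cpp
#include <optional>

// Hypothetical node model: the rcp input is either a known f32 constant
// or unknown (std::nullopt). Not the SelectionDAG API.
std::optional<float> foldRcp(std::optional<float> ConstInput) {
  if (!ConstInput)
    return std::nullopt;      // input not constant: keep the rcp node
  return 1.0f / *ConstInput;  // constant input: rcp(C) becomes 1.0 / C
}
```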
* AMDGPU/SI: Do not insert EndCf in an unreachable block | Changpeng Fang | 2017-03-07 | 1 | -2/+3
  Reviewers: arsenm
  Differential Revision: http://reviews.llvm.org/D22025
  llvm-svn: 297243
* Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. | Daniel Sanders | 2017-03-07 | 1 | -1/+1
  Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
  Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
  The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.
  Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
  Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D30046
  llvm-svn: 297241
* Revert r297177: Change LLT constructor string into an LLT-based object ... | Daniel Sanders | 2017-03-07 | 1 | -1/+1
  More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated.
  llvm-svn: 297188
* [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. | Daniel Sanders | 2017-03-07 | 1 | -1/+1
  Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
  Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
  Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
  Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D30046
  llvm-svn: 297177
* Revert "AMDGPU: Set MCAsmInfo::PointerSize" | Konstantin Zhuravlyov | 2017-03-07 | 1 | -1/+0
  It breaks line tables because the patch is not complete; working on a complete one at the moment.
  This reverts commit r294031.
  llvm-svn: 297118
* AMDGPU/R600: Fix ALU clause markers use detection | Jan Vesely | 2017-03-06 | 1 | -2/+5
  also exit early on kill instead of redefinition.
  Differential Revision: https://reviews.llvm.org/D30230
  llvm-svn: 297060
* Make TargetInstrInfo::isPredicable take a const reference, NFC | Krzysztof Parzyszek | 2017-03-03 | 2 | -3/+3
  llvm-svn: 296901
* [AMDGPU][MC] Fix for Bug 30829 + LIT tests | Dmitry Preobrazhensky | 2017-03-03 | 7 | -0/+163
  Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction). Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code). Added LIT tests.
  llvm-svn: 296873
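  The restriction in parentheses can be sketched as a simple operand scan. Everything below is a hypothetical model for illustration (`Kind`, `Operand`, `satisfiesConstantBusLimit`), not the MC layer's types. It assumes that each distinct SGPR and each literal constant counts as one constant bus use, and that repeating the same SGPR counts once.

```cpp
#include <set>
#include <vector>

// Hypothetical operand model for illustration; not the MC layer's types.
enum class Kind { VGPR, SGPR, InlineConstant, Literal };

struct Operand {
  Kind K;
  unsigned Reg;  // register number; only meaningful for VGPR/SGPR
};

// Assumed rule: each distinct SGPR and each literal constant occupies
// the constant bus, and at most one such use is allowed per instruction.
bool satisfiesConstantBusLimit(const std::vector<Operand> &Ops) {
  std::set<unsigned> SGPRs;
  unsigned Literals = 0;
  for (const Operand &Op : Ops) {
    if (Op.K == Kind::SGPR)
      SGPRs.insert(Op.Reg);
    else if (Op.K == Kind::Literal)
      ++Literals;
  }
  return SGPRs.size() + Literals <= 1;
}
```

  Under this model an instruction reading two different SGPRs, or one SGPR plus a literal, is rejected, while one that reads the same SGPR in two operand slots is accepted.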
* AMDGPU: Fix missing dominator tree dependency | Matt Arsenault | 2017-03-02 | 1 | -0/+1
  llvm-svn: 296842
* AMDGPU: Fix types for VOP_I16_I16_I16 | Matt Arsenault | 2017-02-28 | 1 | -1/+1
  llvm-svn: 296523
* AMDGPU: Add definition for v_swap_b32 | Matt Arsenault | 2017-02-28 | 1 | -4/+31
  This is somewhat tricky because there are two pairs of tied operands, and it isn't allowed to be VOP3 encoded.
  llvm-svn: 296519
* AMDGPU: Add definition for v_xad_u32 | Matt Arsenault | 2017-02-28 | 1 | -0/+2
  llvm-svn: 296515
* AMDGPU: Add ds_nop to assembler | Matt Arsenault | 2017-02-28 | 1 | -1/+21
  llvm-svn: 296513
* AMDGPU: Add definitions for ds_{read|write}_b{96|128} | Matt Arsenault | 2017-02-28 | 1 | -4/+19
  It's not clear to me if this is always better than doing ds_write2_b64.
  This adds the constraint of a 128-bit register input instead of a pair of 64-bit.
  llvm-svn: 296512
* [AMDGPU] Add second pass of the scheduler | Stanislav Mekhanoshin | 2017-02-28 | 2 | -7/+126
  If during scheduling we have identified that we cannot keep the optimistic occupancy, increase the critical register pressure limit and try scheduling the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling.
  Differential Revision: https://reviews.llvm.org/D30442
  llvm-svn: 296506
* [AMDGPU] New method to estimate register pressure | Stanislav Mekhanoshin | 2017-02-28 | 2 | -21/+150
  This change introduces a new method to estimate register pressure in GCNScheduler. Standard RPTracker gives a huge error due to the following reasons:
  1. It does not account for live-ins or live-outs if the value is not used in the region itself. That creates a huge error in the very common case where there are a lot of live-through registers.
  2. It does not properly count subregs.
  3. It assumes a register used as an input operand can be reused as an output. This is not always possible by itself, this is not what RA will finally do in many cases for various reasons not limited to RA's inability to do so, and this is not so if the value is actually a live-through.
  In addition we can now see a clear separation between live-in pressure, which we cannot change with the scheduling, and tentative pressure, which we can change.
  Differential Revision: https://reviews.llvm.org/D30439
  llvm-svn: 296491
* [AMDGPU] Change amd_kernel_code_t's minor version to 1 | Konstantin Zhuravlyov | 2017-02-28 | 1 | -1/+1
  - We do emit amd_kernel_code_t v1.1
  Differential Revision: https://reviews.llvm.org/D30433
  llvm-svn: 296489
* [AMDGPU] Fix read-undef flags when schedule is reverted | Stanislav Mekhanoshin | 2017-02-28 | 1 | -12/+15
  If two subregs of the same register are defined and we need to revert schedule changing def order, we will end up with both instructions having def,read-undef flags because adjustLaneLiveness() will only set this flag but will not remove it. Fix this by removing read-undef flags before calling adjustLaneLiveness.
  Differential Revision: https://reviews.llvm.org/D30428
  llvm-svn: 296484
* Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. | Daniel Sanders | 2017-02-28 | 1 | -1/+1
  There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.
  llvm-svn: 296478
* [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. | Daniel Sanders | 2017-02-28 | 1 | -1/+1
  Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
  Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
  Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
  Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D30046
  llvm-svn: 296474
* AMDGPU: Use v_med3_{f16|i16|u16} | Matt Arsenault | 2017-02-27 | 7 | -33/+52
  llvm-svn: 296401
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 11 | -63/+378
  llvm-svn: 296396
* AMDGPU: Add some of the new gfx9 VOP3 instructions | Matt Arsenault | 2017-02-27 | 1 | -0/+12
  llvm-svn: 296382
* AMDGPU: Support inlineasm for packed instructions | Matt Arsenault | 2017-02-27 | 1 | -1/+42
  Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now.
  llvm-svn: 296379
* AMDGPU: Don't fold immediate if clamp/omod are set | Matt Arsenault | 2017-02-27 | 2 | -8/+13
  Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer.
  llvm-svn: 296375
* AMDGPU: Fold omod into instructions | Matt Arsenault | 2017-02-27 | 3 | -6/+146
  llvm-svn: 296372
* AMDGPU: Add f16 to shader calling conventions | Matt Arsenault | 2017-02-27 | 1 | -3/+3
  Mostly useful for writing tests for f16 features.
  llvm-svn: 296370
* AMDGPU: Add VOP3P instruction format | Matt Arsenault | 2017-02-27 | 23 | -86/+879
  Add a few instructions that are not VOP3P but are related to packed operations.
  Includes a hack with dummy operands for the benefit of the assembler.
  llvm-svn: 296368
* [AMDGPU] Runtime metadata fixes: | Konstantin Zhuravlyov | 2017-02-27 | 5 | -32/+79
  - Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
    .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
  - Make IsaInfo optional, and always emit it.
  Differential Revision: https://reviews.llvm.org/D30349
  llvm-svn: 296324