path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [AMDGPU] Refactor SOP instructions TD files. | Valery Pykhtin | 2016-08-30 | 4 | -914/+1105
  Differential revision: https://reviews.llvm.org/D23617
  llvm-svn: 280101
* SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable] | NAKAMURA Takumi | 2016-08-30 | 1 | -0/+1
  llvm-svn: 280075
* Replace incorrect "#ifdef DEBUG" with "#ifndef NDEBUG". | James Y Knight | 2016-08-30 | 1 | -11/+15
  The former is simply wrong -- the code will either never be used or will always be used, rather than being dependent upon whether it's built with debug assertions enabled.

  The macro DEBUG isn't ever set by the llvm build system. But, the macro DEBUG(X) is defined (unconditionally) if you happen to include llvm/Support/Debug.h.

  The code in Value.h which was erroneously protected by the #ifdef DEBUG didn't even compile -- you can't cast<> from an LLVMOpaqueValue directly. Fortunately, it was never invoked, as Core.cpp included Value.h before Debug.h.

  The conditionalized code in AArch64CollectLOH.cpp was previously always used, as it includes Debug.h.

  llvm-svn: 280056
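The difference between the two guards can be reproduced in a few lines. The following is a minimal sketch, not code from the commit: the `DEBUG(X)` definition below is only a stand-in for what including llvm/Support/Debug.h did at the time, and the constant and function names are hypothetical. It assumes the translation unit is not compiled with `-DDEBUG`.

```cpp
// Before anything defines DEBUG, "#ifdef DEBUG" selects the #else branch.
#ifdef DEBUG
constexpr bool DebugSeenBeforeHeader = true;
#else
constexpr bool DebugSeenBeforeHeader = false;
#endif

// Stand-in (assumption) for the function-like macro that Debug.h defined
// unconditionally. The body here is illustrative only.
#define DEBUG(X) do { X; } while (false)

// After that definition, "#ifdef DEBUG" is true in every build mode,
// because #ifdef only asks whether the name is defined at all.
#ifdef DEBUG
constexpr bool DebugSeenAfterHeader = true;
#else
constexpr bool DebugSeenAfterHeader = false;
#endif

// NDEBUG is the standard macro that disables assert(), so this guard
// actually tracks whether assertions are enabled in the current build.
inline bool assertionsEnabled() {
#ifndef NDEBUG
  return true;
#else
  return false;
#endif
}
```

Code guarded by `#ifdef DEBUG` therefore flips between "never compiled" and "always compiled" depending on include order, while `#ifndef NDEBUG` depends only on the build configuration.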
* [PowerPC] Force entry alignment in .got2 | Hal Finkel | 2016-08-30 | 1 | -2/+4
  Implement Bill's suggested fix for 32-bit targets for PR22711 (for the alignment of each entry). As pointed out in the bug report, we could just force the section alignment, since we only add pointer-sized things currently, but this fix is somewhat more future-proof.
  llvm-svn: 280049
* [PowerPC] Add support for -mlongcall | Hal Finkel | 2016-08-30 | 4 | -1/+15
  The "long call" option forces the use of the indirect calling sequence for all calls (even those that don't really need it). GCC provides this option; it is helpful, under certain circumstances, for building very large binaries, and for some other specialized use cases.
  Fixes PR19098.
  llvm-svn: 280040
* ADT: Give ilist<T>::reverse_iterator a handle to the current node | Duncan P. N. Exon Smith | 2016-08-30 | 2 | -7/+7
  Reverse iterators to doubly-linked lists can be simpler (and cheaper) than std::reverse_iterator. Make it so.

  In particular, change ilist<T>::reverse_iterator so that it is *never* invalidated unless the node it references is deleted. This matches the guarantees of ilist<T>::iterator.

  (Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a MachineInstrBundleIterator<MachineInstr>. This commit does not change MachineBasicBlock::reverse_iterator, but it does update MachineBasicBlock::reverse_instr_iterator. See note at end of commit message for details on bundle iterators.)

  Given the list (with the Sentinel showing twice for simplicity):

      [Sentinel] <-> A <-> B <-> [Sentinel]

  the following is now true:
  1. begin() represents A.
  2. begin() holds the pointer for A.
  3. end() represents [Sentinel].
  4. end() holds the pointer for [Sentinel].
  5. rbegin() represents B.
  6. rbegin() holds the pointer for B.
  7. rend() represents [Sentinel].
  8. rend() holds the pointer for [Sentinel].

  The changes are #6 and #8. Here are some properties from the old scheme (which used std::reverse_iterator):
  - rbegin() held the pointer for [Sentinel] and rend() held the pointer for A;
  - operator*() cost two dereferences instead of one;
  - converting from a valid iterator to its valid reverse_iterator involved a confusing increment; and
  - "RI++->erase()" left RI invalid. The unintuitive replacement was "RI->erase(), RE = end()".

  With vector-like data structures these properties are hard to avoid (since past-the-beginning is not a valid pointer), and don't impose a real cost (since there's still only one dereference, and all iterators are invalidated on erase). But with lists, this was a poor design.

  Specifically, the following code (which obviously works with normal iterators) now works with ilist::reverse_iterator as well:

      for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
        fooThatMightRemoveArgFromList(*RI++);

  Converting between iterator and reverse_iterator for the same node uses the getReverse() function.

      reverse_iterator iterator::getReverse();
      iterator reverse_iterator::getReverse();

  Why doesn't iterator <=> reverse_iterator conversion use constructors?

  In order to catch and update old code, reverse_iterator does not even have an explicit conversion from iterator. It wouldn't be safe because there would be no reasonable way to catch all the bugs from the changed semantic (see the changes at call sites that are part of this patch).

  Old code used this API:

      std::reverse_iterator::reverse_iterator(iterator);
      iterator std::reverse_iterator::base();

  Here's how to update from old code to new (that incorporates the semantic change), assuming I is an ilist<>::iterator and RI is an ilist<>::reverse_iterator:

      [Old]                      ==>  [New]
      reverse_iterator(I)             (--I).getReverse()
      reverse_iterator(I)             ++I.getReverse()
      --reverse_iterator(I)           I.getReverse()
      reverse_iterator(++I)           I.getReverse()
      RI.base()                       (--RI).getReverse()
      RI.base()                       ++RI.getReverse()
      --RI.base()                     RI.getReverse()
      (++RI).base()                   RI.getReverse()
      delete &*RI, RE = end()         delete &*RI++
      RI->erase(), RE = end()         RI++->erase()

  =======================================
  Note: bundle iterators are out of scope
  =======================================

  MachineBasicBlock::iterator, also known as MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent MachineInstr bundles. The idea is that each operator++ takes you to the beginning of the next bundle. Implementing a sane reverse iterator for this is harder than ilist. Here are the options:
  - Use std::reverse_iterator<MBB::i>. Store a handle to the beginning of the next bundle. A call to operator*() runs a loop (usually operator--() will be called 1 time, for unbundled instructions). Increment/decrement just works. This is the status quo.
  - Store a handle to the final node in the bundle. A call to operator*() still runs a loop, but it iterates one time fewer (usually operator--() will be called 0 times, for unbundled instructions). Increment/decrement just works.
  - Make the ilist_sentinel<MachineInstr> *always* store that it's the sentinel (instead of just in asserts mode). Then the bundle iterator can sniff the sentinel bit in operator++().

  I initially tried implementing the end() option as part of this commit, but updating iterator/reverse_iterator conversion call sites was error-prone. I have a WIP series of patches that implements the final option.

  llvm-svn: 280032
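The invalidation guarantee described in this commit message can be illustrated with a toy list. This is a sketch under stated assumptions, not LLVM's ilist: `Node`, `TinyList`, `getNodePtr()` and `eraseOddsInReverse()` are hypothetical names invented for the example. The only point is that a reverse iterator which stores the node it represents survives erasure of any other node, so the post-increment idiom works.

```cpp
#include <vector>

// Hypothetical node of a circular doubly-linked list. A default-constructed
// node links to itself, which is how the empty sentinel starts out.
struct Node {
  int Value = 0;
  Node *Prev = this;
  Node *Next = this;
};

class TinyList {
  Node Sentinel;

public:
  // Stores the node it represents: rbegin() holds the last node and
  // rend() holds the sentinel, so operator*() is a single dereference.
  class reverse_iterator {
    Node *N;

  public:
    explicit reverse_iterator(Node *N) : N(N) {}
    int &operator*() const { return N->Value; }
    Node *getNodePtr() const { return N; }
    reverse_iterator &operator++() { N = N->Prev; return *this; }
    reverse_iterator operator++(int) {
      reverse_iterator Old = *this;
      N = N->Prev;
      return Old;
    }
    bool operator!=(const reverse_iterator &O) const { return N != O.N; }
  };

  TinyList() = default;
  TinyList(const TinyList &) = delete; // nodes point at our sentinel
  TinyList &operator=(const TinyList &) = delete;
  ~TinyList() {
    while (Sentinel.Next != &Sentinel)
      erase(Sentinel.Next);
  }

  void push_back(int V) {
    Node *New = new Node;
    New->Value = V;
    New->Prev = Sentinel.Prev;
    New->Next = &Sentinel;
    Sentinel.Prev->Next = New;
    Sentinel.Prev = New;
  }

  // Unlinks and frees N. Iterators to other nodes stay valid.
  void erase(Node *N) {
    N->Prev->Next = N->Next;
    N->Next->Prev = N->Prev;
    delete N;
  }

  reverse_iterator rbegin() { return reverse_iterator(Sentinel.Prev); }
  reverse_iterator rend() { return reverse_iterator(&Sentinel); }
};

// Walks back to front and erases odd values. The post-increment moves RI
// to the previous node before the current node is erased.
std::vector<int> eraseOddsInReverse(TinyList &L) {
  std::vector<int> Visited;
  for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;) {
    Node *Cur = (RI++).getNodePtr();
    Visited.push_back(Cur->Value);
    if (Cur->Value % 2 != 0)
      L.erase(Cur);
  }
  return Visited;
}
```

With a std::reverse_iterator over the same list, the stored pointer would be one node ahead of the represented node, which is why erasing the represented node's neighbor could leave the iterator dangling.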
* AMDGPU/R600: Cleanup DAGCombine | Jan Vesely | 2016-08-29 | 1 | -15/+12
  Move SDLoc initialization to a common place. Fall back to the AMDGPU version in one place.
  Differential Revision: https://reviews.llvm.org/D23900
  llvm-svn: 280030
* Fix typo in comment. NFC. | Michael Kuperstein | 2016-08-29 | 1 | -1/+1
  llvm-svn: 280025
* [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics | Hal Finkel | 2016-08-29 | 1 | -6/+12
  For little-Endian PowerPC, we generally target only P8 and later by default. However, generic (older) 64-bit configurations are still an option, and in that case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics without true i8/i16 atomic operations, we emulate using i32 atomics in combination with a bunch of shifting and masking, etc. The amount by which to shift in little-Endian mode is different from the amount in big-Endian mode (it is inverted -- meaning we can leave off the xor when computing the amount).
  Fixes PR22923.
  llvm-svn: 280022
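The endian-dependent shift can be written out as plain arithmetic. The sketch below is an illustration, not the code from the patch: `partwordShift` and `partwordMask` are hypothetical helpers, the xor constant `32 - WidthBits` is derived here from byte ordering rather than quoted from the commit, and 16-bit values are assumed to be naturally aligned.

```cpp
#include <cstdint>

// Bit position of an 8- or 16-bit value inside the aligned 32-bit word
// that contains it.
//  - Little-endian: byte k of the word sits k*8 bits above the LSB.
//  - Big-endian: the order is inverted, which for these widths is the
//    same as xor-ing the little-endian amount with (32 - WidthBits).
unsigned partwordShift(std::uintptr_t Addr, unsigned WidthBits,
                       bool LittleEndian) {
  unsigned ByteShift = static_cast<unsigned>(Addr & 3u) * 8;
  if (LittleEndian)
    return ByteShift; // no xor needed
  return ByteShift ^ (32 - WidthBits);
}

// Mask selecting the narrow value inside the 32-bit word, as needed by a
// word-sized atomic loop that must leave the neighboring bytes untouched.
std::uint32_t partwordMask(std::uintptr_t Addr, unsigned WidthBits,
                           bool LittleEndian) {
  std::uint32_t Low = WidthBits == 8 ? 0xFFu : 0xFFFFu;
  return Low << partwordShift(Addr, WidthBits, LittleEndian);
}
```

For example, a byte at word offset 1 is shifted by 8 bits on a little-endian target and by 16 bits on a big-endian one.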
* AMDGPU/R600: Remove MergeVectorStores from legalization | Jan Vesely | 2016-08-29 | 3 | -65/+0
  This is handled by DAGCombiner in a more generic way.
  Differential Revision: https://reviews.llvm.org/D23970
  llvm-svn: 280019
* AMDGPU: fix mismatch tags, NFC | Saleem Abdulrasool | 2016-08-29 | 2 | -2/+2
  llvm-svn: 280006
* [Myriad]: add missing 'mcpu' values | Douglas Katzman | 2016-08-29 | 1 | -0/+3
  Should have been done with r276646.
  llvm-svn: 279996
* AMDGPU/SI: Implement a custom MachineSchedStrategy | Tom Stellard | 2016-08-29 | 9 | -1/+445
  Summary:
  GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits.

  It's not enabled by default. To enable it, you need to pass -misched=gcn to llc.

  Shader DB stats:

  32464 shaders in 17874 tests
  Totals:
  SGPRS:          1542846 -> 1643125 (6.50 %)
  VGPRS:          1005595 -> 904653 (-10.04 %)
  Spilled SGPRs:  29929 -> 27745 (-7.30 %)
  Spilled VGPRs:  334 -> 352 (5.39 %)
  Scratch VGPRs:  1612 -> 1624 (0.74 %) dwords per thread
  Code Size:      36688188 -> 37034900 (0.95 %) bytes
  LDS:            1913 -> 1913 (0.00 %) blocks
  Max Waves:      254101 -> 265125 (4.34 %)
  Wait states:    0 -> 0 (0.00 %)

  Totals from affected shaders:
  SGPRS:          1338220 -> 1438499 (7.49 %)
  VGPRS:          886221 -> 785279 (-11.39 %)
  Spilled SGPRs:  29869 -> 27685 (-7.31 %)
  Spilled VGPRs:  334 -> 352 (5.39 %)
  Scratch VGPRs:  1612 -> 1624 (0.74 %) dwords per thread
  Code Size:      34315716 -> 34662428 (1.01 %) bytes
  LDS:            1551 -> 1551 (0.00 %) blocks
  Max Waves:      188127 -> 199151 (5.86 %)
  Wait states:    0 -> 0 (0.00 %)

  Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23688
  llvm-svn: 279995
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler | Tom Stellard | 2016-08-29 | 2 | -111/+148
  Summary:
  The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge.

  Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes.

  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23814
  llvm-svn: 279991
* GlobalISel: legalize frem to a libcall on AArch64. | Tim Northover | 2016-08-29 | 3 | -2/+5
  llvm-svn: 279988
* GlobalISel: rework CallLowering so that it can be used for libcalls too. | Tim Northover | 2016-08-29 | 2 | -19/+11
  There should be no functional change here, I'm just making the implementation of "frem" (to libcall) legalization easier for a followup.
  llvm-svn: 279987
* AMDGPU/R600: Fix fixups used for constant arrays | Matt Arsenault | 2016-08-29 | 1 | -0/+1
  Fixes bug 29289
  llvm-svn: 279986
* [AArch64] Adjust the scheduling model for Exynos M1. | Evandro Menezes | 2016-08-29 | 1 | -4/+14
  Further refine the model for loads.
  llvm-svn: 279976
* AMDGPU/SI: Improve register allocation hints for sopk instructions | Tom Stellard | 2016-08-29 | 1 | -0/+1
  Summary:
  For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this sometimes does not work, depending on the order in which virtual registers are assigned physical registers.

  To fix this, I've added a second allocation hint which does the reverse: it asks that the register allocated for dst be used for src0.

  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23862
  llvm-svn: 279968
* Fix -Wunused-but-set-variable warning. | Haojian Wu | 2016-08-29 | 1 | -4/+0
  Summary: A follow-up fix on r279958.
  Reviewers: bkramer
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D23989
  llvm-svn: 279964
* AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() | Tom Stellard | 2016-08-29 | 1 | -0/+11
  Summary:
  The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling.

  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23813
  llvm-svn: 279963
* [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence. | Igor Breger | 2016-08-29 | 2 | -5/+23
  Differential Revision: http://reviews.llvm.org/D23490
  llvm-svn: 279960
* [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered. | Craig Topper | 2016-08-29 | 1 | -9/+4
  This allows broadcast loads to be used when available.
  llvm-svn: 279958
* [AVX-512] Always use v8i64 when converting 512-bit FAND/FOR/FXOR/FANDN to integer operations when DQI isn't supported. | Craig Topper | 2016-08-29 | 1 | -5/+3
  This is consistent with the recent changes to promote logical operations to i64 vectors.
  llvm-svn: 279957
* [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available. | Craig Topper | 2016-08-28 | 2 | -2/+19
  llvm-svn: 279951
* [AVX-512] Add patterns for selecting 128/256-bit EVEX VPABS instructions. | Craig Topper | 2016-08-28 | 2 | -2/+37
  llvm-svn: 279950
* [X86][AVX512] Only combine EVEX target shuffles to shuffles of the same number of vector elements | Simon Pilgrim | 2016-08-28 | 1 | -4/+12
  Over-eager combining prevents the correct folding of writemasks.

  At the moment this occurs for ALL EVEX shuffles; in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.
  llvm-svn: 279934
* [PowerPC] Implement lowering for atomicrmw min/max/umin/umax | Hal Finkel | 2016-08-28 | 4 | -5/+152
  Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.
  llvm-svn: 279933
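For readers unfamiliar with what an atomicrmw min/max operation has to do, here is a source-level analogue. It is not the PowerPC lowering from this commit, which works on machine instructions; it is a sketch using a `std::atomic` compare-exchange loop, and the function names `atomicFetchMin` and `atomicFetchUMax` are hypothetical. Memory ordering is simplified: when no store is needed, only a relaxed load is performed.

```cpp
#include <atomic>

// Atomically replaces Obj with min(Obj, V) using signed comparison and
// returns the value Obj held beforehand.
int atomicFetchMin(std::atomic<int> &Obj, int V) {
  int Old = Obj.load(std::memory_order_relaxed);
  // compare_exchange_weak reloads Old on failure, so the comparison is
  // re-evaluated against the freshest value on every retry. The loop
  // exits early when the stored value is already small enough.
  while (V < Old && !Obj.compare_exchange_weak(Old, V)) {
  }
  return Old;
}

// Same idea for the unsigned maximum: the comparison is the only part
// that changes between the min/max/umin/umax variants.
unsigned atomicFetchUMax(std::atomic<unsigned> &Obj, unsigned V) {
  unsigned Old = Obj.load(std::memory_order_relaxed);
  while (V > Old && !Obj.compare_exchange_weak(Old, V)) {
  }
  return Old;
}
```

The shape (load, compare, conditionally store, retry on contention) is the general pattern such an operation needs on a target that has no single min/max atomic instruction.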
* [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL. | Craig Topper | 2016-08-28 | 2 | -18/+124
  Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.

  Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases where they were previously used. This is the cause of most of the test changes. This shouldn't result in any functional change though.
  llvm-svn: 279929
* [X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC | Craig Topper | 2016-08-28 | 2 | -40/+40
  llvm-svn: 279927
* AMDGPU/R600: Enable Load combine | Jan Vesely | 2016-08-27 | 1 | -0/+1
  Fix and improve tests
  Differential Revision: https://reviews.llvm.org/D23899
  llvm-svn: 279925
* [X86] Rename predicate function that detects whether one of the REX.B, REX.X or REX.R bits is required. Its old name conflicted with a function in the X86II namespace that doesn't quite do the same thing. NFC | Craig Topper | 2016-08-27 | 1 | -15/+16
  llvm-svn: 279924
* [X86] Keep looping over operands looking for byte registers even if we already found a register that requires a REX prefix. Otherwise we don't error if a high byte register is used after SPL/BPL/DIL/SIL. | Craig Topper | 2016-08-27 | 1 | -5/+4
  llvm-svn: 279923
* [X86] Include XMM/YMM/ZMM16-23 in X86II::isX86_64ExtendedReg. This feels more consistent with its name and simplifies assembler code. | Craig Topper | 2016-08-27 | 2 | -8/+4
  llvm-svn: 279922
* [X86] Don't allow DR8-DR15 to be assembled in 32-bit mode. Add missing test for CR8-CR15. | Craig Topper | 2016-08-27 | 1 | -0/+2
  llvm-svn: 279921
* [X86] Remove stale comment about FixupBWInsts pass being off by default. NFC | Craig Topper | 2016-08-27 | 1 | -2/+0
  llvm-svn: 279915
* [AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts. | Craig Topper | 2016-08-27 | 2 | -8/+28
  llvm-svn: 279914
* [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted. | Craig Topper | 2016-08-27 | 2 | -0/+9
  llvm-svn: 279913
* [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd. | Craig Topper | 2016-08-27 | 1 | -0/+8
  llvm-svn: 279912
* AMDGPU: Mark sched model complete | Matt Arsenault | 2016-08-27 | 1 | -1/+1
  Fixes bug 26800
  llvm-svn: 279910
* AMDGPU: Remove unneeded implicit exec uses/defs | Matt Arsenault | 2016-08-27 | 2 | -40/+48
  SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec. SI_IF_BREAK and SI_ELSE_BREAK do not read it either.
  llvm-svn: 279909
* AMDGPU: Select mulhi 24-bit instructions | Matt Arsenault | 2016-08-27 | 7 | -20/+163
  llvm-svn: 279902
* AMDGPU: Move cndmask pseudo to be isel pseudo | Matt Arsenault | 2016-08-27 | 3 | -23/+31
  There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it.
  llvm-svn: 279901
* AMDGPU: Fix sched type for branches | Matt Arsenault | 2016-08-27 | 1 | -1/+1
  llvm-svn: 279900
* AMDGPU: Remove register operand from si_mask_branch | Matt Arsenault | 2016-08-27 | 2 | -5/+3
  It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register.
  llvm-svn: 279899
* AMDGPU: Improve error reporting for maximum branch distance | Matt Arsenault | 2016-08-27 | 1 | -30/+61
  Unfortunately this seems to only help the assembler diagnostic.
  llvm-svn: 279895
* [AArch64][CallLowering] Do not assert for not implemented part. | Quentin Colombet | 2016-08-27 | 1 | -6/+9
  When doing the ABI lowering, report a failure to the caller instead of asserting. This gives a chance for the caller to recover.
  llvm-svn: 279890
* AMDGPU/SI: Canonicalize offset order for merged DS instructions | Tom Stellard | 2016-08-26 | 1 | -3/+15
  Summary:
  If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to schedule loads together without explicitly clustering them, which would give us non-sorted offsets.

  Also, we will want to do this if we move the load/store optimizer before the scheduler.

  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23776
  llvm-svn: 279870
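The canonicalization described here amounts to ordering the two offsets before the merged instruction is built. The following is only a sketch of that idea: `MergedPair` and `canonicalizeOffsets` are hypothetical names, and swapping the destination halves together with the offsets is an assumption about what a merge would need, not something stated in the commit message.

```cpp
#include <utility>

// Hypothetical description of two memory accesses about to be merged
// into one two-offset instruction.
struct MergedPair {
  unsigned Offset0;
  unsigned Offset1;
  unsigned Dest0; // which half of the result belongs to Offset0
  unsigned Dest1; // which half of the result belongs to Offset1
};

// Returns the pair with Offset0 <= Offset1. The destinations are swapped
// along with the offsets so each value still lands in the same place
// (assumption: this bookkeeping is what keeps the merge correct).
MergedPair canonicalizeOffsets(MergedPair P) {
  if (P.Offset0 > P.Offset1) {
    std::swap(P.Offset0, P.Offset1);
    std::swap(P.Dest0, P.Dest1);
  }
  return P;
}
```

Sorting at merge time means the result no longer depends on whether the scheduler happened to cluster the loads in offset order.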
* XXX | Tom Stellard | 2016-08-26 | 1 | -1/+1
  llvm-svn: 279868
* AMDGPU/SI: Use a better method for determining the largest pressure sets | Tom Stellard | 2016-08-26 | 3 | -15/+41
  Summary:
  There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change.

  The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set.

  Reviewers: arsenm
  Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23687
  llvm-svn: 279867
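The selection rule in this summary is a plain arg-max over pressure sets. The sketch below assumes the per-set counts have already been collected: `largestPressureSet` and its input vector are hypothetical, and how the real code counts the SGPR or VGPR register units in each set is not shown here.

```cpp
#include <cstddef>
#include <vector>

// RegUnitsPerSet[I] is assumed to hold the number of register units of
// the class of interest (for example SGPRs) that fall into pressure set I.
// Returns the index of the set with the most units, or -1 if no set
// contains any. Ties go to the lowest index.
int largestPressureSet(const std::vector<unsigned> &RegUnitsPerSet) {
  int Best = -1;
  unsigned BestUnits = 0;
  for (std::size_t I = 0; I < RegUnitsPerSet.size(); ++I) {
    if (RegUnitsPerSet[I] > BestUnits) {
      Best = static_cast<int>(I);
      BestUnits = RegUnitsPerSet[I];
    }
  }
  return Best;
}
```

Compared with matching a hard-coded set name, this keeps working if the generated pressure set names change.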