summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Change mubuf soffset register when SP relativeMatt Arsenault2017-05-172-15/+53
| | | | | | | | | | Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer. No tests because this is used in later commits implementing calls. llvm-svn: 303301
* [X86][AVX512] Add 512-bit vector ctlz costs + testsSimon Pilgrim2017-05-171-0/+24
| | | | llvm-svn: 303300
* AMDGPU: Make better use of op_sel with high componentsMatt Arsenault2017-05-172-8/+57
| | | | | | Handle more general swizzles. llvm-svn: 303296
* [X86][AVX512] Add 512-bit vector cttz costs + testsSimon Pilgrim2017-05-171-0/+6
| | | | llvm-svn: 303293
* Only enable LiveRangeShrink for x86.Dehao Chen2017-05-171-0/+1
| | | | | | | | | | | | | | Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 llvm-svn: 303292
* AMDGPU: Try to use op_sel when selecting packed instructionsMatt Arsenault2017-05-171-1/+29
| | | | | | | | | | | | Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291
* [WebAssembly][NFC] Update expected testsuite failures for newly passing testsJacob Gravelle2017-05-171-3/+0
| | | | | | | | | | | | Summary: r303050 fixes crashes when calling scalarizeMaskedMemIntrin pass from WebAssembly backend. This updates expected test failures for that. Reviewers: sbc100 Subscribers: jfb, llvm-commits, dschuff Differential Revision: https://reviews.llvm.org/D33295 llvm-svn: 303288
* AMDGPU: Use appropriate soffset for spillingMatt Arsenault2017-05-172-20/+20
| | | | | | | This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287
* AMDGPU: Fix min3/max3 combines for f16/i16Matt Arsenault2017-05-173-2/+25
| | | | | | Fix missing instruction definitions for min3/max3. llvm-svn: 303284
* [X86][AVX512] Add 512-bit vector bitreverse costs + testsSimon Pilgrim2017-05-171-0/+18
| | | | llvm-svn: 303283
* [PPC] Properly update register save area offsetsKrzysztof Parzyszek2017-05-171-9/+14
| | | | | | | | | | | | The variables MinGPR/MinG8R were not updated properly when resetting the offsets, which in the included testcase lead to saving the CR register in the same location as R30. This fixes another issue reported in PR26519. Differential Revision: https://reviews.llvm.org/D33017 llvm-svn: 303257
* [GlobalISel][X86] Support add i64 in IA32.Igor Breger2017-05-172-0/+71
| | | | | | | | | | | | | | Summary: support G_UADDE instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D33096 llvm-svn: 303255
* [SystemZ] Modelling of costs of divisions with a constant power of 2.Jonas Paulsson2017-05-171-1/+33
| | | | | | | | Such divisions will eventually be implemented with shifts which should be reflected in the cost function. Review: Ulrich Weigand llvm-svn: 303254
* Reland r303247: [ARM] GlobalISel: Remove dead instruction selection codeDiana Picus2017-05-171-15/+0
| | | | | | | | It only failed on llvm-clang-x86_64-expensive-checks-win, probably because the TableGen stuff hasn't been regenerated. Requires a clean build. llvm-svn: 303252
* Revert "[ARM] GlobalISel: Remove dead instruction selection code"Diana Picus2017-05-171-0/+15
| | | | | | | This reverts commit r303247 because the tests are failing on some bots. Sorry! llvm-svn: 303249
* [ARM] GlobalISel: Remove dead instruction selection codeDiana Picus2017-05-171-15/+0
| | | | | | | We can now generate code for selecting G_ADD, G_SUB and G_MUL. Remove the hand-written versions. llvm-svn: 303247
* [Sparc] Remove execute permissions from non-executable text filesDaniel Cederman2017-05-173-0/+0
| | | | | | | | | | | | Reviewers: jyknight, lero_chris, venkatra Reviewed By: jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27127 llvm-svn: 303245
* BitVector: add iterators for set bitsFrancis Visoiu Mistrih2017-05-174-7/+4
| | | | | | Differential revision: https://reviews.llvm.org/D32060 llvm-svn: 303227
* Re-commit r302678, fixing PR33053.Amara Emerson2017-05-164-269/+101
| | | | | | | The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211
* [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.Tim Shen2017-05-165-8/+67
| | | | | | | | | | | | | | | | | Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205
* Revert "[X86] Replace slow LEA instructions in X86"Reid Kleckner2017-05-164-237/+43
| | | | | | | This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199
* Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"Renato Golin2017-05-163-4/+4
| | | | | | | | | | | | | | | | | Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193
* [AMDGPU] Use GCNRPTracker dumper methods in schedulerStanislav Mekhanoshin2017-05-163-18/+21
| | | | | | Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186
* [AMDGPU] Cache live-ins and register pressure in schedulerStanislav Mekhanoshin2017-05-162-75/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184
* [X86] Replace slow LEA instructions in X86Lama Saba2017-05-164-43/+237
| | | | | | | | | | | | | | | According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183
* [AMDGPU] Turn register pressure estimation into forward trackerStanislav Mekhanoshin2017-05-164-135/+196
| | | | | | | | | | This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179
* Fix an improperly placed curly bracket. NFC.Chad Rosier2017-05-161-1/+1
| | | | llvm-svn: 303165
* AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]NAKAMURA Takumi2017-05-162-2/+4
| | | | llvm-svn: 303137
* IR: Give function GlobalValue::getRealLinkageName() a less misleading name: ↵Peter Collingbourne2017-05-162-3/+3
| | | | | | | | | | | | dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134
* [AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI.Davide Italiano2017-05-151-5/+0
| | | | llvm-svn: 303122
* AArch64: use linker-private symbols for globals in MachO.Tim Northover2017-05-152-0/+11
| | | | | | | | We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118
* [SLP] Enable 64-bit wide vectorization on AArch64Adam Nemet2017-05-153-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. *** Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. ** SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
* Revert r302678 "[AArch64] Enable use of reduction intrinsics."Hans Wennborg2017-05-154-99/+269
| | | | | | | | | | | | | | | | | | This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-157-12/+3245
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* AArch64: diagnose unrecognized features in .cpu directive.Tim Northover2017-05-151-2/+17
| | | | | | | We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t". llvm-svn: 303108
* [AArch64][Falkor] Fix sched details for FMOVGeoff Berry2017-05-152-4/+6
| | | | llvm-svn: 303099
* Revert 303091.Jan Sjodin2017-05-157-3380/+12
| | | | llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-157-12/+3380
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ↵Simon Pilgrim2017-05-151-3/+3
| | | | | | | | | | | | | | ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082
* [AArch64] Enable FeatureFuseAES on Cortex-A72.Florian Hahn2017-05-151-0/+1
| | | | | | | | This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10. llvm-svn: 303073
* [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64Dmitry Preobrazhensky2017-05-151-11/+22
| | | | | | | | | | See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070
* [AMDGPU][MC] Removed V_MQSAD_U16_U8Dmitry Preobrazhensky2017-05-151-3/+0
| | | | | | | | | | | | This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055
* [ARM] Mark LEApcrel instructions as isAsCheapAsAMoveJohn Brawn2017-05-153-3/+3
| | | | | | | | | | | Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054
* [ARM] Mark LEApcrel as not having side effectsJohn Brawn2017-05-151-2/+2
| | | | | | | | | | | | | | | Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053
* [NVPTX] Don't rely on default arguments to ↵Simon Pilgrim2017-05-151-3/+8
| | | | | | | | SelectionDAG::getMemIntrinsicNode. NFC. NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues. llvm-svn: 303047
* [X86] Utilize SelectionDAG::getSelect(). NFC.Zvi Rackover2017-05-141-34/+27
| | | | | | | | | | Replace SelectionDAG::getNode(ISD::SELECT, ...) and SelectionDAG::getNode(ISD::VSELECT, ...) with SelectionDAG::getSelect(...) Saves a few lines of code and in some cases saves the need to explicitly check the type of the desired node. llvm-svn: 303024
* [X86][AVX1] Account for cost of extract/insert of 256-bit shiftsSimon Pilgrim2017-05-141-49/+49
| | | | llvm-svn: 303023
* [X86][AVX2] Fix costs for v4i64 ashr by splatSimon Pilgrim2017-05-141-0/+5
| | | | llvm-svn: 303022
* [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splatSimon Pilgrim2017-05-141-12/+12
| | | | llvm-svn: 303021
* [X86] Remove unused value from IntrinsicType enum. NFCCraig Topper2017-05-142-7/+1
| | | | llvm-svn: 303018
OpenPOWER on IntegriCloud