path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [ARM] AArch32 v8 NEON is still not IEEE-754 compliant | Renato Golin | 2016-04-18 | 1 | -1/+4
    llvm-svn: 266603
* [mips][ias] Stream macro expansions to output instead of buffering them. NFC. | Daniel Sanders | 2016-04-18 | 3 | -353/+336
    Summary:
    This will allow us to eliminate some magic numbers from the offset operand of branch instructions in favour of symbols, and makes it possible to avoid double-filling delay slots when clang is given -save-temps.

    parseDirectiveCpRestore() is calling isIntegratedAssemblerRequired() for the moment, since correctly pushing the generation of these instructions into the ELF target streamer is tricky enough to warrant a separate patch.

    Reviewers: sdardis, vkalintiris
    Subscribers: dsanders, llvm-commits, sdardis
    Differential Revision: http://reviews.llvm.org/D19164
    llvm-svn: 266602
* [NFC] Header cleanup | Mehdi Amini | 2016-04-18 | 102 | -170/+44
    Removed some unused headers, replaced some headers with forward class declarations.

    Found using simple scripts like this one:
      clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

    Patch by Eugene Kosov <claprix@yandex.ru>
    Differential Revision: http://reviews.llvm.org/D19219
    From: Mehdi Amini <mehdi.amini@apple.com>
    llvm-svn: 266595
* [X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditionally matching them. NFC | Craig Topper | 2016-04-17 | 1 | -45/+42
    llvm-svn: 266577
* Declare MVT::SimpleValueType as an int8_t sized enum. This removes 400 bytes from TargetLoweringBase and probably other places. | Craig Topper | 2016-04-17 | 3 | -3/+3
    This required changing several places to print VT enums as strings instead of raw ints since the proper method to use to print became ambiguous. This is probably an improvement anyway.

    This also appears to save ~8K from an x86 self host build of llc.
    llvm-svn: 266562
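The size effect described in the commit above can be illustrated with a small standalone sketch. This is not the LLVM declaration: the enum names `WideVT` and `NarrowVT`, their enumerators, and the helper `vtName` are hypothetical and exist only to show what a fixed one-byte underlying type does and why printing by name is a reasonable replacement for printing the raw integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical value-type enum with the default underlying type.
enum WideVT { WideI8, WideI16, WideI32 };

// The same enumerators with a fixed one-byte underlying type.
enum NarrowVT : int8_t { NarrowI8, NarrowI16, NarrowI32 };

static_assert(sizeof(NarrowVT) == 1, "fixed underlying type is one byte");
static_assert(sizeof(NarrowVT) <= sizeof(WideVT), "narrow enum is never larger");

// Print by name rather than by raw integer, which sidesteps any question of
// which integer overload a printing routine should pick for a one-byte enum.
inline const char *vtName(NarrowVT VT) {
  switch (VT) {
  case NarrowI8:  return "i8";
  case NarrowI16: return "i16";
  case NarrowI32: return "i32";
  }
  assert(false && "unknown value type");
  return "unknown";
}
```

Any table that stores one such enum per entry shrinks in proportion to the difference between `sizeof(WideVT)` and `sizeof(NarrowVT)`, which is the kind of saving the commit message reports.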
* [X86] Added TODO comment for target shuffle mask decoding of bitcasted masks | Simon Pilgrim | 2016-04-17 | 1 | -0/+1
    llvm-svn: 266559
* [X86] Remove unneeded variables | Asaf Badouh | 2016-04-17 | 1 | -12/+8
    No functional change. ExtraLoad and WrapperKind are used only if (OpFlags == X86II::MO_GOTPCREL).

    Differential Revision: http://reviews.llvm.org/D18942
    llvm-svn: 266557
* [AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. | Craig Topper | 2016-04-17 | 1 | -2/+4
    llvm-svn: 266554
* [X86] Use ternary operator to reduce code slightly. NFC | Craig Topper | 2016-04-16 | 1 | -8/+3
    llvm-svn: 266534
* [X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support | Simon Pilgrim | 2016-04-16 | 3 | -2/+64
    Added additional test that peeks through bitcast to v16i8 mask
    llvm-svn: 266533
* AMDGPU: Enable LocalStackSlotAllocation pass | Matt Arsenault | 2016-04-16 | 2 | -0/+159
    This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions.

    This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling.
    llvm-svn: 266508
* AMDGPU: Use s_addk_i32 / s_mulk_i32 | Matt Arsenault | 2016-04-16 | 1 | -12/+45
    llvm-svn: 266506
* [mips] More range-based for loops. NFC. | Vasileios Kalintiris | 2016-04-15 | 3 | -10/+9
    There are still a couple more inside the MIPS target. I opted for a single commit in order to avoid spamming the list.
    llvm-svn: 266472
* [mips] Use range-based for loops and simplify slightly the code. NFC. | Vasileios Kalintiris | 2016-04-15 | 1 | -9/+13
    llvm-svn: 266471
* [SystemZ] Call tryAddingSymbolicOperand in the disassembler | Ulrich Weigand | 2016-04-15 | 2 | -11/+52
    Use the tryAddingSymbolicOperand callback to attempt to present immediate values in symbolic form when disassembling. This is currently only used for PC-relative immediates (which are most likely to be symbolic in the SystemZ ISA).

    Add new DecodeMethod types to allow distinguishing between branch and non-branch instructions.
    llvm-svn: 266469
* ARM: don't try to hoist constant RHS out of a division. | Tim Northover | 2016-04-15 | 2 | -3/+16
    Divisions by a constant can be converted into multiplies which are usually cheaper, but this isn't possible if the constant gets separated (particularly in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is cheap.

    I considered making the check generic, but neither AArch64 (strangely) nor x86 showed any benefit on the tests I had.
    llvm-svn: 266464
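To make the point about constant divisors concrete, here is a minimal standalone sketch (not LLVM code) of the multiply-based form that becomes available when the divisor is a visible constant. The function names `divideBy3` and `divideByRuntime` and the choice of divisor 3 are illustrative assumptions; the constant 0xAAAAAAAB is the well-known reciprocal for unsigned 32-bit division by 3.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit division by the constant 3, written as a multiply and a
// shift. 0xAAAAAAAB == ceil(2^33 / 3), so (X * 0xAAAAAAAB) >> 33 == X / 3
// for every 32-bit X. The product always fits in 64 bits.
inline uint32_t divideBy3(uint32_t X) {
  return static_cast<uint32_t>((static_cast<uint64_t>(X) * 0xAAAAAAABull) >> 33);
}

// The same operation when the divisor is only known at run time, which is
// what the division looks like once its constant has been moved into a
// register elsewhere. No multiply-based rewrite is possible here.
inline uint32_t divideByRuntime(uint32_t X, uint32_t D) {
  assert(D != 0 && "division by zero");
  return X / D;
}
```

Both functions return the same value when `D` is 3, but only the first form is available to a compiler that can still see the constant next to the division.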
* [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth(). | Chad Rosier | 2016-04-15 | 1 | -5/+46
    This improves AA in the MI scheduler when reasoning about paired instructions.

    Phabricator Revision: http://reviews.llvm.org/D17098
    PR26358
    llvm-svn: 266462
* [AArch64] Add MMOs to callee-save load/store instructions. | Geoff Berry | 2016-04-15 | 1 | -2/+15
    Summary:
    Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.

    Reviewers: t.p.northover, mcrosier
    Subscribers: aemerson, rengolin, mcrosier, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17661
    llvm-svn: 266439
* Fix typing on generated LXV2DX/STXV2DX instructions | Nirav Dave | 2016-04-15 | 1 | -5/+23
    [PPC] Previously, when casting generic loads to LXV2DX/ST instructions, we would leave the original load return type in place, allowing for an assertion failure when we merge two equivalent LXV2DX nodes with different types.

    This fixes PR27350.

    Reviewers: nemanjai
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D19133
    llvm-svn: 266438
* [MachineScheduler] Add support for store clustering | Jun Bum Lim | 2016-04-15 | 4 | -10/+22
    Perform store clustering just like load clustering. This change adds StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now.

    This change also adds support for unscaled stores, which were not handled in getMemOpBaseRegImmOfs().
    llvm-svn: 266437
* AMDGPU/SI: Fix regression with no-return atomics | Nicolai Haehnle | 2016-04-15 | 1 | -0/+1
    Summary:
    In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here.

    Reviewers: arsenm, tstellarAMD
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19043
    llvm-svn: 266433
* Use MVT instead of EVT to remove a bunch of unnecessary calls to getSimpleVT. | Craig Topper | 2016-04-15 | 4 | -61/+58
    llvm-svn: 266414
* Add a setOperationPromotedToType convenience method that sets an operation to promoted and sets the type in one call. Use it to save code in X86. | Craig Topper | 2016-04-15 | 1 | -36/+18
    llvm-svn: 266413
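The idea of a convenience method that performs two bookkeeping steps in one call can be sketched with a toy model. Everything below is hypothetical: `PromotionTable`, its string-keyed maps, and the method signatures are stand-ins chosen for a self-contained example and do not reproduce the LLVM class or its signatures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class OpAction { Legal, Promote };

// Toy table keyed by (operation, type). Not LLVM's lowering class.
struct PromotionTable {
  using Key = std::pair<std::string, std::string>;
  std::map<Key, OpAction> Actions;
  std::map<Key, std::string> PromotedTo;

  // Step 1: record how the operation is handled for this type.
  void setOperationAction(const std::string &Op, const std::string &VT,
                          OpAction A) {
    Actions[std::make_pair(Op, VT)] = A;
  }

  // Step 2: record which type the operation is promoted to.
  void addPromotedToType(const std::string &Op, const std::string &VT,
                         const std::string &DestVT) {
    PromotedTo[std::make_pair(Op, VT)] = DestVT;
  }

  // Convenience wrapper: both steps in one call, so callers cannot forget
  // the second step or pass mismatched arguments to it.
  void setOperationPromotedToType(const std::string &Op, const std::string &VT,
                                  const std::string &DestVT) {
    setOperationAction(Op, VT, OpAction::Promote);
    addPromotedToType(Op, VT, DestVT);
  }
};
```

With a wrapper like this, each pair of calls at a call site collapses into one line, which is consistent with the roughly halved line count the commit reports.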
* [X86] AND, OR, and XOR of vectors are always legal; no need to set them legal explicitly. | Craig Topper | 2016-04-15 | 1 | -5/+0
    llvm-svn: 266412
* [X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC | Craig Topper | 2016-04-15 | 1 | -47/+24
    llvm-svn: 266410
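The refactoring pattern named in the commit above can be shown on a toy example. The types `Action` and `ToyLowering`, the operation names, and both `configure` functions are hypothetical and only mimic the shape of the change; they are not the X86 lowering code.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Action { Legal, Custom };

// Toy stand-in for a target lowering table. Not LLVM's class.
struct ToyLowering {
  std::map<std::string, Action> Actions;
  void setOperationAction(const std::string &Op, Action A) { Actions[Op] = A; }
};

// Before: two blocks with the same calls, differing only in the action.
inline void configureBranchy(ToyLowering &TL, bool HasFeature) {
  if (HasFeature) {
    TL.setOperationAction("op_a", Action::Legal);
    TL.setOperationAction("op_b", Action::Legal);
  } else {
    TL.setOperationAction("op_a", Action::Custom);
    TL.setOperationAction("op_b", Action::Custom);
  }
}

// After: the ternary operator picks the action once, and each call is
// written a single time.
inline void configureTernary(ToyLowering &TL, bool HasFeature) {
  const Action A = HasFeature ? Action::Legal : Action::Custom;
  TL.setOperationAction("op_a", A);
  TL.setOperationAction("op_b", A);
}
```

Both functions fill the table identically for either value of `HasFeature`, which is what makes a change of this kind NFC.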
* [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5. | Justin Lebar | 2016-04-15 | 1 | -0/+4
    Summary:
    Calls on NVPTX are unusually expensive (for one thing, lots of state needs to be saved to memory, which is slow), so make the inliner much more aggressive.

    Reviewers: chandlerc
    Subscribers: jholewinski, llvm-commits, tra
    Differential Revision: http://reviews.llvm.org/D18561
    llvm-svn: 266406
* AMDGPU: Remove custom load/store scalarization | Matt Arsenault | 2016-04-14 | 4 | -87/+7
    llvm-svn: 266385
* AMDGPU: Include LDS size in printed comment | Matt Arsenault | 2016-04-14 | 1 | -0/+2
    llvm-svn: 266382
* Remove every use of getGlobalContext() in LLVM (but the C API) | Mehdi Amini | 2016-04-14 | 2 | -3/+7
    At the same time, this fixes the InstructionsTest::CastInst unittest: yes, you could leave the IR in an invalid state and exit when you didn't destroy the context (like the global one), but no longer.

    This is the first part of http://reviews.llvm.org/D19094

    From: Mehdi Amini <mehdi.amini@apple.com>
    llvm-svn: 266379
* AMDGPU: Run SIFoldOperands after PeepholeOptimizer | Matt Arsenault | 2016-04-14 | 2 | -1/+25
    PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective.

    shader-db stats:

    Totals:
      SGPRS: 34200 -> 34336 (0.40 %)
      VGPRS: 22118 -> 21655 (-2.09 %)
      Code Size: 632144 -> 633460 (0.21 %) bytes
      LDS: 11 -> 11 (0.00 %) blocks
      Scratch: 10240 -> 11264 (10.00 %) bytes per wave
      Max Waves: 8822 -> 8918 (1.09 %)
      Wait states: 0 -> 0 (0.00 %)

    Totals from affected shaders:
      SGPRS: 7704 -> 7840 (1.77 %)
      VGPRS: 5169 -> 4706 (-8.96 %)
      Code Size: 234444 -> 235760 (0.56 %) bytes
      LDS: 2 -> 2 (0.00 %) blocks
      Scratch: 0 -> 1024 (0.00 %) bytes per wave
      Max Waves: 1188 -> 1284 (8.08 %)
      Wait states: 0 -> 0 (0.00 %)

    Increases:
      SGPRS: 35 (0.01 %)
      VGPRS: 1 (0.00 %)
      Code Size: 59 (0.02 %)
      LDS: 0 (0.00 %)
      Scratch: 1 (0.00 %)
      Max Waves: 48 (0.02 %)
      Wait states: 0 (0.00 %)

    Decreases:
      SGPRS: 26 (0.01 %)
      VGPRS: 54 (0.02 %)
      Code Size: 68 (0.03 %)
      LDS: 0 (0.00 %)
      Scratch: 0 (0.00 %)
      Max Waves: 4 (0.00 %)
      Wait states: 0 (0.00 %)
    llvm-svn: 266378
* AMDGPU: Directly emit m0 initialization with s_mov_b32 | Matt Arsenault | 2016-04-14 | 2 | -14/+37
    Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after.

    This avoids regressions when SIFoldOperands is run later from leaving all copies to m0.
    llvm-svn: 266377
* AMDGPU: Fold bitcasts of scalar constants to vectors | Matt Arsenault | 2016-04-14 | 1 | -0/+34
    This cleans up some messes since the individual scalar components can be CSEed.
    llvm-svn: 266376
* [ARM] Adding IEEE-754 SIMD detection to loop vectorizer | Renato Golin | 2016-04-14 | 1 | -0/+4
    Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin).

    For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags.

    This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM has a flag for allowing sub-normal operations. Before this patch, GCC only allowed it using unsafe-math flags and LLVM allowed it by default with no way to turn it off (short of not using NEON at all).

    As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normals flag in both communities and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

    Fixes PR16275.
    llvm-svn: 266363
* AMDGPU: Add skeleton GlobalIsel implementation | Tom Stellard | 2016-04-14 | 6 | -0/+144
    Summary:
    This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect.

    Reviewers: arsenm, qcolombet
    Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19077
    llvm-svn: 266356
* Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h | Reid Kleckner | 2016-04-14 | 1 | -0/+1
    MachineInstr.h and MachineInstrBuilder.h are very popular headers, widely included across all LLVM backends. It turns out that there are only a handful of TUs that actually care about DI operands on MachineInstrs.

    After this change, touching DebugInfoMetadata.h and rebuilding llc only needs 112 actions instead of 542.
    llvm-svn: 266351
* [lanai] Add custom lowering for SRL_PARTS i32. | Jacques Pienaar | 2016-04-14 | 2 | -1/+44
    llvm-svn: 266349
* [GlobalISel] Move GISelAccessor class into public headers | Tom Stellard | 2016-04-14 | 4 | -48/+15
    Reviewers: qcolombet
    Subscribers: joker.eph, vkalintiris, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19120
    llvm-svn: 266348
* [StructurizeCFG] Annotate branches that were treated as uniform | Nicolai Haehnle | 2016-04-14 | 2 | -4/+15
    Summary:
    This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases).

    No tests included here, because tests in D19013 already cover this.

    Reviewers: arsenm, tstellarAMD
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19018
    llvm-svn: 266346
* AMDGPU: Remove SIFixSGPRLiveRanges pass | Nicolai Haehnle | 2016-04-14 | 4 | -242/+0
    Summary:
    This pass is unnecessary and overly conservative. It was motivated by situations like

      def %vreg0:SGPR_32
      ...
      if-block:
        ..
        def %vreg1:SGPR_32
        ...
      else-block:
        ...
        use %vreg0:SGPR_32
        ...

    and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap.

    However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass.

    In addition to proving this change correct, I have tested it with Piglit and a small number of other tests.

    Reviewers: arsenm, tstellarAMD
    Subscribers: MatzeB, arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19041
    llvm-svn: 266345
* AMDGPU: change a redundant if () to an assert(). NFC | Nicolai Haehnle | 2016-04-14 | 1 | -2/+1
    Summary:
    I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds.

    Reviewers: arsenm, tstellarAMD
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D19042
    llvm-svn: 266344
* [GlobalISel] Coding style and whitespace fixes | Tom Stellard | 2016-04-14 | 2 | -6/+6
    Reviewers: qcolombet
    Subscribers: joker.eph, llvm-commits, vkalintiris
    Differential Revision: http://reviews.llvm.org/D19119
    llvm-svn: 266342
* AArch64: expand cmpxchg after regalloc at -O0. | Tim Northover | 2016-04-14 | 4 | -4/+314
    FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed.

    I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations.

    It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness.

    Should fix PR25526.
    llvm-svn: 266339
* [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth. | Jacques Pienaar | 2016-04-14 | 2 | -2/+103
    Summary:
    Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

    Reviewers: eliben, majnemer
    Subscribers: mcrosier, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18903
    llvm-svn: 266338
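The kind of independence check that a base register, immediate offset, and width make possible can be sketched as follows. This is a minimal illustration under stated assumptions, not the Lanai implementation: the struct `MemAccess` and the function `triviallyDisjoint` are hypothetical, and the rule shown (same base register, non-overlapping byte ranges) is only the simplest conservative test of this sort.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one memory access: base register number,
// immediate offset, and access width in bytes. Not an LLVM type.
struct MemAccess {
  unsigned BaseReg;
  int64_t Offset;
  unsigned Width;
};

// Returns true only when the two accesses provably do not overlap: they use
// the same base register and the lower access ends at or before the point
// where the higher one begins. Different base registers give no information,
// so the conservative answer is false.
inline bool triviallyDisjoint(const MemAccess &A, const MemAccess &B) {
  assert(A.Width > 0 && B.Width > 0 && "accesses must touch at least one byte");
  if (A.BaseReg != B.BaseReg)
    return false;
  const MemAccess &Low = A.Offset <= B.Offset ? A : B;
  const MemAccess &High = A.Offset <= B.Offset ? B : A;
  return Low.Offset + static_cast<int64_t>(Low.Width) <= High.Offset;
}
```

A scheduler that gets `true` from a test like this is free to reorder the two accesses; a `false` result only means that nothing could be proven.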
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit | Tom Stellard | 2016-04-14 | 6 | -63/+94
    Summary:
    For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.

    This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.

    Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

    Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
    Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
    Differential Revision: http://reviews.llvm.org/D18340
    Patch By: Bas Nieuwenhuizen
    llvm-svn: 266337
* AMDGPU/SI: Use the correct scratch wave offset register for shaders. | Tom Stellard | 2016-04-14 | 3 | -9/+38
    Summary:
    The code previously always used s1, as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders, though. The register should be the next SGPR after all user and system SGPRs. We rely on Mesa adding arguments for all input and system SGPRs and take the next available SGPR for the scratch wave offset register.

    Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
    Subscribers: qcolombet, arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18941
    Patch By: Bas Nieuwenhuizen
    llvm-svn: 266336
* Summary: | Simon Dardis | 2016-04-14 | 2 | -1/+7
    Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils.

    This patch was previously committed as r266055 and seemed to have caused some spurious test failures. They did not reappear after further local testing.
    llvm-svn: 266301
* Do not use getGlobalContext()... ever. | Mehdi Amini | 2016-04-14 | 1 | -5/+5
    This code was creating a new type in the global context, regardless of which context the user is sitting in. What can possibly go wrong?

    From: Mehdi Amini <mehdi.amini@apple.com>
    llvm-svn: 266275
* AMDGPU: Implement canonicalize | Matt Arsenault | 2016-04-14 | 4 | -0/+55
    Also add generic DAG node for it.
    llvm-svn: 266272
* TargetLowering: Factor out common code for tail call eligibility checking; NFC | Matthias Braun | 2016-04-14 | 3 | -63/+9
    llvm-svn: 266270
* ARM: override cost function to re-enable ConstantHoisting (& fix it). | Tim Northover | 2016-04-13 | 2 | -5/+9
    At some point, ARM stopped getting any benefit from ConstantHoisting because the pass called a different variant of getIntImmCost. Reimplementing the correct variant revealed some problems, however:

    + ConstantHoisting was modifying switch statements. This is simply invalid: the cases must remain integer constants no matter the notional cost.
    + ConstantHoisting was mangling alloca instructions in the entry block. These should be handled by FrameLowering, so constants actually have a cost of 0. Worse, the resulting bitcasts meant they became dynamic allocas.

    rdar://25707382
    llvm-svn: 266260