summaryrefslogtreecommitdiffstats
path: root/llvm
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Combine an if and else block that had the same set of calls to ↵Craig Topper2016-04-151-47/+24
| | | | | | setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC llvm-svn: 266410
* Revert "[LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory."Davide Italiano2016-04-152-41/+20
| | | | | | This reverts commits r266390 and r266396 as they broke some bots. llvm-svn: 266408
* [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.Justin Lebar2016-04-151-0/+4
| | | | | | | | | | | | | | | Summary: Calls on NVPTX are unusually expensive (for one thing, lots of state needs to be saved to memory, which is slow), so make the inlininer much more aggressive. Reviewers: chandlerc Subscribers: jholewinski, llvm-commits, tra Differential Revision: http://reviews.llvm.org/D18561 llvm-svn: 266406
* [TTI] Add getInliningThresholdMultiplier.Justin Lebar2016-04-155-0/+24
| | | | | | | | | | | | | | | | Summary: InlineCost's threshold is multiplied by this value. This lets us adjust the inlining threshold up or down on a per-target basis. For example, we might want to increase the threshold on targets where calls are unusually expensive. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18560 llvm-svn: 266405
* [ifcnv] Don't duplicate blocks that contain convergent instructions.Justin Lebar2016-04-151-1/+31
| | | | | | | | | | | It's unsafe to duplicate blocks that contain convergent instructions during ifcnv. See the patch for details. Reviewers: hfinkel Differential Revision: http://reviews.llvm.org/D17518 llvm-svn: 266404
* Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX ↵Justin Lebar2016-04-151-2/+4
| | | | | | target. llvm-svn: 266403
* [PM] Add a SpeculativeExecution pass for targets with divergent branches.Justin Lebar2016-04-151-0/+2
| | | | | | | | | | | | | | Summary: This IR pass is helpful for GPUs, and other targets with divergent branches. It's a nop on targets without divergent branches. Reviewers: chandlerc Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra Differential Revision: http://reviews.llvm.org/D18626 llvm-svn: 266399
* [Speculation] Add a SpeculativeExecution mode where the pass does nothing ↵Justin Lebar2016-04-154-5/+70
| | | | | | | | | | | | | | | | unless TTI::hasBranchDivergence() is true. Summary: This lets us add this pass to the IR pass manager unconditionally; it will simply not do anything on targets without branch divergence. Reviewers: tra Subscribers: llvm-commits, jingyue, rnk, chandlerc Differential Revision: http://reviews.llvm.org/D18625 llvm-svn: 266398
* [ParallelCG] Attempt to placate MSVC.Davide Italiano2016-04-151-0/+2
| | | | llvm-svn: 266396
* Option parser: class for consuming a joined arg in addition to all remaining ↵Hans Wennborg2016-04-156-1/+73
| | | | | | args llvm-svn: 266394
* OptionParsingTest.cpp: reorder EXPECT_EQs to put expectation on the left. NFC.Hans Wennborg2016-04-151-25/+25
| | | | | | | This provides for better error messages from the framework when the expected and actual values don't match. llvm-svn: 266393
* [LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory.Davide Italiano2016-04-152-20/+39
| | | | | | | | | This will be used in lld to avoid creating TargetMachine in two different places. See D18999 for a more detailed discussion. Differential Revision: http://reviews.llvm.org/D19139 llvm-svn: 266390
* [test] Require 'asserts' for a test which uses -debug-onlyVedant Kumar2016-04-141-0/+1
| | | | | | | Without this line, bots which run check-all on Release compilers will break. llvm-svn: 266386
* AMDGPU: Remove custom load/store scalarizationMatt Arsenault2016-04-144-87/+7
| | | | llvm-svn: 266385
* AMDGPU: Include LDS size in printed commentMatt Arsenault2016-04-142-4/+12
| | | | llvm-svn: 266382
* [AliasSetTracker] Correctly handle changing the size of an entryMichael Kuperstein2016-04-143-20/+64
| | | | | | | | | | | | | If the size of an AST entry changes, we also need to make sure we perform necessary alias set merges, as the new size may overlap pointers in other sets. We happen to run into this with memset, because memset allows an entry for a i8* pointer to have a decidedly non-i8 size. This fixes PR27262. Differential Revision: http://reviews.llvm.org/D18939 llvm-svn: 266381
* Nuke getGlobalContext() from LLVM (but the C API)Mehdi Amini2016-04-143-14/+4
| | | | | | | | | | The only use for getGlobalContext() is in the C API. Let's just move the static global here and nuke the C++ API. Differential Revision: http://reviews.llvm.org/D19094 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266380
* Remove every uses of getGlobalContext() in LLVM (but the C API)Mehdi Amini2016-04-1480-1079/+1108
| | | | | | | | | | | At the same time, fixes InstructionsTest::CastInst unittest: yes you can leave the IR in an invalid state and exit when you don't destroy the context (like the global one), no longer now. This is the first part of http://reviews.llvm.org/D19094 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266379
* AMDGPU: Run SIFoldOperands after PeepholeOptimizerMatt Arsenault2016-04-1417-46/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378
* AMDGPU: Directly emit m0 initialization with s_mov_b32Matt Arsenault2016-04-142-14/+37
| | | | | | | | | | | | | Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0. llvm-svn: 266377
* AMDGPU: Fold bitcasts of scalar constants to vectorsMatt Arsenault2016-04-145-50/+83
| | | | | | | This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376
* [ScheduleDAGInstrs] Re-factor for based on review feedback. NFC.Geoff Berry2016-04-142-54/+52
| | | | | | | | | | | | | | Summary: Re-factor some code to improve clarity and style based on review comments from http://reviews.llvm.org/D18093. Reviewers: MatzeB, mcrosier Subscribers: MatzeB, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19128 llvm-svn: 266372
* [ARM] Adding IEEE-754 SIMD detection to loop vectorizerRenato Golin2016-04-146-6/+406
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363
* [InstCombine] remove constant by inverting compare + logic (PR27105)Sanjay Patel2016-04-142-0/+32
| | | | | | | | | | | | | | | https://llvm.org/bugs/show_bug.cgi?id=27105 We can check if all bits outside of a constant mask are set with a single constant. As noted in the bug report, although this form should be considered the canonical IR, backends may want to transform this into an 'andn' / 'andc' comparison against zero because that could be a single machine instruction. Differential Revision: http://reviews.llvm.org/D18842 llvm-svn: 266362
* Fix null pointer access for discriminator assignment.Dehao Chen2016-04-141-0/+2
| | | | | | | | | | | | Summary: This fixes the buildbot failure. Reviewers: dnovillo, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19129 llvm-svn: 266360
* AMDGPU: Add skeleton GlobalIsel implementationTom Stellard2016-04-147-0/+156
| | | | | | | | | | | | | | | Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356
* Update discriminator assignment algorithm to handle nested call correctly.Dehao Chen2016-04-142-20/+64
| | | | | | | | | | | | Summary: Add discriminator for nested call correctly. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19127 llvm-svn: 266354
* Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.hReid Kleckner2016-04-1411-46/+70
| | | | | | | | | | | MachineInstr.h and MachineInstrBuilder.h are very popular headers, widely included across all LLVM backends. It turns out that there only a handful of TUs that actually care about DI operands on MachineInstrs. After this change, touching DebugInfoMetadata.h and rebuilding llc only needs 112 actions instead of 542. llvm-svn: 266351
* [ValueMapper] Range-loopify to improve readability. NFC.Davide Italiano2016-04-141-3/+3
| | | | llvm-svn: 266350
* [lanai] Add custom lowering for SRL_PARTS i32.Jacques Pienaar2016-04-143-1/+56
| | | | llvm-svn: 266349
* [GlobalISel] Move GISelAccessor class into public headersTom Stellard2016-04-144-20/+20
| | | | | | | | | | Reviewers: qcolombet Subscribers: joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19120 llvm-svn: 266348
* [DivergenceAnalysis] Treat PHI with incoming undef as constantNicolai Haehnle2016-04-145-1/+93
| | | | | | | | | | | | | | | | | | | | | | | Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this lead to branch conditions that were uniform before StructurizeCFG to become non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347
* [StructurizeCFG] Annotate branches that were treated as uniformNicolai Haehnle2016-04-143-4/+30
| | | | | | | | | | | | | | | | | | | Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346
* AMDGPU: Remove SIFixSGPRLiveRanges passNicolai Haehnle2016-04-145-243/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345
* AMDGPU: change a redundant if () to an assert(). NFCNicolai Haehnle2016-04-141-2/+1
| | | | | | | | | | | | | | | Summary: I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19042 llvm-svn: 266344
* [GlobalISel] Coding style and whitespace fixesTom Stellard2016-04-144-11/+11
| | | | | | | | | | Reviewers: qcolombet Subscribers: joker.eph, llvm-commits, vkalintiris Differential Revision: http://reviews.llvm.org/D19119 llvm-svn: 266342
* AArch64: expand cmpxchg after regalloc at -O0.Tim Northover2016-04-145-4/+389
| | | | | | | | | | | | | | | | | | | FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed. I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations. It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness. Should fix PR25526. llvm-svn: 266339
* [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and ↵Jacques Pienaar2016-04-143-2/+158
| | | | | | | | | | | | | | getMemOpBaseRegImmOfsWidth. Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched. Reviewers: eliben, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18903 llvm-svn: 266338
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unitTom Stellard2016-04-149-63/+214
| | | | | | | | | | | | | | | | | | | Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
* AMDGPU/SI: Use the correct scratch wave offset register for shaders.Tom Stellard2016-04-145-13/+43
| | | | | | | | | | | | | | | | | | | | | | Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336
* [PGO] Do not attach VP metadata if value count at site is 0 [NFC] Betul Buyukkurt2016-04-141-0/+2
| | | | llvm-svn: 266335
* [SCEV][LAA] Add tests for SCEV expression transformations performed during LAASilviu Baranga2016-04-144-0/+52
| | | | | | | | | | | | | | | | | | | | Summary: Add a print method to Predicated Scalar Evolution which prints all interesting transformations done by PSE. Loop Access Analysis will now print this as part of the analysis output. We now use this to check the exact expression transformations that were done by PSE in LAA. The additional checking also acts as white-box testing for the getAsAddRec method. Reviewers: anemet, sanjoy Subscribers: sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18792 llvm-svn: 266334
* Summary:Simon Dardis2016-04-1413-49/+73
| | | | | | | | | | Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils. This patch was previous committed as r266055 as seemed to have caused some spurious test failures. They did not reappear after further local testing. llvm-svn: 266301
* [Coverage] Update testing methods to support more than two filesIgor Kudrin2016-04-141-11/+58
| | | | | | Differential Revision: http://reviews.llvm.org/D18757 llvm-svn: 266289
* [mips] Remove duplicate tests and add missing prefixes for *-LABEL checks. NFC.Vasileios Kalintiris2016-04-144-419/+119
| | | | | | | | | | | | | | | Summary: The only difference between the removed tests and the pre-existing ones, is the materialization of the zero constant, which shouldn't matter for these cases. Reviewers: dsanders, sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D18693 llvm-svn: 266285
* [Coverage] Avoid unnecessary copying of std::vectorIgor Kudrin2016-04-141-7/+16
| | | | | | | | Approved by: Justin Bogner <mail@justinbogner.com> Differential Revision: http://reviews.llvm.org/D18756 llvm-svn: 266284
* Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"Adam Nemet2016-04-1412-339/+206
| | | | | | | | This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282
* ThinLTO: linkonce compile-time optimization, do not bother when there is ↵Mehdi Amini2016-04-141-0/+4
| | | | | | | only one input file From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266281
* [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NANDavid Majnemer2016-04-1410-74/+151
| | | | | | | | | | | | | | | The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower to @llvm.{min,max}num to -NAN if they don't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279
* Do not use getGlobalContext()... ever.Mehdi Amini2016-04-141-5/+5
| | | | | | | | This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275
OpenPOWER on IntegriCloud