summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* use auto* with dyn_cast ; NFCSanjay Patel2016-08-111-2/+1
| | | | llvm-svn: 278340
* getParent()->getParent() == getFunction() ; NFCSanjay Patel2016-08-111-2/+1
| | | | llvm-svn: 278339
* Restore "Resolution-based LTO API."Teresa Johnson2016-08-115-7/+818
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This restores commit r278330, with fixes for a few bot failures: - Fix a late change I had made to the save temps output file that I missed due to existing files sitting on my disk - Fix a bunch of Windows bot failures with "ambiguous call to overloaded function" due to confusion between llvm::make_unique vs std::make_unique (preface the new make_unique calls with "llvm::") - Attempt to fix a modules bot failure by adding a missing include to LTO/Config.h. Original change: Resolution-based LTO API. Summary: This introduces a resolution-based LTO API. The main advantage of this API over existing APIs is that it allows the linker to supply a resolution for each symbol in each object, rather than the combined object as a whole. This will become increasingly important for use cases such as ThinLTO which require us to process symbol resolutions in a more complicated way than just adjusting linkage. Patch by Peter Collingbourne. Reviewers: rafael, tejohnson, mehdi_amini Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D20268 llvm-svn: 278338
* revert 278334Ehsan Amiri2016-08-111-27/+0
| | | | llvm-svn: 278337
* Revert "[AMDGPU] fix failure on printing of non-existing instruction operands."Valery Pykhtin2016-08-111-5/+0
| | | | | | This reverts revision 278333, newly added test failed. llvm-svn: 278336
* Extend trip count instead of truncating IV in LFTR, when legalEhsan Amiri2016-08-111-2/+30
| | | | | | | | | | | | | | | When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because (1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant). (2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change). I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere. To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence. https://reviews.llvm.org/D23075 llvm-svn: 278334
* [AMDGPU] fix failure on printing of non-existing instruction operands.Valery Pykhtin2016-08-111-0/+5
| | | | | | Differential revision: https://reviews.llvm.org/D23323 llvm-svn: 278333
* Revert "Resolution-based LTO API."Teresa Johnson2016-08-115-811/+7
| | | | | | | | | | This reverts commit r278330. I made a change to the save temps output that is causing issues with the bots. Didn't realize this because I had older output files sitting on disk in my test output directory. llvm-svn: 278331
* Resolution-based LTO API.Teresa Johnson2016-08-115-7/+811
| | | | | | | | | | | | | | | | | | | | | | Summary: This introduces a resolution-based LTO API. The main advantage of this API over existing APIs is that it allows the linker to supply a resolution for each symbol in each object, rather than the combined object as a whole. This will become increasingly important for use cases such as ThinLTO which require us to process symbol resolutions in a more complicated way than just adjusting linkage. Patch by Peter Collingbourne. Reviewers: rafael, tejohnson, mehdi_amini Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D20268 Address review comments llvm-svn: 278330
* Fixed VS2015 (Update 3) warning - differing const/volatile qualifiers for ↵Simon Pilgrim2016-08-111-1/+1
| | | | | | | | overridden function Dropped the const qualifier to match llvm::CallLowering::lowerCall llvm-svn: 278329
* [AVX512] Fix extractelement i1 lowering.Igor Breger2016-08-112-35/+28
| | | | | | | | | The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing. Make extractelement i1 lowering custom for all vector i1. Differential Revision: http://reviews.llvm.org/D23246 llvm-svn: 278328
* Avoid false dependencies of undef machine operandsMarina Yatsina2016-08-112-1/+54
| | | | | | | | | | | | | | | | | | | | This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref. Pseudo example: loop: xmm0 = ... xmm1 = vcvtsi2sdl eax, xmm0<undef> ... = inst xmm0 jmp loop In this example, selecting xmm0 as the undef register creates false dependency between loop iterations. This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction. Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem. Differential Revision: https://reviews.llvm.org/D22466 llvm-svn: 278321
* [AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for ↵Craig Topper2016-08-112-10/+11
| | | | | | 128/256-bit vectors for overall consistency. llvm-svn: 278318
* [AVX-512] Add patterns to allow EVEX encoded stores of ↵Craig Topper2016-08-112-1/+23
| | | | | | v16i16/v8i16/v16i8/v32i8 even when BWI is not supported. llvm-svn: 278317
* [AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with ↵Craig Topper2016-08-111-6/+6
| | | | | | elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead. llvm-svn: 278316
* [Profile] improve warning control option Xinliang David Li2016-08-111-3/+4
| | | | | | | | | Change --no-pgo-warn-missing to -pgo-warn-missing-function and negate the default. /NFC Add more test to make sure the warning is off by default llvm-svn: 278314
* [WebAssembly] Cleanup trailing whitespaceDominic Chen2016-08-111-2/+2
| | | | | | | | | | Summary: Test for commit access. Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D23392 llvm-svn: 278313
* Make more fields of InlineParams Optional.Easwaran Raman2016-08-111-4/+8
| | | | | | Differential revision: https://reviews.llvm.org/D23386 llvm-svn: 278312
* [Statepoints] Minor cosmetic change; NFCSanjoy Das2016-08-111-1/+1
| | | | | | The verification failure message was missing a space. llvm-svn: 278309
* [MachOYAML] Don't output empty ExportTrieChris Bieneman2016-08-111-1/+2
| | | | | | The YAML representation was always outputting the root node of an export trie even if the trie was empty. While this doesn't really have any functional impact, it does add visual clutter to the yaml file. llvm-svn: 278307
* GlobalISel: support same ConstantExprs as Instructions.Tim Northover2016-08-101-75/+41
| | | | | | | | It's more than just inttoptr, but the others can't be tested until we have support for non-trivial constants (they currently get unavoidably folded to a ConstantInt). llvm-svn: 278303
* GlobalISel: implement simple function calls on AArch64.Tim Northover2016-08-107-45/+160
| | | | | | | We're still limited in the arguments we support, but this at least handles the basic cases. llvm-svn: 278293
* AMDGPU/SI: Implement amdgcn image intrinsics with samplerChangpeng Fang2016-08-101-3/+133
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch define and implement amdgcn image intrinsics with sampler. 1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name to have three suffixes to overload each of these three types; 2. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as resource type, then r128 is 0! 3. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da. Differential Revision: http://reviews.llvm.org/D22838 Reviewed by: arsenm and tstellarAMD llvm-svn: 278291
* Changed sign of LastCallToStaticBounsPiotr Padlewski2016-08-102-2/+2
| | | | | | | | | | | | | | | | | Summary: I think it is much better this way. When I firstly saw line: Cost += InlineConstants::LastCallToStaticBonus; I though that this is a bug, because everywhere where the cost is being reduced it is usuing -=. Reviewers: eraman, tejohnson, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23222 llvm-svn: 278290
* Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.Kyle Butt2016-08-101-0/+10
| | | | | | | | | | | | If AnalyzeBranch can't analyze a block and it is possible to fallthrough, then duplicating the block doesn't make sense, as only one block can be the layout predecessor for the un-analyzable fallthrough. Submitted wit a test case, but NOTE: the test case doesn't currently fail. However, the test case fails with D20505 and would have saved me some time debugging. llvm-svn: 278288
* CodeGen: If Convert blocks that would form a diamond when tail-merged.Kyle Butt2016-08-101-65/+351
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following function currently relies on tail-merging for if conversion to succeed. The common tail of cond_true and cond_false is extracted, and this then forms a diamond pattern that can be successfully if converted. If this block does not get extracted, either because tail-merging is disabled or the threshold is higher, we should still recognize this pattern and if-convert it. Fixed a regression in the original commit. Need to un-reverse branches after reversing them, or other conversions go awry. define i32 @t2(i32 %a, i32 %b) nounwind { entry: %tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1] br i1 %tmp1434, label %bb17, label %bb.outer bb.outer: ; preds = %cond_false, %entry %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ] %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ] br label %bb bb: ; preds = %cond_true, %bb.outer %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ] %tmp. = sub i32 0, %b_addr.021.0.ph %tmp.40 = mul i32 %indvar, %tmp. %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph br i1 %tmp3, label %cond_true, label %cond_false cond_true: ; preds = %bb %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph %indvar.next = add i32 %indvar, 1 br i1 %tmp1437, label %bb17, label %bb cond_false: ; preds = %bb %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0 %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10 br i1 %tmp14, label %bb17, label %bb.outer bb17: ; preds = %cond_false, %cond_true, %entry %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ] ret i32 %a_addr.026.1 } Without tail-merging or diamond-tail if conversion: LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ble LBB1_3 @ BB#2: @ %cond_true @ in Loop: Header=BB1_1 Depth=1 subs r0, r0, r1 cmp r1, r0 it ne cmpne r0, r1 bgt LBB1_4 LBB1_3: @ %cond_false @ in Loop: Header=BB1_1 Depth=1 subs r1, r1, r0 cmp r1, r0 bne LBB1_1 LBB1_4: @ %bb17 bx lr With diamond-tail if conversion, but without tail-merging: @ BB#0: @ %entry cmp r0, r1 it eq bxeq lr LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ite le suble r1, r1, r0 subgt r0, r0, r1 cmp r1, r0 bne LBB1_1 @ BB#2: @ %bb17 bx lr llvm-svn: 278287
* Fix UB in APInt::ashrJonathan Roelofs2016-08-101-5/+1
| | | | | | | | i64 -1, whose sign bit is the 0th one, can't be left shifted without invoking UB. https://reviews.llvm.org/D23362 llvm-svn: 278280
* AMDGPU: s_setpc_b64 should be an indirect branchMatt Arsenault2016-08-101-1/+2
| | | | llvm-svn: 278278
* AMDGPU: Set sizes on control flow pseudosMatt Arsenault2016-08-101-9/+17
| | | | llvm-svn: 278276
* AMDGPU: Remove empty file commentMatt Arsenault2016-08-101-1/+0
| | | | llvm-svn: 278275
* AMDGPU: Remove unnecessary castMatt Arsenault2016-08-101-4/+2
| | | | llvm-svn: 278274
* AMDGPU: Change insertion point of si_mask_branchMatt Arsenault2016-08-102-12/+19
| | | | | | | | | | | | | Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273
* AMDGPU: Use CreateStackObject instead of CreateSpillStackObjectMatt Arsenault2016-08-101-2/+2
| | | | | | | I'm not sure what the difference is, but no other target uses this for emergency spill slots. llvm-svn: 278272
* [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)Sanjay Patel2016-08-101-4/+10
| | | | | | | | | | | | | | | This handles the case in: https://llvm.org/bugs/show_bug.cgi?id=28895 ...but we are not getting all of the possibilities yet. Eg, we use 'X86::FANDN' for scalar FP select combines. That enhancement is filed as: https://llvm.org/bugs/show_bug.cgi?id=28925 Differential Revision: https://reviews.llvm.org/D23337 llvm-svn: 278270
* [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be ↵Andrew Kaylor2016-08-101-2/+7
| | | | | | | | | | non-negative Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18867 llvm-svn: 278269
* LiveIntervalAnalysis: fix a crash in repairOldRegInRangeNicolai Haehnle2016-08-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: See the new test case for one that was (non-deterministically) crashing on trunk and deterministically hit the assertion that I added in D23302. Basically, the machine function contains a sequence DS_WRITE_B32 %vreg4, %vreg14:sub0, ... DS_WRITE_B32 %vreg4, %vreg14:sub0, ... %vreg14:sub1<def> = COPY %vreg14:sub0 and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32 instructions into one before calling repairIntervalsInRange. Now repairIntervalsInRange wants to repair %vreg14, in particular, and ends up trying to repair %vreg14:sub1 as well, but that only becomes active _after_ the range that is to be repaired, hence the crash due to LR.find(...) == LR.begin() at the start of repairOldRegInRange. I believe that just skipping those subrange is fine, but again, not too familiar with that code. Reviewers: MatzeB, kparzysz, tstellarAMD Subscribers: llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D23303 llvm-svn: 278268
* [ValueTracking] An improvement to IR ValueTracking on Non-negative IntegersAndrew Kaylor2016-08-101-1/+37
| | | | | | | | Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18777 llvm-svn: 278267
* [Hexagon] Remove unused variants of LO/HI instructionsKrzysztof Parzyszek2016-08-101-34/+0
| | | | llvm-svn: 278266
* Codegen: Tail Merge: Be less aggressive with special cases.Kyle Butt2016-08-101-4/+13
| | | | | | | | | | | | This change makes it possible for tail-duplication and tail-merging to be disjoint. By being less aggressive when merging during layout, there are no overlapping cases between tail-duplication and tail-merging, provided the thresholds are disjoint. There is a remaining TODO to benchmark the succ_size() test for non-layout tail merging. llvm-svn: 278265
* [X86][SSE] Dropped blend(insertps(x,y),zero) combine - this is now handled ↵Simon Pilgrim2016-08-101-38/+0
| | | | | | by target shuffle chain combining llvm-svn: 278260
* [Hexagon] Simplify the SplitConst32/64 passKrzysztof Parzyszek2016-08-101-65/+29
| | | | llvm-svn: 278256
* [Hexagon] Add extra patterns for single-precision min/max instructionsKrzysztof Parzyszek2016-08-101-18/+18
| | | | llvm-svn: 278252
* Fix LCSSA increased compile timeRong Xu2016-08-101-7/+7
| | | | | | | | | | | | | | | | | | | | | We are seeing r276077 drastically increasing compiler time for our larger benchmarks in PGO profile generation build (both clang based and IR based mode) -- it can be 20x slower than without the patch (like from 30 secs to 780 secs) The increased time are all in pass LCSSA. The problematic code is about PostProcessPHIs after use-rewrite. Note that the InsertedPhis from ssa_updater is accumulating (never been cleared). Since the inserted PHIs are added to the candidate for each rewrite, The earlier ones will be repeatedly added. Later when adding the new PHIs to the work-list, we don't check the duplication either. This can result in extremely long work-list that containing tons of duplicated PHIs. This patch fixes the issue by hoisting the code out of the loop. Differential Revision: http://reviews.llvm.org/D23344 llvm-svn: 278250
* [Hexagon] Fix table-gen decode conflict warnings for CONST32/64Krzysztof Parzyszek2016-08-101-7/+6
| | | | llvm-svn: 278247
* GlobalISel: avoid inserting redundant COPYs for bitcasts.Tim Northover2016-08-101-2/+5
| | | | | | | If the value produced by the bitcast hasn't been referenced yet, we can simply reuse the input register avoiding an unnecessary COPY instruction. llvm-svn: 278245
* [Hexagon] Use integer instructions for floating point immediatesKrzysztof Parzyszek2016-08-1014-239/+117
| | | | | | | | | | | | Floating point instructions use general purpose registers, so the few instructions that can put floating point immediates into registers are, in fact, integer instruction. Use them explicitly instead of having pseudo-instructions specifically for dealing with floating point values. Simplify the constant loading instructions (from sdata) to have only two: one for 32-bit values and one for 64-bit values: CONST32 and CONST64. llvm-svn: 278244
* [Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possibleGor Nishanov2016-08-104-28/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: A particular coroutine usage pattern, where a coroutine is created, manipulated and destroyed by the same calling function, is common for coroutines implementing RAII idiom and is suitable for allocation elision optimization which avoid dynamic allocation by storing the coroutine frame as a static `alloca` in its caller. coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed when dynamic allocation elision happens: ``` entry: %elide = call i8* @llvm.coro.alloc() %need.dyn.alloc = icmp ne i8* %elide, null br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc dyn.alloc: %alloc = call i8* @CustomAlloc(i32 4) br label %coro.begin coro.begin: %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ] %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*)) ``` and ``` %mem = call i8* @llvm.coro.free(i8* %hdl) %need.dyn.free = icmp ne i8* %mem, null br i1 %need.dyn.free, label %dyn.free, label %if.end dyn.free: call void @CustomFree(i8* %mem) br label %if.end if.end: ... ``` If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant. Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack. Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization (https://reviews.llvm.org/D23229) 5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234) 6.Add coroutine heap elision + tests. <= we are here 7.Add the rest of the logic (split into more patches) Reviewers: mehdi_amini, majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23245 llvm-svn: 278242
* Fix build break of VS 2013 debug buildsRoger Ferrer Ibanez2016-08-101-0/+3
| | | | | | | | | | In debug mode extra macros are enabled for several C++ algorithms. Some of them may cause unfortunate build failures. This commit adds a redundant operator() to work around one of those troublesome macros which was hit accidentally by change r278012. llvm-svn: 278241
* [Hexagon] Delete HexagonSelectCCInfo.tdKrzysztof Parzyszek2016-08-102-122/+0
| | | | | | | This file is not used. The location assignment of call arguments and return values is implemented directly in HexagonISelLowering. llvm-svn: 278237
* [Hexagon] Remove unneeded/unused ISD opcodes ARGEXTEND and FCONST32Krzysztof Parzyszek2016-08-106-29/+0
| | | | llvm-svn: 278236
OpenPOWER on IntegriCloud