path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [SLPVectorizer] Fix dependency list | Keno Fischer | 2016-03-14 | 1 | -0/+1
  Summary: DemandedBits was added to the requirements of SLPVectorizer in rL261212 (and various earlier versions of it), but the appropriate initialization statement was accidentally forgotten. Ref JuliaLang/julia#14998 (https://github.com/JuliaLang/julia/issues/14998).
  Patch by Yichao Yu.
  Reviewers: mssimpso
  Differential Revision: http://reviews.llvm.org/D18152
  llvm-svn: 263476
* Turn LoopLoadElimination on again | Adam Nemet | 2016-03-14 | 1 | -2/+2
  The two issues that were discovered got fixed (r263058, r263173). The pass can be disabled with -mllvm -enable-loop-load-elim=0
  llvm-svn: 263472
* [AliasSetTracker] Do not strip pointer casts when processing MemSetInst | Michael Kuperstein | 2016-03-14 | 1 | -2/+2
  This fixes PR26843.
  llvm-svn: 263462
* [AArch64] Refactor AArch64FrameLowering::emitPrologue. NFC. | Chad Rosier | 2016-03-14 | 1 | -52/+50
  http://reviews.llvm.org/D18125
  Patch by Aditya Kumar.
  llvm-svn: 263461
* [SpillPlacement] Fix a quadratic behavior in spill placement. | Quentin Colombet | 2016-03-14 | 2 | -53/+44
  The bad behavior happens when we have a function with a long linear chain of basic blocks, and have a live range spanning most of this chain, but with very few uses. Say we have only 2 uses. The Hopfield network is only seeded with two active blocks where the uses are, and each iteration of the outer loop in `RAGreedy::growRegion()` only adds two new nodes to the network due to the completely linear shape of the CFG. Meanwhile, `SpillPlacer->iterate()` visits the whole set of discovered nodes, which adds up to a quadratic algorithm. This is a historical accident from r129188.
  When the Hopfield network is expanding, most of the action is happening on the frontier where new nodes are being added. The internal nodes in the network are not likely to be flip-flopping much, or they will at least settle down very quickly. This means that while `SpillPlacer->iterate()` is recomputing all the nodes in the network, it is probably only the two frontier nodes that are changing their output.
  Instead of recomputing the whole network on each iteration, we can maintain a SparseSet of nodes that need to be updated:
  - `SpillPlacement::activate()` adds the node to the todo list.
  - When a node changes value (i.e., `update()` returns true), its neighbors are added to the todo list.
  - `SpillPlacement::iterate()` only updates the nodes in the list.
  The result of Hopfield iterations is not necessarily exact. It should converge to a local minimum, but there is no guarantee that it will find a global minimum. It is possible that updating nodes in a different order will cause us to switch to a different local minimum. In other words, this is not NFC. Although I saw a few runtime improvements and regressions when I benchmarked this change, those were side effects, and the performance change is in the noise, as expected.
  Huge thanks to Jakob Stoklund Olesen <stoklund@2pi.dk> for his feedback, guidance and time for the review.
  llvm-svn: 263460
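The todo-list scheme described in this commit can be sketched on a toy network. This is a minimal illustration under stated assumptions, not LLVM's SpillPlacement code: `ToyNetwork`, its members and `updatesForChain` are hypothetical names, the update rule is a plain sign function, and a `std::vector` plus a membership flag stands in for the SparseSet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy Hopfield-style network. Every name here is hypothetical; the class only
// mirrors the three steps listed in the commit message.
class ToyNetwork {
public:
  explicit ToyNetwork(std::size_t N) : Nodes(N), Queued(N, false) {}

  // Adds an undirected edge between two nodes.
  void link(std::size_t A, std::size_t B) {
    assert(A < Nodes.size() && B < Nodes.size());
    Nodes[A].Links.push_back(B);
    Nodes[B].Links.push_back(A);
  }

  // Step 1: activating a node puts it on the todo list.
  void activate(std::size_t N, int Bias) {
    assert(N < Nodes.size());
    Nodes[N].Bias = Bias;
    enqueue(N);
  }

  // Step 3: only nodes on the todo list are recomputed.
  // Returns the number of node updates performed.
  std::size_t iterate() {
    std::size_t Updates = 0;
    while (!Todo.empty()) {
      std::size_t N = Todo.back();
      Todo.pop_back();
      Queued[N] = false;
      ++Updates;
      // Step 2: a node that changed value wakes up its neighbors.
      if (update(N))
        for (std::size_t M : Nodes[N].Links)
          enqueue(M);
    }
    return Updates;
  }

private:
  struct Node {
    int Bias = 0;                   // external preference of this node
    int Value = -1;                 // current output, -1 or +1
    std::vector<std::size_t> Links; // neighboring nodes
  };

  // Recomputes one node; returns true if its output changed.
  bool update(std::size_t N) {
    int Sum = Nodes[N].Bias;
    for (std::size_t M : Nodes[N].Links)
      Sum += Nodes[M].Value;
    int New = Sum > 0 ? 1 : -1;
    bool Changed = New != Nodes[N].Value;
    Nodes[N].Value = New;
    return Changed;
  }

  void enqueue(std::size_t N) {
    if (!Queued[N]) {
      Queued[N] = true;
      Todo.push_back(N);
    }
  }

  std::vector<Node> Nodes;
  std::vector<bool> Queued;
  std::vector<std::size_t> Todo;
};

// Builds a linear chain, activates one end, and reports how many node
// updates iterate() needed.
std::size_t updatesForChain(std::size_t Length) {
  assert(Length > 0);
  ToyNetwork Net(Length);
  for (std::size_t I = 0; I + 1 < Length; ++I)
    Net.link(I, I + 1);
  Net.activate(0, 3);
  return Net.iterate();
}
```

In this toy, the work done by `iterate()` depends on how many nodes actually change, not on the length of the chain, which is the property the commit is after.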
* [AArch64] Break the dependency between FP and SP when possible. | Chad Rosier | 2016-03-14 | 2 | -3/+14
  When the SP is not changed because of realignment/VLAs etc., we restore the SP by using the previous value of SP and not the FP. Breaking the dependency will help in cases when the epilog of a callee is close to the epilog of the caller; for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of the callee.
  http://reviews.llvm.org/D18060
  Patch by Aditya Kumar and Evandro Menezes.
  llvm-svn: 263458
* [Mips] Fix -Wunused-private-field warning after r263444. | Chad Rosier | 2016-03-14 | 3 | -7/+6
  llvm-svn: 263454
* [DAG] use !isUndef() ; NFCI | Sanjay Patel | 2016-03-14 | 10 | -50/+43
  llvm-svn: 263453
* [DAG] use isUndef() ; NFCI | Sanjay Patel | 2016-03-14 | 14 | -196/+179
  llvm-svn: 263448
* AMDGPU/SI: Handle wait states required for DPP instructions | Tom Stellard | 2016-03-14 | 2 | -0/+63
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17543
  llvm-svn: 263447
* [x86, AVX] replace masked load with full vector load when possible | Sanjay Patel | 2016-03-14 | 1 | -7/+25
  Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections.
  1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
  2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
  Differential Revision: http://reviews.llvm.org/D18094
  llvm-svn: 263446
* [mips] MIPS32R6 compact branch support | Daniel Sanders | 2016-03-14 | 13 | -56/+358
  Summary: MIPSR6 introduces a class of branches called compact branches. Unlike the traditional MIPS branches which have a delay slot, compact branches do not have a delay slot. The instruction following the compact branch is only executed if the branch is not taken and must not be a branch.
  It works by generating compact branches for MIPS32R6 when the delay slot filler cannot fill a delay slot. It then inspects the generated code for forbidden slot hazards (a compact branch with an adjacent branch or other CTI) and inserts nops to clear each hazard.
  Patch by Simon Dardis.
  Reviewers: vkalintiris, dsanders
  Subscribers: MatzeB, dsanders, llvm-commits
  Differential Revision: http://reviews.llvm.org/D16353
  llvm-svn: 263444
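The hazard-clearing step described in this commit can be sketched over a toy instruction stream. This is an illustration only, assuming a single straight-line sequence; `Op`, `isCTI` and `clearForbiddenSlots` are hypothetical names and do not correspond to the MIPS backend's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for a toy instruction stream.
enum class Op { Plain, Nop, DelayBranch, CompactBranch };

// Both branch flavors count as control transfer instructions (CTIs).
bool isCTI(Op O) { return O == Op::DelayBranch || O == Op::CompactBranch; }

// Inserts a nop after every compact branch that is directly followed by a
// CTI, so that no CTI sits in a forbidden slot.
std::vector<Op> clearForbiddenSlots(const std::vector<Op> &In) {
  std::vector<Op> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    Out.push_back(In[I]);
    bool NextIsCTI = I + 1 < In.size() && isCTI(In[I + 1]);
    if (In[I] == Op::CompactBranch && NextIsCTI)
      Out.push_back(Op::Nop);
  }
  // Sanity check: this pass only ever adds instructions.
  assert(Out.size() >= In.size());
  return Out;
}
```

A compact branch followed by an ordinary instruction is left alone, since the forbidden slot only matters when the next instruction is itself a CTI.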
* AMDGPU/SI: Incomplete shader binaries need to finish execution at the end | Marek Olsak | 2016-03-14 | 2 | -8/+24
  Reviewers: tstellarAMD, arsenm
  Subscribers: arsenm
  Differential Revision: http://reviews.llvm.org/D18058
  llvm-svn: 263441
* AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence | Nicolai Haehnle | 2016-03-14 | 1 | -0/+13
  Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values.
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18101
  llvm-svn: 263440
* [mips] Use range-based for loops. NFC. | Vasileios Kalintiris | 2016-03-14 | 1 | -7/+5
  llvm-svn: 263438
* Revert "Recommitted r261633 "Supporting all entities declared in lexical scope in LLVM debug info." After fixing PR26715 at r263379." | Benjamin Kramer | 2016-03-14 | 6 | -199/+51
  This reverts commit r263424. Breaks self-host.
  llvm-svn: 263437
* [SystemZ] Avoid LER on z13 due to partial register dependencies | Ulrich Weigand | 2016-03-14 | 2 | -1/+6
  On the z13, it turns out to be more efficient to access a full floating-point register than just the upper half (as done e.g. by the LE and LER instructions). Current code already takes this into account when loading from memory by using the LDE instruction in place of LE. However, we still generate LER, which shows the same performance issues as LE in certain circumstances.
  This patch changes the back-end to emit LDR instead of LER to implement FP32 register-to-register copies on z13.
  llvm-svn: 263431
* [CVP] Replace nonnegative with positive, per Philip's request. NFC. | Chad Rosier | 2016-03-14 | 1 | -2/+2
  llvm-svn: 263430
* [mips] Fix an issue with long double when function roundl is defined | Zlatko Buljan | 2016-03-14 | 1 | -2/+2
  Differential Revision: http://reviews.llvm.org/D17760
  llvm-svn: 263428
* [mips] Range check uimm16_64 | Daniel Sanders | 2016-03-14 | 1 | -7/+8
  Summary:
  Reviewers: vkalintiris
  Subscribers: llvm-commits, dsanders
  Differential Revision: http://reviews.llvm.org/D17725
  llvm-svn: 263427
* Recommitted r261633 "Supporting all entities declared in lexical scope in LLVM debug info." After fixing PR26715 at r263379. | Amjad Aboud | 2016-03-14 | 6 | -51/+199
  llvm-svn: 263424
* [mips] Simplify ordering of range checked immediate classes. | Daniel Sanders | 2016-03-14 | 1 | -29/+49
  Summary: With the addition of checks to ensure that operands have a strict ordering it has become tricky to manage the order in the way I originally intended. This patch linearizes the ordering which simplifies the implementation but requires an order that is arbitrary in places. Here are some examples:
    * uimm4 < uimm5 < uimm6
    * simm4 < uimm4 < simm5 < uimm5
    * uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6
      The term 'superset' starts to break down here since the *_plus* classes are not true supersets of uimm5 (but they are still subsets of uimm6).
    * uimm5 < uimm5_64, and uimm5 < vsplat_uimm5
      This is entirely arbitrary. We need an ordering and what we pick is unimportant since only one is possible for a given mnemonic.
  Reviewers: vkalintiris
  Subscribers: llvm-commits, dsanders
  Differential Revision: http://reviews.llvm.org/D17723
  llvm-svn: 263423
* [AMDGPU] Assembler: SOP* instruction fixes | Nikolay Haustov | 2016-03-14 | 2 | -27/+40
  s_bitset0_b64 and s_bitset1_b64 have a 32-bit src0, not 64-bit.
  s_rfe_b64 has just one destination operand and no source.
  Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
  Add s_memrealtime test and change comments in smem.s to follow common style.
  Change test for s_memtime to use non-zero register to make it really test encoding.
  Add tests for s_buffer_load*.
  Add tests for SOPC instructions (same for SI and VI).
  Differential Revision: http://reviews.llvm.org/D18040
  llvm-svn: 263420
* [mips] Range check uimm6_lsl2. | Daniel Sanders | 2016-03-14 | 4 | -43/+32
  Summary:
  Reviewers: vkalintiris
  Subscribers: dsanders, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17291
  llvm-svn: 263419
* Try to fix build of WebAssemblyRegStackify.cpp on Windows | Hans Wennborg | 2016-03-14 | 1 | -1/+1
  It's failing to build on VS2015 with:
  C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\lib\Target\WebAssembly\WebAssemblyRegStackify.cpp(520): error C2668: 'llvm::make_reverse_iterator': ambiguous call to overloaded function
  C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(217): note: could be 'std::reverse_iterator<llvm::MachineBasicBlock::iterator> llvm::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(IteratorTy)'
    with [ IteratorTy=llvm::MachineInstrBundleIterator<llvm::MachineInstr> ]
  C:\b\depot_tools\win_toolchain\vs_files\391bbf1220d3edcd3cc3fccdb56224181e3b13a7\win_sdk\bin\..\..\VC\include\xutility(1217): note: or 'std::reverse_iterator<llvm::MachineBasicBlock::iterator> std::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(_RanIt)' [found using argument-dependent lookup]
    with [ _RanIt=llvm::MachineInstrBundleIterator<llvm::MachineInstr> ]
  I don't have VS2015 locally at the moment, but hopefully this will help.
  llvm-svn: 263418
* AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX. | Igor Breger | 2016-03-14 | 1 | -22/+23
  implemented by delena
  Differential Revision: http://reviews.llvm.org/D18054
  llvm-svn: 263417
* [AMDGPU] AsmParser: Factor out parseRegister. NFC. | Valery Pykhtin | 2016-03-14 | 1 | -24/+40
  llvm-svn: 263411
* [AMDGPU] AsmParser: refactor post push_back vector access. NFC. | Valery Pykhtin | 2016-03-14 | 1 | -6/+5
  llvm-svn: 263409
* [CodeView] Consistently handle overly large symbol names | David Majnemer | 2016-03-14 | 1 | -15/+17
  Overly large symbol names weren't correctly handled for leaf function records.
  llvm-svn: 263408
* [AMDGPU] AsmParser: remove redundant isReg checks. NFC. | Valery Pykhtin | 2016-03-14 | 1 | -7/+7
  llvm-svn: 263407
* [CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative | Haicheng Wu | 2016-03-14 | 1 | -0/+41
  The motivating example is this:
    for (j = n; j > 1; j = i) {
      i = j / 2;
    }
  The signed division can safely be changed to an unsigned division (j is known to be larger than 1 from the loop guard) and later turned into a single shift without considering the sign bit.
  llvm-svn: 263406
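The equivalence this commit relies on can be checked with a small sketch. It is written in plain C++ rather than LLVM IR, and `halveSigned`, `halveAsShift` and `countIterations` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The division exactly as written in the motivating loop.
int32_t halveSigned(int32_t J) { return J / 2; }

// What the division can become once J is known to be nonnegative: an
// unsigned divide by two, which is a single logical shift right.
int32_t halveAsShift(int32_t J) {
  assert(J >= 0 && "only valid when the operand is known nonnegative");
  return static_cast<int32_t>(static_cast<uint32_t>(J) >> 1);
}

// Runs the loop from the commit message with the given halving function and
// returns the number of iterations executed.
int countIterations(int32_t N, int32_t (*Halve)(int32_t)) {
  int Trips = 0;
  int32_t I = 0;
  for (int32_t J = N; J > 1; J = I) {
    I = Halve(J);
    ++Trips;
  }
  return Trips;
}
```

Both versions run the loop identically because the loop guard keeps `J` above 1. For negative values the two would differ, since signed division truncates toward zero (for example, `-7 / 2` is `-3`), which is why the transform needs the nonnegativity proof.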
* Add facility to add/remove/check attribute on function and arguments. | Amaury Sechet | 2016-03-14 | 1 | -18/+20
  Summary: This comes from work to make attributes manipulable via the C API.
  Reviewers: gottesmm, hfinkel, baldrick, echristo, tejohnson
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D18128
  llvm-svn: 263404
* Remove PreserveNames template parameter from IRBuilder | Mehdi Amini | 2016-03-13 | 9 | -22/+21
  This reapplies r263258, which was reverted in r263321 because of issues on the Clang side.
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 263393
* [X86][SSE41] Avoid variable blend for constant v8i16 shifts | Simon Pilgrim | 2016-03-13 | 1 | -2/+7
  The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
  llvm-svn: 263383
* Fixed DIBuilder to verify that same imported entity will not be added twice to the "imports" list of the DICompileUnit. | Amjad Aboud | 2016-03-13 | 1 | -1/+6
  Differential Revision: http://reviews.llvm.org/D17884
  llvm-svn: 263379
* [CodeView] Truncate display names | David Majnemer | 2016-03-13 | 1 | -5/+8
  Fundamentally, the length of a variable or function name is bound by the maximum size of a record: 0xffff. However, the name doesn't live in a vacuum; other data is associated with the name, lowering the bound further.
  We would naively attempt to emit the name, causing us to assert because the record would no longer fit in 16 bits. Instead, truncate the name but preserve as much as we can.
  While I have tested this locally, I've decided to not commit it due to the test's size.
  N.B. While this behavior is undesirable, it is better than MSVC's behavior. They seem to truncate to ~4000 characters.
  llvm-svn: 263378
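The bound described in this commit can be sketched as simple arithmetic. Only the 0xffff limit comes from the commit message; `truncateForRecord`, the `FixedBytes` parameter and the one-byte terminator are assumptions made for illustration and do not reflect the actual CodeView record layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Maximum record size, as stated in the commit message.
constexpr std::size_t MaxRecordSize = 0xffff;

// Truncates Name so that it fits in one record next to the other data.
// FixedBytes is a hypothetical count of the non-name bytes in the record;
// reserving one extra byte for a terminator is also an assumption.
std::string truncateForRecord(const std::string &Name, std::size_t FixedBytes) {
  assert(FixedBytes < MaxRecordSize && "no room left for a name at all");
  std::size_t Room = MaxRecordSize - FixedBytes - 1;
  // Keep as much of the name as the remaining space allows.
  return Name.substr(0, std::min(Name.size(), Room));
}
```

Short names pass through unchanged, and long names keep their longest prefix that still fits, matching the "preserve as much as we can" intent.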
* [Bitcode] Make writeComdats less strange | David Majnemer | 2016-03-13 | 1 | -2/+2
  It had a weird artificial limitation on the write side: the comdat name couldn't be bigger than 2**16. However, the reader had no such limitation. Make the reader and the writer agree.
  llvm-svn: 263377
* ConstantFoldInstruction: avoid wasted calls to ConstantFoldConstantExpression | Fiona Glaser | 2016-03-13 | 1 | -5/+5
  Check to see if all operands are constant before calling simplify on them so that we don't perform wasted simplifications.
  llvm-svn: 263374
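The check-before-simplify ordering in this commit can be sketched with a toy folder. Nothing here is LLVM's API: `ToyOperand`, `simplify`, `foldAdd` and `simplifyCallsFor` are hypothetical, and the call counter exists only to make the saved work visible.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical operand: either a known constant or something opaque.
struct ToyOperand {
  bool IsConstant;
  int Value;
};

// Stand-in for a per-operand simplification that costs real work.
int simplify(const ToyOperand &Op, int &Calls) {
  assert(Op.IsConstant && "only constants are ever simplified");
  ++Calls;
  return Op.Value;
}

// Folds an addition of all operands. The cheap all-constant check runs
// first, so no operand is simplified when folding cannot succeed anyway.
bool foldAdd(const std::vector<ToyOperand> &Ops, int &Result, int &Calls) {
  bool AllConstant =
      std::all_of(Ops.begin(), Ops.end(),
                  [](const ToyOperand &O) { return O.IsConstant; });
  if (!AllConstant)
    return false;
  int Sum = 0;
  for (const ToyOperand &O : Ops)
    Sum += simplify(O, Calls);
  Result = Sum;
  return true;
}

// Reports how many simplify() calls a fold attempt performed.
int simplifyCallsFor(const std::vector<ToyOperand> &Ops) {
  int Result = 0, Calls = 0;
  foldAdd(Ops, Result, Calls);
  return Calls;
}
```

With the check hoisted, a single non-constant operand means zero simplification calls rather than one call per constant operand seen before the failure.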
* APFloat: Fix ilogb for denormals | Matt Arsenault | 2016-03-13 | 1 | -0/+18
  llvm-svn: 263370
* APFloat: Fix scalbn handling of denormals | Matt Arsenault | 2016-03-13 | 1 | -12/+14
  This was incorrect for denormals, and also failed on longer exponent ranges.
  llvm-svn: 263369
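For reference, the results that `ilogb` and `scalbn` are expected to give on a denormal input can be shown with the C++ standard library. This sketch uses `std::ilogb` and `std::scalbn` on `float`, not APFloat, and assumes an IEEE-754 binary32 `float` with denormals enabled (no flush-to-zero mode); the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE-754 floats");

// Unbiased exponent of the smallest positive denormal float.
int exponentOfSmallestDenormal() {
  float Tiny = std::numeric_limits<float>::denorm_min();
  assert(Tiny > 0.0f && "denormals must not be flushed to zero");
  return std::ilogb(Tiny);
}

// Scales the smallest positive denormal float by 2^Exp.
float scaleSmallestDenormal(int Exp) {
  return std::scalbn(std::numeric_limits<float>::denorm_min(), Exp);
}
```

The smallest binary32 denormal is 2^-149, so its exponent is -149 even though the encoded exponent field is zero, and scaling it by 2^149 gives exactly 1.0.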
* [X86] Remove many operands that represent memory stores from outs to ins. | Craig Topper | 2016-03-13 | 6 | -34/+34
  These operands are the registers and immediates that specify the memory address, not the memory itself, thus they are inputs.
  llvm-svn: 263354
* Use templated version of unwrap instead of casts in the Core.cpp. NFC | Amaury Sechet | 2016-03-13 | 1 | -6/+7
  llvm-svn: 263349
* Move LLVMConstStructInContext so that declaration and definition order match. NFC | Amaury Sechet | 2016-03-13 | 1 | -8/+8
  llvm-svn: 263348
* fix documentation comments; NFC | Sanjay Patel | 2016-03-12 | 1 | -7/+2
  llvm-svn: 263346
* remove unnecessary cast; NFC | Sanjay Patel | 2016-03-12 | 1 | -4/+3
  llvm-svn: 263343
* fix formatting; NFC | Sanjay Patel | 2016-03-12 | 1 | -12/+12
  llvm-svn: 263342
* use range loops; NFCI | Sanjay Patel | 2016-03-12 | 1 | -21/+18
  llvm-svn: 263341
* [x86, InstCombine] delete x86 SSE2 masked store with zero mask | Sanjay Patel | 2016-03-12 | 1 | -0/+6
  This follows up on the related AVX instruction transforms, but this one is too strange to do anything more with. Intel's behavioral description of this instruction in its Software Developer's Manual is tragi-comic.
  llvm-svn: 263340
* Fix for PR 26378 | Nemanja Ivanovic | 2016-03-12 | 1 | -0/+6
  This patch corresponds to review: http://reviews.llvm.org/D17712
  We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This caused duplicate definition asserts when the pass is reused on the module (i.e. with -compile-twice or in JIT contexts).
  llvm-svn: 263338
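The failure mode described in this commit, state that survives between runs of a reused pass, can be sketched with a toy class. `ToyPrinter`, its members and `compileTwice` are hypothetical and only model the idea of clearing per-module state during initialization; they are not PPCAsmPrinter's interface.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a printer that keeps a per-module table of entries.
class ToyPrinter {
public:
  // Runs the printer once over a module's symbols. Returns false if a
  // symbol was defined twice, which models the duplicate definition assert.
  bool run(const std::vector<std::string> &Symbols) {
    initialize();
    for (const std::string &Sym : Symbols)
      if (!defineEntry(Sym))
        return false;
    return true;
  }

private:
  // Clearing the table here is what makes the object safe to reuse.
  void initialize() { TOC.clear(); }

  bool defineEntry(const std::string &Sym) {
    assert(!Sym.empty());
    return TOC.insert(Sym).second;
  }

  std::set<std::string> TOC;
};

// Models compiling the same module twice with one printer instance.
bool compileTwice(const std::vector<std::string> &Symbols) {
  ToyPrinter Printer;
  return Printer.run(Symbols) && Printer.run(Symbols);
}
```

Without the `TOC.clear()` call in `initialize()`, the second `run()` would see every symbol from the first run as a duplicate.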
* Make gc relocates more strongly typed; NFC | Sanjoy Das | 2016-03-12 | 1 | -10/+13
  Don't use a `Value *` where we can use a stronger `GCRelocateInst *` type.
  llvm-svn: 263327