summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] use m_APInt to allow demanded bits analysis on splat constantsSanjay Patel2017-02-091-10/+13
| | | | llvm-svn: 294628
* [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of ↵Konstantin Zhuravlyov2017-02-092-68/+31
| | | | | | | | using switch statement Differential Revision: https://reviews.llvm.org/D29741 llvm-svn: 294627
* Drop graph_ prefixDaniel Berlin2017-02-096-10/+10
| | | | llvm-svn: 294621
* GraphTraits: Add range versions of graph traits functions (graph_nodes, ↵Daniel Berlin2017-02-096-48/+23
| | | | | | | | | | | | | | | | | | | | | | | | | graph_children, inverse_graph_nodes, inverse_graph_children). Summary: Convert all obvious node_begin/node_end and child_begin/child_end pairs to range based for. Sending for review in case someone has a good idea how to make graph_children able to be inferred. It looks like it would require changing GraphTraits to be two argument or something. I presume inference does not happen because it would have to check every GraphTraits in the world to see if the noderef types matched. Note: This change was 3-staged with clang as well, which uses Dominators/etc from LLVM. Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith Subscribers: arsenm, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D29767 llvm-svn: 294620
* [JumpThreading] Thread through guardsSanjoy Das2017-02-092-15/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch allows JumpThreading also thread through guards. Virtually, guard(cond) is equivalent to the following construction: if (cond) { do something } else {deoptimize} Yet it is not explicitly converted into IFs before lowering. This patch enables early threading through guards in simple cases. Currently it covers the following situation: if (cond1) { // code A } else { // code B } // code C guard(cond2) // code D If there is implication cond1 => cond2 or !cond1 => cond2, we can transform this construction into the following: if (cond1) { // code A // code C } else { // code B // code C guard(cond2) } // code D Thus, removing the guard from one of execution branches. Patch by Max Kazantsev! Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29620 llvm-svn: 294617
* Object: pad out BSD archive members to 8-bytesSaleem Abdulrasool2017-02-091-1/+11
| | | | | | | | | | | | | ld64 requires its archive members to be 8-byte aligned for 64-bit content and 4-byte aligned for 32-bit content. Opt for the larger alignment requirement. This ensures that ld64 can consume archives generated by llvm-ar. Thanks to Kevin Enderby for the hint about the ld64/cctools behaviours! Resolves PR28361! llvm-svn: 294615
* Convert to for-range loop. NFCI.Simon Pilgrim2017-02-091-3/+3
| | | | llvm-svn: 294610
* [SelectionDAG] Fix bugs in inverted condition splitting code.Geoff Berry2017-02-091-4/+6
| | | | | | | | | | | | | | | | | | Summary: Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by Mikael Holmen. Handle non-canonicalized xor not operation correctly (was assuming operand 0 was always the non-constant operand) and check that the negated condition is also in the same block as the original and/or instruction (as is done for and/or operands already) before proceeding with optimization. Reviewers: bogner, MatzeB, qcolombet Subscribers: mcrosier, uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D29680 llvm-svn: 294605
* [X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode.Simon Pilgrim2017-02-092-2/+1
| | | | llvm-svn: 294596
* Object: add a comment explaining a divergenceSaleem Abdulrasool2017-02-091-0/+2
| | | | | | | Add a note about the reason for the divergence from the specification for ld64. Addresses post-commit review comments from Davide. NFC. llvm-svn: 294594
* Revert: "[Stack Protection] Add diagnostic information for why stack ↵David Bozier2017-02-091-41/+1
| | | | | | | | protection was applied to a function" this reverts revision r294590 as it broke some buildbots. llvm-svn: 294593
* [Stack Protection] Add diagnostic information for why stack protection was ↵David Bozier2017-02-091-1/+41
| | | | | | | | | | | | | | applied to a function Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function. This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled. Patch by: James Henderson Differential Revision: https://reviews.llvm.org/D29023 llvm-svn: 294590
* Make it possible to set SHF_LINK_ORDER explicitly.Rafael Espindola2017-02-093-6/+29
| | | | | | | This will make it possible to add support for gcing user metadata (asan for example). llvm-svn: 294589
* [X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast ↵Pierre Gousseau2017-02-091-1/+1
| | | | | | | | | | math. In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants. Differential Revision: https://reviews.llvm.org/D29756 llvm-svn: 294588
* [ARM] GlobalISel: Lower single precision FP argsDiana Picus2017-02-091-2/+8
| | | | | | | Both for aapcscc and aapcs_vfpcc. We currently filter out soft float targets because we don't support libcalls yet. llvm-svn: 294584
* [DAGCombiner] Support non-zero offset in load combineArtur Pilipenko2017-02-091-3/+8
| | | | | | | | | | | | | | | Enable folding patterns which load the value from non-zero offset: i8 *a = ... i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) => i32 val = *((i32*)(a+4)) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29394 llvm-svn: 294582
* [X86][SSE] Attempt to break register dependencies during lowerBuildVectorSimon Pilgrim2017-02-091-11/+40
| | | | | | | | | | | | LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register. This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD. On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.) Differential Revision: https://reviews.llvm.org/D29720 llvm-svn: 294581
* LVI: Fix use-of-uninitialized-value after r294463Vitaly Buka2017-02-091-1/+1
| | | | | | BlockValueStack can be reallocated making reference e invalid. llvm-svn: 294572
* [X86] Remove the HLE feature flag.Craig Topper2017-02-096-16/+5
| | | | | | We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562
* [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by ↵Craig Topper2017-02-093-16/+1
| | | | | | | | any instructions and not tested. If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing. llvm-svn: 294561
* [X86] Clzero intrinsic and its addition under znver1Craig Topper2017-02-097-2/+52
| | | | | | | | | | | | | | | | | This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558
* Object: pad BSD ar string table to 4-bytesSaleem Abdulrasool2017-02-091-0/+4
| | | | | | | | | cctools would pad the string table to a sizeof(int32_t) (explicitly printed out by cctools rather than 4). This adjusts the string table to make it more compatible with cctools, but is insufficient to make ld64 happy. llvm-svn: 294557
* SwiftCC: swifterror register cannot be as the base registerArnold Schwaighofer2017-02-092-18/+18
| | | | | | | | | | | | | Functions that have a dynamic alloca require a base register which is defined to be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be the same register. Use a different callee save register for swifterror instead: X21 on AArch64 R8 on ARM rdar://30433803 llvm-svn: 294551
* [MC] Fix some Clang-tidy modernize and Include What You Use warnings in ↵Eugene Zelenko2017-02-095-64/+99
| | | | | | | | SubtargetFeature; other minor fixes (NFC). Same changes in files affected by reduced SubtargetFeature.h dependencies. llvm-svn: 294548
* Reapply r294356 ("Keep track of spilled variables in LiveDebugValues").Wolfgang Pieb2017-02-081-9/+141
| | | | | | | Was reverted with r294447 due to undefined behavior with negative offsets in DBG_VALUE instructions. llvm-svn: 294532
* GlobalISel: legalize G_FPOW to a libcall on AArch64.Tim Northover2017-02-082-5/+16
| | | | | | There's no instruction to implement it. llvm-svn: 294531
* GlobalISel: translate @llvm.pow intrinsic to G_FPOW.Tim Northover2017-02-081-0/+6
| | | | | | | | It'll usually be immediately legalized back to a libcall, but occasionally something can be done with it so we'd just as well enable that flexibility from the start. llvm-svn: 294530
* [sancov] using comdat only when it is enabledMike Aizatsky2017-02-081-3/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D29733 llvm-svn: 294529
* [ARM/AArch ISel] SwiftCC: First parameters that are marked swiftself are not ↵Arnold Schwaighofer2017-02-082-2/+4
| | | | | | | | | | | | | | | | | | | 'this returns' We mark X0 as preserved by a call that passes the returned parameter. x0 = ... fun(x0) // no implicit def of x0 This no longer is valid if we pass the parameter in a different register then the returned value as is the case with a swiftself parameter (passed in x20). x20 = ... fun(x20) // there should be an implict def of x8 rdar://30425845 llvm-svn: 294527
* [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-0810-40/+49
| | | | | | minor fixes (NFC). llvm-svn: 294526
* [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-081-6/+7
| | | | | | This is preparation to reduce MC headers dependencies. llvm-svn: 294525
* Revert r294437 as it broke an asan buildbot.Amara Emerson2017-02-081-16/+16
| | | | llvm-svn: 294523
* GlobalISel: select G_[SU]MULH on AArch64.Tim Northover2017-02-081-0/+28
| | | | | | | Hopefully this'll be nuked by tablegen pretty soon, but until then it's reasonably important for supporting C++ operator new[]. llvm-svn: 294520
* GlobalISel: expand mul-with-overflow into mul-hi on AArch64.Tim Northover2017-02-082-1/+31
| | | | | | | | AArch64 has specific instructions to multiply two numbers at double the width and produce the high part of the result. These can be used to implement LLVM's mul.with.overflow instructions fairly simply. Helps with C++ operator new[]. llvm-svn: 294519
* [AMDGPU] Implement register pressure callbacksStanislav Mekhanoshin2017-02-082-0/+37
| | | | | | | | | | | | | | | | | | | Implement getRegPressureLimit and getRegPressureSetLimit callbacks in SIRegisterInfo. This makes standard converge scheduler to behave almost the same as GCNScheduler, sometime slightly better sometimes a bit worse. In gerenal that is also possible to switch GCNScheduler to use these callbacks instead of getMaxWaves(), which also makes GCNScheduler slightly better on some tests and slightly worse on another. A big win is behavior with converge scheduler. Note, these are used not only by scheduling, but in places like MachineLICM. Differential Revision: https://reviews.llvm.org/D29700 llvm-svn: 294518
* [sancov] specifying comdat for sancov constructorsMike Aizatsky2017-02-081-1/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D29662 llvm-svn: 294517
* [CMake] Fix `is_llvm_target_library` and support out-of-order componentsChris Bieneman2017-02-081-0/+6
| | | | | | | | | | | | Summary: This patch is required by D28855, and enables us to rely on CMake's ability to handle out of order target dependencies. Reviewers: mgorny, chapuni, bryant Subscribers: llvm-commits, jgosnell Differential Revision: https://reviews.llvm.org/D28869 llvm-svn: 294514
* ThinLTOBitcodeWriter: Strip debug info from merged module.Peter Collingbourne2017-02-081-0/+2
| | | | | | | | | | This module will contain nothing but vtable definitions and (soon) available_externally function definitions, so there is no point in keeping debug info in the module. Differential Revision: https://reviews.llvm.org/D28913 llvm-svn: 294511
* [Loop Vectorizer] Cost-based decision for vectorization form of memory ↵Elena Demikhovsky2017-02-081-269/+485
| | | | | | | | | | | | | | | | | | instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503
* [DebugInfo] Rename EmitDebugValue to EmitDebugThreadLocal (NFC)Simon Dardis2017-02-084-4/+4
| | | | | | | As pointed out by David Blaikie in the post commit review of r292624, EmitDebugValue should be called EmitDebugThreadLocal. llvm-svn: 294500
* [DAGCombiner] NFC. Mark ByteProvider accessors as constArtur Pilipenko2017-02-081-2/+2
| | | | llvm-svn: 294494
* GlobalISel: select G_VASTART on iOS AArch64.Tim Northover2017-02-084-1/+67
| | | | | | | The AAPCS ABI is substantially more complicated so that's coming in a separate patch. For now we can generate correct code for iOS though. llvm-svn: 294493
* GlobalISel: translate @llvm.va_start intrinsic.Tim Northover2017-02-083-0/+22
| | | | | | | Because we need to preserve the memory access being performed we need a separate instruction to represent this. llvm-svn: 294492
* NVPTX: Extract mem intrinsic expansions into utilitiesMatt Arsenault2017-02-083-213/+243
| | | | llvm-svn: 294490
* [Reassociate] Remove an unused argument. NFC.Chad Rosier2017-02-081-5/+4
| | | | llvm-svn: 294489
* Fix bitcode upgrade for DIGlobalVariables with a var: field.Adrian Prantl2017-02-081-2/+1
| | | | | | | | | | | | | | | This is a follow-up to https://reviews.llvm.org/D29349. It turns out that NeedUpgradeToDIGlobalVariableExpression is always necessary when we encountered a version==0 record because it may always be referenced via a list of globals in a DICompileUnit. My tests weren't good enough to catch this though. To trigger this case, we need much older bitcode produced by LLVM around version 3.7. <rdar://problem/30404262> Differential Revision: https://reviews.llvm.org/D29693 llvm-svn: 294488
* [Hexagon] Fix decoding conflict between A2_zxtb and A4_extKrzysztof Parzyszek2017-02-082-1/+3
| | | | llvm-svn: 294472
* [mips] MUL macro variationsSimon Dardis2017-02-084-3/+179
| | | | | | | | | | | | | | [mips] MUL macro variations Adds support for MUL macro variations. Patch by: Srdjan Obucina Reviewers: zoran.jovanovic, vkalintiris, dsanders, sdardis, obucina, seanbruno Differential Revision: https://reviews.llvm.org/D16807 llvm-svn: 294471
* [InstCombine] add local name for repeated calls; NFC Sanjay Patel2017-02-081-6/+4
| | | | llvm-svn: 294470
* LVI: Add a per-value worklist limit to LazyValueInfo.Daniel Berlin2017-02-081-6/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: LVI is now depth first, which is optimal for iteration strategy in terms of work per call. However, the way the results get cached means it can still go very badly N^2 or worse right now. The overdefined cache is per-block, because LVI wants to try to get different results for the same name in different blocks (IE solve the problem PredicateInfo solves). This means even if we discover a value is overdefined after going very deep, it doesn't cache this information, causing it to end up trying to rediscover it again and again. The same is true for values along the way. In practice, overdefined anywhere should mean overdefined everywhere (this is how, for example, SCCP works). Until we get around to reworking the overdefined cache, we need to limit the worklist size we process. Note that permanently reverting the DFS strategy exploration seems the wrong strategy (temporarily seems fine if we really want). BFS is clearly the wrong approach, it just gets luckier on some testcases. It's also very hard to design an effective throttle for BFS. For DFS, the throttle is directly related to the depth of the CFG. So really deep CFGs will get cutoff, smaller ones will not. As the CFG simplifies, you get better results. In BFS, the limit is it's related to the fan-out times average block size, which is harder to reason about or make good choices for. Bug being filed about the overdefined cache, but it will require major surgery to fix it (plumbing predicateinfo through CVP or LVI). Note: I did not make this number configurable because i'm not sure anyone really needs to tweak this knob. We run CVP 3 times. On the testcases i have the slow ones happen in the middle, where CVP is doing cleanup work other things are effective at. Over the course of 3 runs, we don't see to have any real loss of performance. I haven't gotten a minimized testcase yet, but just imagine in your head a testcase where, going *up* the CFG, you have branches, one of which leads 50000 blocks deep, and the other, to something where the answer is overdefined immediately. BFS would discover the overdefined faster than DFS, but do more work to do so. In practice, the right answer is "once DFS discovers overdefined for a value, stop trying to get more info about that value" (and so, DFS would normally cache the overdefined results for every value it passed through in those 50k blocks, and never do that work again. But it don't, because of the naming problem) Reviewers: chandlerc, djasper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29715 llvm-svn: 294463
OpenPOWER on IntegriCloud