summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Add missing includesDavid Blaikie2018-02-021-0/+3
| | | | llvm-svn: 324040
* SplitKit: Fix liveness recomputation in some remat cases.Matthias Braun2018-02-022-11/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Example situation: ``` BB0: %0 = ... use %0 ; ... condjump BB1 jmp BB2 BB1: %0 = ... ; rematerialized def from above (from earlier split step) jmp BB2 BB2: ; ... use %0 ``` %0 will have a live interval with 3 value numbers (for the BB0, BB1 and BB2 parts). Now SplitKit tries and succeeds in rematerializing the value number in BB2 (This only works because it is a secondary split so SplitKit is can trace this back to a single original def). We need to recompute all live ranges affected by a value number that we rematerialize. The case that we missed before is that when the value that is rematerialized is at a join (Phi VNI) then we also have to recompute liveness for the predecessor VNIs. rdar://35699130 Differential Revision: https://reviews.llvm.org/D42667 llvm-svn: 324039
* [X86] Separate the call to LowerVectorAllZeroTest from EmitTest. NFCICraig Topper2018-02-011-17/+21
| | | | | | | | | | Every instruction that has the word TEST in its name seems to have been buried into EmitTest. But that code is largely concerned with trying to reuse the flags from instructions that update flags in a pretty normal way. PTEST/TESTP/KTEST do not update flags in a normal way. They only update Z and C and the C flag update is non-standard. Rather than try to bend EmitTest's already complex logic to accomodate this, just move the call up to LowerSETCC and replicate the few pre-checks that are needed. While there add a FIXME for using the C flag for checking for all 1s which we definitely couldn't do from EmitTEST. llvm-svn: 324029
* [GlobalISel][Legalizer] Relax a legalization loop detecting assert.Amara Emerson2018-02-011-1/+3
| | | | | | | Legalizing vectors may keep the element type the same but change the number of elements, the assert didn't take this into account. llvm-svn: 324028
* [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 instSanjay Patel2018-02-011-5/+13
| | | | | | | | | | | | | | | | This is the enhancement suggested in D42536 to fix a shortcoming in regular InstCombine's canEvaluate* functionality. When we have multiple uses of a value, but they're all in one instruction, we can allow that expression to be narrowed or widened for the same cost as a single-use value. AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified away if the operands are the same value; add becomes shl; shifts with a variable shift amount aren't handled. Differential Revision: https://reviews.llvm.org/D42739 llvm-svn: 324014
* [PowerPC] Tell VSX swap removal that scalar conversions are lane-sensitiveNemanja Ivanovic2018-02-011-0/+2
| | | | | | | | This is a rather non-controversial change. We were missing these instructions from the list of instructions that are lane-sensitive. These two put the result into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068. llvm-svn: 324005
* [DAGCombiner] When folding (insert_subvector undef, (bitcast ↵Craig Topper2018-02-011-1/+3
| | | | | | | | | | | | (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits. Fixes PR36199 Differential Revision: https://reviews.llvm.org/D42809 llvm-svn: 324002
* [GlobalISel] Fix assert failure when legalizing non-power-2 loads.Amara Emerson2018-02-011-3/+6
| | | | | | | Until we support extending loads properly we're going to fall back for these. We already handle stores in the same way, so this is just being consistent. llvm-svn: 324001
* [CodeView] Class record member counts should include base classes and ...Brock Wyma2018-02-011-0/+2
| | | | | | | | | Increment the field list member count for base classes and virtual base classes. Differential Revision: https://reviews.llvm.org/D41874 llvm-svn: 324000
* [ADT] Replace sys::MemoryFence with standard atomics.Benjamin Kramer2018-02-011-6/+2
| | | | | | | | This is a bit faster in theory, in practice it's cold code that's only active in !NDEBUG, so it probably doesn't make a difference. This is one of the last users of our homegrown Atomic.h. llvm-svn: 323999
* [AArch64] remove bogus comment; NFCSanjay Patel2018-02-011-3/+0
| | | | | | | I added this comment with D42323, but as discussed in D42806, the architecture does the right thing for denorms. We don't even need the select on 0.0 here? llvm-svn: 323996
* Remove CallGraphTraits and use equivalent methods in GraphTraitsEaswaran Raman2018-02-011-2/+2
| | | | | | | | | | | | | | | | Summary: D42698 adds child_edge_{begin|end} and children_edges to GraphTraits which are used here. The reason for this change is to make it easy to use count propagation on ModulesummaryIndex. As it stands, CallGraphTraits is in Analysis while ModuleSummaryIndex is in IR. Reviewers: davidxl, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42703 llvm-svn: 323994
* [MachineCopyPropagation] Extend pass to do COPY source forwardingGeoff Berry2018-02-012-1/+210
| | | | | | | | | | | | | | | | | | | | | | Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991
* AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the ↵Changpeng Fang2018-02-012-4/+10
| | | | | | | | | | | | target has UnpackedD16VMem feature. Reviewers: Matt and Brian Differential Revision: https://reviews.llvm.org/D42548 llvm-svn: 323988
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - add support for scaling ↵Simon Pilgrim2018-02-011-5/+44
| | | | | | | | | | index vectors This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles. Differential Revision: https://reviews.llvm.org/D42487 llvm-svn: 323987
* [X86] Remove custom lowering vXi1 extending loads and truncating stores.Craig Topper2018-02-011-174/+0
| | | | | | | | | | | | | | Summary: Now that v2i1/v4i1 are legal without VLX. And v32i1 is legalized by splitting rather than widening. And isVectorLoadExtDesirable returns false for vXi1. It appears this handling is dead because the operations simply don't exist. Reviewers: RKSimon, zvi, guyblank, delena, spatel Reviewed By: delena Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D42781 llvm-svn: 323983
* [X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just ↵Craig Topper2018-02-012-2/+14
| | | | | | | | | | | | | | | | | | | | | before isel to enable test instruction matching Summary: EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass. This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel. In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working. Reviewers: spatel, RKSimon, niravd, deadalnix Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42764 llvm-svn: 323982
* [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)Sanjay Patel2018-02-012-11/+28
| | | | | | | | | | | | | | | | | | | | | | | As shown in the example in PR34994: https://bugs.llvm.org/show_bug.cgi?id=34994 ...we can return a very wrong answer (inf instead of 0.0) for square root when using a reciprocal square root estimate instruction. Here, I've conditionalized the filtering out of denorms based on the function having "denormal-fp-math"="ieee" in its attributes. The other options for this attribute are 'preserve-sign' and 'positive-zero'. So we don't generate this extra code by default with just '-ffast-math' (because then there's no denormal attribute string at all), but it works if you specify '-ffast-math -fdenormal-fp-math=ieee' from clang. As noted in the review, there may be other problems in clang that affect the results depending on platform (Linux x86 at least), but this should allow creating the desired codegen. Differential Revision: https://reviews.llvm.org/D42323 llvm-svn: 323981
* [SelectionDAG] Fix UpdateChains handling of TokenFactorsNirav Dave2018-02-011-1/+2
| | | | | | | | | | | | | | | | | | | Summary: In Instruction Selection UpdateChains replaces all matched Nodes' chain references including interior token factors and deletes them. This may allow nodes which depend on these interior nodes but are not part of the set of matched nodes to be left with a dangling dependence. Avoid this by doing the replacement for matched non-TokenFactor nodes. Fixes PR36164. Reviewers: jonpa, RKSimon, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42754 llvm-svn: 323977
* [ARM] FullFP16 LowerReturn FixSjoerd Meijer2018-02-011-2/+2
| | | | | | | | | | Commit r323512 introduced an optimisation in LowerReturn for half-precision return values. A missing check caused a crash when the return value is "undef" (i.e. a node that has no operands). Differential Revision: https://reviews.llvm.org/D42743 llvm-svn: 323968
* Revert commit rL323951David Green2018-02-011-6/+2
| | | | | | | Looks like it's causing timeouts out on at least ppc64le buildbots. llvm-svn: 323959
* [mips] Include EVA instructions in Std2MicroMips mapping tablesAleksandar Beserminji2018-02-014-23/+38
| | | | | | | | | This patch includes EVA instructions in the Std2MicroMips mapping tables, which is required for direct object emission. Differential Revision: https://reviews.llvm.org/D41771 llvm-svn: 323958
* [AArch64][NFC] Make all ProcResource definitions include their SchedModel.Clement Courbet2018-02-013-42/+35
| | | | | | | This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all other targets. llvm-svn: 323955
* [ARM] Add support for unpredictable MVN instructions.Yvan Roux2018-02-011-2/+8
| | | | | | | | | | | | | | | This fixes bugzilla 33011 https://bugs.llvm.org/show_bug.cgi?id=33011 Defines bits {19-16} as zero or unpredictable as specified by the ARM ARM in sections A8.8.116 and A8.8.117. It fixes also the usage of PC register as destination register for MVN register-shifted register version as specified in A8.8.117. Differential Revision: https://reviews.llvm.org/D41905 llvm-svn: 323954
* [InstCombine] Allow common type conversions to i8/i16/i32David Green2018-02-011-2/+6
| | | | | | | | | | | This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 323951
* Test commit: Fix a comment.Yvan Roux2018-02-011-1/+1
| | | | llvm-svn: 323947
* [LSR] Don't force bases of foldable formulae to the final type.Mikael Holmen2018-02-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index. Patch by: Bevin Hansson Reviewers: atrick, qcolombet, sanjoy Reviewed By: qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42103 llvm-svn: 323946
* [XRay][compiler-rt+llvm] Update XRay register stashing semanticsDean Michael Berris2018-02-013-2/+17
| | | | | | | | | | | | | | | | | | | | | | Summary: This change expands the amount of registers stashed by the entry and `__xray_CustomEvent` trampolines. We've found that since the `__xray_CustomEvent` trampoline calls can show up in situations where the scratch registers are being used, and since we don't typically want to affect the code-gen around the disabled `__xray_customevent(...)` intrinsic calls, that we need to save and restore the state of even the scratch registers in the handling of these custom events. Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc Reviewed By: echristo Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D40894 llvm-svn: 323940
* [MC] Fix assembler infinite loop on EH table using LEB padding.Rafael Espindola2018-02-011-2/+6
| | | | | | | | | | | | | Fix the infinite loop reported in PR35809. It can occur with GCC-style EH table assembly, where the compiler relies on the assembler to calculate the offsets in the EH table. Also see https://sourceware.org/bugzilla/show_bug.cgi?id=4029 for the equivalent issue in the GNU assembler. Patch by Ryan Prichard! llvm-svn: 323934
* [GlobalOpt] Improve common case efficiency of static global initializer ↵Amara Emerson2018-01-311-2/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | evaluation For very, very large global initializers which can be statically evaluated, the code would create vectors of temporary Constants, modifying them in place, before committing the resulting Constant aggregate to the global's initializer value. This had effectively O(n^2) complexity in the size of the global initializer and would cause memory and non-termination issues compiling some workloads. This change performs the static initializer evaluation and creation in batches, once for each global in the evaluated IR memory. The existing code is maintained as a last resort when the initializers are more complex than simple values in a large aggregate. This should theoretically by NFC, no test as the example case is massive. The existing test cases pass with this, as well as the llvm test suite. To give an example, consider the following C++ code adapted from the clang regression tests: struct S { int n = 10; int m = 2 * n; S(int a) : n(a) {} }; template<typename T> struct U { T *r = &q; T q = 42; U *p = this; }; U<S> e; The global static constructor for 'e' will need to initialize 'r' and 'p' of the outer struct, while also initializing the inner 'q' structs 'n' and 'm' members. This batch algorithm will simply use general CommitValueTo() method to handle the complex nested S struct initialization of 'q', before processing the outermost members in a single batch. Using CommitValueTo() to handle member in the outer struct is inefficient when the struct/array is very large as we end up creating and destroy constant arrays for each initialization. For the above case, we expect the following IR to be generated: %struct.U = type { %struct.S*, %struct.S, %struct.U* } %struct.S = type { i32, i32 } @e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e, i64 0, i32 1), %struct.S { i32 42, i32 84 }, %struct.U* @e } The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex constant expression, while the other two elements of @e are "simple". Differential Revision: https://reviews.llvm.org/D42612 llvm-svn: 323933
* DAG: Fix not truncating when promoting bswap/bitreverseMatt Arsenault2018-01-311-1/+2
| | | | | | | These need to convert back to the original type, like any other promotion. llvm-svn: 323932
* Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using ↵Evgeniy Stepanov2018-01-311-20/+0
| | | | | | | | | | bit-operations" Miscompiles code. Testcase pending. This reverts commit r323869. llvm-svn: 323929
* Utils: Fix DomTree update for entry blockMatt Arsenault2018-01-311-5/+14
| | | | | | | If SplitBlockPredecessors was used on a function entry block, it wouldn't update the dominator tree. llvm-svn: 323928
* AMDGPU: Fix missing SCC def from s_xor_b64_termMatt Arsenault2018-01-311-0/+1
| | | | llvm-svn: 323927
* [AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf ↵Amjad Aboud2018-01-311-4/+12
| | | | | | | | | | | node correctly. This covers the case where TruncInst leaf node is a constant expression. See PR36121 for more details. Differential Revision: https://reviews.llvm.org/D42622 llvm-svn: 323926
* [X86] Make the type checks in detectAVX512USatPattern more robustCraig Topper2018-01-311-7/+9
| | | | | | | | | | This code currently uses isSimple and getSizeInBits in an attempt to prune types. But isSimple will return true for any type that any target supports natively. I don't think that's a good way to prune types. I also don't think the dest element type checks are very robust since we didn't do an isSimple check on the dest type. This patch adds a check for the input type being legal to the one caller that didn't already check that. Then we explicitly check the element types for the destination are i8, i16, or i32 Differential Revision: https://reviews.llvm.org/D42706 llvm-svn: 323924
* Followup on Proposal to move MIR physical register namespace to '$' sigil.Puyan Lotfi2018-01-312-10/+19
| | | | | | | | | | | | Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922
* [Hexagon] Rename HexagonISelLowering::getNode to getInstr, NFCKrzysztof Parzyszek2018-01-313-56/+56
| | | | llvm-svn: 323916
* [x86] Make the retpoline thunk insertion a machine function pass.Chandler Carruth2018-01-312-51/+86
| | | | | | | | | | | | | | | | | | | | Summary: This removes the need for a machine module pass using some deeply questionable hacks. This should address PR36123 which is a case where in full LTO the memory usage of a machine module pass actually ended up being significant. We should revert this on trunk as soon as we understand and fix the memory usage issue, but we should include this in any backports of retpolines themselves. Reviewers: echristo, MatzeB Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42726 llvm-svn: 323915
* [Hexagon] Implement HVX codegen for vector shiftsKrzysztof Parzyszek2018-01-314-67/+54
| | | | llvm-svn: 323914
* [Hexagon] Handle ANY_EXTEND_VECTOR_INREG in loweringKrzysztof Parzyszek2018-01-311-0/+4
| | | | llvm-svn: 323912
* [Hexagon] Handle SETCC on vector pairs in loweringKrzysztof Parzyszek2018-01-312-1/+16
| | | | llvm-svn: 323911
* [GlobalOpt] Fix exponential compile-time with selects.Eli Friedman2018-01-311-17/+16
| | | | | | | | | | | | | | | If you have a long chain of select instructions created from something like `int* p = &g; if (foo()) p += 4; if (foo2()) p += 4;` etc., a naive recursive visitor will recursively visit each select twice, which is O(2^N) in the number of select instructions. Use the visited set to cut off recursion in this case. (No testcase because this doesn't actually change the behavior, just the time.) Differential Revision: https://reviews.llvm.org/D42451 llvm-svn: 323910
* AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9Marek Olsak2018-01-311-22/+31
| | | | | | | | | | | | | | | | Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}Marek Olsak2018-01-316-8/+78
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908
* [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPsMarek Olsak2018-01-311-0/+2
| | | | | | | | | | | | | | Summary: !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things happen. Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D42744 llvm-svn: 323907
* [MachineOutliner] Freeze registers in new functionsGeoff Berry2018-01-311-0/+2
| | | | | | | | | | | | | | | Summary: Call MRI.freezeReservedRegs() on functions created during outlining so that calls to isReserved() by the verifier called after this pass won't assert. Reviewers: MatzeB, qcolombet, paquette Subscribers: mcrosier, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42749 llvm-svn: 323905
* [WebAssembly] MC: Remove unused code for handling of wasm globalsSam Clegg2018-01-314-139/+13
| | | | | | | | | | | | | | | For now, we are not using wasm globals, except for modeling of the stack points. Alos, factor out common struct WasmGlobalType, which matches the name for that tuple in the Wasm spec and rename methods to "isBindingGlobal", "isTypeGlobal" to avoid ambiguity. Patch by Nicholas Wilson! Differential Revision: https://reviews.llvm.org/D42750 llvm-svn: 323901
* [WebAssembly] MC: Resolve aliases when creating provisional table entriesSam Clegg2018-01-311-31/+25
| | | | | | | | | | | | | | | | This change is useful for the upcoming addition of the symbol table (D41954) since in that world aliases for given function all share the same function index. This change does not effect lld because it essentially ignores the wasm "table". The table exists only to the wasm objects will validate and disassembly meaningfully. Patch by Nicholas Wilson! Differential Revision: https://reviews.llvm.org/D42095 llvm-svn: 323900
* [X86] Generate testl instruction through truncates.Amaury Sechet2018-01-311-59/+44
| | | | | | | | | | | | | | | Summary: This was introduced in D42646 but ended up being reverted because the original implementation was buggy. Depends on D42646 Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42741 llvm-svn: 323899
OpenPOWER on IntegriCloud