path: root/llvm/lib
Commit message (Author, Date; Files changed, -deleted/+added)
* [InlineCost] Find more free binary operations (Haicheng Wu, 2017-12-22; 1 file, -40/+18)
  Currently, the inline cost model considers a binary operator free only if both its operands are constants. Some simple cases are missing, such as a + 0 or a - a. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without going through simplifyInstruction(), to get rid of the constant restriction. Thus, visitAnd() and visitOr() are no longer needed.
  Differential Revision: https://reviews.llvm.org/D41494
  llvm-svn: 321366
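  A minimal sketch of the described check (assuming LLVM's 2017-era InstructionSimplify API; isFreeBinaryOp is a hypothetical helper, not the actual patch):

    #include "llvm/Analysis/InstructionSimplify.h"
    #include "llvm/IR/InstrTypes.h"

    using namespace llvm;

    // Treat a binary operator as free if the whole operation folds away,
    // e.g. 'a + 0' or 'a - a', instead of requiring both operands to be
    // constants. SimplifyBinOp returns a non-null Value in that case.
    static bool isFreeBinaryOp(BinaryOperator &I, const SimplifyQuery &SQ) {
      return SimplifyBinOp(I.getOpcode(), I.getOperand(0), I.getOperand(1),
                           SQ) != nullptr;
    }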
* [DAG] Integrate findBaseOffset address analyses to BaseIndexOffset. NFCI. (Nirav Dave, 2017-12-22; 2 files, -70/+27)
  BaseIndexOffset supersedes the findBaseOffset analysis, save only for Constant Pool addresses. Migrate the analysis to BaseIndexOffset.
  llvm-svn: 321364
* [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers (Dmitry Preobrazhensky, 2017-12-22; 7 files, -43/+197)
  See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561
  This patch also affects the implementation of SGPR and VGPR registers, though the changes are cosmetic.
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D41437
  llvm-svn: 321359
* [ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32 (Diana Picus, 2017-12-22; 3 files, -0/+30)
  Mark conversions between pointers and 32-bit scalars as legal, map them to the GPR and select to a simple COPY.
  llvm-svn: 321356
* [ARM GlobalISel] Support pointer constants (Diana Picus, 2017-12-22; 2 files, -1/+11)
  Pointer constants are pretty rare, since we usually represent them as integer constants and then cast to pointer. One notable exception is the null pointer constant, which is represented directly as a G_CONSTANT 0 with pointer type. Mark it as legal and make sure it is selected like any other integer constant.
  llvm-svn: 321354
* [DAGCombine] Revert r321259 (Sam Parker, 2017-12-22; 1 file, -26/+0)
  Reverts "Improve ReduceLoadWidth for SRL"; the patch is causing an issue on the PPC64 BE sanitizer.
  llvm-svn: 321349
* Rewrite the cached map used for locating the most precise DIE among inlined subroutines for a given address (Chandler Carruth, 2017-12-22; 1 file, -26/+360)
  This is essentially the hot path of llvm-symbolizer when extracting inlined frames during symbolization. Previously, we would read every subprogram and every inlined subroutine, building a std::map across the entire PC space to the best DIE, and then do only a handful of queries as we symbolized a backtrace. A huge fraction of the time was spent building the map itself.
  This patch changes it to a two-level system. First, we just build a map from PC-interval to DWARF subprograms. These are required to be disjoint, so constructing this is pretty easy. Second, we build a map *just* for the inlined subroutines within the subprogram containing the query address. This allows us to look at far fewer DIEs and build a *much* smaller set of cached maps in the llvm-symbolizer case, where only a few addresses get symbolized during the entire run.
  It also builds both interval maps in a very different way. It constructs a single flat vector of pairs that maps from offset -> index. The indices point into collections of DIE objects, but can also be "tombstones" (-1) to mark gaps. In the case of subprograms, this mostly just simplifies the data structure a bit. For inlined subroutines, because we carefully split them as we build the map, we end up in many cases having no holes and not having to store both start and stop offsets.
  Finally, the PC ranges for the inlined subroutines are compressed into 32 bits by making them relative to the base PC of the outer subprogram. This means that if you have a single function body with over 2gb of executable code in it, we will stop mapping addresses past the first 2gb of that function into inlined subroutines and just give you the subprogram. This doesn't seem like a problem. ;]
  All of this combines to make llvm-symbolizer *well* over 2x faster for symbolizing backtraces out of LLVM's unittests. Death-test heavy unit tests are running >2x faster. I'm still going to look at completely disabling symbolization there, but figured while I had a good benchmark we should make symbolization a bit better.
  Sadly, the logic to build the flat interval map for the inlined subroutines is fairly complex. I'm not super happy about this and welcome any simplifying suggestions.
  Huge thanks to Dave Blaikie, who helped walk me through the various things I needed to do in DWARF to make this work.
  Differential Revision: https://reviews.llvm.org/D40987
  llvm-svn: 321345
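  A self-contained sketch of the flat interval-map idea (illustrative only, using std-library types rather than the actual DWARF code):

    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct FlatIntervalMap {
      // Sorted by start offset; an index of -1 is a tombstone marking a gap.
      std::vector<std::pair<uint32_t, int32_t>> Entries;

      // Find the DIE index covering Offset, or -1 if it falls in a gap.
      int32_t lookup(uint32_t Offset) const {
        auto It = std::upper_bound(
            Entries.begin(), Entries.end(), Offset,
            [](uint32_t O, const std::pair<uint32_t, int32_t> &E) {
              return O < E.first;
            });
        return It == Entries.begin() ? -1 : std::prev(It)->second;
      }
    };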
* [X86] Add missing initialization for the HasPREFETCHWT1 subtarget variable (Craig Topper, 2017-12-22; 1 file, -0/+1)
  llvm-svn: 321340
* [X86] Enable PRFCHW feature on KNL/KNM and all CPUs inherited from Broadwell (Craig Topper, 2017-12-22; 1 file, -2/+4)
  llvm-svn: 321336
* [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions (Craig Topper, 2017-12-22; 6 files, -7/+33)
  Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.
  The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.
  Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint, we would have levels for the non-write hint, which is why we enable the sse prefetch instructions.
  I believe this behavior is consistent with gcc. I've updated prefetch.ll to test all of these combinations.
  llvm-svn: 321335
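  At the source level the write hints look roughly like this (a sketch assuming Clang/GCC's <xmmintrin.h> _MM_HINT_ET0/_MM_HINT_ET1 definitions and the corresponding -mprfchw/-mprefetchwt1 feature flags):

    #include <xmmintrin.h>

    void warm_for_write(const void *p) {
      // With prfchw, _MM_HINT_ET0 can lower to prefetchw; with
      // prefetchwt1, _MM_HINT_ET1 can lower to prefetchwt1.
      _mm_prefetch((const char *)p, _MM_HINT_ET0);
      _mm_prefetch((const char *)p, _MM_HINT_ET1);
    }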
* [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1 (Craig Topper, 2017-12-22; 1 file, -15/+15)
  llvm-svn: 321334
* [Inliner] Restrict soft-float inlining penalty (Eli Friedman, 2017-12-22; 3 files, -32/+23)
  The penalty is currently getting applied in a bunch of places where it doesn't make sense, like bitcasts (which are free) and calls (which were getting the call penalty applied twice). Instead, just apply the penalty to binary operators and floating-point casts.
  While I'm here, also fix getFPOpCost() to do the right thing in more cases, so we don't have to dig into function attributes.
  Differential Revision: https://reviews.llvm.org/D41522
  llvm-svn: 321332
* Add hasProfileData() to check if a function has profile data. NFC. (Easwaran Raman, 2017-12-22; 9 files, -16/+15)
  Summary: This replaces calls to getEntryCount().hasValue() with hasProfileData(), which does the same thing. This refactoring is useful to do before adding synthetic function entry counts, but it is also a useful cleanup IMO even otherwise. I have used hasProfileData instead of hasRealProfileData, as David had earlier suggested, since I think profile implies "real" and I use the phrase "synthetic entry count" and not "synthetic profile count", but I am fine calling it hasRealProfileData if you prefer.
  Reviewers: davidxl, silvas
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41461
  llvm-svn: 321331
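  A usage sketch of the new predicate (assuming the 2017-era llvm::Function API; shouldUseProfile is a hypothetical caller):

    #include "llvm/IR/Function.h"

    static bool shouldUseProfile(const llvm::Function &F) {
      // Before this change call sites spelled: F.getEntryCount().hasValue()
      return F.hasProfileData();
    }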
* [DWARF] Fix formatting bug with r321295. This fixes a MIPS buildbot failure. (Wolfgang Pieb, 2017-12-22; 1 file, -1/+1)
  llvm-svn: 321330
* [X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index (Craig Topper, 2017-12-21; 1 file, -2/+2)
  We have a better range of instructions we can use if we can fill with the i1 value rather than zeroing.
  llvm-svn: 321315
* [ModRefInfo] Add must alias info to ModRefInfo (Alina Sbirlea, 2017-12-21; 5 files, -27/+166)
  Summary: Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases. Shift existing Mod/Ref/ModRef values to include an additional most significant bit. Update wrappers that modify ModRefInfo values to reflect the change.
  Notes:
  * ModRefInfo::Must is almost entirely cleared in the AAResults methods; the remaining changes try to preserve it.
  * Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA).
  * GlobalsModRef already declares a bit whose meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() adjusts the most significant bit so correctness is preserved, but the Must info is lost.
  * There are cases where ModRefInfo::Must is not set, e.g. two calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location.
  Reviewers: dberlin, hfinkel, george.burgess.iv
  Subscribers: llvm-commits, sanjoy
  Differential Revision: https://reviews.llvm.org/D38862
  llvm-svn: 321309
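  An illustrative model of the described encoding (names and exact bit layout are assumptions, not LLVM's definition): the most significant bit is set whenever must-alias is NOT known, so clearing it preserves the "must" information.

    enum class ModRefInfoModel {
      Must = 0,                       // known must-alias, no access
      MustRef = 1,                    // must-alias + reads
      MustMod = 2,                    // must-alias + writes
      MustModRef = MustRef | MustMod, // must-alias + reads and writes
      NoModRef = 4,                   // new MSB set: may-alias, no access
      Ref = NoModRef | MustRef,
      Mod = NoModRef | MustMod,
      ModRef = Ref | Mod,
    };

    // "Must" info is present exactly when the most significant bit is clear.
    inline bool isMustSet(ModRefInfoModel MRI) {
      return (static_cast<int>(MRI) &
              static_cast<int>(ModRefInfoModel::NoModRef)) == 0;
    }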
* [X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to 512-bit if we have VLX (Craig Topper, 2017-12-21; 1 file, -1/+2)
  This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage.
  llvm-svn: 321303
* [DWARF v5] Rework of string offsets table reader (Wolfgang Pieb, 2017-12-21; 2 files, -66/+206)
  Reorganizes the DWARF consumer to derive the string offsets table contribution's format from the contribution header instead of (incorrectly) from the unit's format.
  Reviewers: JDevliegehere, aprantl
  Differential Revision: https://reviews.llvm.org/D41146
  llvm-svn: 321295
* [X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX (Craig Topper, 2017-12-21; 1 file, -1/+3)
  We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX. We still use 512 bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors; i32 and i64 element sizes get the best shuffle support.
  llvm-svn: 321291
* [X86][SSE] Split large PAVGB/PAVGW vectors to legal widths (Simon Pilgrim, 2017-12-21; 1 file, -15/+31)
  Patch to allow detectAVGPattern() to handle vectors larger than the legal size (128 for SSE2, 256 for AVX2, 512 for AVX512BW), splitting the vectors accordingly.
  Differential Revision: https://reviews.llvm.org/D41440
  llvm-svn: 321288
* [YAML] Fix UTF-8 handling (Francis Visoiu Mistrih, 2017-12-21; 1 file, -1/+6)
  Previous YAML quoting patches broke UTF-8 printing in YAML: see https://reviews.llvm.org/D41290#961801.
  Differential Revision: https://reviews.llvm.org/D41490
  llvm-svn: 321283
* [DAGCombiner] Remove (xor (xor x, c1), c2) -> (xor x, (xor c1, c2)) fold. NFCI. (Simon Pilgrim, 2017-12-21; 1 file, -15/+0)
  More general cases are already handled by constant canonicalization and then the ReassociateOps call at line 5327.
  llvm-svn: 321280
* [DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1|c2) combine to work on non-splat vectors (Simon Pilgrim, 2017-12-21; 1 file, -10/+10)
  The knownbits_mask_or_shuffle_uitofp change is interesting: shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS.
  llvm-svn: 321279
* [PowerPC] Fix parest build failure in SPEC2017 (Tony Jiang, 2017-12-21; 1 file, -5/+6)
  The build failure was caused by an assertion in pre-legalization DAGCombine:
  Combining: t6: ppcf128 = uint_to_fp t5 ... into: t20: f32 = PPCISD::FCFIDUS t19
  which is clearly wrong, since ppcf128 is a different type from f32 and we cannot change a node's value type in DAGCombine. The fix is to not handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and to leave them for downstream legalization and expansion into small legal types.
  Differential Revision: https://reviews.llvm.org/D41411
  llvm-svn: 321276
* [DAGCombiner] Generalize (and (or x, C), D) -> D iff (C & D) == D combine to work on non-splat vectors (Simon Pilgrim, 2017-12-21; 1 file, -4/+6)
  llvm-svn: 321275
* [DAGCombine] Improve ReduceLoadWidth for SRL (Sam Parker, 2017-12-21; 1 file, -0/+26)
  If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad, which queries whether the load is valid.
  Differential Revision: https://reviews.llvm.org/D41350
  llvm-svn: 321259
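  A source-level example of the pattern this targets (hypothetical code, not from the patch): the AND mask matches a byte, so the 32-bit load can shrink to a byte load at an adjusted address and the AND disappears.

    #include <cstdint>

    uint32_t third_byte(const uint32_t *p) {
      return (*p >> 16) & 0xFF; // DAG: (and (srl (load p), 16), 255)
    }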
* [Support] Remove MemoryBuffer::getNewUninitMemBuffer (Pavel Labath, 2017-12-21; 1 file, -8/+3)
  There is nothing useful that can be done with a read-only uninitialized buffer without const_casting its contents to initialize it. A better solution is to obtain a writable buffer (WritableMemoryBuffer::getNewUninitMemBuffer), and then convert it to a read-only buffer after initialization. All callers of this function have already been updated to do this, so this function is now unused.
  llvm-svn: 321257
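  A sketch of the replacement pattern (assuming the WritableMemoryBuffer API named above; makeFilledBuffer is a hypothetical caller):

    #include "llvm/Support/MemoryBuffer.h"
    #include <cstring>
    #include <memory>

    using namespace llvm;

    std::unique_ptr<MemoryBuffer> makeFilledBuffer(size_t Size) {
      std::unique_ptr<WritableMemoryBuffer> WB =
          WritableMemoryBuffer::getNewUninitMemBuffer(Size, "demo");
      if (!WB)
        return nullptr;
      std::memset(WB->getBufferStart(), 0, Size); // initialize the contents
      return std::move(WB); // usable as a read-only MemoryBuffer from here on
    }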
* [ARM] Armv8-R DFB instruction (Sam Parker, 2017-12-21; 5 files, -5/+21)
  Implement MC support for the Armv8-R 'Data Full Barrier' instruction.
  Differential Revision: https://reviews.llvm.org/D41430
  llvm-svn: 321256
* [X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W (Craig Topper, 2017-12-21; 1 file, -0/+4)
  PSHUFB has the ability to implicitly zero elements, which VPERMI2W can't do. So give it a chance to be used first.
  llvm-svn: 321251
* [X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have otherwise used two PSHUFBs ORed together (Craig Topper, 2017-12-21; 1 file, -13/+17)
  llvm-svn: 321249
* [X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are supported (Craig Topper, 2017-12-21; 1 file, -0/+4)
  llvm-svn: 321248
* [WebAssembly] Remove unneeded sub-directory (Sam Clegg, 2017-12-21; 2 files, -2/+2)
  This is the only wasm def file (and likely will be for the foreseeable future), so there is no need for a sub-directory.
  Differential Revision: https://reviews.llvm.org/D41476
  llvm-svn: 321246
* Revert "Expose a TargetMachine::getTargetTransformInfo function"Sanjoy Das2017-12-2128-71/+84
| | | | | | This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. llvm-svn: 321243
* [WebAssembly] Fix local references to weak aliases (Sam Clegg, 2017-12-21; 2 files, -11/+20)
  When weak aliases are used within the same translation unit, we need to be able to directly reference the alias, and not just the thing it aliases. We do this by defining both a wasm import and a wasm export in this case that resolve to a single Symbol. This change is a partial revert of rL314245. A corresponding lld change addresses the previous issues we had with this.
  See: https://github.com/WebAssembly/tool-conventions/issues/34
  Differential Revision: https://reviews.llvm.org/D41472
  llvm-svn: 321242
* [SimplifyCFG] Avoid behavior quadratic in the number of predecessors in instruction sinking (Michael Zolotukhin, 2017-12-21; 1 file, -14/+10)
  If a block has N predecessors, then the current algorithm will try to sink common code to this block N times (whenever we visit a predecessor). Every attempt to sink the common code includes going through all predecessors, so the complexity of the algorithm becomes O(N^2). With this patch we try to sink common code only when we visit the block itself. With this, the complexity goes down to O(N).
  As a side effect, the moment the code is sunk is slightly different than before (the order of simplifications has been changed), which is why I had to adjust two tests (note that neither of the tests is supposed to test SimplifyCFG):
  * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic the changes that the previous implementation of SimplifyCFG would do.
  * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common code sinking by a command line flag.
  llvm-svn: 321236
* Expose a TargetMachine::getTargetTransformInfo function (Sanjoy Das, 2017-12-21; 28 files, -84/+71)
  Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target.
  See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
  I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this.
  Reviewers: echristo, MatzeB, hfinkel
  Reviewed By: hfinkel
  Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41464
  llvm-svn: 321234
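  A usage sketch of the simplified interface (the signature is assumed from the description; getTTIFor is a hypothetical caller):

    #include "llvm/Analysis/TargetTransformInfo.h"
    #include "llvm/Target/TargetMachine.h"

    using namespace llvm;

    TargetTransformInfo getTTIFor(TargetMachine &TM, const Function &F) {
      // Previously this went through a std::function installed in
      // TargetIRAnalysis; now the TargetMachine can be asked directly.
      return TM.getTargetTransformInfo(F);
    }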
* Attempt to pacify 4.8.5 with makeArrayRef (Reid Kleckner, 2017-12-21; 1 file, -1/+1)
  llvm-svn: 321233
* [ARM] Optimize {s,u}{add,sub}.with.overflow (Joel Galenson, 2017-12-20; 2 files, -2/+72)
  The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG. This commit ports that code to the ARM backend.
  Differential revision: https://reviews.llvm.org/D35635
  llvm-svn: 321224
* [Hexagon] Use ArrayRef member functions instead of custom ones (Krzysztof Parzyszek, 2017-12-20; 1 file, -19/+10)
  llvm-svn: 321221
* [Hexagon] Allow construction of HVX vector predicates (Krzysztof Parzyszek, 2017-12-20; 6 files, -161/+363)
  Handle BUILD_VECTOR of boolean values.
  llvm-svn: 321220
* [Hexagon] Legalize vector elements to i32 in buildVector32/64 (Krzysztof Parzyszek, 2017-12-20; 1 file, -15/+22)
  llvm-svn: 321218
* bpf: add support for objdump -print-imm-hex (Yonghong Song, 2017-12-20; 1 file, -5/+5)
  Add support for 'objdump -print-imm-hex' for imm64, operand imm and branch target. If user programs encode immediate values as hex numbers, such an option will make it easy to correlate asm insns with source code. This option also makes it easy to correlate imm values with insn encoding.
  There is one changed behavior in this patch. The old way printed the 64-bit imm as u64: O << (uint64_t)Op.getImm(); and the new way is: O << formatImm(Op.getImm()); formatImm is defined in llvm/MC/MCInstPrinter.h as format_object<int64_t> formatImm(int64_t Value), so the new way prints the 64-bit imm as i64. If a 64-bit value has the highest bit set, the old way prints the value as a positive number and the new way prints it as a negative number. The new way is consistent with x86_64.
  For the code (see the test program): ... if (a == 0xABCDABCDabcdabcdULL) ... x86_64 objdump, with and without -print-imm-hex, looks like:
  48 b8 cd ab cd ab cd ab cd ab movabsq $-6067004223159161907, %rax
  48 b8 cd ab cd ab cd ab cd ab movabsq $-0x5432543254325433, %rax
  Signed-off-by: Yonghong Song <yhs@fb.com>
  llvm-svn: 321215
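  A standalone demonstration of the sign change (plain C++, not the BPF printer itself):

    #include <cstdint>
    #include <iostream>

    int main() {
      int64_t Imm = static_cast<int64_t>(0xABCDABCDabcdabcdULL);
      // Old way, printed as u64:
      std::cout << static_cast<uint64_t>(Imm) << "\n"; // 12379739850550389709
      // New way, printed as i64 (matches x86_64 objdump output):
      std::cout << Imm << "\n";                        // -6067004223159161907
    }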
* [X86] Refactor DomainReassignment pass to make the Closure class not store references to the main data structures of the pass itself (Craig Topper, 2017-12-20; 1 file, -78/+89)
  Multiple Closure objects can be created and stored for a single function. It's not a good idea to devote so many fields of it to storing pointers and references to global data structures of the pass. The closure class should only store the things needed to represent the closure itself.
  This patch refactors many of the methods of Closure to belong to the pass object and to pass around a reference to the current Closure. The Closure class gains a few simple methods to add instructions and edges, and to return iterators to edges and instructions.
  Differential Revision: https://reviews.llvm.org/D41327
  llvm-svn: 321213
* [ICP] Expose unconditional call promotion interface (Matthew Simpson, 2017-12-20; 1 file, -77/+178)
  This patch modifies the indirect call promotion utilities by exposing and using an unconditional call promotion interface. The unconditional promotion interface (i.e., call promotion without creating an if-then-else) can be used if it's known that an indirect call has only one possible callee. The existing conditional promotion interface uses this unconditional interface to promote an indirect call after it has been versioned and placed within the "then" block.
  A consequence of unconditional promotion is that the fix-up operations for phi nodes in the normal destination of invoke instructions are changed. This is necessary because the existing implementation assumed that an invoke had been versioned, creating a "merge" block where a return value bitcast could be placed. In the new implementation, the edge between a promoted invoke's parent block and its normal destination is split if needed to add a bitcast for the return value. If the invoke is also versioned, the phi node merging the return value of the promoted and original invoke instructions is placed in the "merge" block.
  Differential Revision: https://reviews.llvm.org/D40751
  llvm-svn: 321210
* [X86] Remove zext from vXi32 to vXi64 on indices of gather/scatter instructions if we can prove the pre-extended value is positive (Craig Topper, 2017-12-20; 1 file, -0/+17)
  Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0, we can use the implicit extension.
  llvm-svn: 321209
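  A sketch of the enabling check (assumed shape using SelectionDAG's SignBitIsZero; not the actual patch):

    #include "llvm/CodeGen/SelectionDAG.h"

    using namespace llvm;

    static SDValue stripRedundantIndexZExt(SDValue Index, SelectionDAG &DAG) {
      // If the pre-extended index is provably non-negative, the implicit
      // i32->i64 sign extension yields the same bits as the zext.
      if (Index.getOpcode() == ISD::ZERO_EXTEND &&
          DAG.SignBitIsZero(Index.getOperand(0)))
        return Index.getOperand(0);
      return Index;
    }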
* DAG: Tolerate non-MemSDNodes for OPC_RecordMemRef (Matt Arsenault, 2017-12-20; 1 file, -8/+24)
  When intrinsics are allowed to have mem operands, there are two ways this can happen. The first is an intrinsic that is marked as having a mem operand but is not handled by getTgtMemIntrinsic. The second way can occur even for intrinsics which do not have a mem operand. It seems the selector table does some kind of sorting based on the opcode, and the mem ref recording can happen in the same scope for intrinsics that both do and do not have mem refs.
  I haven't been able to figure out exactly why this happens (although it happens even with the matcher optimizations disabled). I'm not sure if it's worth trying to avoid hitting this for these nodes, since I think it's still reasonable to handle this in case getTgtMemIntrinsic is not implemented.
  llvm-svn: 321208
* [PowerPC] Added an assert to make sure that the MBBI iterator is valid (Stefan Pintilie, 2017-12-20; 1 file, -3/+3)
  The function createTailCallBranchInstr assumes that the iterator MBBI is valid. However, only one use of MBBI is guarded in the function. Fix this by adding an assert.
  Differential Revision: https://reviews.llvm.org/D41358
  llvm-svn: 321205
* [DAG] Fix condition on overlapping store check (Nirav Dave, 2017-12-20; 1 file, -2/+2)
  Prevent overlapping store elision when the overlapping store is pre-inc/dec, as the analysis is wrong in these cases.
  llvm-svn: 321204
* [hwasan] Implement -fsanitize-recover=hwaddress (Evgeniy Stepanov, 2017-12-20; 1 file, -7/+18)
  Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.
  Reviewers: kcc, alekseyshl
  Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya
  Differential Revision: https://reviews.llvm.org/D41417
  llvm-svn: 321203
* [AMDGPU, AsmParser] Enable the mnemonic spell corrector (Matt Arsenault, 2017-12-20; 1 file, -2/+15)
  Patch by Dmitry Venikov
  llvm-svn: 321202