path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [IntrinsicEmitter] Add overloaded type VecOfBitcastsToInt for SVE intrinsics | Kerry McLaughlin | 2019-10-02 | 1 | -1/+23
| Summary: This allows intrinsics such as the following to be defined:
| - declare <n x 4 x i32> @llvm.something.nxv4f32(<n x 4 x i32>, <n x 4 x i1>, <n x 4 x float>)
| ...where <n x 4 x i32> is derived from <n x 4 x float>, but the element needs bitcasting to int.
| Reviewers: c-rhodes, sdesmalen, rovka
| Reviewed By: c-rhodes
| Subscribers: tschuett, hiraditya, jdoerfert, llvm-commits, cfe-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68021
| llvm-svn: 373437
* Remove an unnecessary cast. NFC. | Jay Foad | 2019-10-02 | 1 | -3/+2
| llvm-svn: 373434
* [AMDGPU] Make printf lowering faster when there are no printfs | Jay Foad | 2019-10-02 | 1 | -16/+14
| Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead.
| Reviewers: rampitec, kzhuravl, alex-t, arsenm
| Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68145
| llvm-svn: 373433
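The idea in the printf lowering change can be sketched outside LLVM. The snippet below is only an illustration, assuming a toy module in which each function name maps to the instructions that call it; `Module`, `Instruction`, and both `findPrintfCalls...` helpers are hypothetical names, not LLVM API.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for IR entities; every name here is hypothetical.
struct Instruction {
  std::string callee;  // empty when the instruction is not a call
};

struct Module {
  std::vector<Instruction> instructions;
  // Maps a function name to the indices of the instructions that call it,
  // loosely mimicking a use list.
  std::unordered_map<std::string, std::vector<std::size_t>> users;

  void addInstruction(const std::string &callee) {
    if (!callee.empty())
      users[callee].push_back(instructions.size());
    instructions.push_back({callee});
  }
};

// Slow path: visit every instruction in the module.
std::vector<std::size_t> findPrintfCallsByScan(const Module &m) {
  std::vector<std::size_t> calls;
  for (std::size_t i = 0; i < m.instructions.size(); ++i)
    if (m.instructions[i].callee == "printf")
      calls.push_back(i);
  return calls;
}

// Fast path: look up printf once and walk only its users.
std::vector<std::size_t> findPrintfCallsByUsers(const Module &m) {
  auto it = m.users.find("printf");
  if (it == m.users.end())
    return {};  // no printf in the module: nothing to lower
  return it->second;
}
```

When the module contains no printf, the fast path costs a single lookup regardless of module size, which is the common case the commit targets.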
* Revert [GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing. | Florian Hahn | 2019-10-02 | 1 | -3/+7
| This breaks http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52310
| This reverts r373430 (git commit 70f70035484ba199a329f9f8d9bd67e37bc2b408)
| llvm-svn: 373432
* Switch lowering: omit range check for bit tests when default is unreachable (PR43129) | Hans Wennborg | 2019-10-02 | 1 | -17/+23
| This is modeled after the same functionality for jump tables, which was added in r357067.
| Differential revision: https://reviews.llvm.org/D68131
| llvm-svn: 373431
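To make the bit-test change concrete, here is a hand-written sketch of what a bit-test lowering of a switch looks like with and without the range check. The case values, the masks, and the two destinations are invented for illustration; this is not the code LLVM emits.

```cpp
#include <cstdint>

enum class Dest { A, B, Default };

// Hypothetical switch: cases 1, 3, 4, 6 go to A; cases 2, 5 go to B.
constexpr std::uint32_t kLowest = 1;        // smallest case value
constexpr std::uint32_t kRange = 5;         // largest case - smallest case
constexpr std::uint32_t kMaskA = 0b101101;  // bits 0, 2, 3, 5
constexpr std::uint32_t kMaskB = 0b010010;  // bits 1, 4

// Default is reachable: a range check must guard the shifts.
Dest destChecked(std::uint32_t x) {
  std::uint32_t idx = x - kLowest;  // wraps to a huge value when x < kLowest
  if (idx > kRange)
    return Dest::Default;
  if ((kMaskA >> idx) & 1u)
    return Dest::A;
  if ((kMaskB >> idx) & 1u)
    return Dest::B;
  return Dest::Default;
}

// Default is unreachable: the caller guarantees x is one of the case
// values, so the range check can be dropped and only the bit tests remain.
Dest destUnchecked(std::uint32_t x) {
  std::uint32_t idx = x - kLowest;
  if ((kMaskA >> idx) & 1u)
    return Dest::A;
  if ((kMaskB >> idx) & 1u)
    return Dest::B;
  return Dest::Default;  // not reached under the stated guarantee
}
```

For every value that is actually a case, both functions agree; the unchecked form simply saves one compare and branch.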
* [GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing. | Florian Hahn | 2019-10-02 | 1 | -7/+3
| removeUnreachableBlocks knows how to preserve the DomTree, so make use of it instead of re-computing the DT.
| Reviewers: davide, kuhar, brzycki
| Reviewed By: davide, kuhar
| Differential Revision: https://reviews.llvm.org/D68298
| llvm-svn: 373430
* [Local] Simplify function removeUnreachableBlocks() to avoid (re-)computation. | Florian Hahn | 2019-10-02 | 1 | -16/+10
| Two small changes in llvm::removeUnreachableBlocks() to avoid unnecessary (re-)computation. First, replace the use of count() with find(), which has better time complexity. Second, because we have already computed the set of dead blocks, replace the second loop over all basic blocks with a loop over only the already computed dead blocks. This simplifies the loop and avoids recomputation.
| Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
| Reviewers: efriedma, spatel, fhahn, xbolva00
| Reviewed By: fhahn, xbolva00
| Differential Revision: https://reviews.llvm.org/D68191
| llvm-svn: 373429
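The second change described above (reuse the dead-block set instead of walking every block again) can be illustrated on a toy CFG. This is a minimal sketch using plain index vectors; `CFG`, `collectDeadBlocks`, and `dropDeadEdges` are hypothetical names and do not mirror the real function's signature.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// A toy CFG: block i lists its successor block indices. Block 0 is the entry.
using CFG = std::vector<std::vector<std::size_t>>;

// Returns the blocks that are not reachable from the entry, in index order.
std::vector<std::size_t> collectDeadBlocks(const CFG &cfg) {
  std::unordered_set<std::size_t> reachable;
  std::vector<std::size_t> worklist;
  if (!cfg.empty()) {
    reachable.insert(0);
    worklist.push_back(0);
  }
  while (!worklist.empty()) {
    std::size_t block = worklist.back();
    worklist.pop_back();
    for (std::size_t succ : cfg[block])
      if (reachable.insert(succ).second)  // true only on first visit
        worklist.push_back(succ);
  }
  std::vector<std::size_t> dead;
  for (std::size_t b = 0; b < cfg.size(); ++b)
    if (reachable.find(b) == reachable.end())
      dead.push_back(b);
  return dead;
}

// Second phase: iterate over the dead blocks only, not over all blocks,
// and drop their outgoing edges. Returns the number of edges removed.
std::size_t dropDeadEdges(CFG &cfg, const std::vector<std::size_t> &dead) {
  std::size_t removed = 0;
  for (std::size_t b : dead) {
    removed += cfg[b].size();
    cfg[b].clear();
  }
  return removed;
}
```

With a CFG where blocks 2 and 3 only point at each other and at live code, `collectDeadBlocks` reports those two blocks and `dropDeadEdges` touches nothing else.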
* [llvm-lib] Detect duplicate input files | Rui Ueyama | 2019-10-02 | 1 | -0/+12
| Differential Revision: https://reviews.llvm.org/D68320
| llvm-svn: 373426
* [llvm-lib] Correctly handle .lib input files | Rui Ueyama | 2019-10-02 | 1 | -88/+132
| If archive files are passed as input files, llvm-lib needs to append the members of the input archive files to the output file. This patch implements that behavior.
| This patch splits an existing function into smaller functions. Effectively, the only new code is the `if (Magic == file_magic::archive) { ... }` part.
| Fixes https://bugs.llvm.org/show_bug.cgi?id=32674
| Differential Revision: https://reviews.llvm.org/D68204
| llvm-svn: 373424
* [X86] Add broadcast load folding patterns to the NoVLX compare patterns. | Craig Topper | 2019-10-02 | 1 | -16/+138
| These patterns use zmm registers for 128/256-bit compares when the VLX instructions aren't available. Previously we only supported registers, but as PR36191 notes, we can fold broadcast loads, though not regular loads.
| llvm-svn: 373423
* DebugInfo: Update support for detecting C++ language variants in debug info emission | David Blaikie | 2019-10-02 | 2 | -4/+5
| llvm-svn: 373420
* AMDGPU/GlobalISel: Use getIntrinsicID helper | Matt Arsenault | 2019-10-02 | 3 | -7/+7
| llvm-svn: 373417
* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX | Matt Arsenault | 2019-10-02 | 1 | -1/+7
| In principle this should behave as any other constant. However, eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR until eliminateFrameIndex is fixed.
| llvm-svn: 373415
* AMDGPU/GlobalISel: Private loads always use VGPRs | Matt Arsenault | 2019-10-02 | 1 | -4/+6
| llvm-svn: 373414
* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR | Matt Arsenault | 2019-10-02 | 1 | -4/+6
| This will be needed to support AGPR operations.
| llvm-svn: 373413
* AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values | Matt Arsenault | 2019-10-02 | 1 | -29/+35
| llvm-svn: 373412
* [AMDGPU] separate accounting for agprs | Stanislav Mekhanoshin | 2019-10-02 | 3 | -7/+52
| Account and report agprs separately on gfx908. Other targets do not change the reporting.
| Differential Revision: https://reviews.llvm.org/D68307
| llvm-svn: 373411
* [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 | Craig Topper | 2019-10-01 | 1 | -0/+34
| The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split.
| It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size.
| I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold.
| Differential Revision: https://reviews.llvm.org/D68247
| llvm-svn: 373408
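The condition "constant with sufficient sign bits to fit in vXi32" amounts to checking that every 64-bit index survives truncation to 32 bits followed by sign extension. A minimal sketch of that check, assuming plain integer vectors rather than SelectionDAG nodes (`fitsInSigned32` and `shrinkIndices` are hypothetical helpers):

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// True when sign-extending the low 32 bits of v reproduces v, i.e. the
// upper 33 bits of v are all copies of the sign bit.
bool fitsInSigned32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Shrinks a constant index vector to 32-bit elements, or returns nullopt
// if any element would change value when truncated.
std::optional<std::vector<std::int32_t>>
shrinkIndices(const std::vector<std::int64_t> &indices) {
  std::vector<std::int32_t> narrow;
  narrow.reserve(indices.size());
  for (std::int64_t v : indices) {
    if (!fitsInSigned32(v))
      return std::nullopt;
    narrow.push_back(static_cast<std::int32_t>(v));
  }
  return narrow;
}
```

Because the hardware sign-extends the narrow index again, the gather or scatter computes the same addresses once this check has passed.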
* AMDGPU: Fix an out of date assert in addressing FrameIndex | Changpeng Fang | 2019-10-01 | 1 | -3/+2
| Reviewers: arsenm
| Differential Revision: https://reviews.llvm.org/D67574
| llvm-svn: 373404
* Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops." | Craig Topper | 2019-10-01 | 1 | -79/+1
| This seems to be causing some performance regressions that I'm trying to investigate.
| One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR.
| llvm-svn: 373401
* [X86] convertToThreeAddress, make sure second operand of SUB32ri is really an immediate before calling getImm(). | Craig Topper | 2019-10-01 | 1 | -0/+4
| It might be a symbol instead. We can't fold those since we can't negate them.
| Similar for other SUB with immediates.
| Fixes PR43529.
| llvm-svn: 373397
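The guard added here can be pictured as "only negate the operand when it really holds an integer". The sketch below models an operand as either an immediate or a symbol name; `Operand` and `negatedImmediate` are hypothetical and stand in for the MachineOperand checks in the real code.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <variant>

// Hypothetical operand model: a numeric immediate or a symbol name.
using Operand = std::variant<std::int64_t, std::string>;

// Returns the negated immediate, or nullopt when the fold must be skipped.
std::optional<std::int64_t> negatedImmediate(const Operand &op) {
  const std::int64_t *imm = std::get_if<std::int64_t>(&op);
  if (!imm)
    return std::nullopt;  // a symbol: its value is unknown, so it cannot be negated
  if (*imm == std::numeric_limits<std::int64_t>::min())
    return std::nullopt;  // negation would overflow
  return -*imm;
}
```

Reading the value without first checking the operand kind is the bug class the commit addresses; here `std::get_if` plays the role of that check.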
* [FileCheck] Move private interface to its own header | Thomas Preud'homme | 2019-10-01 | 2 | -0/+622
| Summary: Most of the class definitions in llvm/include/llvm/Support/FileCheck.h are actually implementation details that should not be relied upon. This commit moves all of them into a new header file under llvm/lib/Support/FileCheck. It also takes advantage of the code movement to put the code into a new llvm::filecheck namespace.
| Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
| Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D67649
| llvm-svn: 373395
* [BypassSlowDivision][CodeGenPrepare] avoid crashing on unused code (PR43514) | Sanjay Patel | 2019-10-01 | 1 | -2/+6
| https://bugs.llvm.org/show_bug.cgi?id=43514
| llvm-svn: 373394
* [ASan][NFC] Address remaining comments for https://reviews.llvm.org/D68287 | Leonard Chan | 2019-10-01 | 1 | -8/+8
| I submitted that patch after I got the LGTM, but the comments didn't appear until after I submitted the change. This adds `const` to the constructor argument and makes it a pointer.
| llvm-svn: 373391
* [ASan] Make GlobalsMD member a const reference. | Leonard Chan | 2019-10-01 | 1 | -2/+2
| PR42924 points out that copying the GlobalsMetadata type during construction of AddressSanitizer can result in extremely long build times for translation units that have many globals. This can be addressed by just making the GlobalsMD member in AddressSanitizer a reference to avoid the copy. The GlobalsMetadata type is already passed to the constructor as a reference anyway.
| Differential Revision: https://reviews.llvm.org/D68287
| llvm-svn: 373389
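The cost difference between a by-value member and a const-reference member can be shown with a copy counter. This is a generic sketch, not the AddressSanitizer code; `GlobalsTable`, `HoldsCopy`, and `HoldsReference` are hypothetical types.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a large metadata table; counts its own copies.
struct GlobalsTable {
  static inline int copies = 0;
  std::vector<std::string> entries;

  GlobalsTable() = default;
  GlobalsTable(const GlobalsTable &other) : entries(other.entries) {
    ++copies;
  }
};

// Member held by value: constructing the pass copies the whole table.
struct HoldsCopy {
  GlobalsTable table;
  explicit HoldsCopy(const GlobalsTable &t) : table(t) {}
};

// Member held by const reference: no copy is made, but the referenced
// table must outlive the pass object.
struct HoldsReference {
  const GlobalsTable &table;
  explicit HoldsReference(const GlobalsTable &t) : table(t) {}
};
```

The trade-off is lifetime: the reference form is only safe while the referenced table stays alive.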
* [DDG] Data Dependence Graph - Root Node | Bardia Mahjour | 2019-10-01 | 2 | -1/+51
| Summary: This patch adds a root node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (e.g. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first search that keeps track of visited nodes to try to avoid creating unnecessary edges.
| Authored By: bmahjour
| Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
| Reviewed By: Meinersbur
| Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
| Tag: #llvm
| Differential Revision: https://reviews.llvm.org/D67970
| llvm-svn: 373386
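One way to picture the root-node construction is a greedy pass over an adjacency list: any node not yet visited gets an edge from the root, then everything reachable from it is marked visited. This is a sketch of the general technique under that assumption, not the DDG builder's code; `Graph`, `markReachable`, and `addRoot` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Adjacency list: node i lists the nodes it has edges to.
using Graph = std::vector<std::vector<std::size_t>>;

// Iterative depth-first search that marks everything reachable from start.
void markReachable(const Graph &g, std::size_t start,
                   std::vector<bool> &visited) {
  std::vector<std::size_t> stack{start};
  visited[start] = true;
  while (!stack.empty()) {
    std::size_t node = stack.back();
    stack.pop_back();
    for (std::size_t succ : g[node]) {
      if (!visited[succ]) {
        visited[succ] = true;
        stack.push_back(succ);
      }
    }
  }
}

// Appends a root node and connects it so that every node is reachable
// from it. Returns the index of the new root.
std::size_t addRoot(Graph &g) {
  const std::size_t n = g.size();
  std::vector<bool> visited(n, false);
  std::vector<std::size_t> rootEdges;
  for (std::size_t i = 0; i < n; ++i) {
    if (visited[i])
      continue;              // already reachable through an earlier edge
    rootEdges.push_back(i);  // needs its own edge from the root
    markReachable(g, i, visited);
  }
  g.push_back(rootEdges);
  return n;
}
```

Tracking visited nodes avoids many redundant root edges, but this greedy order does not guarantee the minimum number of edges, which matches the "try to avoid" wording in the summary.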
* [MemorySSA] Check for unreachable blocks when getting last definition. | Alina Sbirlea | 2019-10-01 | 1 | -0/+3
| If a single predecessor is found, still check if the block is unreachable. The test that found this had a self loop unreachable block.
| Resolves PR43493.
| llvm-svn: 373383
* [MemorySSA] Update last_access_in_block check. | Alina Sbirlea | 2019-10-01 | 1 | -2/+7
| The check for "was there an access in this block" should be: is the last access in this block and is it not a newly inserted phi. Resolves new test in PR43438.
| Also fix a typo when simplifying trivial Phis to match the comment.
| llvm-svn: 373380
* [Dominators][CodeGen] Don't mark MachineDominatorTree as preserved in MachineLICM | Jakub Kuderski | 2019-10-01 | 1 | -1/+0
| llvm-svn: 373378
* [Dominators][CodeGen] Fix MachineDominatorTree preservation in PHIElimination | Jakub Kuderski | 2019-10-01 | 2 | -2/+8
| Summary: PHIElimination modifies the CFG and marks MachineDominatorTree as preserved. Therefore, if the CFG changes it should also update the MDT, when available. This patch teaches PHIElimination to recalculate the MDT when necessary.
| This fixes the `tailmerging_in_mbp.ll` test failure discovered after switching to the generic DomTree verification algorithm in MachineDominators in D67976.
| Reviewers: arsenm, hliao, alex-t, rampitec, vpykhtin, grosser
| Reviewed By: rampitec
| Subscribers: MatzeB, wdng, hiraditya, javed.absar, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68154
| llvm-svn: 373377
* Reapply [Dominators][CodeGen] Clean up MachineDominators | Jakub Kuderski | 2019-10-01 | 1 | -13/+3
| This reverts r373117 (git commit 159ef37735f21ae373282e0c53cbd9b6af1e0dfd)
| Phabricator review: https://reviews.llvm.org/D67976.
| llvm-svn: 373376
* [PGO] Fix typos from r359612. NFC. | Rong Xu | 2019-10-01 | 3 | -9/+9
| llvm-svn: 373369
* AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo | Tom Stellard | 2019-10-01 | 1 | -205/+244
| Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass.
| Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
| Reviewed By: nhaehnle, vpykhtin
| Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D65496
| llvm-svn: 373366
* [InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63) | Roman Lebedev | 2019-10-01 | 1 | -12/+25
| Identical to its trunc-less variant, just pretend to hoist the trunc, and everything else still holds: https://rise4fun.com/Alive/JRU
| llvm-svn: 373364
* [InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold | Roman Lebedev | 2019-10-01 | 1 | -2/+6
| https://rise4fun.com/Alive/yR4
| llvm-svn: 373363
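The identities behind the two InstCombine folds above can be spot-checked with ordinary integer arithmetic. The sketch below spells out the arithmetic shift by hand so it does not depend on how the compiler shifts negative values; all function names are hypothetical, and the `exact` flag itself is not modeled.

```cpp
#include <cstdint>

// Arithmetic shift right on an unsigned bit pattern: when the sign bit is
// set, shift the complement and complement again so ones are shifted in.
std::uint32_t ashr32(std::uint32_t x, unsigned s) {
  return (x >> 31) ? ~(~x >> s) : (x >> s);
}

std::uint64_t ashr64(std::uint64_t x, unsigned s) {
  return (x >> 63) ? ~(~x >> s) : (x >> s);
}

// -(X >>u 31): negate the isolated sign bit (wrapping arithmetic).
std::uint32_t negLshr31(std::uint32_t x) { return 0u - (x >> 31); }

// X >>s 31: smear the sign bit across the whole value.
std::uint32_t ashr31(std::uint32_t x) { return ashr32(x, 31); }

// -(trunc(X >>u 63)) with a 64-bit input and a 32-bit result.
std::uint32_t negTruncLshr63(std::uint64_t x) {
  return 0u - static_cast<std::uint32_t>(x >> 63);
}

// trunc(X >>s 63) with a 64-bit input and a 32-bit result.
std::uint32_t truncAshr63(std::uint64_t x) {
  return static_cast<std::uint32_t>(ashr64(x, 63));
}
```

Both sides produce all zeros for a clear sign bit and all ones for a set sign bit, which is why the negation can be folded into the shift kind.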
* [IndVars] An implementation of loop predication without a need for speculation | Philip Reames | 2019-10-01 | 1 | -12/+138
| This patch implements a variation of a well-known technique for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read only loops without implicit exits.
| This builds on SCEV, and can thus eliminate the loop varying portion of any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiencies exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).
| I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.
| Factoring wise, I chose to put this in IndVarSimplify since it's generally applicable to all workloads. I could split this off into its own pass, but we'd then probably want to add that new pass every place we use IndVars. One solid argument for splitting it off into its own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read only loops. At the moment, I've left it off by default, but if we add this to IndVars and enable it, we'll have to update around 20 test files to add side effects or disable this transform.
| Near term plan is to fuzz this extensively while off by default, reflect and discuss on the factoring issue mentioned just above, and then enable by default. I also need to give some thought to supporting widenable conditions in this framing.
| Differential Revision: https://reviews.llvm.org/D67408
| llvm-svn: 373351
* AMDGPU/GlobalISel: Increase max legal size to 1024 | Matt Arsenault | 2019-10-01 | 3 | -10/+13
| There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize.
| llvm-svn: 373350
* [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. | Craig Topper | 2019-10-01 | 6 | -182/+335
| Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work.
| This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload.
| There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch.
| Reviewers: RKSimon, spatel
| Reviewed By: RKSimon
| Subscribers: hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68198
| llvm-svn: 373349
* [AMDGPU] Add VerifyScheduling support. | Jay Foad | 2019-10-01 | 2 | -3/+22
| Summary: This is cut and pasted from the corresponding GenericScheduler functions.
| Reviewers: arsenm, atrick, tstellar, vpykhtin
| Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68264
| llvm-svn: 373346
* [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) | Simon Pilgrim | 2019-10-01 | 2 | -276/+272
| This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.
| The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants).
| Unlike SimplifyDemandedBits, we can't just handle target nodes through a Target callback; we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.).
| Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.
| Differential Revision: https://reviews.llvm.org/D67557
| llvm-svn: 373343
* [Dominators][CodeGen] Add MachinePostDominatorTree verification | Jakub Kuderski | 2019-10-01 | 3 | -8/+17
| Summary: This patch implements Machine PostDominator Tree verification and ensures that the verification doesn't fail the in-tree tests.
| MPDT verification can be enabled using `verify-machine-dom-info` -- the same flag used by Machine Dominator Tree verification.
| Flipping the flag revealed that MachineSink falsely claimed to preserve CFG and MDT/MPDT. This patch fixes that.
| Reviewers: arsenm, hliao, rampitec, vpykhtin, grosser
| Reviewed By: hliao
| Subscribers: wdng, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68235
| llvm-svn: 373341
* Revert rL349624 : Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2 | Simon Pilgrim | 2019-10-01 | 1 | -24/+8
| Differential Revision: https://reviews.llvm.org/D55842
| -----------------
| As discussed on PR43385 this is causing Visual Studio msbuilds to perpetually rebuild all tablegen generated files
| llvm-svn: 373338
* Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) | David Bolvansky | 2019-10-01 | 1 | -29/+12
| Seems to be slower than memcpy + strlen.
| llvm-svn: 373335
* [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) | David Bolvansky | 2019-10-01 | 1 | -12/+29
| llvm-svn: 373333
* DIExpression::createFragmentExpression - silence static analyzer DIExpression* null dereference warning with an assertion. NFCI. | Simon Pilgrim | 2019-10-01 | 1 | -0/+1
| llvm-svn: 373326
* VirtualFileSystem - replace dyn_cast<>+assert with cast<> calls. NFCI. | Simon Pilgrim | 2019-10-01 | 1 | -8/+5
| Silences a number of clang static analyzer null dereference warnings.
| llvm-svn: 373325
* ObjectFile makeTriple - silence static analyzer dyn_cast<COFFObjectFile> null dereference warning. NFCI. | Simon Pilgrim | 2019-10-01 | 1 | -1/+1
| The static analyzer is warning about a potential null dereference, but we should be able to use cast<COFFObjectFile> directly, and if not, the assert will fire for us.
| llvm-svn: 373324
* InstrProf - avoid static analyzer dyn_cast<ConstantInt> null dereference warning. | Simon Pilgrim | 2019-10-01 | 1 | -4/+2
| The static analyzer is warning about a potential null dereference. As we already return early for a null Constant pointer, I've just folded this into a dyn_cast_or_null<ConstantInt>.
| No test case, this is by inspection only.
| llvm-svn: 373322
* ConstantFold - ConstantFoldSelectInstruction - assume constant vector elements are constant. NFCI. | Simon Pilgrim | 2019-10-01 | 1 | -2/+1
| Goes a bit further than rL372743 which added the early out - elements should be Constant so use cast<Constant> instead (and rely on the assert if anything fails).
| llvm-svn: 373321
* [yaml2obj] - Allow specifying custom Link values for SHT_HASH section. | George Rimar | 2019-10-01 | 1 | -1/+1
| This allows setting any sh_link values for SHT_HASH sections.
| Differential revision: https://reviews.llvm.org/D68214
| llvm-svn: 373316