path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Fix uninitialized variable. | Alexander Kornienko | 2018-11-13 | 1 | -1/+1
| The Flags variable was not initialized and was later used (both isMBBSafeToOutlineFrom implementations assume it is initialized), which breaks test/CodeGen/AArch64/machine-outliner.mir under MemorySanitizer:
| MemorySanitizer: use-of-uninitialized-value
| #0 in llvm::AArch64InstrInfo::getOutliningType(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&, unsigned int) const llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5494:9
| #1 in (anonymous namespace)::InstructionMapper::convertToUnsignedVec(llvm::MachineBasicBlock&, llvm::TargetInstrInfo const&) llvm/lib/CodeGen/MachineOutliner.cpp:772:19
| #2 in (anonymous namespace)::MachineOutliner::populateMapper((anonymous namespace)::InstructionMapper&, llvm::Module&, llvm::MachineModuleInfo&) llvm/lib/CodeGen/MachineOutliner.cpp:1543:14
| #3 in (anonymous namespace)::MachineOutliner::runOnModule(llvm::Module&) llvm/lib/CodeGen/MachineOutliner.cpp:1645:3
| #4 in (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1744:27
| #5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1857:44
| #6 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:597:8
| llvm-svn: 346761
* [CostModel][X86] Fix constant vector XOP right shifts | Simon Pilgrim | 2018-11-13 | 1 | -2/+11
| We'll constant fold these cases so they are as cheap as vector left shift cases.
| Noticed while improving funnel shift costs.
| llvm-svn: 346760
* [VectorUtils] Use namespace for InterleaveGroup template specialization. | Florian Hahn | 2018-11-13 | 1 | -4/+6
| llvm-svn: 346759
* [VPlan] VPlan version of InterleavedAccessInfo. | Florian Hahn | 2018-11-13 | 5 | -19/+126
| This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can reuse InterleavedAccessInfo by converting it to VPInterleavedAccessInfo.
| Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito
| Reviewed By: rengolin
| Differential Revision: https://reviews.llvm.org/D49489
| llvm-svn: 346758
* [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI. | Simon Pilgrim | 2018-11-13 | 1 | -2/+1
| It has no member dependencies and this makes it easier to reuse in other cost analysis code.
| llvm-svn: 346755
* Fix comment for XOP rotates. NFCI. | Simon Pilgrim | 2018-11-13 | 1 | -1/+1
| llvm-svn: 346753
* Fix .cfi_restore with register numbers > 64 | Alexander Richardson | 2018-11-13 | 1 | -1/+6
| Summary: DW_CFA_restore can only encode register numbers that fit in 6 bits (unsigned values below 64). For register numbers that do not fit we have to use DW_CFA_restore_extended instead, which uses a ULEB128 value. I discovered this problem in the out-of-tree CHERI target since we use DWARF register number 89 for our return capability register.
| Reviewers: probinson, dblaikie, aprantl, espindola
| Reviewed By: dblaikie
| Subscribers: JohnReagan, emaste, JDevlieghere, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54420
| llvm-svn: 346751
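The encoding choice described in this commit can be sketched in a few lines. This is an illustrative model, not the LLVM code: the function names are hypothetical, while the opcode values (0xC0 for DW_CFA_restore with the register in the low 6 bits, 0x06 for DW_CFA_restore_extended) are the ones defined by the DWARF call frame instruction encoding.

```python
DW_CFA_RESTORE = 0xC0           # primary opcode; the low 6 bits carry the register
DW_CFA_RESTORE_EXTENDED = 0x06  # extended opcode; the register follows as ULEB128


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_cfi_restore(reg):
    """Hypothetical helper: pick the short or the extended form for a register."""
    if reg < 64:
        # The register number fits in the 6-bit operand field of the opcode byte.
        return bytes([DW_CFA_RESTORE | reg])
    # Otherwise fall back to the extended opcode plus a ULEB128 operand.
    return bytes([DW_CFA_RESTORE_EXTENDED]) + uleb128(reg)
```

With this model, register 89 from the CHERI case takes the extended form (two bytes), while small register numbers keep the one-byte form.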
* Fix modules build of AVRAsmParser.cpp | Alexander Richardson | 2018-11-13 | 1 | -3/+4
| Summary: Without this change I get the following error:
| lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant]
| Reviewers: dylanmckay
| Reviewed By: dylanmckay
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D53425
| llvm-svn: 346750
* [SystemZ] Increase the number of VLREPs | Jonas Paulsson | 2018-11-13 | 2 | -0/+43
| If a loaded value is replicated it is best to combine these two operations into a VLREP (load and replicate), but isel will not produce this if the load has other users as well.
| This patch handles this by making the other users of the load use element 0 of the REPLICATE instead of the load. This way the load has only the REPLICATE node as user, and we get a VLREP.
| Review: Ulrich Weigand
| https://reviews.llvm.org/D54264
| llvm-svn: 346746
* [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops | Craig Topper | 2018-11-13 | 1 | -14/+7
| It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner.
| Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86-specific code isn't necessary at all.
| Differential Revision: https://reviews.llvm.org/D54285
| llvm-svn: 346728
* [BuildingAJIT] Update chapter 2 to use the ORCv2 APIs. | Lang Hames | 2018-11-13 | 1 | -1/+2
| llvm-svn: 346726
* [FileCheck] fixing typo in assert | Fedor Sergeev | 2018-11-13 | 1 | -1/+1
| llvm-svn: 346723
* [FileCheck] introduce CHECK-COUNT-<num> repetition directive | Fedor Sergeev | 2018-11-13 | 1 | -103/+146
| In some cases it is desirable to match the same pattern repeatedly many times. Currently the only way to do it is to copy the same check pattern as many times as needed, which gets pretty unwieldy when the count is large.
| Introduce a CHECK-COUNT-<num> directive which acts like a plain CHECK directive yet matches the same pattern exactly <num> times.
| Extended FileCheckType to a struct to add Count there. Changed some parsing routines to handle non-fixed length of directive (all currently existing directives were fixed-length).
| The code is generic enough to allow future support for COUNT in more than just PlainCheck directives.
| See the motivating example for this feature in reviews.llvm.org/D54223.
| Reviewed By: chandlerc, dblaikie
| Differential Revision: https://reviews.llvm.org/D54336
| llvm-svn: 346722
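To make the relationship between the new directive and the manual copy-paste approach concrete, here is a small sketch that expands a CHECK-COUNT-<num> line into the repeated plain checks it stands for. The helper name and the regular expression are illustrative assumptions; this models only the expansion idea from the commit message, not FileCheck's actual parser or its diagnostics.

```python
import re

# Hypothetical pattern: "<prefix>-COUNT-<num>:<pattern>", e.g. "CHECK-COUNT-3: foo".
COUNT_DIRECTIVE = re.compile(
    r"^(?P<prefix>[A-Za-z0-9_-]+?)-COUNT-(?P<num>\d+):(?P<pattern>.*)$"
)


def expand_count_directive(line):
    """Return the plain check lines that a COUNT directive is shorthand for."""
    match = COUNT_DIRECTIVE.match(line.strip())
    if match is None:
        return [line]  # not a COUNT directive, leave it untouched
    plain = "{}:{}".format(match.group("prefix"), match.group("pattern"))
    return [plain] * int(match.group("num"))
```

For example, `expand_count_directive("CHECK-COUNT-3: foo")` yields three copies of `CHECK: foo`, which is the hand-written form the directive is meant to replace.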
* [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner | Jessica Paquette | 2018-11-13 | 1 | -20/+19
| Turns out it's way simpler to do this check with one LRU. Instead of maintaining two, just keep one. Check if each of the registers is available, and then check if it's a live out from the block. If it's a live out, but available in the block, we know we're in an unsafe case.
| llvm-svn: 346721
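The check described above can be modelled with plain sets. This is a sketch under stated assumptions: the function name mirrors the commit title, but the set-based representation, the parameter names, and the default register list are hypothetical stand-ins for LLVM's LiveRegUnits bookkeeping.

```python
def is_mbb_safe_to_outline_from(live_outs, touched_in_block,
                                unsafe_regs=("X16", "X17", "NZCV")):
    """Model of the 'available in the block but live out' test.

    live_outs:        registers live on exit from the block
    touched_in_block: registers read or written by some instruction in the block
    unsafe_regs:      registers an outlined call may clobber (assumed list)
    """
    for reg in unsafe_regs:
        available_in_block = reg not in touched_in_block
        # Live out yet never touched here: the value is live across the whole
        # block, so clobbering it in an outlined call would be unsafe.
        if available_in_block and reg in live_outs:
            return False
    return True
```

A block that defines X17 itself, or one where X17 is simply dead, passes the check; a block that X17 merely flows through does not.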
* Introduce DebugCounter into ConstProp pass | Zhizhou Yang | 2018-11-13 | 1 | -26/+43
| Summary: This patch introduces DebugCounter into the ConstProp pass at per-transformation level. It provides an option to skip the first n or stop after n transformations for the whole ConstProp pass.
| This makes the pass easier to debug and also makes transformation-level bisecting possible.
| Reviewers: davide, fhahn
| Reviewed By: fhahn
| Subscribers: llozano, george.burgess.iv, llvm-commits
| Differential Revision: https://reviews.llvm.org/D50094
| llvm-svn: 346720
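The "skip the first n or stop after n" behaviour can be illustrated with a tiny counter model. The class and parameter names below are hypothetical and the semantics are a simplified reading of the commit message, not LLVM's DebugCounter implementation.

```python
class TransformCounter:
    """Toy model: skip the first `skip` transformations, then allow at most
    `stop_after` of them (None means no upper limit)."""

    def __init__(self, skip=0, stop_after=None):
        self.skip = skip
        self.stop_after = stop_after
        self.seen = 0  # number of transformation opportunities seen so far

    def should_execute(self):
        index = self.seen
        self.seen += 1
        if index < self.skip:
            return False
        if self.stop_after is not None and index >= self.skip + self.stop_after:
            return False
        return True
```

Bisecting a miscompile then amounts to narrowing `skip` and `stop_after` until a single transformation is isolated, which is the use case the commit message mentions.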
* [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom | Jessica Paquette | 2018-11-12 | 3 | -18/+39
| Instead of returning Flags, return true if the MBB is safe to outline from.
| This lets us check for unsafe situations, like say, in AArch64, X17 is live across an MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping.
| llvm-svn: 346718
* [InstCombine] narrow width of rotate patterns, part 3 | Sanjay Patel | 2018-11-12 | 1 | -0/+5
| This is a longer variant of the pattern handled in rL346713. This one includes zexts.
| Eventually, we should canonicalize all rotate patterns to the funnel shift intrinsics, but we need a bit more infrastructure to make sure the vectorizers handle those intrinsics as well as the shift+logic ops.
| https://rise4fun.com/Alive/FMn
| Name: narrow rotateright
|   %neg = sub i8 0, %shamt
|   %rshamt = and i8 %shamt, 7
|   %rshamtconv = zext i8 %rshamt to i32
|   %lshamt = and i8 %neg, 7
|   %lshamtconv = zext i8 %lshamt to i32
|   %conv = zext i8 %x to i32
|   %shr = lshr i32 %conv, %rshamtconv
|   %shl = shl i32 %conv, %lshamtconv
|   %or = or i32 %shl, %shr
|   %r = trunc i32 %or to i8
| =>
|   %maskedShAmt2 = and i8 %shamt, 7
|   %negShAmt2 = sub i8 0, %shamt
|   %maskedNegShAmt2 = and i8 %negShAmt2, 7
|   %shl2 = lshr i8 %x, %maskedShAmt2
|   %shr2 = shl i8 %x, %maskedNegShAmt2
|   %r = or i8 %shl2, %shr2
| llvm-svn: 346716
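As an informal cross-check of the rotate-right transform in this commit, the two IR sequences can be modelled with Python integers and compared exhaustively over all i8 inputs. The function names are illustrative, and explicit masking stands in for the fixed-width IR types; this is a sketch, not a replacement for the Alive proof linked in the message.

```python
def wide_rotr(x, shamt):
    """Source pattern: zext to i32, shift by masked amounts, or, trunc to i8."""
    neg = (0 - shamt) & 0xFF             # sub i8 0, %shamt
    rshamt = shamt & 7                   # and i8 %shamt, 7
    lshamt = neg & 7                     # and i8 %neg, 7
    conv = x & 0xFF                      # zext i8 %x to i32
    shr = conv >> rshamt                 # lshr i32
    shl = (conv << lshamt) & 0xFFFFFFFF  # shl i32
    return (shl | shr) & 0xFF            # or, then trunc to i8


def narrow_rotr(x, shamt):
    """Target pattern: the same rotate done directly on i8."""
    masked = shamt & 7
    masked_neg = ((0 - shamt) & 0xFF) & 7
    low = (x & 0xFF) >> masked           # lshr i8
    high = (x << masked_neg) & 0xFF      # shl i8
    return low | high
```

Both functions agree for every pair of 8-bit `x` and `shamt`, for example rotating `0x01` right by one gives `0x80` in either form.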
* [DWARF] Do not use PRIx32 for printing uint64_t values | Simon Atanasyan | 2018-11-12 | 1 | -1/+1
| The `DWARFDebugAddrTable::dump` routine prints 32/64-bit addresses. These values are stored in a vector of `uint64_t` independently of their original sizes. But the `format` function gets a format string with the PRIx32 specifier when the address size is 32 bits. At least on MIPS 32-bit targets that leads to incorrect output.
| This patch changes the format strings to always use PRIx64 to print `uint64_t` values.
| Differential Revision: http://reviews.llvm.org/D54424
| llvm-svn: 346715
* [InstCombine] narrow width of rotate patterns, part 2 (PR39624) | Sanjay Patel | 2018-11-12 | 1 | -0/+8
| The sub-pattern for the shift amount in a rotate can take on several different forms, and there's apparently no way to canonicalize those without seeing the entire rotate sequence.
| This is the form noted in:
| https://bugs.llvm.org/show_bug.cgi?id=39624
| https://rise4fun.com/Alive/qnT
|   %zx = zext i8 %x to i32
|   %maskedShAmt = and i32 %shAmt, 7
|   %shl = shl i32 %zx, %maskedShAmt
|   %negShAmt = sub i32 0, %shAmt
|   %maskedNegShAmt = and i32 %negShAmt, 7
|   %shr = lshr i32 %zx, %maskedNegShAmt
|   %rot = or i32 %shl, %shr
|   %r = trunc i32 %rot to i8
| =>
|   %truncShAmt = trunc i32 %shAmt to i8
|   %maskedShAmt2 = and i8 %truncShAmt, 7
|   %shl2 = shl i8 %x, %maskedShAmt2
|   %negShAmt2 = sub i8 0, %truncShAmt
|   %maskedNegShAmt2 = and i8 %negShAmt2, 7
|   %shr2 = lshr i8 %x, %maskedNegShAmt2
|   %r = or i8 %shl2, %shr2
| llvm-svn: 346713
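The narrowing in this commit relies on the low three bits of the shift amount surviving the truncation from i32 to i8. A small model with Python integers makes that visible; the function names are hypothetical and masking emulates the fixed-width types, so treat it as a sanity check rather than a proof.

```python
MASK8 = 0xFF
MASK32 = 0xFFFFFFFF


def wide_rotl(x, sh_amt):
    """Source pattern: rotate computed in i32, then truncated to i8."""
    zx = x & MASK8                        # zext i8 %x to i32
    shl = (zx << (sh_amt & 7)) & MASK32   # shl i32 by the masked amount
    neg = (0 - sh_amt) & MASK32           # sub i32 0, %shAmt
    shr = zx >> (neg & 7)                 # lshr i32 by the masked negated amount
    return (shl | shr) & MASK8            # or, then trunc to i8


def narrow_rotl(x, sh_amt):
    """Target pattern: truncate the shift amount first, rotate in i8."""
    trunc = sh_amt & MASK8                # trunc i32 %shAmt to i8
    shl2 = (x << (trunc & 7)) & MASK8     # shl i8
    neg2 = (0 - trunc) & MASK8            # sub i8 0, %truncShAmt
    shr2 = (x & MASK8) >> (neg2 & 7)      # lshr i8
    return shl2 | shr2
```

Because 256 is a multiple of 8, negating before or after the truncation leaves the masked amount unchanged, which is why the two forms agree even for shift amounts far above 255.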
* [GC][NFC] Simplify code now that we only have one safepoint kind | Philip Reames | 2018-11-12 | 3 | -17/+7
| This is the NFC follow up to exploit the semantic simplification from r346701
| llvm-svn: 346712
* [InstCombine] refactor code for matching shift amount of a rotate; NFC | Sanjay Patel | 2018-11-12 | 1 | -12/+17
| As shown in existing test cases and with:
| https://bugs.llvm.org/show_bug.cgi?id=39624
| ...we're missing at least 2 more patterns for rotate narrowing.
| llvm-svn: 346711
* Use a data structure better suited for large sets in SimplificationTracker. | Ali Tamur | 2018-11-12 | 1 | -11/+156
| Summary: D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for PowerPC, we have a case where there are more than 600,000 nodes, and it takes too long to compile.
| In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation; however, I think it only applies to the nodes already matched in the MatchPhiSet method, which I did not touch.
| Reviewers: bjope, skatkov
| Reviewed By: bjope, skatkov
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D54007
| llvm-svn: 346710
* [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) | Simon Pilgrim | 2018-11-12 | 1 | -8/+115
| This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.
| Differential Revision: https://reviews.llvm.org/D54267
| llvm-svn: 346706
* AMDGPU: Adding more median3 patterns | Aakanksha Patil | 2018-11-12 | 2 | -9/+22
| min(max(a, b), max(min(a, b), c)) -> med3 a, b, c
| Differential Revision: https://reviews.llvm.org/D54331
| llvm-svn: 346704
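The identity behind this pattern is easy to check on integers: the min/max expression always selects the middle of the three values. The sketch below compares it against a sort-based median; the function names are illustrative, and floating-point corner cases such as NaN handling are deliberately left out of scope.

```python
def med3_via_min_max(a, b, c):
    """The expression matched by the new pattern."""
    return min(max(a, b), max(min(a, b), c))


def median3(a, b, c):
    """Reference: the middle element after sorting."""
    return sorted((a, b, c))[1]
```

For instance, with `a=5, b=6, c=1` the inner terms are `max(a, b) = 6` and `max(min(a, b), c) = 5`, so the result is 5, the median.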
* [GC] Remove so-called PreCall safepoints | Philip Reames | 2018-11-12 | 2 | -9/+2
| Remove another bit of unused configuration potential from GCStrategy. It's not entirely clear what the intention here was, but from the docs, it sounds like this may have been subsumed by patchable call support.
| Note: This change is deliberately small to make it clear that while implemented, there's nothing using the option. A following NFC will do most of the simplifications.
| llvm-svn: 346701
* [WebAssembly] Added WasmAsmParser. | Wouter van Oortmerssen | 2018-11-12 | 4 | -79/+180
| Summary: This is to replace the ELFAsmParser that WebAssembly was using, which so far was a stub that didn't do anything, and couldn't work correctly with wasm.
| This new class is there to implement generic directives related to wasm as a binary format. Wasm target specific directives are still parsed in WebAssemblyAsmParser as before. The two classes now cooperate more correctly too.
| Also implemented .result which was missing. Any unknown directives will now result in errors.
| Reviewers: dschuff, sbc100
| Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54360
| llvm-svn: 346700
* [GC][InstCombine] Fix a potential iteration issue | Philip Reames | 2018-11-12 | 1 | -1/+4
| Noticed via inspection. Appears to be largely innocuous in practice, but a slight code change could have resulted in either visit order dependent missed optimizations or infinite loops. May be a minor compile time problem today.
| llvm-svn: 346698
* [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. | Craig Topper | 2018-11-12 | 1 | -13/+18
| Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.
| This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.
| llvm-svn: 346697
* NFC: DebugInfo: Reduce scope of DebugOffset to simplify code | David Blaikie | 2018-11-12 | 1 | -31/+33
| This was being used as a sort of indirect out parameter from shouldDump - seems simpler to use it as the actual result of the call.
| (this does mean using a pointer to an Optional & actually using all 3 states (null, None, and present) which is, admittedly, a tad subtle - but given the limited scope, seems OK to me - open to discussion though, if others feel strongly about it)
| llvm-svn: 346691
* [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z | Stanislav Mekhanoshin | 2018-11-12 | 1 | -0/+97
| Sometimes after basic block placement we end up with code like:
|   sreg = s_mov_b64 -1
|   vcc = s_and_b64 exec, sreg
|   s_cbranch_vccz
| This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch.
| This is essentially a single s_cbranch_execz instruction when moved into a single new basic block.
| Differential Revision: https://reviews.llvm.org/D54164
| llvm-svn: 346690
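Why the three-instruction sequence collapses to a branch on exec can be seen from a simple model of 64-bit lane masks. The function names below are hypothetical and the model ignores everything except the mask arithmetic; it is meant only to illustrate the equivalence claimed in the message.

```python
MASK64 = (1 << 64) - 1


def vccz_after_and(exec_mask):
    """Condition tested by s_cbranch_vccz after the s_mov/s_and pair."""
    sreg = MASK64             # sreg = s_mov_b64 -1
    vcc = exec_mask & sreg    # vcc = s_and_b64 exec, sreg
    return vcc == 0           # vccz: branch taken when vcc is zero


def execz(exec_mask):
    """Condition tested directly by s_cbranch_execz."""
    return (exec_mask & MASK64) == 0
```

Since and-ing with all ones is the identity, `vcc` equals `exec` and the two branch conditions coincide for every mask value.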
* [CostModel][X86] Add funnel shift rotation special case costs | Simon Pilgrim | 2018-11-12 | 1 | -1/+82
| When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.
| llvm-svn: 346688
* Fix MachineInstr::findRegisterUseOperandIdx subreg checks | Stanislav Mekhanoshin | 2018-11-12 | 1 | -3/+1
| The function only checks whether the instruction reads a super-register containing the requested physical register. If a sub-register is being read, that is also a use of the super-register, so this patch adds that check. In particular MI->readsRegister() is broken because of the missing check.
| The resulting check is essentially regsOverlap().
| Differential Revision: https://reviews.llvm.org/D54128
| llvm-svn: 346686
* [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs | Simon Pilgrim | 2018-11-12 | 1 | -2/+11
| The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level
| llvm-svn: 346683
* [MachineOutliner][NFC] Early exit pruning when candidates don't share an MBB | Jessica Paquette | 2018-11-12 | 1 | -0/+8
| There's no way they can overlap in this case.
| This can save a few iterations when the candidate is close to the beginning of a MachineBasicBlock. It's particularly useful when the average length of a MachineBasicBlock in the program is small.
| llvm-svn: 346682
* [MachineOutliner][NFC] Put suffix tree in buildCandidateList | Jessica Paquette | 2018-11-12 | 1 | -6/+5
| It's only used there, so it doesn't make much sense to have it in runOnModule.
| llvm-svn: 346681
* [DWARFv5] Emit split type units in .debug_info.dwo. | Paul Robinson | 2018-11-12 | 1 | -4/+7
| Differential Revision: https://reviews.llvm.org/D54350
| llvm-svn: 346674
* [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector | Simon Pilgrim | 2018-11-12 | 1 | -5/+13
| llvm-svn: 346664
* [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions | Jonas Paulsson | 2018-11-12 | 1 | -1/+2
| Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty).
| Review: Ulrich Weigand
| https://reviews.llvm.org/D54423
| llvm-svn: 346663
* [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics | Sanjay Patel | 2018-11-12 | 1 | -0/+2
| This just identifies the intrinsics as candidates for vectorization. It does not mean we will attempt to vectorize under normal conditions (the test file is forcing vectorization).
| The cost model must be fixed to show that the transform is profitable in general.
| Allowing vectorization with these intrinsics is required to avoid potential regressions from canonicalizing to the intrinsics from generic IR:
| https://bugs.llvm.org/show_bug.cgi?id=37417
| llvm-svn: 346661
* [VectorUtils] reorder list of vectorizable intrinsics; NFC | Sanjay Patel | 2018-11-12 | 1 | -10/+9
| We need to add funnel-shifts to this list, so clean up the random order before it gets worse.
| llvm-svn: 346660
* [CostModel] Add more realistic SK_ExtractSubvector generic costs. | Simon Pilgrim | 2018-11-12 | 1 | -1/+2
| Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.
| This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.
| llvm-svn: 346656
* [RISCV] Support .option relax and .option norelax | Alex Bradbury | 2018-11-12 | 7 | -99/+177
| This extends the .option support from D45864 to enable/disable the relax feature flag from D44886.
| During parsing of the relax/norelax directives, the RISCV::FeatureRelax feature bits of the SubtargetInfo stored in the AsmParser are updated appropriately to reflect whether relaxation is currently enabled in the parser. When an instruction is parsed, the parser checks if relaxation is currently enabled and if so, gets a handle to the AsmBackend and sets the ForceRelocs flag. The AsmBackend uses a combination of the original RISCV::FeatureRelax feature bits set by e.g. -mattr=+/-relax and the ForceRelocs flag to determine whether to emit relocations for symbol and branch diffs. Diff relocations should therefore only not be emitted if the relax flag was not set on the command line and no instruction was ever parsed in a section with relaxation enabled, to ensure correct diffs are emitted.
| Differential Revision: https://reviews.llvm.org/D46423
| Patch by Lewis Revill.
| llvm-svn: 346655
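The interplay of the command-line feature bit, the parser state, and the sticky ForceRelocs flag can be summarised in a toy state machine. The class and method names are hypothetical; only the decision rule (emit diff relocations if relaxation was enabled on the command line or any instruction was parsed while relaxation was on) is taken from the message above.

```python
class RelaxModel:
    """Toy model of the relax/norelax bookkeeping described in the commit."""

    def __init__(self, relax_on_command_line):
        self.feature_relax = relax_on_command_line  # original feature bit
        self.parser_relax = relax_on_command_line   # parser's current view
        self.force_relocs = False                   # sticky once set

    def option(self, name):
        if name == "relax":
            self.parser_relax = True
        elif name == "norelax":
            self.parser_relax = False
        else:
            raise ValueError("unknown option: " + name)

    def parse_instruction(self):
        # Any instruction parsed while relaxation is on forces relocations.
        if self.parser_relax:
            self.force_relocs = True

    def emits_diff_relocations(self):
        return self.feature_relax or self.force_relocs
```

Note that switching back with `.option norelax` after an instruction has been parsed does not clear the flag in this model, which matches the "no instruction was ever parsed" wording.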
* [DAGCombiner] Fix load-store forwarding of indexed loads. | Nirav Dave | 2018-11-12 | 1 | -3/+17
| Summary: Handle extra output from indexed loads in cases where we wish to forward a load value directly from a preceding store.
| Fixes PR39571.
| Reviewers: peter.smith, rengolin
| Subscribers: javed.absar, hiraditya, arphaman, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54265
| llvm-svn: 346654
* Add an OptimizerLast EP | Philip Pfaffe | 2018-11-12 | 1 | -0/+3
| Summary: It turns out that we need an OptimizerLast PassBuilder extension point after all. I missed the relevance of this EP the first time. By legacy PM magic, function passes added at this EP get added to the last _Function_ PM, which is a feature we lost when dropping this EP for the new PM.
| A key difference between this and the legacy PassManager's OptimizerLast callback is that this extension point is not triggered at O0. Extensions to the O0 pipeline should append their passes to the end of the overall pipeline.
| Differential Revision: https://reviews.llvm.org/D54374
| llvm-svn: 346645
* [LICM] Hoist guards from non-header blocks | Max Kazantsev | 2018-11-12 | 3 | -11/+39
| This patch relaxes overconservative checks on whether or not we could write memory before we execute an instruction. This allows us to hoist guards out of loops even if they are not in the header block.
| Differential Revision: https://reviews.llvm.org/D50891
| Reviewed By: fedor.sergeev
| llvm-svn: 346643
* [GCOV] Add options to filter files which must be instrumented. | Calixte Denizet | 2018-11-12 | 1 | -2/+82
| Summary: When collecting code coverage, a lot of files (like the ones coming from /usr/include) are removed when post-processing gcno/gcda, so in the end they don't need to be instrumented nor to appear in gcno/gcda.
| The goal of the patch is to be able to filter the files we want to instrument. There are several advantages to doing that:
| - improve speed (no overhead due to instrumentation on files we don't care about)
| - reduce gcno/gcda size
| - it gives the possibility to easily instrument only a few files (e.g. ones modified in a patch) without changing the build system
| - this patch needs to be accepted for the feature to be enabled in clang: https://reviews.llvm.org/D52034
| Reviewers: marco-c, vsk
| Reviewed By: marco-c
| Subscribers: llvm-commits, sylvestre.ledru
| Differential Revision: https://reviews.llvm.org/D52033
| llvm-svn: 346641
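A file filter of the kind this commit describes could look roughly like the sketch below. Everything here is an assumption made for illustration: the function and parameter names, the use of full-string regular expression matches, and the rule that an exclude match wins over an include match are not taken from the patch.

```python
import re


def should_instrument(path, include_patterns, exclude_patterns):
    """Decide whether a source file should be instrumented.

    include_patterns: regexes for files to keep (empty list keeps everything)
    exclude_patterns: regexes for files to drop (assumed to take precedence)
    """
    if any(re.fullmatch(p, path) for p in exclude_patterns):
        return False
    if not include_patterns:
        return True
    return any(re.fullmatch(p, path) for p in include_patterns)
```

With such a filter, system headers can be dropped up front (`exclude_patterns=[r"/usr/include/.*"]`), or instrumentation can be limited to the handful of files touched by a patch.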
* [SystemZ] Replicate the load with most uses in buildVector() | Jonas Paulsson | 2018-11-12 | 1 | -8/+11
| Iterate over all elements and count the number of uses among them for each used load. Then make sure to REPLICATE the load which has the most uses in order to minimize the number of needed element insertions.
| Review: Ulrich Weigand
| https://reviews.llvm.org/D54322
| llvm-svn: 346637
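The selection heuristic in this commit, counting how often each load feeds a vector element and replicating the most frequent one, can be sketched as follows. The function names and the representation of vector lanes as a list of load names (with `None` for lanes not fed by a load) are hypothetical simplifications.

```python
from collections import Counter


def pick_load_to_replicate(lanes):
    """Return the load that appears in the most lanes, or None if there is none."""
    counts = Counter(lane for lane in lanes if lane is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def insertions_needed(lanes, replicated):
    """Lanes that still need an explicit element insertion after replication."""
    return sum(1 for lane in lanes if lane != replicated)
```

For lanes `["ld_a", "ld_b", "ld_b", "ld_b"]`, replicating `ld_b` leaves one insertion, whereas replicating `ld_a` would leave three.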
* [GC] Remove unused configuration variable | Philip Reames | 2018-11-12 | 1 | -6/+1
| The custom root mechanism didn't actually do anything. ShadowStackGC, the only one which used it, just removed the gcroots before they reached the normal lowering in SelectionDAG. As a result, the state flag had no value.
| llvm-svn: 346632
* [GC] Minor style modernization | Philip Reames | 2018-11-12 | 1 | -44/+43
| llvm-svn: 346631
* [GCRoot] Remove some unnecessary complexity | Philip Reames | 2018-11-11 | 2 | -50/+33
| The GCStrategy provides three configuration options which are largely redundant.
| 1) Support for conditionally lowering gcread and gcwrite to loads and stores. This is redundant since any GC which wished to use these abstractions would lower them out of existence before the built-in lowering anyway. As such, there's no need for the lowering to be conditional.
| 2) Conditional initialization for allocas marked via gcroot. Semantically, roots have to be initialized before first potential use. Arguably, the frontend really should have responsibility for that, but the old API allowed the frontend to ignore this detail. Only one builtin GC used the non-initializing mode. Since no one to my knowledge actually uses the ErlangGC strategy, I decided the slight pessimization was worth the simplicity. If that turns out to be problematic, we can always improve the insertion algorithm to detect more existing initializing stores.
| llvm-svn: 346621