summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [LoopPred] Fix two subtle issues found by inspectionPhilip Reames2019-11-062-8/+37
| | | | | | | | | | | | This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify. Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile. Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case. I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above. Differential Revision: https://reviews.llvm.org/D69695
* [X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits.Craig Topper2019-11-061-2/+5
| | | | | | | | | | | | | | | | | | The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that its an immediate in the frontend. If its is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that the they don't become in bounds. Fixes PR43922
* [AArch64] Re-add patterns for (s/u)mull2.Eli Friedman2019-11-061-0/+19
| | | | | | | These patterns were added in D46009, but removed in D54276 due to missing test coverage. Differential Revision: https://reviews.llvm.org/D69831
* Fix a typo in my previous commitSteven Wu2019-11-061-1/+1
|
* [Object][MachO] Rewrite macho-invalid-fat-arch-size into YAMLSteven Wu2019-11-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: Rewrite one of the invalid macho test input file with YAML file. The original invalid macho is breaking our internal test infrastusture because it is too broken to be copy around. Need to relax an assertion in the YAML/MachoEmitter to allow yaml2obj to write an invalid object like this. rdar://problem/56879982 Reviewers: beanz, mtrent Reviewed By: beanz Subscribers: hiraditya, jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69856
* [X86TargetTransformInfo] Fixed warning: Expression 'ISD == ISD::UREM' is ↵Dávid Bolvanský2019-11-061-1/+1
| | | | always true. NFCI.
* [X86] Fix SLM v2i64 ADD/Sub/CMPEQ instruction schedulesSimon Pilgrim2019-11-061-0/+16
| | | | | | Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2i64 ops. Numbers taken from Intel AOM (+ checked against Agner)
* [X86] Fix SLM v2f64 ADD/MUL + FP BLEND/HADD instruction schedulesSimon Pilgrim2019-11-061-7/+7
| | | | Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2f64/v2i64 ops.
* [X86ISelLowering] Fixed typo in assert. NFCI.Dávid Bolvanský2019-11-061-1/+1
|
* [CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLMSimon Pilgrim2019-11-061-0/+26
| | | | As noted on D59710 we weren't handling the high costs of these operations on SLM.
* [CommandLine] Add inline ArgName printingDon Hinton2019-11-061-14/+20
| | | | | | | | | | | | | | | | | | | | | Summary: This patch adds PrintArgInline (after PrintArg) that strips the leading spaces from an argument before printing them, for usage inline. Related bug: PR42943 <https://bugs.llvm.org/show_bug.cgi?id=42943> Patch by Daan Sprenkels! Reviewers: jhenderson, chandlerc, hintonda Reviewed By: jhenderson Subscribers: hiraditya, kristina, llvm-commits, dsprenkels Tags: #llvm Differential Revision: https://reviews.llvm.org/D69501
* DWARFDebugLoclists: Move to a incremental parsing modelPavel Labath2019-11-063-123/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch stems from the discussion D68270 (including some offline talks). The idea is to provide an "incremental" api for parsing location lists, which will avoid caching or materializing parsed data. An additional goal is to provide a high level location list api, which abstracts the differences between different encoding schemes, and can be used by users which don't care about those (such as LLDB). This patch implements the first part. It implements a call-back based "visitLocationList" api. This function parses a single location list, calling a user-specified callback for each entry. This is going to be the base api, which other location list functions (right now, just the dumping code) are going to be based on. Future patches will do something similar for the v4 location lists, and add a mechanism to translate raw entries into concrete address ranges. Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69672
* [NFC][APInt] Fix typos in comments.Miloš Stojanović2019-11-061-3/+3
| | | | Testing git commit access.
* [x86] avoid crashing when splitting AVX stores with non-simple type (PR43916)Sanjay Patel2019-11-061-3/+5
| | | | | The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.
* [Support] fix mingw-w64 buildIlya Biryukov2019-11-061-1/+1
| | | | | | | | | | | Older versions of Mingw-w64 do not define _beginthreadex_proc_type, so we replace it with `unsigned (__stdcall *ThreadFunc)(void *)`. Fixes https://github.com/clangd/clangd/issues/188 Patch by lh123! Differential Revision: https://reviews.llvm.org/D69879
* [X86] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-067-26/+26
|
* [X86] LowerAVXExtend - fix dodgy self-comparison assert.Simon Pilgrim2019-11-061-1/+1
| | | | PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".
* [AArch64] Move the branch relaxation pass after BTI insertionMomchil Velikov2019-11-061-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Inserting BTI instructions can push branch destinations out of range. The branch relaxation pass itself cannot insert indirect branches since `TargetInstrInfo::insertIndirecrtBranch` is not implemented for AArch64 (guess +/-128 MB direct branch range is more than enough in practice). Testing this is a bit tricky. The original test case we have is 155kloc/6.1M. I've generated a test case using this program: ``` int main() { std::cout << R"src(int test(); void g0(), g1(), g2(), g3(), g4(), e(); void f(int v) { if ((test() & 2) == 0) { switch (v) { case 0: g0(); case 1: g1(); case 2: g2(); case 3: g3(); } )src"; const int N = 8176; for (int i = 0; i < N; ++i) std::cout << " void h" << i << "();\n"; for (int i = 0; i < N; ++i) std::cout << " h" << i << "();\n"; std::cout << R"src( } else { e(); } } )src"; } ``` which is still a bit too much to commit as a regression test, IMHO. Reviewers: t.p.northover, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69118 Change-Id: Ide5c922bcde08ff4cf635da5e52365525a997a0a
* [LoopUnroll] countToEliminateCompares(): fix handling of [in]equality ↵Roman Lebedev2019-11-061-16/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | predicates (PR43840) Summary: I believe this bisects to https://reviews.llvm.org/D44983 (`[LoopUnroll] Only peel if a predicate becomes known in the loop body.`) While that revision did contain tests that showed arguably-subpar peeling for [in]equality predicates that [not] happen in the middle of the loop, it also disabled peeling for the *first* loop iteration, because latch would be canonicalized to [in]equality comparison.. That was intentional as per https://reviews.llvm.org/D44983#1059583. I'm not 100% sure that i'm using correct checks here, but this fix appears to be going in the right direction.. Let me know if i'm missing some checks here.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43840 | PR43840 ]]. Reviewers: fhahn, mkazantsev, efriedma Reviewed By: fhahn Subscribers: xbolva00, hiraditya, zzheng, llvm-commits, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D69617
* [AMDGPU] Improve code size cost model (part 2)dfukalov2019-11-061-18/+98
| | | | | | | | | | | | | | Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629
* [TTI][LV] preferPredicateOverEpilogueSjoerd Meijer2019-11-064-5/+70
| | | | | | | | | | | | | | | | | We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040
* [ARM,MVE] Add intrinsics for gather/scatter load/stores.Simon Tatham2019-11-061-39/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to. The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices. At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics. I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment. On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically. Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic. On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16'). Reviewers: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69791
* YAML parser robustness improvementsThomas Finch2019-11-052-14/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes a number of bugs found in the YAML parser through fuzzing. In general, this makes the parser more robust against malformed inputs. The fixes are mostly improved null checking and returning errors in more cases. In some cases, asserts were changed to regular errors, this provides the same robustness but also protects release builds from the triggering conditions. This also improves the fuzzability of the YAML parser since asserts can act as a roadblock to further fuzzing once they're hit. Each fix has a corresponding test case: - TestAnchorMapError - Added proper null pointer handling in `Stream::printError` if N is null and `KeyValueNode::getValue` if getKey returns null, `Input::createHNodes` `dyn_casts` changed to `dyn_cast_or_null` so the null pointer checks are actually able to fail - TestFlowSequenceTokenErrors - Added case in `Document::parseBlockNode` for FlowMappingEnd, FlowSequenceEnd, or FlowEntry tokens outside of mappings or sequences - TestDirectiveMappingNoValue - Changed assert to regular error return in `Scanner::scanValue` - TestUnescapeInfiniteLoop - Fixed infinite loop in `ScalarNode::unescapeDoubleQuoted` by returning an error for unrecognized escape codes - TestScannerUnexpectedCharacter - Changed asserts to regular error returns in `Scanner::consume` - TestUnknownDirective - For both of the inputs the stream doesn't fail and correctly returns TK_Error, but there is no valid root node for the document. There's no reasonable way to make the scanner fail for unknown directives without breaking the YAML spec (see spec-07-01.test). I think the assert is unnecessary given that an error is still generated for this case. The `SimpleKeys.clear()` line fixes a bug found by AddressSanitizer triggered by multiple test cases - when TokenQueue is cleared SimpleKeys is still holding dangling pointers into it, so SimpleKeys should be cleared as well. Patch by Thomas Finch! Reviewers: chandlerc, Bigcheese, hintonda Reviewed By: Bigcheese, hintonda Subscribers: hintonda, kristina, beanz, dexonsmith, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61608
* [PowerPC] Fix the incorrect 'RM' flag set on load/store instrQingShan Zhang2019-11-061-1/+1
| | | | | | The 'RM' flag model the "Rounding Mode" and it has nothing to do with the load/store instructions. Differential Revision: https://reviews.llvm.org/D69551
* Implement `sys::getHostCPUName()` for Darwin ARMChris Bieneman2019-11-051-1/+28
| | | | | | | | | | | | | | Summary: Currently there is no implementation of `sys::getHostCPUName()` for Darwin ARM targets. This patch makes it so that LLVM running on ARM makes reasonable guesses about the CPU features of the host CPU. Reviewers: t.p.northover, lhames, efriedma Reviewed By: efriedma Subscribers: rjmccall, efriedma, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69597
* [IRMover] Set Address Space for moved global valuesTeresa Johnson2019-11-051-4/+5
| | | | | | | | | | | | | | | | | | | Summary: Set Address Space when creating a new function (from another). Fix PR41154. Patch by Ehud Katz <ehudkatz@gmail.com> Reviewers: tejohnson, chandlerc Reviewed By: tejohnson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69361
* [AMDGPU] Add missing flags to DS_RealStanislav Mekhanoshin2019-11-051-0/+2
| | | | Differential Revision: https://reviews.llvm.org/D69867
* [IRMover] Use GlobalValue::getAddressSpace instead of directly from its type ↵Teresa Johnson2019-11-051-10/+10
| | | | | | | | | | | | | | | | | | [NFC] Summary: Change the old form of G->getType()->getAddressSpace() to the new G->getAddressSpace() (underneath does the same). Patch by Ehud Katz <ehudkatz@gmail.com> Reviewers: tejohnson, chandlerc Reviewed By: tejohnson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69550
* [mips] Fix `getRegForInlineAsmConstraint` to do not crash on empty ConstraintSimon Atanasyan2019-11-061-4/+6
|
* [LoopRotationUtils] Check values are newly inserted into maps.Alina Sbirlea2019-11-051-5/+14
| | | | | This is a cleanup that came up in D63680. All values added to the ValueMaps should be newly added.
* [Hexagon] getCompoundCandidateGroup - fix 'false' value is implicitly cast ↵Simon Pilgrim2019-11-051-5/+5
| | | | | | to unsigned warning. NFCI. Consistently return HexagonII::HCG_None.
* [X86/Atomics] Correct a few transforms for new atomic loweringPhilip Reames2019-11-051-4/+3
| | | | | | This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609). Unfortunately, I can't provide test coverage for it on it's own as the only (known) wrong example is still wrong, but due to a separate issue. These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
* [MIR] Add MIR parsing for heap alloc site instruction markersAmy Huang2019-11-055-5/+39
| | | | | | | | | | | | | | | Summary: This patch adds MIR parsing and printing for heap alloc markers, which were added in D69136. They are printed as an operand similar to pre-/post-instr symbols, with a heap-alloc-marker token and a metadata node. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69864
* [X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMathBenjamin Kramer2019-11-051-8/+7
|
* [AMDGPU] Removed dead code from R600ISelLowering.cppStanislav Mekhanoshin2019-11-051-6/+1
| | | | | | | | | This was added to inhibit a warning from gcc 7.3 according to the comment. However, it triggers warning from PVS. In addition I cannot reproduce it with gcc 7.4 and I also cannot reproduce it with gcc 7.3 using compiler explorer. Differential Revision: https://reviews.llvm.org/D69863
* [X86/Atomics] (Semantically) revert G246098, switch back to the old atomic ↵Philip Reames2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.
* [AMDGPU] Removed dead code handling M0CopyRegStanislav Mekhanoshin2019-11-051-14/+0
| | | | | | | Static analyzer complains about always false condition. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69860
* [X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZerosBenjamin Kramer2019-11-051-2/+3
| | | | | The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-0528-82/+84
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* [AMDGPU] return Fail instead of SolfFail from addOperand()Stanislav Mekhanoshin2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | addOperand() method of AMDGPU disassembler returns SoftFail on error. All instances which may lead to that place are an impossible encdoing, not something which is possible to encode, but semantically incorrect as described for SoftFail. Then tablegen generates a check of the following form: if (Decode...(..) == MCDisassembler::Fail) { return MCDisassembler::Fail; } Since we can only return Success and SoftFail that is dead code as detected by the static code analyzer. Solution: return Fail as it should be. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69819
* [SLP] - Add couple safety checks to TreeEntry::dump(). NFCSergey Dmitriev2019-11-051-5/+12
| | | | | | | | | | | | | | Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL. Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev Reviewed By: ABataev Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69812
* [JumpThreading] Factor out code to merge basic blocks (NFC)Kazu Hirata2019-11-051-43/+52
| | | | | | | | | | | | | | | Summary: This patch factors out code to merge a basic block with its sole successor -- partly for readability and partly to facilitate an upcoming patch of my own. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69852
* Remove redundant assignment. NFCI.Simon Pilgrim2019-11-051-1/+0
| | | | Fixes cppcheck warning.
* Use iterator prefix increment. NFCI.Simon Pilgrim2019-11-051-1/+1
|
* [MachineOutliner] Reduce scope of variable and stop duplicate getMF() calls. ↵Simon Pilgrim2019-11-051-3/+3
| | | | NFCI.
* [DFAPacketizer] Allow up to 64 functional unitsjmolloy2019-11-051-54/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: To drive the automaton we used a uint64_t as an action type. This contained the transition's resource requirements as a conjunction: (a OR b) AND (b OR c) We encoded this conjunction as a sequence of four 16-bit bitmasks. This limited the number of addressable functional units to 16, which is quite low and has bitten many people in the past. Instead, the DFAEmitter now generates a lookup table from InstrItinerary class (index of the ItinData inside the ProcItineraries) to an internal action index which is essentially a dense embedding of the conjunctive form. Because we never materialize the conjunctive form, we no longer have the 16 FU restriction. In this patch we limit to 64 functional units due to using a uint64_t bitmask in the DFAEmitter. Now that we've decoupled these representations we can increase this in future. Reviewers: ThomasRaoux, kparzysz, majnemer Reviewed By: ThomasRaoux Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69110
* [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)Gil Rapaport2019-11-055-125/+170
| | | | | | | This recommits 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b (reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b for heap-use-after-free) with a fix in IAI's reset() which was not clearing the set of interleave groups after deleting them.
* [MachineOutliner] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-051-1/+1
|
* [ObjC][ARC] Ignore lifetime markers between *ReturnValue callsFrancis Visoiu Mistrih2019-11-051-1/+28
| | | | | | | | | | | | | | | | | | | | | | When eliminating a pair of `llvm.objc.autoreleaseReturnValue` followed by `llvm.objc.retainAutoreleasedReturnValue` we need to make sure that the instructions in between are safe to ignore. Other than bitcasts and useless GEPs, it's also safe to ignore lifetime markers for both static allocas (lifetime.start/lifetime.end) and dynamic allocas (stacksave/stackrestore). These get added by the inliner as part of the return sequence and can prevent the transformation from happening in practice. Differential Revision: https://reviews.llvm.org/D69833
* [JumpThreading] Factor out common code to update the SSA form (NFC)Kazu Hirata2019-11-051-75/+46
| | | | | | | | | | | | | | | Summary: This patch factors out common code to update the SSA form in JumpThreading.cpp -- partly for readability and partly to facilitate an coming patch of my own. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69811
OpenPOWER on IntegriCloud