summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [Attributor] Create helper struct for handling analysis gettersHideto Ueno2019-09-171-18/+8
| | | | | | | | | | | | | | | | Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for `noalias`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67603 llvm-svn: 372072
* [X86] Split oversized vXi1 vector arguments and return values into scalars ↵Craig Topper2019-09-172-0/+34
| | | | | | | | | | | | | | | | | on avx512 targets. Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than available registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types. Fixes PR43323. llvm-svn: 372069
* [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a ↵Craig Topper2019-09-172-78/+158
| | | | | | | | | | broadcast load to avoid a copy. The BLENDM instructions allow an 2 sources and an independent destination while masked VBROADCAST has the destination tied to the source. llvm-svn: 372068
* [X86] Add support for commuting EVEX VCMP instructons with any immediate value.Craig Topper2019-09-171-6/+33
| | | | | | Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates. llvm-svn: 372067
* [X86] Enable commuting of EVEX VCMP for all immediate values during isel.Craig Topper2019-09-173-14/+39
| | | | llvm-svn: 372065
* [GlobalISel] Partially revert r371901.Amara Emerson2019-09-161-2/+2
| | | | | | | | | | | r371901 was overeager and widenScalarDst() and the like in the legalizer attempt to increment the insert point given in order to add new instructions after the currently legalizing inst. In cases where the insertion point is not exactly the current instruction, then callers need to de-compensate for the behaviour by decrementing the insertion iterator before calling them. It's not a nice state of affairs, for now just undo the problematic parts of the change. llvm-svn: 372050
* [PowerPC] Cust lower fpext v2f32 to v2f64 from extract_subvector v4f32Nemanja Ivanovic2019-09-163-24/+49
| | | | | | | | | | | | | Add the missing piece of r372029. Somehow when the patch for review D61961 was committed, only the test case went in and the code didn't. This of course caused all kinds of build bot breaks. This patch just adds the code for that patch. Author: Lei Huang Differential revision: https://reviews.llvm.org/D61961 llvm-svn: 372043
* [Remarks] Allow remarks::Format::YAML to take a string tableFrancis Visoiu Mistrih2019-09-164-26/+41
| | | | | | | It should be allowed to take a string table in case all the strings in the remarks point there, but it shouldn't use it during serialization. llvm-svn: 372042
* [Coverage] Speed up file-based queries for coverage info, NFCVedant Kumar2019-09-161-2/+35
| | | | | | | | | | | | | | Speed up queries for coverage info in a file by reducing the amount of time spent determining whether a function record corresponds to a file. This gives a 36% speedup when generating a coverage report for `llc`. The reduction is entirely in user time. rdar://54758110 Differential Revision: https://reviews.llvm.org/D67575 llvm-svn: 372025
* [Coverage] Assert that filenames in a TU are unique, NFCVedant Kumar2019-09-161-0/+10
| | | | llvm-svn: 372024
* [LTO][Legacy] Add new C inferface to query libcall functionsSteven Wu2019-09-161-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is needed to implemented the same approach as lld (implemented in r338434) for how to handling symbols that can be generated by LTO code generator but not present in the symbol table for linker that uses legacy C APIs. libLTO is in charge of providing the list of symbols. Linker is in charge of implementing the eager loading from static libraries using the list of symbols. rdar://problem/52853974 Reviewers: tejohnson, bd1976llvm, deadalnix, espindola Reviewed By: tejohnson Subscribers: emaste, arichardson, hiraditya, MaskRay, dang, kledzik, mehdi_amini, inglorion, jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67568 llvm-svn: 372021
* [PGO] Use linkonce_odr linkage for __profd_ variables in comdat groupsReid Kleckner2019-09-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes relocations against __profd_ symbols in discarded sections, which is PR41380. In general, instrumentation happens very early, and optimization and inlining happens afterwards. The counters for a function are calculated early, and after inlining, counters for an inlined function may be widely referenced by other functions. For C++ inline functions of all kinds (linkonce_odr & available_externally mainly), instr profiling wants to deduplicate these __profc_ and __profd_ globals. Otherwise the binary would be quite large. I made __profd_ and __profc_ comdat in r355044, but I chose to make __profd_ internal. At the time, I was only dealing with coverage, and in that case, none of the instrumentation needs to reference __profd_. However, if you use PGO, then instrumentation passes add calls to __llvm_profile_instrument_range which reference __profd_ globals. The solution is to make these globals externally visible by using linkonce_odr linkage for data as was done for counters. This is safe because PGO adds a CFG hash to the names of the data and counter globals, so if different TUs have different globals, they will get different data and counter arrays. Reviewers: xur, hans Differential Revision: https://reviews.llvm.org/D67579 llvm-svn: 372020
* [X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operandsSimon Pilgrim2019-09-161-15/+41
| | | | | | Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector. llvm-svn: 372013
* [ARM] A predicate cast of a predicate cast is a predicate castDavid Green2019-09-161-0/+20
| | | | | | | | | | | The adds some very basic folding of PREDICATE_CASTS, removing cases when they are chained together. These would already be removed eventually, as these are lowered to copies. This just allows it to happen earlier, which can help other simplifications. Differential Revision: https://reviews.llvm.org/D67591 llvm-svn: 372012
* [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not ↵Roman Lebedev2019-09-161-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | per-BB cost Summary: Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and other one had no instructions, even thought that would have total cost of 3. This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of it's inputs are available: https://godbolt.org/z/zgHePf So i don't see why the existing behavior is the correct one. Also, let's add it's own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: will allow to fold when there are 2 BB's each with cost=2. And since the logic has changed, it will also allow to fold when one BB has cost=3 and other cost=1, or there is only one BB with cost=4. This is an alternative solution to D65148: This fix is mainly motivated by `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial. This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240) | metric | old | new | delta | % | | x86-mi-counting.NumMachineFunctions | 37720 | 37721 | 1 | 0.00% | | x86-mi-counting.NumMachineBasicBlocks | 773545 | 771181 | -2364 | -0.31% | | x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% | | x86-mi-counting.NumUncondBR | 135770 | 135543 | -227 | -0.17% | | x86-mi-counting.NumCondBR | 423753 | 422187 | -1566 | -0.37% | | x86-mi-counting.NumCMOV | 24815 | 25731 | 916 | 3.69% | | x86-mi-counting.NumVecBlend | 17 | 17 | 0 | 0.00% | We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count. Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 **major** improvements: ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681 ``` That being said, it's always best-effort, so there will likely be cases where this worsens things. Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67318 llvm-svn: 372009
* [InstCombine] remove unneeded one-use checks for icmp foldSanjay Patel2019-09-161-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Related folds were added in: rL125734 ...the code comment about register pressure is discussed in more detail in: https://bugs.llvm.org/show_bug.cgi?id=2698 But 10 years later, perf testing bzip2 with this change now shows a slight (0.2% average) improvement on Haswell although that's probably within test noise. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 and rL371981 are related patches in this series. llvm-svn: 372007
* [ARM] Add patterns for BSWAP intrinsic on MVEOliver Cruickshank2019-09-162-0/+8
| | | | | | | BSWAP can use the VREV instruction on MVE to produce better results than expanding. llvm-svn: 372002
* [ARM] Add patterns for bitreverse intrinsic on MVEOliver Cruickshank2019-09-162-0/+12
| | | | | | | BITREVERSE can use the VBRSR which will reverse and right shift. Shifting right by 0 will just reverse the bits. llvm-svn: 372001
* [ARM] Lower CTTZ on MVEOliver Cruickshank2019-09-161-2/+2
| | | | | | | Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and count the leading zeros, equivalent to a count trailing zeros (CTTZ). llvm-svn: 372000
* [ARM] Add patterns for CTLZ on MVEOliver Cruickshank2019-09-162-0/+10
| | | | | | | CTLZ intrinsic can use the VCLS instruction on MVE, which produces better results than expanding. llvm-svn: 371999
* [ExecutionEngine] Don't dereference a dyn_cast result. NFCI.Simon Pilgrim2019-09-161-2/+2
| | | | | | The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 371998
* [SystemZ] Call erase() on the right MBB in SystemZTargetLowering::emitSelect()Jonas Paulsson2019-09-161-1/+1
| | | | | | | | | Since MBB was split *before* MI, the MI(s) will reside in JoinMBB (MBB) at the point of erasing them, so calling StartMBB->erase() is actually wrong, although it is "working" by all appearances. Review: Ulrich Weigand llvm-svn: 371995
* [NFC] remove unused functionsGuillaume Chatelet2019-09-161-8/+0
| | | | | | | | | | | | Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67616 llvm-svn: 371994
* AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit sourceMatt Arsenault2019-09-161-4/+16
| | | | | | | This was producing an illegal copy which would hit an assert later. Error on selection for now until this is implemented. llvm-svn: 371993
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEILMatt Arsenault2019-09-161-0/+2
| | | | llvm-svn: 371991
* [X86][NFC] Add a `use-aa` feature.Clement Courbet2019-09-162-0/+8
| | | | | | | | | | | | | | | | | | Summary: This allows enabling useaa on the command-line and will allow enabling the feature on a per-CPU basis where benchmarking shows improvements. This is modelled after the ARM/AArch64 target. Reviewers: RKSimon, andreadb, craig.topper Subscribers: javed.absar, kristof.beyls, hiraditya, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67266 llvm-svn: 371989
* [ARM] Fold VCMP into VPTDavid Green2019-09-162-18/+118
| | | | | | | | | | | | | | | MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold them, searching back through the basicblock for a valid VCMP and creating the VPT from its operands. There are some changes to the VPT instructions to accommodate this, altering the order of the operands to match the VCMP better, and changing P0 register defs to be VPR defs, as is used in other places. Differential Revision: https://reviews.llvm.org/D66577 llvm-svn: 371982
* [InstCombine] remove unneeded one-use checks for icmp foldSanjay Patel2019-09-161-4/+1
| | | | | | | | | | | | | | | | | | | | This fold and several others were added in: rL125734 <https://reviews.llvm.org/rL125734> ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 is a related patch in this series. llvm-svn: 371981
* [InstCombine] fix comments to match code; NFCSanjay Patel2019-09-161-25/+27
| | | | | | | | | | This blob was written before match() existed, so it could probably be reduced significantly. But I suspect it isn't well tested, so tests would have to be added to reduce risk from logic changes. llvm-svn: 371978
* [VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.Simon Pilgrim2019-09-161-5/+8
| | | | | | The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place. llvm-svn: 371975
* [SLPVectorizer] Assert that we find a LastInst to silence analyzer null ↵Simon Pilgrim2019-09-161-0/+1
| | | | | | dereference warning. NFCI. llvm-svn: 371974
* [SLPVectorizer] Don't dereference a dyn_cast result. NFCI.Simon Pilgrim2019-09-161-4/+4
| | | | | | The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 371973
* [SVE][Inline-Asm] Add constraints for SVE predicate registersKerry McLaughlin2019-09-164-1/+53
| | | | | | | | | | | | | | | | | | | Summary: Adds the following inline asm constraints for SVE: - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive - Upa: SVE predicate register with full range, P0 to P15 Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin Reviewed By: rovka Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66524 llvm-svn: 371967
* [AArch64] Some more FP16 FMA pattern matchingSjoerd Meijer2019-09-161-2/+19
| | | | | | | | | After our previous machinecombiner exercises (rL371321, rL371818, rL371833), we were still missing a few FP16 FMA patterns. Differential Revision: https://reviews.llvm.org/D67576 llvm-svn: 371960
* [SystemZ] Merge the SystemZExpandPseudo pass into SystemZPostRewrite.Jonas Paulsson2019-09-168-278/+175
| | | | | | | | | | | | | | | | | SystemZExpandPseudo:s only job was to expand LOCRMux instructions into jump sequences. This needs to be done if expandLOCRPseudo() or expandSELRPseudo() fails to find a legal opcode (all registers "high" or "low"). This task has now been moved to SystemZPostRewrite while removing the SystemZExpandPseudo pass. It is in fact preferred to expand these pseudos directly after register allocation in SystemZPostRewrite since the hinted register combinations are then not subject to later optimizations. Review: Ulrich Weigand https://reviews.llvm.org/D67432 llvm-svn: 371959
* AMDGPU/GlobalISel: Select SMRD loads for more typesMatt Arsenault2019-09-161-3/+12
| | | | llvm-svn: 371954
* AMDGPU/GlobalISel: RegBankSelect for killMatt Arsenault2019-09-161-0/+4
| | | | llvm-svn: 371953
* AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFPMatt Arsenault2019-09-161-1/+2
| | | | llvm-svn: 371952
* AMDGPU/GlobalISel: Set type on vgpr live in special argumentsMatt Arsenault2019-09-161-1/+2
| | | | | | | Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951
* AMDGPU/GlobalISel: Select S16->S32 fptointMatt Arsenault2019-09-162-3/+3
| | | | llvm-svn: 371950
* AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFPMatt Arsenault2019-09-161-2/+2
| | | | llvm-svn: 371949
* AMDGPU/GlobalISel: Fix VALU s16 fnegMatt Arsenault2019-09-161-0/+10
| | | | llvm-svn: 371948
* [Attributor] Heap-To-Stack ConversionStefan Stipanovic2019-09-151-5/+259
| | | | | | | | | | | | D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features. Reviewers: jdoerfert, uenoku, hfinkel, efriedma Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D65408 llvm-svn: 371942
* [InstCombine] remove unneeded one-use checks for icmp foldSanjay Patel2019-09-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | | This fold and several others were added in: rL125734 ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. There are similar checks as noted with the TODO comments. I'm hoping to remove those restrictions too, but if any of these does cause a regression, it should be easier to correct by making small, individual commits. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. llvm-svn: 371940
* [GlobalISel] findGISelOptimalMemOpLowering - remove dead initalization. NFCI.Simon Pilgrim2019-09-151-3/+1
| | | | | | Fixes static analyzer warning that "Value stored to 'NewTySize' during its initialization is never read". llvm-svn: 371937
* [LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down ↵Simon Pilgrim2019-09-151-1/+2
| | | | | | | | the load chain. NFCI. Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value. llvm-svn: 371936
* InterleavedLoadCombine - merge isa<> and dyn_cast<> duplicates. NFCI.Simon Pilgrim2019-09-151-2/+2
| | | | | | Silence static analyzer null dereference warning of *dyn_cast<BinaryOperator> by merging with the isa<BinaryOperator> above. llvm-svn: 371935
* [DebugInfo] Don't dereference a dyn_cast<PDBSymbolData> result. NFCI.Simon Pilgrim2019-09-151-1/+1
| | | | | | The static analyzer is warning about a potential null dereference - but as we're in DataMemberLayoutItem we should be able to guarantee that the Symbol is a PDBSymbolData type, allowing us to use cast<PDBSymbolData> - and if not assert will fire for us. llvm-svn: 371933
* [ARM] Masked loads and storesDavid Green2019-09-154-0/+137
| | | | | | | | | | | | | | | | Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932
* [SLP] limit vectorization of Constant subclasses (PR33958)Sanjay Patel2019-09-151-2/+5
| | | | | | | | | | | | | | This is a fix for: https://bugs.llvm.org/show_bug.cgi?id=33958 It seems universally true that we would not want to transform this kind of sequence on any target, but if that's not correct, then we could view this as a target-specific cost model problem. We could also white-list ConstantInt, ConstantFP, etc. rather than blacklist Global and ConstantExpr. Differential Revision: https://reviews.llvm.org/D67362 llvm-svn: 371931
OpenPOWER on IntegriCloud