summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [RISCV] Lower calls through PLTLewis Revill2019-06-183-4/+18
| | | | | | | | | | This patch adds support for generating calls through the procedure linkage table where required for a given ExternalSymbol or GlobalAddress callee. Differential Revision: https://reviews.llvm.org/D55304 llvm-svn: 363686
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsicsMatt Arsenault2019-06-186-21/+156
| | | | | | | There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678
* AMDGPU: Change API for checking for exec modificationMatt Arsenault2019-06-184-27/+55
| | | | | | | | | | | | | | | | | | Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675
* MCContext: Delete unused functionsFangrui Song2019-06-181-8/+0
| | | | llvm-svn: 363674
* [SelectionDAG] Legalize vaargs that require vector splittingSimon Pilgrim2019-06-182-0/+24
| | | | | | | | | | This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671
* AMDGPU: Fold readlane from copy of SGPR or immMatt Arsenault2019-06-182-0/+42
| | | | | | These may be inserted to assert uniformity somewhere. llvm-svn: 363670
* AMDGPU: Remove unnecessary check for virtual registerMatt Arsenault2019-06-181-17/+4
| | | | | | | The copy was found by searching the uses of a virtual register, so it's already known to be virtual. llvm-svn: 363669
* AMDGPU: Fix iterator crash in AMDGPUPromoteAllocaMatt Arsenault2019-06-181-5/+9
| | | | | | The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scaleMatt Arsenault2019-06-181-0/+14
| | | | llvm-svn: 363667
* [ARM] Some Thumb2ITBlock clean ups. NFCSjoerd Meijer2019-06-182-48/+41
| | | | | | | | | Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666
* [SystemZ] Fix AHIMuxK pseudo expansion.Jonas Paulsson2019-06-181-4/+6
| | | | | | | Do not emit a copy if the source and destination registers are the same. Review: Ulrich Weigand llvm-svn: 363665
* [AMDGPU] Speed up live-in virtual register set computaion in ↵Valery Pykhtin2019-06-184-5/+80
| | | | | | | | GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661
* [SVE][IR] Scalable Vector IR Type with pr42210 fixGraham Hunter2019-06-189-13/+68
| | | | | | | | | | | | | | | | | | | | | Recommit of D32530 with a few small changes: - Stopped recursively walking through aggregates in the verifier, so that we don't impose too much overhead on large modules under LTO (see PR42210). - Changed tests to match; the errors are slightly different since they only report the array or struct that actually contains a scalable vector, rather than all aggregates which contain one in a nested member. - Corrected an older comment Reviewers: thakis, rengolin, sdesmalen Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D63321 llvm-svn: 363658
* [X86] Replace any_extend* vector extensions with zero_extend* equivalentsSimon Pilgrim2019-06-183-84/+53
| | | | | | | | | | First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655
* [SimplifyCFG] NFC, prof branch_weighs handling is simplifiedYevgeny Rouban2019-06-181-49/+15
| | | | | | | | | Using the new SwitchInstProfUpdateWrapper this patch simplifies 3 places of prof branch_weights handling. Differential Revision: https://reviews.llvm.org/D62123 llvm-svn: 363652
* [X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper ↵Craig Topper2019-06-181-108/+118
| | | | | | | | function. NFCI Preliminary step for D59909 llvm-svn: 363645
* [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly ↵Craig Topper2019-06-183-64/+12
| | | | | | | | | | | | instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644
* [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.Craig Topper2019-06-187-58/+129
| | | | | | | | | | | | | | | | | | | | | | Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643
* GlobalISel: Remove redundant pass initializationTom Stellard2019-06-185-13/+4
| | | | | | | | | | | | | | | | | | | Summary: All the GlobalISel passes are initialized when the target calls initializeGlobalISel(), so we don't need to call the initializers from the pass constructors. Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar Reviewed By: aemerson Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63235 llvm-svn: 363642
* GlobalISel: Use the original flags when lowering fneg to fsubMatt Arsenault2019-06-171-2/+1
| | | | | | | | | | This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately. llvm-svn: 363637
* hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag.Peter Collingbourne2019-06-171-18/+32
| | | | | | | | | | | | | | This saves roughly 32 bytes of instructions per function with stack objects and causes us to preserve enough information that we can recover the original tags of all stack variables. Now that stack tags are deterministic, we no longer need to pass -hwasan-generate-tags-with-calls during check-hwasan. This also means that the new stack tag generation mechanism is exercised by check-hwasan. Differential Revision: https://reviews.llvm.org/D63360 llvm-svn: 363636
* hwasan: Add a tag_offset DWARF attribute to instrumented stack variables.Peter Collingbourne2019-06-176-4/+36
| | | | | | | | | | | | | | | | | | | | | | | The goal is to improve hwasan's error reporting for stack use-after-return by recording enough information to allow the specific variable that was accessed to be identified based on the pointer's tag. Currently we record the PC and lower bits of SP for each stack frame we create (which will eventually be enough to derive the base tag used by the stack frame) but that's not enough to determine the specific tag for each variable, which is the stack frame's base tag XOR a value (the "tag offset") that is unique for each variable in a function. In IR, the tag offset is most naturally represented as part of a location expression on the llvm.dbg.declare instruction. However, the presence of the tag offset in the variable's actual location expression is likely to confuse debuggers which won't know about tag offsets, and moreover the tag offset is not required for a debugger to determine the location of the variable on the stack, so at the DWARF level it is represented as an attribute so that it will be ignored by debuggers that don't know about it. Differential Revision: https://reviews.llvm.org/D63119 llvm-svn: 363635
* [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra ↵Amara Emerson2019-06-173-61/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632
* Propagate fmf in IRTranslate for fnegMichael Berg2019-06-171-8/+17
| | | | | | | | | | | | | | Summary: This case is related to D63405 in that we need to be propagating FMF on negates. Reviewers: volkan, spatel, arsenm Reviewed By: arsenm Subscribers: wdng, javed.absar Differential Revision: https://reviews.llvm.org/D63458 llvm-svn: 363631
* Use VR128X instead of FR32X/FR64X for the register class in ↵Craig Topper2019-06-171-5/+5
| | | | | | | | VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630
* [X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what ↵Craig Topper2019-06-171-1/+2
| | | | | | | | types are allowed here. NFC Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code. llvm-svn: 363629
* [AMDGPU] Use custom inserter for gfx10 VOP2bStanislav Mekhanoshin2019-06-171-1/+3
| | | | | | | | | This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625
* Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops ↵Philip Reames2019-06-172-20/+31
| | | | | | | | | | | | w/taken backedges This patch really contains two pieces: Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi. Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses. Differential Revision: https://reviews.llvm.org/D63224 llvm-svn: 363619
* [globalisel] Fix iterator invalidation in the extload combinesDaniel Sanders2019-06-171-54/+38
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Change the way we deal with iterator invalidation in the extload combines as it was still possible to neglect to visit a use. Even worse, it happened in the in-tree test cases and the checks weren't good enough to detect it. We now take a cheap copy of the use list before iterating over it. This prevents iterator invalidation from occurring and has the nice side effect of making the existing schedule-for-erase/schedule-for-insert mechanism moot. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61813 llvm-svn: 363616
* [AMDGPU] Propagate function attributes thru bitcastsStanislav Mekhanoshin2019-06-171-3/+4
| | | | | | | | | AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614
* Fix a bug w/inbounds invalidation in LFTR (recommit)Philip Reames2019-06-171-11/+85
| | | | | | | | | | | | | | | | | | Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601. Original commit comment follows: This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363613
* AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printerNicolai Haehnle2019-06-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602
* [EarlyCSE] Fix hashing of self-comparesJoseph Tremoulet2019-06-171-2/+7
| | | | | | | | | | | | | | | | | | | | | | | Summary: Update compare normalization in SimpleValue hashing to break ties (when the same value is being compared to itself) by switching to the swapped predicate if it has a lower numerical value. This brings the hashing in line with isEqual, which already recognizes the self-compares with swapped predicates as equal. Fixes PR 42280. Reviewers: spatel, efriedma, nikic, fhahn, uabelho Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63349 llvm-svn: 363598
* [MemorySSA] Don't use template when the clone is a simplified instruction.Alina Sbirlea2019-06-171-3/+11
| | | | | | | | | | | | | | | | | | | Summary: LoopRotate doesn't create a faithful clone of an instruction, it may simplify it beforehand. Hence the clone of an instruction that has a MemoryDef associated may not be a definition, but a use or not a memory alternig instruction. Don't rely on the template when the clone may be simplified. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63355 llvm-svn: 363597
* [GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do soJessica Paquette2019-06-171-16/+144
| | | | | | | | | | | | | | | | | | | | | | | | | Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596
* [X86] Add TB_NO_REVERSE to some memory folding table entries where the ↵Craig Topper2019-06-171-3/+3
| | | | | | | | | | | | | register form requires 64-bit mode, but the memory form does not. We don't know if its safe to unfold if we're in 32-bit mode. This is simlar to what was done to some load opcodes in r363523. I think its pretty unlikely we will try to unfold these anyway so I don't think this is testable. llvm-svn: 363595
* [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)Simon Pilgrim2019-06-171-0/+45
| | | | | | If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592
* [MemorySSA] Add all MemoryPhis before filling their values.Alina Sbirlea2019-06-171-3/+13
| | | | | | | | | | | | | | | | | | Summary: Add all MemoryPhis in IDF before filling in their incomign values. Otherwise, a new Phi can be added that needs to become the incoming value of another Phi. Test fails the verification in verifyPrevDefInPhis. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63353 llvm-svn: 363590
* [AMDGPU] gfx1010 wavefrontsize intrinsic foldingStanislav Mekhanoshin2019-06-173-16/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588
* AMDGPU: Fold readlane/readfirstlane callsMatt Arsenault2019-06-171-0/+24
| | | | llvm-svn: 363587
* [AMDGPU] Pass to propagate ABI attributes from kernels to the functionsStanislav Mekhanoshin2019-06-174-4/+356
| | | | | | | | | | | | | | | | The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586
* [X86][AVX] Split under-aligned vector nt-stores.Simon Pilgrim2019-06-171-2/+13
| | | | | | If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582
* [LV] Suppress vectorization in some nontemporal casesWarren Ristow2019-06-175-1/+80
| | | | | | | | | | | | | | | | | | | | | When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581
* GlobalISel: Ignore callsite attributes when picking intrinsic typeMatt Arsenault2019-06-171-1/+3
| | | | | | | | | | | A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. I fixed the same bug in SelectionDAG in r287593. llvm-svn: 363580
* GlobalISel: Verify intrinsicsMatt Arsenault2019-06-172-26/+63
| | | | | | | | I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579
* AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic IDMatt Arsenault2019-06-171-2/+1
| | | | llvm-svn: 363578
* [AMDGPU] gfx1010 wave32 metadataStanislav Mekhanoshin2019-06-178-2/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECTTom Stellard2019-06-173-1/+195
| | | | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576
* [Remarks] Extend -fsave-optimization-record to specify the formatFrancis Visoiu Mistrih2019-06-175-13/+51
| | | | | | | | | Use -fsave-optimization-record=<format> to specify a different format than the default, which is YAML. For now, only YAML is supported. llvm-svn: 363573
* [X86] combineLoad - begun making the load split code more generic. NFCI.Simon Pilgrim2019-06-171-13/+12
| | | | | | | | This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570
OpenPOWER on IntegriCloud