summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize ↵Simon Pilgrim2020-01-101-2/+2
| | | | | | | | multi-input shuffle elements We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths. Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
* [BPF] extend BTF_KIND_FUNC to cover global, static and extern funcsYonghong Song2020-01-103-29/+39
| | | | | | | | | | | | | | Previously extern function is added as BTF_KIND_VAR. This does not work well with existing BTF infrastructure as function expected to use BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO. This patch added extern function to BTF_KIND_FUNC. The two bits 0:1 of btf_type.info are used to indicate what kind of function it is: 0: static 1: global 2: extern Differential Revision: https://reviews.llvm.org/D71638
* Add support for __declspec(guard(nocf))Andrew Paverd2020-01-101-6/+4
| | | | | | | | | | | | | | | | | | Summary: Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a new `"guard_nocf"` function attribute to indicate that checks should not be added on indirect calls within that function. Add support for `__declspec(guard(nocf))` following the same syntax as MSVC. Reviewers: rnk, dmajor, pcc, hans, aaron.ballman Reviewed By: aaron.ballman Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72167
* [PowerPC] Handle constant zero bits in BitPermutationSelectorNemanja Ivanovic2020-01-101-4/+7
| | | | | | | | | | We currently crash when analyzing an AssertZExt node that has some bits that are constant zeros (i.e. as a result of an and with a constant). This issue was reported in https://bugs.llvm.org/show_bug.cgi?id=41088 and this patch fixes that. Differential revision: https://reviews.llvm.org/D72038
* [DebugInfo][NFC] Remove unused variable/fix variable namingJames Henderson2020-01-101-1181/+1179
| | | | | | Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D72159
* [DebugInfo] Improve error message textJames Henderson2020-01-101-3/+5
| | | | | | | | | | Unlike most of our errors in the debug line parser, the "no end of sequence" message was missing any reference to which line table it refererred to. This change adds the offset to this message. Reviewed by: dblaikie Differential Revision: https://reviews.llvm.org/D72443
* AMDGPU/GlobalISel: Clamp G_ZEXT source sizesMatt Arsenault2020-01-101-2/+3
| | | | | Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
* [FPEnv] Invert sense of MIFlag::FPExcept flagUlrich Weigand2020-01-107-14/+14
| | | | | | | | | | | | | | | | | | In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes default to potentially raising FP exceptions unless otherwise specified -- i.e. if we forget to propagate the flag somewhere, the effect is now only lost performance, not incorrect code. However, the related flag at the MI level still defaults to nodes not raising FP exceptions unless otherwise specified. To be fully on the (conservatively) safe side, we should invert that flag as well. This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept. (Note that this does also introduce an incompatible change in the MIR format.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D72466
* [ARM][MVE] Tail predicate VMAX,VMAXA,VMIN,VMINASam Parker2020-01-101-0/+2
| | | | | | | Add the MVE min and max instructions to our tail predication whitelist. Differential Revision: https://reviews.llvm.org/D72502
* ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC.Sjoerd Meijer2020-01-101-7/+16
|
* Reverting, broke some bots. Need further investigation.Diogo Sampaio2020-01-107-336/+85
| | | | | | | | Summary: This reverts commit 8c12769f3046029e2a9b4e48e1645b1a77d28650. Reviewers: Subscribers:
* [Support] ThreadPoolExecutor fixes for Windows/MinGWAndrew Ng2020-01-101-18/+64
| | | | | | | | | | | | | | | | | | | | | Changed ThreadPoolExecutor to no longer use detached threads and instead to join threads on destruction. This is to prevent intermittent crashing on Windows when doing a normal full exit, e.g. via exit(). Changed ThreadPoolExecutor to be a ManagedStatic so that it can be stopped on llvm_shutdown(). Without this, it would only be stopped in the destructor when doing a full exit. This is required to avoid intermittent crashing on Windows due to a race condition between the ThreadPoolExecutor starting up threads and the process doing a fast exit, e.g. via _exit(). The Windows crashes appear to only occur with the MSVC static runtimes and are more frequent with the debug static runtime. These changes also prevent intermittent deadlocks on exit with the MinGW runtime. Differential Revision: https://reviews.llvm.org/D70447
* [ARM][Thumb2] Fix ADD/SUB invalid writes to SPDiogo Sampaio2020-01-107-85/+336
| | | | | | | | | | | | | | | | | | | | Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable. To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP). Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma Reviewed By: efriedma Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
* Fix "pointer is null" static analyzer warnings. NFCI.Simon Pilgrim2020-01-101-0/+1
| | | | Assert that the pointers are non-null before dereferencing them.
* Fix Wdocumentation warning. NFCI.Simon Pilgrim2020-01-101-2/+0
|
* Don't use dyn_cast_or_null if we know the pointer is nonnull.Simon Pilgrim2020-01-101-4/+2
| | | | Fix clang static analyzer null dereference warning by using dyn_cast instead.
* [LV] Silence unused variable warning in Release builds. NFC.Benjamin Kramer2020-01-101-0/+1
|
* [MIR] Fix cyclic dependency of MIR formatterPeng Guo2020-01-106-24/+18
| | | | | | | | | | | | | | Summary: Move MIR formatter pointer from TargetMachine to TargetInstrInfo to avoid cyclic dependency between target & codegen. Reviewers: dsanders, bkramer, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72485
* [SVEV] Recognise hardware-loop intrinsic loop.decrement.regSjoerd Meijer2020-01-101-0/+11
| | | | | | | | | | | | | | | Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same semantics as a sub expression. This allows us to query hardware-loops, which contain this @loop.decrement.reg intrinsic, so that we can calculate iteration counts, exit values, etc. of hardwareloops. This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus, while hardware-loops and tripcounts now become analysable by SCEV, this prevents the usual loop transformations from applying transformations on hardware-loops, which is what we want at this point, for which I have added test cases for loopunrolling and IndVarSimplify and LFTR. Differential Revision: https://reviews.llvm.org/D71563
* [NFC] [PowerPC] Add isPredicable for basic instrsQiu Chaofan2020-01-104-34/+21
| | | | | | | | | PowerPC uses a dedicated method to check if the machine instr is predicable by opcode. However, there's a bit `isPredicable` in instr definition. This patch removes the method and set the bit only to opcodes referenced in it. Differential Revision: https://reviews.llvm.org/D71921
* [LV] VPValues for memory operation pointers (NFCI)Gil Rapaport2020-01-105-104/+141
| | | | | | | | | | | | Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to feed these recipes with pointers based on other ingredients or none at all. This patch modifies these recipes to use a VPValue for the pointer instead, in order to reduce ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. The recipes are constructed with VPValues bound to these ingredients, maintaining current behavior. Differential revision: https://reviews.llvm.org/D70865
* [ThinLTO] Pass CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLPWei Mi2020-01-092-1/+7
| | | | | | | | | | | | | down to pass builder in ltobackend. Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang are not passed down to pass builder in ltobackend when new pass manager is used. This is inconsistent with the behavior when new pass manager is used and thinlto is not used. Such inconsistency causes slp vectorization pass not being enabled in ltobackend for O3 + thinlto right now. This patch fixes that. Differential Revision: https://reviews.llvm.org/D72386
* [NFC] Style cleanupShengchen Kan2020-01-101-9/+10
|
* AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELTMatt Arsenault2020-01-095-10/+89
| | | | | Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
* AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v caseMatt Arsenault2020-01-092-16/+87
| | | | | | If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-093-0/+61
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* AVR: Update for getRegisterByName changeMatt Arsenault2020-01-092-3/+3
|
* TableGen/GlobalISel: Add way for SDNodeXForm to work on timmMatt Arsenault2020-01-097-31/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
* GlobalISel: Handle llvm.read_registerMatt Arsenault2020-01-093-0/+30
| | | | | Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
* DAG: Don't use unchecked dyn_castMatt Arsenault2020-01-091-4/+4
|
* GlobalISel: Fix else after returnMatt Arsenault2020-01-091-3/+9
|
* CodeGen: Use LLT instead of EVT in getRegisterByNameMatt Arsenault2020-01-0921-26/+31
| | | | | | Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
* [AArch64][GlobalISel] Implement selection of <2 x float> vector splat.Amara Emerson2020-01-092-7/+36
| | | | | | Also requires making G_IMPLICIT_DEF of v2s32 legal. Differential Revision: https://reviews.llvm.org/D72422
* GlobalISel: Move getLLTForMVT/getMVTForLLTMatt Arsenault2020-01-092-17/+17
| | | | | | As an intermediate step, some TLI functions can be converted to using LLT instead of MVT. Move this somewhere out of GlobalISel so DAG functions can use these.
* GlobalISel: Don't assert on MoreElements creating vectorsMatt Arsenault2020-01-091-5/+7
| | | | | | | If the original type was a scalar, it should be valid to add elements to turn it into a vector. Tests included with following legalization change.
* AMDGPU/GlobalISel: Fix argument lowering for vectors of pointersMatt Arsenault2020-01-091-2/+18
| | | | | | | When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.
* AMDGPU/GlobalISel: Widen 16-bit shift amount sourcesMatt Arsenault2020-01-091-1/+2
| | | | | | This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.
* MipsDelaySlotFiller: Update registers def-uses for BUNDLE instructionsAlex Richardson2020-01-091-2/+10
| | | | | | | | | | | | | | | Summary: In commit b91f239485fb7bb8d29be3e0b60660a2de7570a9 I updated the MipsDelaySlotFiller to skip BUNDLE instructions. However, in addition to not considering BUNDLE instructions for the delay slot, we also need to ensure that the register def-use information is updated. Not updating this information caused run-time crashes (when using the out-of-tree CHERI backend) since later definitions could be overwritten with earlier register values. Reviewers: atanasyan Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D72254
* [GlobalISel][AArch64] Import + select LDR*roW and STR*roW patternsJessica Paquette2020-01-092-46/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for selecting a large chunk of the load/store *roW patterns. This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO into GISel. The code is very similar to the XRO code. The main difference is that in the *roW patterns, we want to try and fold in an extend, and *possibly* a shift along with it. A good portion of this patch is refactoring the existing XRO code. - Add selectAddrModeWRO - Factor out the code from selectAddrModeShiftedExtendXReg which is used by both selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL. This is similar to the function of the same name in AArch64DAGToDAGISel. - Add support for extends to the factored out code in selectExtendedSHL. - Teach getExtendTypeForInst how to handle AND masks that are intended to be used in loads/stores (necessary for this addressing mode.) - Make getExtendTypeForInst not static because moving it made an annoying diff and I wanted to have the WRO/XRO functions close to each other while I was writing the code. Differential Revision: https://reviews.llvm.org/D72426
* [ms] [X86] Use "P" modifier on all branch-target operands in inline X86 ↵Eric Astor2020-01-095-51/+32
| | | | | | | | | | | | | | | | | | | assembly. Summary: Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions. Also add a regression test for llvm.org/PR44272, since this finishes fixing it. Reviewers: thakis, rnk Reviewed By: thakis Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72417
* [X86] AMD Znver2 (Rome) Scheduler enablementGanesh Gopalasubramanian2020-01-103-2/+1551
| | | | | | | | | | | | The patch gives out the details of the znver2 scheduler model. There are few improvements with respect to execution units, latencies and throughput when compared with znver1. The tests that were present for znver1 for llvm-mca tool were replicated. The latencies, execution units, timeline and throughput information are updated for znver2. Reviewers: craig.topper, Simon Pilgrim Differential Revision: https://reviews.llvm.org/D66088
* [PowerPC] The VK_PLT symbolref modifier is only used on 32-bit ELF. [NFC]Sean Fertile2020-01-091-2/+2
| | | | | | | | | Fix a conditional that guarded code for execution only on 32-bit ELF by checking that the Subtarget was not 64-bit and not-Darwin. By adding a new target ABI (AIX), the condition is no longer correct. This code is dead for AIX, due to a 'report_fatal_error' for thread local storage usage earlier in the pipeline, but needs to be modifed as part of Darwins removal from the PowerPC backend.
* [TargetLowering][X86] TeachSimplifyDemandedBits to handle cases where only ↵Craig Topper2020-01-091-0/+21
| | | | | | | | the sign bit is demanded from a SETCC and can be passed through If we're doing a compare that only tests the sign bit and only the sign bit is demanded, we can just bypass the node. This removes one of the blend dependencies in our v2i64->v2f32 uint_to_fp codegen on pre-sse4.2 targets. Differential Revision: https://reviews.llvm.org/D72356
* [SystemZ] Fix matching another pattern for nxgrk (PR44496)Ulrich Weigand2020-01-091-2/+3
| | | | | | | | SystemZDAGToDAGISel::Select will attempt to split logical instruction with a large immediate constant. This must not happen if the result matches one of the z15 combined operations, so the code checks for those. However, one of them was missed, causing invalid code to be generated in the test case for PR44496.
* [Support][NFC] Make some helper functions "static" in Memory.incBruno Ricci2020-01-092-11/+3
|
* [NFC,format] Sort switch cases alphabeticallySimon Moll2020-01-091-133/+133
| | | | | | | | | This patch brings the switch cases of `llvm/lib/Support/Triple.cpp` back into alphabetical order. This was noted during the the review of https://reviews.llvm.org/D69103 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D72452
* [NFCI][LoopUnrollAndJam] Changing LoopUnrollAndJamPass to a functionWhitney Tsang2020-01-093-45/+64
| | | | | | | | | | | | | | | | | | | pass. Summary: This patch changes LoopUnrollAndJamPass to a function pass, and keeps the loops traversal order same as defined in FunctionToLoopPassAdaptor LoopPassManager.h. The next patch will change the loop traversal to outer to inner order, so more loops can be transform. Discussion in llvm-dev mailing list: https://groups.google.com/forum/#!topic/llvm-dev/LF4rUjkVI2g Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto Reviewed By: dmgreen Subscribers: hiraditya, zzheng, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D72230
* [InstCombine] Z / (1.0 / Y) => (Y * Z)@raghesh (Raghesh Aloor)2020-01-091-0/+8
| | | | | | | | | This is a special case of Z / (X / Y) => (Y * Z) / X, with X = 1.0. The m_OneUse check is avoided because even in the case of the multiple uses for 1.0/Y, the number of instructions remain the same and a division is replaced by a multiplication. Differential Revision: https://reviews.llvm.org/D72319
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-092-32/+30
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU: Eliminate more legacy codepred address space PatFragsMatt Arsenault2020-01-094-93/+24
| | | | These should now be limited to R600 code.
OpenPOWER on IntegriCloud