summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Add a quick hack so clang does not use mul type instructions on mips.meklort-10.0.0Evan Lojewski2020-03-291-15/+36
|
* [MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)Hans Wennborg2020-02-271-12/+10
| | | | | | | | | | | | MC currently does not emit these relocation types, and lld does not handle them. Add FKF_Constant as a work-around of some ARM code after D72197. Eventually we probably should implement these relocation types. By Fangrui Song! Differential revision: https://reviews.llvm.org/D72892 (cherry picked from commit 2e24219d3cbfcb8c824c58872f97de0a2e94a7c8)
* Don't generate libcalls for wide shift on Windows ARM (PR42711)Hans Wennborg2020-02-251-1/+1
| | | | | | | The previous patch (cff90f07cb5cc3c3bc58277926103af31caef308) didn't cover ARM. (cherry picked from commit decd021facba804b57e8d80b6159c987d3261ab8)
* [RISCV] Correct the CallPreservedMask for the function call in an interrupt ↵Shiva Chen2020-02-201-7/+0
| | | | | | | | | | | handler CallPreservedMask is used to describe the register liveness after a function call. The function call in an interrupt handler should use the same CallPreservedMask as normal functions. So that only callee save registers can live through the function call. (cherry picked from commit 1cae2f9d192c69833e22684ca338660942ab464e)
* Fix unused function warning (PR44808)Hans Wennborg2020-02-191-5/+7
| | | | (cherry picked from commit a19de32095e4cdb18957e66609574ce2021a8d1c)
* [X86CmovConversion] Make heuristic for optimized cmov depth more ↵Nikita Popov2020-02-191-6/+7
| | | | | | | | | | | | | | | | | | | conservative (PR44539) Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights. This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice). Differential Revision: https://reviews.llvm.org/D74155 (cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
* [FPEnv][ARM] Don't call mutateStrictFPToFP when loweringJohn Brawn2020-02-181-2/+10
| | | | | | | | | | | | mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726 (cherry picked from commit 594a89f7270da74c89f2321432bc6a7135773fa5)
* [ARM] Fix infinite loop when lowering STRICT_FP_EXTENDJohn Brawn2020-02-181-0/+9
| | | | | | | | | | | | | | | | If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause in infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559 (cherry picked from commit 0ec57972967dfb43fc022c2e3788be041d1db730)
* [FPEnv][AArch64] Add lowering of f128 STRICT_FSETCCJohn Brawn2020-02-181-2/+4
| | | | | | | | These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784 (cherry picked from commit 68cf574857c81f711f498a479855a17e7bea40f7)
* [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCSJohn Brawn2020-02-183-10/+75
| | | | | | | | | | | | | | | These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194 (cherry picked from commit b37d59353f699e99f139a9227a6a69964ef4b132)
* [FPEnv][AArch64] Add lowering and instruction selection for strict conversionsJohn Brawn2020-02-182-24/+54
| | | | | | | | | | Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625 (cherry picked from commit 0bb9a27c9895c0fbc3f55f56ad7f1e1927398fce)
* [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUNDJohn Brawn2020-02-182-8/+16
| | | | | | | | | | This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201 (cherry picked from commit 258d8dd76afd88a12539b182a53ff21dcba16a2d)
* Add lowering of STRICT_FSETCC and STRICT_FSETCCSJohn Brawn2020-02-183-11/+57
| | | | | | | | | | These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368 (cherry picked from commit 2224407ef5baf6100fa22420feb4d25af1a9493f)
* Fix an unused variable warningHans Wennborg2020-02-121-1/+1
| | | | (cherry picked from commit ea9850b6c71d975935de15bd4128508b260165c5)
* [SystemZ] Bugfix in emitSelect()Jonas Paulsson2020-02-121-2/+3
| | | | | | | | | | | | | | When more than one SelectPseudo instruction is handled a new MBB is returned. This must not be done if that would result in leaving an undhandled isel pseudo behind in the original MBB. Fixes https://bugs.llvm.org/show_bug.cgi?id=44849. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74352 (cherry picked from commit 0311e28e9cc01a244faa774b8cab337b45404fa9)
* [AArch64] Add option to enable/disable load-store renaming.Florian Hahn2020-02-101-0/+7
| | | | | | | | This patch adds a new option to enable/disable register renaming in the load-store optimizer. Defaults to disabled, as there is a potential mis-compile caused by this. (cherry picked from commit 8e3f59b45ae185cc9b4e3a817d7ac958f1d55976)
* AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)Jan Vesely2020-02-103-4/+10
| | | | | | | | | | | | The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017 (cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
* [X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2Craig Topper2020-02-101-1/+1
| | | | | | | | | | | | X86 uses i8 for shift amounts. This code can fail on a 32-bit target if it runs after type legalization. This code was copied from AArch64 and modified for X86, but the shift amount wasn't changed to the correct type for X86. Fixes PR44812 (cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
* [BPF] disable ReduceLoadWidth during SelectionDag phaseYonghong Song2020-02-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | The compiler may transform the following code ctx = ctx + reloc_offset ... (*(u32 *)ctx) & 0x8000 ... to ctx = ctx + reloc_offset ... (*(u8 *)(ctx + 1)) & 0x80 ... where reloc_offset will be replaced with a constant during AsmPrinter phase. The above transformed code will be rejected the kernel verifier as it does not allow *(type *)((ctx + non_zero_offset1) + non_zero_offset2) style access pattern. It is hard at SelectionDag phase to identify whether a load is related to context or not. Sometime, interprocedure analysis may be needed. So let us simply prevent such optimization from happening. Differential Revision: https://reviews.llvm.org/D73997 (cherry picked from commit d96c1bbaa03574daf759e5e9a6c75047c5e3af64)
* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2020-02-086-162/+6
| | | | This reverts commit 60e0120c913dd1d4bfe33769e1f000a076249a42.
* [X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64}Fangrui Song2020-02-051-0/+19
| | | | | | | | | | | | | | | | Similar to D73680 (AArch64 BTI). A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR. Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73760 (cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)
* [ARM][VecReduce] Force expand vector_reduce_fminDavid Green2020-02-051-3/+6
| | | | | | | | | | | | | | Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code. (cherry picked from commit 362d00e0510ee75750499e2993a782428e377215)
* [ARM] Expand vector reduction intrinsics on soft floatNikita Popov2020-02-051-1/+8
| | | | | | | | | | | | Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854 (cherry picked from commit 1cc4f8d17247cd9be88addd75d060f9321b6f8b0)
* [AArch64][ARM] Always expand ordered vector reductions (PR44600)Nikita Popov2020-02-052-2/+25
| | | | | | | | | | | | | | | | | | fadd/fmul reductions without reassoc are lowered to VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization support. Until that is in place, expand these intrinsics on ARM and AArch64. Other targets always expand the vector reduction intrinsics. Additionally expand fmax/fmin reductions without nonan flag on AArch64, as the backend asserts that the flag is present when lowering VECREDUCE_FMIN/FMAX. This fixes https://bugs.llvm.org/show_bug.cgi?id=44600. Differential Revision: https://reviews.llvm.org/D73135 (cherry picked from commit 70d345e687caba4ac1f95655c6924dfa91e0083f)
* AMDGPU: Fix handling of infinite loops in fragment shadersConnor Abbott2020-02-041-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781 (cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
* AMDGPU/R600: Emit rodata in text segmentJan Vesely2020-02-031-1/+1
| | | | | | | | | | R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
* [BPF] fix a bug in BPFMISimplifyPatchable pass with -O0Yonghong Song2020-02-031-3/+4
| | | | | | | | | | | | | | | | | | | | The recommended optimization level for BPF programs is O2 since (1). BPF is running inside the kernel and linux kernel won't work at -O0 level, and (2). Verifier is not able to handle O0 code properly, e.g., potential large stack size and a lot of spills. But we should keep -O0 at least compiling. This patch fixed a bug in BPFMISimplifyPatchable phase where with -O0, a segmentation fault will happen for a simple program like: int test(int a, int b) { return a + b; } A test case is added to capture such a case. Differential Revision: https://reviews.llvm.org/D73681 (cherry picked from commit 795bbb366266e83d2bea8dc04c19919b52ab3a2a)
* Revert "[AMDGPU] Invert the handling of skip insertion."Nicolai Hähnle2020-02-036-173/+6
| | | | | | | | | This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
* [RISCV] Scheduler description for the Rocket coreKai Wang2020-02-0311-186/+900
| | | | | | | | | | Pipeline scheduler model for the RISC-V Rocket micro-architecture using the MIScheduler interface. Support for both 32 and 64-bit Rocket cores is implemented. Differential revision: https://reviews.llvm.org/D68685 (cherry picked from commit 838a28e234e098bfc073a45f37a4dd3bb5b45eab)
* [AArch64] -fpatchable-function-entry=N,0: place patch label after BTIFangrui Song2020-02-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after 9a24488cb67a90f889529987275c5e411ce01dda, we place the NOP sled after the initial BTI. ``` .Lfunc_begin0: bti c nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lfunc_begin0 ``` This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label: ``` .Lfunc_begin0: bti c .Lpatch0: nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lpatch0 ``` This placement is compatible with the resolution in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 . A local linkage function whose address is not taken does not need a BTI. Placing the patch label after BTI has the advantage that code does not need to differentiate whether the function has an initial BTI. Reviewers: mrutland, nickdesaulniers, nsz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73680 (cherry picked from commit 06b8e32d4fd3f634f793e3bc0bc4fdb885e7a3ac)
* [WebAssembly] Fix resume-only case in Emscripten EHHeejin Ahn2020-01-291-1/+4
| | | | | | | | | | | | | | | | | | Summary: D72308 incorrectly assumed `resume` cannot exist without a `landingpad`, which is not true. This sets `Changed` to true whenever we make changes to a function, including creating a call to `__resumeException` within a function without a landing pad. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73308 (cherry picked from commit 580d7838dd08e13dac6caf4ab3142c9381bc7ad0)
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-273-10/+28
| | | | | | | | | | | | | | | | 1. if users don't specific -mattr, the default target-feature come from IR attribute. 2. fixed bug and re-land this patch Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D70837 (cherry picked from commit 0cb274de397a193fb37c60653b336d48a3a4f1bd)
* Revert "[RISCV] Support ABI checking with per function target-features"Zakk Chen2020-01-273-27/+10
| | | | | | | This reverts commit 7bc58a779aaa1de56fad8b1bc8e46932d2f2f1e4. It breaks EXPENSIVE_CHECKS on Windows (cherry picked from commit cef838e65f9a2aeecf5e19431077bc16b01a79fb)
* [RISCV] Check the target-abi module flag matches the optionZakk Chen2020-01-273-12/+28
| | | | | | | | | | | | Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D72768 (cherry picked from commit 1256d68093ac1696034e385bbb4cb6e516b66bea)
* [X86] Make `llc --help` output readable againRoman Lebedev2020-01-271-7/+7
| | | | | | | | | Long `cl::value_desc()` is added right after the flag name, before `cl::desc()` column. And thus the `cl::desc()` column, for all flags, is padded to the right, which makes the output unreadable. (cherry picked from commit 70cbf8c71c510077baadcad305fea6f62e830b06)
* Add function attribute "patchable-function-prefix" to support ↵Fangrui Song2020-01-242-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -fpatchable-function-entry=N,M where M>0 Similar to the function attribute `prefix` (prefix data), "patchable-function-prefix" inserts data (M NOPs) before the function entry label. -fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry) will look like: ``` .type foo,@function .Ltmp0: # @foo nop foo: .Lfunc_begin0: # optional `bti c` (AArch64 Branch Target Identification) or # `endbr64` (Intel Indirect Branch Tracking) nop .section __patchable_function_entries,"awo",@progbits,get,unique,0 .p2align 3 .quad .Ltmp0 ``` -fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html): ``` (a) (b) func: func: .Ltmp0: bti c bti c .Ltmp0: nop nop ``` (a) needs no additional code. If the consensus is to go for (b), we will need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp . Differential Revision: https://reviews.llvm.org/D73070 (cherry picked from commit 22467e259507f5ead2a87d989251b4c951a587e4)
* [RISCV] Fix evaluating %pcrel_lo against global and weak symbolsJames Clarke2020-01-234-108/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, we would erroneously turn %pcrel_lo(label), where label has a %pcrel_hi against a weak symbol, into %pcrel_lo(label + offset), as evaluatePCRelLo would believe the target independent logic was going to fold it. Moreover, even if that were fixed, shouldForceRelocation lacks an MCAsmLayout and thus cannot evaluate the %pcrel_hi fixup to a value and check the symbol, so we would then erroneously constant-fold the %pcrel_lo whilst leaving the %pcrel_hi intact. After D72197, this same sequence also occurs for symbols with global binding, which is triggered in real-world code. Instead, as discussed in D71978, we introduce a new FKF_IsTarget flag to avoid these kinds of issues. All the resolution logic happens in one place, with no coordination required between RISCAsmBackend and RISCVMCExpr to ensure they implement the same logic twice. Although the implementation of %pcrel_hi can be left as target independent, we make it target dependent to ensure that they are handled identically to %pcrel_lo, otherwise we risk one of them being constant folded but the other being preserved. This also allows us to properly support fixup pairs where the instructions are in different fragments. Reviewers: asb, lenary, efriedma Reviewed By: efriedma Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73211 (cherry picked from commit 3f5976c97dbfefb4669abcf968bd79a9a64c18e0)
* [AArch64] Don't rename registers with pseudo defs in Ld/St opt.Florian Hahn2020-01-221-0/+13
| | | | | | | | | | | | If the root def of for renaming is a noop-pseudo instruction like kill, we would end up without a correct def for the renamed register, causing miscompiles. This patch conservatively bails out on any pseudo instruction. This fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1037912#c70 (cherry picked from commit 300997c41a00b705ca10264c15910dd8d691ab75)
* [Transforms][RISCV] Remove a "using namespace llvm" from an include file. ↵Craig Topper2020-01-171-2/+2
| | | | | | | | | | | | Fix a place that became dependent on it. This include file was created in October and has a "using namespace llvm". This seems to get exposed to other include files and finally onto cpp files. While this somewhat okay for llvm itself, its bad for other projects that use llvm as a library and includes a header file that picks this up. This was found by ISPC which has some class names at gloal scope with the same names as LLVM. It looks like RISCV accidentally became dependent on this. I fixed it by reordering some includes in the RISCV code, but maybe we want to change the TableGenEmitter to put "namespace llvm {" in the generated file instead? But we probably want to do the simplest thing first so we can merge it to 10.0. Differential Revision: https://reviews.llvm.org/D72895 (cherry picked from commit caee96031d3be9f951e4a17c8d3fb1c8b748fb31)
* Revert rG6078f2fedcac5797ac39ee5ef3fd7a35ef1202d5 - "[AArch64][GlobalISel]: ↵Simon Pilgrim2020-01-151-38/+1
| | | | | | | | | | | | | Support @llvm.{return,frame}address selection." These intrinsics expand to a variable number of instructions so just like in ISelLowering.cpp we use custom code to deal with them. Committing Tim's original patch. Differential Revision: https://reviews.llvm.org/D65656 ---- Breaks EXPENSIVE_CHECKS builds.
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-153-10/+27
| | | | | | | | | | | | | if users don't specific -mattr, the default target-feature come from IR attribute. Reviewers: lenary, asb Reviewed By: lenary, asb Tags: #llvm Differential Revision: https://reviews.llvm.org/D70837
* Revert "[RISCV] Support ABI checking with per function target-features"Zakk Chen2020-01-153-27/+10
| | | | This reverts commit 109e4d12edda07bdec139de36d9fdb6f73399f92.
* Fix "pointer is null" static analyzer warning. NFCI.Simon Pilgrim2020-01-151-2/+1
| | | | Use cast<> instead of dyn_cast<> since the pointer is always dereferenced and cast<> will perform the null assertion for us.
* [AArch64][SVE] Fold variable into assert to silence unused variable warnings ↵Benjamin Kramer2020-01-151-2/+2
| | | | in Release builds
* [AArch64][SVE] Add ptest intrinsicsCullen Rhodes2020-01-154-1/+54
| | | | | | | | | | | | | | | | | | | Summary: Implements the following intrinsics: * @llvm.aarch64.sve.ptest.any * @llvm.aarch64.sve.ptest.first * @llvm.aarch64.sve.ptest.last Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72398
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-153-10/+27
| | | | | if users don't specific -mattr, the default target-feature come from IR attribute.
* [AMDGPU] Invert the handling of skip insertion.cdevadas2020-01-156-6/+173
| | | | | | | | | | | | | | | The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092
* [VE] Minimal codegen for empty functionsKazushi (Jam) Marukawa2020-01-1536-18/+2549
| | | | | | | | | | | | | | | | Summary: This patch implements minimal VE code generation for empty function bodies (no args, no value return). Contents * empty function code generation test. * Minimal function prologue & epilogue emission * Instruction formats and instruction definitions as far as required for the empty function prologue & epilogue. * I64 register class definitions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D72598
* [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.Craig Topper2020-01-151-1/+1
| | | | | | | | | | We were performing an emulated i32->f64 in the SSE registers, then storing that value to memory and doing a extload into the X87 domain. After this patch we'll now just store the i32 to memory along with an i32 0. Then do a 64-bit FILD to f80 completely in the X87 unit. This matches what we do without SSE.
* [PowerPC] Fix powerpcspe subtarget enablement in llvm backendJustin Hibbits2020-01-142-4/+3
| | | | | | | | | | | | Summary: As currently written, -target powerpcspe will enable SPE regardless of disabling the feature later on in the command line. Instead, change this to just set a default CPU to 'e500' instead of a generic CPU. As part of this, add FeatureSPE to the e500 definition. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D72673
OpenPOWER on IntegriCloud