summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCSJohn Brawn2020-02-183-10/+75
| | | | | | | | | | | | | | | These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194 (cherry picked from commit b37d59353f699e99f139a9227a6a69964ef4b132)
* [FPEnv][AArch64] Add lowering and instruction selection for strict conversionsJohn Brawn2020-02-182-24/+54
| | | | | | | | | | Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625 (cherry picked from commit 0bb9a27c9895c0fbc3f55f56ad7f1e1927398fce)
* [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUNDJohn Brawn2020-02-182-8/+16
| | | | | | | | | | This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201 (cherry picked from commit 258d8dd76afd88a12539b182a53ff21dcba16a2d)
* Add lowering of STRICT_FSETCC and STRICT_FSETCCSJohn Brawn2020-02-183-11/+57
| | | | | | | | | | These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368 (cherry picked from commit 2224407ef5baf6100fa22420feb4d25af1a9493f)
* Fix an unused variable warningHans Wennborg2020-02-121-1/+1
| | | | (cherry picked from commit ea9850b6c71d975935de15bd4128508b260165c5)
* [SystemZ] Bugfix in emitSelect()Jonas Paulsson2020-02-121-2/+3
| | | | | | | | | | | | | | When more than one SelectPseudo instruction is handled a new MBB is returned. This must not be done if that would result in leaving an undhandled isel pseudo behind in the original MBB. Fixes https://bugs.llvm.org/show_bug.cgi?id=44849. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74352 (cherry picked from commit 0311e28e9cc01a244faa774b8cab337b45404fa9)
* [AArch64] Add option to enable/disable load-store renaming.Florian Hahn2020-02-101-0/+7
| | | | | | | | This patch adds a new option to enable/disable register renaming in the load-store optimizer. Defaults to disabled, as there is a potential mis-compile caused by this. (cherry picked from commit 8e3f59b45ae185cc9b4e3a817d7ac958f1d55976)
* AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)Jan Vesely2020-02-103-4/+10
| | | | | | | | | | | | The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017 (cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
* [X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2Craig Topper2020-02-101-1/+1
| | | | | | | | | | | | X86 uses i8 for shift amounts. This code can fail on a 32-bit target if it runs after type legalization. This code was copied from AArch64 and modified for X86, but the shift amount wasn't changed to the correct type for X86. Fixes PR44812 (cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
* [BPF] disable ReduceLoadWidth during SelectionDag phaseYonghong Song2020-02-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | The compiler may transform the following code ctx = ctx + reloc_offset ... (*(u32 *)ctx) & 0x8000 ... to ctx = ctx + reloc_offset ... (*(u8 *)(ctx + 1)) & 0x80 ... where reloc_offset will be replaced with a constant during AsmPrinter phase. The above transformed code will be rejected the kernel verifier as it does not allow *(type *)((ctx + non_zero_offset1) + non_zero_offset2) style access pattern. It is hard at SelectionDag phase to identify whether a load is related to context or not. Sometime, interprocedure analysis may be needed. So let us simply prevent such optimization from happening. Differential Revision: https://reviews.llvm.org/D73997 (cherry picked from commit d96c1bbaa03574daf759e5e9a6c75047c5e3af64)
* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2020-02-086-162/+6
| | | | This reverts commit 60e0120c913dd1d4bfe33769e1f000a076249a42.
* [X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64}Fangrui Song2020-02-051-0/+19
| | | | | | | | | | | | | | | | Similar to D73680 (AArch64 BTI). A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR. Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73760 (cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)
* [ARM][VecReduce] Force expand vector_reduce_fminDavid Green2020-02-051-3/+6
| | | | | | | | | | | | | | Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code. (cherry picked from commit 362d00e0510ee75750499e2993a782428e377215)
* [ARM] Expand vector reduction intrinsics on soft floatNikita Popov2020-02-051-1/+8
| | | | | | | | | | | | Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854 (cherry picked from commit 1cc4f8d17247cd9be88addd75d060f9321b6f8b0)
* [AArch64][ARM] Always expand ordered vector reductions (PR44600)Nikita Popov2020-02-052-2/+25
| | | | | | | | | | | | | | | | | | fadd/fmul reductions without reassoc are lowered to VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization support. Until that is in place, expand these intrinsics on ARM and AArch64. Other targets always expand the vector reduction intrinsics. Additionally expand fmax/fmin reductions without nonan flag on AArch64, as the backend asserts that the flag is present when lowering VECREDUCE_FMIN/FMAX. This fixes https://bugs.llvm.org/show_bug.cgi?id=44600. Differential Revision: https://reviews.llvm.org/D73135 (cherry picked from commit 70d345e687caba4ac1f95655c6924dfa91e0083f)
* AMDGPU: Fix handling of infinite loops in fragment shadersConnor Abbott2020-02-041-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781 (cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
* AMDGPU/R600: Emit rodata in text segmentJan Vesely2020-02-031-1/+1
| | | | | | | | | | R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
* [BPF] fix a bug in BPFMISimplifyPatchable pass with -O0Yonghong Song2020-02-031-3/+4
| | | | | | | | | | | | | | | | | | | | The recommended optimization level for BPF programs is O2 since (1). BPF is running inside the kernel and linux kernel won't work at -O0 level, and (2). Verifier is not able to handle O0 code properly, e.g., potential large stack size and a lot of spills. But we should keep -O0 at least compiling. This patch fixed a bug in BPFMISimplifyPatchable phase where with -O0, a segmentation fault will happen for a simple program like: int test(int a, int b) { return a + b; } A test case is added to capture such a case. Differential Revision: https://reviews.llvm.org/D73681 (cherry picked from commit 795bbb366266e83d2bea8dc04c19919b52ab3a2a)
* Revert "[AMDGPU] Invert the handling of skip insertion."Nicolai Hähnle2020-02-036-173/+6
| | | | | | | | | This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
* [RISCV] Scheduler description for the Rocket coreKai Wang2020-02-0311-186/+900
| | | | | | | | | | Pipeline scheduler model for the RISC-V Rocket micro-architecture using the MIScheduler interface. Support for both 32 and 64-bit Rocket cores is implemented. Differential revision: https://reviews.llvm.org/D68685 (cherry picked from commit 838a28e234e098bfc073a45f37a4dd3bb5b45eab)
* [AArch64] -fpatchable-function-entry=N,0: place patch label after BTIFangrui Song2020-02-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after 9a24488cb67a90f889529987275c5e411ce01dda, we place the NOP sled after the initial BTI. ``` .Lfunc_begin0: bti c nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lfunc_begin0 ``` This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label: ``` .Lfunc_begin0: bti c .Lpatch0: nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lpatch0 ``` This placement is compatible with the resolution in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 . A local linkage function whose address is not taken does not need a BTI. Placing the patch label after BTI has the advantage that code does not need to differentiate whether the function has an initial BTI. Reviewers: mrutland, nickdesaulniers, nsz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73680 (cherry picked from commit 06b8e32d4fd3f634f793e3bc0bc4fdb885e7a3ac)
* [WebAssembly] Fix resume-only case in Emscripten EHHeejin Ahn2020-01-291-1/+4
| | | | | | | | | | | | | | | | | | Summary: D72308 incorrectly assumed `resume` cannot exist without a `landingpad`, which is not true. This sets `Changed` to true whenever we make changes to a function, including creating a call to `__resumeException` within a function without a landing pad. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73308 (cherry picked from commit 580d7838dd08e13dac6caf4ab3142c9381bc7ad0)
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-273-10/+28
| | | | | | | | | | | | | | | | 1. if users don't specific -mattr, the default target-feature come from IR attribute. 2. fixed bug and re-land this patch Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D70837 (cherry picked from commit 0cb274de397a193fb37c60653b336d48a3a4f1bd)
* Revert "[RISCV] Support ABI checking with per function target-features"Zakk Chen2020-01-273-27/+10
| | | | | | | This reverts commit 7bc58a779aaa1de56fad8b1bc8e46932d2f2f1e4. It breaks EXPENSIVE_CHECKS on Windows (cherry picked from commit cef838e65f9a2aeecf5e19431077bc16b01a79fb)
* [RISCV] Check the target-abi module flag matches the optionZakk Chen2020-01-273-12/+28
| | | | | | | | | | | | Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D72768 (cherry picked from commit 1256d68093ac1696034e385bbb4cb6e516b66bea)
* [X86] Make `llc --help` output readable againRoman Lebedev2020-01-271-7/+7
| | | | | | | | | Long `cl::value_desc()` is added right after the flag name, before `cl::desc()` column. And thus the `cl::desc()` column, for all flags, is padded to the right, which makes the output unreadable. (cherry picked from commit 70cbf8c71c510077baadcad305fea6f62e830b06)
* Add function attribute "patchable-function-prefix" to support ↵Fangrui Song2020-01-242-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -fpatchable-function-entry=N,M where M>0 Similar to the function attribute `prefix` (prefix data), "patchable-function-prefix" inserts data (M NOPs) before the function entry label. -fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry) will look like: ``` .type foo,@function .Ltmp0: # @foo nop foo: .Lfunc_begin0: # optional `bti c` (AArch64 Branch Target Identification) or # `endbr64` (Intel Indirect Branch Tracking) nop .section __patchable_function_entries,"awo",@progbits,get,unique,0 .p2align 3 .quad .Ltmp0 ``` -fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html): ``` (a) (b) func: func: .Ltmp0: bti c bti c .Ltmp0: nop nop ``` (a) needs no additional code. If the consensus is to go for (b), we will need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp . Differential Revision: https://reviews.llvm.org/D73070 (cherry picked from commit 22467e259507f5ead2a87d989251b4c951a587e4)
* [RISCV] Fix evaluating %pcrel_lo against global and weak symbolsJames Clarke2020-01-234-108/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, we would erroneously turn %pcrel_lo(label), where label has a %pcrel_hi against a weak symbol, into %pcrel_lo(label + offset), as evaluatePCRelLo would believe the target independent logic was going to fold it. Moreover, even if that were fixed, shouldForceRelocation lacks an MCAsmLayout and thus cannot evaluate the %pcrel_hi fixup to a value and check the symbol, so we would then erroneously constant-fold the %pcrel_lo whilst leaving the %pcrel_hi intact. After D72197, this same sequence also occurs for symbols with global binding, which is triggered in real-world code. Instead, as discussed in D71978, we introduce a new FKF_IsTarget flag to avoid these kinds of issues. All the resolution logic happens in one place, with no coordination required between RISCAsmBackend and RISCVMCExpr to ensure they implement the same logic twice. Although the implementation of %pcrel_hi can be left as target independent, we make it target dependent to ensure that they are handled identically to %pcrel_lo, otherwise we risk one of them being constant folded but the other being preserved. This also allows us to properly support fixup pairs where the instructions are in different fragments. Reviewers: asb, lenary, efriedma Reviewed By: efriedma Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73211 (cherry picked from commit 3f5976c97dbfefb4669abcf968bd79a9a64c18e0)
* [AArch64] Don't rename registers with pseudo defs in Ld/St opt.Florian Hahn2020-01-221-0/+13
| | | | | | | | | | | | If the root def of for renaming is a noop-pseudo instruction like kill, we would end up without a correct def for the renamed register, causing miscompiles. This patch conservatively bails out on any pseudo instruction. This fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1037912#c70 (cherry picked from commit 300997c41a00b705ca10264c15910dd8d691ab75)
* [Transforms][RISCV] Remove a "using namespace llvm" from an include file. ↵Craig Topper2020-01-171-2/+2
| | | | | | | | | | | | Fix a place that became dependent on it. This include file was created in October and has a "using namespace llvm". This seems to get exposed to other include files and finally onto cpp files. While this somewhat okay for llvm itself, its bad for other projects that use llvm as a library and includes a header file that picks this up. This was found by ISPC which has some class names at gloal scope with the same names as LLVM. It looks like RISCV accidentally became dependent on this. I fixed it by reordering some includes in the RISCV code, but maybe we want to change the TableGenEmitter to put "namespace llvm {" in the generated file instead? But we probably want to do the simplest thing first so we can merge it to 10.0. Differential Revision: https://reviews.llvm.org/D72895 (cherry picked from commit caee96031d3be9f951e4a17c8d3fb1c8b748fb31)
* Revert rG6078f2fedcac5797ac39ee5ef3fd7a35ef1202d5 - "[AArch64][GlobalISel]: ↵Simon Pilgrim2020-01-151-38/+1
| | | | | | | | | | | | | Support @llvm.{return,frame}address selection." These intrinsics expand to a variable number of instructions so just like in ISelLowering.cpp we use custom code to deal with them. Committing Tim's original patch. Differential Revision: https://reviews.llvm.org/D65656 ---- Breaks EXPENSIVE_CHECKS builds.
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-153-10/+27
| | | | | | | | | | | | | if users don't specific -mattr, the default target-feature come from IR attribute. Reviewers: lenary, asb Reviewed By: lenary, asb Tags: #llvm Differential Revision: https://reviews.llvm.org/D70837
* Revert "[RISCV] Support ABI checking with per function target-features"Zakk Chen2020-01-153-27/+10
| | | | This reverts commit 109e4d12edda07bdec139de36d9fdb6f73399f92.
* Fix "pointer is null" static analyzer warning. NFCI.Simon Pilgrim2020-01-151-2/+1
| | | | Use cast<> instead of dyn_cast<> since the pointer is always dereferenced and cast<> will perform the null assertion for us.
* [AArch64][SVE] Fold variable into assert to silence unused variable warnings ↵Benjamin Kramer2020-01-151-2/+2
| | | | in Release builds
* [AArch64][SVE] Add ptest intrinsicsCullen Rhodes2020-01-154-1/+54
| | | | | | | | | | | | | | | | | | | Summary: Implements the following intrinsics: * @llvm.aarch64.sve.ptest.any * @llvm.aarch64.sve.ptest.first * @llvm.aarch64.sve.ptest.last Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72398
* [RISCV] Support ABI checking with per function target-featuresZakk Chen2020-01-153-10/+27
| | | | | if users don't specific -mattr, the default target-feature come from IR attribute.
* [AMDGPU] Invert the handling of skip insertion.cdevadas2020-01-156-6/+173
| | | | | | | | | | | | | | | The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092
* [VE] Minimal codegen for empty functionsKazushi (Jam) Marukawa2020-01-1536-18/+2549
| | | | | | | | | | | | | | | | Summary: This patch implements minimal VE code generation for empty function bodies (no args, no value return). Contents * empty function code generation test. * Minimal function prologue & epilogue emission * Instruction formats and instruction definitions as far as required for the empty function prologue & epilogue. * I64 register class definitions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D72598
* [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.Craig Topper2020-01-151-1/+1
| | | | | | | | | | We were performing an emulated i32->f64 in the SSE registers, then storing that value to memory and doing a extload into the X87 domain. After this patch we'll now just store the i32 to memory along with an i32 0. Then do a 64-bit FILD to f80 completely in the X87 unit. This matches what we do without SSE.
* [PowerPC] Fix powerpcspe subtarget enablement in llvm backendJustin Hibbits2020-01-142-4/+3
| | | | | | | | | | | | Summary: As currently written, -target powerpcspe will enable SPE regardless of disabling the feature later on in the command line. Instead, change this to just set a default CPU to 'e500' instead of a generic CPU. As part of this, add FeatureSPE to the e500 definition. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D72673
* CMake: Make most target symbols hidden by defaultTom Stellard2020-01-14105-105/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439
* [BranchAlign] Add master --x86-branches-within-32B-boundaries flagPhilip Reames2020-01-141-2/+23
| | | | | | | | | | | | This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults. (*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach. Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler w/a raw .s file may result in broken code. Use at your own risk. Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off. Differential Revision: https://reviews.llvm.org/D72738
* [Win64] Handle FP arguments more gracefully under -mno-sseReid Kleckner2020-01-142-22/+31
| | | | | | | | | | | | | | | | | Pass small FP values in GPRs or stack memory according the the normal convention. This is what gcc -mno-sse does on Win64. I adjusted the conditions under which we emit an error to check if the argument or return value would be passed in an XMM register when SSE is disabled. This has a side effect of no longer emitting an error for FP arguments marked 'inreg' when targetting x86 with SSE disabled. Our calling convention logic was already assigning it to FP0/FP1, and then we emitted this error. That seems unnecessary, we can ignore 'inreg' and compile it without SSE. Reviewers: jyknight, aemerson Differential Revision: https://reviews.llvm.org/D70465
* [X86] Drop an unneeded FIXME. NFCCraig Topper2020-01-141-1/+0
| | | | The extload on X87 is free.
* [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit ↵Craig Topper2020-01-141-4/+4
| | | | | | | | | | | | mode i64->f32/f64/f80 uint_to_fp algorithm. This allows us to generate better code for selecting the fixup to load. Previously when the sign was set we had to load offset 0. And when it was clear we had to load offset 4. This required a testl, setns, zero extend, and finally a mul by 4. By switching the offsets we can just shift the sign bit into the lsb and multiply it by 4.
* [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.Michael Liao2020-01-141-1/+1
| | | | | | | | | | | | | | | | | | | Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709
* [AArch64][GlobalISel]: Support @llvm.{return,frame}address selection.Amara Emerson2020-01-141-1/+38
| | | | | | | | | These intrinsics expand to a variable number of instructions so just like in ISelLowering.cpp we use custom code to deal with them. Committing Tim's original patch. Differential Revision: https://reviews.llvm.org/D65656
* [SVE] Add patterns for MUL immediate instruction.Danilo Carvalho Grael2020-01-142-2/+7
| | | | | | | | | | | | Summary: Add the missing MUL pattern for integer immediate instructions. Reviewers: sdesmalen, huntergr, efriedma, c-rhodes, kmclaughlin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D72654
* [RISCV] Allow shrink wrapping for RISC-Vlewis-revill2020-01-141-4/+16
| | | | | | | | | Enabling shrink wrapping requires ensuring the insertion point of the epilogue is correct for MBBs without a terminator, in which case the instruction to adjust the stack pointer is the last instruction in the block. Differential Revision: https://reviews.llvm.org/D62190
OpenPOWER on IntegriCloud