path: root/llvm/lib/Target/X86
* [X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)
  Nikita Popov, 2020-02-19 (1 file, -6/+7)

  Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights.

  This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice).

  Differential Revision: https://reviews.llvm.org/D74155

  (cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
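  A minimal sketch of the weighted-depth idea, with illustrative variable names (not the pass's exact code); taking the max over both weight placements always biases toward the deeper arm:

    // Old 50/50 estimate:
    //   unsigned Depth = (TrueOpDepth + FalseOpDepth + 1) / 2;
    // New 75/25 estimate: weight whichever operand is deeper 3x, so the
    // computed depth can only grow relative to the 50/50 average.
    unsigned Depth = std::max((3 * TrueOpDepth + FalseOpDepth) / 4,
                              (TrueOpDepth + 3 * FalseOpDepth) / 4);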
* [X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2
  Craig Topper, 2020-02-10 (1 file, -1/+1)

  X86 uses i8 for shift amounts. This code can fail on a 32-bit target if it runs after type legalization. This code was copied from AArch64 and modified for X86, but the shift amount wasn't changed to the correct type for X86.

  Fixes PR44812

  (cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
* [X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64}
  Fangrui Song, 2020-02-05 (1 file, -0/+19)

  Similar to D73680 (AArch64 BTI).

  A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.

  Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir

  Reviewed By: nickdesaulniers

  Differential Revision: https://reviews.llvm.org/D73760

  (cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)
* [X86] Make `llc --help` output readable again
  Roman Lebedev, 2020-01-27 (1 file, -7/+7)

  A long `cl::value_desc()` is printed right after the flag name, before the `cl::desc()` column. The `cl::desc()` column for all flags is therefore padded to the right, which makes the output unreadable.

  (cherry picked from commit 70cbf8c71c510077baadcad305fea6f62e830b06)
* [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.
  Craig Topper, 2020-01-15 (1 file, -1/+1)

  We were performing an emulated i32->f64 in the SSE registers, then storing that value to memory and doing an extload into the X87 domain. After this patch we'll now just store the i32 to memory along with an i32 0. Then do a 64-bit FILD to f80 completely in the X87 unit. This matches what we do without SSE.
* CMake: Make most target symbols hidden by default
  Tom Stellard, 2020-01-14 (6 files, -6/+6)

  Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default.

  A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions.

  This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff, require less memory when analyzing libLLVM.so.

  One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked.

  Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):

    nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
    36221
    nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
    26278

  Reviewers: chandlerc, beanz, mgorny, rnk, hans

  Reviewed By: rnk, hans

  Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D54439
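  For reference, such a visibility macro is conventionally defined along these lines (a sketch, not necessarily the exact definition that landed):

    // The libraries are compiled with -fvisibility=hidden; this macro
    // re-exports the handful of entry points that must remain public.
    #if defined(__GNUC__)
    #define LLVM_EXTERNAL_VISIBILITY __attribute__((visibility("default")))
    #else
    #define LLVM_EXTERNAL_VISIBILITY
    #endif

    // Usage on a target's initializer:
    extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86Target();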
* [BranchAlign] Add master --x86-branches-within-32B-boundaries flag
  Philip Reames, 2020-01-14 (1 file, -2/+23)

  This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.

  (*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.

  Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The latter will use all of the suppression mechanisms and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler with a raw .s file may result in broken code. Use at your own risk.

  Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.

  Differential Revision: https://reviews.llvm.org/D72738
* [Win64] Handle FP arguments more gracefully under -mno-sse
  Reid Kleckner, 2020-01-14 (2 files, -22/+31)

  Pass small FP values in GPRs or stack memory according to the normal convention. This is what gcc -mno-sse does on Win64.

  I adjusted the conditions under which we emit an error to check if the argument or return value would be passed in an XMM register when SSE is disabled. This has a side effect of no longer emitting an error for FP arguments marked 'inreg' when targeting x86 with SSE disabled. Our calling convention logic was already assigning it to FP0/FP1, and then we emitted this error. That seems unnecessary; we can ignore 'inreg' and compile it without SSE.

  Reviewers: jyknight, aemerson

  Differential Revision: https://reviews.llvm.org/D70465
* [X86] Drop an unneeded FIXME. NFC
  Craig Topper, 2020-01-14 (1 file, -1/+0)

  The extload on X87 is free.
* [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm.
  Craig Topper, 2020-01-14 (1 file, -4/+4)

  This allows us to generate better code for selecting the fixup to load. Previously when the sign was set we had to load offset 0. And when it was clear we had to load offset 4. This required a testl, setns, zero extend, and finally a mul by 4. By switching the offsets we can just shift the sign bit into the lsb and multiply it by 4.
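  A scalar sketch of the new offset selection (illustrative, not the actual DAG code), assuming the pool now holds [0.0, fudge] in that order:

    // New layout: sign clear -> offset 0 (0.0), sign set -> offset 4 (fudge).
    uint32_t Hi = uint32_t(X >> 32);    // high half of the i64
    uint32_t Offset = (Hi >> 31) << 2;  // shift sign bit to lsb, times 4
    // The old layout instead needed: testl + setns + zero extend + mul by 4.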
* [X86] Directly emit a BROADCAST_LOAD from constant pool in lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
  Craig Topper, 2020-01-14 (1 file, -2/+12)

  By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.

  Differential Revision: https://reviews.llvm.org/D72307
* [X86] Copy the nofpexcept flag when folding a load into an instruction using the load folding tables.
  Craig Topper, 2020-01-13 (1 file, -0/+4)
* [X86][Disassembler] Fix a bug when disassembling an empty string
  Fangrui Song, 2020-01-13 (1 file, -1/+3)

  readPrefixes() assumes insn->bytes is non-empty. The code path is not exercised in llvm-mc because llvm-mc does not feed empty input to MCDisassembler::getInstruction(). This bug is uncovered by a5994c789a2982a770254ae1607b5b4cb641f73c.

  An empty string did not crash before because the deleted regionReader() allowed UINT64_C(-1) as insn->readerCursor:

    Bytes.size() <= Address - R->Base
    0 <= UINT64_C(-1) - UINT32_C(-1)
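  A guard along these lines addresses it (a sketch; the placement is assumed, not necessarily the exact fix that landed):

    // In the X86 MCDisassembler::getInstruction() path: refuse to decode
    // before readPrefixes() ever touches an empty buffer.
    if (Bytes.empty())
      return MCDisassembler::Fail;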
* [X86] Fix MSVC "truncation from 'int' to 'bool'" warning. NFCI.
  Simon Pilgrim, 2020-01-13 (1 file, -2/+2)
* [X86] Use SDNPOptInGlue instead of SDNPInGlue on a couple SDNodes.
  Craig Topper, 2020-01-12 (1 file, -2/+2)

  At least one of these is used without a Glue. This doesn't seem to change the X86GenDAGISel.inc output so maybe it doesn't matter?
* [X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles.
  Simon Pilgrim, 2020-01-12 (1 file, -0/+12)

  Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).

  This is a regression if we shuffle build_vectors, due to getVectorShuffle canonicalizing 'blend of splat' build vectors; for now I've set this not to shuffle build_vector nodes at all to avoid this.
* [X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded elements of the lane mask.
  Simon Pilgrim, 2020-01-12 (1 file, -2/+1)

  Fixes a cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
* [X86][Disassembler] Merge X86DisassemblerDecoder.cpp into X86Disassembler.cpp and refactor
  Fangrui Song, 2020-01-12 (4 files, -1868/+1569)
* [X86][Disassembler] Simplify
  Fangrui Song, 2020-01-12 (3 files, -45/+7)
* [X86] Don't call LowerSETCC from LowerSELECT for STRICT_FSETCC/STRICT_FSETCCS nodes.
  Craig Topper, 2020-01-11 (1 file, -3/+1)

  This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to be lowered early while lowering SELECT, but the output chain doesn't get connected. Then we visit the node again when it is its turn because we haven't replaced the use of the chain result. In the case of the fp128 libcall lowering, after D72341 this will cause the libcall to be emitted twice.
* [TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.
  Craig Topper, 2020-01-11 (1 file, -2/+4)
* [X86][Disassembler] Optimize argument passing and immediate reading
  Fangrui Song, 2020-01-11 (3 files, -74/+41)
* [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()
  Fangrui Song, 2020-01-11 (1 file, -2/+1)

  The argument is llvm::nulls() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.

  If we ever have the need to add verbose logs to disassemblers, we can record the log with a member function, instead of passing it around as an argument.
* [X86][Disassembler] Replace custom logger with LLVM_DEBUG
  Fangrui Song, 2020-01-11 (3 files, -56/+14)

  Time for llvm-objdump -d on clang decreased from 7.8s to 7.4s. The improvement is likely due to the elimination of logger setup and dbgprintf(), which has a large overhead.
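  For context, LLVM_DEBUG is LLVM's standard debug-logging facility; a sketch of the typical usage pattern (the DEBUG_TYPE string here is an assumption):

    #include "llvm/Support/Debug.h"
    #include "llvm/Support/raw_ostream.h"

    #define DEBUG_TYPE "x86-disassembler"

    void logOpcode(unsigned Opcode) {
      // Compiles to nothing in release builds; in assertion-enabled builds
      // the message prints only under -debug-only=x86-disassembler.
      LLVM_DEBUG(llvm::dbgs() << "decoding opcode " << Opcode << "\n");
    }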
* [X86][Disassembler] Simplify and optimize reader functions
  Fangrui Song, 2020-01-11 (3 files, -180/+101)

  Time for llvm-objdump -d on clang decreased from 8.2s to 7.8s.
* [X86] Turn FP_ROUND/STRICT_FP_ROUND into X86ISD::VFPROUND/STRICT_VFPROUND during PreprocessISelDAG to remove some duplicate isel patterns.
  Craig Topper, 2020-01-11 (3 files, -67/+4)
* [X86] Adjust nop emission by compiler to consider target decode limitations
  Philip Reames, 2020-01-11 (1 file, -0/+17)

  The primary motivation of this change is to bring the code more closely in sync behavior-wise with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.

  The longest encodeable nop on x86 is 15 bytes, but many processors - for instance all Intel chips - can't decode the 15 byte form efficiently. On those processors, it's better to use either a 10 byte or 11 byte sequence, depending on the processor.
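  A sketch of the chunking this implies (illustrative names; the real logic lives in the X86 assembler backend):

    #include <algorithm>

    // Hypothetical helper that emits one multi-byte NOP of exactly Len bytes.
    void emitOneNop(unsigned Len);

    // Split a padding request into decoder-friendly chunks, capping each
    // NOP at the target's efficient maximum (e.g. 10 or 11 rather than 15).
    void emitNopPadding(unsigned NumBytes, unsigned MaxNopLen) {
      while (NumBytes > 0) {
        unsigned Chunk = std::min(NumBytes, MaxNopLen);
        emitOneNop(Chunk);
        NumBytes -= Chunk;
      }
    }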
* [X86AsmBackend] Move static function before sole use [NFC]
  Philip Reames, 2020-01-11 (1 file, -34/+34)
* [X86AsmBackend] Be consistent about placing definitions out of line [NFC]
  Philip Reames, 2020-01-11 (1 file, -49/+57)
* [X86] Fix outdated comment
  Simon Pilgrim, 2020-01-11 (1 file, -2/+1)

  The generic saturated math opcodes are no longer widened inside X86TargetLowering.
* [X86][AVX] Add lowerShuffleAsLanePermuteAndSHUFP lowering
  Simon Pilgrim, 2020-01-11 (1 file, -0/+40)

  Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)); eventually this could be used for v8f32 (and maybe v8f64/v16f32), but I'm being conservative for the initial implementation as only v4f64 can always succeed.

  This currently is only called from lowerShuffleAsLanePermuteAndShuffle so only gets used for unary shuffles, and we limit this to cases where we use upper elements, as otherwise concatenating two xmm shuffles is probably the better case.

  Helps with poor shuffles mentioned in D66004.
* [X86] Remove dead code from X86DAGToDAGISel::Select that is no longer needed now that we don't mutate strict fp nodes. NFC
  Craig Topper, 2020-01-11 (1 file, -28/+0)
* [X86] Simplify code by removing an unreachable condition. NFCI
  Craig Topper, 2020-01-10 (1 file, -12/+2)

  For X87<->SSE conversions, the SSE type is always smaller than the X87 type. So we can always use the smallest type for the memory type.
* [X86] Preserve fpexcept property when turning strict_fp_extend and strict_fp_round into stack operations.
  Craig Topper, 2020-01-10 (2 files, -4/+37)

  We use the stack for X87 fp_round and for moving from SSE f32/f64 to X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64.

  Note for the SSE<->X87 conversions the conversion always happens in the X87 domain. The load/store ops in the X87 instructions are able to signal exceptions.
* [X86][Disassembler] Simplify readPrefixes
  Fangrui Song, 2020-01-10 (1 file, -43/+25)
* [X86] Use ReplaceAllUsesWith instead of ReplaceAllUsesOfValueWith to simplify some code. NFCI
  Craig Topper, 2020-01-10 (1 file, -12/+2)
* [X86] Support function attribute "patchable-function-entry"
  Fangrui Song, 2020-01-10 (1 file, -3/+15)

  For x86-64, we diverge from GCC -fpatchable-function-entry in that we emit multi-byte NOPs.

  Differential Revision: https://reviews.llvm.org/D72220
* [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize multi-input shuffle elements
  Simon Pilgrim, 2020-01-10 (1 file, -2/+2)

  We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths.

  Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
* [NFC] Style cleanup
  Shengchen Kan, 2020-01-10 (1 file, -9/+10)
* CodeGen: Use LLT instead of EVT in getRegisterByName
  Matt Arsenault, 2020-01-09 (2 files, -2/+2)

  Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
* [ms] [X86] Use "P" modifier on all branch-target operands in inline X86 ↵Eric Astor2020-01-094-50/+31
| | | | | | | | | | | | | | | | | | | assembly. Summary: Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions. Also add a regression test for llvm.org/PR44272, since this finishes fixing it. Reviewers: thakis, rnk Reviewed By: thakis Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72417
* [X86] AMD Znver2 (Rome) Scheduler enablement
  Ganesh Gopalasubramanian, 2020-01-10 (3 files, -2/+1551)

  The patch gives out the details of the znver2 scheduler model. There are a few improvements with respect to execution units, latencies and throughput when compared with znver1.

  The tests that were present for znver1 for the llvm-mca tool were replicated. The latencies, execution units, timeline and throughput information are updated for znver2.

  Reviewers: craig.topper, Simon Pilgrim

  Differential Revision: https://reviews.llvm.org/D66088
* [X86] Remove EFLAGS from live-in lists in X86FlagsCopyLowering.
  Jonas Paulsson, 2020-01-08 (1 file, -0/+3)

  When EFLAGS is no longer live into a basic block, remove it from the live-in list.

  Fixes https://bugs.llvm.org/show_bug.cgi?id=44462.

  Review: Craig Topper

  Differential Revision: https://reviews.llvm.org/D71375
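  The fix amounts to something like this sketch (the surrounding condition is hypothetical; isLiveIn/removeLiveIn are real MachineBasicBlock APIs):

    // After rewriting the flag uses, EFLAGS may no longer be live on entry
    // to a successor; drop it so the machine verifier stays happy.
    if (!FlagsStillLive && SuccMBB.isLiveIn(X86::EFLAGS))
      SuccMBB.removeLiveIn(X86::EFLAGS);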
* [X86] Keep cl::opts at top of file [NFC]
  Philip Reames, 2020-01-08 (1 file, -34/+34)
* [X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in 64-bit mode
  Craig Topper, 2020-01-08 (1 file, -9/+11)

  For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal, we create two v2i64->v2f32 uint_to_fp ops that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^63 - 1); if so we do a special division by 2 so we can do a signed conversion, which we need to scalarize, then do a multiply by 2 at the end if we divided earlier.

  When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 2 splits. But we can reassemble the 4 scalar f32 results directly into a single v4f32 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd.

  Differential Revision: https://reviews.llvm.org/D72368
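  A scalar model of the per-element trick (for illustration only; the actual lowering is vectorized):

    #include <cstdint>

    float uint64ToF32(uint64_t X) {
      if (X <= uint64_t(INT64_MAX))   // fits: plain signed conversion works
        return float(int64_t(X));
      // Too big for signed i64: halve while keeping the low bit sticky for
      // correct rounding, convert signed, then multiply by 2 to fix up.
      uint64_t Half = (X >> 1) | (X & 1);
      return float(int64_t(Half)) * 2.0f;
    }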
* [X86] Add isel patterns for bitcasting between v32i1/v64i1 and float/double.
  Craig Topper, 2020-01-08 (1 file, -0/+11)

  We have to do an intermediate jump to a GPR to make the cast.

  Fixes PR43750.
* [BranchAlign] Compiler support for suppressing branch align
  Philip Reames, 2020-01-08 (2 files, -2/+50)

  As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selectively suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code.

  Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support.

  For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall.

  Note that the result of this patch effectively disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...), which is non-ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this and explore a bundling based approach instead so that we can pad instructions while keeping labels in the right place.

  Differential Revision: https://reviews.llvm.org/D72303
* [X86] Adding fp128 support for strict fcmp
  Wang, Pengfei, 2020-01-08 (1 file, -5/+5)

  Summary: Adding fp128 support for strict fcmp

  Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

  Subscribers: hiraditya, llvm-commits, LuoYuanke

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71897
* [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target
  Craig Topper, 2020-01-07 (1 file, -3/+1)

  Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now.

  Differential Revision: https://reviews.llvm.org/D72354
* [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE4.2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp.
  Craig Topper, 2020-01-07 (1 file, -3/+14)

  Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now, and enabled it for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.

  Differential Revision: https://reviews.llvm.org/D72318
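  The per-element identity being exploited, written as a scalar check (illustrative; the shift assumes the usual arithmetic right shift of signed values):

    #include <cstdint>

    bool identityHolds(int64_t X) {
      int64_t NotSign = (X >> 63) ^ -1;      // xor (ashr X, 63), -1
      int64_t IsNonNeg = (X > -1) ? -1 : 0;  // setgt X, -1 as all-ones mask
      return NotSign == IsNonNeg;            // always true
    }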