| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op.
This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.
Differential Revision: https://reviews.llvm.org/D61782
llvm-svn: 360594
|
|
|
|
|
|
| |
Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway.
llvm-svn: 360588
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.
Original commit message:
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
llvm-svn: 360552
|
|
|
|
|
|
| |
Removes unnecessary vzeroupper noted in D61806
llvm-svn: 360543
|
|
|
|
|
|
|
|
|
|
| |
SimplifyDemandedVectorElts as well.
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.
This helps us with target shuffles and hops in particular.
llvm-svn: 360535
|
|
|
|
|
|
|
|
| |
The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).
I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.
llvm-svn: 360528
|
|
|
|
|
|
| |
Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts
llvm-svn: 360526
|
|
|
|
|
|
| |
Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable.
llvm-svn: 360525
|
|
|
|
|
|
|
|
| |
to fix a machine verifier error after adding test cases.
Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it.
llvm-svn: 360524
|
|
|
|
| |
llvm-svn: 360523
|
|
|
|
| |
llvm-svn: 360522
|
|
|
|
|
|
|
| |
No need to select the register class based on type and features. It should
already be setup by X86ISelLowering.
llvm-svn: 360513
|
|
|
|
|
|
|
|
| |
We were checking for SSE4.1 for FP types, but not integer 128-bit types.
Fixes PR41837.
llvm-svn: 360512
|
|
|
|
|
|
|
|
| |
hardening. Fix an additional issue found by the test.
This test covers the fix from r360475 as well.
llvm-svn: 360511
|
|
|
|
|
|
|
|
|
| |
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360484
|
|
|
|
|
|
|
|
|
|
| |
As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particularly optimization?), but there's no common way to check whether the function *has* a red zone.
I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated.
Differential Revision: https://reviews.llvm.org/D61799
llvm-svn: 360479
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
base.
After D58632, we can create idempotent atomic operations to the top of stack.
This confused speculative load hardening because it thinks accesses should have
virtual register base except for the cases it already excluded.
This commit adds a new exclusion for this case. I'll try to reduce a test case
for this, but this fix was verified to work by the reporter. This should avoid
needing to revert D58632.
llvm-svn: 360475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61789
llvm-svn: 360471
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61680
llvm-svn: 360436
|
|
|
|
|
|
|
|
| |
If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free).
Fixes some of the regressions noted in D61782
llvm-svn: 360435
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61735
llvm-svn: 360396
|
|
|
|
|
|
|
|
| |
The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.
Differential Revision: https://reviews.llvm.org/D58632
llvm-svn: 360393
|
|
|
|
|
|
|
|
|
|
| |
As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count.
This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this.
Differential Revision: https://reviews.llvm.org/D61308
llvm-svn: 360360
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've started this cleanup more several times now, but got sidetracked
elsewhere, e.g. by llvm-exegesis problems. Not this time, finally!
This is mainly cleaning up the inverse throughput values,
and a few latencies/uops, based on the llvm-exegesis measured values.
Though this is not complete by any means,
there's certainly more cleanup to be done.
The performance numbers (i've only checked by RawSpeed benchmark) aren't
really surprising - overall this *slightly* (< -1%) improves perf.
llvm-svn: 360341
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code was never covered by tests, in PR41786 it was pointed out that
the deletion part doesn't work, and in a full Chrome build I was never
able to hit the code path that looks through copies. It seems the
situation it's supposed to handle doesn't actually come up in practice.
Delete it to simplify the code.
Differential revision: https://reviews.llvm.org/D61671
llvm-svn: 360320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.
Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.
Fixes PR37598
Reviewers: smeenai, mstorsjo
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61615
llvm-svn: 360207
|
|
|
|
|
|
|
| |
asked not be greater than preferred vector width for the vectorizer.
Test for both 128 and 256 with a skylake architecture.
llvm-svn: 360183
|
|
|
|
| |
llvm-svn: 360157
|
|
|
|
|
|
| |
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.
llvm-svn: 360134
|
|
|
|
| |
llvm-svn: 360133
|
|
|
|
|
|
| |
Fixes cppcheck warnings.
llvm-svn: 360131
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to support x/y/zmm16-31 when the type is mismatched.
The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch.
If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller.
To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32.
Differential Revision: https://reviews.llvm.org/D61457
llvm-svn: 360102
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.
Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.
After this patch we should support the d/q for parsing, but not print it when its unneeded.
llvm-svn: 360085
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.
Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.
This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
llvm-svn: 360066
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61593
Fix typo.
Turn this patch into an NFC.
Addressing comments
llvm-svn: 360050
|
|
|
|
|
|
| |
Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning.
llvm-svn: 360047
|
|
|
|
|
|
| |
findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this.
llvm-svn: 360037
|
|
|
|
|
|
| |
scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there).
llvm-svn: 360028
|
|
|
|
|
|
|
|
| |
indices. NFCI.
Fixes cppcheck local shadow warning as well.
llvm-svn: 360027
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Author: LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60550
llvm-svn: 360017
|
|
|
|
|
|
| |
Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc
llvm-svn: 360001
|
|
|
|
| |
llvm-svn: 359999
|
|
|
|
|
|
| |
Avoids a cppcheck "Local variable name shadows outer variable" warning.
llvm-svn: 359991
|
|
|
|
|
|
| |
Fixes cppcheck warning.
llvm-svn: 359981
|
|
|
|
|
|
| |
warnings. NFCI.
llvm-svn: 359976
|
|
|
|
|
|
|
|
|
|
|
|
| |
vfpclassps/pd and on memory forms in intel syntax
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.
But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.
The printing code will still only use the suffix for the memory form where it is needed.
llvm-svn: 359903
|
|
|
|
|
|
| |
Merge the if() tests for the various HADD/SUB + Subtarget tests
llvm-svn: 359901
|
|
|
|
| |
llvm-svn: 359889
|
|
|
|
|
|
| |
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)
llvm-svn: 359884
|
|
|
|
|
|
| |
Leftover from before we had the extract128BitVector helpers.
llvm-svn: 359871
|