| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Also complete the set of related operations.
llvm-svn: 354480
|
| |
|
|
| |
llvm-svn: 354477
|
| |
|
|
| |
llvm-svn: 354473
|
| |
|
|
|
|
|
|
|
|
| |
DAG combiner combines two shifts into shift + and with bitmask.
Avoid such combines for vectors since leaving two vector shifts
as they are produces better end results.
Differential Revision: https://reviews.llvm.org/D58225
llvm-svn: 354461
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag.
Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead.
Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU.
Reviewers: spatel, RKSimon, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D58412
llvm-svn: 354436
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.
This reverts commit r353923.
llvm-svn: 354434
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Those pseudo-instructions are making load/store instructions able to
load/store from/to a symbol, and its always using PC-relative addressing
to generating a symbol address.
Reviewers: asb, apazos, rogfer01, jrtc27
Differential Revision: https://reviews.llvm.org/D50496
llvm-svn: 354430
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D58364
llvm-svn: 354427
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and
make other operations inherit from them
- Factor the common opcode prefix '0xfe' out from the opcodes into the
common class
- Reorder instructions in the order of increasing opcodes
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58338
llvm-svn: 354421
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fixed a bug in the routine in AsmParser that determines whether the
current instruction is a load or a store. Atomic instructions' prefixes
are not `atomic_` but `atomic.`, and all atomic instructions are also
memory instructions. Also fixed the printing format of atomic
instructions to match other memory instructions and added encoding tests
for atomic instructions.
Reviewers: aardappel, tlively
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58337
llvm-svn: 354419
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58414
llvm-svn: 354416
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Rename MemoryIndex to InitFlags and implement logic for determining
data segment layout in ObjectYAML and MC. Also adds a "passive" flag
for the .section assembler directive although this cannot be assembled
yet because the assembler does not support data sections.
Reviewers: sbc100, aardappel, aheejin, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57938
llvm-svn: 354397
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR.
After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags.
Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first.
We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this.
Fixes PR40779
llvm-svn: 354395
|
| |
|
|
| |
llvm-svn: 354382
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
they view 512-bit vectors differently.
The use of the -mprefer-vector-width=256 command line option mixed with functions
using vector intrinsics can create situations where one function thinks 512 vectors
are legal, but another fucntion does not.
If a 512 bit vector is passed between them via a pointer, its possible ArgumentPromotion
might try to pass by value instead. This will result in type legalization for the two
functions handling the 512 bit vector differently leading to runtime failures.
Had the 512 bit vector been passed by value from clang codegen, both functions would
have been tagged with a min-legal-vector-width=512 function attribute. That would
make them be legalized the same way.
I observed this issue in 32-bit mode where a union containing a 512 bit vector was
being passed by a function that used intrinsics to one that did not. The caller
ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1.
The fix implemented here is just to consider it a mismatch if two functions
would handle 512 bit differently without looking at the types that are being
considered. This is the easist and safest fix, but it can be improved in the future.
Differential Revision: https://reviews.llvm.org/D58390
llvm-svn: 354376
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.
With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors.
This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.
I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).
The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.
Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361
Differential Revision: https://reviews.llvm.org/D57888
llvm-svn: 354363
|
| |
|
|
|
|
| |
This was the cause of the regression in D57888 - the commuted load pattern wasn't hidden by the predicate so once we enabled v4i32 blends on SSE41+ targets then isel was incorrectly matched against AVX2+ instructions.
llvm-svn: 354358
|
| |
|
|
|
|
|
|
|
|
| |
klocwork critical issues in CG files:
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D58363
llvm-svn: 354357
|
| |
|
|
|
|
|
|
|
|
| |
When parsing a sequence of tokens beginning with {, it will hit an assert and crash if the token afterwards is not an identifier. Instead of this, return a more verbose error as seen elsewhere in the function.
Patch by Brandon Jones (BrandonTJones)
Differential Revision: https://reviews.llvm.org/D57375
llvm-svn: 354356
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
checking for function inline compatibility.
Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining.
There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present.
Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible.
I've based the implementation here of the similar code in AMDGPU.
Differential Revision: https://reviews.llvm.org/D58371
llvm-svn: 354355
|
| |
|
|
| |
llvm-svn: 354354
|
| |
|
|
| |
llvm-svn: 354348
|
| |
|
|
|
|
|
|
| |
The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands.
Follow up to D58053
llvm-svn: 354346
|
| |
|
|
| |
llvm-svn: 354345
|
| |
|
|
| |
llvm-svn: 354343
|
| |
|
|
|
|
|
|
|
|
| |
This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads.
It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load.
Differential Revision: https://reviews.llvm.org/D58053
llvm-svn: 354340
|
| |
|
|
| |
llvm-svn: 354333
|
| |
|
|
|
|
| |
Same as arm mode.
llvm-svn: 354310
|
| |
|
|
|
|
| |
These should always follow the CPU string. There's no reason to control them independently.
llvm-svn: 354304
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Legalize/select llvm.ctlz.*
Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.
Differential Revision: https://reviews.llvm.org/D58155
llvm-svn: 354299
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.
Differential Revision: https://reviews.llvm.org/D57789
llvm-svn: 354298
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass.
Summary:
This is to fix a memory dependence bug in LoadStoreOptimizer.
Reviewers:
arsenm, rampitec
Differential Revision:
https://reviews.llvm.org/D58295
llvm-svn: 354295
|
| |
|
|
| |
llvm-svn: 354293
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to D57867 - this is a small patch with lots of test diffs.
With half-vector-width narrowing potential, using an extract + 128-bit vshufps
is a win because it replaces a 256-bit shuffle with a 128-bit shufle.
This seems like it should be a win even for targets with 'fast-variable-shuffle',
but we are intentionally deferring that to an independent change to make sure
that is true.
Differential Revision: https://reviews.llvm.org/D58181
llvm-svn: 354279
|
| |
|
|
|
|
|
|
|
|
| |
file via the stack, use the same stack slot we use for the integer conversion.
No need for a separate stack slot. The lifetimes don't overlap.
Also fix the MachinePointerInfo for the final load after the integer conversion to indicate it came from the stack slot.
llvm-svn: 354234
|
| |
|
|
|
|
|
|
|
|
| |
Generate all of the ops as i64 and let them be legalized.
No need to manually split everything. We can let the type legalizer work for us.
The test change seems to be caused by some DAG ordering issue that was previously circumventing a one use check in LowerSELECT where FP selects are turned into blends if the setcc has one use. But it was running after an integer select and the same setcc had been legalized to cmov and X86SISD::CMP. This dropped the use count of the setcc, but wasn't what was intended.
llvm-svn: 354197
|
| |
|
|
|
|
|
|
|
|
| |
hasPartialRegUpdate.
Preventing the load fold won't fix the partial register update since the
input we can fold is a GPR. So it will do nothing to prevent a false dependency
on an XMM register.
llvm-svn: 354193
|
| |
|
|
|
|
|
|
|
|
|
|
| |
mode for fp->int conversion
When we need to do an fp->int conversion using x87 instructions, we need to temporarily change the rounding mode to 0b11 and perform a store. To do this we save the old value of the fpcw to the stack, then set the fpcw to 0xc7f, do the store, then restore fpcw. But the 0xc7f value forces the exception mask bits 1. While this is what they would be in the default FP environment, as we move to support changing the FP environments, we shouldn't make this assumption.
This patch changes the code to explicitly OR 0xc00 with the old value so that only the rounding mode is changed. Unfortunately, this requires two stack temporaries instead of one. One to hold the old value and one to hold the new value. Without two stack temporaries we would need an additional GPR. We already need one to do the OR operation in. This is similar to what gcc and icc do for this operation. Though they are both better at reusing the stack temporaries when there are multiple truncates in a function(or at least in a basic block)
Differential Revision: https://reviews.llvm.org/D57788
llvm-svn: 354178
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Update Flag when generating cc output.
Fixes PR40737.
Reviewers: rnk, nickdesaulniers, craig.topper, spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58283
llvm-svn: 354163
|
| |
|
|
|
|
|
|
|
|
| |
LowerFP_TO_INT. NFCI
These checks aren't needed on the call to FP_TO_INTHelper from the type legalizer for splitting i64. We always want to use X87 FIST/FISTT to memory there.
Moving up the SSE checks will allow this routine to focus on what it cares about and makes its return semantics cleaner.
llvm-svn: 354161
|
| |
|
|
|
|
|
|
|
|
|
| |
It seems there were some problem with using a .mir test. For some reason
doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare'
on the output .mir file results in the NoVRegs Property after instruction
selection.
Recommitting the same test as an .ll file instead.
llvm-svn: 354160
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuffle mask (PR40730)
As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index.
This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128.
This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but its the most cautious approach as this needs to be cherrypicked into the 8.0.0 release branch.
Differential Revision: https://reviews.llvm.org/D58237
llvm-svn: 354117
|
| |
|
|
|
|
|
|
| |
Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for
all other opcodes where the handling is otherwise the same between arm
mode and thumb2.
llvm-svn: 354115
|
| |
|
|
|
|
| |
Just like arm mode, but with different opcodes.
llvm-svn: 354113
|
| |
|
|
|
|
|
|
|
|
| |
This patch also introduces the emitAuipcInstPair helper, which is then used
for both emitLoadAddress and emitLoadLocalAddress.
Differential Revision: https://reviews.llvm.org/D55325
Patch by James Clarke.
llvm-svn: 354111
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D55279
Patch by James Clarke.
llvm-svn: 354110
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.
Differential Revision: https://reviews.llvm.org/D57686
llvm-svn: 354108
|
| |
|
|
|
|
| |
Also use modifiesRegister instead of looping over operands.
llvm-svn: 354098
|
| |
|
|
|
|
|
|
|
| |
This reverts commit aa0b77d3395dc6ab91647138139c1a15a3aa088d.
This fails to pass the machine verifier:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/13579/
llvm-svn: 354096
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D57811
llvm-svn: 354085
|