llvm-svn: 289492
Differential Revision: https://reviews.llvm.org/D27582
llvm-svn: 289486
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to a VSR through MTVSRWZ is much faster than storing the extended value to the stack and reloading it with LXSIWZX.
This patch fixes pr31144.
Differential Revision: https://reviews.llvm.org/D27287
llvm-svn: 289473
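To illustrate the change above, here is a minimal C++ sketch (the function name is hypothetical): extending a byte load into a floating-point value previously went through a stack slot and LXSIWZX; with this change it moves through a GPR and MTVSRWZ instead.

    // Hypothetical example: converting a 1-byte load to double.
    double byte_to_double(const unsigned char *p) {
        return static_cast<double>(*p);  // lbz + mtvsrwz, no stack round-trip
    }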
instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table. By default, use this for branch targets
and some other cases that have no specified source location, to
prevent inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).
The updated patch allows enabling or suppressing this behavior for all
unspecified source locations.
Differential Revision: http://reviews.llvm.org/D24180
llvm-svn: 289468
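A hedged illustration of the behavior described above (not taken from the patch itself): compiler-generated branch targets often have no source line of their own.

    void zero(int *a, int n) {
        for (int i = 0; i < n; ++i)
            a[i] = 0;
        // The loop's exit and back-edge branch targets have no natural
        // source line; emitting them as "line 0" keeps them from
        // inheriting the line of whatever instruction precedes them.
    }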
llvm-svn: 289443
llvm-svn: 289438
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.
llvm-svn: 289429
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of its vXi64 vector inputs; we can lower to it if the sign bits stretch that far.
Differential Revision: https://reviews.llvm.org/D27657
llvm-svn: 289426
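A minimal sketch of the legality check this implies, using the standard SelectionDAG helper (the function name is illustrative):

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    // If both vXi64 operands carry more than 32 sign bits, each 64-bit
    // lane is a sign-extended 32-bit value, so the multiply can be
    // lowered to PMULDQ of the low halves.
    static bool canUsePMULDQ(SelectionDAG &DAG, SDValue LHS, SDValue RHS) {
      return DAG.ComputeNumSignBits(LHS) > 32 &&
             DAG.ComputeNumSignBits(RHS) > 32;
    }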
Pre-commit as discussed on D27657
llvm-svn: 289425
X86ISD::VZEXT_LOAD opcode.
Disable the peephole pass on some of the tests that no longer require it to fold scalar intrinsics properly.
llvm-svn: 289424
Summary:
These intrinsic instructions are all selected from intrinsics that have well-defined behavior for where the upper bits come from; it's not the same place as the lower bits.
As you can see, we were suppressing load folding for these instructions in some cases. In none of those cases was the separate load helping to avoid a partial dependency on the destination register, so we should just allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads; it only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed, so I would like to fix sse_load_f32/f64 to allow that. If I do, it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419
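A hedged example of the kind of code affected (the function name is hypothetical; the scalar convert below is representative of intrinsics whose upper bits come from a different operand than the lower bits):

    #include <immintrin.h>

    // The conversion only reads the low float, so the load of *p can be
    // folded into cvtss2sd rather than staying a separate movss.
    __m128d cvt_from_mem(__m128d a, const float *p) {
        return _mm_cvtss_sd(a, _mm_load_ss(p));
    }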
llvm-svn: 289407
has multiple uses (for v4i64 and v4f64).
When the load node that the broadcast instruction broadcasts from has multiple uses, it cannot be folded into the broadcast.
A fallback pattern is added to catch these cases and provide another solution.
Differential Revision: https://reviews.llvm.org/D27661
llvm-svn: 289404
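A hedged example of the multi-use case (names are illustrative; assumes an AVX target): the loaded value feeds both the broadcast and a scalar use, so the load cannot be folded into vbroadcastsd and the fallback pattern broadcasts from the register instead.

    #include <immintrin.h>

    __m256d splat_and_keep(const double *p, double *out) {
        double d = *p;            // the load has a second, scalar use
        *out = d + 1.0;
        return _mm256_set1_pd(d); // broadcast of the same loaded value
    }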
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time a struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
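A hedged C++ illustration of the first invariant (the field offsets are typical, not guaranteed):

    struct S {
        int    i;  // offset 0
        double d;  // offset 8 on common ABIs
    };

    // The struct-path tag for s->d starts from ("S", 8) and steps
    // through S's field list to ("double", 0): by the time the base
    // type is the scalar "double", the remaining offset must be zero.
    double load_d(S *s) { return s->d; }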
llvm-svn: 289400
We have found that -- when the selected subarchitecture has a scheduling model
and we are not optimizing for size -- the machine-instruction combiner uses a
too-simple algorithm to compute the cost of one of the two alternatives [before
and after running a combining pass on a section of code], and therefore it throws
away the combination results too often.
This fix can help any ISA whose instructions can be combined and for which at
least one subarchitecture has a scheduling model. As of now, it is only known to
definitely affect AArch64 subarchitectures with a scheduling model.
Regression tested on AMD64/GNU-Linux; the new test case fails on an unpatched
compiler and passes on a patched compiler.
Patch by Abe Skolnik and Sebastian Pop.
llvm-svn: 289399
llvm-svn: 289398
The regcall calling convention passes mask-type arguments in x86 GPR registers.
The review includes the changes required to support v32i1, v16i1 and v8i1.
Differential Revision: https://reviews.llvm.org/D27148
llvm-svn: 289383
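A hedged example (assuming an AVX-512BW target with mask types from <immintrin.h>; the function is illustrative): with regcall, the v32i1 mask arguments below are passed in GPRs rather than in memory.

    #include <immintrin.h>

    __attribute__((regcall)) __mmask32 combine_masks(__mmask32 a,
                                                     __mmask32 b) {
        return a & b;  // masks arrive and return in x86 GPRs
    }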
This adds CodeGen tests for the AVR C calling convention.
llvm-svn: 289369
llvm-svn: 289362
being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
llvm-svn: 289342
select around the unmasked avx1 intrinsics.
llvm-svn: 289340
vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
from 'large element' scalar/vector to 'small element' vector.
Extension to D27129, which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.
llvm-svn: 289329
llvm-svn: 289327
There was a bug where we would hit an assertion if 'Q' was used as a
constraint.
I also removed hardcoded register names to prefer regexes so the tests
don't break when the register allocator changes.
llvm-svn: 289325
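A hedged sketch of the trigger (the asm body is intentionally empty; only the target-specific 'Q' memory constraint matters, and the function is illustrative):

    void touch(char *p) {
        // Using the 'Q' memory constraint used to hit an assertion in
        // the backend; it should simply constrain how *p is addressed.
        __asm__ volatile("" :: "Q"(*p));
    }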
This seems to have caused failures on the buildbot.
llvm-svn: 289324
Summary: This gets rid of the hardcoded 'r0' that was used previously.
Reviewers: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27567
llvm-svn: 289322
This would previously trigger an assertion error in AVRISelDAGToDAG.
llvm-svn: 289321
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.
llvm-svn: 289307
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.
Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.
The assembler appears to accept out-of-bounds immediates and truncate
them, but this seems to be an issue for 32-bit operands already.
llvm-svn: 289306
Some of the immediates need to be printed differently
eventually.
llvm-svn: 289291
llvm-svn: 289279
llvm-svn: 289272
llvm-svn: 289265
Summary:
This frees 2 additional scalar registers.
These are results from all of my 3 patches combined:
Polaris:
Spilled SGPRs: 2231 -> 1517 (-32.00 %)
Tonga:
Spilled SGPRs: 3829 -> 2608 (-31.89 %)
Spilled VGPRs: 100 -> 84 (-16.00 %)
Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
limited to 64 VGPRs.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27151
llvm-svn: 289262
Summary: This frees 2 scalar registers.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27150
llvm-svn: 289261
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27149
llvm-svn: 289260
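A quick sketch of the granularity arithmetic (using llvm::alignTo; the helper function is illustrative):

    #include "llvm/Support/MathExtras.h"

    // VI hands out SGPRs in granules of 16, so requesting 104 already
    // reserves alignTo(104, 16) == 112 of them; we may as well allow
    // all of them to be used.
    unsigned allocatedSGPRs(unsigned Requested) {
      return llvm::alignTo(Requested, 16);
    }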
test/CodeGen/MIR should contain tests that intend to test the MIR
printing or parsing. Tests that test something else should be in
test/CodeGen/TargetName even when they are written in .mir.
As a rule of thumb, only tests using "llc -run-pass none" should be in
test/CodeGen/MIR.
llvm-svn: 289254
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.
llvm-svn: 289232
llvm-svn: 289231
Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
UseAA is enabled."
This reverts commit r289221, which appears to be triggering an assertion.
llvm-svn: 289226
enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes, as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 289221
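A hedged C++ illustration of the core effect (the function is illustrative): the four adjacent stores below are parallel through the chain subgraph and can now be recognized and merged into one wider store.

    void zero4(int *p) {
        p[0] = 0;  // with less conservative aliasing, these four
        p[1] = 0;  // consecutive 32-bit stores become candidates
        p[2] = 0;  // for merging into a single 128-bit store
        p[3] = 0;
    }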
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.
shader-db stats:
SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, mareko, arsenm
Subscribers: kzhuravl, wdng, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27593
llvm-svn: 289219
This fixes the build.
llvm-svn: 289201
Adds support for bitcasting a little-endian 'small element' vector to a 'large element' scalar/vector (e.g. v16i8 to v4i32, or v2i32 to i64), which is required for PR30845. We extract the known bits for each 'small element' part and concatenate the results together.
We can add support for big-endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.
Differential Revision: https://reviews.llvm.org/D27129
llvm-svn: 289200
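A hedged sketch of the concatenation step for a little-endian v2i32 -> i64 bitcast (simplified from the real DAG code; names are illustrative):

    #include "llvm/ADT/APInt.h"
    using llvm::APInt;

    // Element 0 supplies bits [31:0] and element 1 bits [63:32] of the
    // i64 result, so the per-element known-zero masks concatenate.
    APInt concatKnownZero(const APInt &Lo, const APInt &Hi) {
      return (Hi.zext(64) << 32) | Lo.zext(64);
    }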
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.
llvm-svn: 289194