| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 309104
|
| |
|
|
| |
llvm-svn: 309103
|
| |
|
|
| |
llvm-svn: 309102
|
| |
|
|
|
|
| |
A G_GLOBAL_VALUE is basically a pointer, so it should live in the GPR.
llvm-svn: 309101
|
| |
|
|
|
|
| |
Cleaned up triple settings, added 32-bit/64-bit targets where useful, added broadcast comments
llvm-svn: 309100
|
| |
|
|
| |
llvm-svn: 309099
|
| |
|
|
| |
llvm-svn: 309098
|
| |
|
|
|
|
| |
Remove unused KNL checks and triple settings, added broadcast comments
llvm-svn: 309097
|
| |
|
|
|
|
| |
Tidied up triples and checks.
llvm-svn: 309095
|
| |
|
|
| |
llvm-svn: 309093
|
| |
|
|
| |
llvm-svn: 309090
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=32), and we plan to include more patterns in the future. To reach our goal of "more patterns", we include two mask creators. The first function creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The other creates a mask equivalent to a concat of two half vectors (high/low).
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3, each
holding 32 chars:
c0, c1, ..., c31
m0, m1, ..., m31
y0, y1, ..., y31
k0, k1, ..., k31
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer
Differential Revision: https://reviews.llvm.org/D34601
llvm-svn: 309086
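The stride-4 store pattern described in the message above can be modeled as a plain element-wise interleave. The following is only a reference sketch of the intended memory layout, not the shuffle sequence the patch emits; the function name `interleave4` and the string element labels are illustrative assumptions.

```python
def interleave4(c, m, y, k):
    """Interleave four equal-length sequences element by element (stride 4)."""
    assert len(c) == len(m) == len(y) == len(k)
    out = []
    for group in zip(c, m, y, k):
        # Each group is (c_i, m_i, y_i, k_i), stored contiguously.
        out.extend(group)
    return out


VF = 32  # vector factor from the commit message: 32 chars per register
c = [f"c{i}" for i in range(VF)]
m = [f"m{i}" for i in range(VF)]
y = [f"y{i}" for i in range(VF)]
k = [f"k{i}" for i in range(VF)]

stored = interleave4(c, m, y, k)
print(" ".join(stored[:8]))
```

The result has 4 * 32 = 128 elements, which corresponds to four 256-bit stores of byte elements.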
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
be non-temporal
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35712
llvm-svn: 309079
|
| |
|
|
|
|
|
|
| |
The existing tests only tested how a va_start is lowered.
Differential Revision: https://reviews.llvm.org/D35540
llvm-svn: 309015
|
| |
|
|
|
|
|
|
|
| |
This patch just adds printing of CR bit registers in a more human-readable
form akin to that used by the GNU binutils.
Differential Revision: https://reviews.llvm.org/D31494
llvm-svn: 309001
|
| |
|
|
|
|
|
| |
This is just a recommit since the issue that the commit exposed is now
resolved.
llvm-svn: 308995
|
| |
|
|
|
|
|
|
|
|
|
|
| |
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
llvm-svn: 308986
|
| |
|
|
| |
llvm-svn: 308981
|
| |
|
|
| |
llvm-svn: 308980
|
| |
|
|
| |
llvm-svn: 308963
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Do not assume little endian architecture in DAGCombiner::visitTRUNCATE and DAGCombiner::visitEXTRACT_VECTOR_ELT.
PR33682
Reviewers: hfinkel, sdardis, RKSimon
Reviewed By: sdardis, RKSimon
Subscribers: uabelho, RKSimon, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D34990
llvm-svn: 308960
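To illustrate why a combine that truncates an extracted vector element cannot assume little endian, the sketch below reinterprets a vector of 32-bit values as 16-bit sub-elements in both byte orders. It is a standalone illustration using Python's `struct` module, not the DAGCombiner code; the helper names are assumptions.

```python
import struct


def low_half_index(i, big_endian):
    """Index of the 16-bit sub-element holding the low half of 32-bit element i."""
    # Little endian: the low half comes first. Big endian: it comes second.
    return 2 * i + (1 if big_endian else 0)


def truncated_elements(values, order):
    """Pack 32-bit values with byte order `order` ('<' or '>') and read back
    the low 16 bits of each element through a 16-bit view of the same bytes."""
    raw = struct.pack(f"{order}{len(values)}I", *values)
    halves = struct.unpack(f"{order}{2 * len(values)}H", raw)
    big = order == ">"
    return [halves[low_half_index(i, big)] for i in range(len(values))]


values = [0x11112222, 0x33334444, 0x55556666, 0x77778888]
print([hex(v) for v in truncated_elements(values, "<")])
print([hex(v) for v in truncated_elements(values, ">")])
```

Both calls yield the same truncated values only because the sub-element index is adjusted for the byte order; using the little-endian index on big-endian data would select the high halves instead.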
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Create a dummy 8 byte fixed object for the unused slot below the first
stored vararg.
Alternative ideas tested but skipped: One could try to align the whole
fixed object to 16, but I haven't found how to add an offset to the stack
frame used in LowerWin64_VASTART.
If only the size of the fixed stack object is padded but not the offset, via
MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false),
PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
This fixes misconceptions about where registers are spilled, since
AArch64FrameLowering.cpp assumes the offset from fixed objects is
aligned to 16 bytes (and the Win64 case there already manually aligns
the offset to 16 bytes).
This fixes cases where local stack allocations could overwrite callee
saved registers on the stack.
Differential Revision: https://reviews.llvm.org/D35720
llvm-svn: 308950
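The padding arithmetic behind the dummy 8-byte object can be sketched as follows. This is a minimal model that assumes 8-byte GPR save slots and a 16-byte alignment requirement, as described in the message; `align_to` mirrors the rounding that `alignTo` performs, and `vararg_save_area` is a hypothetical helper, not a function in the backend.

```python
def align_to(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def vararg_save_area(num_saved_gprs, gpr_size=8, stack_align=16):
    """Return (gpr_save_size, padding) for a vararg GPR save area.

    padding is the size of the dummy object needed so that the total
    fixed-object area is a multiple of stack_align.
    """
    gpr_save_size = num_saved_gprs * gpr_size
    padding = align_to(gpr_save_size, stack_align) - gpr_save_size
    return gpr_save_size, padding


for n in range(1, 8):
    size, pad = vararg_save_area(n)
    print(f"{n} saved GPRs: {size} bytes saved, {pad} bytes of padding")
```

With an odd number of saved registers the save area is an odd multiple of 8 bytes, which is exactly the case where one 8-byte dummy slot is needed.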
|
| |
|
|
| |
llvm-svn: 308914
|
| |
|
|
|
|
| |
This patch doesn't modify any non-test file.
llvm-svn: 308909
|
| |
|
|
|
|
|
|
|
|
|
| |
instructions that were missing.
patterns were missed by D33188. Adding for completion.
+Updating test.
Differential Revision: https://reviews.llvm.org/D35179
llvm-svn: 308868
|
| |
|
|
|
|
|
|
| |
I have a much better way of running integration tests now.
https://github.com/dylanmckay/avr-test-suite
llvm-svn: 308857
|
| |
|
|
|
|
| |
Patch by Carl Peto.
llvm-svn: 308856
|
| |
|
|
|
|
|
|
|
|
| |
optimization
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D35748
llvm-svn: 308854
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: dblaikie, t.p.northover, rengolin
Reviewed By: rengolin
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35620
llvm-svn: 308852
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne
and we know the result of the compare instruction is zero. For example,
BB#0:
subs w0, w1, w2
str w0, [x1]
b.ne .LBB0_2
BB#1:
mov w0, wzr ; <-- redundant
str w0, [x2]
.LBB0_2
Differential Revision: https://reviews.llvm.org/D35075
llvm-svn: 308849
|
| |
|
|
|
|
|
|
| |
complexity adjustment to keep shift by immediate using the legacy instructions.
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
llvm-svn: 308834
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.
Reviewers: efriedma, RKSimon, spatel, jyknight
Reviewed By: efriedma
Subscribers: llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D35623
llvm-svn: 308833
|
| |
|
|
|
|
|
|
|
| |
This case is similar to the one fixed in r308808,
except when rematerializing.
Fixes bug 33884.
llvm-svn: 308813
|
| |
|
|
|
|
|
|
|
| |
This is possible if there is an undef use when
splitting the vreg during spilling.
Fixes bug 33620.
llvm-svn: 308808
|
| |
|
|
|
|
|
|
| |
Bitrig code has been merged back to OpenBSD, thus the OS has been abandoned.
Differential Revision: https://reviews.llvm.org/D35707
llvm-svn: 308799
|
| |
|
|
| |
llvm-svn: 308781
|
| |
|
|
|
|
|
| |
For example
asm ("memw(%0++%1) = %2" : : "r"(addr),"a"(mod),"r"(val) : "memory")
llvm-svn: 308761
|
| |
|
|
|
|
|
|
|
|
|
|
| |
-membedded-data changes the location of constant data from the .sdata to
the .rodata section. Previously it was (incorrectly) always located in the
.rodata section.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D35686
llvm-svn: 308758
|
| |
|
|
|
|
| |
test/CodeGen/SystemZ/loop-01.ll was incorrectly updated by r308729.
llvm-svn: 308736
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049
llvm-svn: 308729
|
| |
|
|
|
|
| |
We should be able to handle the case where some c1+c2 elements exceed max shift and some don't by performing a clamp after the sum.
llvm-svn: 308724
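The clamp idea can be checked exhaustively on a small type. The sketch below assumes the shifts in question are arithmetic right shifts (the message itself does not say which kind) and models 8-bit signed values; for logical shifts a clamp would not be correct, since an over-wide shift must produce zero there.

```python
BITS = 8  # model an 8-bit signed element


def sra(x, amount):
    """Arithmetic shift right of a signed value by an in-range amount."""
    assert 0 <= amount < BITS
    return x >> amount  # Python's >> on ints is an arithmetic shift


def combined(x, c1, c2):
    """Single shift by the clamped sum of the two shift amounts."""
    return sra(x, min(c1 + c2, BITS - 1))


def check_all():
    """Compare two-step and combined shifts for every value and amount pair."""
    for x in range(-(1 << (BITS - 1)), 1 << (BITS - 1)):
        for c1 in range(BITS):
            for c2 in range(BITS):
                if sra(sra(x, c1), c2) != combined(x, c1, c2):
                    return False
    return True


print(check_all())
```

The clamp works because shifting an 8-bit signed value right by 7 already leaves only copies of the sign bit, so any larger total amount gives the same result.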
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.
This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
In future we could probably generalize this to handle more cases.
Differential Revision: https://reviews.llvm.org/D35303
llvm-svn: 308723
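The split-and-merge approach can be modeled in a few lines: take the sign-bit mask of each 16-byte half, then combine the two 16-bit results. This is a behavioral sketch only; `pmovmskb` here is a Python stand-in for the instruction's effect, and `movmsk_v32` is a hypothetical helper name.

```python
def pmovmskb(byte_lanes):
    """Collect the most significant bit of each byte into an integer mask,
    with lane 0 in bit 0."""
    mask = 0
    for i, b in enumerate(byte_lanes):
        mask |= ((b >> 7) & 1) << i
    return mask


def movmsk_v32(byte_lanes):
    """Build a 32-bit mask from two 16-lane halves and merge the results."""
    assert len(byte_lanes) == 32
    lo = pmovmskb(byte_lanes[:16])   # bits 0..15
    hi = pmovmskb(byte_lanes[16:])   # bits 16..31 after shifting
    return lo | (hi << 16)


lanes = [0xFF if i in (0, 17, 31) else 0x00 for i in range(32)]
print(hex(movmsk_v32(lanes)))
```

The merged value matches what a single 32-lane mask extraction would produce, which is the property the split lowering relies on.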
|
| |
|
|
|
|
|
|
| |
movntdqa instruction.
The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits.
llvm-svn: 308702
|
| |
|
|
|
|
|
|
| |
It revealed a bug in the Localizer pass which has now been fixed.
This includes the fix for SUBREG_TO_REG committed separately last time.
llvm-svn: 308688
|
| |
|
|
|
|
|
|
| |
If the localizer pass puts one of its constants before the label that tells the
unwinder "jump here to handle your exception" then control-flow will skip it,
leaving uninitialized registers at runtime. That's bad.
llvm-svn: 308687
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds support for lowering i128 params. The changes needed to support
i128 as a "special case" of integer type are quite trivial. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with 128-bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 308675
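Treating an i128 as a 16-byte aggregate amounts to viewing the value as a byte array. The sketch below shows that view under the assumption of little-endian byte order; the helper names are illustrative and do not correspond to functions in the NVPTX backend.

```python
def i128_to_param_bytes(value):
    """View a 128-bit integer as 16 bytes, least significant byte first
    (little-endian order is an assumption of this sketch)."""
    return (value & ((1 << 128) - 1)).to_bytes(16, "little")


def param_bytes_to_i128(raw):
    """Rebuild the unsigned 128-bit value from its 16-byte representation."""
    assert len(raw) == 16
    return int.from_bytes(raw, "little")


value = (1 << 100) | 5
raw = i128_to_param_bytes(value)
print(len(raw), raw.hex())
```

The round trip through the 16-byte form preserves the value, which is why no dedicated 128-bit parameter type is needed.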
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On AMDGPU, SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which are used as placeholders.
Each placeholder will eventually be replaced with a reference to a
position in a VGPR to write to, and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
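The effect of the ID field can be sketched as a compatibility check that a slot-merging pass consults before sharing storage. Everything below is a simplified model: the `FrameSlot` class, the two ID constants, and the grouping function are hypothetical and are not the actual MachineFrameInfo or StackSlotColoring interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass

DEFAULT_STACK = 0  # hypothetical ID for ordinary stack objects
SGPR_SPILL = 1     # hypothetical ID for SGPR spill placeholders


@dataclass(frozen=True)
class FrameSlot:
    index: int
    stack_id: int


def compatible(a, b):
    """Two slots may share storage only if they have the same stack ID."""
    return a.stack_id == b.stack_id


def group_mergeable(slots):
    """Partition frame indexes by stack ID; merging is considered only
    within a partition."""
    groups = defaultdict(list)
    for slot in slots:
        groups[slot.stack_id].append(slot.index)
    return dict(groups)


slots = [
    FrameSlot(0, DEFAULT_STACK),
    FrameSlot(1, SGPR_SPILL),
    FrameSlot(2, DEFAULT_STACK),
]
print(group_mergeable(slots))
```

A real pass would also require non-overlapping live ranges within each group; the sketch only shows the additional ID constraint.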
|
| |
|
|
| |
llvm-svn: 308672
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also enable no-fsmuld for sparcv7 (which doesn't have the
instruction).
The previous code which used a post-processing pass to do this was
unnecessary; disabling the instruction is entirely sufficient.
Reviewers: jacob_hansen, ekedaigle
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35576
llvm-svn: 308661
|
| |
|
|
|
|
| |
optimization for the 64-bit memory shifts.
llvm-svn: 308657
|