| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well.
Differential Revision: https://reviews.llvm.org/D35830
llvm-svn: 308986
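As a rough illustration of what "2 load pairs" means for an equality-only memcmp on a 64-bit target, the sketch below compares 16 bytes with two 8-byte loads from each buffer. The helper names `load64` and `equal16` are hypothetical and are not taken from the patch; the real expansion happens in the compiler's IR, not in source code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load 8 bytes with no alignment assumption.
static std::uint64_t load64(const unsigned char *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Sketch of memcmp(a, b, 16) == 0 done with two load pairs:
// one pair for bytes 0..7 and one pair for bytes 8..15.
bool equal16(const void *a, const void *b) {
  assert(a != nullptr && b != nullptr);
  const unsigned char *pa = static_cast<const unsigned char *>(a);
  const unsigned char *pb = static_cast<const unsigned char *>(b);
  std::uint64_t diff = (load64(pa) ^ load64(pb)) |
                       (load64(pa + 8) ^ load64(pb + 8));
  return diff == 0;
}
```

With a limit of 2 load pairs, anything larger than 16 bytes on such a target would still go to the library memcmp.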
|
| |
|
|
| |
llvm-svn: 308981
|
| |
|
|
| |
llvm-svn: 308980
|
| |
|
|
| |
llvm-svn: 308963
|
| |
|
|
|
|
| |
This patch doesn't modify any non-test file.
llvm-svn: 308909
|
| |
|
|
|
|
|
|
|
|
|
| |
instructions that were missing.
patterns were missed by D33188. Adding for completion.
+Updating test.
Differential Revision: https://reviews.llvm.org/D35179
llvm-svn: 308868
|
| |
|
|
|
|
|
|
|
|
| |
optimization
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D35748
llvm-svn: 308854
|
| |
|
|
|
|
|
|
| |
complexity adjustment to keep shift by immediate using the legacy instructions.
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
llvm-svn: 308834
|
| |
|
|
|
|
| |
We should be able to handle the case where some c1+c2 elements exceed max shift and some don't by performing a clamp after the sum.
llvm-svn: 308724
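The clamp-after-sum idea can be checked per lane on plain integers. The sketch below assumes the fold concerns arithmetic right shifts on 32-bit lanes, where shifting by 31 already yields the saturated result, so clamping c1+c2 to 31 preserves the value. The names `sraTwice` and `sraFolded` are hypothetical and this is not the DAGCombiner code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>;
using Amt4 = std::array<unsigned, 4>;

// Portable arithmetic shift right; amt must be below the 32-bit lane width.
static std::int32_t sra(std::int32_t x, unsigned amt) {
  assert(amt < 32);
  return x < 0 ? ~(~x >> amt) : (x >> amt);
}

// Reference form: two separate shifts per lane.
Vec4 sraTwice(const Vec4 &x, const Amt4 &c1, const Amt4 &c2) {
  Vec4 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = sra(sra(x[i], c1[i]), c2[i]);
  return r;
}

// Folded form: a single shift by min(c1 + c2, 31) per lane.
Vec4 sraFolded(const Vec4 &x, const Amt4 &c1, const Amt4 &c2) {
  Vec4 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = sra(x[i], std::min(c1[i] + c2[i], 31u));
  return r;
}
```

The clamp matters only for lanes whose summed amount reaches the lane width; the other lanes shift by the plain sum.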
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.
This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
In future we could probably generalize this to handle more cases.
Differential Revision: https://reviews.llvm.org/D35303
llvm-svn: 308723
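The split-and-merge step can be sketched with a scalar emulation of the byte sign-mask operation. `movmskb16` and `movmskb32Split` are hypothetical names; the sketch only models the bit layout (low half in bits 0..15, high half in bits 16..31) and does not use the actual instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Emulates a 128-bit byte sign-mask: bit i is the top bit of byte i.
std::uint32_t movmskb16(const std::uint8_t *bytes) {
  assert(bytes != nullptr);
  std::uint32_t mask = 0;
  for (std::size_t i = 0; i < 16; ++i)
    mask |= static_cast<std::uint32_t>(bytes[i] >> 7) << i;
  return mask;
}

// Builds the 32-bit mask of a 32-byte vector from two 16-byte halves.
std::uint32_t movmskb32Split(const std::uint8_t *bytes) {
  std::uint32_t lo = movmskb16(bytes);
  std::uint32_t hi = movmskb16(bytes + 16);
  return lo | (hi << 16);
}
```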
|
| |
|
|
|
|
|
|
| |
movntdqa instruction.
The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits.
llvm-svn: 308702
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.
This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
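A minimal sketch of the compatibility check this enables, assuming each frame object carries a small ID and that only objects with equal IDs may be merged. The type and enumerator names below are hypothetical and do not reflect the actual MachineFrameInfo interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical IDs: the commit message only says an ID field is added.
enum class StackID : std::uint8_t { Default = 0, SGPRSpill = 1 };

struct FrameObject {
  unsigned size; // size in bytes
  StackID id;    // kind of stack this placeholder belongs to
};

// Slot coloring may only merge objects that live on the same kind of stack.
bool canShareSlot(const FrameObject &a, const FrameObject &b) {
  return a.id == b.id;
}
```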
|
| |
|
|
| |
llvm-svn: 308672
|
| |
|
|
|
|
| |
optimization for the 64-bit memory shifts.
llvm-svn: 308657
|
| |
|
|
|
|
|
|
|
|
| |
bits in the (shift x, (and y, mask)) patterns for the 64-bit memory form.
We allow wider than 5 bits in the 16 and 32 bit store forms. And we allow wider than 6 bits on the 64-bit register form.
I'm assuming this was a mistake made back in r148024.
llvm-svn: 308656
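The reason a wider mask is harmless can be shown on integers: x86 64-bit shifts only read the low 6 bits of the count, so an explicit `and` is redundant whenever its mask keeps all of those bits. The function names below are hypothetical; this is a sketch of the reasoning, not the TableGen pattern.

```cpp
#include <cassert>
#include <cstdint>

// Models a 64-bit hardware shift that ignores all but the low 6 count bits.
std::uint64_t shl64Hardware(std::uint64_t x, std::uint64_t count) {
  return x << (count & 63);
}

// An explicit mask is redundant if it preserves every bit the shift reads.
bool maskIsRedundant(std::uint64_t mask, unsigned countBits) {
  assert(countBits > 0 && countBits < 64);
  std::uint64_t low = (std::uint64_t{1} << countBits) - 1;
  return (mask & low) == low;
}
```

A mask of 63 or 255 passes the check for a 6-bit count, while a mask of 31 does not, because it can change the amount the shift actually uses.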
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.
Reviewers: spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35030
llvm-svn: 308618
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add missing vector write of vector read reduction, i.e.:
(insert_vector_elt x (extract_vector_elt x idx) idx) to x
Reviewers: spatel, RKSimon, efriedma
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35563
llvm-svn: 308617
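The identity behind this reduction can be sketched on a fixed-size array: writing back the element that was just read from the same index leaves the vector unchanged. `extractElt` and `insertElt` are hypothetical stand-ins for the DAG nodes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>;

// Stand-in for extract_vector_elt.
int extractElt(const Vec &x, std::size_t idx) {
  assert(idx < x.size());
  return x[idx];
}

// Stand-in for insert_vector_elt: returns a copy with one lane replaced.
Vec insertElt(Vec x, int value, std::size_t idx) {
  assert(idx < x.size());
  x[idx] = value;
  return x;
}
```

For any in-range `idx`, `insertElt(x, extractElt(x, idx), idx)` compares equal to `x`, which is why the pair of nodes can be replaced by `x` itself.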
|
| |
|
|
|
|
| |
Test constant folding both on node creation (which already works) and once the input nodes have been folded themselves (not working yet).
llvm-svn: 308611
|
| |
|
|
|
|
|
|
| |
predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRL constant folds
llvm-svn: 308602
|
| |
|
|
|
|
|
|
| |
predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRA constant folds
llvm-svn: 308600
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.
This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.
I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.
Differential Revision: https://reviews.llvm.org/D35492
llvm-svn: 308598
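The matching mechanism can be pictured as one predicate applied to every constant element, with a scalar treated as a one-element case. The sketch below uses a hypothetical `ConstOperand` type and `matchConstPredicate` function; it does not reproduce the SelectionDAG API added by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a scalar constant (one element) or a
// constant build vector (several, possibly different, elements).
struct ConstOperand {
  std::vector<std::uint64_t> elts;
};

// Succeeds only if every constant element satisfies the predicate.
bool matchConstPredicate(const ConstOperand &op,
                         const std::function<bool(std::uint64_t)> &pred) {
  if (op.elts.empty())
    return false;
  for (std::uint64_t e : op.elts)
    if (!pred(e))
      return false;
  return true;
}
```

A combine that previously required a splat could then ask, for example, whether every shift amount is below the bit width, and accept non-uniform vectors such as {1, 5, 31}.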
|
| |
|
|
|
|
| |
Fixes PR33841.
llvm-svn: 308591
|
| |
|
|
|
|
| |
We should use SHLX and similar instructions for these patterns, but we currently don't.
llvm-svn: 308590
|
| |
|
|
|
|
| |
I've stripped the checks for 64-bit types in 32-bit mode to match the existing tests.
llvm-svn: 308589
|
| |
|
|
|
|
| |
The test issue was fixed and the test was updated in r244577, but the comment wasn't removed.
llvm-svn: 308588
|
| |
|
|
|
|
|
|
|
|
| |
Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark, but more additions wrt
shrink-wrapping may be added.
https://reviews.llvm.org/D35645
llvm-svn: 308556
|
| |
|
|
| |
llvm-svn: 308527
|
| |
|
|
|
|
|
|
|
|
|
|
| |
debug
and non-debug units.
Patch by Andrea DiBiagio.
Differential Revision: https://reviews.llvm.org/D35637
llvm-svn: 308513
|
| |
|
|
|
|
| |
Fixes the crash reported in PR33844.
llvm-svn: 308503
|
| |
|
|
|
|
| |
XOP shifts only support 128-bit vectors, so we were ending up with less optimal codegen requiring constants
llvm-svn: 308430
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instructions types
b. Registers used
5. Since itineraries are not added based on instructions, throughput information is bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Reviewers: craig.topper, RKSimon
Subscribers: javed.absar, shivaram, ddibyend, vprasad
Differential Revision: https://reviews.llvm.org/D35293
llvm-svn: 308411
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-recommitting after landing DAG extension-crash fix.
Recommitting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
consider accesses to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308350
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When replacing a node and its operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.
Fixes PR32515.
Reviewers: dbabokin, RKSimon, davide, chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D34095
llvm-svn: 308330
|
| |
|
|
| |
llvm-svn: 308323
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
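The size limits mentioned above (16 bytes with 4-byte loads, doubling with 8-byte loads) follow from a load budget. The sketch below assumes a greedy decomposition of the compare size into power-of-two loads and a hypothetical budget parameter; the actual CGP expansion may count loads differently.

```cpp
#include <cassert>

// Counts the loads per buffer needed to cover `size` bytes, greedily
// using the widest load first. maxLoadSize is assumed to be a power of two.
unsigned loadsNeeded(unsigned size, unsigned maxLoadSize) {
  assert(maxLoadSize > 0 && (maxLoadSize & (maxLoadSize - 1)) == 0);
  unsigned count = 0;
  for (unsigned width = maxLoadSize; width >= 1; width /= 2) {
    count += size / width;
    size %= width;
  }
  return count;
}

// Hypothetical decision: expand inline only when the budget is not exceeded.
bool wouldExpand(unsigned size, unsigned maxLoadSize, unsigned maxLoads) {
  return size > 0 && loadsNeeded(size, maxLoadSize) <= maxLoads;
}
```

Under these assumptions a budget of 4 loads covers 16 bytes with 4-byte loads and 32 bytes with 8-byte loads, while a budget of 2 halves both limits.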
|
| |
|
|
| |
llvm-svn: 308311
|
| |
|
|
|
|
|
|
| |
As discussed by @spatel on D35067:
"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."
llvm-svn: 308309
|
| |
|
|
|
|
| |
Take the modulo of rotations by a constant greater than or equal to the bit-width
llvm-svn: 308302
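Reducing the rotate amount modulo the bit width is safe because rotating by the full width is the identity. A minimal sketch on 32-bit values, with the hypothetical name `rotl32`:

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by any amount; amounts >= 32 are reduced modulo 32 first.
std::uint32_t rotl32(std::uint32_t x, unsigned amt) {
  amt %= 32;
  if (amt == 0)
    return x; // avoids an undefined shift by 32 below
  return (x << amt) | (x >> (32 - amt));
}
```

Rotating by 33 then gives the same result as rotating by 1, and rotating by 32 returns the input unchanged.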
|
| |
|
|
| |
llvm-svn: 308295
|
| |
|
|
| |
llvm-svn: 308286
|
| |
|
|
|
|
|
| |
Notably, this is failing on our PPC build bots:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/8338/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apr33772.ll
llvm-svn: 308272
|
| |
|
|
|
|
|
|
|
| |
with a minimal test case in http://llvm.org/PR33833.
Original commit message:
Improve Aliasing of operations to static alloca
llvm-svn: 308271
|
| |
|
|
|
|
|
|
|
|
| |
non-constant scale value.
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.
Fixes PR33772.
llvm-svn: 308267
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: t.p.northover, oren_ben_simhon, niravd, mcrosier
Reviewed By: oren_ben_simhon, mcrosier
Subscribers: nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35466
llvm-svn: 308193
|
| |
|
|
| |
llvm-svn: 308180
|
| |
|
|
|
|
|
|
| |
Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate
Differential Revision: https://reviews.llvm.org/D35463
llvm-svn: 308177
|
| |
|
|
| |
llvm-svn: 308175
|
| |
|
|
|
|
| |
Was preventing rotate matching
llvm-svn: 308171
|
| |
|
|
| |
llvm-svn: 308169
|