| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 222795
|
|
|
|
| |
llvm-svn: 222793
|
|
|
|
| |
llvm-svn: 222792
|
|
|
|
| |
llvm-svn: 222791
|
|
|
|
| |
llvm-svn: 222789
|
|
|
|
| |
llvm-svn: 222786
|
|
|
|
| |
llvm-svn: 222784
|
|
|
|
| |
llvm-svn: 222778
|
|
|
|
| |
llvm-svn: 222771
|
|
|
|
|
|
|
|
| |
tests to start failing.
Original commit log: R600/SI: Disable commutativity for MIN/MAX_LEGACY
llvm-svn: 222753
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D6338
llvm-svn: 222752
|
|
|
|
| |
llvm-svn: 222746
|
|
|
|
|
|
|
|
| |
Only the super register flat_scr was marked as reserved,
so in some cases with high register usage it would still
try to allocate the subregisters.
llvm-svn: 222737
|
|
|
|
|
|
|
|
|
|
| |
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.
This commit fixes this and also adds inverse versions of the condition to catch
more cases.
llvm-svn: 222722
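The width mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use llvm::APInt; `FixedInt`, `allOnes`, `isMinusOneNaive` and `isMinusOne` are hypothetical names, and the choice of a 64-bit constant for the buggy comparison is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical fixed-width integer (NOT llvm::APInt): `bits` is the width,
// `value` holds the raw bits zero-extended to 64 bits.
struct FixedInt {
  unsigned bits;
  std::uint64_t value;
};

// All-ones pattern ("-1") for a width between 1 and 64 bits.
inline std::uint64_t allOnes(unsigned bits) {
  return bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// Width-unaware check (assumed shape of the bug): compares against a
// 64-bit -1, so an all-ones value of a narrower type never matches.
inline bool isMinusOneNaive(const FixedInt &x) {
  return x.value == ~0ULL;
}

// Width-aware check: builds "-1" at the operand's own bit width first.
inline bool isMinusOne(const FixedInt &x) {
  return x.value == allOnes(x.bits);
}
```

With this sketch, an 8-bit value of 0xFF is recognized as -1 only by the width-aware check.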
|
|
|
|
|
|
|
|
| |
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.
llvm-svn: 222712
|
|
|
|
|
|
|
|
| |
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.
llvm-svn: 222710
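As a rough illustration of why a common divisor helps, the sketch below replaces several divisions with one reciprocal and several multiplications. The function name `divideAllByCommonDivisor` is hypothetical, and this is source-level C++ that only mirrors the idea; it is not the code the backend actually emits.

```cpp
#include <vector>

// Divide every numerator by one common divisor using a single division
// followed by multiplications. Results may differ from true division in
// the last bit, which is why this is only acceptable in fast-math mode.
inline std::vector<double> divideAllByCommonDivisor(
    const std::vector<double> &numerators, double divisor) {
  const double reciprocal = 1.0 / divisor; // the only division performed
  std::vector<double> out;
  out.reserve(numerators.size());
  for (double n : numerators)
    out.push_back(n * reciprocal);
  return out;
}
```

The trade is N divisions for one division plus N multiplications, which pays off when division is much slower than multiplication.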
|
|
|
|
|
|
|
| |
Extremely difficult to reproduce, so no test case included.
PR21637
llvm-svn: 222677
|
|
|
|
| |
llvm-svn: 222676
|
|
|
|
|
|
|
|
|
|
|
| |
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.
Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.
llvm-svn: 222672
|
|
|
|
| |
llvm-svn: 222670
|
|
|
|
| |
llvm-svn: 222668
|
|
|
|
| |
llvm-svn: 222662
|
|
|
|
| |
llvm-svn: 222660
|
|
|
|
|
|
|
|
|
| |
Fix the JRADDIUSP instruction: remove the delay slot flag, because this
instruction has no delay slot.
Differential Revision: http://reviews.llvm.org/D6365
llvm-svn: 222658
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D5122
llvm-svn: 222653
|
|
|
|
|
|
|
|
|
|
| |
instead of S0
Implement microMIPS 16-bit instructions register set: $0, $2-$7 and $17.
Differential Revision: http://reviews.llvm.org/D5780
llvm-svn: 222652
|
|
|
|
|
|
| |
has been alerted to the warning, in case this variable is meant to be used. Fixes -Werror builds in the meantime.
llvm-svn: 222649
|
|
|
|
|
|
|
|
|
|
|
| |
With the help of the new method readInstruction16(), two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16; if this fails,
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.
Differential Revision: http://reviews.llvm.org/D6149
llvm-svn: 222648
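The two-step decode described above can be sketched as follows. This is a toy model, not the LLVM disassembler: `DecoderTable`, `Decoded`, `read16`, `read32` and `decode` are hypothetical names, the tables are plain maps rather than TableGen-generated decoder tables, and the byte order used when assembling halfwords is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoder table: full encoding -> mnemonic.
using DecoderTable = std::map<std::uint32_t, std::string>;

struct Decoded {
  std::string mnemonic;
  std::size_t size; // number of bytes consumed
};

// Read one 16-bit halfword, little-endian (assumed byte order for this toy).
inline bool read16(const std::vector<std::uint8_t> &bytes, std::size_t offset,
                   std::uint32_t &insn) {
  if (offset + 2 > bytes.size())
    return false;
  insn = std::uint32_t(bytes[offset]) | (std::uint32_t(bytes[offset + 1]) << 8);
  return true;
}

// Read two halfwords; the first one forms the upper 16 bits.
inline bool read32(const std::vector<std::uint8_t> &bytes, std::size_t offset,
                   std::uint32_t &insn) {
  std::uint32_t hi = 0, lo = 0;
  if (!read16(bytes, offset, hi) || !read16(bytes, offset + 2, lo))
    return false;
  insn = (hi << 16) | lo;
  return true;
}

// Try the 16-bit table first; fall back to the 32-bit table on failure.
inline bool decode(const std::vector<std::uint8_t> &bytes, std::size_t offset,
                   const DecoderTable &table16, const DecoderTable &table32,
                   Decoded &out) {
  std::uint32_t insn = 0;
  if (read16(bytes, offset, insn)) {
    DecoderTable::const_iterator it = table16.find(insn);
    if (it != table16.end()) {
      out = Decoded{it->second, 2};
      return true;
    }
  }
  if (read32(bytes, offset, insn)) {
    DecoderTable::const_iterator it = table32.find(insn);
    if (it != table32.end()) {
      out = Decoded{it->second, 4};
      return true;
    }
  }
  return false; // neither table matched
}
```

The reported size tells the caller how far to advance before decoding the next instruction.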
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.
llvm-svn: 222647
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:
1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.
This caused a crash, since the source value for the insertps ends up uninitialized.
Differential Revision: http://reviews.llvm.org/D6377
llvm-svn: 222635
|
|
|
|
| |
llvm-svn: 222634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
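The per-lane behaviour of the masked operations above can be sketched as a scalar emulation. `maskedLoad` and `maskedStore` are hypothetical helper names, and the sketch assumes that disabled lanes of a load take their value from the pass-through operand and that disabled lanes of a store leave memory untouched.

```cpp
#include <array>
#include <cstddef>

// Scalar emulation of a masked vector load: enabled lanes read memory,
// disabled lanes keep the pass-through value and never touch memory.
template <typename T, std::size_t N>
std::array<T, N> maskedLoad(const T *addr, const std::array<T, N> &passthru,
                            const std::array<bool, N> &mask) {
  std::array<T, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = addr[i];
  return result;
}

// Scalar emulation of a masked vector store: only enabled lanes are written.
template <typename T, std::size_t N>
void maskedStore(T *addr, const std::array<T, N> &value,
                 const std::array<bool, N> &mask) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      addr[i] = value[i];
}
```

Because disabled lanes never access memory, a loop with a conditional access can be vectorized without introducing loads or stores that the scalar loop would not have performed.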
|
|
|
|
| |
llvm-svn: 222631
|
|
|
|
|
|
|
| |
i1 is not a legal type on Evergreen, so this combine proceeded
and tried to produce a bitcast between i1 and i8.
llvm-svn: 222630
|
|
|
|
|
|
| |
Use Triple::isOS*() helper functions where possible.
llvm-svn: 222622
|
|
|
|
|
|
|
|
| |
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.
llvm-svn: 222614
|
|
|
|
|
|
| |
arguments.
llvm-svn: 222587
|
|
|
|
|
|
|
| |
We need to use a s_mov_b32 rather than a copy, so that CSE will
eliminate redundant moves to the m0 register.
llvm-svn: 222584
|
|
|
|
|
|
|
|
|
|
|
|
| |
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.
This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.
llvm-svn: 222583
|
|
|
|
|
|
|
| |
This pass attempts to fold the source operands of mov and copy
instructions into their uses.
llvm-svn: 222581
|
|
|
|
|
|
|
|
|
|
|
| |
filler such that if the delay slot filler has to put a NOP instruction into the
delay slot of a microMIPS BEQ or BNE instruction that uses register $0,
then instead of emitting a NOP, the branch is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.
Differential Revision: http://reviews.llvm.org/D3566
llvm-svn: 222580
|
|
|
|
| |
llvm-svn: 222579
|
|
|
|
| |
llvm-svn: 222577
|
|
|
|
|
|
| |
that will be removed after converting referencing defs.
llvm-svn: 222575
|
|
|
|
| |
llvm-svn: 222571
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
|
|
|
|
|
|
|
|
|
|
| |
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
can correctly hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
llvm-svn: 222539
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a bunch more improvements.
Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
llvm-svn: 222537
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
llvm-svn: 222533
|
|
|
|
|
|
|
|
| |
positive numbers
Differential Revision: http://reviews.llvm.org/D5938
llvm-svn: 222521
|