Differential Revision: http://reviews.llvm.org/D17307
llvm-svn: 261032
llvm-svn: 261025
I suspect this is what let PR26110 lie dormant for so long.
llvm-svn: 261024
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
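
For reference, the branch-free conditional negate from the bithacks page can be written in scalar form roughly as below. This is only a sketch of the trick the message points to, with made-up names, not the actual DAG lowering; the mask plays the role of the lowered <4 x i1> condition.

#include <cassert>
#include <cstdint>

// Conditionally negate v: mask is all-ones to negate, all-zero to keep v.
// (v ^ mask) - mask == -v when mask == -1 (since -v == ~v + 1), and == v
// when mask == 0.
int32_t conditionalNegate(int32_t v, int32_t mask) {
  return (v ^ mask) - mask;
}

int main() {
  assert(conditionalNegate(7, 0) == 7);   // all-zero mask: keep v
  assert(conditionalNegate(7, -1) == -7); // all-ones mask: negate v
  return 0;
}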
llvm-svn: 261021
This makes it IMO more readable and reduces indentation.
llvm-svn: 261020
These were fixed with r260978
llvm-svn: 261017
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.
Differential Revision: http://reviews.llvm.org/D17298
llvm-svn: 261014
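
For intuition only, here is a made-up scalar analogue of the hazard the message describes (not the WebAssembly test case): when the definition being moved writes memory, sinking it past an intervening load changes what that load reads.

#include <cassert>

int main() {
  int Mem = 1;
  Mem = 2;          // the definition being moved writes memory...
  int Loaded = Mem; // ...so sinking that store below this intervening load
                    // would change the value the load observes (1 vs. 2)
  assert(Loaded == 2);
  return 0;
}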
23-bit 4-byte aligned relocation to be a valid instruction encoding.
The usual way to get a 32-bit relocation is to use a constant extender, which doubles the size of the instruction from 4 bytes to 8 bytes.
Another way is to emit a .word32 and mix code and data within a function. The disadvantage is that it is not a valid instruction encoding, and jumping over it causes prefetch stalls inside the hardware.
This relocation packs a 23-bit value into an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call that returns void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.
llvm-svn: 261006
Summary:
This change adds a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero:
BB0:
cbz x0, .BB1
BB1:
mov x0, xzr
Jun
Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier
Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16203
llvm-svn: 261004
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produce a value usable by CopyToReg (i.e., in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).
Differential Revision: http://reviews.llvm.org/D17213
llvm-svn: 260987
Summary: This change renames the output operand of VOP instructions from dst to vdst. This is needed to enable decoding of named operands for the disassembler.
Reviewers: vpykhtin, tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, nhaustov
Projects: #llvm-amdgpu-spb
Differential Revision: http://reviews.llvm.org/D16920
llvm-svn: 260986
Differential Revision: http://reviews.llvm.org/D16877
llvm-svn: 260979
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.
llvm-svn: 260978
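
As a rough illustration of the approach (a sketch under stated assumptions, not the WebAssembly sorter itself): topologically sort the blocks, and whenever several blocks are ready, take the one that comes first in the existing layout, so the MachineBlockPlacement-style order survives where possible. Real CFGs contain back edges, which the actual pass has to handle; this sketch assumes an acyclic graph and uses block ids as layout positions.

#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

std::vector<int> sortPreservingLayout(const std::vector<std::vector<int>> &Succs) {
  std::vector<int> InDegree(Succs.size(), 0);
  for (const std::vector<int> &S : Succs)
    for (int T : S)
      ++InDegree[T];
  // Min-heap keyed on layout position: of the ready blocks, pick the earliest.
  std::priority_queue<int, std::vector<int>, std::greater<int>> Ready;
  for (int B = 0, E = (int)Succs.size(); B != E; ++B)
    if (InDegree[B] == 0)
      Ready.push(B);
  std::vector<int> Order;
  while (!Ready.empty()) {
    int B = Ready.top();
    Ready.pop();
    Order.push_back(B);
    for (int T : Succs[B])
      if (--InDegree[T] == 0)
        Ready.push(T);
  }
  return Order;
}

int main() {
  // 0 -> {1, 2}, 1 -> {3}, 2 -> {3}: both 0,1,2,3 and 0,2,1,3 are topological
  // orders; the layout tie-break keeps 0,1,2,3.
  std::vector<std::vector<int>> Succs = {{1, 2}, {3}, {3}, {}};
  for (int B : sortPreservingLayout(Succs))
    std::printf("%d ", B);
  std::printf("\n");
  return 0;
}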
http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio
http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio
llvm-svn: 260972
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.
llvm-svn: 260971
llvm-svn: 260968
Add a missing check on the type of the address displacement operand of a load/store instruction that is a candidate for LEA substitution.
Ref: https://llvm.org/bugs/show_bug.cgi?id=26575
Differential Revision: http://reviews.llvm.org/D17261
llvm-svn: 260959
Once a pointer is turned into a reference it cannot be nullptr, so clang
rightfully warns about this assert being a tautology. Put the assert
before the reference is created.
llvm-svn: 260949
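
A minimal stand-in for the pattern being fixed (hypothetical names, not the changed code): check the pointer first, then bind the reference; asserting on the reference afterwards is the tautology clang warned about.

#include <cassert>

struct Node { int Value = 0; };

int valueOf(Node *N) {
  assert(N && "expected a non-null Node"); // check the pointer first...
  Node &Ref = *N;                          // ...then form the reference
  return Ref.Value;
}

int main() {
  Node N;
  return valueOf(&N);
}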
llvm-svn: 260943
llvm-svn: 260942
llvm-svn: 260940
Summary:
This consists of various additions to the C API:
LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M);
void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL);
LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T);
Reviewers: joker.eph, Wallbraker, echristo
Subscribers: axw
Differential Revision: http://reviews.llvm.org/D17255
llvm-svn: 260936
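
A sketch of how the new entry points might be used together, taking the function names exactly as listed above (the setup of the module and target machine is assumed to happen elsewhere, and the exact signatures should be checked against the headers):

#include <llvm-c/Core.h>
#include <llvm-c/TargetMachine.h>

// Derive the target machine's data layout and attach it to a module.
void applyTargetDataLayout(LLVMModuleRef M, LLVMTargetMachineRef TM) {
  LLVMTargetDataRef DL = LLVMCreateTargetMachineData(TM);
  LLVMSetModuleDataLayout(M, DL);
  (void)LLVMGetModuleDataLayout(M); // reads back the layout that was just set
}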
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260922
Summary: It's red, it's dead.
Reviewers: joker.eph, Wallbraker, echristo
Subscribers: llvm-commits, axw
Differential Revision: http://reviews.llvm.org/D17282
llvm-svn: 260919
locality and code size from SP/FP offset encoding.
Differential Revision: http://reviews.llvm.org/D15393
llvm-svn: 260917
The variable was made dead in NDEBUG by r260901, but the assert
was redundant anyway: get rid of both.
llvm-svn: 260908
llvm-svn: 260903
simplify handling and allow flags on the expression.
llvm-svn: 260902
Differential Revision: http://reviews.llvm.org/D17229
llvm-svn: 260901
Summary:
In order to pass the tests, this required marking R_MIPS_16 relocations
as needing to point to the symbol and not the section.
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17200
llvm-svn: 260896
llvm-svn: 260895
llvm-svn: 260880
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as
the GNU assembler. This patch also fixes an error when compiling
the Linux kernel: the error is that there are relocations within the
.pdr section in a VDSO.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17199
llvm-svn: 260879
are changed to 16 bits.
If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
r180893 added an indirect include of llvm/Config/Targets.def to
llvm/Support/CodeGen.h, which in turn is included by things like
llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to
rebuild 1274 files after reconfiguring.
This commit strips CodeGen.h back down to just a pile of enums and moves
the expensive includes over to CodeGenCWrappers.h (which is only
included in two places). This gets ninja down to 88 files if you
reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86.
llvm-svn: 260835
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) where a previously 128-bit shuffle + subvector splat is converted to a subvector splat + (2 instruction) 256-bit shuffle; I intend to fix this in a follow-up patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
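
To make the "repeating shuffle" notion concrete, here is a sketch of the idea behind is128BitLaneRepeatedShuffleMask (an illustration, not the LLVM implementation): a mask repeats if every 128-bit lane applies the same per-lane element pattern, ignoring which lane each element is sourced from.

#include <cstdio>
#include <vector>

// Negative mask entries stand for undef and match anything.
bool isLaneRepeatedMask(const std::vector<int> &Mask, int EltBits) {
  int LaneElts = 128 / EltBits;
  std::vector<int> Repeated(LaneElts, -1);
  for (int I = 0, E = (int)Mask.size(); I != E; ++I) {
    if (Mask[I] < 0)
      continue;
    int InLane = Mask[I] % LaneElts; // element position within its source lane
    if (Repeated[I % LaneElts] < 0)
      Repeated[I % LaneElts] = InLane;
    else if (Repeated[I % LaneElts] != InLane)
      return false;
  }
  return true;
}

int main() {
  // 32-bit elements: swapping pairs in both 128-bit lanes repeats per lane...
  std::printf("%d\n", isLaneRepeatedMask({1, 0, 3, 2, 5, 4, 7, 6}, 32)); // 1
  // ...but swapping pairs in only the low lane does not.
  std::printf("%d\n", isLaneRepeatedMask({1, 0, 3, 2, 4, 5, 6, 7}, 32)); // 0
  return 0;
}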
features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't expose. Replace this with just inherited listconcats.
llvm-svn: 260832
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.
I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)
Differential Revision: http://reviews.llvm.org/D17219
llvm-svn: 260828
Gcc 4.7.2-4 does not seem to have "emplace" in its implementation of map.
This should fix the build failure on polly-amd64-linux.
llvm-svn: 260816
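
For context, the usual portable fallback when map::emplace is unavailable looks like this (illustrative only, with made-up names; it is not the changed code):

#include <cassert>
#include <map>
#include <string>

int main() {
  std::map<std::string, int> Counts;
  // Counts.emplace("key", 1);             // missing from old libstdc++ (gcc 4.7 era)
  Counts.insert(std::make_pair("key", 1)); // portable equivalent
  assert(Counts["key"] == 1);
  return 0;
}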
SlotInfo() instead of member initializers.
llvm-svn: 260812
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private accesses of vectors with more than 4 elements consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
llvm-svn: 260804
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.
llvm-svn: 260792
llvm-svn: 260784
These provide direct access to the hardware instruction, without
the conversion to the hardware's input units that lowering llvm.sin/llvm.cos requires.
llvm-svn: 260782
Also fixes missing f32 test.
llvm-svn: 260780
Reviewers: nhaustov, cfang, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17159
llvm-svn: 260774
llvm-svn: 260773
llvm-svn: 260766