Bug description:
The bug was discovered when a test was compiled with -O0. When the
scatter result is the DAG root, VectorLegalizer asserted because
LowerMSCATTER() returned the kmask as its result. Change LowerMSCATTER()
to return the chain, as the original node does.
Differential Revision: http://reviews.llvm.org/D17331
llvm-svn: 261090
This section is used for debug information and has no need to be
in memory at runtime. This patch also fixes an error when compiling
the Linux kernel: there are relocations within the .pdr section in a
VDSO. SHT_REL was removed because it is a section type, not a section
flag, so it does not make sense for it to be there. With this patch,
LLVM now emits the same flags as the GNU assembler.
llvm-svn: 261083
AVX1 doesn't support shuffling 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double, but for 8/16-bit elements (assuming they can't be widened) we currently just split, shuffle as 128-bit vectors, and concatenate the results back together.
This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
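As a scalar sketch of the per-lane logic (illustrative C++, not the LLVM lowering code): a bit-blend selects between two inputs with a constant mask using only AND/ANDN/OR, operations AVX1 can perform on 256-bit vectors in the float domain.

#include <cstdint>

// Pick bits from a where the mask is set, from b elsewhere. The vector
// lowering applies the same AND/ANDN/OR pattern with a per-lane constant
// mask derived from the shuffle.
uint64_t bitBlend(uint64_t a, uint64_t b, uint64_t mask) {
  return (a & mask) | (b & ~mask);
}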
Part 2 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261082
AVX1 doesn't support shuffling 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double, but for 8/16-bit elements (assuming they can't be widened) we currently just split, shuffle as 128-bit vectors, and concatenate the results back together.
This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
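As a scalar sketch (illustrative C++, not the actual lowering): when every output element is either an input element already in place or zero, the whole shuffle reduces to a single AND with a constant mask.

#include <cstdint>

// Zero the discarded lanes, keep the rest; a constant keepMask is built
// from the shuffle pattern at compile time.
uint64_t bitMaskShuffle(uint64_t v, uint64_t keepMask) {
  return v & keepMask;
}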
Part 1 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261081
Avoid reuse of operand variables; keep them local to a particular
lowering, since the operand collection is unique to each case anyhow.
Renamed them from V to Ops to match their purpose more closely.
llvm-svn: 261078
llvm-svn: 261077
Asserts are still firing in Chromium builds. PR26575.
llvm-svn: 261058
r261050 seems to inadvertently fix the assertion failure.
llvm-svn: 261051
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll. Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.
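A schematic of the kind of cutoff described here (illustrative names and logic, not LLVM's actual implementation or tuning):

#include <cstdint>

// Expand a memcpy inline only while it needs few enough stores; past the
// budget, emit a call to the library routine instead. The budget plays
// the role of MaxStoresPerMemcpy.
bool shouldExpandMemcpyInline(uint64_t Bytes, unsigned StoreSize,
                              unsigned MaxStores) {
  uint64_t NumStores = (Bytes + StoreSize - 1) / StoreSize;
  return NumStores <= MaxStores;
}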
llvm-svn: 261050
r261032 adds frame address support.
llvm-svn: 261044
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.
llvm-svn: 261039
llvm-svn: 261037
Differential Revision: http://reviews.llvm.org/D17307
llvm-svn: 261032
llvm-svn: 261025
I suspect this is what let PR26110 lie dormant for so long.
llvm-svn: 261024
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
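For reference, that conditional-negate identity as a scalar sketch of the per-lane logic (the patch emits the equivalent XOR+SUB pattern on vectors):

#include <cstdint>

// Branchless conditional negate: valid when mask is all-ones (negate) or
// all-zero (keep), exactly the two values a lowered vector i1 mask takes.
int32_t conditionalNegate(int32_t v, int32_t mask) {
  return (v ^ mask) - mask;  // mask == -1: ~v + 1 == -v;  mask == 0: v
}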
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
llvm-svn: 261021
This makes it IMO more readable and reduces indentation.
llvm-svn: 261020
These were fixed with r260978
llvm-svn: 261017
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.
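An abstract sketch of the tightened legality check, using hypothetical types rather than the real stackifier code:

#include <vector>

// Sinking Def down to its use moves it past Between. Intervening stores
// could clobber what Def reads (the old check); intervening loads could
// observe what Def writes (the check this patch adds).
struct MemEffects { bool Loads; bool Stores; };

// Conservative placeholder: assume any two memory operations may alias.
static bool mayAlias(const MemEffects &, const MemEffects &) { return true; }

bool safeToSink(const MemEffects &Def,
                const std::vector<MemEffects> &Between) {
  for (const MemEffects &I : Between) {
    if (Def.Loads && I.Stores && mayAlias(Def, I))
      return false;
    if (Def.Stores && I.Loads && mayAlias(Def, I))
      return false;
  }
  return true;
}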
Differential Revision: http://reviews.llvm.org/D17298
llvm-svn: 261014
23-bit 4-byte aligned relocation to be a valid instruction encoding.
The usual way to get a 32-bit relocation is to use a constant extender, which doubles the size of the instruction from 4 bytes to 8 bytes.
Another way is to emit a .word32 and mix code and data within a function. The disadvantage is that it is not a valid instruction encoding, and jumping over it causes prefetch stalls inside the hardware.
This relocation packs a 23-bit value into an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call that returns void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.
llvm-svn: 261006
Summary:
This change adds a pass to remove unnecessary zero copies in the target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 only when x0 is zero:
BB0:
    cbz x0, .BB1
BB1:
    mov x0, xzr
Jun
Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier
Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16203
llvm-svn: 261004
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).
Differential Revision: http://reviews.llvm.org/D17213
llvm-svn: 260987
Summary: This change renames the output operand of VOP instructions from dst to vdst. This is needed to enable decoding named operands for the disassembler.
Reviewers: vpykhtin, tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, nhaustov
Projects: #llvm-amdgpu-spb
Differential Revision: http://reviews.llvm.org/D16920
llvm-svn: 260986
Differential Revision: http://reviews.llvm.org/D16877
llvm-svn: 260979
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.
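A generic sketch of that trade-off (plain C++, not the actual WebAssembly sorting code): Kahn's topological sort with the ready set ordered by a preferred layout rank, so the output follows the earlier block placement wherever the topology allows.

#include <queue>
#include <vector>

// Succs is an adjacency list over blocks 0..N-1; Pref[b] is block b's rank
// in the preferred (MachineBlockPlacement-style) order. Among all blocks
// whose predecessors are placed, always emit the preferred one first.
// Assumes an acyclic graph; real CFGs need loop/back-edge handling.
std::vector<int> preferredTopoSort(const std::vector<std::vector<int>> &Succs,
                                   const std::vector<int> &Pref) {
  int N = (int)Succs.size();
  std::vector<int> InDeg(N, 0);
  for (const auto &S : Succs)
    for (int T : S)
      ++InDeg[T];
  auto Cmp = [&](int A, int B) { return Pref[A] > Pref[B]; };
  std::priority_queue<int, std::vector<int>, decltype(Cmp)> Ready(Cmp);
  for (int B = 0; B != N; ++B)
    if (InDeg[B] == 0)
      Ready.push(B);
  std::vector<int> Order;
  while (!Ready.empty()) {
    int B = Ready.top();
    Ready.pop();
    Order.push_back(B);
    for (int T : Succs[B])
      if (--InDeg[T] == 0)
        Ready.push(T);
  }
  return Order;  // topological, biased toward the preferred layout
}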
llvm-svn: 260978
http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio
http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio
llvm-svn: 260972
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.
llvm-svn: 260971
llvm-svn: 260968
Add a missing check on the type of the address displacement operand of a load/store instruction that is a candidate for LEA substitution.
Ref: https://llvm.org/bugs/show_bug.cgi?id=26575
Differential Revision: http://reviews.llvm.org/D17261
llvm-svn: 260959
Once a pointer is turned into a reference it cannot be nullptr, clang
rightfully warns about this assert being a tautology. Put the assert
before the reference is created.
llvm-svn: 260949
llvm-svn: 260943
llvm-svn: 260942
llvm-svn: 260940
Summary:
This consists of various additions to the C API:
LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M);
void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL);
LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T);
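A minimal usage sketch of the new entry points (error handling elided; assumes a module and target machine already exist):

#include "llvm-c/Core.h"
#include "llvm-c/Target.h"
#include "llvm-c/TargetMachine.h"

// Derive a DataLayout from a target machine, attach it to a module, and
// read it back.
void syncDataLayout(LLVMModuleRef M, LLVMTargetMachineRef TM) {
  LLVMTargetDataRef DL = LLVMCreateTargetMachineData(TM);
  LLVMSetModuleDataLayout(M, DL);
  LLVMTargetDataRef Check = LLVMGetModuleDataLayout(M);
  (void)Check;
}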
Reviewers: joker.eph, Wallbraker, echristo
Subscribers: axw
Differential Revision: http://reviews.llvm.org/D17255
llvm-svn: 260936
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260922
Summary: It's red, it's dead.
Reviewers: joker.eph, Wallbraker, echristo
Subscribers: llvm-commits, axw
Differential Revision: http://reviews.llvm.org/D17282
llvm-svn: 260919
locality and code size from SP/FP offset encoding.
Differential Revision: http://reviews.llvm.org/D15393
llvm-svn: 260917
The variable was made dead in NDEBUG by r260901, but the assert
was redundant anyway: get rid of both.
llvm-svn: 260908
llvm-svn: 260903
simplify handling and allow flags on the expression.
llvm-svn: 260902
Differential Revision: http://reviews.llvm.org/D17229
llvm-svn: 260901
Summary:
In order to pass the tests, this required marking R_MIPS_16 relocations
as needing to point to the symbol and not the section.
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17200
llvm-svn: 260896
llvm-svn: 260895
llvm-svn: 260880
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as
the GNU assembler. This patch also fixes an error when compiling
the Linux kernel. The error is that there are relocations within the
.pdr section in a VDSO.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17199
llvm-svn: 260879
are changed to 16 bits.
If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
r180893 added an indirect include of llvm/Config/Targets.def to
llvm/Support/CodeGen.h, which in turn is included by things like
llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to
rebuild 1274 files after reconfiguring.
This commit strips CodeGen.h back down to just a pile of enums and moves
the expensive includes over to CodeGenCWrappers.h (which is only
included in two places). This gets ninja down to 88 files if you
reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86.
llvm-svn: 260835
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.
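As a rough illustration of the "repeating shuffle" test mentioned above (a simplification, not LLVM's is128BitLaneRepeatedShuffleMask, which also tracks multiple inputs):

#include <vector>

// A mask "repeats" if every 128-bit lane applies the same within-lane
// element pattern. NumEltsPerLane is e.g. 4 for a v8i32 256-bit vector;
// -1 denotes an undef mask element.
bool isRepeatedLaneMask(const std::vector<int> &Mask, int NumEltsPerLane) {
  std::vector<int> Repeated(NumEltsPerLane, -1);
  for (int I = 0, E = (int)Mask.size(); I != E; ++I) {
    if (Mask[I] < 0)
      continue;                              // undef matches anything
    int InLane = Mask[I] % NumEltsPerLane;   // slot within the source lane
    int &Slot = Repeated[I % NumEltsPerLane];
    if (Slot < 0)
      Slot = InLane;                         // first lane sets the pattern
    else if (Slot != InLane)
      return false;                          // lanes disagree: not repeating
  }
  return true;
}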
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834