| Commit message (Collapse) | Author | Age | Files | Lines |
| |
values (reapplied)
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch ensures that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
The original patch was reverted because the test case used
-stop-after=isel (which only works when the AMDGPU target
is included in the llc build). The test case has now been
updated to use -stop-before=expand-isel-pseudos instead.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314781
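As a rough illustration of the offset rule described in this commit, the sketch below computes where the low and high halves of a split integer sit inside the original variable. The names `FragmentOffsets` and `splitFragmentOffsets` are hypothetical and are not taken from DAGTypeLegalizer; the code only models the offset choice, not the debug-info plumbing.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's API): bit offsets of the two halves of a
// split integer inside the original variable, in memory-address order.
struct FragmentOffsets {
  std::uint64_t LoOffsetInBits;
  std::uint64_t HiOffsetInBits;
};

FragmentOffsets splitFragmentOffsets(std::uint64_t HalfSizeInBits,
                                     bool IsBigEndian) {
  if (IsBigEndian)
    return {HalfSizeInBits, 0}; // high part comes first in memory
  return {0, HalfSizeInBits};   // low part comes first in memory
}
```

For a 64-bit value split into two 32-bit halves, a big-endian target such as PPC32 would get the high half at offset 0 and the low half at offset 32 under this model.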
|
Mentioned in D38472
llvm-svn: 314777
|
Pulled out of D38472
llvm-svn: 314776
|
This makes sure the LSDA pointer isn't truncated to 32 bit.
Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.
This solves the second half of the issues mentioned in PR34720.
Differential Revision: https://reviews.llvm.org/D38343
llvm-svn: 314767
|
Legalize bitwise OR:
A = BinOp<Ty> B, C
into:
B1, ..., BN = G_UNMERGE_VALUES B
C1, ..., CN = G_UNMERGE_VALUES C
A1 = BinOp<Ty/N> B1, C1
...
AN = BinOp<Ty/N> BN, CN
A = G_MERGE_VALUES A1, ..., AN
llvm-svn: 314760
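The unmerge / per-part operation / merge scheme above can be modelled on plain integers. This is only a sketch, assuming a 64-bit OR narrowed into N = 2 parts of 32 bits; `unmerge`, `merge` and `narrowedOr` are illustrative names that stand in for G_UNMERGE_VALUES, G_MERGE_VALUES and the legalized operation, and are not part of GlobalISel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Parts = std::array<std::uint32_t, 2>;

// Models G_UNMERGE_VALUES: split a 64-bit value into low and high parts.
Parts unmerge(std::uint64_t V) {
  return {static_cast<std::uint32_t>(V), static_cast<std::uint32_t>(V >> 32)};
}

// Models G_MERGE_VALUES: rebuild the 64-bit value from its parts.
std::uint64_t merge(const Parts &P) {
  return static_cast<std::uint64_t>(P[0]) |
         (static_cast<std::uint64_t>(P[1]) << 32);
}

// A = OR B, C performed as one narrow OR per part.
std::uint64_t narrowedOr(std::uint64_t B, std::uint64_t C) {
  Parts BP = unmerge(B), CP = unmerge(C), AP{};
  for (std::size_t I = 0; I < AP.size(); ++I)
    AP[I] = BP[I] | CP[I]; // AI = BinOp<Ty/N> BI, CI
  return merge(AP);
}
```

This works for bitwise operations because each result bit depends only on the operand bits at the same position, so no information crosses part boundaries.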
|
See https://reviews.llvm.org/D38172.
I tried to XFAIL it, but sometimes XPASS triggers the bot. Simply
revert it.
llvm-svn: 314739
|
See https://reviews.llvm.org/D38172 for details.
llvm-svn: 314732
|
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 314729
|
These check lines are supposed to make sure the new d16
load instructions aren't used, but the expected instruction
name is a prefix of the incorrect instruction name.
llvm-svn: 314714
|
Summary: Also add support for some older Myriad CPUs that were missing.
Reviewers: jyknight
Subscribers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D37552
llvm-svn: 314705
|
Differential Revision: https://reviews.llvm.org/D38421
llvm-svn: 314688
|
llvm-svn: 314682
|
Still avoiding the floating point comments to prevent linux/windows discrepancies.
llvm-svn: 314681
|
llvm-svn: 314680
|
llvm-svn: 314679
|
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch ensures that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314666
|
This patch adds support for ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunities to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.
For example, we allow these nodes
t9: i32 = add t7, Constant:i32<1>
t11: i32 = and t9, Constant:i32<255>
t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>
to be folded into a rotate-and-mask instruction.
Such cases often occur in array accesses with a logical AND operation in the index, e.g. array[i & 0xFF];
Differential Revision: https://reviews.llvm.org/D37514
llvm-svn: 314655
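To see why the node sequence above can become a single rotate-and-mask, the sketch below compares the original and-then-zext-then-shift computation with a 32-bit rotate left by 2 followed by a mask of 0x3FC. The function names are made up for illustration, and the code models only the arithmetic identity, not the instruction the PPC backend actually selects.

```cpp
#include <cstdint>

// 32-bit rotate left.
std::uint32_t rotl32(std::uint32_t V, unsigned N) {
  N &= 31;
  return N ? (V << N) | (V >> (32 - N)) : V;
}

// The DAG from the commit message: add, and 255, zero_extend, shl 2.
std::uint64_t zextThenShift(std::uint32_t T7) {
  std::uint32_t T9 = T7 + 1;
  std::uint32_t T11 = T9 & 255;
  std::uint64_t T12 = T11; // zero_extend i32 -> i64
  return T12 << 2;
}

// The same value as a rotate by 2 followed by a mask of (255 << 2).
std::uint64_t rotateAndMask(std::uint32_t T7) {
  std::uint32_t T9 = T7 + 1;
  return rotl32(T9, 2) & 0x3FCu;
}
```

The mask keeps only bits 2 to 9, which the rotate filled from bits 0 to 7 of the input, so any bits that wrapped around are discarded and the upper 32 bits are zero either way.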
|
X86InterleavedAccess (VF64 stride 3-4)
I continue to add support for different interleaved VFs in this pass. In this patch,
I added vf64 stride3 support for both load and store.
I also added support for the stride4 store.
Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank
Differential Revision: https://reviews.llvm.org/D37687
Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
|
HexagonVectorLoopCarriedReuse pass
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 314642
|
This patch extracts 1 element from vector consisting
of elements of size 1 bit at given index.
llvm-svn: 314641
|
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register&register form, order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supersedes D38025, which was trying to switch the register&register form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
|
llvm-svn: 314631
|
Trying to use an AND mask is tricky, as after legalization it's nigh impossible for computeKnownBits to do anything with it.
llvm-svn: 314630
|
Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits.
llvm-svn: 314629
|
llvm-svn: 314628
|
Differential Revision: https://reviews.llvm.org/D38312
Change-Id: Ifbc4189549f2f59995019a86f85f989c04e4d37d
llvm-svn: 314626
|
Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017
llvm-svn: 314621
|
llvm-svn: 314615
|
This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9.
It incorrectly included changes that are not typo fixes.
llvm-svn: 314614
|
llvm-svn: 314613
|
llvm-svn: 314607
|
Remove sign extend in register style pattern if the sign is already extended enough
llvm-svn: 314599
|
llvm-svn: 314598
|
We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits
llvm-svn: 314597
|
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
|
NFC.
Added code gen regression tests for avx512 instruction scheduling, called avx512-schedule.ll and
avx512-shuffle-schedule.ll.
This patch is in preparation for a larger patch adding all SKX instruction scheduling, and therefore
the scheduling for the avx512 instructions is still missing.
Reviewers: zvi, delena, RKSimon, igorb
Differential Revision: https://reviews.llvm.org/D38035
Change-Id: I792762763127a921b9e13684b58af03646536533
llvm-svn: 314594
|
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
Differential Revision: https://reviews.llvm.org/D38307
llvm-svn: 314584
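The split-and-concatenate lowering mentioned here can be sketched on plain arrays. This is a scalar model under stated assumptions: `mulhu32` stands in for a natively supported v32i8 mulhu, `mulhu64` is the 64-lane operation built from it, and both names are hypothetical rather than taken from the X86 backend.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N> using VecU8 = std::array<std::uint8_t, N>;

// Stand-in for a natively supported v32i8 mulhu: per lane, the high byte of
// the 16-bit unsigned product.
VecU8<32> mulhu32(const VecU8<32> &A, const VecU8<32> &B) {
  VecU8<32> R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = static_cast<std::uint8_t>((unsigned(A[I]) * unsigned(B[I])) >> 8);
  return R;
}

// v64i8 mulhu: split both operands into halves, multiply, concatenate.
VecU8<64> mulhu64(const VecU8<64> &A, const VecU8<64> &B) {
  VecU8<32> ALo{}, AHi{}, BLo{}, BHi{};
  std::copy_n(A.begin(), 32, ALo.begin());
  std::copy_n(A.begin() + 32, 32, AHi.begin());
  std::copy_n(B.begin(), 32, BLo.begin());
  std::copy_n(B.begin() + 32, 32, BHi.begin());

  VecU8<32> RLo = mulhu32(ALo, BLo);
  VecU8<32> RHi = mulhu32(AHi, BHi);

  VecU8<64> R{};
  std::copy(RLo.begin(), RLo.end(), R.begin());
  std::copy(RHi.begin(), RHi.end(), R.begin() + 32);
  return R;
}
```

The signed variant (mulhs) would follow the same split, with a signed per-lane product in place of the unsigned one.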
|
We have a single library build without relaxation options.
When inlined, library functions remove fast math attributes
from the functions they are integrated into.
This patch sets relaxation attributes on the functions after
linking, provided the corresponding relaxation options are given.
Math instructions inside the inlined functions still have
no fast flags, but inlining no longer prevents fast math
transformations of the surrounding caller code.
Differential Revision: https://reviews.llvm.org/D38325
llvm-svn: 314568
|
Currently expandUnalignedLoad/Store uses placeholder pointer info for the temporary memory operand
on the stack, which does not have the correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for the amdgcn target.
This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.
Differential Revision: https://reviews.llvm.org/D35361
llvm-svn: 314566
|
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_u32_e64 v0, vcc, v0, v1, -1
which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.
The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine
s_mov_b64 s[0:1], exec
v_cndmask_b32_e64 v0, v1, v2, s[0:1]
into
v_mov_b32 v0, v3
but it's not particularly high priority.
Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
llvm-svn: 314522
|
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899
https://bugs.llvm.org/show_bug.cgi?id=34610
llvm-svn: 314516
|
Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val)
Differential Revision: https://reviews.llvm.org/D38161
llvm-svn: 314514
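The identity behind this combine can be checked with a small model. The sketch assumes a simplified setcc that compares two integer operands and yields 0 or 1; `CondCode`, `invert` and `setcc` here are illustrative stand-ins, not the ISD definitions.

```cpp
enum class CondCode { EQ, NE, LT, GE, GT, LE };

// Logical inverse of a condition code.
CondCode invert(CondCode CC) {
  switch (CC) {
  case CondCode::EQ: return CondCode::NE;
  case CondCode::NE: return CondCode::EQ;
  case CondCode::LT: return CondCode::GE;
  case CondCode::GE: return CondCode::LT;
  case CondCode::GT: return CondCode::LE;
  case CondCode::LE: return CondCode::GT;
  }
  return CC; // not reached for valid condition codes
}

// Simplified setcc: 1 if the comparison holds, 0 otherwise.
unsigned setcc(CondCode CC, int L, int R) {
  switch (CC) {
  case CondCode::EQ: return L == R;
  case CondCode::NE: return L != R;
  case CondCode::LT: return L < R;
  case CondCode::GE: return L >= R;
  case CondCode::GT: return L > R;
  case CondCode::LE: return L <= R;
  }
  return 0; // not reached for valid condition codes
}

// Before the combine: xor (setcc cc, ...), 1
unsigned xorOfSetcc(CondCode CC, int L, int R) { return setcc(CC, L, R) ^ 1u; }

// After the combine: setcc (invert(cc), ...)
unsigned invertedSetcc(CondCode CC, int L, int R) {
  return setcc(invert(CC), L, R);
}
```

Because the setcc result is exactly 0 or 1 in this model, XOR with 1 is a logical negation, which is what evaluating the inverted condition produces directly.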
|
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain the test that fails
when the compiler is not built in debug mode.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314507
|
Added test relies on the compiler being built in debug mode,
which may not be the case.
This reverts commit r314497.
llvm-svn: 314506
|
Added additional tests for vector multiplications with multipliers that are:
* powers of 2 displaced by 1,
* product of a power of 2 displaced by one with another power of 2.
Patch by @pacxx (Michael Haidl)
Differential Revision: https://reviews.llvm.org/D38350
llvm-svn: 314504
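The multiplier shapes these tests cover are the ones that can be rewritten as shifts plus one add or subtract. The sketch below shows the scalar identities only; the function names are hypothetical, and which sequence a backend actually emits for vectors is not stated in the commit message.

```cpp
#include <cstdint>

// x * (2^N + 1) == (x << N) + x
std::uint32_t mulByPow2Plus1(std::uint32_t X, unsigned N) {
  return (X << N) + X;
}

// x * (2^N - 1) == (x << N) - x
std::uint32_t mulByPow2Minus1(std::uint32_t X, unsigned N) {
  return (X << N) - X;
}

// x * ((2^N + 1) * 2^M) == ((x << N) + x) << M
std::uint32_t mulByScaledPow2Plus1(std::uint32_t X, unsigned N, unsigned M) {
  return mulByPow2Plus1(X, N) << M;
}
```

All three rely on unsigned wraparound arithmetic, so they match a plain multiply modulo 2^32 as long as the shift amounts are below 32.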
|
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37752
llvm-svn: 314502
|
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.
With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.
Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.
The documentation for the AMDPAL ABI will be added in a later commit.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye
Differential Revision: https://reviews.llvm.org/D37483
llvm-svn: 314501
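The way the global information table address is assembled can be modelled as below. This is a sketch under assumptions: `gitAddress` is a made-up name, an unspecified attribute is represented by the same 0xffffffff value as the explicit sentinel, and the program counter is passed in as a plain 64-bit integer.

```cpp
#include <cstdint>

// Hypothetical model of the address computation described above.
//   S0         - low 32 bits of the table address, as passed in s0
//   GitPtrHigh - value of amdgpu-git-ptr-high (0xffffffff models "unset")
//   PC         - current program counter
std::uint64_t gitAddress(std::uint32_t S0, std::uint32_t GitPtrHigh,
                         std::uint64_t PC) {
  // Fall back to the high 32 bits of pc when the attribute is not usable.
  std::uint32_t High = (GitPtrHigh == 0xffffffffu)
                           ? static_cast<std::uint32_t>(PC >> 32)
                           : GitPtrHigh;
  return (static_cast<std::uint64_t>(High) << 32) | S0;
}
```

The scratch resource descriptor would then be loaded from offset 0 of the address this returns.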
|
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.
Currently it generates the same code as not specifying an OS at all.
That will change in future commits.
Patch from Tim Corringham.
Subscribers: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D37380
llvm-svn: 314500
|
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314497
|
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
|