| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
by Asaf Badouh
http://reviews.llvm.org/D6456
llvm-svn: 224764
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.
Fixes PR21976.
llvm-svn: 224761
|
| |
|
|
|
|
| |
Added more intrinsics and encoding tests.
llvm-svn: 224760
|
| |
|
|
|
|
|
|
|
|
| |
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.
Fixes PR21928.
llvm-svn: 224758
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node. However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type). But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.
This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.
Differential Revision: http://reviews.llvm.org/D6759
llvm-svn: 224754
|
| |
|
|
| |
llvm-svn: 224752
|
| |
|
|
|
|
|
|
|
|
|
| |
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.
rdar://19190968
llvm-svn: 224746
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions. Now musttail uses a completely separate lowering.
Hopefully this can be used as the basis for non-x86 perfect forwarding.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6156
llvm-svn: 224745
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Followup to r224294:
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
llvm-svn: 224743
|
| |
|
|
| |
llvm-svn: 224735
|
| |
|
|
| |
llvm-svn: 224730
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.
Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.
Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.
== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707
== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244
More work for other vector types will come next.
llvm-svn: 224725
|
| |
|
|
|
|
| |
intrinsics, encoding tests for AVX-512F and skx instructions.
llvm-svn: 224707
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch pattern matches code such as-
neg w8, w8
mul w8, w9, w8
to
mneg w8, w8, w9
Review: http://reviews.llvm.org/D6754
llvm-svn: 224706
|
| |
|
|
|
|
| |
from patterns for the 32-bit version.
llvm-svn: 224692
|
| |
|
|
|
|
|
|
|
| |
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
llvm-svn: 224691
|
| |
|
|
| |
llvm-svn: 224687
|
| |
|
|
|
|
| |
printing of the alias instead of the real instruction.
llvm-svn: 224686
|
| |
|
|
| |
llvm-svn: 224685
|
| |
|
|
|
|
| |
call/jump in Intel syntax.
llvm-svn: 224684
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ARM ARM states:
LDM/LDMIA/LDMFD:
The SP can be in the list. However, ARM deprecates using these instructions
with SP in the list.
ARM deprecates using these instructions with both the LR and the PC in the
list.
LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED:
The SP can be in the list. However, instructions that include the SP in the
list are deprecated.
Instructions that include both the LR and the PC in the list are deprecated.
POP:
The SP can only be in the list before ARMv7. ARM deprecates any use of ARM
instructions that include the SP, and the value of the SP after such an
instruction is UNKNOWN.
ARM deprecates the use of this instruction with both the LR and the PC in
the list.
Attempt to diagnose use of deprecated forms of these instructions. This mirrors
the previous changes to diagnose use of the deprecated forms of STM in ARM mode.
llvm-svn: 224682
|
| |
|
|
| |
llvm-svn: 224678
|
| |
|
|
| |
llvm-svn: 224655
|
| |
|
|
| |
llvm-svn: 224650
|
| |
|
|
| |
llvm-svn: 224648
|
| |
|
|
|
|
|
| |
The codegen failed on 128-bit types on AVX2.
I added patterns and in td files and tests.
llvm-svn: 224647
|
| |
|
|
|
|
|
| |
If the condition is used for something else, this increases
the number of instructions.
llvm-svn: 224646
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.
None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass. No functionality change for other targets.
llvm-svn: 224625
|
| |
|
|
|
|
|
|
|
|
|
| |
destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed.
Differential Revision: http://reviews.llvm.org/D6330
llvm-svn: 224624
|
| |
|
|
|
|
|
| |
The constant bus restrictions only apply to VALU instructions. This
enables SIFoldOperands to fold immediates into SALU instructions.
llvm-svn: 224623
|
| |
|
|
|
|
|
|
| |
mubuf instructions now define the soffset field using the SCSrc_32
register class which indicates that only SGPRs and inline constants
are allowed.
llvm-svn: 224622
|
| |
|
|
| |
llvm-svn: 224612
|
| |
|
|
| |
llvm-svn: 224610
|
| |
|
|
| |
llvm-svn: 224609
|
| |
|
|
| |
llvm-svn: 224608
|
| |
|
|
| |
llvm-svn: 224604
|
| |
|
|
| |
llvm-svn: 224599
|
| |
|
|
|
|
| |
register class.
llvm-svn: 224598
|
| |
|
|
|
|
| |
Found by the Clang static analyzer.
llvm-svn: 224586
|
| |
|
|
| |
llvm-svn: 224556
|
| |
|
|
| |
llvm-svn: 224552
|
| |
|
|
| |
llvm-svn: 224550
|
| |
|
|
|
|
|
|
|
| |
Fix bugs related to atomic microMIPS SC/LL instructions: While expanding atomic
operations the mips32r2 encoding was emitted instead of microMIPS.
Differential Revision: http://reviews.llvm.org/D6659
llvm-svn: 224524
|
| |
|
|
|
|
|
|
|
| |
Fix an off-by-one access introduced in 224502 for push.w and pop.w with single
register operands. Add test cases for both scenarios.
Thanks to Asiri Rathnayake for pointing out the failure!
llvm-svn: 224521
|
| |
|
|
|
|
|
| |
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions.
Added lowering tests.
llvm-svn: 224516
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ARM Architecture Reference Manual states the following:
LDM{,IA,DB}:
The SP cannot be in the list.
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
POP:
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
PUSH:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.
STM:{,IA,DB}:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.
llvm-svn: 224502
|
| |
|
|
|
|
| |
Only put the anonymous namespace around classes. NFC.
llvm-svn: 224498
|
| |
|
|
| |
llvm-svn: 224497
|
| |
|
|
|
|
| |
Near as I can tell prefixes are ignored on these instructions except for a comment in the Intel docs about 0xf3. Binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder.
llvm-svn: 224496
|
| |
|
|
|
|
| |
already indicate use of REX.W.
llvm-svn: 224495
|