| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
information and update all callers. No functional change.
llvm-svn: 214781
|
|
|
|
|
|
|
|
|
| |
scalar integer instruction pass.
This is a patch I had lying around from a few months ago. The pass is
currently disabled by default, so nothing to interesting.
llvm-svn: 214779
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range. This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.
Differential Revision: http://reviews.llvm.org/D4751
Patch by Vadim Chugunov!
llvm-svn: 214775
|
|
|
|
| |
llvm-svn: 214769
|
|
|
|
|
|
|
| |
These were just wrong, using the wrong register classes
and store2 was missing an operand.
llvm-svn: 214756
|
|
|
|
|
|
| |
systems it represents.
llvm-svn: 214755
|
|
|
|
| |
llvm-svn: 214754
|
|
|
|
|
|
|
| |
nothing subtarget dependent about the intrinsic support in any
backend as far as I can tell.
llvm-svn: 214738
|
|
|
|
| |
llvm-svn: 214733
|
|
|
|
|
|
| |
This corrects r214672, which was committed to silence a gcc warning.
llvm-svn: 214732
|
|
|
|
| |
llvm-svn: 214731
|
|
|
|
| |
llvm-svn: 214729
|
|
|
|
| |
llvm-svn: 214728
|
|
|
|
|
|
|
| |
allow us to forward all of the standard TargetMachine calls to the
subtarget and still return null as we were before.
llvm-svn: 214727
|
|
|
|
|
|
| |
Move the test cases for them into separate files.
llvm-svn: 214724
|
|
|
|
|
|
|
|
| |
Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32,VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32,VMOVDQU64, VMOVUPD, VMOVUPS,
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
llvm-svn: 214719
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
llvm-svn: 214718
|
|
|
|
|
|
| |
likely-less-than-useful MSVC warnings: warning C4258: 'I' : definition from the for loop is ignored; the definition from the enclosing scope is used.
llvm-svn: 214717
|
|
|
|
|
|
|
|
|
|
| |
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
llvm-svn: 214716
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
Reviewed by Bill Schmidt.
llvm-svn: 214714
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch also fixes an issue with the way the Mips assembler enables/disables architecture
features. Before this patch, the assembler never disabled feature bits. For example,
.set mips64
.set mips32r2
would result in the 'OR' of mips64 with mips32r2 feature bits which isn't right.
Unfortunately this isn't trivial to fix because there's not an easy way to clear
feature bits as the algorithm in MCSubtargetInfo (ToggleFeature) only clears the bits
that imply the feature being cleared and not the implied bits by the feature (there's a
better explanation to the code I added).
Patch by Matheus Almeida and updated by Toma Tabacu
Reviewers: vmedic, matheusalmeida, dsanders
Reviewed By: dsanders
Subscribers: tomatabacu, llvm-commits
Differential Revision: http://reviews.llvm.org/D4123
llvm-svn: 214709
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
use of PACKUS. It's cleaner that way.
I looked at implementing clever combine-based folding of PACKUS chains
into PSHUFB but it is quite hard and doesn't seem likely to be worth it.
The most annoying part would be detecting that the correct masking had
been done to use PACKUS-style instructions as a blend operation rather
than there being any saturating as is indicated by its name. We generate
really nice code for what few test cases I've come up with that aren't
completely contrived for this by just directly prefering PSHUFB and so
let's go with that strategy for now. =]
llvm-svn: 214707
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patterns of v16i8 shuffles.
This implements one of the more important FIXMEs for the SSE2 support in
the new shuffle lowering. We now generate the optimal shuffle sequence
for truncate-derived shuffles which show up essentially everywhere.
Unfortunately, this exposes a weakness in other parts of the shuffle
logic -- we can no longer form PSHUFB here. I'll add the necessary
support for that and other things in a subsequent commit.
llvm-svn: 214702
|
|
|
|
|
|
| |
This commit broke "make check" for several hours, so get it reverted.
llvm-svn: 214697
|
|
|
|
|
|
|
|
|
|
| |
I spent some time looking into a better or more principled way to handle
this. For example, by detecting arbitrary "unneeded" ORs... But really,
there wasn't any point. We just shouldn't build blatantly wrong code so
late in the pipeline rather than adding more stages and logic later on
to fix it. Avoiding this is just too simple.
llvm-svn: 214680
|
|
|
|
|
|
|
|
|
| |
GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition:
lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
assert(STReturns == 0 || isMask_32(STReturns) && N <= 2);
llvm-svn: 214672
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sequence - AArch64 target support
This patch turns off madd/msub generation in the DAGCombiner and generates
them in the MachineCombiner instead. It replaces the original code sequence
with the combined sequence when it is beneficial to do so.
When there is no machine model support it always generates the madd/msub
instruction. This is true also when the objective is to optimize for code
size: when the combined sequence is shorter is always chosen and does not
get evaluated.
When there is a machine model the combined instruction sequence
is evaluated for critical path and resource length using machine
trace metrics and the original code sequence is replaced when it is
determined to be faster.
rdar://16319955
llvm-svn: 214669
|
|
|
|
|
|
|
|
|
| |
This makes EmitWindowsUnwindTables a virtual function and lowers the
implementation of the function to the X86WinCOFFStreamer. This method is a
target specific operation. This enables making the behaviour target dependent
by isolating it entirely to the target specific streamer.
llvm-svn: 214664
|
|
|
|
|
|
|
|
|
| |
This slipped in in r214467, so something like
V_MOV_B32_e32 v0, ... is now printed with 2 spaces
between the instruction name and first operand.
llvm-svn: 214660
|
|
|
|
| |
llvm-svn: 214640
|
|
|
|
| |
llvm-svn: 214639
|
|
|
|
|
|
|
|
|
| |
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.
llvm-svn: 214636
|
|
|
|
|
|
| |
forget to update the comment here... =/
llvm-svn: 214630
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lowering with a small addition to it and adding PSHUFB combining.
There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).
However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.
llvm-svn: 214628
|
|
|
|
|
|
|
| |
Spotted this missed refactoring by inspection when reading code, and it
doesn't changethe functionality at all.
llvm-svn: 214627
|
|
|
|
| |
llvm-svn: 214626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.
This detects cases where a single register is used for both operands
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form which allows the existing DAG combine for
shuffle instructions to actually work at all.
As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.
Also as a consequence, a bunch of missed opportunities to form pshufb
now can be formed. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially if splatting the 0th byte (the
common case) as then we can use a zeroed register as the mask.
llvm-svn: 214625
|
|
|
|
|
|
|
| |
PSHUFB forms. This will be important to update some AVX tests when I add
PSHUFB combining.
llvm-svn: 214624
|
|
|
|
|
|
|
|
|
| |
expanding pseudo LOAD_STATCK_GUARD using instructions that are normally used
in pic mode. This patch fixes the bug.
<rdar://problem/17886592>
llvm-svn: 214614
|
|
|
|
|
|
| |
Avoid weird line wrapping of BuildMI dest register.
llvm-svn: 214608
|
|
|
|
|
|
|
|
|
|
|
|
| |
introduced during legalization. This pattern is based on other patterns
in the legalizer that I changed in the same way. Now, the legalizer
eagerly collects its garbage when necessary so that we can survive
leaving such nodes around for it.
Instead, we add an assert to make sure the node will be correctly
handled by that layer.
llvm-svn: 214602
|
|
|
|
|
|
|
| |
deleting them. This already seems to work, as no tests fail without
this.
llvm-svn: 214601
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.
It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction was live across another inline-asm instruction, as shown
in the following sequence of machine instructions:
1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0
<rdar://problem/16952634>
llvm-svn: 214580
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fromulation of the node, which isn't really the desired behavior from
within the combiner or legalizer, but is necessary within ISel. I've
added a hopefully helpful comment and fixed the only two places where
this took place.
Yet another step toward the combiner and legalizer not needing to use
update listeners with virtual calls to manage the worklists behind
legalization and combining.
llvm-svn: 214574
|
|
|
|
|
|
|
|
| |
This reverts commit r214566.
I did not mean to commit this yet.
llvm-svn: 214572
|
|
|
|
| |
llvm-svn: 214569
|
|
|
|
|
|
|
| |
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.
llvm-svn: 214566
|
|
|
|
|
|
|
|
|
| |
so that we can use it to get the old-style JIT out of the subtarget.
This code should be removed when the old-style JIT is removed
(imminently).
llvm-svn: 214560
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is consistent with how we parse them in a standalone .s file, and
inline assembly shouldn't differ.
This fixes errors about requiring more registers than available in
cases like this:
void f();
void __declspec(naked) g() {
__asm pusha
__asm call f
__asm popa
__asm ret
}
There are no registers available to pass the address of 'f' into the asm
blob. The asm should now directly call 'f'.
Tests will land in Clang shortly.
llvm-svn: 214550
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fold simple offsets into the memory operation:
add x0, x0, #8
ldr x0, [x0]
-->
ldr x0, [x0, #8]
Fixes <rdar://problem/17887945>.
llvm-svn: 214545
|