| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).
The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.
llvm-svn: 200234
|
| |
|
|
|
|
| |
X86 directory.
llvm-svn: 200202
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.
Consider, how MergeConsequtiveStores works for next example:
store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1] ; a[1] again.
return ; DAG starts here
1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.
The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.
So after merging we have the next result:
store i16 (1 | 3 << 8), base ; is a[0] but bit-casted to i16
store i8 2, a[1]
So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].
Fix: In sort routine just also take into account mem-op sequence number.
llvm-svn: 200201
|
| |
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
|
| |
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
|
| |
|
|
|
|
|
|
| |
SHUFFLE_VECTOR.
Replace r199791.
llvm-svn: 200180
|
| |
|
|
|
|
| |
It's old version which has some bugs. I'll commit lattest patch soon.
llvm-svn: 200179
|
| |
|
|
| |
llvm-svn: 200141
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These were:
* noreorder handling on the target object streamer and asm parser.
* setting the initial flag bits based on the enabled features.
* setting the elf header flag for micromips
It is *really* depressing I am the one doing this instead of someone at
mips actually taking the time to understand the infrastructure.
llvm-svn: 200138
|
| |
|
|
|
|
|
| |
The popc instruction is defined in the SPARCv9 instruction set
architecture, but it was emulated on CPUs older than Niagara 2.
llvm-svn: 200131
|
| |
|
|
|
|
| |
Found by SingleSource/UnitTests/AtomicOps.c
llvm-svn: 200130
|
| |
|
|
| |
llvm-svn: 200119
|
| |
|
|
| |
llvm-svn: 200116
|
| |
|
|
|
|
| |
This is an expanded version of r200064.
llvm-svn: 200115
|
| |
|
|
| |
llvm-svn: 200113
|
| |
|
|
| |
llvm-svn: 200111
|
| |
|
|
| |
llvm-svn: 200110
|
| |
|
|
|
|
|
|
|
| |
I disabled the use of TBAA in CodeGen in r200093. This adds a test case that
demonstrates the problems with inttoptr and TBAA in CodeGen (and, specifically,
the problem that causes LLVM to miscompile itself in Release mode). This test
will currently fail if -use-tbaa-in-sched-mi is enabled.
llvm-svn: 200097
|
| |
|
|
| |
llvm-svn: 200094
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r200064 depends on r200051.
r200051 is broken: I tries to replace .mips_hack_elf_flags, which is a good
thing, but what it replaces it with is even worse.
The new emitMipsELFFlags it adds corresponds to no assembly directive, is not
marked as a hack and is not even printed to the .s file.
The patch also introduces more uses of hasRawTextSupport.
The correct way to remove .mips_hack_elf_flags is to have the mips target
streamer handle the default flags (and command line options). That way the
same code path is used for asm and obj. The streamer interface should *really*
correspond to what is printed in the .s file.
llvm-svn: 200078
|
| |
|
|
|
|
| |
No code changes. Just reassignment of test case files.
llvm-svn: 200064
|
| |
|
|
|
|
|
| |
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.
llvm-svn: 200062
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i8 type is not registered with any register class.
This causes a segmentation fault in MachineLICM::getRegisterClassIDAndCost.
The code selects the first type associated with register class FPR8,
which happens to be i8.
It uses this type (i8) to get the representative class pointer, which is 0.
It then uses this pointer to access a field, resulting in segmentation fault.
Since i8 type is not being used for printing any neon instruction
we can safely remove it.
llvm-svn: 200046
|
| |
|
|
|
|
|
|
| |
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
|
| |
|
|
|
|
| |
<rdar://problem/15611947>
llvm-svn: 200027
|
| |
|
|
|
|
| |
This reverts commit r200022 to unbreak the build bots.
llvm-svn: 200024
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
|
| |
|
|
|
|
|
| |
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
llvm-svn: 200018
|
| |
|
|
|
|
|
| |
r200011 remove the special codepaths in MC for inline asm, so we can now test
all the logic with just llc + llvm-mc.
llvm-svn: 200013
|
| |
|
|
| |
llvm-svn: 199978
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This commit teaches the X86 backend to create the same X86 instructions when it
lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses
that overflow result. This allows SelectionDAG to recognize and remove one of
the redundant operations.
This fixes <rdar://problem/15874016> and <rdar://problem/15661073>.
Reviewed by Nadav
llvm-svn: 199976
|
| |
|
|
|
|
| |
These all use the compare-and-swap CASA/CASXA instructions.
llvm-svn: 199975
|
| |
|
|
|
|
|
| |
loops. Writing back to the accumulator (231-type) allows the coalescer to
eliminate an extra copy.
llvm-svn: 199933
|
| |
|
|
|
|
|
|
|
| |
Originally, BLX was passed as operand #0 in MachineInstr and as operand
#2 in MCInst. But now, it's operand #2 in both cases.
This patch also removes unnecessary FileCheck in the test case added by r199127.
llvm-svn: 199928
|
| |
|
|
| |
llvm-svn: 199927
|
| |
|
|
| |
llvm-svn: 199925
|
| |
|
|
|
|
|
|
| |
void.
Patch by Scott Talbot.
llvm-svn: 199924
|
| |
|
|
|
|
|
|
|
|
| |
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason. I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.
llvm-svn: 199918
|
| |
|
|
|
|
|
|
| |
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vetex fetch clause instead of using a POP
instruction after it.
llvm-svn: 199917
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.
Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.
llvm-svn: 199916
|
| |
|
|
| |
llvm-svn: 199911
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The unit test is now disabled on non-asserts builds.
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
|
| |
|
|
|
|
| |
they give better sequences than VPERMI
llvm-svn: 199893
|
| |
|
|
|
|
|
|
|
| |
With constant-sharing, litpool loads consume 4 + N*2 bytes of code, but
movw/movt pairs consume 8*N. This means litpools are better than movw/movt even
with just one use. Other materialisation strategies can still be better though,
so the logic is a little odd.
llvm-svn: 199891
|
| |
|
|
|
|
| |
r199461.
llvm-svn: 199861
|
| |
|
|
|
|
|
|
|
|
| |
the pointer type is illegal.
This is a horrible bit of code. We're calling a simplification routine *in the middle* of type legalization. We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal! The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.
llvm-svn: 199847
|
| |
|
|
|
|
|
|
|
| |
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.
The -debug-only flag for llc doesn't appear to be available in
all build configurations.
llvm-svn: 199845
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199842
|