| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).
The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.
llvm-svn: 200234
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This commit allows LLVM MC to process .cfi_startproc directives when
they are followed by an additional `simple' identifier. This signals to
elide the emission of target specific CFI instructions that would
normally occur initially.
This fixes PR16587.
Differential Revision: http://llvm-reviews.chandlerc.com/D2624
llvm-svn: 200227
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cold loops as-if they were being optimized for size.
Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.
The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:
1) Don't disable the entire pass when the target is lacking vector
registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
non-if-converted loops. This is trivial for the unroller but hard for
the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
various places that make cost tradeoffs (very likely only the
unroller makes sense here, and then only when dealing with loops that
are small enough for unrolling to not completely blow out the LSD).
I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.
One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.
llvm-svn: 200219
|
| |
|
|
|
|
| |
Insert before the terminating instruction of the dominating block instead.
llvm-svn: 200218
|
| |
|
|
|
|
|
| |
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.
llvm-svn: 200215
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.
llvm-svn: 200213
|
| |
|
|
|
|
| |
assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
llvm-svn: 200210
|
| |
|
|
|
|
|
|
|
|
| |
any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.
Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.
llvm-svn: 200203
|
| |
|
|
|
|
| |
X86 directory.
llvm-svn: 200202
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.
Consider, how MergeConsequtiveStores works for next example:
store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1] ; a[1] again.
return ; DAG starts here
1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.
The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.
So after merging we have the next result:
store i16 (1 | 3 << 8), base ; is a[0] but bit-casted to i16
store i8 2, a[1]
So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].
Fix: In sort routine just also take into account mem-op sequence number.
llvm-svn: 200201
|
| |
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
|
| |
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
|
| |
|
|
|
|
|
| |
editbin.exe and link.exe both accepts /highentropyva option to set this bit, so
doing s/VIRTUAL_ADDRESS/VA/ should make sense.
llvm-svn: 200191
|
| |
|
|
|
|
|
|
| |
SHUFFLE_VECTOR.
Replace r199791.
llvm-svn: 200180
|
| |
|
|
|
|
| |
It's old version which has some bugs. I'll commit lattest patch soon.
llvm-svn: 200179
|
| |
|
|
|
|
|
| |
Placed the MC variant diagnostics in the wrong directory accidentally. Move
them into their respective architecture specific directories.
llvm-svn: 200161
|
| |
|
|
|
|
|
|
|
| |
If a complex expression was passed to the .word directive and the first part of
the directive failed to parse, a secondary diagnostic would be produced that
would clutter the error diagnostics. Improve the diagnostics by consuming the
remainder of the statement.
llvm-svn: 200160
|
| |
|
|
|
|
|
|
|
| |
An emitted diagnostic for an invalid relocation variant would place the caret on
the token following the relocation variant indicator or at the end of the line
if there was no following token. This change corrects the placement of the
caret to point to the token.
llvm-svn: 200159
|
| |
|
|
| |
llvm-svn: 200141
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These were:
* noreorder handling on the target object streamer and asm parser.
* setting the initial flag bits based on the enabled features.
* setting the elf header flag for micromips
It is *really* depressing I am the one doing this instead of someone at
mips actually taking the time to understand the infrastructure.
llvm-svn: 200138
|
| |
|
|
|
|
|
| |
The popc instruction is defined in the SPARCv9 instruction set
architecture, but it was emulated on CPUs older than Niagara 2.
llvm-svn: 200131
|
| |
|
|
|
|
| |
Found by SingleSource/UnitTests/AtomicOps.c
llvm-svn: 200130
|
| |
|
|
| |
llvm-svn: 200127
|
| |
|
|
|
|
|
|
|
| |
That bit is not documented in the PE/COFF spec published by Microsoft, so we
don't know the official name of it. I named this bit
IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VIRTUAL_ADDRESS because the bit is
reported as "high entropy virtual address" by dumpbin.exe,
llvm-svn: 200121
|
| |
|
|
| |
llvm-svn: 200119
|
| |
|
|
|
|
|
|
|
|
|
|
| |
PE32+ supports 64 bit address space, but the file format remains 32 bit.
So its file format is pretty similar to PE32 (32 bit executable). The
differences compared to PE32 are (1) the lack of "BaseOfData" field and
(2) some of its data members are 64 bit.
In this patch, I added a new member function to get a PE32+ Header object to
COFFObjectFile class and made llvm-readobj to use it.
llvm-svn: 200117
|
| |
|
|
| |
llvm-svn: 200116
|
| |
|
|
|
|
| |
This is an expanded version of r200064.
llvm-svn: 200115
|
| |
|
|
| |
llvm-svn: 200113
|
| |
|
|
| |
llvm-svn: 200112
|
| |
|
|
| |
llvm-svn: 200111
|
| |
|
|
| |
llvm-svn: 200110
|
| |
|
|
|
|
|
|
|
| |
After several refactorings on the MCJIT remote communication, things are
finally looking good on Clang-compiled LLVM regarding MCJIT remote tests,
so I'm re-enabling them to see how the self-hosting buildbot behaves over
a longer period.
llvm-svn: 200102
|
| |
|
|
|
|
| |
llvm-cov test is not supported in big-endian architectures.
llvm-svn: 200101
|
| |
|
|
|
|
|
|
|
| |
I disabled the use of TBAA in CodeGen in r200093. This adds a test case that
demonstrates the problems with inttoptr and TBAA in CodeGen (and, specifically,
the problem that causes LLVM to miscompile itself in Release mode). This test
will currently fail if -use-tbaa-in-sched-mi is enabled.
llvm-svn: 200097
|
| |
|
|
| |
llvm-svn: 200094
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced by r182908, which broke
llvm-objdump's ability to display relocations inline in a disassembly
dump for ELF object files.
That change removed a SectionRelocMap from Object/ELF.h, which we
recreate in llvm-objdump.cpp.
I discovered this regression via an out-of-tree test
(test/NaCl/X86/pnacl-hides-sandbox-x86-64.ll) which used llvm-objdump.
Note that the "Unknown" string in the test output on i386 isn't quite
right, but this appears to be a pre-existing bug.
Differential Revision: http://llvm-reviews.chandlerc.com/D2559
llvm-svn: 200090
|
| |
|
|
|
|
| |
and features)
llvm-svn: 200083
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r200064 depends on r200051.
r200051 is broken: I tries to replace .mips_hack_elf_flags, which is a good
thing, but what it replaces it with is even worse.
The new emitMipsELFFlags it adds corresponds to no assembly directive, is not
marked as a hack and is not even printed to the .s file.
The patch also introduces more uses of hasRawTextSupport.
The correct way to remove .mips_hack_elf_flags is to have the mips target
streamer handle the default flags (and command line options). That way the
same code path is used for asm and obj. The streamer interface should *really*
correspond to what is printed in the .s file.
llvm-svn: 200078
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the loops in a function, and teach LICM to work in the presance of
LCSSA.
Previously, LCSSA was a loop pass. That made passes requiring it also be
loop passes and unable to depend on function analysis passes easily. It
also caused outer loops to have a different "canonical" form from inner
loops during analysis. Instead, we go into LCSSA form and preserve it
through the loop pass manager run.
Note that this has the same problem as LoopSimplify that prevents
enabling its verification -- loop passes which run at the end of the loop
pass manager and don't preserve these are valid, but the subsequent loop
pass runs of outer loops that do preserve this pass trigger too much
verification and fail because the inner loop no longer verifies.
The other problem this exposed is that LICM was completely unable to
handle LCSSA form. It didn't preserve it and it actually would give up
on moving instructions in many cases when they were used by an LCSSA phi
node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
hoist and sink around them. This may actually let LICM fire
significantly more because we put everything into LCSSA form to rotate
the loop before running LICM. =/ Now LICM should handle that fine and
preserve it correctly. The down side is that LICM has to require LCSSA
in order to preserve it. This is just a fact of life for LCSSA. It's
entirely possible we should completely remove LCSSA from the optimizer.
The test updates are essentially accomodating LCSSA phi nodes in the
output of LICM, and the fact that we now completely sink every
instruction in ashr-crash below the loop bodies prior to unrolling.
With this change, LCSSA is computed only three times in the pass
pipeline. One of them could be removed (and potentially a SCEV run and
a separate LoopPassManager entirely!) if we had a LoopPass variant of
InstCombine that ran InstCombine on the loop body but refused to combine
away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
being in the same loop pass manager is rotate, LICM, and unswitch.
There is one thing that I *really* don't like -- preserving LCSSA in
LICM is quite expensive. We end up having to re-run LCSSA twice for some
loops after LICM runs because LICM can undo LCSSA both in the current
loop and the parent loop. I don't really see good solutions to this
other than to completely move away from LCSSA and using tools like
SSAUpdater instead.
llvm-svn: 200067
|
| |
|
|
|
|
| |
No code changes. Just reassignment of test case files.
llvm-svn: 200064
|
| |
|
|
|
|
|
| |
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.
llvm-svn: 200062
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses a common MipsTargetSteamer interface for both
MipsAsmPrinter and MipsAsmParser for recording default and commandline
driven directives that affect ELF header flags.
It has been noted that the .ll tests affected by this patch belong in
test/Codegen/Mips. I will move them in a separate patch.
Also, a number of directives do not get expressed by AsmPrinter in the
resultant .s assembly such as setting the correct ASI. I have noted this
in the tests and they will be addressed in later patches.
llvm-svn: 200051
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i8 type is not registered with any register class.
This causes a segmentation fault in MachineLICM::getRegisterClassIDAndCost.
The code selects the first type associated with register class FPR8,
which happens to be i8.
It uses this type (i8) to get the representative class pointer, which is 0.
It then uses this pointer to access a field, resulting in segmentation fault.
Since i8 type is not being used for printing any neon instruction
we can safely remove it.
llvm-svn: 200046
|
| |
|
|
|
|
|
|
| |
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
|
| |
|
|
|
|
|
|
| |
We don't want to lose attributes when a function decl without them is merged
with a function decl that has them.
PR2382
llvm-svn: 200030
|
| |
|
|
|
|
| |
PR18600.
llvm-svn: 200028
|
| |
|
|
|
|
| |
<rdar://problem/15611947>
llvm-svn: 200027
|