|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| | llvm-svn: 200264 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | uint32.
When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.
llvm-svn: 200262 | 
| | 
| 
| 
| 
| 
| | Without this fix, WaitResult is not defined.
llvm-svn: 200259 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This brings MC into line with GNU 'as' on ARM, and it brings the ARM
target into line with most other LLVM targets, which declare the
initial CFI state with addInitialFrameState().
Without this, functions generated with .cfi_startproc/endproc on ARM
will tend to cause GDB to abort with:
  gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule.
I've also tested this by comparing the output of "readelf -w" on the
object files produced by llvm-mc and gas when given the .s file added
here.
This change is part of addressing PR18636.
Differential Revision: http://llvm-reviews.chandlerc.com/D2597
llvm-svn: 200255 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Also update the comment, since it actually produces a
select (setcc) instead of select_cc.
It was checking and using the setcc result type for the
type of the sext, instead of the type of the compared items.
In my problem case, the sext was to i32 and was used as the setcc type,
but the expected type was i64.
No test since I haven't been able to hit the problem with
this on any in-tree targets.
llvm-svn: 200249 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248 | 
| | 
| 
| 
| | llvm-svn: 200244 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).
The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.
llvm-svn: 200234 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This commit allows LLVM MC to process .cfi_startproc directives when
they are followed by an additional `simple' identifier. This signals to
elide the emission of target specific CFI instructions that would
normally occur initially.
This fixes PR16587.
Differential Revision: http://llvm-reviews.chandlerc.com/D2624
llvm-svn: 200227 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | cold loops as-if they were being optimized for size.
Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.
The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:
1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).
I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.
One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.
llvm-svn: 200219 | 
| | 
| 
| 
| 
| 
| | Insert before the terminating instruction of the dominating block instead.
llvm-svn: 200218 | 
| | 
| 
| 
| | llvm-svn: 200216 | 
| | 
| 
| 
| 
| 
| 
| | to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.
llvm-svn: 200215 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | object and fewer pointless variables.
Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.
llvm-svn: 200214 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.
llvm-svn: 200213 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.
These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.
llvm-svn: 200212 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.
If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.
Also made the variable name a bit more sensible while I'm here.
llvm-svn: 200211 | 
| | 
| 
| 
| 
| 
| | assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
llvm-svn: 200210 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.
Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.
llvm-svn: 200203 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.
Consider, how MergeConsequtiveStores works for next example:
store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1]   ; a[1] again.
return   ; DAG starts here
1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.
The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.
So after merging we have the next result:
store i16 (1 | 3 << 8), base   ; is a[0] but bit-casted to i16
store i8 2, a[1]
So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].
Fix: In sort routine just also take into account mem-op sequence number. 
llvm-svn: 200201 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | LoopVectorize pass.
The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.
This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.
llvm-svn: 200198 | 
| | 
| 
| 
| 
| | Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196 | 
| | 
| 
| 
| 
| | Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | There are a couple of interesting things here that we want to check over
(particularly the expecting asserts in StringRef) and get right for general use
in ADT so hold back on this one. For clang we have a workable templated
solution to use in the meanwhile.
This reverts commit r200187.
llvm-svn: 200194 | 
| | 
| 
| 
| 
| 
| 
| | Testing this also found the missing '\n' after .frame that this patch also
fixes.
llvm-svn: 200192 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | (1) Add llvm_expect(), an asserting macro that can be evaluated as a constexpr
    expression as well as a runtime assert or compiler hint in release builds. This
    technique can be used to construct functions that are both unevaluated and
    compiled depending on usage.
(2) Update StringRef using llvm_expect() to preserve runtime assertions while
    extending the same checks to static asserts in C++11 builds that support the
    feature.
(3) Introduce ConstStringRef, a strong subclass of StringRef that references
    compile-time constant strings. It's convertible to, but not from, ordinary
    StringRef and thus can be used to add compile-time safety to various interfaces
    in LLVM and clang that only accept fixed inputs such as diagnostic format
    strings that tend to get misused.
llvm-svn: 200187 | 
| | 
| 
| 
| | llvm-svn: 200186 | 
| | 
| 
| 
| 
| 
| 
| 
| | SHUFFLE_VECTOR.
Replace r199791.
llvm-svn: 200180 | 
| | 
| 
| 
| 
| 
| | It's old version which has some bugs. I'll commit lattest patch soon.
llvm-svn: 200179 | 
| | 
| 
| 
| | llvm-svn: 200178 | 
| | 
| 
| 
| | llvm-svn: 200174 | 
| | 
| 
| 
| 
| 
| | Sorry about that.
llvm-svn: 200171 | 
| | 
| 
| 
| | llvm-svn: 200170 | 
| | 
| 
| 
| | llvm-svn: 200169 | 
| | 
| 
| 
| | llvm-svn: 200167 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | If a complex expression was passed to the .word directive and the first part of
the directive failed to parse, a secondary diagnostic would be produced that
would clutter the error diagnostics.  Improve the diagnostics by consuming the
remainder of the statement.
llvm-svn: 200160 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | An emitted diagnostic for an invalid relocation variant would place the caret on
the token following the relocation variant indicator or at the end of the line
if there was no following token.  This change corrects the placement of the
caret to point to the token.
llvm-svn: 200159 | 
| | 
| 
| 
| 
| 
| | Fix indentation, remove unnecessary line.  NFC.
llvm-svn: 200158 | 
| | 
| 
| 
| 
| 
| | lib/Target/X86/Disassembler/X86DisassemblerDecoder.c:1361:7: error: C++ style comments are not allowed in ISO C90
llvm-svn: 200153 | 
| | 
| 
| 
| | llvm-svn: 200152 | 
| | 
| 
| 
| | llvm-svn: 200141 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | These were:
* noreorder handling on the target object streamer and asm parser.
* setting the initial flag bits based on the enabled features.
* setting the elf header flag for micromips
It is *really* depressing I am the one doing this instead of someone at
mips actually taking the time to understand the infrastructure.
llvm-svn: 200138 | 
| | 
| 
| 
| 
| 
| 
| | With this the target streamers will be able to know the target features that
are in use.
llvm-svn: 200135 | 
| | 
| 
| 
| 
| 
| 
| | The popc instruction is defined in the SPARCv9 instruction set
architecture, but it was emulated on CPUs older than Niagara 2.
llvm-svn: 200131 | 
| | 
| 
| 
| 
| 
| | Found by SingleSource/UnitTests/AtomicOps.c
llvm-svn: 200130 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
  reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.
llvm-svn: 200129 | 
| | 
| 
| 
| | llvm-svn: 200127 | 
| | 
| 
| 
| | llvm-svn: 200122 | 
| | 
| 
| 
| | llvm-svn: 200120 | 
| | 
| 
| 
| | llvm-svn: 200119 |