| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit r199244.
Conflicts:
include/llvm-c/lto.h
include/llvm/LTO/LTOCodeGenerator.h
lib/LTO/LTOCodeGenerator.cpp
llvm-svn: 205471
|
| |
|
|
|
|
| |
GetElementPtr opaque (r204739).
llvm-svn: 205468
|
| |
|
|
|
|
|
| |
Update the subtarget information for Windows on ARM. This enables using the MC
layer to target Windows on ARM.
llvm-svn: 205459
|
| |
|
|
|
|
| |
No functional change.
llvm-svn: 205458
|
| |
|
|
|
|
| |
There are no implementations of these for R600.
llvm-svn: 205455
|
| |
|
|
|
|
|
|
| |
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
llvm-svn: 205453
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Unlike other v6+ processors, cortex-m0 never supports unaligned accesses.
From the v6m ARM ARM:
"A3.2 Alignment support: ARMv6-M always generates a fault when an unaligned
access occurs."
rdar://16491560
llvm-svn: 205452
|
| |
|
|
|
|
| |
No functional change, but more readable code.
llvm-svn: 205451
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Adds the instructions ext/ext32/cins/cins32.
It also changes pop/dpop to accept the two operand version and
adds a simple pattern to generate baddu.
Tests for the two operand versions (including baddu/dmul/dpop/pop)
and the code generation pattern for baddu are included.
Reviewed by: Daniel.Sanders@imgtec.com
llvm-svn: 205449
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205446
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205445
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205444
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205443
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205442
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205441
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205440
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205439
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205438
|
| |
|
|
|
|
| |
No functional change intended.
llvm-svn: 205437
|
| |
|
|
| |
llvm-svn: 205435
|
| |
|
|
|
|
| |
Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav.
llvm-svn: 205430
|
| |
|
|
| |
llvm-svn: 205429
|
| |
|
|
|
|
|
|
|
|
| |
Weak symbols cannot use the small code model's usual ADRP sequences since the
instruction simply may not be able to encode a value of 0.
This redirects them to use the GOT, which hopefully linkers are able to cope
with even in the static relocation model.
llvm-svn: 205426
|
| |
|
|
|
|
|
| |
We were creating libcall nodes that returned an MVT::f128, when these
particular operations actually return an int of some stripe.
llvm-svn: 205425
|
| |
|
|
|
|
|
|
|
| |
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.
llvm-svn: 205424
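The distinction drawn above, comparing the whole callee rather than only its intrinsic ID, can be illustrated with a toy model. This is a hedged sketch and not the SLP vectorizer's code: the `Decl` record, the intrinsic name `example.convert`, and both helper functions are hypothetical stand-ins for an overloaded intrinsic declaration.

```python
from collections import namedtuple

# Hypothetical model of one concrete declaration of an overloaded intrinsic.
Decl = namedtuple("Decl", ["intrinsic_id", "ret_type", "arg_types"])


def same_by_id(a, b):
    """Weaker check: intrinsic ID plus return type only."""
    return a.intrinsic_id == b.intrinsic_id and a.ret_type == b.ret_type


def same_by_decl(a, b):
    """Stricter check: the whole declaration must match, arguments included."""
    return a == b


# Two overloads that agree on ID and return type but differ in argument type.
f32_variant = Decl("example.convert", "i32", ("float",))
f64_variant = Decl("example.convert", "i32", ("double",))
```

Under the weaker check the two calls look bundleable even though their operands have different types; the stricter check rejects the pair, so a vectorizer using it would not recurse into the operands.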
|
| |
|
|
|
|
|
|
| |
Again, coalescing and other optimisations swiftly made the MachineInstrs
consistent again, but when compiled at -O0 a bad INSERT_SUBREGISTER was
produced.
llvm-svn: 205423
|
| |
|
|
|
|
|
|
| |
The previous attempt was fine with optimisations, but was actually rather
cavalier with its types. When compiled at -O0, it produced invalid COPY
MachineInstrs.
llvm-svn: 205422
|
| |
|
|
| |
llvm-svn: 205421
|
| |
|
|
| |
llvm-svn: 205416
|
| |
|
|
|
|
|
|
|
| |
ARM-specific optimization, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.
Patch by Reinoud Elhorst.
llvm-svn: 205409
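The peephole described above can be sketched as follows. This is a minimal illustration, not the committed pass: instructions are modelled as plain strings, the function name `drop_redundant_dmbs` is hypothetical, and the sketch assumes two barriers are redundant only when they are directly adjacent and carry the same option.

```python
def _norm(ins):
    """Lower-case and collapse whitespace so 'DMB  ISH' equals 'dmb ish'."""
    return " ".join(ins.lower().split())


def drop_redundant_dmbs(instrs):
    """Return a new list where a DMB that directly follows an identical DMB is dropped."""
    out = []
    for ins in instrs:
        cur = _norm(ins)
        is_dmb = cur.split()[:1] == ["dmb"]
        if is_dmb and out and _norm(out[-1]) == cur:
            # Same barrier twice in a row: the second one adds no ordering.
            continue
        out.append(ins)
    return out
```

Barriers with different options (for example `dmb ish` followed by `dmb sy`) are deliberately left alone in this sketch.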
|
| |
|
|
|
|
|
|
|
| |
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.
Suggestion by Saleem Abdulrasool!
llvm-svn: 205393
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).
For example, the TSVC benchmark function s173 has:
...
%3 = add nsw i64 %indvars.iv, 16000
%arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
collects uniforms.
Fixes PR19296.
llvm-svn: 205387
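The worklist idea above can be sketched in a few lines. This is a simplified model, not the vectorizer's implementation: the function name `collect_uniforms` and the dictionary-based operand graph are assumptions, and the sketch treats every in-loop value reachable from a seed as uniform without the additional checks a real cost model would apply.

```python
from collections import deque


def collect_uniforms(operands, seeds):
    """Mark the seeds and everything they transitively depend on as uniform.

    operands: dict mapping a value name to the list of in-loop values it uses.
    seeds:    values known to stay scalar, e.g. consecutive pointers.
    """
    uniforms = set()
    worklist = deque(seeds)
    while worklist:
        value = worklist.popleft()
        if value in uniforms:
            continue
        uniforms.add(value)
        # Dependencies of a uniform value are themselves scalar in this model.
        worklist.extend(operands.get(value, []))
    return uniforms


# Operand graph for the s173 excerpt quoted in the commit message.
s173_operands = {"%arrayidx8": ["%3"], "%3": ["%indvars.iv"]}
s173_uniforms = collect_uniforms(s173_operands, ["%arrayidx8"])
```

Seeding the worklist with `%arrayidx8` pulls in the `add` that defines `%3`, which is the value the commit message says must be costed as a scalar.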
|
| |
|
|
|
|
|
|
| |
I'm not sure the comment in the implementation really adds a lot of
value (it's clear that we emit zero when no symbol is provided, but it
doesn't explain why we would do that). Happy to iterate.
llvm-svn: 205386
|
| |
|
|
|
|
| |
Based on code review feedback from Eric Christopher on r204697
llvm-svn: 205385
|
| |
|
|
|
|
|
|
|
|
|
|
| |
and an MC Label to refer to them
This removes the magic-number-esque code creating/retrieving the same
label for a debug_loc entry from two places and removes the last small
piece of reusable logic from emitDebugLoc so that there will be less
duplication when refactoring it into two functions (one for debug_loc,
the other for debug_loc.dwo).
llvm-svn: 205382
|
| |
|
|
|
|
|
| |
framework works (for the compiler part), since the design
document is not available.
llvm-svn: 205379
|
| |
|
|
| |
llvm-svn: 205374
|
| |
|
|
|
|
|
|
|
|
| |
Seems we didn't have any test coverage for merging... awesome. So I
added some - but hit an llvm-objdump bug while I was there. I'm choosing
not to shave that yak right now.
Code review feedback/bug catch by Adrian Prantl in r205360.
llvm-svn: 205373
|
| |
|
|
|
|
|
|
|
|
| |
No test case (this would invoke UB by examining uninitialized members,
etc, at best - and this code is apparently untested anyway - I'm about
to fix that)
Code review feedback from Adrian Prantl on r205360.
llvm-svn: 205367
|
| |
|
|
| |
llvm-svn: 205365
|
| |
|
|
| |
llvm-svn: 205364
|
| |
|
|
|
|
|
| |
It seems big enough that it deserves its own file - but it is header
only, so there's no need for another cpp file, etc.
llvm-svn: 205360
|
| |
|
|
| |
llvm-svn: 205358
|
| |
|
|
|
|
|
|
| |
constants into only the first one.
rdar://14874886.
llvm-svn: 205357
|
| |
|
|
| |
llvm-svn: 205352
|
| |
|
|
|
|
| |
Environment == Triple::MSVC so it will never be MinGW or Cygwin.
llvm-svn: 205349
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
ControlLoops-dbl - 13% speedup
ControlLoops-flt - 15% speedup
Reductions-dbl - 7.5% speedup
llvm-svn: 205348
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for an upcoming commit implementing unrolling preferences for
x86, this adds additional fields to the UnrollingPreferences structure:
- PartialThreshold and PartialOptSizeThreshold - Like Threshold and
OptSizeThreshold, but used when not fully unrolling. These are necessary
because we need different thresholds for full unrolling from those used when
partially unrolling (the full unrolling thresholds are generally going to be
larger).
- MaxCount - A cap on the unrolling factor when partially unrolling. This can
be used by a target to prevent the unrolled loop from exceeding some
resource limit independent of the loop size (such as number of branches).
There should be no functionality change for any in-tree targets.
llvm-svn: 205347
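How the new fields might interact when choosing a partial unroll factor can be sketched as below. This is an assumption-laden illustration rather than the unroller's actual logic: the function `partial_unroll_count` is hypothetical, and only the names PartialThreshold and MaxCount come from the commit message.

```python
def partial_unroll_count(loop_size, trip_count, partial_threshold, max_count):
    """Pick a partial unroll factor under a size budget and a target cap.

    loop_size:         estimated size of one loop iteration
    trip_count:        known trip count of the loop
    partial_threshold: size budget for the partially unrolled body
    max_count:         target-imposed cap on the factor, independent of size
    """
    if loop_size <= 0 or trip_count <= 0 or max_count <= 0:
        raise ValueError("loop_size, trip_count and max_count must be positive")
    # Largest factor allowed by the size budget, the trip count and the cap.
    count = min(trip_count, partial_threshold // loop_size, max_count)
    # Prefer a factor that divides the trip count, so no remainder loop is needed.
    while count > 1 and trip_count % count != 0:
        count -= 1
    # A factor of 1 means "do not unroll".
    return max(count, 1)
```

The cap is applied before the divisibility search so that the returned factor respects both limits at once.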
|
| |
|
|
|
|
|
|
|
|
|
| |
The implementation of getUserCost had duplicated (and hard-coded) the default
logic in getGEPCost. Instead, it is better to use getGEPCost directly, which
limits the default logic to the implementation of one function, and allows
targets to override the behavior.
No functionality change intended.
llvm-svn: 205346
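The refactoring pattern described above, routing the generic cost query through a single overridable GEP hook, can be sketched with a toy class hierarchy. The class names, method names and cost constants here are hypothetical, and the default rule (all-constant indices are free) is an assumption made for the example.

```python
class BaseCostModel:
    FREE = 0
    BASIC = 1

    def get_gep_cost(self, indices_constant):
        # Toy default: a GEP with all-constant indices is assumed to fold away.
        return self.FREE if indices_constant else self.BASIC

    def get_user_cost(self, opcode, indices_constant=False):
        if opcode == "getelementptr":
            # Delegate instead of duplicating the default GEP logic here.
            return self.get_gep_cost(indices_constant)
        return self.BASIC


class TargetCostModel(BaseCostModel):
    def get_gep_cost(self, indices_constant):
        # Hypothetical target whose addressing modes absorb every GEP.
        return self.FREE
```

Because `get_user_cost` no longer hard-codes the GEP rule, the override in `TargetCostModel` takes effect for both queries without touching the generic method.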
|
| |
|
|
|
|
|
|
|
| |
Adds the Octeon cnMips instructions "load multiplier register MPLx" and "load product register Px".
Includes tests.
Reviewed by: Daniel.Sanders@imgtec.com
llvm-svn: 205343
|