| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 205364
|
|
|
|
|
|
|
| |
It seems big enough that it deserves its own file - but it is header
only, so there's no need for another cpp file, etc.
llvm-svn: 205360
|
|
|
|
| |
llvm-svn: 205358
|
|
|
|
|
|
|
|
| |
constants into only the first one.
rdar://14874886.
llvm-svn: 205357
|
|
|
|
| |
llvm-svn: 205352
|
|
|
|
|
|
| |
Environment == Triple::MSVC so it will never be MinGW or Cygwin.
llvm-svn: 205349
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
ControlLoops-dbl - 13% speedup
ControlLoops-flt - 15% speedup
Reductions-dbl - 7.5% speedup
llvm-svn: 205348
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for an upcoming commit implementing unrolling preferences for
x86, this adds additional fields to the UnrollingPreferences structure:
- PartialThreshold and PartialOptSizeThreshold - Like Threshold and
OptSizeThreshold, but used when not fully unrolling. These are necessary
because we need different thresholds for full unrolling from those used when
partially unrolling (the full unrolling thresholds are generally going to be
larger).
- MaxCount - A cap on the unrolling factor when partially unrolling. This can
be used by a target to prevent the unrolled loop from exceeding some
resource limit independent of the loop size (such as number of branches).
There should be no functionality change for any in-tree targets.
llvm-svn: 205347
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation of getUserCost had duplicated (and hard-coded) the default
logic in getGEPCost. Instead, it is better to use getGEPCost directly, which
limits the default logic to the implementation of one function, and allows
targets to override the behavior.
No functionality change intended.
llvm-svn: 205346
|
|
|
|
|
|
|
|
|
| |
Adds the Octeon cnMips instructions "load multiplier register MPLx" and "load product register Px".
Includes tests.
Reviews by: Daniel.Sanders@imgtec.com
llvm-svn: 205343
|
|
|
|
|
|
|
| |
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.
llvm-svn: 205342
|
|
|
|
|
|
|
|
| |
to reflect its current functionality.
Based on Takumi NAKAMURA suggestion.
llvm-svn: 205338
|
|
|
|
| |
llvm-svn: 205336
|
|
|
|
| |
llvm-svn: 205335
|
|
|
|
|
|
|
|
|
| |
debug_loc.dwo location list entries
In preparation for refactoring this function into two, one for
debug_loc, one for debug_loc.dwo.
llvm-svn: 205324
|
|
|
|
| |
llvm-svn: 205323
|
|
|
|
| |
llvm-svn: 205322
|
|
|
|
|
|
| |
ThumbLE/ThumbBE
llvm-svn: 205317
|
|
|
|
| |
llvm-svn: 205314
|
|
|
|
|
|
| |
Suggestion from Yaron Keren.
llvm-svn: 205313
|
|
|
|
|
|
|
|
| |
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.
llvm-svn: 205309
|
|
|
|
|
|
|
|
|
| |
This is for consistency with other functions. The Parse* functions consume
tokens and the Match* functions don't.
No functional change.
llvm-svn: 205305
|
|
|
|
|
|
| |
implicitly. No functional change intended.
llvm-svn: 205304
|
|
|
|
| |
llvm-svn: 205302
|
|
|
|
| |
llvm-svn: 205301
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This should fix the issues the D3222 caused in lld. Testcase is based on
the one that failed in the buildbot.
Depends on D3233
Reviewers: matheusalmeida, vmedic
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3234
llvm-svn: 205298
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Parsing registers no longer consume the $ token before it's confirmed whether it really has a register or not, therefore it's no longer impossible to match symbols if registers were tried first.
Depends on D3232
Reviewers: matheusalmeida, vmedic
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3233
llvm-svn: 205297
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
No functional change
Depends on D3222
Reviewers: matheusalmeida, vmedic
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3232
llvm-svn: 205295
|
|
|
|
| |
llvm-svn: 205294
|
|
|
|
| |
llvm-svn: 205293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Highlights:
- Registers are resolved much later (by the render method).
Prior to that point, GPR32's/GPR64's are GPR's regardless of register
size. Similarly FGR32's/FGR64's/AFGR64's are FGR's regardless of register
size or FR mode. Numeric registers can be anything.
- All registers are parsed the same way everywhere (even when handling
symbol aliasing)
- One consequence is that all registers can be specified numerically
almost anywhere (e.g. $fccX, $wX). The exception is symbol aliasing
but that can be easily resolved.
- Removes the need for the hasConsumedDollar hack
- Parenthesis and Bracket suffixes are handled generically
- Micromips instructions are parsed directly instead of going through the
standard encodings first.
- rdhwr accepts all 32 registers, and the following instructions that previously
xfailed now work:
ddiv, ddivu, div, divu, cvt.l.[ds], se[bh], wsbh, floor.w.[ds], c.ngl.d,
c.sf.s, dsbh, dshd, madd.s, msub.s, nmadd.s, nmsub.s, swxc1
- Diagnostics involving registers point at the correct character (the $)
- There's only one kind of immediate in MipsOperand. LSA immediates are handled
by the predicate and renderer.
Lowlights:
- Hardcoded '$zero' in the div patterns is handled with a hack.
MipsOperand::isReg() will return true for a k_RegisterIndex token
with Index == 0 and getReg() will return ZERO for this case. Note that it
doesn't return ZERO_64 on isGP64() targets.
- I haven't cleaned up all of the now-unused functions.
Some more of the generic parser could be removed too (integers and relocs
for example).
- insve.df needed a custom decoder to handle the implicit fourth operand that
was needed to make it parse correctly. The difficulty was that the matcher
expected a Token<'0'> but gets an Imm<0>. Adding an implicit zero solved this.
Reviewers: matheusalmeida, vmedic
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3222
llvm-svn: 205292
|
|
|
|
|
|
| |
Differential Revision: http://llvm-reviews.chandlerc.com/D2824
llvm-svn: 205288
|
|
|
|
|
|
|
|
|
|
|
|
| |
special case of '0' in DwarfCompileUnit::initStmtList by just always using a label difference
This moves one case of raw text checking down into the MCStreamer
interfaces in the form of a virtual function, even if we ultimately end
up consolidating on the one-or-many line tables issue one day, this is
nicer in the interim. This just generally streamlines a bunch of use
cases into a common code path.
llvm-svn: 205287
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't think this is reachable by any frontend (why would you transform
asm to asm+debug info?) but it helps tidy up some of this code, avoid
the weird special case of "emit the first CU, store the label, then emit
the rest" in MCDwarfLineTable::Emit by instead having the
DWARF-for-assembly case use the same codepath as DwarfDebug.cpp, by
registering the label of the debug_line section, thus causing it to be
emitted. (with a special case in asm output to just emit the label since
asm output uses the .loc directives, etc, rather than the debug_loc
directly)
llvm-svn: 205286
|
|
|
|
|
|
|
|
|
|
| |
No other functionality changes, DIBuilder testcase is included in a paired
CFE commit.
This relaxes the assertion in isScopeRef to also accept subclasses of
DIScope.
llvm-svn: 205279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.
However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.
llvm-svn: 205264
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reschedule them"
This reverts commit r205018.
Conflicts:
lib/Transforms/Vectorize/SLPVectorizer.cpp
test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
This is breaking libclc build.
llvm-svn: 205260
|
|
|
|
|
|
|
|
|
|
|
|
| |
function address and properly align all entries.
This commit updates the stackmap format to version 1 to indicate the
reorganizaion of several fields. This was done in order to align stackmap
entries to their natural alignment and to minimize padding.
Fixes <rdar://problem/16005902>
llvm-svn: 205254
|
|
|
|
|
|
|
|
|
| |
Pretty obvious follow-on to r205159 to also handle conversion from double
besides float.
Fixes <rdar://problem/16373208>
llvm-svn: 205253
|
|
|
|
|
|
|
| |
It's now matched to the scalar 64-bit or and split later if
necessary.'
llvm-svn: 205252
|
|
|
|
|
|
|
|
|
|
|
|
| |
A value of 5 means if we have a split or spill option that has a really
low cost (1 << 14 is the entry frequency), we will choose to spill
or split the really cold path before using a callee-saved register.
This gives us the performance benefit on SPECInt2k and is also conservative.
rdar://16162005
llvm-svn: 205248
|
|
|
|
|
|
| |
Pass the entire vector type, and not just the element.
llvm-svn: 205247
|
|
|
|
| |
llvm-svn: 205244
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the ability to expand large (meaning with more than two unique
defined values) BUILD_VECTOR nodes in terms of SCALAR_TO_VECTOR and (legal)
vector shuffles. There is now no limit of the size we are capable of expanding
this way, although we don't currently do this for vectors with many unique
values because of the default implementation of TLI's
shouldExpandBuildVectorWithShuffles function.
There is currently no functional change to any existing targets because the new
capabilities are not used unless some target overrides the TLI
shouldExpandBuildVectorWithShuffles function. As a result, I've not included a
test case for the new functionality in this commit, but regression tests will
(at least) be added soon when I commit support for the PPC QPX vector
instruction set.
The benefit of committing this now is that it makes the
shouldExpandBuildVectorWithShuffles callback, which had to be added for other
reasons regardless, fully functional. I suspect that other targets will
also benefit from tuning the heuristic.
llvm-svn: 205243
|
|
|
|
| |
llvm-svn: 205242
|
|
|
|
| |
llvm-svn: 205240
|
|
|
|
|
|
|
|
| |
errors in lld tests.
It's currently unable to parse 'sym + imm' without surrounding parenthesis.
llvm-svn: 205237
|
|
|
|
| |
llvm-svn: 205236
|
|
|
|
| |
llvm-svn: 205235
|
|
|
|
| |
llvm-svn: 205233
|