| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This reverts commit r264587. Reverting to investigate 6 unexpected
failures on the ppc bot:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/2822
llvm-svn: 264590
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
|
|
|
|
| |
llvm-svn: 264588
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Function names in ObjC can have spaces in them. This interacts poorly
with name compression, which uses spaces to separate PGO names. Fix the
issue by using a different separator and update a test.
I chose "\01" as the separator because 1) it's non-printable, 2) we
strip it from PGO names, and 3) it's the next natural choice once "\00"
is discarded (that one's overloaded).
Differential Revision: http://reviews.llvm.org/D18516
llvm-svn: 264587
|
|
|
|
|
|
| |
Patch suggested by David Li!
llvm-svn: 264586
|
|
|
|
| |
llvm-svn: 264584
|
|
|
|
| |
llvm-svn: 264583
|
|
|
|
| |
llvm-svn: 264581
|
|
|
|
|
|
|
|
| |
- Do not optimize stack slots in optnone functions.
- Get aligned-base register from HexagonMachineFunctionInfo instead of
looking for ALIGNA instruction in the function's body.
llvm-svn: 264580
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D18463
llvm-svn: 264579
|
|
|
|
|
|
|
|
|
|
| |
Add the Lanai backend to lib/Target.
General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html).
Differential Revision: http://reviews.llvm.org/D17011
llvm-svn: 264578
|
|
|
|
| |
llvm-svn: 264573
|
|
|
|
|
|
|
| |
We require C++11 to build, so remove a few remaining preprocessor checks for
'__cplusplus >= 201103L'. This should always be true.
llvm-svn: 264572
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the following altivec instructions:
- Decimal Convert From/to National/Zoned/Signed-QWord:
bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.
- Decimal Copy-Sign/Set-Sign:
bcdcpsgn. bcdsetsgn.
- Decimal Shift/Unsigned-Shift/Shift-and-Round:
bcds. bcdus. bcdsr.
- Decimal (Unsigned) Truncate:
bcdtrunc. bcdutrunc.
Total 13 instructions
Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D17838
llvm-svn: 264568
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min/max, reverse, permute, splat
This change implements the following vsx instructions:
- Scalar Insert/Extract
xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp
- Vector Insert/Extract
xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
xxextractuw xxinsertw
- Scalar/Vector Test Data Class
xststdcdp xststdcsp xststdcqp
xvtstdcdp xvtstdcsp
- Maximum/Minimum
xsmaxcdp xsmaxjdp
xsmincdp xsminjdp
- Vector Byte-Reverse/Permute/Splat
xxbrd xxbrh xxbrq xxbrw
xxperm xxpermr
xxspltib
30 instructions
Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D16842
llvm-svn: 264567
|
|
|
|
|
|
|
|
|
|
| |
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0
Differential Revision: http://reviews.llvm.org/D18511
llvm-svn: 264566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements the following vsx instructions:
- quad-precision move
xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp
- quad-precision fp-arithmetic
xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)
22 instructions
Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D16110
llvm-svn: 264565
|
|
|
|
|
|
|
|
|
| |
When emitting coverage mappings for functions with local linkage and an
unknown filename, we use "<unknown>:func" for the PGO function name. The
problem is that we don't strip "<unknown>" from the name when loading
coverage data, like we do for other file names. Fix that and add a test.
llvm-svn: 264559
|
|
|
|
|
|
|
| |
The caller of ValueEnumerator::EnumerateOperandType never sends in
metadata. Assert that, and remove the unnecessary logic.
llvm-svn: 264558
|
|
|
|
|
|
|
|
| |
Change writeFunctionMetadata to call writeMetadataRecords. For now
there's no functionality change, but makes it easy to serialize other
types of metadata in the function block in the future.
llvm-svn: 264557
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To match writeMetadataRecords, writeNamedMetadata and
writeMetadataStrings, change:
WriteModuleMetadata => writeModuleMetadata
WriteFunctionLocalMetadata => writeFunctionMetadata
Write##CLASS => write##CLASS
The only major change is "FunctionLocal" => "Function". The point is to
be less specific, in preparation for emitting normal metadata records
inside function metadata blocks (currently we only emit
`LocalAsMetadata` there).
llvm-svn: 264556
|
|
|
|
|
|
|
| |
Besides being a nice cleanup, this is preparation for reusing the code
in function metadata blocks.
llvm-svn: 264555
|
|
|
|
|
|
| |
Use an early return to simplify logic.
llvm-svn: 264554
|
|
|
|
|
|
|
|
| |
We don't really need a separate vector here; instead, point at a range
inside the main MDs array. This matches how r264551 references the
ranges of strings and non-strings.
llvm-svn: 264552
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spiritually reapply commit r264409 (reverted in r264410), albeit with a
bit of a redesign.
Firstly, avoid splitting the big blob into multiple chunks of strings.
r264409 imposed an arbitrary limit to avoid a massive allocation on the
shared 'Record' SmallVector. The bug with that commit only reproduced
when there were more than "chunk-size" strings. A test for this would
have been useless long-term, since we're liable to adjust the chunk-size
in the future.
Thus, eliminate the motivation for chunk-ing by storing the string sizes
in the blob. Here's the layout:
vbr6: # of strings
vbr6: offset-to-blob
blob:
[vbr6]: string lengths
[char]: concatenated strings
Secondly, make the output of llvm-bcanalyzer readable.
I noticed when debugging r264409 that llvm-bcanalyzer was outputting a
massive blob all in one line. Past a small number, the strings were
impossible to split in my head, and the lines were way too long. This
version adds support in llvm-bcanalyzer for pretty-printing.
<STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 {
'abc'
'def'
'ghi'
}
From the original commit:
Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.
llvm-svn: 264551
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation is fairly obvious. This is preparation for using
some blobs in bitcode.
For clarity (and perhaps future-proofing?), I moved the call to
JumpToBit in BitstreamCursor::readRecord ahead of calling
MemoryObject::getPointer, since JumpToBit can theoretically (a) read
bytes, which (b) invalidates the blob pointer.
This isn't strictly necessary the two memory objects we have:
- The return of RawMemoryObject::getPointer is valid until the memory
object is destroyed.
- StreamingMemoryObject::getPointer is valid until the next chunk is
read from the stream. Since the JumpToBit call is only going ahead
to a word boundary, we'll never load another chunk.
However, reordering makes it clear by inspection that the blob returned
by BitstreamCursor::readRecord will be valid.
I added some tests for StreamingMemoryObject::getPointer and
BitstreamCursor::readRecord.
llvm-svn: 264549
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add API to SimpleBitstreamCursor to allow users to translate between
byte addresses and pointers.
- jumpToPointer: move the bit position to a particular pointer.
- getPointerToByte: get the pointer for a particular byte.
- getPointerToBit: get the pointer for the byte of the current bit.
- getCurrentByteNo: convenience function for assertions and tests.
Mainly adds unit tests (getPointerToBit/Byte already has a use), but
also preparation for eventually using jumpToPointer.
llvm-svn: 264546
|
|
|
|
|
|
|
|
|
|
|
| |
Split out SimpleBitstreamCursor from BitstreamCursor, which is a
lower-level cursor with no knowledge of bitcode blocks, abbreviations,
or records. It just knows how to read bits and navigate the stream.
This is mainly organizational, to separate the API for manipulating raw
bits from that for bitcode concepts like Record and Block.
llvm-svn: 264545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add a statistic to count the number of imported functions. Also, add a
new -print-imports option to emit a trace of imported functions, that
works even for an NDEBUG build.
Note that emitOptimizationRemark does not work for the above printing as
it expects a Function object and DebugLoc, neither of which we have
with summary-based importing.
This is part 2 of D18487, the first part was committed separately as
r264536.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18487
llvm-svn: 264537
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With r264503, aliases are now being added to the GlobalsToImport set
even when their aliasees can't be imported due to their linkage type.
While the importing worked correctly (the aliases imported as
declarations) due to the logic in doImportAsDefinition, there is no
point to adding them to the GlobalsToImport set.
Additionally, with D18487 it was resulting in incorrectly printing a
message indicating that the alias was imported.
To avoid this, delay adding aliases to the GlobalsToImport set until
after the linkage type of the aliasee is checked.
This patch is part of D18487.
llvm-svn: 264536
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.
It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.
This fixes the latter problem and adds the FMAX/MINNUM mappings.
llvm-svn: 264532
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reject the following IR as malformed (assuming that %entry, %next are
not in a loop):
next:
%y = phi i32 [ 0, %entry ]
%x = phi i32 [ %y, %entry ]
Such PHI nodes came up in PR26718. While there was no consensus on
whether or not this is valid IR, most opinions on that bug and in a
discussion on the llvm-dev mailing list tended towards a
"strict interpretation" (term by Joseph Tremoulet) of PHI node uses.
Also, the language reference explicitly states that "the use of each
incoming value is deemed to occur on the edge from the corresponding
predecessor block to the current block" and
`DominatorTree::dominates(Instruction*, Use&)` uses this definition as
well.
For the code mentioned in PR15384, clang does not compile to such PHIs
(anymore?). The test case still hangs when replacing `%tmp6` with `%tmp`
in revisions before r176366 (where PR15384 has been fixed). The
occurrence of %tmp6 therefore was probably unintentional. Its value is
not used except in other PHIs.
Reviewers: majnemer, reames, JosephTremoulet, bkramer, grosser, jdoerfert, kparzysz, sanjoy
Differential Revision: http://reviews.llvm.org/D18443
llvm-svn: 264528
|
|
|
|
| |
llvm-svn: 264527
|
|
|
|
|
|
| |
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization
llvm-svn: 264517
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit fa36fcff16c7d4f78204d6296bf96c3558a4a672.
Causes the following Windows failure:
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MachineInstr.cpp(762):
error C2338: must be trivially copyable to memmove
llvm-svn: 264516
|
|
|
|
|
|
|
|
|
|
| |
Summary: isPodLike is as close as we have for is_trivially_copyable.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18483
llvm-svn: 264515
|
|
|
|
|
|
|
|
| |
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264512
|
|
|
|
|
|
|
|
| |
Currently this is to mainly to prevent scalarization of integer division by constants.
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264511
|
|
|
|
|
|
|
|
| |
multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.
llvm-svn: 264509
|
|
|
|
|
|
|
|
|
| |
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.
This fixes PR27083.
llvm-svn: 264508
|
|
|
|
|
|
| |
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead.
llvm-svn: 264506
|
|
|
|
| |
llvm-svn: 264505
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq
28 instructions
Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Now that the summary contains the full reference/call graph, we can
replace the existing function importer that loads and inspect the IR
to iteratively walk the call graph by a traversal based purely on the
summary information. Decouple the actual importing decision from any
IR manipulation.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18343
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264503
|
|
|
|
|
|
|
| |
It now return the map instead of an iterator.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264489
|
|
|
|
|
|
| |
As discussed on the llvm-commits thread for r264467.
llvm-svn: 264479
|
|
|
|
|
|
|
|
|
|
|
| |
the rename operation on 3 error conditions of ReplaceFileW() that it was
previously bailing out on.
Patch by Douglas Yung!
Differential Revision: http://reviews.llvm.org/D17903
llvm-svn: 264477
|
|
|
|
|
|
| |
ErrorOr<...>.
llvm-svn: 264473
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-load forwarding across a release fence.
We choose to be much more conservative about stores. In theory, nothing prevents us from shifting a store from after a release fence to before it, and then eliminating the preceeding (previously fenced) store. Doing this without actually moving the second store is likely also legal, but we chose to be conservative at this time.
The LangRef indicates only atomic loads and stores are effected by fences. This patch chooses to be far more conservative then that.
This is the GVN companion to http://reviews.llvm.org/D11434 which applied the same logic in EarlyCSE and has been baking in tree for a while now.
Differential Revision: http://reviews.llvm.org/D11436
llvm-svn: 264472
|
|
|
|
|
|
|
|
|
|
| |
method instead.
This is not quite a named constructor: Construction may fail, and
MachOObjectFiles are usually passed by unique_ptr anyway, so create
returns an Expected<std::unique_ptr<MachOObjectFile>>.
llvm-svn: 264469
|