bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Strip trailing whitespace and reword explanatory comment.	Eric Christopher	2015-04-04	1	-10/+5
	llvm-svn: 234078
*	[PowerPC] Enable splat generation for BUILD_VECTOR with little endian	Bill Schmidt	2015-04-03	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR nodes for little endian because wrong results were produced. I've subsequently investigated and found this is due to a call to BuildVectorSDNode::isConstantSplat that was always specifying big-endian. With this changed to correctly identify the target endianness, the optimizations work as expected. I found another case of a call to the same method with big-endian hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was an orphaned method with no callers, so I've just removed it. The existing test/CodeGen/PowerPC/vec_constants.ll checks these optimizations, so for testing I've just added a variant for little endian. llvm-svn: 234011
*	[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR	Simon Pilgrim	2015-04-03	1	-46/+11
\| \| \| \| \| \| \| \|	This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed. Differential Revision: http://reviews.llvm.org/D8516 llvm-svn: 234004
*	[PowerPC] FastISel can't handle i1 return values when using CR bits	Hal Finkel	2015-04-01	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under normal circumstances, use of CR bits is disabled when running at -O0, but it is enabled by default otherwise, and if you have optnone functions, they'll still generally be generated with crbits turned on (because nothing else turns them off). FastISel can't handle most things dealing with i1 values when using CR bits, and checks for that, but was not checking the return type on functions; we can't fast-isel function calls with i1 return values either when using CR bits for boolean values. Fixes PR22664. llvm-svn: 233775
*	[PowerPC] Don't use a vector preferred memory type at -O0	Hal Finkel	2015-03-31	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic is a memset/memcpy/etc. we might normally use vector types. At -O0, this is probably not a good idea (because, if there is a bug in the lowering code, there would be no good way to turn it off). At -O0, only use scalar preferred types. Related to PR22754. llvm-svn: 233755
*	[SDAG] Handle non-integer preferred memset types for non-constant values	Hal Finkel	2015-03-31	2	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing code in getMemsetValue only handled integer-preferred types when the fill value was not a constant. Make this more robust in two ways: 1. If the preferred type is a floating-point value, do the mul-splat trick on the corresponding integer type and then bitcast. 2. If the preferred type is a vector, do the mul-splat trick on one vector element, and then build a vector out of them. Fixes PR22754 (although, we should also turn off use of vector types at -O0). llvm-svn: 233749
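The "mul-splat trick" described above can be sketched in a few lines. This is only an illustration in Python, not the SelectionDAG code: the function names are hypothetical, and the floating-point case assumes a 64-bit double.

```python
import struct


def splat_byte(value, width_bytes):
    """Replicate the low byte of `value` across a `width_bytes`-byte integer."""
    # 0x0101...01 has a 1 in every byte, so multiplying by it copies the
    # fill byte into every byte position of the wider integer.
    ones = int.from_bytes(b"\x01" * width_bytes, "big")
    return (value & 0xFF) * ones


def splat_byte_as_double(value):
    """Floating-point case: splat on the same-width integer, then bitcast."""
    bits = splat_byte(value, 8)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def splat_byte_as_vector(value, elem_bytes, num_elems):
    """Vector case: splat a single element, then build the vector from copies."""
    elem = splat_byte(value, elem_bytes)
    return [elem] * num_elems
```

For example, `splat_byte(0xAB, 4)` yields `0xABABABAB`, and the vector helper simply repeats one such element per lane.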
*	DebugInfo: Fix bad debug info for compile units and types	Duncan P. N. Exon Smith	2015-03-27	1	-1/+1
	Fix debug info in these tests, which started failing with a WIP patch to verify compile units and types. The problems look like they were all caused by bitrot. They fell into these categories:
	- Using `!{i32 0}` instead of `!{}`.
	- Using `!{null}` instead of `!{}`.
	- Using `!MDExpression()` instead of `!{}`.
	- Using `!8` instead of `!{!8}`.
	- `file:` references that pointed at `MDCompileUnit`s instead of the same `MDFile` as the compile unit.
	- `file:` references that were numerically off-by-one (or off-by-ten).
	llvm-svn: 233415
*	Complete the MachineScheduler fix made way back in r210390.	Andrew Trick	2015-03-27	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"Fix the MachineScheduler's logic for updating ready times for in-order. Now the scheduler updates a node's ready time as soon as it is scheduled, before releasing dependent nodes." This fix was only made in one variant of the ScheduleDAGMI driver. Francois de Ferriere reported the issue in the other bit of code where it was also needed. I never got around to coming up with a test case, but it's an obvious fix that shouldn't be delayed any longer. I'll try to refactor this code a little better. I did verify performance on a wide variety of targets and saw no negative impact with this fix. llvm-svn: 233366
*	Testcase for r233239.	Eric Christopher	2015-03-26	1	-0/+32
	llvm-svn: 233240
*	Add Hardware Transactional Memory (HTM) Support	Kit Barton	2015-03-25	1	-0/+125
	This patch adds Hardware Transactional Memory (HTM) support, as provided by ISA 2.07 (POWER8). The intrinsic support is based on GCC's [1], but currently only the 'PowerPC HTM Low Level Built-in Functions' are implemented. The HTM instructions follow the RC ones and the transaction initiation result is set in CR0 (with the exception of tcheck). The current approach is to create a register copy from CR0 to a GPR and compare. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates correct code both for test-and-branch and for just returning the value. A possible future optimization would be to eliminate the MFCR instruction and branch directly. HTM usage requires a sufficiently recent kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is sent along with a clang patch to enable the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204
*	[SDAG] Don't widen VSETCC during type legalization for split operands	Hal Finkel	2015-03-23	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \|	Because the operands of a vector SETCC node can be of a different type from the result (and often are), it can happen that even if we'd prefer to widen the result type of the SETCC, the operands have been split instead. In this case, the SETCC result also must be split. This mirrors what is done in WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not widened the following calls to GetWidenedVector will assert (which is what was happening in the test case). llvm-svn: 232935
*	Remove the bare getSubtargetImpl call from the PPC port. As part	Eric Christopher	2015-03-21	1	-0/+43
	of this add a test that shows we can generate code for functions that differ by subtarget feature. llvm-svn: 232882
*	Fix a nasty bug in DAGCombine of STORE nodes.	Owen Anderson	2015-03-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is very related to the bug fixed in r174431. The problem is that SelectionDAG does not include alignment in the uniquing of loads and stores. When an otherwise no-op DAGCombine would increase the alignment of a load or store, the original node would be returned (with the alignment increased), which would cause the node not to be processed by any further DAGCombines. I don't have a direct testcase for this that manifests on an in-tree target, but I did see some noise in the tests for other targets and have updated them for it. llvm-svn: 232780
*	Note that we don't support COFF on PPC.	Rafael Espindola	2015-03-19	1	-0/+4
	Should bring back the windows bots. llvm-svn: 232701
*	Fix R0 use in PowerPC VSX store for FastIsel.	Samuel Antao	2015-03-17	1	-0/+33
	The VSX stores are sometimes generated with an undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stxsdx) require R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address. llvm-svn: 232486
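The base-register rule the message relies on can be shown with a small model of indexed (X-form) effective-address computation, where a base field of 0 means the literal value zero and not the contents of R0. This is a sketch only; the function name and the register-file list are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(ra, rb, gpr):
    """Model EA = (RA|0) + (RB) for an indexed load/store.

    `ra` and `rb` are register numbers; `gpr` is a list of register values.
    """
    # A base field of 0 selects the constant 0, not the contents of R0.
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK64
```

With R0 holding arbitrary data, `effective_address(0, 5, gpr)` depends only on R5, which is why zero in the base slot gives an address computed from the other register alone.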
*	Use createTempSymbol to avoid collisions instead of an ad hoc method.	Rafael Espindola	2015-03-17	2	-8/+8
	llvm-svn: 232483
*	Add a bunch of CHECK missing colons in tests. NFC.	Ahmed Bougacha	2015-03-14	1	-3/+3
	Some wouldn't pass; fixed most, the rest will be fixed separately. llvm-svn: 232239
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-03-13	41	-219/+219
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184
*	Add support for part-word atomics for PPC	Nemanja Ivanovic	2015-03-10	1	-0/+54
	http://reviews.llvm.org/D8090#inline-67337 llvm-svn: 231843
*	Change the generation of the vmuluwm instruction to be based on the MUL opcode.	Kit Barton	2015-03-10	2	-5/+6
	Phabricator review: http://reviews.llvm.org/D8185 llvm-svn: 231827
*	Use the correct func begin symbol in all places in ppc.	Rafael Espindola	2015-03-05	2	-3/+4
	I missed an occurrence of the old symbol in my previous patch. llvm-svn: 231398
*	Use the generic Lfunc_begin label on ppc.	Rafael Espindola	2015-03-05	5	-25/+89
	This removes yet another custom label to mark the start of a function. llvm-svn: 231390
*	While reviewing the changes to Clang to add builtin support for the vsld, ↵	Kit Barton	2015-03-05	1	-5/+8
	vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr), not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics. llvm-svn: 231378
*	Add LLVM support for PPC cryptography builtins	Nemanja Ivanovic	2015-03-04	1	-0/+275
	Review: http://reviews.llvm.org/D7955 llvm-svn: 231285
*	Use the vanilla func_end symbol for .size.	Rafael Espindola	2015-03-04	2	-2/+2
	No need to create yet another temp symbol. llvm-svn: 231198
*	Add the following 64-bit vector integer arithmetic instructions added in POWER8:	Kit Barton	2015-03-03	5	-0/+428
	vaddudm vsubudm vmulesw vmulosw vmuleuw vmulouw vmuluwm vmaxsd vmaxud vminsd vminud vcmpequd vcmpequd. vcmpgtsd vcmpgtsd. vcmpgtud vcmpgtud. vrld vsld vsrd vsrad Phabricator review: http://reviews.llvm.org/D7959 llvm-svn: 231115
*	DebugInfo: Move new hierarchy into place	Duncan P. N. Exon Smith	2015-03-03	3	-423/+423
	Move the specialized metadata nodes for the new debug info hierarchy into place, finishing off PR22464. I've done bootstraps (and all that) and I'm confident this commit is NFC as far as DWARF output is concerned. Let me know if I'm wrong :).
	The code changes are fairly mechanical:
	- Bumped the "Debug Info Version".
	- `DIBuilder` now creates the appropriate subclass of `MDNode`.
	- Subclasses of DIDescriptor now expect to hold their "MD" counterparts (e.g., `DIBasicType` expects `MDBasicType`).
	- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp` for printing comments.
	- Big update to LangRef to describe the nodes in the new hierarchy. Feel free to make it better.
	Testcase changes are enormous. There's an accompanying clang commit on its way. If you have out-of-tree debug info testcases, I just broke your build.
	- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to update all the IR testcases.
	- Unfortunately I failed to find a way to script the updates to CHECK lines, so I updated all of these by hand. This was fairly painful, since the old CHECKs are difficult to reason about. That's one of the benefits of the new hierarchy.
	This work isn't quite finished, BTW. The `DIDescriptor` subclasses are almost empty wrappers, but not quite: they still have loose casting checks (see the `RETURN_FROM_RAW()` macro). Once they're completely gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I also expect to make a few schema changes now that it's easier to reason about everything. llvm-svn: 231082
*	Regenerated test case from r230801 for change in LLVM IR syntax	Bill Schmidt	2015-02-27	1	-0/+78
	llvm-svn: 230811
*	Revert test case until it can be fixed	Bill Schmidt	2015-02-27	1	-65/+0
	llvm-svn: 230803
*	[PowerPC] Fix PR22711 - Misaligned .toc section	Bill Schmidt	2015-02-27	1	-0/+65
\| \| \| \| \| \| \| \| \| \| \| \|	Straightforward patch to emit an alignment directive when emitting a TOC entry. The test case was generated from the test in PR22711 that demonstrated a misaligned .toc section. The object code is run through llvm-readobj to verify that the correct alignment has been applied to the .toc section. Thanks to Ulrich Weigand for running down where the fix was needed. llvm-svn: 230801
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-02-27	224	-1331/+1331
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
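As an illustration of the kind of rewrite this migration performs on `load` instructions, here is a deliberately simplified sketch. It is not the script from the commit: the pattern and the name `migrate_load` are hypothetical, it only handles pointer operands that start with `%` or `@`, and it is not idempotent, so it should be run only on lines still in the old syntax.

```python
import re

# Simplified, hypothetical pattern for: load [atomic] [volatile] <ty>* <ptr>
LOAD_RE = re.compile(
    r"(\bload (?:atomic )?(?:volatile )?)"  # 1: opcode and qualifiers
    r"(.*?)"                                # 2: pointee type (lazy match)
    r"((?: addrspace\(\d+\))?)"             # 3: optional address space
    r"\*"                                   # the final pointer star
    r"( (?:%|@))"                           # 4: start of the pointer operand
)


def migrate_load(line):
    """Insert the explicit result type: `load T* %p` -> `load T, T* %p`."""
    return LOAD_RE.sub(r"\1\2, \2\3*\4", line)
```

For instance, `%v = load i32* %p, align 4` becomes `%v = load i32, i32* %p, align 4`, and the address space stays on the pointer, as in `load float, float addrspace(1)* %x`.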
*	[PowerPC] Use vector types for memcpy and friends (sometimes)	Hal Finkel	2015-02-27	2	-2/+114
	When using Altivec, we can use vector loads and stores for aligned memcpy and friends. Starting with the P7 and VSX, we have reasonable unaligned vector stores. Starting with the P8, we have fast unaligned loads too. For QPX, we use vector loads and stores, but only for aligned memory accesses. llvm-svn: 230788
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-02-27	103	-1360/+1360
	getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions.
	* This doesn't modify gep operators, only instructions (operators will be handled separately)
	* Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes.
	* geps of vectors are transformed as: getelementptr <4 x float*> %x, ... -> getelementptr float, <4 x float*> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float.
	* address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x -> getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x
	Importantly, the massive amount of test case churn has been automated by some crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files.
update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
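The instruction-level transformation described in the bullets above (scalar pointers, vectors of pointers, and address spaces) can be sketched as follows. This is a simplified, hypothetical stand-in for `update.py`, not the commit's script: the regex and the name `migrate_gep` are assumptions, only operands starting with `%` or `@` are handled, and constant-expression geps (the parenthesized operator form) are intentionally left untouched.

```python
import re

GEP_RE = re.compile(
    r"(\bgetelementptr (?:inbounds )?)"  # 1: opcode, optionally inbounds
    r"(?!inbounds |\()"                  # never split 'inbounds'; skip operator form
    r"(<\d+ x )?"                        # 2: optional vector-of-pointers prefix
    r"(.*?)"                             # 3: pointee type (lazy match)
    r"((?: addrspace\(\d+\))?)"          # 4: optional address space
    r"\*"                                # the final pointer star
    r"((?(2)>))"                         # 5: closing '>' only if group 2 matched
    r"( (?:%|@))"                        # 6: start of the pointer operand
)


def _rewrite(match):
    prefix, vec, pointee, aspace, close, operand = match.groups(default="")
    # Emit the explicit source element type, then the original pointer type.
    return f"{prefix}{pointee}, {vec}{pointee}{aspace}*{close}{operand}"


def migrate_gep(line):
    """Rewrite `getelementptr T* %p, ...` to `getelementptr T, T* %p, ...`."""
    return GEP_RE.sub(_rewrite, line)
```

The conditional group `(?(2)>)` mirrors the distinction the message draws: a vector of pointers (`<4 x float*>`) contributes only its element type, while a pointer to a vector (`<4 x float>*`) keeps the whole vector type as the source element type.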
*	Change the fast-isel-abort option from bool to int to enable "levels"	Mehdi Amini	2015-02-27	18	-26/+26
	Summary: Currently fast-isel-abort will only abort for regular instructions, and just warn for function calls, terminators, function arguments. There is already fast-isel-abort-args but nothing for calls and terminators. This change turns the fast-isel-abort options into an integer option, so that multiple levels of strictness can be defined. This will help avoid being surprised when the "abort" option does not in fact abort, and makes it possible to write tests that verify that no intrinsics are forgotten by fast-isel. Reviewers: resistor, echristo Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D7941 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 230775
*	[PowerPC] Make LDtocL and friends invariant loads	Hal Finkel	2015-02-25	4	-37/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node, represent loads from the TOC, which is invariant. As a result, these loads can be hoisted out of loops, etc. In order to do this, we need to generate GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory intrinsic node type. Once this is done, the MMO transfer is automatically handled for TableGen-driven instruction selection, and for nodes generated directly in PPCISelDAGToDAG, we need to transfer the MMOs manually. Also, we were not transferring MMOs associated with pre-increment loads, so do that too. Lastly, this fixes an exposed bug where R30 was not added as a defined operand of UpdateGBR. This problem was highlighted by an example (used to generate the test case) posted to llvmdev by Francois Pichet. llvm-svn: 230553
*	[PowerPC] Add triples to QPX tests	Hal Finkel	2015-02-25	7	-0/+7
	Some of these tests fail on Darwin systems because of a lack of a triple; fix that. llvm-svn: 230421
*	[PowerPC] Add support for the QPX vector instruction set	Hal Finkel	2015-02-25	13	-1/+850
	This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bits wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1>, are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirror those supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. llvm-svn: 230413
*	I incorrectly marked the VORC instruction as isCommutable when I added it.	Kit Barton	2015-02-20	1	-5/+8
	This fix removes the VORC instruction definition from the isCommutable block. Phabricator review: http://reviews.llvm.org/D7772 llvm-svn: 230020
*	[PowerPC] Loop Data Prefetching for the BG/Q	Hal Finkel	2015-02-20	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it prefetches into its own L1P buffer, and the latency to access that buffer is significantly higher than that to the L1 cache (although smaller than the latency to the L2 cache). As a result, especially when multiple hardware threads are not actively busy, explicitly prefetching data into the L1 cache is advantageous. I've been using this pass out-of-tree for data prefetching on the BG/Q for well over a year, and it has worked quite well. It is enabled by default only for the BG/Q, but can be enabled for other cores as well via a command-line option. Eventually, we might want to add some TTI interfaces and move this into Transforms/Scalar (there is nothing particularly target dependent about it, although only machines like the BG/Q will benefit from its simplistic strategy). llvm-svn: 229966
*	This patch adds the VSX logical instructions introduced in the Power ISA ↵	Kit Barton	2015-02-18	2	-1/+52
	2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Committing on Nemanja's behalf. llvm-svn: 229694
*	Move ABI handling and 64-bitness to the PowerPC target machine.	Eric Christopher	2015-02-17	1	-4/+4
	This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471
*	[PowerPC] Support non-direct-sub/superclass VSX copies	Hal Finkel	2015-02-16	2	-0/+248
	Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was almost there, but not quite: it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC, which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. llvm-svn: 229457
*	[CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to ↵	Andrea Di Biagio	2015-02-13	1	-41/+0
	speculate calls to cttz/ctlz. SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are 'cheap' for the target. Therefore, some of the logic in CodeGenPrepare that was originally added at revision 224899 can now be removed. This patch is basically no functional change. It removes the duplicated logic in CodeGenPrepare and converts all the existing target specific tests for cttz/ctlz into SimplifyCFG tests. Differential Revision: http://reviews.llvm.org/D7608 llvm-svn: 229105
*	[SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions	Hal Finkel	2015-02-12	1	-0/+23
\| \| \| \| \| \| \| \| \| \|	The PowerPC backend has long promoted some floating-point vector operations (such as select) to integer vector operations. Unfortunately, this behavior was broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check that both the old and new types are floating-point types. Otherwise, we must use BITCAST as we did prior to r216555 for everything. llvm-svn: 228969
*	[PowerPC] Mark jumps as expensive (when using CR bits)	Hal Finkel	2015-02-12	1	-0/+36
	On PowerPC, which has a full set of logical operations on (its multiple sets of) condition-register bits, it is not profitable to break up complex conditions feeding a jump into multiple jumps. We can turn off this feature of CGP/SDAGBuilder by marking jumps as "expensive". P7 test-suite speedups (no regressions): MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.626647% +/- 0.323583% MultiSource/Benchmarks/Olden/power/power -18.2821% +/- 8.06481% llvm-svn: 228895
*	Fix overly prescriptive test that broke on Mac after r228725.	Daniel Jasper	2015-02-10	1	-8/+8
	llvm-svn: 228742
*	[PowerPC] Fix reverted patch r227976 to avoid register assignment issues	Bill Schmidt	2015-02-10	3	-6/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See full discussion in http://reviews.llvm.org/D7491. We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass prior to RA now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions, and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and fixes these up after the replacement sequences are introduced. Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed. llvm-svn: 228725
*	This change implements the following three logical vector operations:	Kit Barton	2015-02-09	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \|	veqv (vector equivalence) vnand vorc I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions. Phabricator review: http://reviews.llvm.org/D7469 llvm-svn: 228580
*	[PowerPC] Handle loop predecessor invokes	Hal Finkel	2015-02-07	1	-0/+50
\| \| \| \| \| \| \| \| \| \|	If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader. llvm-svn: 228488
*	[PowerPC] Fixup incomplete revert of test/CodeGen/PowerPC/tls-pic.ll	Hal Finkel	2015-02-06	1	-7/+7
	llvm-svn: 228467