path: root/llvm/lib/Target/PowerPC
* Add saving and restoring of r30 to the prologue and epilogue, respectively (Justin Hibbits, 2015-01-08; 2 files, -0/+17)

  Summary: The PIC additions didn't update the prologue and epilogue code
  to save and restore r30 (PIC base register). This does that.

  Test Plan: Tests updated.

  Reviewers: hfinkel
  Reviewed By: hfinkel
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D6876

  llvm-svn: 225450
* [SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type) (Ahmed Bougacha, 2015-01-08; 1 file, -12/+14)

  The *LoadExt* legalization handling used to only have one type, the
  memory type. This forced users to assume that as long as the extload for
  the memory type was declared legal, and the result type was legal, the
  whole extload was legal.

  However, this isn't always the case. For instance, on X86, with AVX,
  this is legal:

    v4i32 load, zext from v4i8

  but this isn't:

    v4i64 load, zext from v4i8

  Whereas v4i64 is (arguably) legal, even without AVX2.

  Note that the same thing was done a while ago for truncstores (r46140),
  but I assume no one needed it yet for extloads, so here we go.

  Calls to getLoadExtAction were changed to add the value type, found
  manually in the surrounding code.

  Calls to setLoadExtAction were mechanically changed, by wrapping the
  call in a loop, to match previous behavior. The loop iterates over the
  MVT subrange corresponding to the memory type (FP vectors, etc...).

  I also pulled neighboring setTruncStoreActions into some of the loops;
  those shouldn't make a difference, as the additional types are illegal.
  (e.g., i128->i1 truncstores on PPC.)

  No functional change intended.

  Differential Revision: http://reviews.llvm.org/D6532

  llvm-svn: 225421
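  For illustration, a minimal sketch of the mechanical rewrite described
  above, inside a hypothetical target's TargetLowering constructor (the
  i1-promotion example is illustrative, not quoted from the commit):

    // Old form named only the memory type:
    //   setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote);
    // New form names (extension, result type, memory type), so the
    // mechanical, behavior-preserving rewrite loops the result types:
    for (MVT VT : MVT::integer_valuetypes()) {
      setLoadExtAction(ISD::SEXTLOAD, VT, MVT::i1, Promote);
      setLoadExtAction(ISD::ZEXTLOAD, VT, MVT::i1, Promote);
    }

  The extra dimension is what lets a target mark the v4i32/v4i8 case
  legal while leaving the v4i64/v4i8 case illegal.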
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. (Ahmed Bougacha, 2015-01-07; 1 file, -8/+2)

  A few loops do trickier things than just iterating on an MVT subset, so
  I'll leave them be for now. Follow-up of r225387.

  llvm-svn: 225392
* [PowerPC] Transform a README.txt entry into a FIXME (Hal Finkel, 2015-01-07; 2 files, -14/+9)

  Remove the README.txt entry regarding register allocation of CR logical
  ops, and replace it with a FIXME in PPCInstrInfo.td. The text in the
  README.txt was not really accurate, and thanks go to Pat Haugen (and
  Bill Schmidt) from IBM for clarifying what was intended and highlighting
  the relevant text in the ISA specification.

  llvm-svn: 225325
* Revert r224935 "Refactor duplicated code. No intended functionality change." (Lang Hames, 2015-01-06; 1 file, -2/+1)

  This is affecting the behavior of some ObjC++ / AArch64 test cases on
  Darwin. Reverting to get the bots green while I track down the source of
  the changed behavior.

  llvm-svn: 225311
* [PowerPC] Reuse a load operand in int->fp conversions (Hal Finkel, 2015-01-06; 3 files, -41/+142)

  int->fp conversions on PPC must be done through memory loads and stores.
  On a modern core, this process begins by storing the int value to
  memory, then loading it using a (sometimes special) FP load instruction.
  Unfortunately, we would do this even when the value to be converted was
  itself a load, and we can just use that same memory location instead of
  copying it to another first.

  There is a slight complication when handling int_to_fp(fp_to_int(x))
  pairs, because the fp_to_int operand has not been lowered when the
  int_to_fp is being lowered. We handle this specially by invoking
  fp_to_int's lowering logic (partially) and getting the necessary memory
  location (some trivial refactoring was done to make this possible).

  This is all somewhat ugly, and it would be nice if some later CodeGen
  stage could just clean this stuff up, but because doing so would involve
  modifying target-specific nodes (or instructions), it is not immediately
  clear how that would work.

  Also, remove a related entry from the README.txt for which we now
  generate reasonable code.

  llvm-svn: 225301
* [PowerPC] Remove old README.txt entry regarding struct passing (Hal Finkel, 2015-01-06; 1 file, -8/+0)

  Because of how Clang represents structs as arrays (at least on
  non-Darwin platforms), and what SROA does, etc., this is no longer a
  problem.

  llvm-svn: 225251
* [PowerPC] Add some missing names in getTargetNodeName (Hal Finkel, 2015-01-06; 1 file, -0/+7)

  These are used for debugging output; NFC.

  llvm-svn: 225249
* [PowerPC] Improve int_to_fp(fp_to_int(x)) combining (Hal Finkel, 2015-01-06; 2 files, -30/+74)

  The old target DAG combine that allowed for performing
  int_to_fp(fp_to_int(x)) without a load/store pair is updated here with
  support for unsigned integers, and to support single-precision values
  without a third rounding step, on newer cores with the appropriate
  instructions.

  llvm-svn: 225248
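  For concreteness, a source-level shape that produces such a pair (a
  sketch; the combine matches DAG nodes, not C++ source):

    // Truncation toward zero via an integer round-trip: fp_to_int
    // feeding int_to_fp. With this change, the unsigned and the
    // single-precision variants can also stay out of memory on newer
    // cores with the appropriate instructions.
    double trunc_via_i64(double X) { return (double)(long long)X; }
    float  trunc_via_u32(float X)  { return (float)(unsigned)X; }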
* Revert r225048: It broke ObjC on AArch64. (Lang Hames, 2015-01-06; 1 file, -7/+9)

  I've filed http://llvm.org/PR22100 to track this issue.

  llvm-svn: 225228
* Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC"Duncan P. N. Exon Smith2015-01-051-1/+2
| | | | | | | | | This reverts commit r225213. It's failing on multiple buildbots [1][2]. [1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032 [2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/ llvm-svn: 225222
* [PowerPC] Remove old README.txt entry (Hal Finkel, 2015-01-05; 1 file, -10/+0)

  We no longer generate horrible code for the stated function:

    void f(signed char *a, _Bool b, _Bool c) {
      signed char t = 0;
      if (b) t = *a;
      if (c) *a = t;
    }

  for which we now generate:

    .L.f:
      andi. 5, 5, 1
      cmpldi 1, 4, 0
      li 5, 0
      beq 1, .LBB0_2
      lbz 5, 0(3)
    .LBB0_2: # %if.end
      bclr 4, 1, 0
      stb 5, 0(3)
      blr

  so we don't need the README.txt entry.

  llvm-svn: 225217
* [PowerPC] Convert a README.txt entry into a better test (Hal Finkel, 2015-01-05; 1 file, -13/+0)

  We now produce the desired code as noted in the README.txt file (no
  spurious 'or'). Remove the README entry and improve the regression test.

  llvm-svn: 225214
* Use the integrated assembler by default on 32-bit PowerPC and SPARC (Brad Smith, 2015-01-05; 1 file, -2/+1)

  llvm-svn: 225213
* [PowerPC] Remove README.txt entry (Hal Finkel, 2015-01-05; 1 file, -34/+0)

  This entry has been rendered irrelevant now that we have proper CR bit
  tracking.

  llvm-svn: 225211
* [PowerPC] Add a test for truncating a shifted load (Hal Finkel, 2015-01-05; 1 file, -18/+0)

  We now produce the desired code as noted in the README.txt file. Remove
  the README entry and add a regression test.

  llvm-svn: 225209
* [PowerPC] Add another test for load/store with update (Hal Finkel, 2015-01-05; 1 file, -34/+0)

  We now produce the desired code as noted in the README.txt file. Remove
  the README entry and add a regression test.

  llvm-svn: 225205
* [PowerPC] Fold i1 extensions with other ops (Hal Finkel, 2015-01-05; 2 files, -17/+71)

  Consider this function from our README.txt file:

    int foo(int a, int b) { return (a < b) << 4; }

  We now explicitly track CR bits by default, so the comment in the
  README.txt about not really having a SETCC is no longer accurate, but we
  did generate this somewhat silly code:

    cmpw 0, 3, 4
    li 3, 0
    li 12, 1
    isel 3, 12, 3, 0
    sldi 3, 3, 4
    blr

  which generates the zext as a select between 0 and 1, and then shifts
  the result by a constant amount. Here we preprocess the DAG in order to
  fold the results of operations on an extension of an i1 value into the
  SELECT_I[48] pseudo instruction when the resulting constant can be
  materialized using one instruction (just like the 0 and 1). This was not
  implemented as a DAGCombine because the resulting code would have been
  anti-canonical and depends on replacing chained user nodes, which does
  not fit well into the lowering paradigm. Now we generate:

    cmpw 0, 3, 4
    li 3, 0
    li 12, 16
    isel 3, 12, 3, 0
    blr

  which is less silly.

  llvm-svn: 225203
* [PowerPC] Remove zexts after i32 ctlz (Hal Finkel, 2015-01-05; 2 files, -1/+11)

  The 64-bit semantics of cntlzw are not special: the 32-bit leading-zero
  count is stored as a 64-bit value in the range [0,32]. As a result, it
  is always zero extended, and it can be added to the PPCISelDAGToDAG
  peephole optimization as a frontier instruction for the removal of
  unnecessary zero extensions.

  llvm-svn: 225192
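  A reference model of the property the peephole relies on (a sketch
  mirroring the semantics stated above, not the backend code):

    #include <cstdint>

    // cntlzw counts the leading zeros of the low 32 bits and writes the
    // count, in [0, 32], zero-extended into the 64-bit destination.
    uint64_t cntlzw_model(uint32_t X) {
      unsigned N = 0;
      while (N < 32 && !(X & (0x80000000u >> N)))
        ++N;
      return N; // high 32 bits are always zero, so a zext is redundant
    }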
* [PowerPC] Remove zexts after byte-swapping loads (Hal Finkel, 2015-01-05; 2 files, -0/+16)

  lhbrx and lwbrx not only load their data with byte swapping, but also
  clear the upper 32 bits (at least). As a result, they can be added to
  the PPCISelDAGToDAG peephole optimization as frontier instructions for
  the removal of unnecessary zero extensions.

  llvm-svn: 225189
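  A reference model of why a following zero-extension is redundant (a
  sketch based on the lwbrx semantics stated above):

    #include <cstdint>

    // lwbrx loads four bytes with byte reversal and clears the upper 32
    // bits of the 64-bit destination, i.e. it already zero-extends.
    uint64_t lwbrx_model(const uint8_t *P) {
      return (uint64_t)P[0]         | ((uint64_t)P[1] << 8) |
             ((uint64_t)P[2] << 16) | ((uint64_t)P[3] << 24);
    }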
* [PowerPC] Enable speculation of cttz/ctlz (Hal Finkel, 2015-01-05; 1 file, -0/+8)

  PPC has an instruction for ctlz with defined zero behavior, and our
  lowering of cttz (provided by DAGCombine) is also efficient and
  branchless, so speculating these makes sense.

  llvm-svn: 225150
* [PowerPC] Materialize i64 constants using rotation with masking (Hal Finkel, 2015-01-05; 2 files, -26/+51)

  r225135 added the ability to materialize i64 constants using rotations
  in order to reduce the instruction count. Sometimes we can use a
  rotation only with some extra masking, so that we take advantage of the
  fact that generating a bunch of extra higher-order 1 bits is easy using
  li/lis.

  llvm-svn: 225147
* [PowerPC] Materialize i64 constants using rotation (Hal Finkel, 2015-01-04; 2 files, -29/+29)

  Materializing full 64-bit constants on PPC64 can be expensive, requiring
  up to 5 instructions depending on the locations of the non-zero bits.
  Sometimes materializing a rotated constant, and then applying the
  inverse rotation, requires fewer instructions than the direct method. If
  so, do that instead.

  In r225132, I added support for forming constants using bit inversion.
  In effect, this reverts that commit and replaces it with rotation
  support. The bit inversion is useful for turning constants that are
  mostly ones into ones that are mostly zeros (thus enabling a
  more-efficient shift-based materialization), but the same effect can be
  obtained by using negative constants and a rotate, and that is at least
  as efficient, if not more.

  llvm-svn: 225135
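  The underlying identity, as a standalone sketch (the real selection
  logic in PPCISelDAGToDAG also models instruction costs; this only shows
  that one trailing rotate recovers the original constant):

    #include <cassert>
    #include <cstdint>

    static uint64_t rotl64(uint64_t V, unsigned R) {
      R &= 63;
      return R ? (V << R) | (V >> (64 - R)) : V;
    }

    int main() {
      // If some rotation rotl64(Imm, R) is cheap to build, materialize
      // that value and undo the rotation with a single rldicl; rotating
      // back by 64 - R always recovers Imm:
      uint64_t Imm = 0x00000FFFFFFFF000ULL; // arbitrary example constant
      for (unsigned R = 0; R < 64; ++R)
        assert(rotl64(rotl64(Imm, R), (64 - R) & 63) == Imm);
      return 0;
    }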
* [PowerPC] Materialize i64 constants using bit inversion (Hal Finkel, 2015-01-04; 1 file, -2/+30)

  Materializing full 64-bit constants on PPC64 can be expensive, requiring
  up to 5 instructions depending on the locations of the non-zero bits.
  Sometimes materializing the bit-reversed constant, and then flipping the
  bits, requires fewer instructions than the direct method. If so, do that
  instead.

  llvm-svn: 225132
* [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference (Hal Finkel, 2015-01-03; 2 files, -0/+37)

  The existing code provided for specifying a global loop alignment
  preference. However, the preferred loop alignment might depend on the
  loop itself. For recent POWER cores, loops between 5 and 8 instructions
  should have 32-byte alignment (while the others are better with 16-byte
  alignment) so that the entire loop will fit in one i-cache line.

  To support this, getPrefLoopAlignment has been made virtual, and can be
  provided with an optional MachineLoop* so the target can inspect the
  loop before answering the query. The default behavior, as before, is to
  return the value set with setPrefLoopAlignment. MachineBlockPlacement
  now queries the target for each loop instead of only once per function.
  There should be no functional change for other targets.

  llvm-svn: 225117
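  A sketch of how a target might implement the new hook. The signature
  follows the commit; the body, the 5-to-8-instruction window from the
  message above, and the log2-of-bytes encoding of the return value are
  assumptions, and MyTargetLowering is a hypothetical target:

    // Hypothetical override in a PPC-like target's ISelLowering file.
    unsigned MyTargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
      if (ML) {
        unsigned NumInstrs = 0;
        for (MachineBasicBlock *MBB : ML->getBlocks())
          NumInstrs += MBB->size();
        if (NumInstrs >= 5 && NumInstrs <= 8)
          return 5; // 2^5 = 32 bytes: the loop fits one i-cache line
      }
      // Fall back to the global value set with setPrefLoopAlignment.
      return TargetLowering::getPrefLoopAlignment(ML);
    }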
* [PowerPC] Use 16-byte alignment for modern cores for functions/loops (Hal Finkel, 2015-01-03; 2 files, -4/+45)

  Most modern PowerPC cores prefer that functions and loops start on
  16-byte-aligned boundaries (*), so instruct block placement, etc. to
  make this happen. The branch selector has also been adjusted to account
  for the extra nops that might now be inserted before loop headers.

  (*) Some cores actually prefer other alignments for small loops, but
  that will be addressed in a follow-up commit.

  llvm-svn: 225115
* Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers (Craig Topper, 2015-01-03; 1 file, -2/+1)

  Make sure they all have llvm_unreachable on the default path out of the
  switch. Remove unnecessary "default: break". Remove a 'return' after
  unreachable. Fix some indentation.

  llvm-svn: 225114
* [PowerPC] Add support for the CMPB instruction (Hal Finkel, 2015-01-03; 8 files, -8/+278)

  Newer POWER cores, and the A2, support the cmpb instruction. This
  instruction compares its operands, treating each of the 8 bytes in the
  GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for
  true) in each byte.

  Code generation support is added, in the form of a PPCISelDAGToDAG
  DAG-preprocessing routine, that recognizes patterns close to what the
  instruction computes (either exactly, or related by a constant masking
  operation), and generates the cmpb instruction (along with any necessary
  constant masking operation). This can be expanded if use cases arise.

  llvm-svn: 225106
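  A reference model of what cmpb computes (a sketch of the per-byte
  semantics described above):

    #include <cstdint>

    // Byte i of the result is 0xFF if byte i of A equals byte i of B,
    // and 0x00 otherwise.
    uint64_t cmpb_model(uint64_t A, uint64_t B) {
      uint64_t R = 0;
      for (int I = 0; I < 8; ++I) {
        uint64_t Mask = 0xFFull << (8 * I);
        if ((A & Mask) == (B & Mask))
          R |= Mask; // 'true': all ones in this byte
      }
      return R;
    }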
* [PowerPC] Use UINT64_C instead of ul (Hal Finkel, 2015-01-01; 1 file, -4/+4)

  Attempting to fix PR22078 (building on 32-bit systems) by replacing my
  careless use of 1ul as a uint64_t constant with UINT64_C(1).

  llvm-svn: 225066
* [PowerPC] Improve instruction selection bit-permuting operations (64-bit) (Hal Finkel, 2015-01-01; 2 files, -97/+868)

  This is the second installment of improvements to instruction selection
  for "bit permutation" instruction sequences. r224318 added logic for
  instruction selection for 32-bit bit permutation sequences, and this
  adds lowering for 64-bit sequences. The 64-bit sequences are more
  complicated than the 32-bit ones because:

    a) the 64-bit versions of the 32-bit rotate-and-mask instructions
       work by replicating the lower 32-bits of the value-to-be-rotated
       into the upper 32 bits -- and integrating this into the cost
       modeling for the various bit group operations is non-trivial

    b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask
       instructions cannot, in one instruction, specify the mask starting
       index, the mask ending index, and the rotation factor.

  Also, forming arbitrary 64-bit constants is more complicated than in
  32-bit mode because the number of instructions necessary is value
  dependent.

  Plus, support for 'late masking' was added: it is sometimes more
  efficient to treat the overall value as if it had no mandatory zero bits
  when planning the bit-group insertions, and then mask them in at the
  very end. Unfortunately, as the structure of the bit groups is different
  in the two cases, the more feasible implementation technique was to
  generate both instruction sequences, and then pick the shorter one.

  And finally, we now generate reasonable code for i64 bswap:

    rldicl 5, 3, 16, 0
    rldicl 4, 3, 8, 0
    rldicl 6, 3, 24, 0
    rldimi 4, 5, 8, 48
    rldicl 5, 3, 32, 0
    rldimi 4, 6, 16, 40
    rldicl 6, 3, 48, 0
    rldimi 4, 5, 24, 32
    rldicl 5, 3, 56, 0
    rldimi 4, 6, 40, 16
    rldimi 4, 5, 48, 8
    rldimi 4, 3, 56, 0

  vs. what we used to produce:

    li 4, 255
    rldicl 5, 3, 24, 40
    rldicl 6, 3, 40, 24
    rldicl 7, 3, 56, 8
    sldi 8, 3, 8
    sldi 10, 3, 24
    sldi 12, 3, 40
    rldicl 0, 3, 8, 56
    sldi 9, 4, 32
    sldi 11, 4, 40
    sldi 4, 4, 48
    andi. 5, 5, 65280
    andis. 6, 6, 255
    andis. 7, 7, 65280
    sldi 3, 3, 56
    and 8, 8, 9
    and 4, 12, 4
    and 9, 10, 11
    or 6, 7, 6
    or 5, 5, 0
    or 3, 3, 4
    or 7, 9, 8
    or 4, 6, 5
    or 3, 3, 7
    or 3, 3, 4

  which is 12 instructions, instead of 25, and seems optimal (at least in
  terms of code size).

  llvm-svn: 225056
* Add r224985 back with a fix. (Rafael Espindola, 2014-12-31; 1 file, -9/+7)

  The issue was that AArch64 has additional restrictions on when local
  relocations can be used. We have to take those into consideration when
  deciding to put an L symbol in the symbol table or not.

  Original message:

  Remove doesSectionRequireSymbols.

  In an assembly expression like

    bar:
      .long L0 + 1

  the intended semantics is that bar will contain a pointer one byte past
  L0. In sections that are merged by content (strings, 4-byte constants,
  etc.), a single position in the section doesn't give the linker enough
  information. For example, it would not be able to tell a relocation must
  point to the end of a string, since that would look just like the start
  of the next.

  The solution used in ELF is to use a relocation with a symbol if there
  is a non-zero addend.

  In MachO before this patch we would just keep all symbols in some
  sections. This would miss some cases (only cstrings on x86_64 were
  implemented) and was inefficient since most relocations have an addend
  of 0 and can be represented without the symbol.

  This patch implements the non-zero addend logic for MachO too.

  llvm-svn: 225048
* Revert "Remove doesSectionRequireSymbols."Rafael Espindola2014-12-311-7/+9
| | | | | | | | This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044
* Remove doesSectionRequireSymbols. (Rafael Espindola, 2014-12-30; 1 file, -9/+7)

  In an assembly expression like

    bar:
      .long L0 + 1

  the intended semantics is that bar will contain a pointer one byte past
  L0. In sections that are merged by content (strings, 4-byte constants,
  etc.), a single position in the section doesn't give the linker enough
  information. For example, it would not be able to tell a relocation must
  point to the end of a string, since that would look just like the start
  of the next.

  The solution used in ELF is to use a relocation with a symbol if there
  is a non-zero addend.

  In MachO before this patch we would just keep all symbols in some
  sections. This would miss some cases (only cstrings on x86_64 were
  implemented) and was inefficient since most relocations have an addend
  of 0 and can be represented without the symbol.

  This patch implements the non-zero addend logic for MachO too.

  llvm-svn: 224985
* Refactor duplicated code. (Rafael Espindola, 2014-12-29; 1 file, -1/+2)

  No intended functionality change.

  llvm-svn: 224935
* PowerPC: CTR shouldn't fire if a TLS call is in the loop (David Majnemer, 2014-12-27; 1 file, -0/+18)

  Determining the address of a TLS variable results in a function call in
  certain TLS models. This means that a simple ICmpInst might actually
  result in invalidating the CTR register.

  In such cases, do not attempt to rely on the CTR register for loop
  optimization purposes.

  This fixes PR22034.

  Differential Revision: http://reviews.llvm.org/D6786

  llvm-svn: 224890
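  The hazard in miniature (an illustrative sketch, not a test from the
  commit; under call-based TLS models such as general-dynamic, the marked
  access can lower to a __tls_get_addr call, and any call clobbers CTR):

    // A loop like this must not become a CTR-based hardware loop:
    extern __thread int Counter; // may require a TLS address call

    int sum_tls(int N) {
      int S = 0;
      for (int I = 0; I < N; ++I)
        S += Counter; // address computation may be a function call
      return S;
    }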
* [PowerPC] [FastISel] i1 constants must be zero extended (Hal Finkel, 2014-12-25; 1 file, -1/+1)

  When materializing constant i1 values, they must be zero extended. We
  represent i1 values as [0, 1], not [0, -1], in i32 registers. As it
  turns out, this code path was dead for i1 values prior to r216006 (which
  is why this did not manifest in miscompiles until recently).

  Fixes -O0 self-hosting on PPC64/Linux.

  llvm-svn: 224842
* [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64 (Hal Finkel, 2014-12-23; 5 files, -13/+70)

  On non-Darwin PPC64, the TOC reload needs to come directly after the
  bctrl instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)'
  instruction sequence is interpreted by the unwinding code in libgcc. To
  make sure these occur as a pair, as with other pairings interpreted by
  the linker, fuse the two instructions into one instruction (for code
  generation only).

  In the future, we might wish to do this by emitting CFI directives
  instead, but this solution is simpler, and mirrors what GCC does.
  Additional discussion on this point is contained in the PR.

  Fixes PR22015.

  llvm-svn: 224788
* [PowerPC] Don't mark the return-address slot as immutable (Hal Finkel, 2014-12-23; 1 file, -1/+1)

  It is tempting to mark the fixed stack slot used to store the return
  address as immutable when lowering @llvm.returnaddress(i32 0).
  Unfortunately, within the function, it is not completely immutable: it
  is written during the function prologue. When using post-RA instruction
  scheduling, the prologue instructions are available for scheduling, and
  we're not free to interchange the order of a particular store in the
  prologue with loads from that stack location.

  Fixes PR21976.

  llvm-svn: 224761
* [PowerPC] Don't attempt a 64-bit pow2 division on PPC32 (Hal Finkel, 2014-12-23; 1 file, -0/+2)

  In r224033, in moving the signed power-of-2 division expansion into
  BuildSDIVPow2, I accidentally made it possible to attempt the lowering
  for a 64-bit division on PPC32. This later asserts.

  Fixes PR21928.

  llvm-svn: 224758
* [PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC. (Craig Topper, 2014-12-18; 1 file, -12/+12)

  llvm-svn: 224498
* Enable the P8Model entry (Will Schmidt, 2014-12-17; 1 file, -1/+1)

  This was missed last time around, for the P8 Instruction Scheduling
  changes (r223257). This will hook the P8Model entry in so those changes
  will actually be used.

  llvm-svn: 224452
* [PowerPC] Improve instruction selection bit-permuting operations (32-bit) (Hal Finkel, 2014-12-16; 2 files, -55/+480)

  The PowerPC backend, somewhat embarrassingly, did not generate an
  optimal-length sequence of instructions for a 32-bit bswap. While adding
  a pattern for the bswap intrinsic to fix this would not have been
  terribly difficult, doing so would not have addressed the real problem:
  we had been generating poor code for many bit-permuting operations (by
  which I mean things like byte-swap that permute the bits of one or more
  inputs around in various ways).

  Here are some initial steps toward solving this deficiency.
  Bit-permuting operations are represented, at the SDAG level, using
  ISD::ROTL, SHL, SRL, AND and OR (mostly with constant second operands).
  Looking back through these operations, we can build up a description of
  the bits in the resulting value in terms of bits of one or more input
  values (and constant zeros). For each bit, we compute the rotation
  amount from the original value, and then group consecutive (value,
  rotation factor) bits into groups. Groups sharing these attributes are
  then collected and sorted, and we can then instruction select the entire
  permutation using a combination of masked rotations (rlwinm), imm ands
  (andi/andis), and masked rotation inserts (rlwimi).

  The result is that instead of lowering an i32 bswap as:

    rlwinm 5, 3, 24, 16, 23
    rlwinm 4, 3, 24, 0, 7
    rlwimi 4, 3, 8, 8, 15
    rlwimi 5, 3, 8, 24, 31
    rlwimi 4, 5, 0, 16, 31

  we now produce:

    rlwinm 4, 3, 8, 0, 31
    rlwimi 4, 3, 24, 16, 23
    rlwimi 4, 3, 24, 0, 7

  and for the 'test6' example in the PowerPC/README.txt file:

    unsigned test6(unsigned x) {
      return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
    }

  we used to produce:

    lis 4, 255
    rlwinm 3, 3, 16, 0, 31
    ori 4, 4, 255
    and 3, 3, 4

  and now we produce:

    rlwinm 4, 3, 16, 24, 31
    rlwimi 4, 3, 16, 8, 15

  and, as a nice bonus, this fixes the FIXME in
  test/CodeGen/PowerPC/rlwimi-and.ll.

  This commit does not include instruction-selection for i64 operations;
  those will come later.

  llvm-svn: 224318
* [PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc (Hal Finkel, 2014-12-14; 1 file, -18/+49)

  PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove
  unwanted truncations and extensions when dealing with nodes of the form:

    zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...))

  There was a FIXME in the implementation (now removed) regarding the fact
  that the function would abort the transformations if any of the
  non-output operands of a SELECT or SELECT_CC node would need to be
  promoted (because they were also output operands, for example). As a
  result, we continued to generate unnecessary zero-extends for code such
  as this:

    unsigned foo(unsigned a, unsigned b) {
      return (a <= b) ? a : b;
    }

  which would produce:

    cmplw 0, 3, 4
    isel 3, 4, 3, 1
    rldicl 3, 3, 0, 32
    blr

  and now we produce:

    cmplw 0, 3, 4
    isel 3, 4, 3, 1
    blr

  which is better in the obvious way.

  llvm-svn: 224213
* [PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts (Hal Finkel, 2014-12-12; 3 files, -5/+310)

  On PPC64, we end up with lots of i32 -> i64 zero extensions, not only
  from all of the usual places, but also from the ABI, which specifies
  that values passed are zero extended. Almost all 32-bit PPC instructions
  in PPC64 mode are defined to do *something* to the higher-order bits,
  and for some instructions, that action clears those bits (thus providing
  a zero-extended result). This is especially common after rotate-and-mask
  instructions. Adding an additional instruction to zero-extend the
  results of these instructions is unnecessary.

  This PPCISelDAGToDAG peephole optimization examines these
  zero-extensions, and looks back through their operands to see if all
  instructions will implicitly zero extend their results. If so, we
  convert these instructions to their 64-bit variants (which is an
  internal change only; the actual encoding of these instructions is the
  same as the original 32-bit ones) and remove the unnecessary
  zero-extension (changing where the INSERT_SUBREG instructions are to
  make everything internally consistent).

  llvm-svn: 224169
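  For concreteness, an illustrative case (a sketch, not a test from the
  commit): the masked shift below is selected to a rotate-and-mask style
  instruction whose definition already clears bits 32-63, so the final
  i32 -> i64 extension would cost an instruction for nothing:

    #include <cstdint>

    uint64_t masked_shift(uint32_t X) {
      uint32_t T = (X >> 4) & 0xFFu; // rotate-and-mask style operation
      return T;                      // the zext the peephole removes
    }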
* [PowerPC] Better lowering for add/or of a FrameIndex (Hal Finkel, 2014-12-11; 2 files, -30/+38)

  If we have an add (or an or that is really an add), where one operand is
  a FrameIndex and the other operand is a small constant, we can combine
  the lowering of the FrameIndex (which is lowered as an add of the FI and
  a zero offset) with the constant operand.

  Amusingly, this is an old potential improvement entry from
  lib/Target/PowerPC/README.txt which had never been resolved.

  In short, we used to lower:

    %X = alloca { i32, i32 }
    %Y = getelementptr {i32,i32}* %X, i32 0, i32 1
    ret i32* %Y

  as:

    addi 3, 1, -8
    ori 3, 3, 4
    blr

  and now we produce:

    addi 3, 1, -4
    blr

  which is much more sensible.

  llvm-svn: 224071
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default (Matthias Braun, 2014-12-11; 1 file, -13/+9)

  Previously print+verify passes were added in a very unsystematic way,
  which is annoying when debugging, as you miss intermediate steps, and
  allows bugs to stay unnoticed when no verification is performed.

  To make this change practical I added the possibility to explicitly
  disable verification. I used this option on all places where no
  verification was performed previously (because a lot of places actually
  don't pass the MachineVerifier). In the long term these problems should
  be fixed properly and verification enabled after each pass. I'll enable
  some more verification in subsequent commits.

  This is the 2nd attempt at this, after realizing that PassManager::add()
  may actually delete the pass.

  llvm-svn: 224059
* This reverts commit r224043 and r224042. (Rafael Espindola, 2014-12-11; 1 file, -9/+13)

  check-llvm was failing.

  llvm-svn: 224045
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default (Matthias Braun, 2014-12-11; 1 file, -13/+9)

  Previously print+verify passes were added in a very unsystematic way,
  which is annoying when debugging, as you miss intermediate steps, and
  allows bugs to stay unnoticed when no verification is performed.

  To make this change practical I added the possibility to explicitly
  disable verification. I used this option on all places where no
  verification was performed previously (because a lot of places actually
  don't pass the MachineVerifier). In the long term these problems should
  be fixed properly and verification enabled after each pass. I'll enable
  some more verification in subsequent commits.

  llvm-svn: 224042
* [PowerPC] Implement BuildSDIVPow2, lower i64 pow2 sdiv using sradi (Hal Finkel, 2014-12-11; 3 files, -30/+58)

  PPCISelDAGToDAG contained existing code to lower i32 sdiv by a
  power-of-2 using srawi/addze, but did not implement the i64 case.
  DAGCombine now contains a callback specifically designed for this
  purpose (BuildSDIVPow2), and part of the logic has been moved to an
  implementation of that callback. Doing this lowering using BuildSDIVPow2
  likely does not matter, compared to handling everything in
  PPCISelDAGToDAG, for the positive divisor case, but the negative divisor
  case, which generates an additional negation, can potentially benefit
  from additional folding from DAGCombine. Now, both the i32 and the i64
  cases have been implemented.

  Fixes PR20732.

  llvm-svn: 224033
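  A reference model of the positive-divisor expansion (a sketch of the
  standard bias-and-shift sequence; on PPC64 the shifts map to sradi as
  described above, while the in-tree code builds the equivalent DAG
  nodes):

    #include <cstdint>

    // x / 2^K with truncation toward zero, for 1 <= K <= 63. Negative
    // values are biased by 2^K - 1 before the arithmetic shift. Assumes
    // '>>' on a signed value is an arithmetic shift, as on mainstream
    // compilers.
    int64_t sdiv_pow2(int64_t X, unsigned K) {
      int64_t Sign = X >> 63;                     // all ones if X < 0
      uint64_t Bias = (uint64_t)Sign >> (64 - K); // 2^K - 1 for negatives
      return (int64_t)((uint64_t)X + Bias) >> K;  // final arithmetic shift
    }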
* [PowerPC 4/4] Enable little-endian support for VSX. (Bill Schmidt, 2014-12-09; 1 file, -7/+0)

  With the foregoing three patches, VSX instructions can be used for
  little endian. This patch removes the restriction that prevented this,
  and re-enables the test cases from the first three patches.

  llvm-svn: 223792