summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC] Don't hard-code R2 as register when processing TOC relocationsHal Finkel2015-01-181-3/+3
| | | | | | | | | | | Instructions that have high-order TOC relocations always carry R2 as their base register, so it does not matter whether we take the register from the instruction or just hard-code it in PPCAsmPrinter. In the future, however, we might want to apply these relocations to instructions using a different register, so taking the register from the instruction is a better thing to do. No change in functionality here, however. llvm-svn: 226403
* [PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR movesHal Finkel2015-01-181-0/+6
| | | | | | | So we don't forget, once we support FPR <-> GPR moves on the P8, we'll likely want to re-visit this part of the calling convention. llvm-svn: 226401
* [PowerPC] Initial PPC64 calling-convention changes for fastccHal Finkel2015-01-182-63/+161
| | | | | | | | | | | | | | | | | The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is designed to work with both prototyped and non-prototyped/varargs functions. As a result, GPRs and stack space are allocated for every argument, even those that are passed in floating-point or vector registers. GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that do not have their address taken) to use the 'fast' calling convention. When functions are using the 'fast' calling convention, don't allocate GPRs for arguments passed in other types of registers, and don't allocate stack space for arguments passed in registers. Other changes for the fast calling convention may be added in the future. llvm-svn: 226399
* [PM] Split the LoopInfo object apart from the legacy pass, creatingChandler Carruth2015-01-171-4/+4
| | | | | | | | | | a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373
* [PowerPC] Don't list R11 as a patchpoint scratch registerHal Finkel2015-01-171-9/+1
| | | | | | | | | | R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is reserved for use as an "environment pointer" for compilation models that require such a thing. We don't, we also don't need a second scratch register, and because we support only "local" patchpoint call targets, we might as well let R11 be used for anyregcc patchpoints. llvm-svn: 226369
* [PowerPC] Adjust PatchPoints for ppc64leHal Finkel2015-01-161-1/+9
| | | | | | | | | | Bill Schmidt pointed out that some adjustments would be needed to properly support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available as a scratch register, so we need to use R12. R12 is also available under ELF V1, so to maintain consistency, I flipped the order to make R12 the first scratch register in the array under both ABIs. llvm-svn: 226247
* [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect callsHal Finkel2015-01-158-80/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the POWER7, A2 and earlier cores) are really pointers to a function descriptor, a structure with three pointers: the actual pointer to the code to which to jump, the pointer to the TOC needed by the callee, and an environment pointer. We used to chain these loads, and make them opaque to the rest of the optimizer, so that they'd always occur directly before the call. This is not necessary, and in fact, highly suboptimal on embedded cores. Once the function pointer is known, the loads can be performed ahead of time; in fact, they can be hoisted out of loops. Now these function descriptors are almost always generated by the linker, and thus the contents of the descriptors are invariant. As a result, by default, we'll mark the associated loads as invariant (allowing them to be hoisted out of loops). I've added a target feature to turn this off, however, just in case someone needs that option (constructing an on-stack descriptor, casting it to a function pointer, and then calling it cannot be well-defined C/C++ code, but I can imagine some JIT-compilation system doing so). Consider this simple test: $ cat call.c typedef void (*fp)(); void bar(fp x) { for (int i = 0; i < 1600000000; ++i) x(); } $ cat main.c typedef void (*fp)(); void bar(fp x); void foo() {} int main() { bar(foo); } On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads as invariant brings the execution time down to ~8 seconds from ~32 seconds with the loads in the loop. The difference on the POWER7 is smaller. Compiling with: gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2] clang -O3 -mcpu=native call.c main.c : ~5.3 seconds clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds (looks like we'd benefit from additional loop unrolling here, as a first guess, because this is faster with the extra loads) The -mno-invariant-function-descriptors will be added to Clang shortly. llvm-svn: 226207
* [PM] Separate the TargetLibraryInfo object from the immutable pass.Chandler Carruth2015-01-151-1/+2
| | | | | | | | | | | | | | The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157
* [PM] Move TargetLibraryInfo into the Analysis library.Chandler Carruth2015-01-151-1/+1
| | | | | | | | | | | | | | | | While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078
* [PowerPC] Add assembler support for mcrfs and friendsHal Finkel2015-01-153-4/+69
| | | | | | | | | | Fill out our support for the floating-point status and control register instructions (mcrfs and friends). As it turns out, these are necessary for compiling src/test/harness_fp.h in TBB for PowerPC. Thanks to Raf Schietekat for reporting the issue! llvm-svn: 226070
* [PPC64] Add support for the ICBT instruction on POWER8.Bill Schmidt2015-01-144-15/+22
| | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Support for the ICBT instruction is currently present, but limited to embedded processors. This change adds a new FeatureICBT that can be used to identify whether the ICBT instruction is available on a specific processor. Two new tests are added: * Positive test to ensure the icbt instruction is present when using -mcpu=pwr8 * Negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7 Both test cases use the Prefetch opcode in LLVM. They are based on the ppc64-prefetch.ll test case. llvm-svn: 226033
* Revert "Add r224985 back with two fixes."Rafael Espindola2015-01-141-7/+9
| | | | | | This reverts commit r225644 while I debug a regression. llvm-svn: 226022
* [cleanup] Re-sort all the #include lines in LLVM usingChandler Carruth2015-01-145-5/+5
| | | | | | | | | | | utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974
* Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""Hal Finkel2015-01-1411-94/+323
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs. These problems caused the original regression tests to assert/segfault on many (but not all) systems. Original commit message: This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225909
* Use the integrated assembler as default on PowerPCUlrich Weigand2015-01-131-2/+1
| | | | | | | | | | | This was already done in clang, this commit now uses the integrated assembler as default when using LLVM tools directly. A number of test cases using inline asm had to be adapted, either by updating the expected output, or by using -no-integrated-as (for such tests that deliberately use an invalid instruction in inline asm). llvm-svn: 225819
* Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"Hal Finkel2015-01-1311-318/+98
| | | | | | | Reverting this while I investiage buildbot failures (segfaulting in GetCostForDef at ScheduleDAGRRList.cpp:314). llvm-svn: 225811
* [PowerPC] Add missing override keywordHal Finkel2015-01-131-1/+1
| | | | llvm-svn: 225809
* [PowerPC] Add StackMap/PatchPoint supportHal Finkel2015-01-1311-98/+318
| | | | | | | | | | | | | | | | | | | | | | | | | This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225808
* [PowerPC] Split the blr definition into BLR and BLR8Hal Finkel2015-01-135-6/+13
| | | | | | | | | | We really need a separate 64-bit version of this instruction so that it can be marked as clobbering LR8 (instead of just LR). No change in functionality (although the verifier might be slightly happier), however, it is required for stackmap/patchpoint support. Thus, this will be covered by stackmap test cases once those are added. llvm-svn: 225804
* [PowerPC] Add DWARF numbers for CA (XER), etc.Hal Finkel2015-01-131-5/+4
| | | | | | | | | | | | For registers that have DWARF numbers (like CA, which is really part of XER), add them. Also, RM is not an SPR, and the declaration hack (where it is declared as an SPR with an arbitrary number) is not needed, so just declare it as a register. NFC; although CA's register number will be needed when stackmap/patchpoint support is added. llvm-svn: 225800
* Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now ↵Olivier Sallenave2015-01-132-0/+7
| | | | | | guarded with that hook. llvm-svn: 225795
* Add r224985 back with two fixes.Rafael Espindola2015-01-121-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One is that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. The other is that ld64 requires the relocations to cstring to use linker visible symbols on AArch64. Thanks to Michael Zolotukhin for testing this! Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225644
* [PowerPC] Fix calls to non-function objectsHal Finkel2015-01-121-5/+22
| | | | | | | | | | | | | | | | | | Looking at r225438 inspired me to see how the PowerPC backend handled the situation (calling a bitcasted TLS global), and it turns out we also produced an error (cannot select ...). What it means to "call" something that is not a function is implementation and platform specific, but in the name of doing something (besides crashing), this makes sure we do what GCC does (treat all such calls as calls through a function pointer -- meaning that the pointer is assumed, as is the convention on PPC, to point to a function descriptor structure holding the actual code address along with the function's TOC pointer and environment pointer). As GCC does, we now do the same for calling regular (non-TLS) non-function globals too. I'm not sure whether this is the most useful way to define the behavior, but at least we won't be alone. llvm-svn: 225617
* [PowerPC] Mark zext of a small scalar load as freeHal Finkel2015-01-102-0/+22
| | | | | | | | | | | | | This initial implementation of PPCTargetLowering::isZExtFree marks as free zexts of small scalar loads (that are not sign-extending). This callback is used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to determine whether a zext or an anyext is used to lower illegally-typed PHIs. Because later truncates of zero-extended values are nops, this allows for the elimination of later unnecessary truncations. Fixes the initial complaint associated with PR22120. llvm-svn: 225584
* Remove some whitespace.Justin Hibbits2015-01-101-1/+1
| | | | llvm-svn: 225583
* Fully fix Bug #22115.Justin Hibbits2015-01-102-6/+53
| | | | | | | | | | | | | | | | | Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty results. Test Plan: Tests updated Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6906 llvm-svn: 225573
* [PowerPC] Readjust the loop unrolling thresholdHal Finkel2015-01-102-4/+4
| | | | | | | | Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566
* Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revisionLang Hames2015-01-091-1/+2
| | | | | | | | | | introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. llvm-svn: 225534
* [PowerPC] Enable late partial unrolling on the POWER7Hal Finkel2015-01-093-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522
* [PowerPC] Add a flag for experimenting with subreg liveness trackingHal Finkel2015-01-092-0/+10
| | | | | | | This cannot yet be enabled by default, it causes ~50 miscompiles in the test suite. llvm-svn: 225497
* [PowerPC] Fold [sz]ext with fp_to_int lowering where possibleHal Finkel2015-01-092-4/+61
| | | | | | | On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32 to i64 into the load necessary for an i64 -> fp conversion. llvm-svn: 225493
* [PowerPC] Mark all instructions as non-cheap for MachineLICMHal Finkel2015-01-081-0/+9
| | | | | | | | | | | | MachineLICM uses a callback named hasLowDefLatency to determine if an instruction def operand has a 'low' latency. If all relevant operands have a 'low' latency, the instruction is considered too cheap to hoist out of loops even in low-register-pressure situations. On PowerPC cores, both the embedded cores and the others, there is no reason to believe that this is a good choice: all instructions have a cost inside a loop, and hoisting them when not limited by register pressure is a reasonable default. llvm-svn: 225471
* Add saving and restoring of r30 to the prologue and epilogue, respectivelyJustin Hibbits2015-01-082-0/+17
| | | | | | | | | | | | | | | | Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 llvm-svn: 225450
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.Ahmed Bougacha2015-01-071-8/+2
| | | | | | | | A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392
* [PowerPC] Transform a README.txt entry into a FIXMEHal Finkel2015-01-072-14/+9
| | | | | | | | | | Remove the README.txt entry regarding register allocation of CR logical ops, and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM for clarifying what was intended and highlighting the relevant text in the ISA specification. llvm-svn: 225325
* Revert r224935 "Refactor duplicated code. No intended functionality change."Lang Hames2015-01-061-2/+1
| | | | | | | | This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. llvm-svn: 225311
* [PowerPC] Reuse a load operand in int->fp conversionsHal Finkel2015-01-063-41/+142
| | | | | | | | | | | | | | | | | | | | | | | int->fp conversions on PPC must be done through memory loads and stores. On a modern core, this process begins by storing the int value to memory, then loading it using a (sometimes special) FP load instruction. Unfortunately, we would do this even when the value to be converted was itself a load, and we can just use that same memory location instead of copying it to another first. There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs, because the fp_to_int operand has not been lowered when the int_to_fp is being lowered. We handle this specially by invoking fp_to_int's lowering logic (partially) and getting the necessary memory location (some trivial refactoring was done to make this possible). This is all somewhat ugly, and it would be nice if some later CodeGen stage could just clean this stuff up, but because doing so would involve modifying target-specific nodes (or instructions), it is not immediately clear how that would work. Also, remove a related entry from the README.txt for which we now generate reasonable code. llvm-svn: 225301
* [PowerPC] Remove old README.txt entry regarding struct passingHal Finkel2015-01-061-8/+0
| | | | | | | Because of how Clang represents structs as arrays (at least on non-Darwin platforms), and what SROA does, etc. this is no longer a problem. llvm-svn: 225251
* [PowerPC] Add some missing names in getTargetNodeNameHal Finkel2015-01-061-0/+7
| | | | | | These are used for debugging output; NFC. llvm-svn: 225249
* [PowerPC] Improve int_to_fp(fp_to_int(x)) combiningHal Finkel2015-01-062-30/+74
| | | | | | | | | The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x)) without a load/store pair is updated here with support for unsigned integers, and to support single-precision values without a third rounding step, on newer cores with the appropriate instructions. llvm-svn: 225248
* Revert r225048: It broke ObjC on AArch64.Lang Hames2015-01-061-7/+9
| | | | | | I've filed http://llvm.org/PR22100 to track this issue. llvm-svn: 225228
* Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC"Duncan P. N. Exon Smith2015-01-051-1/+2
| | | | | | | | | This reverts commit r225213. It's failing on multiple buildbots [1][2]. [1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032 [2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/ llvm-svn: 225222
* [PowerPC] Remove old README.txt entryHal Finkel2015-01-051-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | We no longer generate horrible code for the stated function: void f(signed char *a, _Bool b, _Bool c) { signed char t = 0; if (b) t = *a; if (c) *a = t; } for which we now generate: .L.f: andi. 5, 5, 1 cmpldi 1, 4, 0 li 5, 0 beq 1, .LBB0_2 lbz 5, 0(3) .LBB0_2: # %if.end bclr 4, 1, 0 stb 5, 0(3) blr so we don't need the README.txt entry. llvm-svn: 225217
* [PowerPC] Convert a README.txt entry into a better testHal Finkel2015-01-051-13/+0
| | | | | | | We now produce the desired code as noted in the README.txt file (no spurious or). Remove the README entry and improve the regression test. llvm-svn: 225214
* Use the integrated assembler by default on 32-bit PowerPC and SPARCBrad Smith2015-01-051-2/+1
| | | | llvm-svn: 225213
* [PowerPC] Remove README.txt entryHal Finkel2015-01-051-34/+0
| | | | | | | This entry has been rendered irrelevant now that we have proper CR bit tracking. llvm-svn: 225211
* [PowerPC] Add a test for truncating a shifted loadHal Finkel2015-01-051-18/+0
| | | | | | | We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225209
* [PowerPC] Add another test for load/store with updateHal Finkel2015-01-051-34/+0
| | | | | | | We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225205
* [PowerPC] Fold i1 extensions with other opsHal Finkel2015-01-052-17/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider this function from our README.txt file: int foo(int a, int b) { return (a < b) << 4; } We now explicitly track CR bits by default, so the comment in the README.txt about not really having a SETCC is no longer accurate, but we did generate this somewhat silly code: cmpw 0, 3, 4 li 3, 0 li 12, 1 isel 3, 12, 3, 0 sldi 3, 3, 4 blr which generates the zext as a select between 0 and 1, and then shifts the result by a constant amount. Here we preprocess the DAG in order to fold the results of operations on an extension of an i1 value into the SELECT_I[48] pseudo instruction when the resulting constant can be materialized using one instruction (just like the 0 and 1). This was not implemented as a DAGCombine because the resulting code would have been anti-canonical and depends on replacing chained user nodes, which does not fit well into the lowering paradigm. Now we generate: cmpw 0, 3, 4 li 3, 0 li 12, 16 isel 3, 12, 3, 0 blr which is less silly. llvm-svn: 225203
OpenPOWER on IntegriCloud