summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.Juergen Ributzka2014-09-301-135/+220
| | | | | | | | | | | | | | The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653
* [FastISel][AArch64] Factor out scale factor calculation. NFC.Juergen Ributzka2014-09-301-35/+29
| | | | | | | Factor out the code that determines the implicit scale factor of memory operations for a given value type. llvm-svn: 218652
* Simplify conditional.Eric Christopher2014-09-291-1/+1
| | | | llvm-svn: 218643
* [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAsAdam Nemet2014-09-291-155/+135
| | | | | | | | | | | | | | No functionality change. Makes the code more compact (see the FMA part). This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only defined this in the simple cases. See the commment before the attribute. Diff of X86.td.expanded before and after is empty except for the appearance of the new attribute. llvm-svn: 218637
* WinCOFFObjectWriter: optimize the string table for common sufficesHans Wennborg2014-09-293-91/+62
| | | | | | | | This is a follow-up from r207670 which did the same for ELF. Differential Revision: http://reviews.llvm.org/D5530 llvm-svn: 218636
* Add soft-float to the key for the subtarget lookup in the TargetMachineEric Christopher2014-09-291-1/+13
| | | | | | | | | | | map, this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. Update one testcase accordingly (and fix some confusing naming) and add a new testcase as well with the ordering swapped which would highlight the problem. llvm-svn: 218632
* Fix spelling and reflow comments.Eric Christopher2014-09-291-6/+5
| | | | llvm-svn: 218631
* [AArch64] Refines the Cortex-A57 Machine ModelDave Estes2014-09-292-24/+395
| | | | | | | | | | | | | | | Primarily refines all of the instructions with accurate latency and micro-op information. Refinements largely focus on the NEON instructions. Additionally, a few advanced features are modeled, including forwarding for MAC instructions and hazards for floating point SQRT and DIV. Lastly, the issue-width is reduced to three so that the scheduler will better accommodate the narrower decode and dispatch width. llvm-svn: 218627
* Unit test r218187, changing RTDyldMemoryManager::getSymbolAddress's behavior ↵David Blaikie2014-09-291-1/+1
| | | | | | | | | | | favor mangled lookup over unmangled lookup. The contract of this function seems problematic (fallback in either direction seems like it could produce bugs in one client or another), but here's some tests for its current behavior, at least. See the commit/review thread of r218187 for more discussion. llvm-svn: 218626
* Fix include orderMatt Arsenault2014-09-291-1/+1
| | | | llvm-svn: 218611
* R600/SI: Fix hardcoded values for modifiers.Matt Arsenault2014-09-294-19/+22
| | | | | | Move enums to SIDefines.h llvm-svn: 218610
* R600/SI: Also fix fsub + fadd a, a to mad combinesMatt Arsenault2014-09-291-0/+22
| | | | llvm-svn: 218609
* R600/SI: Fix using mad with multiplies by 2Matt Arsenault2014-09-291-0/+35
| | | | | | | | | These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608
* [AArch64] Improve cost model to handle sdiv by a pow-of-two.Chad Rosier2014-09-291-0/+23
| | | | | | | | | | | This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607
* Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single ↵Frederic Riss2014-09-293-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | DWARFUnitSection. There will be multiple TypeUnits in an unlinked object that will be extracted from different sections. Now that we have DWARFUnitSection that is supposed to represent an input section, we need a DWARFUnitSection<TypeUnit> per input .debug_types section. Once this is done, the interface is homogenous and we can move the Section parsing code into DWARFUnitSection. This is a respin of r218513 that got reverted because it broke some builders. This new version features an explicit move constructor for the DWARFUnitSection class to workaround compilers unable to generate correct C++11 default constructors. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5482 llvm-svn: 218606
* Use a loop to simplify the runtime unrolling prologue.Kevin Qin2014-09-291-118/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Runtime unrolling will create a prologue to execute the extra iterations which is can't divided by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit is to use a loop to execute the extra iterations in prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when unroll factor is 4, the loop body will be copied by only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, new loop won't be created, just as the original solution. llvm-svn: 218604
* [Thumb2] ldrexd and strexd are not defined on v7MOliver Stannard2014-09-291-2/+4
| | | | | | | The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. llvm-svn: 218603
* [x86] Make the new vector shuffle lowering lower blends as VSELECTChandler Carruth2014-09-291-196/+149
| | | | | | | | | | | | | | | | | | | | | nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't *actually* bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers thing this would look better with VSELECT when it has constant operands dumping over tho VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. llvm-svn: 218600
* Remove dead code from DIBuilderJyoti Allur2014-09-291-43/+7
| | | | llvm-svn: 218593
* [x86] Delete a bunch of really bad and totally unnecessary code in theChandler Carruth2014-09-291-114/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in v218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. llvm-svn: 218589
* [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domainChandler Carruth2014-09-291-18/+57
| | | | | | | | | | | | | | | | | crossing and generally work more like the blend emission code in the new vector shuffle lowering. My goal is to have the new vector shuffle lowering just produce VSELECT nodes that are either matched here to BLENDI or are legal and matched in the .td files to specific blend instructions. That seems much cleaner as there are other ways to produce a VSELECT anyways. =] No *observable* functionality changed yet, mostly because this code appears to be near-dead. The behavior of this lowering routine did change though. This code being mostly dead and untestable will change with my next commit which will also point some new tests at it. llvm-svn: 218588
* [x86] Improve naming and comments for VSELECT lowering.Chandler Carruth2014-09-291-5/+6
| | | | | | No functionality changed. llvm-svn: 218586
* [x86] Add the dispatch skeleton to the new vector shuffle lowering forChandler Carruth2014-09-291-1/+143
| | | | | | | | | | | | | | | | | AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585
* [x86] Make the split-and-lower routine fully generic by relaxing theChandler Carruth2014-09-291-18/+18
| | | | | | | | | assertion, making the name generic, and improving the documentation. Step 1 in adding very primitive support for AVX-512. No functionality changed yet. llvm-svn: 218584
* [x86] Teach the new vector shuffle lowering to fall back on AVX-512Chandler Carruth2014-09-281-0/+5
| | | | | | | | | | | | | | vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. llvm-svn: 218583
* [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2Chandler Carruth2014-09-281-4/+16
| | | | | | | | | | | | | | | | | lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there *only* the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these *should* be. llvm-svn: 218582
* Add MachineOperand::ChangeToFPImmediate and setFPImmMatt Arsenault2014-09-281-7/+25
| | | | llvm-svn: 218579
* [x86] Fix a really silly bug that I introduced fixing another bug in theChandler Carruth2014-09-281-1/+1
| | | | | | | | | | new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: llvm-svn: 218576
* [x86] Fix yet another bug in the new vector shuffle lowering's handlingChandler Carruth2014-09-281-7/+16
| | | | | | | | | | | | | | | | | | of widening masks. We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element. Also clean up the code here by ordering the checks in a more logical way and by using the symoblic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case. llvm-svn: 218575
* WinCOFFObjectWriter.cpp: make write_uint32_le more efficientHans Wennborg2014-09-281-6/+4
| | | | llvm-svn: 218574
* [AArch64] Redundant store instructions should be removed as dead codeJames Molloy2014-09-271-0/+11
| | | | | | | | | | | | | | | If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! llvm-svn: 218569
* Fix llvm::huge_valf multiple initializations with Visual C++.Yaron Keren2014-09-272-0/+33
| | | | | | | | | | | | | | | | | | | llvm::huge_valf is defined in a header file, so it is initialized multiple times in every compiled unit upon program startup. With non-VC compilers huge_valf is set to a HUGE_VALF which the compiler can probably optimize out. With VC numeric_limits<float>::infinity() does not return a number but a runtime structure member which therotically may change between calls so the compiler does not optimize out the initialization and it happens many times. It can be easily seen by placing a breakpoint on the initialization line. This patch moves llvm::huge_valf initialization to a source file instead of the header. llvm-svn: 218567
* [x86] Fix yet another issue with widening vector shuffle elements.Chandler Carruth2014-09-271-2/+2
| | | | | | | | I spotted this by inspection when debugging something else, so I have no test case what-so-ever, and am not even sure it is possible to realistically trigger the bug. But this is what was intended here. llvm-svn: 218565
* [x86] Fix terrible bugs everywhere in the new vector shuffle loweringChandler Carruth2014-09-271-23/+54
| | | | | | | | | | | | | | | | | | | | | | and in the target shuffle combining when trying to widen vector elements. Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non- target shuffle masks now). This isn't just a missed optimization, this caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that. There are other fixes to the test suite as a consequence of this as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual *value* of the undef and zero sentinels. I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful. llvm-svn: 218562
* [x86] Flip the sentinel values used in the target shuffle mask decodingChandler Carruth2014-09-271-1/+1
| | | | | | | | | | | | | | to significantly more sane sentinels. Notably, everywhere else in the backend's representation of shuffles uses '-1' to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code. This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a *lot* of bugs here so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific senitnel values used. llvm-svn: 218561
* Refactor reciprocal and reciprocal square root estimate into ↵Sanjay Patel2014-09-263-208/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553
* Object: BSS/virtual sections don't have contentsDavid Majnemer2014-09-262-0/+13
| | | | | | | | | | | | Users of getSectionContents shouldn't try to pass in BSS or virtual sections. In all instances, this is a bug in the code calling this routine. N.B. Some COFF implementations (like CL) will mark their BSS sections as taking space on disk. This would confuse COFFObjectFile into thinking the section is larger than the file. llvm-svn: 218549
* clang-format of ChangeStdinToBinary & ChangeStdoutToBinary.Yaron Keren2014-09-261-4/+4
| | | | llvm-svn: 218547
* [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logicChandler Carruth2014-09-261-5/+10
| | | | | | | | | | | that managed to elude all of my fuzz testing historically. =/ Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path. llvm-svn: 218540
* [IndVar] Don't widen loop compare unless IV user is sign extended.Chad Rosier2014-09-261-2/+6
| | | | | | PR21030 llvm-svn: 218539
* R600/SI: Use break instead of continueMatt Arsenault2014-09-261-1/+1
| | | | | | If an instruction doesn't have src1, it doesn't have src2 llvm-svn: 218536
* R600/SI: Add a note about the order of the operands to div_scaleMatt Arsenault2014-09-261-0/+6
| | | | llvm-svn: 218534
* R600/SI: Move finding SGPR operand to move to separate functionMatt Arsenault2014-09-262-63/+71
| | | | llvm-svn: 218533
* R600/SI Allow same SGPR to be used for multiple operandsMatt Arsenault2014-09-261-5/+32
| | | | | | | | | | | Instead of moving the first SGPR that is different than the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. llvm-svn: 218532
* R600/SI: Partially move operand legalization to post-isel hook.Matt Arsenault2014-09-264-70/+41
| | | | | | | | | Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands llvm-svn: 218531
* R600/SI: Implement findCommutedOpIndicesMatt Arsenault2014-09-262-1/+36
| | | | | | | | | | | The base implementation of commuteInstruction is used in some cases, but it turns out this has been broken for a long time since modifiers were inserted between the real operands. The base implementation of commuteInstruction also fails on immediates, which also needs to be fixed. llvm-svn: 218530
* R600/SI: Don't move operands that are required to be SGPRsMatt Arsenault2014-09-261-1/+20
| | | | | | | | e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529
* R600/SI: Don't assert on exotic operand typesMatt Arsenault2014-09-261-1/+1
| | | | | | | | | This needs a test, but I'm not sure if it is currently possible and I originally hit it due to a bug. Right now the only global address operands have no reason to be VALU instructions, although it theoretically could be a problem. llvm-svn: 218528
* R600/SI: Fix using wrong operand indices when commutingMatt Arsenault2014-09-261-11/+20
| | | | | | | | | | | | | No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527
* R600/SI: Remove apparently dead code in legalizeOperandsMatt Arsenault2014-09-261-8/+0
| | | | | | | | No tests hit this, and I don't see any way a GlobalAddress node would survive beyond lowering on SI. It it would, the move should probably be inserted by selection. llvm-svn: 218526
OpenPOWER on IntegriCloud