path: root/llvm/lib
Commit message / Author / Age / Files / Lines
...
* Use a loop to simplify the runtime unrolling prologue.Kevin Qin2014-09-291-118/+130
| Runtime unrolling creates a prologue to execute the extra iterations that
| are left over when the trip count is not evenly divisible by the unroll
| factor. It generates an if-then-else sequence to jump into a loop body
| that is unrolled (factor - 1) times, like
|
|     extraiters = tripcount % loopfactor
|     if (extraiters == 0) jump Loop:
|     if (extraiters == loopfactor) jump L1
|     if (extraiters == loopfactor-1) jump L2
|     ...
|     L1: LoopBody;
|     L2: LoopBody;
|     ...
|     if tripcount < loopfactor jump End
|     Loop:
|     ...
|     End:
|
| This means that if the unroll factor is 4, the loop body is unrolled 7
| times: 3 copies are in the loop prologue and 4 are in the loop. This commit
| uses a loop to execute the extra iterations in the prologue, like
|
|            extraiters = tripcount % loopfactor
|            if (extraiters == 0) jump Loop:
|            else jump Prol
|     Prol:  LoopBody;
|            extraiters -= 1                 // Omitted if unroll factor is 2.
|            if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
|            if (tripcount < loopfactor) jump End
|     Loop:
|     ...
|     End:
|
| Then, when the unroll factor is 4, the loop body is copied only 5 times:
| 1 copy in the prologue loop and 4 in the original loop. If the unroll
| factor is 2, no new loop is created, just as in the original solution.
|
| llvm-svn: 218604
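The prologue-loop shape described above can be sketched at source level. This is only an illustration, not the transformation LLVM performs on IR; the function name `sumUnrolledBy4` and the choice of a summation body are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `data` with the loop body unrolled by a factor of 4.
// The leftover iterations (tripCount % 4) run first in a small prologue
// loop, mirroring the "Prol:" block in the commit message.
long sumUnrolledBy4(const std::vector<int> &data) {
  const std::size_t tripCount = data.size();
  const std::size_t factor = 4;
  std::size_t extraIters = tripCount % factor;
  long sum = 0;
  std::size_t i = 0;

  // Prologue loop: a single copy of the body, executed extraIters times.
  while (extraIters != 0) {
    sum += data[i++];
    --extraIters;
  }

  // After the prologue, the remaining count is a multiple of the factor.
  assert((tripCount - i) % factor == 0);

  // Main loop: four copies of the body per iteration.
  for (; i < tripCount; i += factor) {
    sum += data[i];
    sum += data[i + 1];
    sum += data[i + 2];
    sum += data[i + 3];
  }
  return sum;
}
```

With a factor of 4 the body appears five times in this sketch (once in the prologue loop, four times in the main loop), which matches the count given in the commit message.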
* [Thumb2] ldrexd and strexd are not defined on v7MOliver Stannard2014-09-291-2/+4
| | | | | | | The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. llvm-svn: 218603
* [x86] Make the new vector shuffle lowering lower blends as VSELECTChandler Carruth2014-09-291-196/+149
| | | | | | | | | | | | | | | | | | | | | nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't *actually* bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT when it has constant operands dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. llvm-svn: 218600
* Remove dead code from DIBuilderJyoti Allur2014-09-291-43/+7
| | | | llvm-svn: 218593
* [x86] Delete a bunch of really bad and totally unnecessary code in theChandler Carruth2014-09-291-114/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in v218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. llvm-svn: 218589
* [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domainChandler Carruth2014-09-291-18/+57
| | | | | | | | | | | | | | | | | crossing and generally work more like the blend emission code in the new vector shuffle lowering. My goal is to have the new vector shuffle lowering just produce VSELECT nodes that are either matched here to BLENDI or are legal and matched in the .td files to specific blend instructions. That seems much cleaner as there are other ways to produce a VSELECT anyways. =] No *observable* functionality changed yet, mostly because this code appears to be near-dead. The behavior of this lowering routine did change though. This code being mostly dead and untestable will change with my next commit which will also point some new tests at it. llvm-svn: 218588
* [x86] Improve naming and comments for VSELECT lowering.Chandler Carruth2014-09-291-5/+6
| | | | | | No functionality changed. llvm-svn: 218586
* [x86] Add the dispatch skeleton to the new vector shuffle lowering forChandler Carruth2014-09-291-1/+143
| | | | | | | | | | | | | | | | | AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585
* [x86] Make the split-and-lower routine fully generic by relaxing theChandler Carruth2014-09-291-18/+18
| | | | | | | | | assertion, making the name generic, and improving the documentation. Step 1 in adding very primitive support for AVX-512. No functionality changed yet. llvm-svn: 218584
* [x86] Teach the new vector shuffle lowering to fall back on AVX-512Chandler Carruth2014-09-281-0/+5
| | | | | | | | | | | | | | vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. llvm-svn: 218583
* [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2Chandler Carruth2014-09-281-4/+16
| | | | | | | | | | | | | | | | | lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there *only* the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these *should* be. llvm-svn: 218582
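The two mismatches described above (operand order, and "only the high bit matters") can be modelled with scalar code. This is a sketch under stated assumptions: `selectLanes` and `blendvLanes` are hypothetical helpers that emulate a VSELECT-style select and an x86 variable-blend on four byte lanes; they are not LLVM or intrinsic APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 4>; // a tiny four-lane "vector"

// VSELECT-style select: (condition, value-if-true, value-if-false).
// Each condition lane is expected to be all-ones (-1) or all-zeros.
Bytes selectLanes(const Bytes &cond, const Bytes &ifTrue,
                  const Bytes &ifFalse) {
  Bytes out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(cond[i] == 0x00 || cond[i] == 0xFF);
    out[i] = cond[i] ? ifTrue[i] : ifFalse[i];
  }
  return out;
}

// x86-style variable blend: (first source, second source, mask).
// The second source is taken wherever the mask lane has its high bit set;
// the remaining mask bits are ignored.
Bytes blendvLanes(const Bytes &a, const Bytes &b, const Bytes &mask) {
  Bytes out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (mask[i] & 0x80) ? b[i] : a[i];
  return out;
}
```

In this model, `selectLanes(cond, t, f)` agrees with `blendvLanes(f, t, cond)` for all-ones/all-zeros masks, and a mask lane such as 0x80 still selects the second source in the blend, which is the behaviour the commit message calls out.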
* Add MachineOperand::ChangeToFPImmediate and setFPImmMatt Arsenault2014-09-281-7/+25
| | | | llvm-svn: 218579
* [x86] Fix a really silly bug that I introduced fixing another bug in theChandler Carruth2014-09-281-1/+1
| | | | | | | | | | new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: llvm-svn: 218576
* [x86] Fix yet another bug in the new vector shuffle lowering's handlingChandler Carruth2014-09-281-7/+16
| | | | | | | | | | | | | | | | | | of widening masks. We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element. Also clean up the code here by ordering the checks in a more logical way and by using the symbolic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case. llvm-svn: 218575
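The widening rule stated above can be sketched as a standalone function. This is an illustration, not the code in the X86 backend: `widenMask`, `kUndef`, and `kZero` are hypothetical names, and while the commit message states that -1 means undef, the value -2 for the zero sentinel is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kUndef = -1; // undef sentinel, as stated in the commit message
constexpr int kZero = -2;  // zero sentinel; the value is assumed for this sketch

// Tries to merge adjacent pairs of mask elements into elements twice as wide.
// Returns false if some pair cannot be merged (in that case `wide` holds only
// the pairs processed so far). Assumes the only negative entries are the two
// sentinels above.
bool widenMask(const std::vector<int> &mask, std::vector<int> &wide) {
  assert(mask.size() % 2 == 0 && "need an even number of elements");
  wide.clear();
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    const int lo = mask[i];
    const int hi = mask[i + 1];
    const bool loSpecial = lo < 0;
    const bool hiSpecial = hi < 0;
    if (lo == kZero || hi == kZero) {
      // A zeroed element only widens if its partner is zeroed or undef.
      if (!loSpecial || !hiSpecial)
        return false;
      wide.push_back(kZero);
    } else if (loSpecial && hiSpecial) {
      wide.push_back(kUndef); // both halves undef
    } else if (loSpecial) {
      // Low half undef: the high half must be the odd element of a pair.
      if (hi % 2 != 1)
        return false;
      wide.push_back(hi / 2);
    } else if (hiSpecial) {
      // High half undef: the low half must be the even element of a pair.
      if (lo % 2 != 0)
        return false;
      wide.push_back(lo / 2);
    } else {
      // Both real: they must form an aligned, consecutive pair.
      if (lo % 2 != 0 || hi != lo + 1)
        return false;
      wide.push_back(lo / 2);
    }
  }
  return true;
}
```

For example, the mask {zero, undef, 4, 5} widens to {zero, 2}, while {zero, 1} is rejected because the zeroed element would be merged with a real one.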
* WinCOFFObjectWriter.cpp: make write_uint32_le more efficientHans Wennborg2014-09-281-6/+4
| | | | llvm-svn: 218574
* [AArch64] Redundant store instructions should be removed as dead codeJames Molloy2014-09-271-0/+11
| | | | | | | | | | | | | | | If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! llvm-svn: 218569
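The idea of dropping a store that repeats the immediately preceding one can be shown on a toy instruction list. This is a simplified model, not the AArch64 patch itself: the `Store` struct and `dropRepeatedStores` are hypothetical, and the model assumes non-volatile stores with nothing in between that could change the registers or the memory involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy store instruction: store register `valueReg` to [baseReg + offset].
struct Store {
  int valueReg;
  int baseReg;
  int offset;
};

bool sameStore(const Store &a, const Store &b) {
  return a.valueReg == b.valueReg && a.baseReg == b.baseReg &&
         a.offset == b.offset;
}

// Removes a store when it is identical to the store directly before it.
// Only adjacent duplicates are dropped, so no intervening instruction can
// have changed the stored value or the address.
std::vector<Store> dropRepeatedStores(const std::vector<Store> &input) {
  std::vector<Store> output;
  for (const Store &s : input) {
    if (!output.empty() && sameStore(output.back(), s))
      continue; // same value to the same location: a no-op
    output.push_back(s);
  }
  return output;
}
```

Applied to the pair from the commit message (two copies of `stur w10, [x11, #-4]`), the sketch keeps one store.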
* Fix llvm::huge_valf multiple initializations with Visual C++.Yaron Keren2014-09-272-0/+33
| | | | | | | | | | | | | | | | | | | llvm::huge_valf is defined in a header file, so it is initialized multiple times in every compiled unit upon program startup. With non-VC compilers huge_valf is set to a HUGE_VALF which the compiler can probably optimize out. With VC numeric_limits<float>::infinity() does not return a number but a runtime structure member which theoretically may change between calls so the compiler does not optimize out the initialization and it happens many times. It can be easily seen by placing a breakpoint on the initialization line. This patch moves llvm::huge_valf initialization to a source file instead of the header. llvm-svn: 218567
* [x86] Fix yet another issue with widening vector shuffle elements.Chandler Carruth2014-09-271-2/+2
| | | | | | | | I spotted this by inspection when debugging something else, so I have no test case what-so-ever, and am not even sure it is possible to realistically trigger the bug. But this is what was intended here. llvm-svn: 218565
* [x86] Fix terrible bugs everywhere in the new vector shuffle loweringChandler Carruth2014-09-271-23/+54
| | | | | | | | | | | | | | | | | | | | | | and in the target shuffle combining when trying to widen vector elements. Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non- target shuffle masks now). This isn't just a missed optimization, this caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that. There are other fixes to the test suite as a consequence of this as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual *value* of the undef and zero sentinels. I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful. llvm-svn: 218562
* [x86] Flip the sentinel values used in the target shuffle mask decodingChandler Carruth2014-09-271-1/+1
| | | | | | | | | | | | to significantly more sane sentinels. Notably, everywhere else in the backend's representation of shuffles uses '-1' to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code. This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a *lot* of bugs here so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific sentinel values used. llvm-svn: 218561
* Refactor reciprocal and reciprocal square root estimate into ↵Sanjay Patel2014-09-263-208/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553
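The rewrite of a division into a multiplication by a refined estimate can be sketched numerically. This is not the TargetLowering hook: the `crude*Estimate` functions are hypothetical stand-ins for a hardware estimate instruction (they simply perturb the exact result by about 1%), and the refinement shown is the standard Newton-Raphson iteration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for hardware estimate instructions: the exact
// result with roughly 1% relative error.
float crudeRecipEstimate(float x) { return (1.0f / x) * 1.01f; }
float crudeRsqrtEstimate(float x) { return (1.0f / std::sqrt(x)) * 0.99f; }

// Newton-Raphson refinement of an estimate of 1/x.
float refineRecip(float x, float est, int steps) {
  for (int i = 0; i < steps; ++i)
    est = est * (2.0f - x * est);
  return est;
}

// Newton-Raphson refinement of an estimate of 1/sqrt(x).
float refineRsqrt(float x, float est, int steps) {
  for (int i = 0; i < steps; ++i)
    est = est * (1.5f - 0.5f * x * est * est);
  return est;
}

// z = y / x computed as z = y * rcpe(x), with `steps` refinement steps.
float divideViaEstimate(float y, float x, int steps) {
  return y * refineRecip(x, crudeRecipEstimate(x), steps);
}

// z = y / sqrt(x) computed as z = y * rsqrte(x), with `steps` refinements.
float divideBySqrtViaEstimate(float y, float x, int steps) {
  return y * refineRsqrt(x, crudeRsqrtEstimate(x), steps);
}
```

Each refinement step roughly squares the relative error, which is why a target only needs to report a small number of steps alongside the estimate opcode.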
* Object: BSS/virtual sections don't have contentsDavid Majnemer2014-09-262-0/+13
| | | | | | | | | | | | Users of getSectionContents shouldn't try to pass in BSS or virtual sections. In all instances, this is a bug in the code calling this routine. N.B. Some COFF implementations (like CL) will mark their BSS sections as taking space on disk. This would confuse COFFObjectFile into thinking the section is larger than the file. llvm-svn: 218549
* clang-format of ChangeStdinToBinary & ChangeStdoutToBinary.Yaron Keren2014-09-261-4/+4
| | | | llvm-svn: 218547
* [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logicChandler Carruth2014-09-261-5/+10
| | | | | | | | | | | that managed to elude all of my fuzz testing historically. =/ Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path. llvm-svn: 218540
* [IndVar] Don't widen loop compare unless IV user is sign extended.Chad Rosier2014-09-261-2/+6
| | | | | | PR21030 llvm-svn: 218539
* R600/SI: Use break instead of continueMatt Arsenault2014-09-261-1/+1
| | | | | | If an instruction doesn't have src1, it doesn't have src2 llvm-svn: 218536
* R600/SI: Add a note about the order of the operands to div_scaleMatt Arsenault2014-09-261-0/+6
| | | | llvm-svn: 218534
* R600/SI: Move finding SGPR operand to move to separate functionMatt Arsenault2014-09-262-63/+71
| | | | llvm-svn: 218533
* R600/SI Allow same SGPR to be used for multiple operandsMatt Arsenault2014-09-261-5/+32
| | | | | | | | | Instead of moving the first SGPR that is different from the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. llvm-svn: 218532
* R600/SI: Partially move operand legalization to post-isel hook.Matt Arsenault2014-09-264-70/+41
| | | | | | | | | Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands llvm-svn: 218531
* R600/SI: Implement findCommutedOpIndicesMatt Arsenault2014-09-262-1/+36
| | | | | | | | | | | The base implementation of commuteInstruction is used in some cases, but it turns out this has been broken for a long time since modifiers were inserted between the real operands. The base implementation of commuteInstruction also fails on immediates, which also needs to be fixed. llvm-svn: 218530
* R600/SI: Don't move operands that are required to be SGPRsMatt Arsenault2014-09-261-1/+20
| | | | | | | | e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529
* R600/SI: Don't assert on exotic operand typesMatt Arsenault2014-09-261-1/+1
| | | | | | | | | This needs a test, but I'm not sure if it is currently possible and I originally hit it due to a bug. Right now the only global address operands have no reason to be VALU instructions, although it theoretically could be a problem. llvm-svn: 218528
* R600/SI: Fix using wrong operand indices when commutingMatt Arsenault2014-09-261-11/+20
| | | | | | | | | | | | | No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527
* R600/SI: Remove apparently dead code in legalizeOperandsMatt Arsenault2014-09-261-8/+0
| | | | | | No tests hit this, and I don't see any way a GlobalAddress node would survive beyond lowering on SI. If it would, the move should probably be inserted by selection. llvm-svn: 218526
* Ignore annotation function calls in cost computationDavid Peixotto2014-09-261-0/+1
| | | | | | | | | | | The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold. Differential Revision: http://reviews.llvm.org/D5335 llvm-svn: 218525
* [x86] The mnemonic is SHUFPS not SHUPFS. =[ I'm very bad at spellingChandler Carruth2014-09-261-3/+3
| | | | | | sadly. llvm-svn: 218524
* [x86] In the new vector shuffle lowering, when trying to do anotherChandler Carruth2014-09-261-10/+11
| | | | | | | | layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] Otherwise the whole thing cycles infinitely. Test case added, another one found through fuzz testing. llvm-svn: 218523
* [x86] Fix a large collection of bugs that crept in as I fleshed out theChandler Carruth2014-09-261-13/+27
| | | | | | | | | | | | AVX support. New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[ These were all detected by fuzz-testing. (I <3 fuzz testing.) llvm-svn: 218522
* Elide repeated register operand in Thumb1 instructionsRenato Golin2014-09-261-1/+43
| | | | | | | | | | | | | | | | | | | This patch makes the ARM backend transform 3 operand instructions such as 'adds/subs' to the 2 operand version of the same instruction if the first two register operands are the same. Example: 'adds r0, r0, #1' is transformed to 'adds r0, #1'. Currently for some instructions such as 'adds' if you try to assemble 'adds r0, r0, #8' for thumb v6m the assembler would throw an error message because the immediate cannot be encoded using 3 bits. The backend should be smart enough to transform the instruction to 'adds r0, #8', which allows for larger immediate constants. Patch by Ranjeet Singh. llvm-svn: 218521
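The choice between the two Thumb1 'adds' immediate forms can be sketched as a small decision function. This is an illustration, not the assembler's code: `chooseThumb1AddsImmForm` and the `AddsForm` enum are hypothetical names. The 3-bit limit comes from the commit message; the 8-bit limit for the 2-operand form is taken from the Thumb1 encoding and is stated here as background.

```cpp
#include <cassert>

enum class AddsForm { ThreeOperand, TwoOperand, NotEncodable };

// Picks an encoding for 'adds rd, rn, #imm' on Thumb1 low registers.
// The 3-operand form holds a 3-bit immediate (0..7); the 2-operand form
// 'adds rdn, #imm' holds an 8-bit immediate (0..255) but needs rd == rn.
AddsForm chooseThumb1AddsImmForm(unsigned rd, unsigned rn, unsigned imm) {
  assert(rd < 8 && rn < 8 && "sketch only covers low registers r0-r7");
  if (rd == rn && imm <= 255)
    return AddsForm::TwoOperand; // elide the repeated register operand
  if (imm <= 7)
    return AddsForm::ThreeOperand;
  return AddsForm::NotEncodable;
}
```

Under this model, 'adds r0, r0, #8' becomes the 2-operand form instead of being rejected, while 'adds r0, r1, #8' remains unencodable as a single instruction.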
* [X86][SchedModel] SSE reciprocal square root instruction latencies.Andrea Di Biagio2014-09-267-15/+39
| | | | | | | | | | | | | | | | | The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with Newton-Raphson iterations) this poor scheduling was affecting performance. This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's table. Differential Revision: http://reviews.llvm.org/D5370 Patch by Simon Pilgrim. llvm-svn: 218517
* Revert "Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a ↵Frederic Riss2014-09-262-23/+16
| | | | | | | | | | | single DWARFUnitSection." This reverts commit r218513. Buildbots using libstdc++ issue an error when trying to copy SmallVector<std::unique_ptr<>>. Revert the commit until we have a fix. llvm-svn: 218514
* Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single ↵Frederic Riss2014-09-262-16/+23
| | | | | | | | | | | | | | | | | | | | | DWARFUnitSection. Summary: There will be multiple TypeUnits in an unlinked object that will be extracted from different sections. Now that we have DWARFUnitSection that is supposed to represent an input section, we need a DWARFUnitSection<TypeUnit> per input .debug_types section. Once this is done, the interface is homogenous and we can move the Section parsing code into DWARFUnitSection. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5482 llvm-svn: 218513
* Fix unused variable warning added in r218509Daniel Sanders2014-09-261-1/+0
| | | | llvm-svn: 218510
* [mips] Generalize the handling of f128 return values to support f128 arguments.Daniel Sanders2014-09-263-50/+112
| | | | | | | | | | | | | | | | | | Summary: This will allow us to handle f128 arguments without duplicating code from CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands(). No functional change. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5292 llvm-svn: 218509
* [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables.Robert Khasanov2014-09-262-6/+61
| | | | | | Added lowering tests for these instructions. llvm-svn: 218508
* Fix build breakage on MSVC 2013David Majnemer2014-09-261-1/+1
| | | | llvm-svn: 218499
* Target: Fix build breakage.David Majnemer2014-09-261-2/+2
| | | | | | No functional change intended. llvm-svn: 218497
* Support: Remove undefined behavior from &raw_ostream::operator<<David Majnemer2014-09-261-1/+1
| | | | | | | Don't negate signed integer types in &raw_ostream::operator<<(const FormattedNumber &FN). llvm-svn: 218496
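Why negating a signed integer is a problem, and one way to avoid it, can be sketched as follows. This is not the change made in raw_ostream: `magnitude` is a hypothetical helper that shows the general technique of doing the negation in an unsigned type, where wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns |n| as an unsigned value without ever negating a signed integer.
// Writing -n directly would be undefined behaviour for the minimum value,
// because its magnitude does not fit in std::int64_t.
std::uint64_t magnitude(std::int64_t n) {
  const std::uint64_t bits = static_cast<std::uint64_t>(n);
  // Unsigned subtraction wraps modulo 2^64, so 0 - bits is well defined.
  return n < 0 ? std::uint64_t{0} - bits : bits;
}
```

A formatting routine can then print a '-' sign when the input is negative and format the unsigned magnitude, including for the most negative value.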
* Revert patch of r218493David Xu2014-09-261-14/+0
| | | | llvm-svn: 218494