path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. (Juergen Ributzka, 2014-09-30; 1 file, -54/+116)
  Note: This version fixed an issue with the TBZ/TBNZ instructions that were
  generated in FastISel. The issue was that the 64-bit version of TBZ (TBZX)
  automagically sets the upper bit of the immediate field that is used to
  specify the bit we want to test. To test any of the lower 32 bits we have
  to first extract the subregister and use the 32-bit version of the TBZ
  instruction (TBZW).
  Original commit message:
  Teach selectBranch to fold a bit test and branch into a single instruction
  (TBZ or TBNZ).
  llvm-svn: 218693
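A minimal Python model of the semantics being folded (not LLVM's implementation; function names are illustrative). It shows the bit-test-and-branch condition, and why a test of one of the low 32 bits of a 64-bit value goes through the 32-bit subregister form:

```python
def tbz_taken(reg: int, bit: int) -> bool:
    """TBZ branches when the tested bit is zero; TBNZ when it is one."""
    return (reg >> bit) & 1 == 0

def fastisel_bit_test(reg64: int, bit: int) -> bool:
    # Sketch of the fix described above: bits 32-63 can use TBZX directly,
    # but bits 0-31 must extract the 32-bit subregister and use TBZW,
    # because the X form implies the top bit of the bit-position field.
    if bit < 32:
        return tbz_taken(reg64 & 0xFFFFFFFF, bit)  # TBZW on the sub-register
    return tbz_taken(reg64, bit)                   # TBZX
```

Either path computes the same branch condition; only the instruction encoding differs.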
* R600/SI: Fix printing of clamp and omod (Matt Arsenault, 2014-09-30; 4 files, -17/+55)
  No tests for omod since nothing uses it yet, but this should get rid of the
  remaining annoying trailing zeros after some instructions.
  llvm-svn: 218692
* R600/SI: Update VOP3b to not include obsolete operands (Matt Arsenault, 2014-09-30; 3 files, -15/+16)
  abs/neg are now part of the srcN_modifiers operands.
  llvm-svn: 218691
* Extend C disassembler API to allow specifying target features (Bradley Smith, 2014-09-30; 1 file, -10/+16)
  llvm-svn: 218682
* Add numeric extend and truncate to mips fast-isel (Reed Kotler, 2014-09-30; 1 file, -5/+168)
  Summary: Add numeric extend and truncate to mips fast-isel. Reactivates D4827.
  Test Plan: fpext.ll, loadstoreconv.ll
  Reviewers: dsanders
  Subscribers: mcrosier
  Differential Revision: http://reviews.llvm.org/D5251
  llvm-svn: 218681
* [AArch64] Remove unnecessary whitespace. (Test commit) (Tom Coxon, 2014-09-30; 1 file, -2/+2)
  llvm-svn: 218680
* [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle. (Andrea Di Biagio, 2014-09-30; 1 file, -4/+4)
  Currently, the DAG Combiner only tries to convert type-legal build_vector
  nodes into shuffles. This patch simply moves the logic that checks if a
  build_vector has a legal value type up before we even start analyzing the
  operands. This allows method 'visitBUILD_VECTOR' to exit early immediately
  if the node type is known to be illegal.
  No functional change intended.
  llvm-svn: 218677
* llvm-cov: Use the number of executed functions for the function coverage metric. (Alex Lorenz, 2014-09-30; 1 file, -1/+3)
  This commit fixes llvm-cov's function coverage metric by using the number
  of executed functions instead of the number of fully covered functions.
  Differential Revision: http://reviews.llvm.org/D5196
  llvm-svn: 218672
* Introduce support for custom wrappers for vararg functions. (Lorenzo Martignoni, 2014-09-30; 1 file, -9/+18)
  Differential Revision: http://reviews.llvm.org/D5412
  llvm-svn: 218671
* [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. (Robert Khasanov, 2014-09-30; 1 file, -0/+12)
  Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>
  llvm-svn: 218670
* [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} (Robert Khasanov, 2014-09-30; 3 files, -7/+47)
  Fixed lowering of these intrinsics for the cases when the mask is v2i1 or
  v4i1. Now cmp intrinsics lower in the following way:
    (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
    (i8 (bitcast
      (v8i1 (insert_subvector undef,
        (v2i1 (and (PCMPEQM %a, %b),
                   (extract_subvector (v8i1 (bitcast %mask)), 0))), 0))))
  llvm-svn: 218669
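The lowering above computes a per-lane equality compare ANDed with the incoming mask bits, packed into an i8. A small Python model of that result value (illustrative only, not LLVM code):

```python
def mask_pcmpeq(a, b, mask):
    """Model of the masked compare: bit i of the i8 result is
    (a[i] == b[i]) AND bit i of the incoming mask; unused high
    bits of the i8 stay zero, matching the insert_subvector of undef->0."""
    out = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y and (mask >> i) & 1:
            out |= 1 << i
    return out
```

For a v2i64 compare only bits 0 and 1 of the mask and result participate.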
* [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. (Robert Khasanov, 2014-09-30; 2 files, -18/+25)
  Added new operand type for intrinsics (IIT_V64).
  llvm-svn: 218668
* [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. (Robert Khasanov, 2014-09-30; 2 files, -3/+30)
  Added CMP_MASK intrinsic type.
  llvm-svn: 218667
* Make sure aggregates are properly aligned on MSP430. (Job Noorman, 2014-09-30; 1 file, -1/+1)
  llvm-svn: 218665
* [IndVarSimplify] Widen loop unsigned compares. (Chad Rosier, 2014-09-30; 1 file, -6/+2)
  This patch extends r217953 to handle unsigned comparisons.
  Phabricator revision: http://reviews.llvm.org/D5526
  llvm-svn: 218659
* [x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. (Chandler Carruth, 2014-09-30; 1 file, -149/+261)
  Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE
  DAG nodes into VSELECT DAG nodes for all blends because we are going to
  *have* to lower to VSELECT nodes for some blends to trigger the
  instruction selection patterns of variable blend instructions. This
  doesn't actually work out so well.

  In order to match performance with the existing VECTOR_SHUFFLE lowering
  code, we would need to re-slice the blend in order to fit it into either
  the integer or floating point blends available on the ISA. When coming
  from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well
  because the X86 backend ensures that these types of operands to VSELECT
  get sign extended into '-1' and '0' for true and false, allowing us to
  re-slice the bits in whatever granularity without changing semantics.
  However, if the VSELECT condition comes from some other source, for
  example code lowering vector comparisons, it will likely only have the
  required bit set -- the high bit. We can't blindly slice up this style of
  VSELECT. Reid found some code using Halide that triggers this and I'm
  hopeful to eventually get a test case, but I don't need it to understand
  why this is A Bad Idea.

  There is another aspect that makes this approach flawed. When in
  VECTOR_SHUFFLE form, we have very distilled information that represents
  the *constant* blend mask. Converting back to a VSELECT form actually can
  lose this information, and so I think now that it is better to treat this
  as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes
  for instruction selection purposes.

  My plan is to:
  1) Clean up and formalize the target pre-legalization DAG combine that
     converts a VSELECT with a constant condition operand into a
     VECTOR_SHUFFLE.
  2) Remove any fancy lowering from VSELECT during *legalization*, relying
     entirely on the DAG combine to catch cases where we can match to an
     immediate-controlled blend instruction.

  One additional step that I'm not planning on but would be interested in
  others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
  which encodes a fully legalized VSELECT node. Then it would be easy to
  write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
  legalization only ever forms the fully legalized construct and we can't
  cycle between it and VSELECT combining.
  llvm-svn: 218658
* Fix missing C++ mode comment (Matt Arsenault, 2014-09-30; 1 file, -1/+1)
  llvm-svn: 218654
* [FastISel][AArch64] Fold sign-/zero-extends into the load instruction. (Juergen Ributzka, 2014-09-30; 1 file, -135/+220)
  The sign-/zero-extension of the loaded value can be performed by the
  memory instruction for free. If the result of the load has only one use
  and the use is a sign-/zero-extend, then we emit the proper load
  instruction. The extend is only a register copy and will be optimized
  away later on.
  Other instructions that consume the sign-/zero-extended value are also
  made aware of this fact, so they don't fold the extend too.
  This fixes rdar://problem/18495928.
  llvm-svn: 218653
* [FastISel][AArch64] Factor out scale factor calculation. NFC. (Juergen Ributzka, 2014-09-30; 1 file, -35/+29)
  Factor out the code that determines the implicit scale factor of memory
  operations for a given value type.
  llvm-svn: 218652
* Simplify conditional. (Eric Christopher, 2014-09-29; 1 file, -1/+1)
  llvm-svn: 218643
* [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs (Adam Nemet, 2014-09-29; 1 file, -155/+135)
  No functionality change. Makes the code more compact (see the FMA part).
  This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I
  only defined this in the simple cases. See the comment before the
  attribute.
  Diff of X86.td.expanded before and after is empty except for the
  appearance of the new attribute.
  llvm-svn: 218637
* WinCOFFObjectWriter: optimize the string table for common suffixes (Hans Wennborg, 2014-09-29; 3 files, -91/+62)
  This is a follow-up from r207670, which did the same for ELF.
  Differential Revision: http://reviews.llvm.org/D5530
  llvm-svn: 218636
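Suffix sharing lets a string that is a suffix of an already-emitted string reuse the longer string's storage at an offset, since both end at the same NUL terminator. A hedged Python sketch of the idea (illustrative, not the WinCOFFObjectWriter algorithm, which is more sophisticated about ordering and lookup cost):

```python
def tail_merge(strings):
    """Build a NUL-terminated string table, reusing storage for strings
    that are suffixes of longer ones. Returns (table, {string: offset})."""
    table = ""
    offsets = {}
    # Emit longest strings first so shorter suffixes can point into them.
    for s in sorted(strings, key=len, reverse=True):
        idx = table.find(s + "\0")  # a suffix match must include the NUL
        if idx == -1:
            idx = len(table)
            table += s + "\0"
        offsets[s] = idx
    return table, offsets
```

For example, "llo" can be stored as a pointer into "hello" rather than as a separate entry.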
* Add soft-float to the key for the subtarget lookup in the TargetMachine map. (Eric Christopher, 2014-09-29; 1 file, -1/+13)
  This makes sure that we can compile the same code for two different ABIs
  (hard and soft float) in the same module.
  Update one testcase accordingly (and fix some confusing naming) and add a
  new testcase as well with the ordering swapped, which would highlight the
  problem.
  llvm-svn: 218632
* Fix spelling and reflow comments. (Eric Christopher, 2014-09-29; 1 file, -6/+5)
  llvm-svn: 218631
* [AArch64] Refines the Cortex-A57 Machine Model (Dave Estes, 2014-09-29; 2 files, -24/+395)
  Primarily refines all of the instructions with accurate latency and
  micro-op information. Refinements largely focus on the NEON instructions.
  Additionally, a few advanced features are modeled, including forwarding
  for MAC instructions and hazards for floating point SQRT and DIV. Lastly,
  the issue-width is reduced to three so that the scheduler will better
  accommodate the narrower decode and dispatch width.
  llvm-svn: 218627
* Unit test r218187, changing RTDyldMemoryManager::getSymbolAddress's behavior to favor mangled lookup over unmangled lookup. (David Blaikie, 2014-09-29; 1 file, -1/+1)
  The contract of this function seems problematic (fallback in either
  direction seems like it could produce bugs in one client or another), but
  here are some tests for its current behavior, at least. See the
  commit/review thread of r218187 for more discussion.
  llvm-svn: 218626
* Fix include order (Matt Arsenault, 2014-09-29; 1 file, -1/+1)
  llvm-svn: 218611
* R600/SI: Fix hardcoded values for modifiers. (Matt Arsenault, 2014-09-29; 4 files, -19/+22)
  Move enums to SIDefines.h.
  llvm-svn: 218610
* R600/SI: Also fix fsub + fadd a, a to mad combines (Matt Arsenault, 2014-09-29; 1 file, -0/+22)
  llvm-svn: 218609
* R600/SI: Fix using mad with multiplies by 2 (Matt Arsenault, 2014-09-29; 1 file, -0/+35)
  These turn into fadds, so combine them into the target mad node:
    fadd (fadd (a, a), b) -> mad 2.0, a, b
  llvm-svn: 218608
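The equivalence the combine relies on, written out as a tiny Python check (function names are illustrative; mad here just models multiply-add, ignoring floating-point fusion/rounding differences):

```python
def mad(a, b, c):
    """Model of the target mad node: a * b + c."""
    return a * b + c

def combine_fadd_fadd(a, b):
    # fadd (fadd (a, a), b)  ->  mad 2.0, a, b
    return mad(2.0, a, b)
```

Doubling via `a + a` and multiplying by 2.0 are exact in binary floating point, so the rewrite is value-preserving for this pattern.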
* [AArch64] Improve cost model to handle sdiv by a pow-of-two. (Chad Rosier, 2014-09-29; 1 file, -0/+23)
  This patch improves the target-specific cost model to better handle
  signed division by a power of two. The immediate result is that this
  enables the SLP vectorizer to do a better job.
  http://reviews.llvm.org/D5469
  PR20714
  llvm-svn: 218607
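Signed division by a power of two is cheap because it lowers to a short shift sequence rather than a real divide, which is what a cost model can now price accordingly. A hedged Python sketch of the standard expansion (C-style truncating semantics; not the LLVM lowering code itself):

```python
def sdiv_pow2(n: int, log2d: int) -> int:
    """C-style (truncating) signed division by d = 2**log2d via shifts."""
    d = 1 << log2d
    sign = n >> 63                 # arithmetic shift: -1 if n < 0, else 0
                                   # (valid for |n| < 2**63)
    biased = n + (sign & (d - 1))  # bias negatives so the shift truncates
                                   # toward zero instead of -infinity
    return biased >> log2d         # arithmetic shift right
```

Without the bias, `-7 >> 2` would give -2 (floor division); the bias makes it -1, matching `sdiv`.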
* Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single DWARFUnitSection. (Frederic Riss, 2014-09-29; 3 files, -16/+28)
  There will be multiple TypeUnits in an unlinked object that will be
  extracted from different sections. Now that we have DWARFUnitSection,
  which is supposed to represent an input section, we need a
  DWARFUnitSection<TypeUnit> per input .debug_types section.
  Once this is done, the interface is homogeneous and we can move the
  section parsing code into DWARFUnitSection.
  This is a respin of r218513 that got reverted because it broke some
  builders. This new version features an explicit move constructor for the
  DWARFUnitSection class to work around compilers unable to generate
  correct C++11 default constructors.
  Reviewers: samsonov, dblaikie
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5482
  llvm-svn: 218606
* Use a loop to simplify the runtime unrolling prologue. (Kevin Qin, 2014-09-29; 1 file, -118/+130)
  Runtime unrolling will create a prologue to execute the extra iterations
  that can't be divided by the unroll factor. It generates an if-then-else
  sequence to jump into a (factor - 1) times unrolled loop body, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1: LoopBody;
    L2: LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

  It means if the unroll factor is 4, the loop body will be unrolled 7
  times: 3 are in the loop prologue, and 4 are in the loop. This commit
  uses a loop to execute the extra iterations in the prologue, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    else jump Prol
    Prol: LoopBody;
    extraiters -= 1                 // Omitted if unroll factor is 2.
    if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
    if (tripcount < loopfactor) jump End
    Loop:
    ...
    End:

  Then when the unroll factor is 4, the loop body will be copied only 5
  times: 1 in the prologue loop, 4 in the original loop. And if the unroll
  factor is 2, no new loop is created, just as in the original solution.
  llvm-svn: 218604
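The prologue-loop scheme can be simulated in a few lines of Python to check that every original iteration still runs exactly once (a model of the control flow above, not the LoopUnrollRuntime code):

```python
def run_unrolled(tripcount: int, factor: int) -> int:
    """Count loop-body executions under the prologue-loop scheme."""
    executed = 0
    extraiters = tripcount % factor
    while extraiters != 0:       # Prol: one body copy, looped
        executed += 1
        extraiters -= 1
    while executed < tripcount:  # main loop: factor body copies per pass
        executed += factor
    return executed
```

For tripcount 11 and factor 4, the prologue runs 3 bodies and the main loop two passes of 4, totaling 11, with only 5 body copies emitted instead of 7.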
* [Thumb2] ldrexd and strexd are not defined on v7M (Oliver Stannard, 2014-09-29; 1 file, -2/+4)
  The Thumb2 ldrexd and strexd instructions are not defined for M-class
  architectures.
  llvm-svn: 218603
* [x86] Make the new vector shuffle lowering lower blends as VSELECT nodes, and rely exclusively on its logic. (Chandler Carruth, 2014-09-29; 1 file, -196/+149)
  This removes a ton of duplication from the blend lowering and centralizes
  it in one place.
  One downside is that it requires a bunch of hacks to make this work with
  the current legalization framework. We have to manually speculate one
  aspect of legalizing VSELECT nodes to get everything to work nicely
  because the existing legalization framework isn't *actually* bottom-up.
  The other grossness is that we somewhat duplicate the analysis of
  constant blends. I'm on the fence here. If reviewers think this would
  look better with VSELECT when it has constant operands dumping over to
  VECTOR_SHUFFLE, we could go that way. But it would be a substantial
  change because currently all of the actual blend instructions are
  matched via patterns in the TD files based around VSELECT nodes (despite
  them not being perfect fits for that). Suggestions welcome, but at least
  this removes the rampant duplication in the backend.
  llvm-svn: 218600
* Remove dead code from DIBuilder (Jyoti Allur, 2014-09-29; 1 file, -43/+7)
  llvm-svn: 218593
* [x86] Delete a bunch of really bad and totally unnecessary code in the X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. (Chandler Carruth, 2014-09-29; 1 file, -114/+10)
  Turns out, we have perfectly good lowering of all these VSELECT nodes,
  and indeed that lowering already knows how to handle lowering through
  BLENDI to immediate-controlled blend nodes. The code just wasn't getting
  used much because this thing forced the world to go through the vector
  shuffle lowering. Yuck.
  This also exposes that I was too aggressive in avoiding domain crossing
  in r218588 with that lowering -- when the other option is to expand into
  two 128-bit vectors, it is worth domain crossing. Restore that behavior
  now that we have nice tests covering it.
  The test updates here fall into two camps. One is where previously we
  ended up with an unsigned encoding of the blend operand and now we get a
  signed encoding. In most of those places there were elaborate comments
  explaining exactly what these operands really mean. Rather than that,
  just switch these tests to use the nicely decoded comments that make it
  obvious that the final shuffle matches.
  The other updates are just removing pointless domain crossing by
  blending integers with PBLENDW rather than BLENDPS.
  llvm-svn: 218589
* [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain crossing and generally work more like the blend emission code in the new vector shuffle lowering. (Chandler Carruth, 2014-09-29; 1 file, -18/+57)
  My goal is to have the new vector shuffle lowering just produce VSELECT
  nodes that are either matched here to BLENDI or are legal and matched in
  the .td files to specific blend instructions. That seems much cleaner as
  there are other ways to produce a VSELECT anyways. =]
  No *observable* functionality changed yet, mostly because this code
  appears to be near-dead. The behavior of this lowering routine did change
  though. This code being mostly dead and untestable will change with my
  next commit which will also point some new tests at it.
  llvm-svn: 218588
* [x86] Improve naming and comments for VSELECT lowering. (Chandler Carruth, 2014-09-29; 1 file, -5/+6)
  No functionality changed.
  llvm-svn: 218586
* [x86] Add the dispatch skeleton to the new vector shuffle lowering for AVX-512. (Chandler Carruth, 2014-09-29; 1 file, -1/+143)
  There is no interesting logic yet. Everything ends up eventually
  delegating to the generic code to split the vector and shuffle the
  halves. Interestingly, that logic does a significantly better job of
  lowering all of these types than the generic vector expansion code does.
  Mostly, it lets most of the cases fall back to nice AVX2 code rather
  than all the way back to SSE code paths.
  Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
  up will be to incrementally add direct support for the basic instruction
  set to each type (adding tests first).
  llvm-svn: 218585
* [x86] Make the split-and-lower routine fully generic by relaxing the assertion, making the name generic, and improving the documentation. (Chandler Carruth, 2014-09-29; 1 file, -18/+18)
  Step 1 in adding very primitive support for AVX-512. No functionality
  changed yet.
  llvm-svn: 218584
* [x86] Teach the new vector shuffle lowering to fall back on AVX-512 vectors. (Chandler Carruth, 2014-09-28; 1 file, -0/+5)
  Someone will need to build the AVX512 lowering, which should follow AVX1
  and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added a
  dummy test which is a port of the v8f32 and v8i32 tests from AVX and
  AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
  is enough information for someone to implement proper lowering here. If
  not, I'll be happy to help, but right now the AVX-512 support isn't a
  priority for me.
  llvm-svn: 218583
* [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2 lowerings. (Chandler Carruth, 2014-09-28; 1 file, -4/+16)
  This was hopelessly broken. First, the x86 backend wants '-1' to be the
  element value representing true in a boolean vector, and second the
  operand order for VSELECT is backwards from the actual x86 instructions.
  To make matters worse, the backend is just using '-1' as the true value
  to get the high bit to be set. It doesn't actually symbolically map the
  '-1' to anything. But on x86 this isn't quite how it works: there *only*
  the high bit is relevant. As a consequence, weird non-'-1' values like
  0x80 actually "work" once you flip the operands to be backwards.
  Anyways, thanks to Hal for helping me sort out what these *should* be.
  llvm-svn: 218582
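The "only the high bit is relevant" point can be modeled in a few lines of Python: a variable-blend-style select keys each lane off the top bit of its condition element, so 0x80 behaves exactly like -1 (0xFF). This is an illustrative model, not the x86 instruction definition, and it ignores the operand-order quirk the commit discusses:

```python
def high_bit_blend(cond, a, b, bits=8):
    """Lane i takes a[i] when the high bit of cond[i] is set, else b[i]."""
    high = 1 << (bits - 1)
    return [x if (c & high) else y for c, x, y in zip(cond, a, b)]
```

Both 0x80 and 0xFF select from `a`, while 0x7F (all bits set except the top one) still selects from `b`.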
* Add MachineOperand::ChangeToFPImmediate and setFPImm (Matt Arsenault, 2014-09-28; 1 file, -7/+25)
  llvm-svn: 218579
* [x86] Fix a really silly bug that I introduced fixing another bug in the new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. (Chandler Carruth, 2014-09-28; 1 file, -1/+1)
  Have I mentioned that I loathe implicit conversions recently?
  :: sigh ::
  llvm-svn: 218576
* [x86] Fix yet another bug in the new vector shuffle lowering's handling of widening masks. (Chandler Carruth, 2014-09-28; 1 file, -7/+16)
  We can't widen a zeroing mask unless both elements that would be merged
  are either zeroed or undef. This is the only way to widen a mask if it
  has a zeroed element.
  Also clean up the code here by ordering the checks in a more logical way
  and by using the symbolic values for undef and zero. I'm actually torn
  on using the symbolic values because the existing code is littered with
  the assumption that -1 is undef, and moreover that entries '< 0' are the
  special entries. While that works with the values given to these
  constants, using the symbolic constants actually makes it a bit more
  opaque why this is the case.
  llvm-svn: 218575
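A simplified Python sketch of the widening rule being fixed (not the LLVM routine; it omits cases like a single undef paired with a defined element, which the real code handles more carefully). The key constraint is that a zeroing pair may only widen when both halves are zero-or-undef:

```python
UNDEF, ZERO = -1, -2  # sentinel mask entries, mirroring "entries < 0"

def widen_mask(mask):
    """Halve a shuffle mask's granularity, or return None if impossible."""
    out = []
    for lo, hi in zip(mask[0::2], mask[1::2]):
        if lo == UNDEF and hi == UNDEF:
            out.append(UNDEF)
        elif {lo, hi} <= {ZERO, UNDEF}:
            # both halves zeroed or undef: the wide element can be zeroed
            out.append(ZERO)
        elif lo >= 0 and lo % 2 == 0 and hi == lo + 1:
            out.append(lo // 2)    # consecutive even/odd pair
        else:
            return None            # e.g. a zero merged with a real element
    return out
```

So `[0, 1, ZERO, UNDEF]` widens to `[0, ZERO]`, but `[0, 1, ZERO, 3]` cannot widen at all.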
* WinCOFFObjectWriter.cpp: make write_uint32_le more efficient (Hans Wennborg, 2014-09-28; 1 file, -6/+4)
  llvm-svn: 218574
* [AArch64] Redundant store instructions should be removed as dead code (James Molloy, 2014-09-27; 1 file, -0/+11)
  If there is a store followed by a store with the same value to the same
  location, then the store is dead/a no-op and can be removed. This
  problem is found in spec2006-197.parser. For example,
    stur w10, [x11, #-4]
    stur w10, [x11, #-4]
  Then one of the two stur instructions can be removed.
  Patch by David Xu!
  llvm-svn: 218569
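A minimal Python model of the redundancy check described above (illustrative only, not the AArch64 pass): a store that repeats the immediately preceding store's location and value is a no-op, assuming no intervening access to that location:

```python
def remove_redundant_stores(stores):
    """stores is a list of (location, value) pairs in program order;
    drop a store that exactly repeats the one immediately before it."""
    out = []
    for store in stores:
        if out and out[-1] == store:
            continue  # e.g. the duplicated stur w10, [x11, #-4]
        out.append(store)
    return out
```

The real pass must additionally prove nothing reads or clobbers the location between the two stores; the model assumes adjacency guarantees that.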
* Fix llvm::huge_valf multiple initializations with Visual C++. (Yaron Keren, 2014-09-27; 2 files, -0/+33)
  llvm::huge_valf is defined in a header file, so it is initialized
  multiple times in every compiled unit upon program startup. With non-VC
  compilers huge_valf is set to HUGE_VALF, which the compiler can probably
  optimize out. With VC, numeric_limits<float>::infinity() does not return
  a number but a runtime structure member which theoretically may change
  between calls, so the compiler does not optimize out the initialization
  and it happens many times. It can be easily seen by placing a breakpoint
  on the initialization line.
  This patch moves the llvm::huge_valf initialization to a source file
  instead of the header.
  llvm-svn: 218567
* [x86] Fix yet another issue with widening vector shuffle elements. (Chandler Carruth, 2014-09-27; 1 file, -2/+2)
  I spotted this by inspection when debugging something else, so I have no
  test case whatsoever, and am not even sure it is possible to
  realistically trigger the bug. But this is what was intended here.
  llvm-svn: 218565