path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* R600/SI: Add a note about the order of the operands to div_scale (Matt Arsenault, 2014-09-26; 1 file, -0/+6)
  llvm-svn: 218534
* R600/SI: Move finding SGPR operand to move to separate function (Matt Arsenault, 2014-09-26; 2 files, -63/+71)
  llvm-svn: 218533
* R600/SI: Allow same SGPR to be used for multiple operands (Matt Arsenault, 2014-09-26; 1 file, -5/+32)
  Instead of moving the first SGPR that is different from the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands.
  llvm-svn: 218532
* R600/SI: Partially move operand legalization to post-isel hook. (Matt Arsenault, 2014-09-26; 4 files, -70/+41)
  Disable the SGPR usage restriction parts of the DAG legalizeOperands. It should now only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands.
  llvm-svn: 218531
* R600/SI: Implement findCommutedOpIndices (Matt Arsenault, 2014-09-26; 2 files, -1/+36)
  The base implementation of commuteInstruction is used in some cases, but it turns out this has been broken for a long time since modifiers were inserted between the real operands. The base implementation of commuteInstruction also fails on immediates, which also needs to be fixed.
  llvm-svn: 218530
* R600/SI: Don't move operands that are required to be SGPRs (Matt Arsenault, 2014-09-26; 1 file, -1/+20)
  e.g. v_cndmask_b32 requires the condition operand to be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved.
  llvm-svn: 218529
* R600/SI: Don't assert on exotic operand types (Matt Arsenault, 2014-09-26; 1 file, -1/+1)
  This needs a test, but I'm not sure if it is currently possible and I originally hit it due to a bug. Right now the only global address operands have no reason to be VALU instructions, although it theoretically could be a problem.
  llvm-svn: 218528
* R600/SI: Fix using wrong operand indices when commuting (Matt Arsenault, 2014-09-26; 1 file, -11/+20)
  No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions, which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this.
  llvm-svn: 218527
* R600/SI: Remove apparently dead code in legalizeOperands (Matt Arsenault, 2014-09-26; 1 file, -8/+0)
  No tests hit this, and I don't see any way a GlobalAddress node would survive beyond lowering on SI. If it would, the move should probably be inserted by selection.
  llvm-svn: 218526
* Ignore annotation function calls in cost computation (David Peixotto, 2014-09-26; 1 file, -0/+1)
  The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold.
  Differential Revision: http://reviews.llvm.org/D5335
  llvm-svn: 218525
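The cost-computation idea above can be sketched outside LLVM with a toy model; the struct, function names, and intrinsic spellings below are illustrative assumptions, not LLVM's actual unroller API:

```cpp
#include <string>
#include <vector>

// Toy model of a per-instruction loop cost computation: calls to
// annotation-style intrinsics are dropped during codegen, so they
// contribute zero to the estimated loop size.
struct Inst {
  std::string Callee; // empty if not a call
};

static bool isIgnoredAnnotation(const Inst &I) {
  // Illustrative names; the real pass checks the intrinsic ID.
  return I.Callee == "llvm.annotation" || I.Callee == "llvm.var.annotation";
}

int loopBodyCost(const std::vector<Inst> &Body) {
  int Cost = 0;
  for (const Inst &I : Body)
    if (!isIgnoredAnnotation(I))
      ++Cost; // every other instruction counts one unit
  return Cost;
}
```

With this exclusion, a loop body containing annotation calls no longer inflates the cost compared against the unrolling threshold.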
* [x86] The mnemonic is SHUFPS, not SHUPFS. =[ I'm very bad at spelling, sadly. (Chandler Carruth, 2014-09-26; 1 file, -3/+3)
  llvm-svn: 218524
* [x86] In the new vector shuffle lowering, when trying to do another layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] (Chandler Carruth, 2014-09-26; 1 file, -10/+11)
  Otherwise the whole thing cycles infinitely. Test case added, another one found through fuzz testing.
  llvm-svn: 218523
* [x86] Fix a large collection of bugs that crept in as I fleshed out the AVX support. (Chandler Carruth, 2014-09-26; 1 file, -13/+27)
  New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug-prone shuffle instructions in x86. =[ These were all detected by fuzz-testing. (I <3 fuzz testing.)
  llvm-svn: 218522
* Elide repeated register operand in Thumb1 instructions (Renato Golin, 2014-09-26; 1 file, -1/+43)
  This patch makes the ARM backend transform 3-operand instructions such as 'adds/subs' to the 2-operand version of the same instruction if the first two register operands are the same. Example: 'adds r0, r0, #1' is transformed to 'adds r0, #1'. Currently, for some instructions such as 'adds', if you try to assemble 'adds r0, r0, #8' for Thumb v6m, the assembler throws an error because the immediate cannot be encoded using 3 bits. The backend should be smart enough to transform the instruction to 'adds r0, #8', which allows for larger immediate constants.
  Patch by Ranjeet Singh.
  llvm-svn: 218521
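The transform above can be sketched as a small standalone check; the struct and the helper name are hypothetical stand-ins for the backend's MachineInstr handling, and the immediate ranges model the T1/T2 Thumb encodings described in the commit:

```cpp
#include <string>

// Sketch: if the destination register equals the first source register,
// 'adds rX, rX, #imm' can use the 2-operand encoding 'adds rX, #imm',
// which accepts an 8-bit immediate instead of the 3-bit immediate of
// the 3-operand form.
struct AddsImm {
  int Rd;  // destination register number
  int Rn;  // first source register number
  int Imm; // immediate operand
};

// Hypothetical helper returning the assembly string that would be emitted.
std::string renderAdds(const AddsImm &MI) {
  if (MI.Rd == MI.Rn && MI.Imm >= 0 && MI.Imm < 256)
    return "adds r" + std::to_string(MI.Rd) + ", #" + std::to_string(MI.Imm);
  return "adds r" + std::to_string(MI.Rd) + ", r" + std::to_string(MI.Rn) +
         ", #" + std::to_string(MI.Imm);
}
```

So 'adds r0, r0, #8', unencodable in the 3-operand form, collapses to the 2-operand 'adds r0, #8'.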
* [X86][SchedModel] SSE reciprocal square root instruction latencies. (Andrea Di Biagio, 2014-09-26; 7 files, -15/+39)
  The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same IIC_SSE_SQRT* scheduling class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with Newton-Raphson iterations), this poor scheduling was hurting performance. This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's tables.
  Differential Revision: http://reviews.llvm.org/D5370
  Patch by Simon Pilgrim.
  llvm-svn: 218517
* Revert "Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single DWARFUnitSection." (Frederic Riss, 2014-09-26; 2 files, -23/+16)
  This reverts commit r218513. Buildbots using libstdc++ issue an error when trying to copy SmallVector<std::unique_ptr<>>. Revert the commit until we have a fix.
  llvm-svn: 218514
* Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single DWARFUnitSection. (Frederic Riss, 2014-09-26; 2 files, -16/+23)
  Summary: There will be multiple TypeUnits in an unlinked object that will be extracted from different sections. Now that we have DWARFUnitSection, which is supposed to represent an input section, we need a DWARFUnitSection<TypeUnit> per input .debug_types section. Once this is done, the interface is homogeneous and we can move the section parsing code into DWARFUnitSection.
  Reviewers: samsonov, dblaikie
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5482
  llvm-svn: 218513
* Fix unused variable warning added in r218509 (Daniel Sanders, 2014-09-26; 1 file, -1/+0)
  llvm-svn: 218510
* [mips] Generalize the handling of f128 return values to support f128 arguments. (Daniel Sanders, 2014-09-26; 3 files, -50/+112)
  Summary: This will allow us to handle f128 arguments without duplicating code from CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands(). No functional change.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5292
  llvm-svn: 218509
* [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. (Robert Khasanov, 2014-09-26; 2 files, -6/+61)
  Added lowering tests for these instructions.
  llvm-svn: 218508
* Fix build breakage on MSVC 2013 (David Majnemer, 2014-09-26; 1 file, -1/+1)
  llvm-svn: 218499
* Target: Fix build breakage. (David Majnemer, 2014-09-26; 1 file, -2/+2)
  No functional change intended.
  llvm-svn: 218497
* Support: Remove undefined behavior from &raw_ostream::operator<< (David Majnemer, 2014-09-26; 1 file, -1/+1)
  Don't negate signed integer types in &raw_ostream::operator<<(const FormattedNumber &FN).
  llvm-svn: 218496
* Revert patch of r218493 (David Xu, 2014-09-26; 1 file, -14/+0)
  llvm-svn: 218494
* Redundant store instructions should be removed as dead code (David Xu, 2014-09-26; 1 file, -0/+14)
  llvm-svn: 218493
* Add the first backend support for on-demand subtarget creation based on the Function (Eric Christopher, 2014-09-26; 2 files, -13/+46)
  This is currently used to implement mips16 support in the mips backend via the existing module pass resetting the subtarget. Things to note:
  a) This involved running resetTargetOptions before creating a new subtarget so that code generation options like soft-float could be recognized when creating the new subtarget. This is to deal with initialization code in isel lowering that only paid attention to the initial value.
  b) Many of the existing testcases weren't using the soft-float feature correctly. I've corrected these based on the check values, assuming that was the desired behavior.
  c) The mips port now pays attention to the target-cpu and target-features strings when generating code for a particular function. I've removed these from one function where the requested cpu and features didn't match the check lines in the testcase.
  llvm-svn: 218492
* Move resetTargetOptions from taking a MachineFunction to a Function (Eric Christopher, 2014-09-26; 2 files, -13/+9)
  since we are accessing the TargetMachine that we're a member function of.
  llvm-svn: 218489
* R600/SI: Fix emitting trailing whitespace after s_waitcnt (Matt Arsenault, 2014-09-26; 1 file, -5/+19)
  llvm-svn: 218486
* [AVX512] Simplify use of !con() (Adam Nemet, 2014-09-26; 1 file, -4/+2)
  No change in X86.td.expanded.
  llvm-svn: 218485
* [AVX512] Pull pattern for subvector extract into the instruction definition (Adam Nemet, 2014-09-25; 1 file, -9/+6)
  No functional change. I initially thought that pulling the Pat<> into the instruction pattern was not possible because it was doing a transform on the index in order to convert it from a per-element (extract_subvector) index into a per-chunk (vextract*x4) index. Turns out this also works inside the pattern because the vextract_extract PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the index in $idx goes through the same conversion. The existing test CodeGen/X86/avx512-insert-extract.ll, extended in the previous commit, provides coverage for this change.
  llvm-svn: 218480
* [AVX512] Refactor subvector extracts (Adam Nemet, 2014-09-25; 1 file, -98/+69)
  No functional change. These are now implemented as two levels of multiclasses heavily relying on the new X86VectorVTInfo class. The multiclass at the first level, which is called with float or int, provides the 128- or 256-bit subvector extracts. The second level provides the register and memory variants and some more Pat<>s. I've compared the td.expanded files before and after. One change is that ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a bugfix. (BTW, this is the change that was blocked on the recent tablegen fix. The class-instance values X86VectorVTInfo inside vextract_for_type weren't properly evaluated.)
  Part of <rdar://problem/17688758>
  llvm-svn: 218478
* [AVX512] Fix typo (Adam Nemet, 2014-09-25; 1 file, -1/+1)
  F->I in VEXTRACTF32x4rr.
  llvm-svn: 218477
* [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo (Bruno Cardoso Lopes, 2014-09-25; 1 file, -6/+23)
  Machine Sink uses loop depth information to select between successor BBs to sink machine instructions into, where BBs within smaller loop depths are preferable. This patch adds support for choosing between successors by using profile information from BlockFrequencyInfo instead, whenever the information is available. Tested under SPEC2006 train (average of 30 runs for each program); ~1.5% execution speedup on average on x86-64 darwin.
  <rdar://problem/18021659>
  llvm-svn: 218472
* [Support] Add type-safe alternative to llvm::format() (Nick Kledzik, 2014-09-25; 1 file, -0/+57)
  llvm::format() is somewhat unsafe. The compiler does not check that an integer parameter's size matches the %x or %d size, and it does not complain when a StringRef is passed for a %s. Correctly using a StringRef with format() is ugly because you have to convert it to a std::string and then call c_str(). The cases where llvm::format() is useful are controlling how numbers and strings are printed, especially when you want fixed-width output. This patch adds some new formatting functions to raw_streams to format numbers and StringRefs in a type-safe manner. Some examples:
    OS << format_hex(255, 6)     => "0x00ff"
    OS << format_hex(255, 4)     => "0xff"
    OS << format_decimal(0, 5)   => "    0"
    OS << format_decimal(255, 5) => "  255"
    OS << right_justify(Str, 5)  => "  foo"
    OS << left_justify(Str, 5)   => "foo  "
  llvm-svn: 218463
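The semantics shown in the examples above can be sketched standalone; the real helpers live on llvm::raw_ostream, whereas this sketch returns std::string and assumes (as the commit's examples suggest) that the hex width counts the "0x" prefix:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Standalone sketch of the new type-safe helpers' behavior.
std::string format_hex(unsigned long long N, unsigned Width) {
  std::ostringstream OS;
  OS << std::hex << N;
  std::string Digits = OS.str();
  // Width counts the "0x" prefix, matching the commit's examples.
  while (Digits.size() + 2 < Width)
    Digits.insert(Digits.begin(), '0');
  return "0x" + Digits;
}

std::string format_decimal(long long N, unsigned Width) {
  std::string S = std::to_string(N);
  while (S.size() < Width)
    S.insert(S.begin(), ' '); // right-justify in a fixed-width field
  return S;
}
```

Because the argument types are fixed (integers in, strings out), there is no printf-style format string for the compiler to mismatch against.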
* Refactoring: raw pointer -> unique_ptr (Anton Yartsev, 2014-09-25; 1 file, -5/+3)
  llvm-svn: 218462
* ARM: Remove unneeded check for MI->hasPostISelHook() (Tom Stellard, 2014-09-25; 1 file, -6/+0)
  llvm-svn: 218459
* SelectionDAG: Remove #if NDEBUG from check for a post-isel hook (Tom Stellard, 2014-09-25; 1 file, -2/+0)
  The InstrEmitter will skip the check of MI.hasPostISelHook() before calling AdjustInstrPostInstrSelection() when NDEBUG is not defined. This was added in r140228, and I'm not sure if it is intentional or not, but it is a likely source of bugs, because it means that with Release+Asserts builds you can forget to set the hasPostISelHook flag on TableGen definitions and AdjustInstrPostInstrSelection() will still be called.
  llvm-svn: 218458
* R600/SI: Add support for global atomic add (Tom Stellard, 2014-09-25; 4 files, -3/+111)
  llvm-svn: 218457
* Lower idempotent RMWs to fence+load (Robin Morisset, 2014-09-25; 3 files, -6/+115)
  Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level). This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them. Details of the optimization are discussed in D5091.
  Test Plan: make check-all + a new test
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5422
  llvm-svn: 218455
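The classification step behind this lowering can be sketched in isolation: an atomicrmw is idempotent when it stores back exactly the value it loaded, so it can become a fence followed by a plain atomic load. The struct and operation spellings below are illustrative, not AtomicExpandPass's actual types:

```cpp
#include <cstdint>
#include <string>

// Illustrative model of an atomicrmw instruction.
struct RMW {
  std::string Op;   // "add", "sub", "or", "xor", "and", ...
  uint64_t Operand; // constant operand, if any
};

// An RMW is idempotent when applying it leaves the value unchanged,
// e.g. 'atomicrmw or %p, 0' or 'atomicrmw and %p, -1'.
bool isIdempotentRMW(const RMW &I) {
  if (I.Op == "add" || I.Op == "sub" || I.Op == "or" || I.Op == "xor")
    return I.Operand == 0;     // x op 0 == x
  if (I.Op == "and")
    return I.Operand == ~0ULL; // x & all-ones == x
  return false;
}
```

Only instructions passing this check are candidates for the fence+load rewrite; whether the rewrite preserves the memory-ordering guarantees is the target-specific question the new hook answers.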
* Add missing attributes to !cmp.[eq,gt,gtu] instructions. (Sid Manning, 2014-09-25; 3 files, -30/+46)
  These instructions did not indicate that they are extendable or the number of bits in the extendable operand. Rename to match architected names. Add a testcase for the intrinsics.
  llvm-svn: 218453
* Add llvm_unreachable()s for [ASZ]ExtUpper to X86FastISel.cpp to appease the buildbots. (Daniel Sanders, 2014-09-25; 1 file, -0/+3)
  llvm-svn: 218452
* [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle structs correctly on big-endian N32/N64 return values. (Daniel Sanders, 2014-09-25; 2 files, -4/+82)
  Summary: The N32/N64 ABIs require that structs passed in registers are laid out such that spilling the register with 'sd' places the struct at the lowest address. For little-endian this is trivial, but for big-endian it requires that structs be shifted into the upper bits of the register. We also require that structs passed in registers have the 'inreg' attribute for big-endian N32/N64 to work correctly. This is because the tablegen-erated calling convention implementation only has access to the lowered form of struct arguments (one or more integers of up to 64 bits each) and is unable to determine the original type.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5286
  llvm-svn: 218451
* Add aliases for VAND imm to VBIC ~imm (Renato Golin, 2014-09-25; 3 files, -19/+111)
  On ARM NEON, VAND with an immediate (16/32 bits) is an alias for VBIC ~imm with the same type size. This patch adds that logic to the parser, generating VBIC instructions from VAND assembly. It also fixes the validation routines for NEON splat immediates, which were wrong.
  Fixes PR20702.
  llvm-svn: 218450
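The core of the VAND-to-VBIC aliasing is complementing the immediate within its element size, since `x & imm == x & ~(~imm) == vbic(x, ~imm)`. A minimal sketch of that step, assuming only the 16- and 32-bit element cases mentioned above (the real parser also validates that the complemented value is an encodable NEON splat immediate):

```cpp
#include <cstdint>

// Complement a VAND immediate within a 32-bit element to obtain the
// equivalent VBIC immediate.
uint32_t vandToVbicImm32(uint32_t Imm) {
  return ~Imm;
}

// Same rule for 16-bit elements: complement within 16 bits.
uint16_t vandToVbicImm16(uint16_t Imm) {
  return static_cast<uint16_t>(~Imm);
}
```

For example, a hypothetical 'vand.i32 d0, d0, #0xffffff00' would be emitted as the equivalent VBIC with immediate 0x000000ff.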
* [x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing. (Chandler Carruth, 2014-09-25; 1 file, -16/+31)
  We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win, but it usually is, so on balance I think it's better. The primary regressions are all things that just need to be fixed anyway, such as modeling when a blend can be completely accomplished via VINSERTF128, etc. Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently. This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add: 1) element insertion smarts 2) palignr and other fairly specialized lowerings when they happen to apply
  llvm-svn: 218449
* [x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing. (Chandler Carruth, 2014-09-25; 1 file, -33/+65)
  Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them and blending them together. This reduces our worst-case shuffle by a pretty significant margin across the board.
  llvm-svn: 218446
* [Thumb2] BXJ should be undefined for v7M, v8A (Oliver Stannard, 2014-09-25; 1 file, -1/+1)
  The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not defined for v7M or v8A. It is defined for all other Thumb2-supporting architectures (v6T2, v7A and v7R).
  llvm-svn: 218445
* [x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering where it only used the mask of the low 128-bit lane rather than the entire mask. (Chandler Carruth, 2014-09-25; 1 file, -2/+2)
  This allows the new lowering to correctly match the unpack patterns for v8i32 vectors. For reference, the reason that we check for the entire mask rather than the repeated mask is that the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test.
  llvm-svn: 218442
* [x86] Rearrange the code for v16i16 lowering a bit for clarity and to reduce the amount of checking we do here. (Chandler Carruth, 2014-09-25; 1 file, -29/+18)
  The first realization is that only non-crossing cases between 128-bit lanes are handled by almost the entire function. It makes more sense to handle the crossing cases first. The second is that until we actually are going to generate fancy shared lowering strategies that use the repeated semantics of the v8i16 lowering, we shouldn't waste time checking for repeated masks. It is simplest to directly test for the entire unpck masks anyway, so we gained nothing from this. This also matches the structure of v32i8 more closely. No functionality changed here.
  llvm-svn: 218441
* [x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering. (Chandler Carruth, 2014-09-25; 1 file, -5/+57)
  This completes the basic AVX2 feature support, but there are still some improvements I'd like to do to really get the last mile of performance here.
  llvm-svn: 218440
* MC: Use @IMGREL instead of @IMGREL32, which we can't parse (Reid Kleckner, 2014-09-25; 1 file, -1/+1)
  Nico Rieck added support for this 32-bit COFF relocation some time ago for Win64 stuff. It appears that as an oversight, the assembly output used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse. Sadly, there were actually tests that took in IMGREL and put out IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can assemble its own output with slightly more fidelity.
  llvm-svn: 218437