path: root/llvm/lib
Commit message | Author | Date | Files | Lines
...
* Add aliases for VAND imm to VBIC ~imm (Renato Golin, 2014-09-25; 3 files, -19/+111)
  On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with the same type size. Adding that logic to the parser, and generating VBIC instructions from VAND asm files. This patch also fixes the validation routines for NEON splat immediates, which were wrong. Fixes PR20702.
  llvm-svn: 218450
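The alias rests on a simple bitwise identity: VBIC ("bit clear") computes src & ~imm, so VAND src, imm and VBIC src, ~imm produce the same result. A minimal sketch of the identity on plain 32-bit integers (illustrative only, not the actual parser change):

```cpp
#include <cstdint>

// VAND dst, src, #imm computes src & imm.
uint32_t vand_imm(uint32_t src, uint32_t imm) { return src & imm; }

// VBIC dst, src, #imm computes src & ~imm ("bit clear").
uint32_t vbic_imm(uint32_t src, uint32_t imm) { return src & ~imm; }
// For any src and imm: vand_imm(src, imm) == vbic_imm(src, ~imm),
// which is the identity the alias relies on.
```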
* [x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing (Chandler Carruth, 2014-09-25; 1 file, -16/+31)
  We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win, but it usually is, so on balance I think it's better. The primary regressions are all things that just need to be fixed anyway, such as modeling when a blend can be completely accomplished via VINSERTF128, etc. Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently. This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add: 1) element insertion smarts, 2) palignr and other fairly specialized lowerings when they happen to apply.
  llvm-svn: 218449
* [x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing (Chandler Carruth, 2014-09-25; 1 file, -33/+65)
  Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them and blending them together. This reduces our worst-case shuffle by a pretty significant margin across the board.
  llvm-svn: 218446
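A rough scalar model of the lane-flip strategy, assuming a v4f64 with two 128-bit lanes of two elements each (names and per-element structure are hypothetical, not the backend code): cross-lane elements are fetched from a lane-flipped copy of the vector and blended with the in-lane elements.

```cpp
#include <array>

using V4 = std::array<double, 4>;

// Flip the two 128-bit lanes of a v4f64: <a,b,c,d> -> <c,d,a,b>.
V4 flipLanes(V4 v) { return {v[2], v[3], v[0], v[1]}; }

// Sketch of the strategy: instead of splitting into 128-bit halves,
// build the result by blending the input with its lane-flipped copy.
// Each output element comes from whichever copy holds its source in
// the same 128-bit lane (elements 0-1 are lane 0, elements 2-3 lane 1).
V4 laneFlipShuffle(V4 v, std::array<int, 4> mask) {
  V4 flipped = flipLanes(v);
  V4 out{};
  for (int i = 0; i < 4; ++i) {
    bool sameLane = (mask[i] / 2) == (i / 2);
    // flipped[j] == v[j ^ 2], so a cross-lane source v[mask[i]] is
    // found at slot mask[i] ^ 2 of the flipped copy.
    out[i] = sameLane ? v[mask[i]] : flipped[mask[i] ^ 2];
  }
  return out;
}
```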
* [Thumb2] BXJ should be undefined for v7M, v8A (Oliver Stannard, 2014-09-25; 1 file, -1/+1)
  The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not defined for v7M or v8A. It is defined for all other Thumb2-supporting architectures (v6T2, v7A and v7R).
  llvm-svn: 218445
* [x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering (Chandler Carruth, 2014-09-25; 1 file, -2/+2)
  The lowering only used the mask of the low 128-bit lane rather than the entire mask. This allows the new lowering to correctly match the unpack patterns for v8i32 vectors. For reference, the reason that we check the entire mask rather than the repeated mask is that the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test.
  llvm-svn: 218442
* [x86] Rearrange the code for v16i16 lowering a bit for clarity and to reduce the amount of checking we do here (Chandler Carruth, 2014-09-25; 1 file, -29/+18)
  The first realization is that only non-crossing cases between 128-bit lanes are handled by almost the entire function. It makes more sense to handle the crossing cases first. The second is that, until we are actually going to generate fancy shared lowering strategies that use the repeated semantics of the v8i16 lowering, we shouldn't waste time checking for repeated masks. It is simplest to directly test for the entire unpck masks anyway, so we gained nothing from this. This also matches the structure of v32i8 more closely. No functionality changed here.
  llvm-svn: 218441
* [x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering (Chandler Carruth, 2014-09-25; 1 file, -5/+57)
  This completes the basic AVX2 feature support, but there are still some improvements I'd like to do to really get the last mile of performance here.
  llvm-svn: 218440
* MC: Use @IMGREL instead of @IMGREL32, which we can't parse (Reid Kleckner, 2014-09-25; 1 file, -1/+1)
  Nico Rieck added support for this 32-bit COFF relocation some time ago for Win64 stuff. It appears that as an oversight, the assembly output used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse. Sadly, there were actually tests that took in IMGREL and put out IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can assemble its own output with slightly more fidelity.
  llvm-svn: 218437
* [x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects for this now (Chandler Carruth, 2014-09-25; 2 files, -4/+0)
  Should prevent folks from running afoul of this and not knowing why their code won't instruction-select the way I just did...
  llvm-svn: 218436
* [x86] Fix the v16i16 blend logic I added in the prior commit and add the missing test cases for it (Chandler Carruth, 2014-09-25; 1 file, -3/+5)
  Unsurprisingly, without test cases, there were bugs here. Surprisingly, this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV. It isn't wired to anything. Oops. I'll fix that next.
  llvm-svn: 218434
* llvm-cov: Combine segments that cover the same location (Justin Bogner, 2014-09-25; 1 file, -4/+18)
  If we have multiple coverage counts for the same segment, we need to add them up rather than arbitrarily choosing one. This fixes that and adds a test with template instantiations to exercise it.
  llvm-svn: 218432
* [X86,AVX] Add an isel pattern for X86VBroadcast (Akira Hatanaka, 2014-09-25; 1 file, -0/+3)
  This fixes PR21050 and rdar://problem/18434607.
  llvm-svn: 218431
* [x86] Implement v16i16 support with AVX2 in the new vector shuffle lowering (Chandler Carruth, 2014-09-25; 4 files, -58/+174)
  This also implements the fancy blend lowering for v16i16 using AVX2 and teaches the X86 backend to print shuffle masks for 256-bit PSHUFB and PBLENDW instructions. It also makes the mask decoding correct for PBLENDW instructions. The yaks, they are legion. Tests are updated accordingly. There are some missing tests for the VBLENDVB lowering, but I'll add those in a follow-up as this commit has accumulated enough cruft already.
  llvm-svn: 218430
* [asan] don't instrument module CTORs that may be run before asan.module_ctor (Kostya Serebryany, 2014-09-24; 1 file, -4/+6)
  This fixes asan running together with -coverage.
  llvm-svn: 218421
* Revert 218406 - Refactor the RelocVisitor::visit method (Renato Golin, 2014-09-24; 1 file, -1/+1)
  llvm-svn: 218416
* Revert r218380. This was breaking Apple internal build bots. (Akira Hatanaka, 2014-09-24; 1 file, -6/+14)
  llvm-svn: 218409
* Refactor the RelocVisitor::visit method (Renato Golin, 2014-09-24; 1 file, -1/+1)
  This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name, to figure out whether the file is a 32- or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner.
  llvm-svn: 218406
* Adding #ifdef around TermColorMutex based on feedback from Craig Topper (Chris Bieneman, 2014-09-24; 1 file, -0/+2)
  llvm-svn: 218401
* [x86] Factor out the logic to generically decompose a vector shuffle into unblended shuffles and a blend (Chandler Carruth, 2014-09-24; 1 file, -72/+42)
  This is the consistent fallback for the lowering paths that have fast blend operations available, and it's getting quite repetitive. No functionality changed.
  llvm-svn: 218399
* Revert "Refactor the RelocVisitor::visit method" (Kaelyn Takata, 2014-09-24; 1 file, -1/+1)
  This reverts commit faac033f7364bb4226e22c8079c221c96af10d02. The test depends on all targets being enabled in llc in order to pass, and needs to be rewritten/refactored to not have that dependency.
  llvm-svn: 218393
* Refactor the RelocVisitor::visit method (Renato Golin, 2014-09-24; 1 file, -1/+1)
  This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name, to figure out whether the file is a 32- or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner.
  llvm-svn: 218388
* Fix assertion in LICM doFinalization() (David Peixotto, 2014-09-24; 2 files, -0/+24)
  The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map, because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest.
  Differential Revision: http://reviews.llvm.org/D5305
  llvm-svn: 218387
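The shape of the fix, sketched with stand-in types (the hook is a simplified model of a loop-deletion callback, not the actual LICM code): the map entry must be erased when the loop is removed from the loop nest, not only when LICM itself walks the nest.

```cpp
#include <cassert>
#include <map>

struct Loop {};     // stand-in for llvm::Loop
struct AliasSet {}; // stand-in for the per-loop alias set tracker

struct LICMModel {
  std::map<Loop *, AliasSet *> LoopToAliasSetMap;

  // Hypothetical deletion hook: a loop deleted by *any* pass (e.g.
  // unrolling) must clean up its entry here, or the emptiness check
  // in doFinalization() fires on a stale entry.
  void loopDeleted(Loop *L) {
    auto It = LoopToAliasSetMap.find(L);
    if (It != LoopToAliasSetMap.end()) {
      delete It->second;           // free the alias set
      LoopToAliasSetMap.erase(It); // drop the map entry
    }
  }

  // Models the doFinalization() assertion from the commit message.
  ~LICMModel() { assert(LoopToAliasSetMap.empty() && "entries leaked"); }
};
```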
* [Thumb] Make load/store optimizer less conservative (Moritz Roth, 2014-09-24; 1 file, -60/+195)
  If it's safe to clobber the condition flags, we can do a few extra things: it's then possible to reset the base register writeback using a SUBS, so we can try to merge even if the base register isn't dead after the merged instruction. This is effectively a (heavily bug-fixed) rewrite of r208992.
  llvm-svn: 218386
* [Thumb] 32-bit encodings of 'cps' are not valid for v7M (Oliver Stannard, 2014-09-24; 2 files, -1/+4)
  v7M only allows the 16-bit encoding of the 'cps' (Change Processor State) instruction, and does not have the 32-bit encoding which is valid from v6T2 onwards.
  llvm-svn: 218382
* Silencing an "enumeral and non-enumeral type in conditional expression" warning. NFC. (Aaron Ballman, 2014-09-24; 1 file, -1/+2)
  llvm-svn: 218381
* Replace a hand-written suffix compare with std::lexicographical_compare (Benjamin Kramer, 2014-09-24; 1 file, -14/+6)
  No functionality change.
  llvm-svn: 218380
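For reference, a hand-written backwards comparison of two strings can be expressed directly as std::lexicographical_compare over reverse iterators. This sketch shows the general pattern (it is not the exact code the commit touched):

```cpp
#include <algorithm>
#include <string>

// Compare two strings right-to-left (i.e. by suffix): equivalent to
// lexicographically comparing the reversed strings, with no manual
// index bookkeeping.
bool suffixLess(const std::string &A, const std::string &B) {
  return std::lexicographical_compare(A.rbegin(), A.rend(),
                                      B.rbegin(), B.rend());
}
```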
* [x86] Teach the instruction lowering to add comments describing constant pool data being loaded into a vector register (Chandler Carruth, 2014-09-24; 1 file, -12/+78)
  The comments take the form of:

    # ymm0 = [a,b,c,d,...]
    # xmm1 = <x,y,z...>

  The []s are used for generic sequential data and the <>s are used specifically for ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome.

  My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected.

  Before implementing this, I tried a *bunch* of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata, which could have fully automatic dropping when encoding an actual instruction.
  llvm-svn: 218377
* Allow BB duplication threshold to be adjusted through JumpThreading's ctor (Michael Liao, 2014-09-24; 1 file, -7/+10)
  BB duplication may not be desired on targets where there is little or no branch penalty and code duplication needs strict control.
  llvm-svn: 218375
* Windows/Host.inc: Reformat the header to fit 80-col (NAKAMURA Takumi, 2014-09-24; 1 file, -1/+1)
  llvm-svn: 218374
* Unix/Host.inc: Remove <cstdlib>. It has been unused for a long time. (NAKAMURA Takumi, 2014-09-24; 1 file, -1/+0)
  llvm-svn: 218373
* Unix/Host.inc: Wrap a comment line in 80-col (NAKAMURA Takumi, 2014-09-24; 1 file, -1/+2)
  llvm-svn: 218371
* Unix/Host.inc: Remove leading whitespace. It had been here since r56942! (NAKAMURA Takumi, 2014-09-24; 1 file, -1/+1)
  llvm-svn: 218370
* Clear PreferredExtendType in each function-specific state FunctionLoweringInfo (Jiangning Liu, 2014-09-24; 1 file, -0/+1)
  llvm-svn: 218364
* [x86] More refactoring of the shuffle comment emission. The previous attempt didn't work out so well. (Chandler Carruth, 2014-09-24; 1 file, -38/+38)
  It looks like it will be much better for introducing extra logic to find a shuffle mask if the finding logic is totally separate. This also makes it easy to sink the opcode logic completely out of the routine, so we don't re-dispatch across it. Still no functionality changed.
  llvm-svn: 218363
* [x86] Bypass the shuffle mask comment generation when not using verbose asm (Chandler Carruth, 2014-09-24; 1 file, -0/+2)
  This can be somewhat expensive and there is no reason to do it outside of tests or debugging sessions. I'm also likely to make it significantly more expensive to support more styles of shuffles.
  llvm-svn: 218362
* [x86] Hoist the logic for extracting the relevant bits of information from the MachineInstr into the caller (Chandler Carruth, 2014-09-24; 1 file, -16/+20)
  The caller is already doing a switch over the instruction. This will make it more clear how to compute different operands to feed the comment selection, for example. Also, in a drive-by fix, don't append an empty comment string (which is a no-op ultimately). No functionality changed.
  llvm-svn: 218361
* R600/SI: Add new helper isSGPRClassID (Matt Arsenault, 2014-09-24; 2 files, -8/+14)
  Move these into the header since they are trivial.
  llvm-svn: 218360
* R600/SI: Fix hardcoded and wrong operand numbers (Matt Arsenault, 2014-09-24; 1 file, -5/+3)
  Also fix leftover debug printing.
  llvm-svn: 218359
* R600/SI: Enable named operand table for SALU instructions (Matt Arsenault, 2014-09-24; 1 file, -0/+8)
  llvm-svn: 218358
* [x86] Start refactoring the comment printing logic in the MC lowering of vector shuffles (Chandler Carruth, 2014-09-24; 1 file, -87/+102)
  This is just the beginning: hoisting it into its own function and making use of early exit to dramatically simplify the flow of the function. I'm going to be incrementally refactoring this until it is a bit less magical how this applies to other instructions, and I can teach it how to dig a shuffle mask out of a register. Then I plan to hook it up to VPERMD so we get our mask comments for it. No functionality changed yet.
  llvm-svn: 218357
* R600/SI: Enable selecting SALU inside branches (Tom Stellard, 2014-09-24; 2 files, -27/+0)
  We can do this now that the FixSGPRLiveRanges pass is working.
  llvm-svn: 218353
* R600/SI: Move PHIs that define SGPRs to the VALU in most cases (Tom Stellard, 2014-09-24; 1 file, -0/+52)
  This fixes a bug that is uncovered by a future commit and will be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case.
  llvm-svn: 218352
* R600/SI: Fix the FixSGPRLiveRanges pass (Tom Stellard, 2014-09-24; 4 files, -33/+121)
  The previous implementation was extending the live range of SGPRs by modifying the live intervals directly. This was causing a lot of machine verification errors when the machine scheduler was enabled. The new implementation adds pseudo instructions with implicit uses to extend the live ranges of SGPRs, which works much better.
  llvm-svn: 218351
* R600/SI: Mark EXEC_LO and EXEC_HI as reserved (Tom Stellard, 2014-09-24; 1 file, -0/+6)
  These registers can be allocated and used like other 32-bit registers, but doing so seems like a likely source of bugs.
  llvm-svn: 218350
* R600/SI: Fix SIRegisterInfo::getPhysRegSubReg() (Tom Stellard, 2014-09-24; 1 file, -1/+10)
  Correctly handle the special registers EXEC, EXEC_LO, EXEC_HI, VCC_LO, VCC_HI, and M0. The previous implementation would hit an assertion failure when passed these registers.
  llvm-svn: 218349
* R600/SI: Implement VGPR register spilling for compute at -O0, v3 (Tom Stellard, 2014-09-24; 8 files, -48/+332)
  VGPRs are spilled to LDS. This still needs more testing, but we need to at least enable it at -O0, because the fast register allocator spills all registers that are live at the end of blocks, and without this some future commits will break the flat-address-space.ll test.
  v2: Only calculate thread id once
  v3: Move insertion of spill instructions to SIRegisterInfo::eliminateFrameIndex()
  llvm-svn: 218348
* [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with the native AVX2 instructions (Chandler Carruth, 2014-09-24; 1 file, -5/+50)
  Note that the test case is really frustrating here, because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but I'm not sure if I will succeed.
  llvm-svn: 218347
* [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle detection (Chandler Carruth, 2014-09-24; 1 file, -13/+36)
  It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a *numeric* shuffle value. Fortunately, this almost always did the right thing and merely disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously.
  llvm-svn: 218346
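The principled approach treats undef mask entries (conventionally -1) as wildcards rather than numeric indices. An illustrative check for a v8 mask that repeats the same pattern in both 128-bit lanes (a sketch of the idea, not the actual lowering code):

```cpp
#include <cassert>
#include <vector>

// Does a v8 shuffle mask repeat the same lane-local pattern in both
// 128-bit lanes (4 elements each)? Undef entries (-1) constrain
// nothing; the bug was treating them as real indices, which silently
// defeated (and could miscompile) the repeated-pattern detection.
bool isRepeatedLaneMask(const std::vector<int> &Mask) {
  assert(Mask.size() == 8 && "expected a v8 mask");
  for (int i = 0; i < 4; ++i) {
    if (Mask[i] >= 4)
      return false; // low lane must read from the low lane
    if (Mask[i + 4] >= 0 && Mask[i + 4] < 4)
      return false; // high lane must read from the high lane
    if (Mask[i] >= 0 && Mask[i + 4] >= 0 && Mask[i + 4] != Mask[i] + 4)
      return false; // both lanes must pick the same lane-local slot
  }
  return true;
}
```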
* [x86] Teach the new vector shuffle lowering to lower v4i64 vector shuffles using the AVX2 instructions (Chandler Carruth, 2014-09-23; 1 file, -6/+57)
  This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and is just getting surfaced. Will investigate next.
  llvm-svn: 218338
* GlobalOpt: Preserve comdats of unoptimized initializers (Reid Kleckner, 2014-09-23; 1 file, -45/+26)
  Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped.
  llvm-svn: 218337
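The approach can be sketched as an in-place filter that erases only the entries proven removable and leaves every other entry untouched (stand-in types here; the real code operates on the llvm.global_ctors array, whose entries also carry priority and comdat information that this scheme never needs to reconstruct):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for a global_ctors entry; only the "can we evaluate and
// discard this initializer?" verdict matters to the filter.
struct CtorEntry {
  std::string Fn;
  bool CanEvaluate; // proven removable by the optimizer
};

// Erase only the removable entries; everything else (priority,
// comdat, fields we don't model) survives byte-for-byte.
void removeOptimizedCtors(std::vector<CtorEntry> &Ctors) {
  Ctors.erase(std::remove_if(Ctors.begin(), Ctors.end(),
                             [](const CtorEntry &E) { return E.CanEvaluate; }),
              Ctors.end());
}
```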