summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/vec_ctbits.ll
Commit message (Collapse)AuthorAgeFilesLines
* [x86] transform vector inc/dec to use -1 constant (PR33483)Sanjay Patel2017-06-261-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289
* [VectorLegalizer] Expansion of CTLZ using CTPOP when possibleSimon Pilgrim2016-11-081-25/+72
| | | | | | | | | | This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
* [X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at ↵Simon Pilgrim2016-08-041-21/+40
| | | | | | | | lowering - this is what cost models are for Improved CTTZ/CTLZ costings will be added shortly llvm-svn: 277713
* [SelectionDAG] change getConstant() to use the input SDLoc when building ↵Sanjay Patel2016-02-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | splat vectors The code change is simple enough: instead of attaching an anonymous SDLoc to splatted vector constants, use the scalar constant's existing SDLoc since that is what is passed into getConstant() as a param. But this changes instruction scheduling, so I'll explain why that happens. The motivation for this patch starts near: http://reviews.llvm.org/rL258833 ...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'. But when I made that change locally, several x86 codegen tests wiggled. It turns out that the lack of SDLoc consistency in getConstant() changes the way ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG scheduler algorithms use IROrder for tie-breaking. Differential Revision: http://reviews.llvm.org/D16972 llvm-svn: 260582
* Make utils/update_llc_test_checks.py note that the assertions areJames Y Knight2015-11-231-0/+1
| | | | | | | | | autogenerated. Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it. llvm-svn: 253917
* [X86][SSE] Refreshed vector bit count tests.Simon Pilgrim2015-07-261-26/+102
| | | | llvm-svn: 243249
* llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. ↵NAKAMURA Takumi2014-09-121-1/+1
| | | | | | It was incompatible to Win32 x64. llvm-svn: 217683
* Legalizer: Use the scalar bit width when promoting bit counting instrs onBenjamin Kramer2014-09-121-1/+50
| | | | | | | | | vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup the result by 32 bits, not 64. PR20917. llvm-svn: 217671
* Manually upgrade the test suite to specify the flag to cttz and ctlz.Chandler Carruth2011-12-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | I followed three heuristics for deciding whether to set 'true' or 'false': - Everything target independent got 'true' as that is the expected common output of the GCC builtins. - If the target arch only has one way of implementing this operation, set the flag in the way that exercises the most of codegen. For most architectures this is also the likely path from a GCC builtin, with 'true' being set. It will (eventually) require lowering away that difference, and then lowering to the architecture's operation. - Otherwise, set the flag differently dependending on which target operation should be tested. Let me know if anyone has any issue with this pattern or would like specific tests of another form. This should allow the x86 codegen to just iteratively improve as I teach the backend how to differentiate between the two forms, and everything else should remain exactly the same. llvm-svn: 146370
* Eliminate more uses of llvm-as and llvm-dis.Dan Gohman2009-09-081-1/+1
| | | | llvm-svn: 81290
* Add nounwind.Evan Cheng2008-05-291-3/+3
| | | | llvm-svn: 51665
* Allow vector integer constants to be created withDan Gohman2007-12-121-0/+18
SelectionDAG::getConstant, in the same way as vector floating-point constants. This allows the legalize expansion code for @llvm.ctpop and friends to be usable with vector types. llvm-svn: 44954
OpenPOWER on IntegriCloud