path: root/clang/utils/TableGen/NeonEmitter.cpp
Commit message | Author | Age | Files | Lines
...
* Use 'override/final' instead of 'virtual' for overridden methods | Alexander Kornienko | 2015-04-11 | 1 | -15/+15
  Summary: The patch is generated using clang-tidy misc-use-override check.
  This command was used:
    tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
      -checks='-*,misc-use-override' -header-filter='llvm|clang' -j=32 -fix
  Reviewers: dblaikie
  Reviewed By: dblaikie
  Subscribers: klimek, cfe-commits
  Differential Revision: http://reviews.llvm.org/D8926
  llvm-svn: 234678
* Fix a call to std::unique to actually discard the trailing (junk) elements. | James Dennett | 2015-04-06 | 1 | -1/+2
  Found by inspection. (No other instances of this problem were found.)
  llvm-svn: 234221
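The fix above concerns a common pitfall: std::unique only moves the unique elements to the front of the range and returns the new logical end, so the container keeps its old size until the tail is erased. A minimal sketch of the usual erase-unique idiom follows; the helper name sortedUnique is hypothetical and is not taken from NeonEmitter.cpp.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from NeonEmitter.cpp): sort a list of names and
// remove duplicates.
std::vector<std::string> sortedUnique(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  // std::unique compacts the unique elements at the front and returns an
  // iterator to the new logical end. The elements past that iterator are
  // left in a valid but unspecified state ("junk").
  auto newEnd = std::unique(names.begin(), names.end());
  // Without this erase the vector would still report its original size.
  names.erase(newEnd, names.end());
  return names;
}
```

Calling sortedUnique({"vadd", "vabd", "vadd"}) yields a two-element vector holding "vabd" and "vadd".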
* Replace size() calls on containers with empty() calls where appropriate. NFC | Alexander Kornienko | 2015-01-23 | 1 | -1/+1
  http://reviews.llvm.org/D7090
  Patch by Gábor Horváth!
  llvm-svn: 226914
* [cleanup] Re-sort the #include lines using llvm/utils/sort_includes.py | Chandler Carruth | 2015-01-14 | 1 | -3/+3
  No functionality changed, this is a purely mechanical cleanup to ensure the #include order remains consistent across the project.
  llvm-svn: 225975
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. | Craig Topper | 2014-08-27 | 1 | -1/+1
  llvm-svn: 216528
* Fix typos | Alp Toker | 2014-07-14 | 1 | -4/+3
  Also consolidate 'backward compatibility'
  llvm-svn: 212974
* [ARM-BE] Generate correct NEON intrinsics for big endian systems. | James Molloy | 2014-06-27 | 1 | -62/+172
  The NEON intrinsics in arm_neon.h are designed to work on vectors "as-if" loaded by (V)LDR. We load vectors "as-if" (V)LD1, so the intrinsics are currently incorrect.
  This patch adds big-endian versions of the intrinsics that do the "obvious but dumb" thing of reversing all vector inputs and all vector outputs. This will produce extra REVs, but we trust the optimizer to remove them.
  llvm-svn: 211893
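The "reverse all inputs, run the operation, reverse the output" pattern described above can be sketched on plain arrays. This is only an illustration of the wrapping idea under the assumption that a vector is modelled as std::array; the names reverseLanes and applyWithReversedLanes are hypothetical, and the code that arm_neon.h actually emits is not shown in this log.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical model of a lane reversal: returns a copy with lane order flipped.
template <typename T, std::size_t N>
std::array<T, N> reverseLanes(std::array<T, N> v) {
  std::reverse(v.begin(), v.end());
  return v;
}

// Hypothetical wrapper: reverse both inputs, apply the operation, then
// reverse the result, mirroring the "obvious but dumb" approach.
template <typename Op, typename T, std::size_t N>
std::array<T, N> applyWithReversedLanes(Op op, const std::array<T, N> &a,
                                        const std::array<T, N> &b) {
  return reverseLanes(op(reverseLanes(a), reverseLanes(b)));
}
```

For a purely lane-wise operation (such as an element-wise subtract) the two reversals cancel out, which is the kind of redundancy the commit expects the optimizer to remove. For an operation that depends on lane positions, the wrapped result differs from the unwrapped one.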
* Replace some assert(0)'s with llvm_unreachable. | Craig Topper | 2014-06-18 | 1 | -6/+6
  llvm-svn: 211139
* Convert assert(0) to llvm_unreachable to silence a warning about Addend being uninitialized in default case. | Craig Topper | 2014-06-18 | 1 | -1/+1
  llvm-svn: 211138
* Rewrite ARM NEON intrinsic emission completely. | James Molloy | 2014-06-17 | 1 | -2982/+1843
  There comes a time in the life of any amateur code generator when dumb string concatenation just won't cut it any more. For NeonEmitter.cpp, that time has come.

  There were a bunch of magic type codes which meant different things depending on the context. There were a bunch of special cases that really had no reason to be there but the whole thing was so creaky that removing them would cause something weird to fall over. There was a 1000 line switch statement for code generation involving string concatenation, which actually did lexical scoping to an extent (!!) with a bunch of semi-repeated cases.

  I tried to refactor this three times in three different ways without success. The only way forward was to rewrite the entire thing. Luckily the testing coverage on this stuff is absolutely massive, both with regression tests and the "emperor" random test case generator.

  The main change is that previously, in arm_neon.td a bunch of "Operation"s were defined with special names. NeonEmitter.cpp knew about these Operations and would emit code based on a huge switch. Actually this doesn't make much sense - the type information was held as strings, so type checking was impossible. Also TableGen's DAG type actually suits this sort of code generation very well (surprising that...)

  So now every operation is defined in terms of TableGen DAGs. There are a bunch of operators to use, including "op" (a generic unary or binary operator), "call" (to call other intrinsics) and "shuffle" (take a guess...). One of the main advantages of this, apart from making it more obvious what is going on, is that we have proper type inference. This has two obvious advantages:

    1) TableGen can error on bad intrinsic definitions more easily, instead of just generating wrong code.
    2) Calls to other intrinsics are typechecked too. So we no longer need to work out whether the thing we call needs to be the Q-lane version or the D-lane version - TableGen knows that itself!

  Here's an example:

  before:
    case OpAbdl: {
      std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
      if (typestr[0] != 'U') {
        // vabd results are always unsigned and must be zero-extended.
        std::string utype = "U" + typestr.str();
        s += "(" + TypeString(proto[0], typestr) + ")";
        abd = "(" + TypeString('d', utype) + ")" + abd;
        s += Extend(utype, abd) + ";";
      } else {
        s += Extend(typestr, abd) + ";";
      }
      break;
    }

  after:
    def OP_ABDL : Op<(cast "R", (call "vmovl", (cast $p0, "U", (call "vabd", $p0, $p1))))>;

  As an example of what happens if you do something wrong now, here's what happens if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U", $p0):

    arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
    Available overloads:
      - float64x2_t vabdq_v(float64x2_t, float64x2_t)
      - float64x1_t vabd_v(float64x1_t, float64x1_t)
      - float64_t vabdd_f64(float64_t, float64_t)
      - float32_t vabds_f32(float32_t, float32_t)
    ... snip ...

  This makes it seriously easy to work out what you've done wrong in fairly nasty intrinsics.

  As part of this I've massively beefed up the documentation in arm_neon.td too.

  Things still to do / on the radar:
    - Testcase generation. This was implemented in the previous version and not in the new one, because
      - Autogenerated tests are not being run. The testcase in test/ differs from the autogenerated version.
      - There were a whole slew of special cases in the testcase generation that just felt (and looked) like hacks.
      If someone really feels strongly about this, I can try and reimplement it too.
    - Big endian. That's coming soon and should be a very small diff on top of this one.
  llvm-svn: 211101
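The type-checked lookup behind the "No compatible intrinsic found" error can be illustrated with a toy overload table. This is a rough sketch only, assuming C++17 for std::optional: the Overload struct, the findOverload function and the table contents are hypothetical and do not reflect the actual data structures in NeonEmitter.cpp.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record describing one intrinsic overload.
struct Overload {
  std::string name;                 // base name used in the lookup, e.g. "vabd"
  std::string returnType;           // e.g. "int8x8_t"
  std::vector<std::string> params;  // parameter types, in order
};

// Hypothetical lookup: returns the overload whose name and parameter types
// match exactly, or std::nullopt so the caller can report an error that
// lists the available overloads.
std::optional<Overload> findOverload(const std::vector<Overload> &table,
                                     const std::string &name,
                                     const std::vector<std::string> &args) {
  for (const Overload &candidate : table)
    if (candidate.name == name && candidate.params == args)
      return candidate;
  return std::nullopt;
}
```

With a table containing only vabd(int8x8_t, int8x8_t), a lookup for vabd(uint8x8_t, int8x8_t) returns std::nullopt, which corresponds to the situation the error message above describes.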
* [C++11] Use 'nullptr'. | Craig Topper | 2014-05-07 | 1 | -1/+1
  llvm-svn: 208163
* ARM NEON: add _f16 support to a couple of vector-shuffling intrinsics. | Tim Northover | 2014-02-25 | 1 | -5/+11
  llvm-svn: 202137
* [AArch64] Change int64_t from 'long long int' to 'long int' for AArch64 target. | Kevin Qin | 2014-02-24 | 1 | -3/+3
  Most 64-bit targets define int64_t as long int, and AArch64 should use the same definition to follow the LP64 model. In the GNU toolchain, int64_t is defined as long int for 64-bit targets. So, to be consistent with GNU, it's better to change int64_t from 'long long int' to 'long int'; otherwise clang will produce a different name-mangling suffix than g++.
  llvm-svn: 202004
* AArch64: look up EmitAArch64Scalar support before calling. | Tim Northover | 2014-02-19 | 1 | -4/+12
  This fixes one immediate bug where an expression with side-effects could be emitted twice during a NEON call.
  It also prepares the way for folding CodeGen for many of the SISD intrinsics into a table, reducing code size and hopefully increasing performance eventually ("binary search + few switch cases" should be better than "lots of switch cases").
  llvm-svn: 201667
* ARM & AArch64: move struct definition outside function. | Tim Northover | 2014-02-19 | 1 | -5/+5
  Apparently it's not True C++.
  rdar://problem/16035743 still.
  llvm-svn: 201663
* ARM NEON: use more flexible TableGen field for defs. | Tim Northover | 2014-02-19 | 1 | -85/+64
  We used to have special handling for isCrypto and isA64 bits in the NeonEmitter.cpp file (it knew the former was predicated on __ARM_FEATURE_CRYPTO and the latter on __aarch64__ and went through various contortions to make sure the correct intrinsics were emitted under the correct guard).
  This is ugly and has obvious scalability problems (e.g. vcvtX intrinsics are needed, which are ARMv8 only but available on both, yet another category). This patch moves the #if predicate into the arm_neon.td file directly and makes NeonEmitter.cpp agnostic about what goes in there.
  It also deduplicates arm_neon.td so that each desired intrinsic is mentioned in just one place (necessary because of the new mechanism for creating arm_neon.h).
  rdar://problem/16035743
  llvm-svn: 201660
* ARM & AArch64: merge the semantic checking of NEON intrinsics | Tim Northover | 2014-02-19 | 1 | -93/+50
  There are two kinds of automatically generated tests for NEON intrinsics, both of which can be merged without adversely affecting users.
    1. We check that a valid kind of __builtin_neon_XYZ overload is requested (e.g. we're not asking for a float32x4_t version when it only accepts integers). Since the __builtin_neon_XYZ intrinsics should only be used in arm_neon.h, relaxing this test and permitting AArch64 types for AArch32 should not cause a problem. The extra arm_neon.h definitions should be #ifdefed out anyway.
    2. We check that intrinsics which take immediates are actually given compile-time constants within range. Since all NEON intrinsics should be backwards compatible, these tests should be identical on AArch64 and AArch32 anyway.
  This patch, therefore, merges the separate AArch64 and 32-bit checks.
  rdar://problem/16035743
  llvm-svn: 201659
* Whitespace cleanup (mostly stray tabs, a few not-quite-empty lines). | Tim Northover | 2014-02-12 | 1 | -15/+15
  llvm-svn: 201234
* ARM NEON: fix range checking on immediates. | Tim Northover | 2014-02-12 | 1 | -0/+8
  Previously, range checking on the __builtin_neon_XYZ_v Clang intrinsics didn't take account of the type actually passed to the call, which meant a request like "vext_s16(a, b, 7)" was allowed through (TableGen was conservative and allowed 0-7 for all types).
  This caused an assert in the backend because the lane doesn't make sense.
  llvm-svn: 201232
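To see why "vext_s16(a, b, 7)" is out of range: a 64-bit vector of 16-bit elements has only four lanes, so the largest valid index is 3, whereas 7 is valid only for 8-bit elements. The following sketch computes a type-dependent bound; the function names are hypothetical and this is not the check Clang actually implements.

```cpp
#include <cassert>

// Hypothetical helper: highest valid lane index for a vector of
// `vectorBits` total width holding elements of `elementBits` each.
constexpr unsigned maxLaneIndex(unsigned vectorBits, unsigned elementBits) {
  return vectorBits / elementBits - 1;
}

// Hypothetical check: is `imm` a valid lane immediate for this vector type?
constexpr bool isValidLane(unsigned vectorBits, unsigned elementBits,
                           long long imm) {
  return imm >= 0 &&
         imm <= static_cast<long long>(maxLaneIndex(vectorBits, elementBits));
}

// 64-bit vector of 16-bit lanes: indices 0..3 only.
static_assert(maxLaneIndex(64, 16) == 3, "int16x4_t has four lanes");
```

Here isValidLane(64, 16, 7) is false, while isValidLane(64, 8, 7) is true, which matches the conservative 0-7 bound being correct only for 8-bit element types.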
* [AArch64] Fixed vget/vset_lane_f16 implementation | Ana Pazos | 2014-02-10 | 1 | -17/+28
  Replaced cast and vreinterpret operations with code to reinterpret bitwise the types float16_t and int16_t.
  llvm-svn: 201112
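The difference between a value cast and a bitwise reinterpretation can be sketched in portable C++. Since standard C++ (before C++23) has no 16-bit floating-point type, this sketch uses float and std::uint32_t as stand-ins for float16_t and int16_t; the function names are hypothetical and the code is not what arm_neon.h emits.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Hypothetical helper: return the raw bit pattern of a float.
// A value cast such as static_cast<std::uint32_t>(1.5f) would instead
// convert the number and yield 1.
std::uint32_t bitsOfFloat(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);  // copies bytes, no conversion
  return bits;
}

// Hypothetical helper: rebuild a float from a raw bit pattern.
float floatFromBits(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}
```

A round trip through bitsOfFloat and floatFromBits preserves the value exactly, which is the property a lane get/set on a half-precision vector needs when it is routed through an integer type of the same width.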
* ARM: implement support for crypto intrinsics in arm_neon.h | Tim Northover | 2014-02-03 | 1 | -3/+4
  llvm-svn: 200708
* ARM & AArch64: share the BI__builtin_neon enum defs. | Tim Northover | 2014-01-30 | 1 | -42/+9
  llvm-svn: 200470
* For AArch64 Neon, fix intrinsics implementation using nested macros. | Jiangning Liu | 2014-01-26 | 1 | -48/+78
  llvm-svn: 200114
* [AArch64 NEON] Support poly128_t and implement relevant intrinsic. | Kevin Qin | 2013-12-10 | 1 | -2/+40
  llvm-svn: 196888
* Implemented vget/vset_lane_f16 intrinsics | Ana Pazos | 2013-12-05 | 1 | -2/+33
  llvm-svn: 196535
* [AArch64] Add missing floating point convert, round and misc intrinsics. | Hao Liu | 2013-12-03 | 1 | -1/+2
  E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn
  llvm-svn: 196211
* Revert r196152. | Hao Liu | 2013-12-03 | 1 | -14/+8
  This is a duplicate implementation. E.g. this patch defines:
    float64_t vabd_f64(float64_t a, float64_t b)
  But there is already a similar intrinsic "vabdd_f64" with the same types.
  Also, this intrinsic will conflict with the following vector-type intrinsic (which is implemented by me and will be committed to trunk):
    float64x1_t vabd_f64(float64x1_t a, float64x1_t b)
  Two functions shouldn't have the same name in arm_neon.h. According to the ARM ACLE document, no such vabd_f64 taking float64_t exists. So I revert this commit.
  llvm-svn: 196205
* Add some missing AArch64 Neon intrinsics like vmull_high_n_s16 and friends. | Jiangning Liu | 2013-12-03 | 1 | -0/+48
  llvm-svn: 196189
* [AArch64] Add missing NEON scalar floating-point to integer convert ACLEs. | Chad Rosier | 2013-12-02 | 1 | -8/+14
  llvm-svn: 196152
* AArch64: Two intrinsics are expected to return float64 not float32 in arm_neon.h | Hao Liu | 2013-11-29 | 1 | -3/+8
  llvm-svn: 195943
* Fix the problem that the range check for scalar narrow shift is too wide. | Hao Liu | 2013-11-29 | 1 | -2/+6
  E.g. the immediate range of vshrns_n_s16 was [1,16], but it should be [1,8].
  llvm-svn: 195942
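The corrected bound follows from the narrowing: a shift that narrows a 16-bit source to an 8-bit result accepts immediates from 1 up to the width of the narrowed result. A small sketch under that assumption is below; the function names are hypothetical and the rule is stated only for the halving case that the example in the commit describes.

```cpp
#include <cassert>

// Hypothetical helper: largest legal immediate for a narrowing right shift
// whose result is half as wide as its `sourceBits`-wide input.
constexpr unsigned maxNarrowShift(unsigned sourceBits) {
  return sourceBits / 2;
}

// Hypothetical check: narrowing-shift immediates start at 1, not 0.
constexpr bool isValidNarrowShift(unsigned sourceBits, long long imm) {
  return imm >= 1 && imm <= static_cast<long long>(maxNarrowShift(sourceBits));
}

// A 16-bit source narrows to 8 bits, so the range is [1,8] rather than [1,16].
static_assert(maxNarrowShift(16) == 8, "16-bit source narrows to 8 bits");
```

With this bound, isValidNarrowShift(16, 8) holds and isValidNarrowShift(16, 9) does not.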
* Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics. | Jiangning Liu | 2013-11-27 | 1 | -9/+16
  llvm-svn: 195844
* Remove a whole lot of unused variables | Alp Toker | 2013-11-27 | 1 | -2/+0
  There are about 30 removed in this patch, generated by a new FixIt I haven't got round to submitting yet.
  llvm-svn: 195814
* ARM: define & use __ARM_NEON on ARM32 (as per ACLE) | Tim Northover | 2013-11-21 | 1 | -1/+1
  There seem to be quite a few references to the old macro __ARM_NEON__ on the internet, so I don't think it's a good idea to remove it entirely (at least yet), but the canonical name does not have the trailing underscores so we should use that ourselves.
  llvm-svn: 195353
* Implemented Neon scalar by element intrinsics. | Ana Pazos | 2013-11-21 | 1 | -4/+58
  Intrinsics implemented: vqdmull_lane, vqdmulh_lane, vqrdmulh_lane, vqdmlal_lane, vqdmlsl_lane scalar Neon intrinsics.
  llvm-svn: 195326
* Add predicate for AArch64 crypto instructions. | Jiangning Liu | 2013-11-19 | 1 | -0/+20
  llvm-svn: 195069
* Clean up predefined macros for AArch64 to follow ACLE 2.0. | Jiangning Liu | 2013-11-19 | 1 | -1/+1
  llvm-svn: 195068
* Implement the newly added AArch64 ACLE functions for ld1/st1 with 2/3/4 vectors. | Hao Liu | 2013-11-18 | 1 | -1/+15
  The functions are like: vst1_s8_x2 ...
  llvm-svn: 194991
* Implemented aarch64 Neon scalar vmulx_lane intrinsics | Ana Pazos | 2013-11-15 | 1 | -3/+86
  Implemented aarch64 Neon scalar vfma_lane intrinsics
  Implemented aarch64 Neon scalar vfms_lane intrinsics
  Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions.
  Implemented legacy vfma_lane_f64, vfms_lane_f64, vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions.
  llvm-svn: 194889
* [AArch64 neon] support poly64 and relevant intrinsic functions. | Kevin Qin | 2013-11-14 | 1 | -10/+21
  llvm-svn: 194660
* Implement aarch64 neon instruction class misc. | Kevin Qin | 2013-11-14 | 1 | -1/+69
  llvm-svn: 194657
* Implement AArch64 NEON instruction set AdvSIMD (table). | Jiangning Liu | 2013-11-14 | 1 | -11/+28
  llvm-svn: 194649
* [AArch64] Add support for NEON scalar floating-point convert to fixed-point instructions. | Chad Rosier | 2013-11-11 | 1 | -2/+6
  llvm-svn: 194395
* Implement AArch64 Neon instruction set Perm. | Jiangning Liu | 2013-11-06 | 1 | -0/+48
  llvm-svn: 194124
* Implemented aarch64 neon intrinsic vcopy_lane with float type. | Kevin Qin | 2013-11-05 | 1 | -5/+23
  llvm-svn: 194042
* [AArch64] Add support for NEON scalar shift immediate instructions. | Chad Rosier | 2013-10-31 | 1 | -0/+31
  llvm-svn: 193791
* [AArch64] Add support for NEON scalar floating-point compare instructions. | Chad Rosier | 2013-10-30 | 1 | -0/+2
  llvm-svn: 193692
* [AArch64] Add support for NEON scalar extract narrow instructions. | Chad Rosier | 2013-10-18 | 1 | -0/+6
  llvm-svn: 192971
* Implemented aarch64 SIMD copy related ACLE intrinsics: | Kevin Qin | 2013-10-11 | 1 | -5/+25
  vget_lane, vset_lane, vcopy_lane, vcreate, vdup_n, vdup_lane, vmov_n.
  llvm-svn: 192411
* [AArch64] Add support for NEON scalar signed/unsigned integer to floating-point convert instructions. | Chad Rosier | 2013-10-08 | 1 | -0/+7
  llvm-svn: 192232