path: root/libclc/generic/lib
Commit message | Author | Age | Files | Lines
...
* Implement acospi builtin | Tom Stellard | 2015-04-02 | 2 | -0/+173
  This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 233925
* Implement fmax using __builtin_fmax | Tom Stellard | 2015-03-31 | 2 | -4/+27
  This ensures correct handling of NaN. This has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 233713
* Implement fmin using __builtin_fmin | Tom Stellard | 2015-03-31 | 2 | -4/+27
  This ensures correct handling of NaN. This has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 233712
* Implement fast_distance builtin | Tom Stellard | 2015-03-23 | 3 | -0/+56
  This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 232978
* Implement fast_length builtin | Tom Stellard | 2015-03-23 | 2 | -0/+61
  This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 232977
* Implement half_sqrt builtin v2 | Tom Stellard | 2015-03-23 | 3 | -0/+56
  This is a generic implementation which just calls sqrt. Targets should override this if they want a faster implementation.
  v2: - Alphabetize SOURCES
  llvm-svn: 232965
* Implement distance builtin v2 | Tom Stellard | 2015-03-23 | 3 | -0/+56
  This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
  v2: - Remove unnecessary copyright.
  llvm-svn: 232964
* Fix implementation of length builtin v2 | Tom Stellard | 2015-03-23 | 2 | -6/+82
  v2: - Move common code into a macro
      - Use the same constant for all vector types.
  llvm-svn: 232963
* Add __clc_ prefix to functions in sincos_helpers.cl | Tom Stellard | 2015-03-23 | 4 | -28/+24
  This will help avoid naming conflicts with functions defined in kernels linking with libclc.
  llvm-svn: 232960
* math: Implement erfc | Aaron Watry | 2015-03-18 | 2 | -0/+414
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 232674
* Fix bitselect for float/double types v2 | Tom Stellard | 2015-03-05 | 3 | -0/+79
  We need to reinterpret float/double types as uint/ulong in order to perform the bitwise operations. This has been tested with piglit, OpenCV, and the ocl conformance tests.
  v2: - Use vector operations rather than splitting vectors into scalar components.
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 231373
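The reinterpretation described above can be sketched in host-side C as follows. This is an illustration only, not the libclc source: the helper names `float_as_uint` and `uint_as_float` are hypothetical stand-ins for OpenCL's `as_uint`/`as_float`, and only the scalar float case is shown. The selection rule used is the one from the OpenCL specification: each result bit comes from `a` where the corresponding bit of `c` is 0, and from `b` where it is 1.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
uint32_t float_as_uint(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret the bits of a 32-bit unsigned integer as a float. */
float uint_as_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Bitwise select on float operands, performed on their integer bit patterns. */
float bitselect_float(float a, float b, float c) {
    uint32_t ua = float_as_uint(a);
    uint32_t ub = float_as_uint(b);
    uint32_t uc = float_as_uint(c);
    return uint_as_float((ua & ~uc) | (ub & uc));
}
```

For example, a mask with only the sign bit set takes the sign from `b` and everything else from `a`.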
* Move mix from math to common | Aaron Watry | 2015-03-03 | 3 | -1/+1
  It has been part of the common functions since 1.0
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 231137
* Implement step builtin | Tom Stellard | 2015-03-02 | 3 | -0/+78
  This has been tested with piglit, OpenCV, and the ocl conformance tests.
  llvm-svn: 230970
* Implement smoothstep builtin v2 | Tom Stellard | 2015-03-02 | 3 | -0/+79
  This has been tested with piglit, OpenCV, and the ocl conformance tests.
  v2: - Fix typo in smoothstep.h
  llvm-svn: 230969
* Implement radians builtin v2 | Tom Stellard | 2015-03-02 | 2 | -0/+46
  This has been tested with piglit, OpenCV, and the ocl conformance tests.
  v2: - Move to the common/ directory
  llvm-svn: 230968
* Implement degrees builtin v2 | Tom Stellard | 2015-03-02 | 2 | -0/+46
  This has been tested with piglit, OpenCV, and the ocl conformance tests.
  v2: - Move to the common/ directory
  llvm-svn: 230967
* libclc/math: Add cospi | Aaron Watry | 2015-02-26 | 4 | -0/+271
  Ported from the libclc/amd-builtins branch
  v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
      Add cospi(double) implementation instead of using llvm.cos
  Notes: The sincosD_piby4.h file is mostly the same as the builtin implementation released by AMD. The inline attribute declaration is changed, and M_PI is used instead of a constant double. Otherwise, the only difference is that the header explicitly enables the fp64 pragma.
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
  CC: Tom Stellard <tom@stellard.net>
  CC: Matt Arsenault <Matthew.Arsenault@amd.com>
  llvm-svn: 230641
* Implement log10 | Jan Vesely | 2015-01-30 | 3 | -0/+22
  v2: Use constant and multiplication instead of division
  v3: Use hex constants
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 227585
* Implement log1p builtin | Tom Stellard | 2014-10-07 | 5 | -0/+621
  llvm-svn: 219230
* Implement fmod | Jan Vesely | 2014-10-05 | 2 | -0/+13
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Tom Stellard <tom@stellard.net>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 219087
* Implement async_work_group_copy builtin v3 | Tom Stellard | 2014-10-03 | 3 | -0/+27
  This is a simple implementation which just copies data synchronously.
  v2: - Use size_t.
  v3: - Fix possible race condition by splitting the copy among multiple work items.
  llvm-svn: 219008
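One way to split a copy among work items, as the v3 note describes, is a strided loop in which each work item handles a disjoint set of elements. The sketch below is host-side C under stated assumptions, not the libclc implementation: `copy_slice` is a hypothetical name, and `local_id` and `local_size` stand in for what a kernel would obtain from `get_local_id(0)` and `get_local_size(0)`.

```c
#include <stddef.h>

/* Work done by a single work item: copy every local_size-th element,
   starting at index local_id. Different work items never write the
   same destination element. */
void copy_slice(float *dst, const float *src, size_t num_elements,
                size_t local_id, size_t local_size) {
    for (size_t i = local_id; i < num_elements; i += local_size)
        dst[i] = src[i];
}
```

Calling `copy_slice` once for each `local_id` in `0 .. local_size - 1` covers every element exactly once.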
* Implement async_work_group_strided_copy builtin v2 | Tom Stellard | 2014-10-03 | 3 | -0/+44
  This is a simple implementation which just copies data synchronously.
  v2: - Use size_t.
  llvm-svn: 219007
* Implement wait_group_events builtin v2 | Tom Stellard | 2014-10-03 | 2 | -0/+6
  This is a simple default implementation which just calls barrier().
  v2: - Only call barrier() once.
  llvm-svn: 219006
* atomic: Add generic atom[ic]_cmpxchg | Aaron Watry | 2014-09-16 | 4 | -0/+34
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217918
* atomic: Implement generic atom[ic]_xchg | Aaron Watry | 2014-09-16 | 5 | -0/+42
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217917
* atomic: Add generic atomic_min implementation | Aaron Watry | 2014-09-16 | 4 | -0/+44
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217916
* atomic: Add generic atom[ic]_xor | Aaron Watry | 2014-09-16 | 4 | -0/+33
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217915
* atomic: Add atom[ic]_or | Aaron Watry | 2014-09-16 | 4 | -0/+31
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217914
* atomics: Add generic atom[ic]_and | Aaron Watry | 2014-09-16 | 4 | -0/+32
  Not used yet.
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217913
* atomic: Add generic implementation of atom[ic]_max | Aaron Watry | 2014-09-16 | 4 | -0/+44
  Not used yet...
  v2: Correct int/uint behavior
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217912
* atomic: define extension functions for existing atomic implementations | Aaron Watry | 2014-09-16 | 5 | -0/+40
  We were missing the local versions of the atom_* before
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 217911
* math: Add tan implementation | Aaron Watry | 2014-09-10 | 3 | -0/+17
  Uses the algorithm: tan(x) = sin(x) / sqrt(1-sin^2(x))
  An alternative is: tan(x) = sin(x) / cos(x)
  Which produces more verbose bitcode and longer assembly.
  Either way, the generated bitcode seems pretty nasty and a more optimized but still precise-enough solution is welcome.
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 217511
* math: Add asin implementation | Aaron Watry | 2014-09-10 | 3 | -0/+12
  asin(x) = atan2(x, sqrt( 1-x^2 ))
  alternatively: asin(x) = PI/2 - acos(x)
  Use the atan2 implementation since it produces slightly shorter bitcode and R600 machine code.
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 217510
* math: Add acos implementation | Aaron Watry | 2014-09-10 | 3 | -0/+30
  Passes the tests that were submitted to the piglit list
  Tested on R600 (Pitcairn)
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 217509
* add isordered builtin | Jan Vesely | 2014-09-05 | 2 | -0/+24
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217247
* add isunordered builtin | Jan Vesely | 2014-09-05 | 2 | -0/+23
  v2: remove trailing newline
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217246
* add islessgreater builtin | Jan Vesely | 2014-09-05 | 2 | -0/+23
  v2: remove trailing newline
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217245
* add isnormal builtin | Jan Vesely | 2014-09-05 | 2 | -0/+19
  v2: simplify and remove isnan leftovers
      remove trailing newline
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217244
* add isfinite builtin | Jan Vesely | 2014-09-05 | 2 | -0/+19
  v2: simplify and remove isinf leftovers
      remove trailing newline
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217243
* Implement isinf builtin | Tom Stellard | 2014-09-03 | 2 | -0/+19
  llvm-svn: 217046
* Fix implementation of copysign | Tom Stellard | 2014-09-03 | 2 | -0/+13
  This was previously implemented with a macro and we were using __builtin_copysign(), which takes double inputs for the float version of copysign().
  Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 217045
* Implement generic mad_sat | Jan Vesely | 2014-09-02 | 3 | -0/+95
  v2: Fix trailing whitespace
      Fix signed long overflow
      improve comment
  v3: fix typo
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Tom Stellard <tom@stellard.net>
  llvm-svn: 216923
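As background, mad_sat(a, b, c) computes a * b + c and clamps the result to the range of the operand type instead of wrapping. A minimal host-side C sketch for the 32-bit signed case is shown below. It is an illustration under stated assumptions, not the committed code: the name `mad_sat_i32` is hypothetical, and the widening approach used here does not carry over to 64-bit `long` operands, which is the case the "signed long overflow" note refers to and which this sketch does not cover.

```c
#include <stdint.h>

/* Saturating multiply-add for 32-bit signed integers.
   The exact result always fits in 64 bits: |a * b| <= 2^62 and |c| <= 2^31. */
int32_t mad_sat_i32(int32_t a, int32_t b, int32_t c) {
    int64_t wide = (int64_t)a * b + c;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

For example, `mad_sat_i32(INT32_MAX, 2, 0)` clamps to `INT32_MAX` rather than wrapping to a negative value.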
* Revert "Implement generic mad_sat" | Aaron Watry | 2014-08-23 | 3 | -95/+0
  This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.
  I didn't mean to commit this... Jan has a v3 incoming
  llvm-svn: 216322
* Implement generic mad_sat | Aaron Watry | 2014-08-23 | 3 | -0/+95
  v2: Fix trailing whitespace
      Fix signed long overflow
      improve comment
  Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
  llvm-svn: 216320
* Implement prefetch builtin | Tom Stellard | 2014-08-20 | 3 | -0/+11
  The default implementation is a no-op. Targets should override this with their own implementations.
  llvm-svn: 216127
* vload/vstore: Use casts instead of scalarizing everything in CLC version | Aaron Watry | 2014-08-20 | 5 | -188/+21
  This generates bitcode which is indistinguishable from what was hand-written for int32 types in v[load|store]_impl.ll.
  v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
  v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
  v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  CC: Matt Arsenault <Matthew.Arsenault@amd.com>
  CC: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 216069
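The v4 note about vec3 stores can be illustrated with a host-side C sketch. The commit message does not spell out the corruption it refers to. A plausible reading, assumed here, is that a 3-component vector occupies the storage of a 4-component one, so writing it with a single wide store would overwrite the element that follows the three being stored. The function name `vstore3_sketch` is hypothetical, and plain float arrays stand in for OpenCL vector types.

```c
#include <stddef.h>
#include <string.h>

/* Store three floats at p + 3 * offset as a two-element block plus a scalar,
   so exactly three destination elements are written. */
void vstore3_sketch(const float v[3], size_t offset, float *p) {
    float *dst = p + 3 * offset;
    memcpy(dst, v, 2 * sizeof(float)); /* vec2 part: elements 0 and 1 */
    dst[2] = v[2];                     /* scalar part: element 2 */
}
```

Storing at offset 1 into a 7-element buffer writes elements 3 to 5 and leaves element 6 untouched.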
* relational: Add islessequal(floatN) builtin | Jan Vesely | 2014-08-01 | 2 | -0/+23
  v2: remove the initial undef
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 214568
* relational: Add isless(floatN) builtin | Jan Vesely | 2014-08-01 | 2 | -0/+23
  v2: remove the initial undef
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 214567
* Implement sin builtin for float types | Tom Stellard | 2014-07-23 | 2 | -0/+71
  The double version still uses @llvm.sin.
  llvm-svn: 213762
* Implement cos builtin for float types | Tom Stellard | 2014-07-23 | 4 | -0/+402
  The double version still uses @llvm.cos.
  llvm-svn: 213761