summaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto/Makefile
Commit message (Collapse)AuthorAgeFilesLines
* crypto: serpent - add 4-way parallel i586/SSE2 assembler implementationJussi Kivilinna2011-11-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler functions crypt data in four block chunks. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios): Intel Atom N270: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x 64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x 256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x 1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x 8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x Full output: http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt Userspace test results: Encryption/decryption of sse2-i586 vs generic on Intel Atom N270: encrypt: 2.35x decrypt: 2.54x Encryption/decryption of sse2-i586 vs generic on AMD Phenom II: encrypt: 1.82x decrypt: 2.51x Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330: encrypt: 2.99x decrypt: 3.48x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: serpent - add 8-way parallel x86_64/SSE2 assembler implementationJussi Kivilinna2011-11-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler functions crypt data in eigth block chunks (two 4 block chunk SSE2 operations in parallel to improve performance on out-of-order CPUs). Glue code is based on one from AES-NI implementation, so requests from irq context are redirected to cryptd. v2: - add missing include of linux/module.h (appearently crypto.h used to include module.h, which changed for 3.2 by commit 7c926402a7e8c9b279968fd94efec8700ba3859e) Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios): AMD Phenom II 1055T (fam:16, model:10): size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 1.03x 1.01x 1.03x 1.05x 1.00x 0.99x 64B 1.00x 1.01x 1.02x 1.04x 1.02x 1.01x 256B 2.34x 2.41x 0.99x 2.43x 2.39x 2.40x 1024B 2.51x 2.57x 1.00x 2.59x 2.56x 2.56x 8192B 2.50x 2.54x 1.00x 2.55x 2.57x 2.57x Intel Celeron T1600 (fam:6, model:15, step:13): size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 0.97x 0.97x 1.01x 1.01x 1.01x 1.02x 64B 1.00x 1.00x 1.00x 1.02x 1.01x 1.01x 256B 3.41x 3.35x 1.00x 3.39x 3.42x 3.44x 1024B 3.75x 3.72x 0.99x 3.74x 3.75x 3.75x 8192B 3.70x 3.68x 0.99x 3.68x 3.69x 3.69x Full output: http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-generic.txt http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-sse2.txt http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-generic.txt http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-sse2.txt Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: twofish - add 3-way parallel x86_64 assembler implementionJussi Kivilinna2011-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds 3-way parallel x86_64 assembly implementation of twofish as new module. New assembler functions crypt data in three blocks chunks, improving cipher performance on out-of-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Summary of the tcrypt benchmarks: Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB) encrypt: 1.3x speed decrypt: 1.3x speed Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC) encrypt: 1.07x speed decrypt: 1.4x speed Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR) encrypt: 1.4x speed Twofish 3-way-asm vs AES asm (128bit 8kb block ECB) encrypt: 1.0x speed decrypt: 1.0x speed Twofish 3-way-asm vs AES asm (128bit 8kb block CBC) encrypt: 0.84x speed decrypt: 1.09x speed Twofish 3-way-asm vs AES asm (128bit 8kb block CTR) encrypt: 1.15x speed Full output: http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt Tests were run on: vendor_id : AuthenticAMD cpu family : 16 model : 10 model name : AMD Phenom(tm) II X6 1055T Processor Also userspace test were run on: vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping : 11 Userspace test results: Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II: encrypt: 1.27x decrypt: 1.25x Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330: encrypt: 1.36x decrypt: 1.36x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: blowfish - add x86_64 assembly implementationJussi Kivilinna2011-09-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds x86_64 assembly implementation of blowfish. Two set of assembler functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'four-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 4-way functions should be equal to 1-way functions with in-order CPUs. Summary of the tcrypt benchmarks: Blowfish assembler vs blowfish C (256bit 8kb block ECB) encrypt: 2.2x speed decrypt: 2.3x speed Blowfish assembler vs blowfish C (256bit 8kb block CBC) encrypt: 1.12x speed decrypt: 2.5x speed Blowfish assembler vs blowfish C (256bit 8kb block CTR) encrypt: 2.5x speed Full output: http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt Tests were run on: vendor_id : AuthenticAMD cpu family : 16 model : 10 model name : AMD Phenom(tm) II X6 1055T Processor stepping : 0 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha1 - SSSE3 based SHA1 implementation for x86-64Mathias Krause2011-08-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25% faster. Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant, if using the SSE/YMM registers is not possible. With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, meassuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s). Using this algorithm on a IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with a EXFO Packet Blazer FTB-8510: frame size sha1-generic sha1-ssse3 delta 64 byte 37.5 MBit/s 37.5 MBit/s 0.0% 128 byte 56.3 MBit/s 62.5 MBit/s +11.0% 256 byte 87.5 MBit/s 100.0 MBit/s +14.3% 512 byte 131.3 MBit/s 150.0 MBit/s +14.2% 1024 byte 162.5 MBit/s 193.8 MBit/s +19.3% 1280 byte 175.0 MBit/s 212.5 MBit/s +21.4% 1420 byte 175.0 MBit/s 218.7 MBit/s +25.0% 1518 byte 150.0 MBit/s 181.2 MBit/s +20.8% The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make there way through the IPsec tunnel. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: aesni-intel - Merge with fpu.koAndy Lutomirski2011-05-161-3/+1
| | | | | | | | | | | | | | | Loading fpu without aesni-intel does nothing. Loading aesni-intel without fpu causes modes like xts to fail. (Unloading aesni-intel will restore those modes.) One solution would be to make aesni-intel depend on fpu, but it seems cleaner to just combine the modules. This is probably responsible for bugs like: https://bugzilla.redhat.com/show_bug.cgi?id=589390 Signed-off-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ghash - Add PCLMULQDQ accelerated implementationHuang Ying2009-10-191-0/+3
| | | | | | | | | | | | | | | | PCLMULQDQ is used to accelerate the most time-consuming part of GHASH, carry-less multiplication. More information about PCLMULQDQ can be found at: http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/ Because PCLMULQDQ changes XMM state, its usage must be enclosed with kernel_fpu_begin/end, which can be used only in process context, the acceleration is implemented as crypto_ahash. That is, request in soft IRQ context will be defered to the cryptd kernel thread. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: fpu - Add template for blkcipher touching FPUHuang Ying2009-06-021-0/+2
| | | | | | | | | | | | Blkcipher touching FPU need to be enclosed by kernel_fpu_begin() and kernel_fpu_end(). If they are invoked in cipher algorithm implementation, they will be invoked for each block, so that performance will be hurt, because they are "slow" operations. This patch implements "fpu" template, which makes these operations to be invoked for each request. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platformHuang Ying2009-02-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure. The white paper can be downloaded from: http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf AES may be used in soft_irq context, but MMX/SSE context can not be touched safely in soft_irq context. So in_interrupt() is checked, if in IRQ or soft_irq context, the general x86_64 implementation are used instead. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: crc32c - Use Intel CRC32 instructionAustin Zhang2008-08-291-0/+2
| | | | | | | | | | | | | From NHM processor onward, Intel processors can support hardware accelerated CRC32c algorithm with the new CRC32 instruction in SSE 4.2 instruction set. The patch detects the availability of the feature, and chooses the most proper way to calculate CRC32c checksum. Byte code instructions are used for compiler compatibility. No MMX / XMM registers is involved in the implementation. Signed-off-by: Austin Zhang <austin.zhang@intel.com> Signed-off-by: Kent Liu <kent.liu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [CRYPTO] twofish: Merge common glue codeSebastian Siewior2008-01-141-2/+2
| | | | | | | There is almost no difference between 32 & 64 bit glue code. Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [CRYPTO] salsa20: Add x86-64 assembly versionTan Swee Heng2008-01-111-0/+2
| | | | | | | | | | This is the x86-64 version of the Salsa20 stream cipher algorithm. The original assembly code came from <http://cr.yp.to/snuffle/salsa20/amd64-3/salsa20.s>. It has been reformatted for clarity. Signed-off-by: Tan Swee Heng <thesweeheng@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)Tan Swee Heng2008-01-111-0/+2
| | | | | | | | | | | This patch contains the salsa20-i586 implementation. The original assembly code came from <http://cr.yp.to/snuffle/salsa20/x86-pm/salsa20.s>. I have reformatted it (added indents) so that it matches the other algorithms in arch/x86/crypto. Signed-off-by: Tan Swee Heng <thesweeheng@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [CRYPTO] aes-asm: Merge common glue codeSebastian Siewior2008-01-111-2/+2
| | | | | | | | 32 bit and 64 bit glue code is using (now) the same piece code. This patch unifies them. Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* x86: merge arch/x86/crypto MakefilesThomas Gleixner2007-10-231-5/+15
| | | | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86_64: move cryptoThomas Gleixner2007-10-111-1/+1
| | | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* i386: move cryptoThomas Gleixner2007-10-111-0/+5
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud