Since the introduction of the Intel 4004 (the first commercial microprocessor), and even with today's multicore chips, there has been and will continue to be a need for computers with greater processing power. For 30 years this was achieved by increasing CPU clock frequencies. However, numerous physical limitations in the fabrication of integrated circuits made it uneconomical to continue this trend. Only recently has the computing market shifted towards parallel software and hardware design in search of further performance gains. Modern graphics processing units (GPUs) are specialized circuits initially designed for the computer gaming market. Their architecture makes them more efficient than general-purpose CPUs for massively parallel algorithms. This project discusses the implementation of two basic image processing algorithms, convolution and normalized cross-correlation, on the latest Nvidia architecture, Kepler. The chosen algorithms are representative of several other image processing algorithms with similar memory access patterns and arithmetic complexity. Using datasets from an industrial environment, we discuss several implementations and optimization techniques, compare the achieved throughput with the card's theoretical throughput, identify bottlenecks, and draw conclusions.