- Year - 2017
Crying is one of an infant's primary means of communicating with its surroundings, intended to signal distress and attachment needs to caregivers. Automatic detection of a baby cry in audio signals can serve various purposes, from everyday applications to academic research. In this project we developed a deep-learning-based algorithm for automatic detection of baby cry in domestic audio recordings. The algorithm, based on a convolutional neural network (CNN), operates on a log Mel-filter bank representation of the recordings. To better understand the advantages of the CNN in identifying cry segments, we also compared the results to a simple k-NN algorithm and a simple artificial neural network (ANN). The CNN-based classifier yields considerably better results, achieving 89% success in identifying baby cry segments.
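For illustration, the log Mel-filter bank front end can be sketched in a few lines of NumPy (a minimal sketch; the sampling rate, frame length, hop, and number of mel bands below are illustrative choices, not the project's actual settings):

```python
import numpy as np

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Log Mel-filter bank features: windowed STFT power -> mel filterbank -> log."""
    # Frame the signal and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2+1)

    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)

    return np.log(power @ fbank.T + 1e-10)                # (n_frames, n_mels)
```

Each row of the returned matrix is the log-energy of one frame across the mel bands, so a recording becomes a 2-D time-frequency "image" of the kind a CNN classifier can consume.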
This study deals with the comparison of the ionospheric delay assessment by the GPS (Global Positioning System) and EGNOS (European Geostationary Navigation Overlay Service) systems.
Ionospheric delay is a known phenomenon in the world of satellite navigation: it describes the excess delay of an RF signal as it passes through the ionosphere. This issue has many solutions.
In this study, I will apply the GPS and EGNOS solutions to the ionospheric delay problem and compare them to the values measured at the UPC analysis facility.
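For context, the correction broadcast by GPS itself is the Klobuchar model: a daytime cosine bump over a 5 ns night-time floor, scaled by an obliquity factor for low elevations. The sketch below follows that general shape; in a real receiver the alpha/beta polynomial coefficients come from the navigation message, and any values passed in here are merely illustrative:

```python
import numpy as np

def klobuchar_delay(phi_m, t_local, elev, alpha, beta):
    """Simplified Klobuchar ionospheric delay (seconds) at L1.

    phi_m   : geomagnetic latitude of the ionospheric pierce point [semicircles]
    t_local : local time at the pierce point [s]
    elev    : satellite elevation [semicircles]
    alpha, beta : broadcast polynomial coefficients (4 values each)
    """
    # Obliquity factor: the slant delay grows at low elevations.
    F = 1.0 + 16.0 * (0.53 - elev) ** 3

    # Amplitude and period of the daytime cosine model.
    A = max(sum(a * phi_m ** n for n, a in enumerate(alpha)), 0.0)
    P = max(sum(b * phi_m ** n for n, b in enumerate(beta)), 72000.0)

    x = 2.0 * np.pi * (t_local - 50400.0) / P   # phase, peak at 14:00 local time
    if abs(x) < 1.57:
        # Cosine approximated by its 4th-order Taylor series, as in the spec.
        return F * (5e-9 + A * (1.0 - x ** 2 / 2.0 + x ** 4 / 24.0))
    return F * 5e-9                             # night-time floor: 5 ns
```

Multiplying the returned delay by the speed of light gives the range error in metres; EGNOS instead broadcasts gridded vertical delays that the receiver interpolates.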
Doing computer work with an incorrect posture can lead to headaches, neck pain and more. In this project, we propose a system that helps the user sit at the computer correctly. To do so, we developed a real-time system that analyzes the user's sitting position and alerts when it might be harmful. The system classifies the user's posture and, if necessary, informs the user how to correct it. For this analysis, the system uses the Microsoft Kinect-2™ sensor's body tracking capabilities, which provide data such as the location and orientation of joints like the head and shoulders. Using this data, the system can identify a wide variety of features consistent with wrong posture, such as an incorrect head-neck angle.
The system was tested on a group of six participants, showing good identification ability with no false alarms.
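For instance, a forward-slouch feature can be computed directly from three tracked joint positions (head, neck, spine). The sketch below is a geometric illustration only; the joint coordinates and any alert threshold are hypothetical, not the project's actual feature set:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by the segments b->a and b->c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def head_forward_tilt(head, neck, spine):
    """Deviation of the head-neck segment from the neck-spine line.
    0 deg = head stacked over the spine; larger values = forward slouch."""
    return 180.0 - joint_angle(head, neck, spine)
```

A posture monitor would compare such an angle against a threshold over time and alert only on sustained deviations, to avoid false alarms from momentary movement.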
The need for underwater wireless communications exists in many applications such as unmanned underwater vehicles, speech transmission between divers, defense and collection of data recorded at ocean-bottom stations.
The project goal is to implement an underwater acoustic channel estimator based on a sparsity assumption. We compare the performance of this estimator to that of other channel estimators in terms of estimation error, error robustness, bitrate and computational complexity.
Underwater acoustic channels pose great challenges for effective communications due to multipath, significant Doppler shifts and rapid channel variation. Underwater acoustic channels are characterized by a small number of paths, each characterized by a Doppler shift, a delay and an amplitude attenuation. This kind of channel has a sparse transfer function, and we want to exploit this characteristic.
The existing solution uses the least squares method. Its disadvantages are: the addition of delays that do not exist in the channel, no differentiation between close paths, Doppler compensation that does not differentiate between paths, and no use of the sparsity characteristic. All these factors led us to investigate and explore other methods.
We analyze the performance of three channel estimators: the least-squares one-tap estimator versus two sparse estimation methods, both based on the orthogonal matching pursuit (OMP) algorithm and its extension to narrow search grids. We then suggest a modification to the latter, adding a second step in which the channel attenuations are computed by iterating the least squares algorithm to reduce inter-carrier interference effects. The performance of all four algorithms is tested in simulation. Comparison to the LS and OMP algorithms indicates an improvement in terms of mean squared error and bit error rate.
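The greedy core shared by the sparse estimators is orthogonal matching pursuit. A basic OMP solver (generic, without the narrow-grid extension or the ICI-reducing second step) looks like this:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedy sparse solution of y ~ A x with k atoms."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```

In the channel-estimation setting, the columns of A would be delay/Doppler-shifted copies of the pilot signal, and the recovered sparse vector holds the per-path attenuations.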
Data compression denotes the task of representing information in a compact way so it can be stored and transmitted efficiently. In the case of lossy compression, this process may discard some information so that the reconstructed data is similar enough to the original, compromising between accuracy and file size. Image compression is of huge importance in a world where the image resolution capabilities of digital devices are constantly growing. Therefore, efficient and practical image compression algorithms, whose goal is to produce higher quality images with smaller file sizes, are of great concern.
JPEG2000 is currently the gold standard for image compression and, just like its predecessor JPEG, is based on a sparsifying transformation (an analytical dictionary) by which the data is approximated by a few coefficients. On the other hand, the K-SVD algorithm is a method that trains a dictionary to sparsely represent real image examples. Since the search for a sparsifying transform is at the core of every popular compression algorithm, this concept holds considerable potential for data compression.
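The core idea, approximating data by a few transform coefficients, can be demonstrated with a toy DCT block coder (a sketch only; JPEG2000 uses a wavelet transform with sophisticated entropy coding, and K-SVD learns its dictionary from image examples instead of using a fixed one):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: rows are cosine basis vectors."""
    k = np.arange(n)[:, None]
    C = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def compress_block(block, keep):
    """Keep only the `keep` largest-magnitude DCT coefficients of a 2-D block."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                     # 2-D DCT (analysis)
    thresh = np.sort(np.abs(coeffs).ravel())[-keep]
    coeffs[np.abs(coeffs) < thresh] = 0.0        # sparsify
    return C.T @ coeffs @ C                      # inverse 2-D DCT (synthesis)
```

On a smooth 8x8 block, a handful of coefficients already reconstruct the block almost perfectly, which is exactly the property a lossy coder exploits.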
In recent years many eye tracking technologies have been developed. These technologies let people with disabilities perform daily tasks and communicate with their surroundings, help detect conditions such as autism in infants who cannot yet communicate verbally, improve athletes' performance, and even support advertising research into people's points of interest in a specific advertisement.
The products available today in the market are eye-tracking cameras. Such a camera must be placed physically in front of the user, which complicates daily use and makes it hard to track eye movement continuously.
Therefore, in this project we attempt to detect eye movement using EOG signals alone. This way, by wearing only a minimal set of electrodes on the head, the user can track his eyes without any additional physical device.
EEG sensors are used to capture electrical activity of the human brain. One of their main uses is recording signals from epilepsy patients in order to capture their seizures. Nowadays, seizure detection is done manually by professional physicians, who examine the EEG signals and recognize when seizures occurred. There is a need for automated algorithms that identify seizures: accurate evaluation, pre-surgery assessment, and emergency alerts for medical aid all depend on detecting the onset of seizures.
The project's main goal is to classify time segments, by processing EEG signals, into two groups: epileptic seizures and non-epileptic activity.
We use kernel-based geometric methods built on Diffusion Maps and Alternating Diffusion Maps, and improvements thereof. This document describes all the methods we used.
The Improved Alternating Diffusion method yielded the best results.
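For illustration, a minimal (single-modality) diffusion maps embedding can be written as follows; the Gaussian kernel and median-bandwidth heuristic are common choices, not necessarily the project's exact configuration:

```python
import numpy as np

def diffusion_map(X, n_components=2, eps=None):
    """Diffusion maps: Gaussian affinity -> Markov normalization -> eigen-embedding."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    if eps is None:
        eps = np.median(D2)                               # bandwidth heuristic
    W = np.exp(-D2 / eps)                                 # affinity kernel
    P = W / W.sum(axis=1, keepdims=True)                  # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector (eigenvalue 1).
    idx = order[1: n_components + 1]
    return (vecs[:, idx] * vals[idx]).real                # coordinates scaled by eigenvalues
```

EEG segments that diffuse into the same region of this embedding are geometrically similar, which is what makes the coordinates usable as classifier features; Alternating Diffusion extends the idea to two simultaneous sensor modalities.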
Our project idea came from a group of students at the Technion with visual impairment. They described the great difficulty blind people have in crossing pedestrian crossings safely. A possible solution to this problem is to use a smartphone to take images of the pedestrian crossing area and then, using an image processing algorithm, to output whether it is safe to cross.
The main goal of this project was to implement an algorithm that receives a color image as input and outputs whether a traffic light is present; if one is, the algorithm identifies whether the light is green or red.
In this project, we investigated and implemented such an algorithm using MATLAB, based on machine learning boosting methods. To train our system, we created a database of images containing about 300 pedestrian traffic lights. We tested our algorithm on images from the database in a leave-one-out approach.
After improving our algorithm, we obtained the following results:
81% success in identifying green traffic lights.
92% success in identifying red traffic lights.
The project deals with automatic segmentation of the hippocampus from MRI scans of mice brains. The hippocampus is a structure inside the brain with roles in short-term memory, spatial memory, and even navigation, and is therefore of great significance. Nonetheless, manual segmentation of the hippocampus can take a long time; therefore, an algorithm that performs the segmentation automatically is needed.
In the project, several segmentation methods were examined. The main difficulty stems from the fact that the boundaries of the hippocampus are not clearly visible in the scan images.
The selected solution is atlas-based segmentation. It includes registration between already-segmented sets of images and a new set of images. The solution is based on the following assumptions: the shape of the brain and its organs, the textures, and the boundaries are similar across scans. The registration is performed by the MIND algorithm.
The result of the automatic segmentation obtained by the algorithm was examined on scans of different resolutions and on scans of different mice.
Classification methods are usually based on a definition of distance between two points. When the information we want to classify is represented as covariance matrices, using the Riemannian distance instead of the Euclidean one produces higher success rates.
When classifying EEG signals from the brain, even with Riemannian geometry, it is still difficult to classify new information coming from a subject about whom we have no prior information.
Using calibration methods, as well as innovative dimensionality reduction methods, the results improve significantly and success rates are quite high.
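The distance in question is the affine-invariant Riemannian metric on symmetric positive-definite (SPD) matrices; a small NumPy sketch (a standard construction, not specific to this project):

```python
import numpy as np

def spd_logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    d(A, B) = || logm(A^{-1/2} B A^{-1/2}) ||_F
    """
    w, V = np.linalg.eigh(A)
    A_isqrt = (V * (1.0 / np.sqrt(w))) @ V.T     # A^{-1/2}
    M = A_isqrt @ B @ A_isqrt
    M = 0.5 * (M + M.T)                          # enforce symmetry against round-off
    return np.linalg.norm(spd_logm(M), 'fro')
```

Replacing the Euclidean distance with this metric in a nearest-neighbour or MDM classifier is what gives the improvement described above, since covariance matrices live on a curved manifold rather than in a flat vector space.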
The project goal was to develop an algorithm that detects a cardiac arrest, and a device that uses this algorithm in real time to alert about the occurrence of a cardiac arrest if it happens.
We built on the previous project: we took the tested algorithm, evaluated it on other databases, and examined options for improving it. These improvements involved changes to several parameters of the algorithm and the possibility of integrating an accelerometer.
During the work we managed to improve the algorithm and evaluate those improvements, eventually reaching 95% real-time alerts. With these results, the device can help heart patients get quick help in case of cardiac arrest.
In the USA alone, there are 500,000 below-elbow amputees. The causes vary from trauma and disease to congenital conditions. Since this condition has a significant impact on daily functionality, a hand prosthesis can assist in regaining the lost functionality. Current solutions are far from perfect in both price and range of functionality. Hand prostheses range from mechanical prostheses, allowing only a single movement and costing several thousand dollars, to myoelectric prostheses, powered by electromyography (EMG), which allow several movement types but cost $20,000 and up. These high costs mean that many people around the world, especially in the developing world, cannot afford such solutions. This is especially problematic for child amputees, since new prostheses are needed as the child grows, which can accumulate to very high costs. Current solutions to this problem are mostly provided by non-profit organizations such as e-NABLE, which offers free online blueprints for 3D-printing single-action prostheses costing only $50. However, these prostheses are limited in functionality and can perform only a single movement (closing the hand).
This project handles image processing for the underwater microscope. The purpose of the microscope is to take pictures of plankton, small organisms that live underwater. Two main problems arose:
The microscope takes many pictures every second. As a result, many pictures contain no plankton at all, or plankton whose size is negligible relative to the background. In addition, the microscope's setup causes the images to have non-uniform illumination.
To ease subsequent processing for marine biologists, the images need to be classified according to plankton species.
This project offers solutions for both problems. For the first, we created an algorithm that identifies "interesting" parts of an image (not necessarily plankton) and obtains good results even for pictures containing more than one plankton. For the second, we created an algorithm that, using unsupervised learning, achieves 55% classification success.
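A common recipe for the non-uniform illumination problem is flat-field style correction: divide the image by a heavily blurred copy of itself, which estimates the smooth background shading. A sketch (the blur radius is an arbitrary choice, and this is not necessarily the project's exact method):

```python
import numpy as np

def box_blur(img, radius):
    """Separable mean filter; edges are normalized by the actual window size."""
    kernel = np.ones(2 * radius + 1)

    def blur1d(a):  # blur each column (along axis 0)
        num = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 0, a)
        den = np.convolve(np.ones(a.shape[0]), kernel, 'same')
        return num / den[:, None]

    return blur1d(blur1d(img).T).T     # vertical pass, then horizontal pass

def flatten_illumination(img, radius=15):
    """Divide the image by a heavily blurred copy to remove smooth shading."""
    background = box_blur(img, radius)
    return img / np.maximum(background, 1e-6)
```

After flattening, a fixed global threshold behaves consistently across the frame, which makes the "interesting region" detection step far more reliable.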
Classification methods are usually based on a definition of distance between two points. When the information we want to classify is represented as covariance matrices, using the Riemannian distance instead of the Euclidean one produces higher success rates.
When classifying these EEG signals from the brain, even when using Riemann geometry, it is still difficult to classify new information that comes from someone we do not have prior information about.
Using calibration methods, as well as innovative lowering dimensions methods, the results are significantly improved, and success rates are quite high.
The Telecommunication branch of the IDF approached the SIPL laboratory with a mission to improve the quality of video originating from security cameras intended to locate suspicious activities. SIPL was asked to find a solution that reveals hidden information, or enhances existing information, in the video, providing the user with new information about the security footage. The proposed solution is the use of super resolution (SR) algorithms. Super resolution algorithms utilize various sources of information, such as the characteristics of the camera, information existing in other frames, prior knowledge of the acquired scene, and external databases, to enhance the video's resolution. At the beginning of the project we had to characterize the requirements for applying an SR algorithm to the given data, taking into account that we have no prior information on the camera or the scene. The algorithm needs to be efficient in terms of processing time and memory usage, since information received from the security cameras (sensitive information that may contain threats) must be transferred immediately.
Infra-red (IR) image colorization has always been a challenging goal. Reducing human error and speeding up reaction time are just some of the benefits of this process. However, essential differences exist between IR sensors, which are temperature dependent, and regular visible-light color sensors. These differences cause difficulties when trying to use color images in the rendering process. In this paper we present an implementation of a novel method for automatically coloring IR images. The method uses a reference (source) color image, selected from a database by a texture-descriptor algorithm that searches for resemblance to the IR (target) image. Next, by dividing the images into main texture segments and assigning local characteristics, a best-matching color pixel is found for each IR pixel. As opposed to other methods, the color pixels in every segment are clustered to form a palette rather than randomly selected. This process expresses global as well as local features of each pixel and makes the transferred color appear more accurate. Results show that our method produces more natural-looking images than achieved heretofore.
The project used data that consisted of EMG recordings from both healthy and post-CVA individuals. The subjects were asked to extend their dominant hand towards different directions in front of them, and the goal was to find parameters that could be used to distinguish between the groups.
The project worked under the assumption that everyday actions (such as walking or gesturing) are constructed from basic muscle activation patterns (called synergies).
These synergies were extracted from the data using matrix factorization techniques, and a standard "healthy profile" set was created (based on the data from the healthy group).
It was then used to find dissimilarities between the healthy and post-CVA individuals, and within the healthy group itself.
Eventually, the main parameters found were related to the directional bias of each synergy and to the average activation power of each synergy.
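Synergy extraction of this kind is often done with non-negative matrix factorization (NMF), where the EMG envelope matrix is factored into synergies and their activations. The sketch below uses the classic Lee-Seung multiplicative updates; the project's exact factorization technique may differ:

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0):
    """Non-negative matrix factorization V ~ W @ H via Lee-Seung multiplicative updates.
    Rows of H are the synergies (muscle activation patterns); W holds their activations."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        # Updates keep W and H non-negative and monotonically reduce ||V - WH||.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

Comparing the H matrix (synergies) fitted per subject against the pooled "healthy profile" set is one way to quantify the dissimilarities described above.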
Part 2 of project 2970-1-15
This study deals with the comparison of the ionospheric delay assessment by the GPS (Global Positioning System) and EGNOS (European Geostationary Navigation Overlay Service) systems.
Ionospheric delay is a known phenomenon in the world of satellite navigation: it describes the excess delay of an RF signal as it passes through the ionosphere. This issue has many solutions.
In this study, I will apply the GPS and EGNOS solutions to the ionospheric delay problem and compare them to the values measured at the UPC analysis facility.
This project deals with the detection of anomalies in seabed surveys collected using multibeam sonar. Detecting anomalies is a common task in signal processing in general and in image processing in particular; therefore, established methods and methodologies exist for dealing with such problems.
As opposed to typical data in these fields, the information collected by multibeam sonar presents unique problems: sampling on irregular grids, variability of the sampling pattern depending on the position of the boat and the depth of the seabed, and areas lacking samples. These unique problems prevent immediate application of existing methods.
In this project, we reviewed solutions from different fields and generalized methods for identifying anomalies and improving the data collected by the sonar. Our solution proposes an innovative method for detecting "holes" (areas lacking samples) that brings methods from the realm of cosmology into signal processing. The missing data is then interpolated using a novel multiscale iterative interpolation method. Finally, we implemented an algorithm we developed for anomaly detection based on local sparsity characteristics.
- Year - 2016
We describe a system that delivers a website address to a cellular phone by encoding inaudible binary data in an analogue audio signal, which is received by the microphone of the cellular phone. This is an alternative to encoding a website address in a QR code label scanned by the cellular phone's camera.
Data embedding in the audio signal is done by modifying the phase of the signal's modulated complex lapped transform (MCLT) coefficients, while the perceived quality of the embedded audio signal remains the same as that of the original audio signal.
A whole system was implemented and tested both in simulation and in reality. The data rate achieved in reality at a distance of 1.5 meters was 48 bits per second. This rate is sufficient to deliver the same amount of information contained in a QR code label faster and easier than by camera scan.
The aim of this project is to estimate blood pressure using PPG signal extracted from video captured using a smartphone camera. In part A of the project we will develop a technique for estimating blood pressure using noise removal and feature extraction in order to make the PPG signal more reliable. In a possible part B of this project, we will write an Android application that implements the algorithm developed in part A.
A relevant paper: Noise Cleaning and Gaussian Modeling of Smart Phone Photoplethysmogram to Improve Blood Pressure Estimation
Wireless communication between underwater vehicles such as side scan sonar (SSS) and its operator is crucial for perceiving correct and updated intelligence understanding of the seabed. This has many military applications such as underwater mine discovery, and civilian applications such as seabed texture analysis.
SSS images usually contain high-resolution data with high-frequency content, and hence are not compressed well by simple compression schemes such as JPEG.
The goal of this project is to find a compression algorithm that, on the one hand, compresses SSS images with high compression ratios and low complexity, and on the other, preserves the image features that have intelligence value.
A few compression schemes specializing in high-resolution image compression were examined, implemented, and tested during the project. The best algorithm found was compared against the JPEG 2000 standard as a reference, using subjective quality assessment tests and quality metrics such as PSNR and SSIM.
Communication between underwater vehicles and their operators is crucial. Few underwater autonomous vehicles manage to explore the seabed efficiently; one of them is the side scan sonar (SSS). SSS creates a sonar mapping of the seabed and provides an understanding of differences in the material and texture of the seabed.
Data provided by the SSS presents large uniform areas disrupted by rocks, shipwrecks, or pipelines, and is inherently noisy. Therefore, effective ways of transmitting high-resolution data must be found to withstand the system's bit rate and bandwidth limitations.
Figure 2 - SSS image patch
The goal of the project is to design an efficient compression scheme for SSS images that preserves the image's important details, achieves high compression ratios, and has low complexity.
Classic image compression algorithms such as JPEG do not work well on SSS images because of their high-resolution data and high-frequency content.
The approach we chose was wavelet-based compression. Wavelets are a family of functions obtained from a prototype function by scaling and translation.
Wavelet representation allows fine frequency analysis and good localization in time. Wavelets are a useful multi-resolution signal analysis tool: they enable studying different resolution layers of the image and help de-noise some of the speckle in SSS images.
Figure 3 – Wavelet Decomposition
We implemented several wavelet-based compression schemes in MATLAB. The first is based on sparse representation of the wavelet coefficients and Huffman coding.
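The flavor of the sparsity-plus-thresholding stage can be conveyed with a single-level Haar transform toy coder in Python (an illustration only; the actual schemes were implemented in MATLAB with Huffman entropy coding and more elaborate wavelets):

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar wavelet transform: [LL LH; HL HH] sub-bands."""
    def step(a):  # pairwise averages and differences along rows
        s = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
        d = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
        return np.hstack([s, d])
    return step(step(img).T).T

def ihaar2d(coeffs):
    """Inverse of one Haar level."""
    def istep(a):
        n = a.shape[1] // 2
        s, d = a[:, :n], a[:, n:]
        out = np.empty_like(a)
        out[:, 0::2] = (s + d) / np.sqrt(2)
        out[:, 1::2] = (s - d) / np.sqrt(2)
        return out
    return istep(istep(coeffs.T).T)

def wavelet_compress(img, keep_ratio=0.1):
    """Zero all but the largest-magnitude fraction of Haar coefficients."""
    c = haar2d(img)
    k = max(1, int(keep_ratio * c.size))
    thresh = np.sort(np.abs(c).ravel())[-k]
    return ihaar2d(np.where(np.abs(c) >= thresh, c, 0.0))
```

The surviving coefficients are mostly concentrated in the LL band, which is why entropy coding the sparse detail bands yields high compression ratios on the large uniform areas typical of SSS imagery.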
The project deals with segmentation and classification of fluorescent samples of lung cells captured with a hyperspectral (FRET) microscope. The output of such a measurement can serve the search for a thorough understanding of intercellular biological processes and malignant diseases.
Analyzing the supplied samples is a challenging task, mainly because the cells in the samples are very dense and sometimes overlap.
Segmentation and tracking of the cells was accomplished almost fully automatically (user supervision is needed only in the first time lapse). Our method combines techniques from various fields of computer vision to achieve a robust and versatile segmentation tool.
Recently, Intel has released a Perceptual Computing SDK. This software kit uses a depth camera from Creative (similar to Kinect) and enables development of advanced applications that enrich user interaction. The kit provides many capabilities, e.g. face and speech recognition, tracking, and hand pose and gesture recognition.
In this project we will investigate the Perceptual Computing SDK from Intel and the depth camera from Creative and use them to create a tool for DJs. The tool will enable a user to control music at real-time using hand gestures and other perceptual means.
The following project addresses the action of tuning a classical guitar.
This procedure, performed daily by any guitarist, can prove tedious for the seasoned guitarist and difficult for the novice.
The project treats this issue as a technical task and, as such, aims to automate nearly all stages of the process.
Tuning a guitar is a task that requires technical know-how and skill that beginner guitarists often lack.
The tuning process is seen as complicated and above their capabilities. Automation of the tuning procedure can help beginners pass the hurdle and focus on getting familiar with the instrument.
By crafting and building an adequate design it is possible to reach a solution that requires no technical skill from the user. This design can tune the guitar more precisely than the human ear.
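The sensing core of such a tuner is fundamental-frequency estimation. A minimal autocorrelation-based sketch (the sampling rate and guitar frequency range are illustrative parameters, not the project's actual design):

```python
import numpy as np

def estimate_f0(x, sr, f_lo=70.0, f_hi=400.0):
    """Fundamental frequency via the autocorrelation peak in the guitar range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]   # autocorrelation, lags >= 0
    lag_min = int(sr / f_hi)                            # shortest plausible period
    lag_max = int(sr / f_lo)                            # longest plausible period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

def cents_off(f0, target):
    """Signed deviation in cents from a target string frequency (e.g. A2 = 110 Hz)."""
    return 1200.0 * np.log2(f0 / target)
```

A tuner loop would compare `cents_off` against the nearest standard string frequency and drive the peg (or instruct the user) until the deviation falls inside a small tolerance.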
When a person is in a distress situation, there are signs that are reflected in his speech or in the audio of his surroundings.
The project deals with speaker-independent distress detection in speech of a single speaker.
The solution involves the extraction of relevant features from the speech signal and the comparison between the different methods of extraction.
A distress situation is defined by the discrete emotions of anger and fear.
The project concludes with the classification of distress in speech with 91% accuracy, using the Berlin Emotional Speech Database in the German language.
We describe here a JPEG-based compression scheme, adapted specifically for the underwater acoustic channel.
The project goal was to deal with bit error rates common in the underwater acoustic channel.
Two measures were used to quantify the different solutions: compression ratio, which we tried to minimize, and SSIM, a measure that describes the quality of the image, which we tried to maximize.
The scheme described in this report was inspired by initial research conducted by RAFAEL, by common schemes for handling errors in images, and by close study of the international JPEG compression standard. The implemented scheme was tested on a variety of images, and the results are described in detail in this report.
In this project we deal with fast HEVC encoding algorithms suited for parallel coding on the GPU. In recent years the HEVC coding format has become popular and is expected to replace the currently used H.264 format. The HEVC format is expected to achieve better compression without degrading coding quality.
Accordingly, many applications and implementations involving the HEVC format have been released, and many works have proposed improvements.
The project was conducted in cooperation with Harmonic; our work relies on a previous SIPL project that dealt with a fast quad-tree partition algorithm.
After a literature survey on fast HEVC encoding algorithms, we chose to focus on the work of R. Fan et al., called Similarity Based Decision (SBD). An implementation of this work is available, so we were able to compare the algorithm's performance.
In our work, we proposed and implemented several methods for parallelizing the SBD algorithm, reaching performance no worse than the serial SBD implementation, and better in some cases.
Following our work, an article dealing with parallel Coding Unit size selection was published.
The goal of the project is to develop an automatic algorithm for identifying and distinguishing between individual right whales, relying on aerial photographs taken over a decade. The project was suggested as a challenge by the Kaggle community, which promotes competitions in data processing and learning systems.
The database provided by Kaggle included 4544 labeled photos and 6925 unlabeled photos of 447 individual right whales. In the project, we studied and implemented deep learning with convolutional neural networks, which turned out to be the state-of-the-art solution to the Right Whale challenge.
Point clouds are discrete sets of points describing a hyper-surface in a certain dimension. The particular case of 3-D point clouds refers to real objects and surfaces such as a table, a chair, or part of a landscape, where the coordinates are the familiar (X, Y, Z) spatial coordinates.
Point cloud registration is an important task in the field of computer vision. Its goal is to align under one coordinate system data sets which describe the same hyper-surface sampled from different directions, distances and visual conditions.
In general, this task is divided into two parts:
1) Find, for each point in the source point cloud (the one to be aligned), its matching point in the target cloud. This is the correspondence problem.
2) Calculate the best transformation aligning the source cloud with the target cloud according to the correspondences (considering a cost, e.g. MSE), and apply it to the source cloud.
Our goal is to compare state-of-the-art methods for registration between two 3-D point clouds, as well as to suggest new algorithms and improvements. We focus on isolated, rigid objects: deformation is not handled and only translation and rotation are allowed (6 degrees of freedom: rotations and translations with respect to the X, Y, Z spatial axes).
Different registration methods, one from the literature and another based on our original descriptor, were implemented in MATLAB. In addition, different correspondence filtering methods were tested for each algorithm.
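As a baseline for the two-part loop described above, the classic ICP algorithm can be sketched as follows (brute-force nearest-neighbour correspondences plus the Kabsch least-squares alignment; this is not the descriptor-based method developed in the project):

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ~ Q_i (Kabsch)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)                    # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def icp(source, target, n_iter=20):
    """Iterative Closest Point: alternate nearest-neighbour matching and alignment."""
    src = source.copy()
    for _ in range(n_iter):
        # Correspondence step: nearest target point for every source point.
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        matched = target[np.argmin(d2, axis=1)]
        # Alignment step: best rigid transform onto the matched points.
        R, t = best_rigid_transform(src, matched)
        src = src @ R.T + t
    return src
```

Because the correspondence step here is a naive nearest-neighbour search, ICP only converges from a rough initial alignment; descriptor-based matching and correspondence filtering, as studied in the project, widen that basin of convergence.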
In this project we implemented a version of grep, the classic UNIX text searching application, that runs on GPUs and utilizes direct access to the file system using GPUfs. The goal of the project was to compare using the GPU as a co-processor with giving the GPU control over file system access, especially in I/O-heavy applications, both in performance and in ease of use for the programmer.
In the last few years GPUs have been increasingly used for general computing purposes. This shift has introduced numerous developments in the way programming is done, extending, and in some cases revolutionizing, the co-processor model. These approaches seek to move responsibility from the CPU to the GPU; for instance, our project looked at letting the GPU handle I/O directly instead of moving data manually through the CPU. This promotes ease of use for the programmer and allows moving data between storage devices and the GPU more efficiently.
The project's goal is to successfully create a system that measures the human pulse from an audio signal (classification to high or low, and regression).
We use the Munich Biovoice Corpus (MBC), a database of recordings labeled with pulse, together with recordings made by us.
The project comprises two stages. In the first, the system tried to perform the classification or regression using an SVM. In the second, we implemented an algorithm called Diffusion Maps, which reduces the dimensionality of the recordings (extracting their main features).
The diarization problem is a well-known problem in the world of speech recognition and speech processing.
Our project goal is speaker diarization in recorded conversations. We try a new approach to this problem, using a dimensionality reduction algorithm (LLE).
The results are compared to a well-known method for solving this problem, the Bottom-Up algorithm.
We tested our method on merged TIMIT files and on recordings we made ourselves.
Along with the fast development of the world of visual media, arises the need for developing tools for compression of the raw data. However, in some cases the compressed data is subject to unwanted artifacts. As a way of controlling the compression process, we wish to use an objective quality assessment metric that will give us information on the quality of the compressed data. This kind of metric can be applied to a large amount of data and give us results in real time. In this work, we test the performance of the DSS video quality assessment (VQA) algorithm in an encoding and decoding environment of video streams (Transcoder). We compare this algorithm to two other methods in order to check its suitability to this environment, and make adjustments to it if necessary.
Image and video quality assessment becomes increasingly important due to the many applications of video where the end user is a human.
Therefore, it is desirable to develop a visual quality metric that correlates well with human visual perception.
This paper presents an automatic full-reference image quality assessment technique based on DCT Sub-bands Similarity (DSS).
The proposed technique exploits important characteristics of human visual perception by working in sub-bands in the Discrete Cosine Transform (DCT) domain and weighting the results for these sub-bands. By careful temporal pooling, DSS can also be used for video quality assessment.
This project's task was to tune and adjust the algorithm in order to get better results.
Developmental disorders are a group of neurological conditions originating in childhood that involve serious impairments in various areas (language, learning, motor skills). These conditions also include Autism Spectrum Disorders. As of 2008, approximately 15% of children in the United States have been diagnosed with some sort of developmental disorder, in comparison to only 12.8% in 1997. Early detection of developmental disorders is crucial, as it enables early intervention (e.g. speech therapy, occupational therapy), which may reduce neurological and functional deficits in infants. In this project we have developed a tool for early identification of developmental disorders in infants. The tool exploits the correlation between acoustical features of an infant's cry (e.g. pitch and formants) and the risk of having developmental disorders. We have used digital signal processing tools to characterize the input cry signals, and a k-NN based machine learning system to estimate the infant's risk of having a developmental disorder. The tool has been tested against a database of diagnosed infants, and produced 85% success in estimating developmental disorders in infants in cross-validation testing.
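A minimal k-NN classifier of the kind used here can be sketched as follows; the feature values (mean pitch, first formant) and labels are hypothetical, for illustration only:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    d = np.linalg.norm(train_X - x, axis=1)      # Euclidean distances
    nearest = np.argsort(d)[:k]                  # indices of k closest samples
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Hypothetical cry features: [mean pitch (Hz), first formant (Hz)]
train_X = np.array([[420.0, 1100.0], [450.0, 1150.0],
                    [310.0, 900.0],  [300.0, 950.0]])
train_y = np.array([1, 1, 0, 0])   # 1 = at risk, 0 = typical development
print(knn_predict(train_X, train_y, np.array([430.0, 1120.0]), k=3))  # 1
```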
This project involves methods of background modeling in video for the purpose of segmentation between foreground (e.g. people, moving cars, etc.) and background (e.g. sidewalks or roads, but also more challenging cases such as vegetation and dynamic bodies of water). In numerous computer vision applications in video, a separation between the background and the foreground, which contains the regions of interest for a human viewer, is often required in the initial stage of processing.
Until several years ago, comprehensive labeled databases for evaluating and comparing the performance of background segmentation methods in many scenarios were not available to the computer vision community. In 2012, the website changedetection.net was launched with an extensive database of this kind, and since then much progress has been made in this field regarding both the number of new algorithms proposed and segmentation performance.
There are many methods for mapping and localization based on sensor measurements and a known functional model. In these methods, the map is created by applying the functional model to the sensor measurements. The problem is that the model is often unknown or very complicated.
This project introduces a mapping algorithm based on measurements alone. The algorithm uses unsupervised learning, implementing a manifold learning technique that requires neither a model nor a specific measurement type, which is its advantage. The algorithm was implemented in Matlab and tested using measurements (panorama images) taken from a simulation of a real room implemented in Blender.
The results show the algorithm succeeds in reconstructing the location map from measurements, for convex as well as non-convex maps.
Voice activity detection has attracted significant research efforts in the last two decades. Despite much progress in designing voice activity detectors (VADs) in the presence of stationary noise, voice activity detection in the presence of transient noise remains a challenging problem. In this project we implemented and attempted to enhance a novel voice activity detector based on spectral clustering methods. The proposed VAD is a supervised learning algorithm: we used labeled data to adjust specific parameters, which are then used in online processing. Simulation results show the advantage of the proposed method over other conventional methods.
Dyslexia is a learning disorder characterized by difficulties with accurate or fluent word recognition and by poor spelling and decoding abilities. Current diagnosis of dyslexia lacks objective criteria, which can decrease treatment efficacy. Diagnosis relies on a discrepancy between reading ability and intelligence, a measure which can be unreliable, and has been criticized for its poor validity.
Functional magnetic resonance imaging (fMRI) is a fairly new and unique tool that enables widespread, noninvasive investigation of brain functions. A growing body of studies is exploring the use of resting-state fMRI techniques in examining possible functional disconnectivity effects in neurologic and psychiatric brain disorders, including Alzheimer's disease, depression, dementia and schizophrenia. Functional connectivity is defined as the temporal dependency of neuronal activation patterns of anatomically separated brain regions. Therefore, functional connectivity studies have the potential to characterize and classify brain disorders such as dyslexia, too.
In this project we deal with the problem of falls: responding to and identifying the event as quickly as possible with minimal miss-detection.
We explain the solutions available for this problem and their disadvantages, and suggest a method that uses wearable technology to address it.
With wearable technology, we first try to solve the problem with simple threshold comparisons, and show why this method is problematic and cannot provide a proper solution.
Next, we examine methods from machine learning, dimensionality reduction and neural networks, which provide a good solution with high detection rates and no false detections, and explain why other methods could not provide the solution we were looking for.
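The naive threshold-comparison baseline mentioned above can be sketched as follows; the thresholds and the simulated accelerometer trace are illustrative, not values from the project:

```python
import numpy as np

def detect_fall(acc, fs=50, impact_g=2.5, still_g=0.3, still_window=1.0):
    """Naive fall detector: an acceleration spike above `impact_g`
    followed by a near-still period (magnitude within `still_g` of 1 g).
    Thresholds here are illustrative, not tuned project values."""
    mag = np.linalg.norm(acc, axis=1)          # acceleration magnitude in g
    n = int(still_window * fs)
    for i in np.where(mag > impact_g)[0]:
        after = mag[i + 1:i + 1 + n]
        if len(after) == n and np.all(np.abs(after - 1.0) < still_g):
            return True                        # impact, then lying still
    return False

fs = 50
t = np.arange(0, 3, 1 / fs)
acc = np.tile([0.0, 0.0, 1.0], (len(t), 1))    # standing still: 1 g on z-axis
acc[60] = [0.0, 0.0, 3.5]                      # simulated impact spike
print(detect_fall(acc, fs))  # True
```

A threshold detector like this is exactly what fails in practice: sitting down quickly or dropping the device also crosses the threshold, which motivates the learning-based methods.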
High peak-to-average power ratio (PAPR) is one of the major drawbacks of orthogonal frequency division multiplexing (OFDM) communication schemes. In this report, we propose a novel low-complexity, low-overhead technique for PAPR reduction, which we call the Modulo Technique (MT). We compare our proposed technique to partial transmit sequences (PTS) and show that it achieves greater PAPR reduction while retaining similar complexity. Afterwards, we find a connection between a signal's randomness and its PAPR. From this connection, we develop a model for estimating PAPR from signal characteristics in the frequency domain. This model can be used to compare different techniques, and specifically to explain why our technique is superior to PTS. We suggest how other existing PAPR reduction techniques can take advantage of this model to reduce their complexity.
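For reference, PAPR is simply the ratio of peak to mean instantaneous power of the time-domain symbol; a minimal sketch with random QPSK subcarriers (this illustrates the quantity being reduced, not the proposed MT or PTS schemes):

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

# One OFDM symbol: N random QPSK subcarriers -> time domain via IFFT
rng = np.random.RandomState(0)
N = 64
qpsk = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
x = np.fft.ifft(qpsk) * np.sqrt(N)   # scale to unit average power
print(round(papr_db(x), 2))
```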
Many ornithological studies are based on tracking the nutrition of the explored bird species. This project assists the study of the nutrition patterns of raptors. Raptors feed on various rodents. The rodents' bones are indigestible and are regurgitated in pellets. By identifying the rodent species from its bones, one can learn about the nutrition pattern of the raptor. Classifying the rodent species requires high proficiency and is a time-consuming process.
In this project we introduce a technique for determining the rodent species and the bone type from a picture of the bone. This technique will assist researchers as well as amateur birdwatchers (and the many students required to perform the classification). The process is performed in two steps. First, the rodent type is classified from a picture of its jaw. Second, the rest of the bones in the pellet are classified. The classification is achieved using machine learning methods. Given the dataset of images used for training, the proposed technique achieves good results and high accuracy.
In recent years, we observe a trend of increasing video resolution. To support this trend, constant improvement in compression ratio at the same video quality is required. The HEVC standard, officially released in 2013, achieves roughly a 2x improvement over its predecessor, H.264, and many researchers explore different approaches to improve it even further. One of these approaches is called block removal. In block removal, blocks that are not significant for a human observer are removed by the encoder and later filled in at the decoder using inpainting methods.
In this project, a block removal scheme was implemented as part of an HEVC encoder-decoder scheme using different inpainting techniques. The implemented scheme does not manage to improve the bitrate, due to limitations in the algorithms applied and the unchanged decoding order (which is modified in other works in the field). Nevertheless, the implemented scheme can easily be altered to examine other inpainting techniques or distortion measures, which might improve the system's performance.
The project's goal is to develop a tool for detecting music plagiarism by comparing melodies. First, the fundamental frequencies of the melody are extracted from two sections of songs, where one is suspected of being a melodic plagiarism of the other. Different algorithms for melody extraction from monophonic and polyphonic music were tested. The algorithm chosen is the Durrieu algorithm, based on audio signal modeling and parameter estimation. After melody extraction, a comparison between the extracted fundamental frequencies of the two songs is performed. This comparison aims to determine, based on criteria of similarity between the melody sequences, whether one is a melodic plagiarism of the other. The comparison is performed using the DTW algorithm, which measures similarity between two sequences. The developed system can identify cases of plagiarism in polyphonic music, and yields the expected results for known plagiarism cases.
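The DTW comparison stage can be sketched with a textbook implementation; the pitch contours below are made-up toy sequences, not extracted melodies:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical f0 contours (Hz): the same melody, one time-stretched,
# and a genuinely different melody
melody    = np.array([220, 220, 247, 262, 247, 220], float)
stretched = np.array([220, 220, 220, 247, 247, 262, 247, 220], float)
different = np.array([330, 349, 392, 440, 392, 349], float)
print(dtw_distance(melody, stretched) < dtw_distance(melody, different))  # True
```

DTW absorbs the tempo difference between `melody` and `stretched`, which is exactly why a plain sample-by-sample comparison is insufficient here.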
People with hearing disabilities experience many difficulties in everyday life that affect them and their surroundings. Technological development has helped them in many areas, but it has made driving an even harder experience. They cannot hear honking and beeping, and most importantly they cannot hear approaching emergency vehicles.
This inability makes them a safety hazard both to themselves and to their surroundings, because they may accidentally cause a roadblock or even an accident.
There is no uniform standard for sirens today, and no algorithm that can detect sirens from different countries.
In this paper we present a generic algorithm for the detection of sirens in noisy environments, based on advanced methods of signal processing and machine learning, which can detect emergency signals regardless of their origin.
Our results show that our algorithm guarantees at least 98% detection, with close to zero false-negative rate.
The deblurring problem is a difficult yet popular problem in the field of image processing. Blur can be caused by various factors: movement of the photographed object, camera optics, image coding, etc. There are various methods for dealing with the deblurring problem; some assume prior knowledge of how the blur occurred. In this project, we implemented a non-blind deblurring system based on the SDM algorithm, which has proved effective in facial landmark tracking. This algorithm is composed of several regression layers, each containing two parts: feature extraction and linear regression.
The system consists of two stages: training and applying. In the training stage, feature functions are chosen and regression parameters are learned from a training set of images blurred with the same blurring kernel. In the applying stage, the algorithm receives an image blurred in the same way as the training images and deblurs it using the feature functions and regression parameters learned in the training stage.
We focused on finding non-linear features in order to utilize the SDM algorithm and obtain better results than the optimal linear estimator. We found that a feature based on histogram equalization produces such results. We tested the performance of the system by comparing it to the Wiener deconvolution and ForWaRD methods, and found that in some cases our system produces better results than the compared methods. In general, our system has low computational complexity, and can serve as a basis for solving other problems in image processing.
- Year - 2015
The tracking world deals with surveillance and locating objects; for a solid solution to this problem, the need to estimate the sensors' own locations arises. In GPS-denied environments, an alternative way to estimate the sensors' self-locations is required.
This work describes a reliable and efficient solution to the sensor self-localization problem, using environmental far-field signals and the time difference of arrival (TDOA) method.
We describe here a JPEG-based compression scheme, adjusted specifically for the underwater acoustic channel. The project goal was to deal with bit error rates common to the underwater acoustic channel. Two measures were used to quantify different solutions: the compression ratio, which we tried to minimize, and SSIM, a measure of image quality, which we tried to maximize.
The scheme described in this report was inspired by initial research conducted by RAFAEL, common schemes for dealing with errors in images, and close acquaintance with the international JPEG compression standard. The implemented scheme was tested on a variety of images, and the results are described in detail in this report.
Introduction
In recent years, the use of underwater applications has increased widely in both military and civilian industries. For these applications to share information, a reliable communication scheme must be achieved over the underwater acoustic channel.
In this project we discuss image transmission through the underwater acoustic channel, which is characterized by a high bit error probability and a slow communication rate. A naive solution is to use the JPEG compression standard; however, JPEG has zero tolerance for errors, and a better solution is needed.
Figure: A JPEG-compressed image as decoded after being transmitted through the underwater acoustic channel.
The solution
In this project we created a JPEG-based compression scheme adjusted for the underwater acoustic channel. We chose JPEG as the initial scheme due to the importance of the compression ratio and JPEG's simplicity. Several features were added to deal with errors, making it a robust compression scheme.
Figure 1 - Block diagram of the solution. Several features were added to the original JPEG standard.
Initial results, conducted by RAFAEL:
Our compression scheme results:
To deal with the mentioned problem, several features were added to the original JPEG standard. We analyzed the scheme's performance using SSIM and the image compression ratio on a large variety of images.
Finally, we created a robust JPEG-based compression scheme adjustable for the underwater acoustic channel. The scheme adds up to 25 percent overhead to the compressed image size (compared with the original JPEG standard) and can be modified according to the user's preferences.
G. Wallace, "The JPEG Still Picture Compression Standard," 1991.
A. Puri, X. Chen and A. Luthra, "Video Coding Using the H.264/MPEG-4 AVC Compression Standard," Signal Processing: Image Communication, 2004.
S. Kumar, L. Xu, M. Mandal and S. Panchanathan, "Error Resiliency Schemes in H.264 Standard," Visual Communication & Image Representation, 2006.
Occasionally, while taking a photo, unwanted objects enter the frame.
For example, when taking pictures using smartphones, in surveillance cameras, etc.
The project's goal is to allow a user to interactively remove objects from an image background in order to get a clean shot.
The next part of this project would include development of an Android application which implements the current project.
The process in which the object is removed starts with taking a short video, in which the last frame is the user's desired photo. The next step is foreground/background segmentation, discovering the unwanted objects using different algorithms. The final step is object removal, in which the object is replaced by its background from another frame, followed by an optional image matting process to improve the final result.
During the project, 12 videos in different difficulty levels were filmed in order to test the results in different conditions.
The results are good for easy/medium videos (static camera, not too crowded an area) and require improvement under hard conditions (trees or flags in the background, an unstable camera, etc.).
When using a smartphone's camera to take a photo, unwanted objects often enter the frame, such as cars on the road or people walking in the background.
Another example is surveillance cameras, where an image free of unwanted objects is sometimes desired.
The project's goal is to deal with these situations by letting the user select an unwanted object from a list of detected objects and removing it, so that the object's background is completed from another frame.
For example, the Scalado application claims to have a similar capability.
This project's goal is estimation of the distance of floating objects, such as boats and personal watercraft (water scooters), from a video of a maritime environment, for the Protector USV, a product of Rafael. We propose a novel and efficient algorithm to achieve this goal. The algorithm receives as input a video of a marine environment. In addition, for every video frame it receives the location of a pixel that is on or near the object of interest whose distance we want to estimate. For every video frame, the algorithm identifies the horizon line, which we take as a reference whose distance from the camera can be calculated according to environmental conditions. The algorithm then identifies the contour of the object or its wake and chooses the point farthest from the horizon line. We show that this distance, measured in pixels, can be translated to distance in meters according to environmental conditions, the height of the CCD camera and its specification. The algorithm has been tested on a number of videos of marine environments taken under various environmental conditions and with different floating objects.
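The geometric core of such a method, under flat-sea and pinhole-camera assumptions, can be sketched as follows; the camera height and focal length are hypothetical values, not taken from the Protector's specification:

```python
import math

EARTH_R = 6_371_000.0   # mean Earth radius in meters

def horizon_distance(cam_height):
    """Distance to the visible horizon for a camera `cam_height` meters
    above the water (standard geometric approximation sqrt(2*R*h))."""
    return math.sqrt(2 * EARTH_R * cam_height)

def pixel_to_range(cam_height, pixels_below_horizon, focal_px):
    """Flat-sea approximation: a point `pixels_below_horizon` pixels below
    the horizon line sits at depression angle atan(pixels/focal), hence at
    range cam_height / tan(angle)."""
    angle = math.atan2(pixels_below_horizon, focal_px)
    return cam_height / math.tan(angle)

print(round(horizon_distance(10.0)))          # horizon at roughly 11.3 km
print(round(pixel_to_range(10.0, 50, 1500)))  # 300 (meters)
```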
Presented in EUSIPCO 2016, Budapest
Sudden Cardiac Arrest (SCA) is an unpredictable heart failure. According to the American Heart Association (AHA), 8 out of 10,000 adults will experience an out-of-hospital cardiac arrest (OHCA) in the U.S. The problem is that treatment must be received immediately, because 6-10 minutes later the person is likely to die.
The suggested solution is a wearable ECG recording device. The device analyzes the signal in order to detect an SCA and notifies the surroundings when one occurs.
Our project deals with analyzing the ECG signal. The algorithm proposed here detected a simulated cardiac arrest within 3.078 seconds on average. It correctly detected 97.87% of all existing heartbeats, with 2.73% of the identified beats being false.
The algorithm was implemented as an Android application, and it shows potential as the analyzer in the suggested integrated system.
This project is based on previous work done by Google Deep Mind, in which reinforcement learning was used in order to teach a computer to play computer games on an Atari 2600 game console, which was popular in the 70s and 80s.
In our project, we build a more advanced learning environment that supports a more advanced game console, the Super Nintendo Entertainment System (SNES), and thus more complex and stochastic computer games; with the proper modifications to the algorithm, we improve the human-like behavior and decision process of the computer.
Hand prostheses with embedded control and multiple gesture types can reach high prices, and offer little flexibility and few patient-specific adjustments. With the introduction of 3D printing to the problem, many open-source prosthesis designs emerged, most of them with a single type of hand movement and limited control of the hand. In this research we aim to develop an accessible 3D-printed hand with a control mechanism that is reliable, portable, works in real time and is controlled intuitively. The prosthesis used was an upgraded version of an open-source hand from e-NABLE, modified to be powered by three servo motors rather than by the movement of the hand stump. The offline classification algorithm was developed in Matlab; it was designed to meet the requirements of real-time operation (less than 250 ms) and reliability while taking into account the low sampling rate. Different features and machine learning methods were tested to reach maximum accuracy, and different recording methods were tested to make the learning process short but efficient. The real-time implementation of the algorithm was done on an Intel Edison using Python. The sensor used was the MYO armband, connected via Bluetooth to the Edison.
The method developed here meets all the criteria aimed for.
Image registration is a challenging task in medical imaging analysis.
The goal of the registration process is to integrate the information obtained from different sources into a more complete and detailed image.
In our project we explored possibilities for extending the MIND descriptor, building on previous work that used the image's gradient.
Our project dealt with multi-modal medical image registration.
Medical imaging can be performed in various ways, each producing a vastly different image. As a result, the need for modality-independent registration has arisen.
In a previous work, the MIND registration algorithm was examined and its results for MRI-CT registration looked promising. However, it was claimed that MIND performs poorly on edges, so an improvement taking the image's gradient into account was made (the GMIND algorithm).
The new algorithm performed better on edges but worse overall.
The goal of this project is to extract the coordinates of the kingfisher's flight course from two videos acquired by GoPro cameras.
To achieve this goal, we learned and applied methods for matching two images of a scene acquired by adjacent cameras. Five experiments were conducted, in which we faced the differences arising from the placement and configuration of the cameras. Additionally, we time-synchronized the videos from the two cameras.
We applied the TLD algorithm for tracking the kingfisher's flight course. The coordinates of the course were extracted using stereoscopic methods.
A secondary goal that was achieved is a user-friendly Matlab-based application that takes two videos as input and outputs an Excel file of 3-D coordinates.
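For a rectified stereo pair, the depth computation reduces to the classic Z = f·B/d relation; a minimal sketch with hypothetical baseline, focal length and disparity values:

```python
def triangulate_depth(disparity_px, baseline_m, focal_px):
    """Depth from a rectified stereo pair: Z = f * B / d,
    where f is the focal length in pixels, B the camera baseline in
    meters and d the horizontal disparity in pixels."""
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers: 0.5 m between the cameras, 1200 px focal length,
# and the tracked bird appearing 30 px apart between left and right frames
print(triangulate_depth(30.0, 0.5, 1200.0))  # 20.0 (meters)
```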
Speech scrambling techniques are used to transform a speech signal into an unintelligible signal in order to avoid eavesdropping. Such systems are used to guarantee end-to-end security for speech in real-time communication systems such as GSM, VoIP, telephone, analogue radio and so on.
SIPER is an educational tool developed in SIPL. It demonstrates speech, audio and image processing techniques and can also be used as an analysis tool for research purposes. SIPER allows experimentation with key parameters of each technique and shows both intermediate and final results. It is very modular and its modules can be written in C/C++.
Speech signals have been a research topic for over 50 years. However, many research and engineering challenges remain in the field of speech modeling and synthesis.
Speech parameterization techniques that are able, on the one hand, to reconstruct a signal transparently and, on the other hand, to modify it in the parametric domain are very important for flexible speech synthesis and advanced speech transformations (such as voice morphing, emotion modification, etc.).
This project deals with understanding different approaches to speech analysis and synthesis, such as the sinusoidal model (SM), the harmonic model (HM) and the adaptive harmonic model (aHM). Informal listening tests show that aHM achieves better synthesized quality than the other examined methods.
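The sinusoidal model's synthesis step, reconstructing a frame as a sum of sinusoids, can be sketched as follows; the frequencies, amplitudes and phases are arbitrary example parameters, not estimated ones:

```python
import numpy as np

def sinusoidal_synthesis(freqs, amps, phases, n, fs):
    """Sum-of-sinusoids frame synthesis:
    s[t] = sum_k a_k * cos(2*pi*f_k*t + phi_k)."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

fs = 16000  # 20 ms frame at 16 kHz, a 200 Hz "voice" with two harmonics
frame = sinusoidal_synthesis([200, 400, 600], [1.0, 0.5, 0.25],
                             [0.0, 0.0, 0.0], 320, fs)
print(frame.shape)  # (320,)
```

In the full models, the per-frame frequencies, amplitudes and phases are estimated from the analysis stage, and modifying them before resynthesis is what enables transformations such as pitch shifting.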
Recently, salient object detection has attracted a lot of interest in computer vision, as it is essential for many applications such as object detection and recognition, image compression, video summarization and photo collage.
Two algorithms for saliency detection were first simulated, then implemented and compared in this project:
1. Spectral Residual Approach
2. Salient Object Segmentation Based on Context and Shape Prior
The goal of Brain-Computer Interface (BCI) systems is to enable paralyzed individuals independent control of external devices using the operator's brain activity. As many BCI systems are based on electroencephalographic (EEG) signals to avoid invasive procedures, a consistent challenge is to design more robust and reliable classifiers for these signals. Although BCIs are intended for individuals who cannot move, oftentimes classifiers are calibrated on signals from healthy subjects executing movements. In this project, we simulate the real-world scenario by testing on signals from a new subject (unseen in training) imagining movements. We show that training on actual, imagined, or both signals leads to erratic classification results. We propose this is because the signals lie in different domains, and apply a domain adaptation technique, Frustratingly Easy Domain Adaptation (FEDA), to adapt a classifier to the clinically relevant imagined domain. Using FEDA, we significantly enhance classifier performance by 8 percentage points (from 25% to 33%) on a difficult four-class discrimination task.
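FEDA itself is a simple feature-augmentation trick: each feature vector is replicated into a shared block and a domain-specific block, letting a single linear classifier learn which features transfer across domains. A minimal sketch (the toy matrices merely stand in for EEG feature vectors):

```python
import numpy as np

def feda_augment(X, domain):
    """Frustratingly Easy Domain Adaptation: map each feature vector x
    to [x, x, 0] for the source domain and [x, 0, x] for the target,
    so a linear classifier learns shared and domain-specific weights."""
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    return np.hstack([X, zeros, X])

X_src = np.ones((4, 3))   # stand-in for executed-movement EEG features
X_tgt = np.ones((2, 3))   # stand-in for imagined-movement features
print(feda_augment(X_src, "source").shape)  # (4, 9)
print(feda_augment(X_tgt, "target").shape)  # (2, 9)
```

Training one classifier on the stacked augmented matrices is what "adapting to the imagined domain" amounts to in this scheme.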
The aim of this project is to implement an acoustic echo canceler that works in real-time on Texas Instruments TMS320C6748 DSP Development Kit (LCDK). This implementation may be part of a future undergraduate experiment in SIPL.
An acoustic echo canceler (AEC) greatly enhances the audio quality of multipoint hands-free communication systems. It allows the participants in a call to speak smoothly, naturally and feel more comfortable. Acoustic echo is most noticeable and annoying when delay is present in the transmission path. This would happen primarily in long distance circuits, or systems utilizing speech compression - such as in videoconferencing or cellular phones. Even though this echo might not be as annoying with short links, room acoustics will still affect the sound and may hamper communication.
In theory, a simple adaptive filter can be used to estimate the room echo path. Since the loudspeaker output is known, subtracting an estimate of the loudspeaker echo from the microphone input will cancel the echo signal. In practice, however, more factors require attention, for example nonlinear behavior of the microphone and loudspeaker, double talk, and the limited adaptation rate of the filter.
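The basic adaptive-filter idea can be sketched with a normalized LMS (NLMS) echo canceler; the filter order, step size and toy echo path below are illustrative choices, not the values used on the LCDK:

```python
import numpy as np

def nlms_echo_cancel(far, mic, order=64, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive filter: estimate the echo of the far-end
    (loudspeaker) signal present in the microphone signal and subtract it,
    returning the residual (ideally, only the near-end speech)."""
    w = np.zeros(order)
    out = np.zeros(len(mic))
    for n in range(order, len(mic)):
        x = far[n - order:n][::-1]           # most recent far-end samples
        e = mic[n] - w @ x                   # error = mic minus echo estimate
        w += mu * e * x / (x @ x + eps)      # normalized weight update
        out[n] = e
    return out

rng = np.random.RandomState(0)
far = rng.randn(4000)                              # far-end (loudspeaker) signal
echo_path = np.array([0.0, 0.5, 0.0, 0.3, 0.1])    # toy room impulse response
mic = np.convolve(far, echo_path)[:4000]           # echo only, no near-end talk
e = nlms_echo_cancel(far, mic)
# After convergence, residual echo power drops well below the mic power
print(np.mean(e[2000:] ** 2) < 0.01 * np.mean(mic[2000:] ** 2))  # True
```

Double talk and nonlinearities break this clean picture, which is why practical AECs add double-talk detectors and nonlinear processing on top of the adaptive filter.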
Listening to music, as well as singing, are activities most people enjoy. However, not everyone has the ability to sing properly, musically speaking. To allow the average singer to enjoy their singing without inaccuracies, a wide range of signal processing techniques is applied.
Throughout our project we created a system that fixes singing inaccuracies based on an input that defines what the singing should sound like. This is accomplished by extracting the main singing features of each vocal frame and synthesizing a new one of controllable length. Using a time-warping method, the frame is resampled so that its length and typical pitch change. Using these methods, a corrected singing signal is created while preserving the singer's voice features.
MIR is used in applications that organize music databases. It characterizes the user's musical taste, and retrieves and classifies music. The most common approach to this task uses collaborative filtering, predicting one user's taste using data from other users. The method used in this project is based on processing the signal itself. As of today, there is no sufficiently good MIR system.
This project uses a deep convolutional neural network to extract a given song's genre and artist. The work compares deep learning applied to the raw audio signal with deep learning applied to common audio features, the MFCCs.
This paper also introduces contemporary methods for working with big data.
The goal of this project is to develop a technique for detecting developmental disorders in infants based on analysis of their spontaneous movements. We will use a standard depth camera (Kinect) to track the movements of specific sections of the body. The algorithm will have to be tailored to the configuration of an infant lying down and performing spontaneous movements. Then, we will apply machine learning techniques to identify infants that suffer from developmental disorders.
Preterm infants, a continuously growing population, are at risk of multi-system impairments, as their body systems are not yet ripe for life outside the uterus. In the long run, prematurity increases the risk of brain damage and developmental impairments ranging from Minor Neurological Deficits to Cerebral Palsy. Early detection of developmental impairments is crucial, as it enables early intervention, which may minimize neurological and functional deficits.
We describe an algorithm for estimating heart rate from an optically measured PPG signal when physical exercises are performed.
In this case, the PPG signal is contaminated by motion artifacts caused by hand movements, making it difficult to find its fundamental frequency that corresponds to the heart rate. To overcome the noise, a soft decision approach is taken, by which several candidates for the fundamental frequency of the PPG signal are extracted and assigned grades.
By appropriate grade weighting, the candidate having the maximal grade is selected. The presented algorithm is of low complexity and shown to provide good results. As such, it can be used in low-power portable devices for real time heart rate estimation.
Heart rate monitoring during physical exercise has become increasingly popular in recent years. The monitoring is performed using wearable devices, which estimate heart rate in real time using photoplethysmographic (PPG) signals. These PPG signals are obtained by illuminating the skin with a light-emitting diode and measuring changes in light absorption with a photodiode. As the heart pumps blood through the organs, volumetric changes occur, reflected in periodic variations in the measured light intensity. These variations are used in turn to determine the heart rate, usually in terms of beats per minute.
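The soft-decision candidate selection described above can be sketched as follows; the band limits, grading rule and simulated PPG trace are simplified illustrations, not the actual algorithm's parameters:

```python
import numpy as np

def hr_candidates(ppg, fs, prev_bpm, n_cand=3, sigma=10.0):
    """Soft-decision sketch: take the strongest spectral peaks in the
    heart-rate band as candidates, grade each by spectral amplitude
    weighted by closeness to the previous estimate, pick the best."""
    spec = np.abs(np.fft.rfft(ppg * np.hanning(len(ppg))))
    freqs = np.fft.rfftfreq(len(ppg), 1 / fs)
    band = (freqs > 0.7) & (freqs < 3.5)          # roughly 42-210 bpm
    idx = np.where(band)[0]
    top = idx[np.argsort(spec[idx])[-n_cand:]]    # candidate frequency bins
    bpm = freqs[top] * 60
    grades = spec[top] * np.exp(-((bpm - prev_bpm) ** 2) / (2 * sigma ** 2))
    return bpm[np.argmax(grades)]

fs = 125
t = np.arange(0, 8, 1 / fs)
# 90 bpm pulse plus a stronger-looking motion artifact near 144 bpm
ppg = np.sin(2 * np.pi * 1.5 * t) + 0.8 * np.sin(2 * np.pi * 2.4 * t)
est = hr_candidates(ppg, fs, prev_bpm=92.0)
print(round(est))  # 90
```

The grade weighting by the previous estimate is what lets the tracker ignore strong motion-artifact peaks far from the plausible heart rate.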
The project's goal is to classify test subjects into two groups: control and patients based on their EEG signals. The secondary goal is to determine the severity of the disease among the patients.
The methods in use in this project are Manifold Learning and specifically Diffusion Maps. These methods were adapted to the problem, given the EEG data, and implemented in the Time Domain, Frequency Domain and using Scattering Transform.
Alzheimer's disease is characterized by damaged neuron connections that create unique alpha-wave patterns in the brain. Therefore, we examined different band-pass filters (BPF) in order to filter noise and classify mainly according to alpha and/or theta waves (theta waves are involved in learning and memory processes).
The BPF improved the classification results.
After using the aforementioned methods in the time, frequency and scattering domains, we managed to get good separation between different people. However, we did not get adequate classification between patients and the control group. Therefore, we explored a new method of invariant metrics, which is meant to reduce the variance within a classification group.
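For reference, a minimal diffusion-maps embedding of the kind used here can be sketched with a Gaussian kernel and an eigendecomposition; the kernel scale `eps` and the per-subject feature vectors are assumptions of this sketch, not the project's actual EEG representation:

```python
import numpy as np

def diffusion_maps(X, eps, n_components=2):
    """Minimal diffusion-maps embedding (a sketch of the method named above).

    X: (n_samples, n_features) array, e.g. per-subject EEG feature vectors.
    Returns the leading non-trivial diffusion coordinates.
    """
    # Pairwise squared Euclidean distances and Gaussian affinity kernel.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / eps)

    # Row-normalize to a Markov transition matrix.
    P = K / K.sum(axis=1, keepdims=True)

    # Eigen-decomposition; the first eigenvector is trivial (constant).
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]

    # Diffusion coordinates: eigenvalue-weighted eigenvectors 1..n_components.
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1]
```

On well-separated clusters, the first non-trivial coordinate already separates the groups, which is exactly the behavior one hopes for in the patient/control setting.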
In this project we try to find a way to fix intonation problems in Hebrew speech files, and make them sound "better". This solution is meant to fix Text-to-Speech sound files, which today have a lot of intonation problems.
Audio books are often used by blind or visually impaired people, who cannot read text. Recording a whole book is hard and requires good reading skills, so audiobooks today are read by professionals, which makes them very expensive and time-consuming to produce.
A possible solution is text-to-speech. The problem is that the quality of an audiobook produced by a text-to-speech engine is not acceptable, both due to reading errors and due to the lack of correct intonation. However, it may be possible to use a text-to-speech engine whose output is then corrected by a non-professional reader.
Text detection in natural scene images is an important preprocessing step for many content-based image analysis tasks. Deep learning is a set of brain-inspired algorithms involving deep multi-layered neural networks. These neural networks are trained to find a set of features that represent the fed data, thus allowing machine-learning and computer-vision uses such as classification and detection. This approach is the state of the art in computer vision, voice recognition and natural language processing, and is used by Google, Microsoft, Yahoo and more.
In the last few years, text detection has become one of the biggest problems to solve in the field of computer vision. Classical feature-detection methods in computer vision, such as SURF or SIFT, have shown their limits: text appears in images at different scales and in different colors and fonts, so it is very hard to find characteristics common to all kinds of text. Furthermore, humans detect text very easily, which hints that the detection system should resemble the human brain, a network of neurons able to improve itself with proper training. This technique, known as Deep Learning, is already known to provide state-of-the-art results in the field. Our approach in this project deviates from classical Deep Learning systems, since our goal was to establish classification at the pixel level and not simply detect text zones in images, as was previously done. Consequently, the establishment of a database of examples labeled at the pixel level was mandatory. This database allowed us to train our network and obtain satisfying results.
The goal of the project is to repair and improve the processing of an image displayed upon a touch screen, in order to gain the ability of shape recognition. The aspect of image processing dealt with in the project is the combining and stitching of three images, derived from three different cameras, into one single image that describes the surface of the touch screen.
The touch screen is based on an existing system. It consists of a projective surface and capturing cameras that were built in a previous project, and of an open-source code, CCV (Computer Core Vision). Other systems based on the CCV code do not usually use multiple cameras or require precise image recognition, so a better image stitching than the existing one was never needed.
In this document, two different methods used to obtain an image that faithfully reflects the objects and events occurring on the system surface will be shown. We present the main idea behind each method, its implementation, and the guidelines used to choose the suitable one in light of the results obtained from the different implementations.
The project’s goal is to develop a system whose input is a person’s handwriting and whose output is a personality evaluation based on graphological analysis.
Graphology is the analysis of handwriting. It helps identify and evaluate the writer, indicating their psychological state at the time of writing. The basic assumption of graphology is that writing is a reflex and therefore reflects a person’s personality traits. Graphology plays a part in many profiling applications; an example of such a process is choosing people for jury duty in the U.S. judicial system.
There are several reasons for using computers to extract and analyze graphological features:
Manual feature extraction from handwriting samples is tedious, subjective and error-prone.
The same handwriting features can be extracted differently by two different graphologists.
The content of the writing can affect the analysis.
Computer Aided Graphology can help overcome these problems, extract features and analyze them faster and more accurately.
The project can be implemented as a smartphone or a web application.
This project was conducted within the framework of a Magneton program with the company LinguisTech. In this project we explored different speech recognition methods and tested them on speech recordings of children. In particular, we examined the benefits of using the Scattering transform as a feature-extraction method with different known classification algorithms such as GMM and SVM. We compared the performance of the Scattering-transform features to the MFCC features, which are known in the literature as efficient audio descriptors. The Scattering coefficients are a generalization of the MFCC coefficients and are characterized by their stability to numerous signal deformations, which allows successful classification, as was demonstrated in other tasks such as image texture and musical genre classification.
Using an SVM classifier, the results we acquired with the Scattering-transform features are not as good as those acquired within the Magneton based on the MFCC features, but they are successful enough to warrant further research.
High Efficiency Video Coding (HEVC) is a new video coding standard that has recently been finalized. Due to its substantially improved performance, it is expected to replace the H.264 video coding standard and to become the most common video coding technique within a few years. A major innovation in HEVC is the use of a quad-tree based coding tree block for images. In this representation, an image is first divided into non-overlapping coding units, which can be recursively divided into smaller coding units. This recursive quad-tree decomposition of the image is an efficient representation of variable block sizes, so that regions of different sizes can be better coded. However, the flexibility of the variable block-size structure greatly increases the search domain and hence the computational complexity of the encoder.
In this project, we developed a simple yet efficient algorithm for selecting a quad-tree representation of images in an HEVC video encoder. This is the most computationally complex part of such an encoder and optimizing it is a must when implementing any practical HEVC encoding system. The efficiency is measured in the total encoding time, the quality of the compressed video and the bit rate.
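The split decision at the heart of such an encoder can be sketched as a recursion that compares the rate-distortion cost of coding a block whole against the total cost of the best partitions of its four sub-blocks. Here `block_cost` is a hypothetical stand-in for the encoder's actual RD evaluation:

```python
def best_partition(block_cost, x, y, size, min_size):
    """Recursive RD-based quad-tree split decision (illustrative sketch).

    block_cost(x, y, size) is assumed to return the rate-distortion cost of
    coding the square block at (x, y) as a single coding unit. Returns
    (cost, tree), where tree is either 'leaf' or a list of four subtrees.
    """
    whole = block_cost(x, y, size)
    if size <= min_size:
        return whole, 'leaf'
    half = size // 2
    children = [best_partition(block_cost, x + dx, y + dy, half, min_size)
                for dy in (0, half) for dx in (0, half)]
    split_cost = sum(c for c, _ in children)
    # Split only when the four sub-blocks are strictly cheaper together.
    if split_cost < whole:
        return split_cost, [t for _, t in children]
    return whole, 'leaf'
```

The exhaustive recursion is exactly what makes the encoder expensive: every block size is evaluated, which is the search space a fast mode-decision algorithm tries to prune.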
Augmented Reality is a computing technology that combines virtual elements with the real environment in real time. The technology is implemented by having the user look through a semi-transparent medium onto which virtual information is projected.
The project's main goal is calibration of camera glasses and user’s eyes in order to coordinate their point of view.
Using the calibration results, recognition of items, objects, and their contour will be possible when the user looks at them.
During this project, several implementation approaches were tested using tools from the glasses' workspace that allow running an application which performs the calibration. In addition, other tools such as Matlab were used to perform a theoretical calibration, whose results demonstrate the parameters required for the calibration with the user's eyes.
- Year - 2014
This project deals with creating a computerized simulation of a rotating speaker, called a Leslie speaker, so that the same effect can be created without the speakers themselves.
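A rotating-speaker effect is commonly approximated by two classic ingredients: a periodic fractional delay (Doppler-style pitch wobble) and amplitude modulation locked to the same rotor phase. The sketch below illustrates that idea only; it is not the project's algorithm, and the rotor rate and depths are assumed parameters:

```python
import numpy as np

def leslie(signal, fs, rotor_hz=6.0, depth_ms=0.6, am_depth=0.3):
    """Crude Leslie-speaker emulation (a sketch, not the project's algorithm)."""
    n = np.arange(len(signal))
    lfo = np.sin(2 * np.pi * rotor_hz * n / fs)  # rotor phase

    # Time-varying delay in samples, realized by fractional resampling.
    delay = (depth_ms / 1000.0) * fs * (1 + lfo) / 2
    read_pos = np.clip(n - delay, 0, len(signal) - 1)
    doppler = np.interp(read_pos, n, signal)

    # Amplitude modulation: level rises and falls as the horn turns.
    return doppler * (1 - am_depth * (1 + lfo) / 2)
```

A fuller simulation would add separate horn/drum rotors and stereo panning, but the delay-plus-AM core is the part that gives the characteristic wobble.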
The goal of the project is to find an algorithm for underwater video compression, in order to enable video transmission of good quality under the constraint of the low channel capacity of the underwater acoustic channel, by exploiting the redundancies of underwater video, such as slow motion, a blurred background, and strong correlation between the color channels.
The underwater medium has a very low channel capacity, and it is therefore necessary to find algorithms to compress the information in order to transmit video. The rate required to do so is 30 kbps, a very low rate for video, which causes very strong distortion in the signal. The idea of this project is to use the properties of underwater videos: high correlation between the color channels, a blurred background and slow motion.
This project is based on a previous project, in which a method of adaptive frame skipping was developed; an interpolation block eventually reconstructs the skipped frames.
A new-generation video coding standard, named High Efficiency Video Coding (HEVC), has been developed by JCT-VC. This new standard provides a significant improvement in picture quality, especially for high-resolution videos.
HEVC adopts a QuadTree (QT) based Coding Unit (CU) block partitioning structure, which is flexible enough to adapt to the various texture characteristics of images; the Rate Distortion (RD) cost is calculated for all possible dimensions of CUs in the QT. This is one of the strong features of HEVC and brings higher coding efficiency compared to previous video coding standards, which mostly use a 16×16 macroblock, but it also causes a dramatic increase in computational complexity.
One of the most important challenges in HEVC is its time complexity.
Sensing in the Shortwave Infrared (SWIR) range has only recently been made practical. The SWIR band has an important advantage: it is not visible to the human eye, but since it is reflective, it shows shadows and contrast in its imagery. Moreover, SWIR sensors are highly tolerant to challenging atmospheric conditions such as fog and smoke. They can be made extremely sensitive and thus can work in very dark conditions. However, fundamental differences exist in appearance between images sensed in the visible and SWIR bands. In particular, human faces in SWIR images do not match human intuition, which makes it difficult to recognize familiar faces by looking at such images.
Only a few previous works in the literature consider the difference in appearance between visible and SWIR images. These works deal with the extraction of band-invariant features from images, but they do not try to map the tones of a SWIR image to the tones of its visible counterpart. In this project, we deal with a novel tone-mapping application for SWIR face images. We propose a method to map the tones of a human face acquired in the SWIR band to make it more similar to its appearance in the visible band. The proposed technique is easy to implement and produces natural-looking face images.
The goal of our project is to produce music from harmonic signals supplied by a system we built as the first part of our project. This system operates as an anemometer, recognizing different wind directions as well as different velocities. One of the project's requirements is that the system be easy to use and that the sounds it produces be pleasant to the ear.
As mentioned above, the project presented here and the applications included are applied to a physical system we built in the first part of the project. Those applications are optional; the physical system is actually a prototype on which many different applications can be built.
The project was executed mostly in Matlab, with which we processed the signals received from the physical system.
Hoarseness is a common physiological problem among humans, whether it is permanent (absence of vocal-cords) or temporary. This problem may disturb people in different life situations.
We would like to enhance the hoarse/whispered signals in order to obtain a better (more natural and coherent) voice, while taking advantage of the fact that the hoarse part of the voice signal (as we shall see later) is separable from the other parts.
Such an application has many potential uses, for example: correcting a singing signal, clarifying whispered voices originating from battlefields, various medical needs, etc.
Motion estimation of objects between consecutive video frames poses a unique challenge, since standard motion-estimation techniques are difficult to apply given the lack of distinctive features and the great difference between the frames. In this project we developed a system which, given two consecutive video frames, matches large objects between them and estimates the translation transformation from one frame to the other, while dealing with occlusions. The algorithm consists of four main steps. First, region-merging segmentation is performed. Then, large segments identified in both frames are matched using the EMD algorithm. Next, the segmentation is improved by merging segments, using the matching results between the frames. Last, motion estimation of large segments is done using an incremental search for a segment from one frame within the matching region in the other frame, while handling occlusions by removing artificial edges created by the occlusion. The system successfully estimates translation transformations of large objects in synthetic as well as natural images while dealing with occlusions.
The main goal of the first part of the project was to perform Iterative Closest Point (ICP) registration on two depth maps obtained using the Kinect depth sensor, in C++ on the Windows platform. Another purpose of this first part was to learn how to independently integrate large libraries (dynamic or not) into the project and to handle the difficulties of implementing an algorithm over library classes that do not necessarily match one another. The second part of the project was to stitch together, pair by pair using the preceding algorithm, different scan frames obtained by the Kinect with the help of its motor, in order to get a whole-body depth image.
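The ICP step described above alternates nearest-neighbour matching with a closed-form rigid alignment; a minimal NumPy sketch (brute-force matching, Kabsch solution), not the project's C++ implementation, might look like this:

```python
import numpy as np

def icp(src, dst, iters=20):
    """Point-to-point ICP between two point clouds (minimal sketch).

    src, dst: (n, 3) arrays. Returns (R, t) aligning src onto dst.
    """
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        # Brute-force nearest neighbour in dst for every moved source point.
        d = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=-1)
        matched = dst[np.argmin(d, axis=1)]
        # Closed-form rigid alignment (Kabsch) for the current matches.
        mu_s, mu_d = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ S @ U.T
        t_step = mu_d - R_step @ mu_s
        # Compose the incremental transform with the accumulated one.
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```

Real depth maps need a k-d tree for the matching step and outlier rejection for partial overlap; the alternation itself is unchanged.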
In our project we implemented and improved a method to perform audience measurements using audio sampled by mobile phones. The idea is to sample data from mobile phones and compare it with data sampled from a known broadcasting source. We implemented an algorithm introduced in a 2006 paper by Covell & Baluja. The method differs from most known methods by using image-processing techniques to create the audio samples' fingerprints. The performance of the algorithm was tested in simulations checking the audio-matching capabilities under several distortions.
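A toy version of fingerprint-based matching can illustrate the pipeline: derive a binary fingerprint per audio frame and slide the query's fingerprint over the reference's, keeping the offset with the smallest Hamming distance. The bit pattern used here (sign of neighbouring-bin energy differences) is a much-simplified stand-in for the wavelet-based fingerprints of Covell & Baluja:

```python
import numpy as np

def fingerprint(signal, frame=256, hop=128):
    """Toy spectral fingerprint: one bit per neighbouring-bin energy
    difference, per analysis frame."""
    n_frames = 1 + (len(signal) - frame) // hop
    bits = []
    for k in range(n_frames):
        seg = signal[k * hop:k * hop + frame] * np.hanning(frame)
        mag = np.abs(np.fft.rfft(seg))
        bits.append(np.diff(mag) > 0)
    return np.array(bits)

def match(query_fp, ref_fp):
    """Slide the query fingerprint over the reference and return the frame
    offset with the smallest Hamming distance."""
    n = len(query_fp)
    dists = [np.count_nonzero(query_fp != ref_fp[o:o + n])
             for o in range(len(ref_fp) - n + 1)]
    return int(np.argmin(dists))
```

Because the bits encode only relative spectral shape, this kind of fingerprint stays stable under gain changes and mild distortion, which is the property the audience-measurement setting relies on.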
EEG (electroencephalography) is the recording of electrical activity along the scalp. The EEG signals are produced by 64 electrodes along the scalp and are the basis of any BCI (Brain Computer Interface) system. BCI technology has the potential to revolutionize the way people communicate; however, many problems must be solved first. One of the main problems of using EEG is eye blinking, which causes major disturbances to the EEG signals and makes them very hard to analyze. Therefore, the first step in implementing a BCI is properly detecting eye blinks.
Our project's goal is to develop a fully automated system for detecting eye blinks. We need to take into account that our system (algorithm) should yield performance better than (or at least equal to) that of the existing semi-automated system.
Since the introduction of the Intel 4004 (the first commercial microprocessor), and even with today's multicore chips, there is and always will be a need for computers with greater processing power. For 30 years this was achieved by increasing CPU clock rates; however, because of numerous physical limitations in the fabrication process of integrated circuits, it became uneconomical to continue this trend. Only recently has the computing market shifted towards parallel software and hardware design in search of performance gains. Modern graphics processing units (GPUs) are specialized circuits initially designed for the computer gaming market; their special architecture makes them more efficient than general-purpose CPUs for massively parallel algorithms. This project discusses the implementation of two basic image processing algorithms (convolution and normalized cross-correlation) on the latest Nvidia architecture, Kepler. The chosen algorithms can represent several other image processing algorithms with similar memory-access patterns and arithmetic complexity. In the project, using datasets from an industrial environment, we discuss several implementations and optimization techniques, compare the achieved throughput to the card's theoretical throughput, identify bottlenecks and draw conclusions.
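As a correctness baseline for such a GPU implementation, normalized cross-correlation can be written so that each output pixel is computed independently, mirroring the work one GPU thread would do. This NumPy sketch is a reference implementation under that framing, not the CUDA code itself:

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation, valid mode (reference implementation).

    Each output pixel is computed independently, exactly the per-thread
    work of the parallel version; used here as a correctness baseline.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.empty((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            w = image[i:i + th, j:j + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            out[i, j] = (wz * t).sum() / denom if denom > 0 else 0.0
    return out
```

The per-pixel independence is precisely what makes the algorithm map well to a GPU grid; the optimization work then goes into the memory-access pattern (shared-memory tiling of the window reads).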
Created in an undergraduate project in the Signal and Image Processing Laboratory (SIPL), Department of Electrical Engineering, Technion – Israel Institute of Technology.
We have built a standalone, robust and portable platform that enables an augmented reality pinball game based on virtual and real objects and using common hardware.
A link to a video of the project:
Capsule endoscopy is a method for recording images of the digestive tract. A patient swallows a capsule containing a tiny camera, which captures images that are then transmitted wirelessly to an external receiver for examination by a physician. Due to limited computational capabilities in the capsule and bandwidth constraints deriving from its dimensions, low-complexity and efficient compression of the images is required before transmission. In addition, the images are captured using a Bayer filter mosaic, such that each pixel in the raw captured images represents only one color: red, green or blue. This special format requires the adaptation of current compression schemes or the development of new ones.
In this project, we evaluate the performance of several existing compression methods and develop additional methods for compressing Bayer images. We begin with learning new run-length tables for JPEG compression of Bayer images, leading to an improvement of 2dB in PSNR compared to standard JPEG. We later transform the images into the YCgCo color space, which is more natural for representing endoscopic Bayer images. By applying several compression schemes in the YCgCo color space, additional improvement is obtained. In order to reduce computational complexity, we use the hardware-efficient Integer DCT transform, known as ICT, instead of the DCT transform used by JPEG.
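The reversible integer ("lifting") form of this colour transform, often written YCoCg-R, is one common way to realize the YCgCo space losslessly with only adds and shifts; a sketch with its exact inverse (shown for illustration, not as the project's exact variant):

```python
def rgb_to_ycocg_r(r, g, b):
    """Reversible integer RGB -> Y/Co/Cg transform (YCoCg-R lifting form)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because the inverse reuses the very same shifted terms, the round trip is bit-exact for any integers, which is what makes the transform attractive for hardware-constrained, low-complexity compression.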
Video quality assessment becomes increasingly important nowadays. Therefore, it is desirable to develop a visual quality metric that correlates well with human visual perception.
In previous projects in SIPL, a new technique for quality assessment has been developed, based on DCT sub-bands similarity (DSS). The new technique shows excellent results in comparison to subjective scores, and its low complexity makes it highly suitable for a variety of live-streaming applications. Yet, some changes are required before it can be used in a practical system.
The main goal of this project is to build a prototype of a video quality assessment system, based on a network similar to the IDF network, using the DSS method. In this part of the project, we implemented the algorithm in C, improving execution times and memory consumption.
We start by presenting the algorithm for pairs of reference and distorted images; later, we adjust it for videos as well.
In addition, we start with the Full Reference approach (assuming that the whole reference image is available at the receiver side) and then discuss the Reduced Reference approach (assuming that only a few features of the reference image are available at the receiver side), which is more practical and is the one we used in our implementation of the system.
The CT (Computerized Tomography) scan enables estimation of the interior of a scanned object but involves exposure to high amounts of radiation. In some cases it is desirable to reconstruct only a local region of interest (ROI) with fewer measurements and as a result, less radiation.
This project deals with the implementation and examination of reconstruction algorithms for a ROI using fewer samples. The reconstruction method is based on Compressed Sensing (CS) theory (adding the minimization of a cost function as a constraint) and is carried out using iterative algorithms.
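A classic example of such an iterative CS algorithm is ISTA, which alternates a gradient step on the data-fit term with soft thresholding for the l1 penalty; the sketch below is generic, not the project's specific CT reconstruction:

```python
import numpy as np

def ista(A, y, lam, iters=200):
    """ISTA for min_x ||Ax - y||^2 + lam * ||x||_1 (generic sketch)."""
    Lf = 2 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2 * A.T @ (A @ x - y)        # gradient of the data-fit term
        z = x - grad / Lf                   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / Lf, 0.0)  # soft threshold
    return x
```

In a CT setting, `A` would be the (local) projection operator and the l1 penalty would typically be applied in a sparsifying domain such as image gradients rather than directly on the pixels.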
The aim of this project is to ‘teach the robotic head to speak Hebrew’. This involves two aspects:
Use a speech synthesizer for Hebrew or adapt a generic speech synthesizer for the Hebrew language.
Control the motors in the robotic head to move the robot's lips and mouth in synchronization with the spoken text. These tasks, and especially the second one, are challenging, as they involve both voice processing and the control of a few motors in a synchronized manner.
In this project we investigate and implement body-movement tracking for sound modulation. The idea began when one of the students wanted the ability to pitch-bend like guitar players while playing the piano. The basic idea is to track the movement of a motion sensor (in this project, via the Android sensors API) and modulate the sound played by the musician according to the sensor data. The first challenge was to connect all the devices to the computer; in this project, the backbone for the connection is UDP over a Wi-Fi connection. The next challenge was to understand how to transform movement signals into sound modulation; we implemented several filters and time-series algorithms to give the player the best control while suppressing the noise of playing without modulation intent. The next challenge was to create a real-time application that processes the inputs and connects the output generically, so that the musician can choose the preferred synthesizer. The code was written efficiently in Python (with near-zero response latency), and the output is forwarded to a virtual MIDI driver, which acts like a physical MIDI driver and can connect to any software-based synthesizer. The most challenging task was to integrate the entire system (Android, Python, virtual MIDI driver, synthesizer and physical MIDI).
The performance of traditional speech algorithms, which are based on the audio signal, deteriorates in highly non-stationary acoustic environments. Visual information is immune to that type of interference and is helpful for speech perception. A variety of speech algorithms based on the visual signal assume that a bounding box around the location of the lips is known. Therefore, the performance of these algorithms depends on accurate detection of the lips' bounding box. In this project we present two algorithms for accurate detection of the lips' bounding box.
The first algorithm is purely based on face geometry.
The second algorithm exploits face geometry to define a search area for the location of the lips; a search procedure is then applied to detect the lips' bounding box. The experimental results demonstrate the benefit of the search procedure for accurate detection of the lips' bounding box.
Detection and recognition of a main point of interest, or salient point, has become a common necessity due to the many tracking and adaptive image-compression applications. Every application oriented toward open spaces needs to acquire the relevant target to focus on. Saliency detection is the area dealing with the detection and recognition of the main point of interest in a picture. This task, which is very basic for the human brain and eye, is very complex for a machine.
Two main types of approach exist: the computational and algorithmic approach, and the emulation of the human brain's cognitive abilities. The computational and algorithmic approach was implemented in this project.
The scope of this project is to define and implement a system able to detect and recognize the salient point of a given picture. An extensive literature survey was performed in order to study different algorithms.
In contemporary Western music, it is common for a song to be played on the semi-tone scale. This means that the notes of the song can be chosen only from a specific set of frequencies, called the semi-tone scale. For example, the set of notes of a piano or a guitar is finite, and singing or playing "between" the notes is considered out of tune. Therefore, singing out of tune can be defined by the amount of deviation from this scale. In this project, given an arbitrary standard single-voice recording, we are able to find out-of-tune notes and change them, either automatically to the nearest note on the semi-tone scale, or to any other predefined desired notes. This is done in three parts: (1) analyzing the frequency and improving it; (2) classifying the song into notes and determining a fixing function; (3) fixing the song.
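The core quantization step, snapping a detected frequency to the nearest note of the equal-tempered semi-tone scale, can be sketched as follows (an illustrative helper, not the project's exact pipeline):

```python
import numpy as np

def snap_to_semitone(freq_hz, a4=440.0):
    """Quantize a frequency to the nearest equal-tempered semitone.

    Returns (snapped_hz, deviation_cents); a note would be flagged as
    'out of tune' when the deviation exceeds some chosen tolerance.
    """
    midi = 69 + 12 * np.log2(freq_hz / a4)            # continuous MIDI number
    nearest = np.round(midi)                           # nearest scale note
    snapped = a4 * 2 ** ((nearest - 69) / 12)          # back to Hz
    cents = 100 * (midi - nearest)                     # signed deviation
    return snapped, cents
```

The cents value doubles as the correction amount: resampling or phase-vocoder pitch shifting by `-cents/100` semitones moves the sung note onto the scale.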
This project is divided into two main parts. In the first part, we built a neural network that implements the MLP (Multilayer Perceptron) algorithm. The basic implementation of this network is built upon an implementation by Professor Olivier Temam.
This implementation was further adjusted to fit our needs. In this part we first built a naive implementation of the network, and then developed it further in order to better utilize the capabilities of the parallel processor. To do so, we introduced several stages of parallelization, achieving a substantial speedup over the original serial baseline. Throughout the entire process we tested and ran those stages on different network layouts and on different kinds of inputs.
As some movies, and specifically opera videos, contain embedded subtitles without accompanying text files, there is a need for a robust system capable of extracting these subtitles from the movie into readable text. Extraction of this information involves detection, localization, tracking, enhancement, and recognition of the text in a given image.
However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction challenging. This project aims to deliver such a system, for the purpose of movie subtitle extraction and possible later translation. Whereas the first part of the project was researched and implemented in the Matlab environment, the second part consisted of a C++ implementation using the OpenCV library.
Parkinson's Disease (PD) is a degenerative disease of the central nervous system with a profound effect on the motor system. Symptoms include slowness of movement, rigidity of motion and in some patients, tremor.
The severity of the disease is quantified using the Unified Parkinson Disease Rating Scale (UPDRS) which is a subjective scale performed and scored by physicians.
In this work, we present an automated, objective quantitative analysis of four UPDRS motor examinations of Hand Movement and Finger Taps.
For this purpose, a non-invasive system for recording and analysis of fine motor skills of hands was developed.
The system is based on a simple low-cost depth acquisition sensor, similar to the second generation of Microsoft's Kinect sensor, and a novel recursive self-correcting hand-tracking algorithm.
The system allows patients to perform test tasks in a natural and unhindered manner.
The evaluation of the system was carried out on PD patients and controls. Machine Learning based classification was performed on the acquired data, followed by a decision making scheme.
In recent years, we can see an increasing use of image processing systems in various business sectors, such as agriculture, where quality testing and monitoring processes are still performed manually. Because of low equipment prices and advances in the field of image processing, many traditional areas are leaning towards automation solutions.
In this project we attempt to characterize fish behavior in pools during the day and build a system that will raise an alarm in case of atypical fish behavior; such behavior may indicate a change in the fish's living conditions, or disease.
Assumptions and Difficulties
The main assumption on which the project is based is that sick fish do not eat and are apathetic to environmental changes.
In addition, fish are frequently ill.
Fish pools are a dynamic environment, and therefore it is hard to separate the fish movement from the pool background.
The water in the pools is always moving and has reflections from the light changes during the day. Another difficulty is recognizing the fish food, because it is given in small doses.
- Year - 2013
Today, the vast majority of online video systems are wired, enabling high-bit-rate communications at the cost of range and mobility limitations. Wireless underwater acoustic modems have been developed in the past few years; using orthogonal frequency division multiplexing (OFDM), rates up to tens of kilobits per second have been reached. In this project we propose a near-online video compression codec suited for the data rates achieved with underwater acoustic OFDM systems. Since the relevant data rates are considered very low for online video transmission, massive compression must be applied. The H.264 standard is an effective compression scheme even for low data rates; however, significant data loss is usually inevitable. By exploiting the unique characteristics of underwater photography, we propose pre-encoder and post-decoder processing stages for reducing the data loss. The proposed solution distributes complexity and power usage between encoder and decoder. The algorithm was tested and analyzed in a simulation environment with several video samples photographed by divers in the shallow waters of the Mediterranean, off the shore of Hadera. The results were evaluated both by PSNR and by human impression. Satisfying results were achieved compared to H.264 compression.
The subject of this project is shadow detection in aerial images using flight details: the time, position and direction of the plane at the moment of photography. Intelligence decoding of images is interested in temporal differences between images. These differences include unneeded shadow differences, since the images are captured at different times, locations and flight directions. The detection method in our algorithm is based on shadow intensity (gray level), shadow direction (based on the sun's position at a specific time and location), sharp changes in the image, and object size. After combining all methods, we obtain a shadow image.
Dysfluency and stuttering are a break or interruption of normal speech such as repetition, prolongation, interjection of syllables, sounds, words or phrases and involuntary silent pauses or blocks in communication.
The goal of this project is to build an algorithm for detecting stuttering with high reliability and as few false alarms as possible.
The algorithm was tested on 12 audio files: speech-signal samples that include stuttering events, where the stuttering assessment was classified manually.
Several approaches were examined in the project. The approach that produced the best results is machine learning. Two main methods were examined: a weighting function for reducing false alarms, and a classifier.
Experimental results demonstrate that the proposed features and classification algorithm give a very promising detection rate of 80% with 0.5% false alarms.