Dena Bazazian

Dena Bazazian is a lecturer (assistant professor) in machine vision and robotics at the University of Plymouth. Previously, she was a senior research associate at the Visual Information Laboratory of the University of Bristol. Prior to that, she was a research scientist at CTTC (Centre Tecnològic de Telecomunicacions de Catalunya) and a postdoctoral researcher at the Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB), where she completed her PhD in 2018.
She made long-term research visits to NAVER LABS Europe in Grenoble, France in 2019 and to the Media Integration and Communication Center (MICC), University of Florence, Italy in 2017. She worked with the Image Processing Group (GPI) at Universitat Politècnica de Catalunya (UPC) between 2013 and 2015.

Dena Bazazian was one of the main organizers of the Deep Learning for Geometric Computing (DLGC) workshop series at CVPR 2020-2024 and ICCV 2021, the Women in Computer Vision (WiCV) workshops at CVPR 2018 and ECCV 2018, and the Robust Reading Challenge on Omnidirectional Video at ICDAR 2017.

Research

ViGLAD: Vision Graph Neural Networks for Logical Anomaly Detection
Zoghlami, F., Bazazian, D., Masala, G., Gianni, M., Khan, A.
IEEE Access, 2024.

@article{zoghlami2024viglad,
title={ViGLAD: Vision Graph Neural Networks for Logical Anomaly Detection},
author={Zoghlami, Firas and Bazazian, Dena and Masala, Giovanni and Gianni, Mario and Khan, Asiya},
journal={IEEE Access},
publisher={IEEE},
year={2024}
}

Quality inspection is an industrial field with a growing interest in anomaly detection research. An anomaly in an image can either be structural or logical. While structural anomalies lie on the image objects, challenging logical anomalies are hidden in the global relations between the image components. The proposed approach, Vision Graph based Logical Anomaly Detection (ViGLAD), uses the graph representation of an image for logical anomaly detection. Defining an image as a structure of nodes and edges leverages new possibilities for detecting hidden logical anomalies by introducing vision graph autoencoders. Our experiments on public datasets show that using vision graphs enhances the performance of state-of-the-art teacher-student-autoencoder neural networks in logical anomaly detection while achieving robust results in structural anomaly detection.

Localised-NeRF: Specular Highlights and Colour Gradient Localising in NeRF
Selvaratnam, D., Bazazian, D.
CVPR w, 2024.

@inproceedings{selvaratnam2024localised,
title={Localised-NeRF: Specular Highlights and Colour Gradient Localising in NeRF},
author={Selvaratnam, Dharmendra and Bazazian, Dena},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2791--2801},
year={2024}
}

Neural Radiance Field (NeRF) based systems predominantly operate within the RGB (Red, Green, and Blue) space; however, the distinctive capability of the HSV (Hue, Saturation, and Value) space to discern between specular and diffuse regions is seldom utilised in the literature. We introduce Localised-NeRF, which projects the queried pixel point onto multiple training images to obtain a multi-view feature representation in HSV space and gradient space, yielding important features that can be used to synthesise novel view colour. This integration is pivotal in identifying specular highlights within scenes, thereby enriching the model’s understanding of specular changes as the viewing angle alters. Our proposed Localised-NeRF model uses an attention-driven approach that not only maintains local view direction consistency but also leverages image-based features, namely the HSV colour space and colour gradients. These features serve as effective indirect priors for both the training and testing phases to predict the diffuse and specular colour. Our model exhibits competitive performance with prior NeRF-based models, as demonstrated on the Shiny Blender and Synthetic datasets. The code of Localised-NeRF is publicly available.
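As a rough illustration of why HSV helps separate specular from diffuse regions, the sketch below converts RGB pixels to HSV with Python's standard `colorsys` module and flags bright, desaturated pixels. The function names and the thresholds `s_max` and `v_min` are hypothetical choices for this example; Localised-NeRF feeds HSV and gradient features to a learned model rather than applying fixed thresholds.

```python
import colorsys


def hsv_features(rgb):
    """Convert an (r, g, b) triple with components in [0, 1] to (h, s, v)."""
    r, g, b = rgb
    return colorsys.rgb_to_hsv(r, g, b)


def is_specular_candidate(rgb, s_max=0.2, v_min=0.9):
    """Flag pixels that are bright (high value) and washed out (low saturation).

    s_max and v_min are illustrative thresholds, not values from the paper.
    """
    _, s, v = hsv_features(rgb)
    return s <= s_max and v >= v_min
```

A near-white highlight such as `(0.98, 0.97, 0.95)` has low saturation and high value, so it is flagged, whereas a saturated diffuse red such as `(0.8, 0.1, 0.1)` is not.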

GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning
Anvekar, T., Bazazian, D.
CVPR w, 2023.

@inproceedings{anvekar2023gpr,
title={Gpr-net: Geometric prototypical network for point cloud few-shot learning},
author={Anvekar, Tejas and Bazazian, Dena},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={4178--4187},
year={2023}
}

In the realm of 3D computer vision applications, point cloud few-shot learning plays a critical role. However, it poses an arduous challenge due to the sparsity, irregularity, and unordered nature of the data. Current methods rely on complex local geometric extraction techniques such as convolution, graph, and attention mechanisms, along with extensive data-driven pre-training tasks. These approaches contradict the fundamental goal of few-shot learning, which is to facilitate efficient learning. To address this issue, we propose GPr-Net (Geometric Prototypical Network), a lightweight and computationally efficient geometric prototypical network that captures the intrinsic topology of point clouds and achieves superior performance. Our proposed method, IGI++ (Intrinsic Geometry Interpreter++), employs vector-based hand-crafted intrinsic geometry interpreters and Laplace vectors to extract and evaluate point cloud morphology, resulting in improved representations for FSL (Few-Shot Learning). Additionally, Laplace vectors enable the extraction of valuable features from point clouds with fewer points. To tackle the distribution drift challenge in few-shot metric learning, we leverage hyperbolic space and demonstrate that our approach handles intra- and inter-class variance better than existing point cloud few-shot learning methods. Experimental results on the ModelNet40 dataset show that GPr-Net outperforms state-of-the-art methods in few-shot learning on point clouds, with computational efficiency 170× better than all existing works. The code is publicly available at https://github.com/TejasAnvekar/GPr-Net.
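The idea of prototype-based few-shot classification in hyperbolic space can be sketched as follows. This is a minimal illustration, not the GPr-Net implementation: it assumes embeddings already lie inside the unit Poincaré ball, uses a plain Euclidean mean as the class prototype (a simplification), and the function names are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def prototype(embeddings):
    """Class prototype as the coordinate-wise mean (simplifying assumption)."""
    n = len(embeddings)
    return tuple(sum(e[i] for e in embeddings) / n for i in range(len(embeddings[0])))


def classify(query, support):
    """Return the label whose prototype is nearest to the query.

    support maps each class label to a list of support-set embeddings.
    """
    prototypes = {label: prototype(embs) for label, embs in support.items()}
    return min(prototypes, key=lambda label: poincare_distance(query, prototypes[label]))
```

For example, with `support = {"a": [(0.1, 0.0), (0.2, 0.0)], "b": [(-0.5, 0.1), (-0.4, -0.1)]}`, the query `(0.12, 0.01)` is assigned to class `"a"`.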

Perceptually Grounded Quantification of 2D Shape Complexity
Bazazian, D., Magland, B., Grimm, C., Chambers, E., Leonard, K.
The Visual Computer Journal, 2022.

@article{Bazazian-ShapeComplexity2022,
author={D. {Bazazian} and B. {Magland} and C. {Grimm} and E. {Chambers} and K. {Leonard}},
journal={The Visual Computer},
title={Perceptually grounded quantification of 2D shape complexity},
year={2022},
volume={38},
pages={3351--3363},
doi={https://doi.org/10.1007/s00371-022-02634-8}}

The importance of measuring the complexity of shapes can be seen in its wide range of applications, such as computer vision, robotics, cognitive studies, eye tracking, and psychology. However, it is very challenging to define an accurate and precise metric to measure the complexity of shapes. In this paper, we explore different notions of shape complexity, drawing from established work in mathematics, computer science, and computer vision. We integrate results from user studies with quantitative analyses to identify three measures that capture important axes of shape complexity, out of a list of almost 300 measures previously considered in the literature. We then explore the connection between specific measures and the types of complexity that each one can elucidate. Finally, we contribute a dataset of both abstract and meaningful shapes with designated complexity levels both to support our findings and to share with other researchers.

Dual-Domain Image Synthesis using Segmentation-Guided GAN
Bazazian, D., Calway, A., Damen, D.
CVPR w, 2022.

@article{Bazazian_DDS2022,
author={Bazazian, Dena and Calway, Andrew and Damen, Dima},
journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
title={Dual-Domain Image Synthesis using Segmentation-Guided GAN},
year={2022},
pages={1-16}}

We introduce a segmentation-guided approach to synthesise images that integrate features from two distinct domains. Images synthesised by our dual-domain model belong to one domain within the semantic mask, and to another in the rest of the image, with the two smoothly integrated. We build on the successes of few-shot StyleGAN and single-shot semantic segmentation to minimise the amount of training required in utilising two domains. The method combines few-shot cross-domain StyleGAN with a latent optimiser to achieve images containing features of two distinct domains. We use a segmentation-guided perceptual loss, which compares both pixel-level values and activations between domain-specific and dual-domain synthetic images. Results demonstrate qualitatively and quantitatively that our model is capable of synthesising dual-domain images on a variety of objects (faces, horses, cats, cars), domains (natural, caricature, sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code is publicly available.

EDC-Net: Edge Detection Capsule Network for 3D Point Clouds
Bazazian, D., Parés, ME.
Applied Sciences Journal, 2021.

@article{Bazazian-EDCNet2021,
author={D. {Bazazian} and ME. {Parés}},
journal={Applied Sciences},
title={EDC-Net: Edge Detection Capsule Network for 3D Point Clouds},
year={2021},
volume={11},
number={4: 1833},
pages={1-16},
doi={https://doi.org/10.3390/app11041833}}

Edge features in point clouds are prominent due to their capability of describing an abstract shape of a set of points. Point clouds obtained by 3D scanner devices are often immense in terms of size. Edges are essential features in large-scale point clouds since they are capable of describing the shapes in down-sampled point clouds while maintaining the principal information. In this paper, we tackle challenges of edge detection tasks in 3D point clouds. To this end, we propose a novel technique to detect edges of point clouds based on a capsule network architecture. In this approach, we define the edge detection task of point clouds as a semantic segmentation problem. We built a classifier through the capsules to predict edge and non-edge points in 3D point clouds. We applied a weakly-supervised learning approach in order to improve the performance of our proposed method and to enable testing the technique on a wider range of shapes. We provide several quantitative and qualitative experimental results to demonstrate the robustness of our proposed EDC-Net for edge detection in 3D point clouds. We performed a statistical analysis over the ABC and ShapeNet datasets. Our numerical results demonstrate the robust and efficient performance of EDC-Net.

DCG-Net: Dynamic Capsule Graph Convolutional Network for Point Clouds
Bazazian, D., Nahata, D.
IEEE Access Journal, 2020.

@article{Bazazian2020,
author={D. {Bazazian} and D. {Nahata}},
journal={IEEE Access},
title={DCG-Net: Dynamic Capsule Graph Convolutional Network for Point Clouds},
year={2020},
volume={8},
number={},
pages={188056-188067},
doi={10.1109/ACCESS.2020.3031812}}

This paper introduces DCG-Net (Dynamic Capsule Graph Network) to analyze point clouds for the tasks of classification and segmentation. DCG-Net aggregates point cloud features to build and update the graphs based on the dynamic routing mechanism of capsule networks at each layer of a convolutional network. The first layer of DCG-Net exploits the geometrical attributes of the point cloud to build a graph by neighborhood aggregation while the deeper layers of the network dynamically update the graph based on the feature space of convolutions. We conduct extensive experiments on the public datasets ModelNet40 and ShapeNet-Part. Our experimental results demonstrate that DCG-Net achieves state-of-the-art performance on public datasets: 93.4% accuracy on ModelNet40, and 85.4% instance mIoU (mean Intersection over Union) on ShapeNet-Part.
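The graph-building step described above can be illustrated with a plain k-nearest-neighbour graph. This is only a sketch under stated assumptions: `knn_graph` is a hypothetical helper, it uses brute-force Euclidean distances, and it does not include the capsule dynamic routing that DCG-Net uses to aggregate and update features.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours.

    points is a list of equal-length coordinate (or feature) tuples.
    Brute force, O(n^2 log n); adequate for small illustrative inputs.
    """
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph
```

In the first layer, `points` would hold 3D coordinates; to mimic a dynamically updated graph, the same function could be called again on per-point feature vectors produced by a later layer.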

Can Generative Adversarial Networks Teach Themselves Text Segmentation?
Al-Rawi, M., Bazazian, D., Valveny, E.
ICCV w, 2019.

In the information age in which we live, text segmentation from scene images is a vital prerequisite task used in many text understanding applications. Text segmentation is a difficult problem because of the potentially vast variation in text and scene landscape. Moreover, systems that learn to perform text segmentation usually need non-trivial annotation efforts. We present in this work a novel unsupervised method to segment text at the pixel-level from scene images. The model we propose, which relies on generative adversarial neural networks, segments text intelligently; and does not therefore need to associate the scene image that contains the text to the ground-truth of the text. The main advantage is thus skipping the need to obtain the pixel-level annotation dataset, which is normally required in training powerful text segmentation models. The results are promising, and to the best of our knowledge, constitute the first step towards reliable unsupervised text segmentation. Our work opens a new research path in unsupervised text segmentation and poses many research questions with a lot of trends available for further improvement.

Word Spotting in Scene Images based on Character Recognition
Bazazian, D., Karatzas, D. and Bagdanov, A.
CVPR w, 2018.

In this work we address the challenge of spotting text in scene images without restricting the words to a fixed lexicon or dictionary. Words which are typically out of dictionary include, for instance, cases where exclamation or other punctuation marks are present in words, telephone numbers, URLs, dates, etc. To this end, we train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. The key advantage of the proposed method is that it allows unconstrained out-of-dictionary word spotting independent from any dictionary or lexicon. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.

FAST: Facilitated and Accurate Scene Text Proposal through FCN Guided Pruning
Bazazian, D., Gomez, R., Gomez, L., Nicolaou, A., Karatzas, D., and Bagdanov, A.
Pattern Recognition Letters (PRL) Journal, 2017.

Class-specific text proposal algorithms can efficiently reduce the search space for possible text object locations in an image. In this paper we combine the Text Proposals algorithm with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same recall level and thus gaining a significant speed up. Our experiments demonstrate that such text proposal approaches yield significantly higher recall rates than state-of-the-art text localization techniques, while also producing better-quality localizations. Our results on the ICDAR 2015 Robust Reading Competition (Challenge 4) and the COCO-text datasets show that, when combined with strong word classifiers, this recall margin leads to state-of-the-art results in end-to-end scene text recognition.
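One simple way to picture heatmap-guided pruning of proposals is to score each candidate box by the mean text probability it covers and discard low-scoring boxes. The sketch below is an assumption-laden illustration, not the published FAST pipeline: `box_score`, `prune_proposals`, the half-open box convention, and the `min_score` threshold are all hypothetical.

```python
def box_score(heatmap, box):
    """Mean heatmap value inside a non-empty box (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = box
    values = [heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def prune_proposals(heatmap, boxes, min_score=0.5):
    """Keep boxes scoring at least min_score, ranked from best to worst.

    heatmap is a 2D list of per-pixel text probabilities (e.g. from an FCN).
    """
    scored = [(box_score(heatmap, b), b) for b in boxes]
    kept = [(s, b) for s, b in scored if s >= min_score]
    kept.sort(key=lambda sb: sb[0], reverse=True)
    return [b for _, b in kept]
```

Fewer surviving boxes means fewer calls to the downstream word classifier, which is where a speed-up of this kind would come from.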

Reading Text in the Wild from Compressed Images
Galteri, L., Bazazian, D., Seidenari, L., Bertini, M., Bagdanov, A., Nicolaou, A., Karatzas, D.
ICCV w, 2017.

Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on application context, and this compression introduces artifacts that distort the content of the captured images. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that can eliminate text-specific compression artifacts and which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant impact on text localization and recognition and that our approach yields an improvement in both, especially at high compression rates.

Improving Text Proposals for Scene Images with Fully Convolutional Networks
Bazazian, D., Gomez, R., Gomez, L., Nicolaou, A., Karatzas, D., and Bagdanov, A.
ICPR, 2016.

Text Proposals have emerged as a class-dependent version of object proposals: efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield state-of-the-art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm, combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over the current state of the art.

ICDAR2017 Robust Reading Challenge on Omnidirectional Video
Iwamura, M., Morimoto, N., Tainaka, K., Bazazian, D., Gomez, L., Karatzas, D.
ICDAR, 2017.

This challenge focuses on scene text localization and recognition on the Downtown Osaka Scene Text (DOST) dataset. Five tasks will be opened within this Challenge: Text Localisation in Videos, Text Localisation in Still Images, Cropped Word Recognition, End-to-End Recognition in Videos, and End-to-End Recognition in Still Images. The DOST dataset preserves scene texts observed in the real environment as they were. The dataset contains videos (sequential images) captured in shopping streets in downtown Osaka with an omnidirectional camera. Use of the omnidirectional camera helps to exclude the user’s intention in capturing images. The sequential images contained in the dataset encourage the development of new kinds of text detection and recognition techniques that utilize temporal information. Another important feature of the DOST dataset is that it contains non-Latin text. Since the images were captured in Japan, they contain a lot of Japanese text as well as an adequate amount of Latin text. Because of these features, we can say that the DOST dataset preserves scene texts in the wild.

Segmentation-based Multi-Scale Edge Extraction to Measure the Persistence of Features in Unorganized Point Clouds
Bazazian, D., Casas, J.R., Ruiz-Hidalgo, J.
VISAPP, 2017.

Edge extraction has attracted a lot of attention in computer vision. The accuracy of extracting edges in point clouds can be a significant asset for a variety of engineering scenarios. To address these issues, we propose a segmentation-based multi-scale edge extraction technique. In this approach, different regions of a point cloud are segmented by a global analysis according to the geodesic distance. Afterwards, a multi-scale operator is defined according to local neighborhoods. Thereupon, by applying this operator at multiple scales of the point cloud, the persistence of features is determined. We illustrate the proposed method by computing a feature weight that measures the likelihood of a point being an edge, and then detecting the edge points based on that value at both global and local scales. Moreover, we evaluate our method quantitatively and qualitatively. Experimental results show that the proposed approach achieves superior accuracy. Furthermore, we demonstrate the robustness of our approach in noisier real-world datasets.

Fast and Robust Edge Extraction in unorganized Point Clouds
Bazazian, D., Casas, J.R., Ruiz-Hidalgo, J.
DICTA, 2015.

Edges provide important visual information in scene surfaces. The need for fast and robust feature extraction from 3D data is nowadays fostered by the widespread availability of cheap commercial depth sensors and multi-camera setups. This article investigates the challenge of detecting edges in surfaces represented by unorganized point clouds. Generally, edge recognition requires the extraction of geometric features such as normal vectors and curvatures. Since the normals alone do not provide enough information about the geometry of the cloud, further analysis of extracted normals is needed for edge extraction, such as a clustering method. Edge extraction through these techniques consists of several steps with parameters which depend on the density and the scale of the point cloud. In this paper we propose a fast and precise method to detect sharp edge features by analysing the eigenvalues of the covariance matrix defined by each point’s k-nearest neighbors. Moreover, we evaluate the proposed method for sharp edge extraction quantitatively and qualitatively using several dihedral angles and well-known examples of unorganized point clouds. Furthermore, we demonstrate the robustness of our approach in noisier real-world datasets.
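The eigenvalue analysis can be sketched as follows. The abstract does not state which eigenvalue statistic is used, so this example assumes the commonly used "surface variation" ratio, the smallest eigenvalue divided by the sum of all three; the function names are hypothetical, and the eigenvalues are computed with the standard closed-form trigonometric solution for symmetric 3×3 matrices so that only the standard library is needed.

```python
import math


def covariance3(points):
    """3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, ascending (closed form)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


def surface_variation(neighbors):
    """Smallest eigenvalue over the eigenvalue sum (assumed edge indicator)."""
    eig = sym3_eigenvalues(covariance3(neighbors))
    total = sum(eig)
    return eig[0] / total if total > 0.0 else 0.0
```

For a neighborhood lying in a single plane the ratio is zero, while a neighborhood straddling a sharp crease between two planes gives a clearly positive value; a point could then be labelled as an edge when the ratio exceeds a threshold chosen for the data at hand.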
