Working at the intersection of AI, research, and engineering.

I enjoy exploring ideas, building systems, and turning complex problems into something practical. My focus is on thoughtful research and solid engineering that lasts.

About

Profile

Amit Israeli

I work on AI research across computer vision and natural language processing, focusing on training and evaluating models for segmentation, detection, generation, and multimodal reasoning. My background spans deepfake detection, few-shot learning, efficient model design, and language–vision integration, with an emphasis on turning research ideas into working systems.

Selected Projects

Kokoro coefficient trajectory (GIF)

Kokoro — Tiny TTS, Big Voices (WIP)

Actively building now — voice embedding optimization + tools

Researching small-footprint TTS with Kokoro-82M: fast mixture-of-voices coefficients, full-embedding optimization, and clean visual tooling (live trajectory + video export).

Tiny TTS · Mixture-of-Voices · Embedding Opt · Speaker Adaptation · Visualizer
What I’m building

Now

  • Mixture-of-voices optimizer with temperature-annealed softmax (see the sketch after this list)
  • Stable MR-STFT loss (MPS-friendly), clean W&B logging
  • GUI visualizer: circular mixer + per-voice bars + MP4 export
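
A minimal sketch of the first item above, assuming a PyTorch setup: learnable logits over a bank of voice embeddings, mixed via a softmax whose temperature is annealed so the mixture starts soft and gradually commits. The bank size, the synthesis step, and the loss below are placeholders rather than the actual Kokoro-82M API.

```python
# Sketch only: mixture-of-voices coefficients with a temperature-annealed softmax.
# voice_bank, the synthesis step, and the loss are stand-ins, not Kokoro-82M code.
import torch

num_voices, embed_dim = 10, 256                     # assumed bank size / embedding width
voice_bank = torch.randn(num_voices, embed_dim)     # placeholder for real voice embeddings

logits = torch.zeros(num_voices, requires_grad=True)
opt = torch.optim.AdamW([logits], lr=1e-2)

steps, t_start, t_end = 500, 5.0, 0.1
for step in range(steps):
    # linear annealing: high temperature = soft blend, low temperature = near one-hot
    t = t_start + (t_end - t_start) * step / (steps - 1)
    coeffs = torch.softmax(logits / t, dim=0)       # mixture-of-voices coefficients
    emb = coeffs @ voice_bank                       # blended voice embedding

    # in the real project the loss is an MR-STFT distance between audio synthesized
    # from `emb` and the target voice; a dummy proxy keeps this sketch self-contained
    loss = emb.pow(2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```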

Next

  • Full-embedding optimization & partial model fine-tuning
  • Optimizer ablations (AdamW, SAM, cosine/plateau/one-cycle)
  • “Speaker description → voice embedding” mapping
Token Budget example

Token-Budget-Aware Reasoning for VLMs

Built on the paper Token-Budget-Aware LLM Reasoning, this project extends the idea to multimodal setups by combining a frozen SigLIP image encoder with a LoRA-tuned LLM. The goal is to predict the reasoning budget before decoding, making chain-of-thought both efficient and controllable.

CoT Budgeting · LoRA · SigLIP · Multimodal

Summary

The original paper asked whether we can control chain-of-thought (CoT) reasoning with an adaptive token budget. Their method trains a predictor model to estimate how many tokens the ground-truth CoT would require, and uses this as a budget for the main LLM.

My contribution was to extend this idea to multimodal environments. The system takes both text and image inputs, predicts the reasoning budget, and then guides the LLM’s CoT length accordingly — balancing efficiency and accuracy.

Deep dive

  • Architecture: Frozen SigLIP encoder extracts visual features; a custom budget head combines these with LLM hidden states to predict the token budget (a sketch follows this list).
  • Training: LoRA fine-tuning on text-only and multimodal batches, supervised with oracle CoT lengths; KL regularization for stable budget predictions.
  • Evaluation: Compared accuracy vs. average token usage on text vs. multimodal inputs; ablations on encoder freezing and head depth.
  • Outcome: Early results show notable token savings while maintaining competitive accuracy — highlighting the trade-offs of token-aware reasoning for VLMs.
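
As a rough illustration of the architecture bullet, here is what a budget head of this kind could look like in PyTorch; the feature dimensions, mean pooling, and softplus output are assumptions for the sketch, not the project's exact module.

```python
# Hypothetical budget head: pooled SigLIP image features and pooled LLM hidden
# states are regressed to a positive scalar token budget. Dimensions are illustrative.
import torch
import torch.nn as nn

class BudgetHead(nn.Module):
    def __init__(self, vis_dim=1152, llm_dim=2048, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vis_dim + llm_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, vis_feats, llm_hidden):
        # vis_feats: (B, N_patches, vis_dim) from the frozen SigLIP encoder
        # llm_hidden: (B, T, llm_dim) from the LoRA-tuned LLM
        pooled = torch.cat([vis_feats.mean(dim=1), llm_hidden.mean(dim=1)], dim=-1)
        return nn.functional.softplus(self.mlp(pooled)).squeeze(-1)  # budget > 0

head = BudgetHead()
budget = head(torch.randn(2, 196, 1152), torch.randn(2, 64, 2048))  # shape (B,)
```
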
Token budget predictor diagram
VAR Explained

PopYou2 — VAR Text

Adapted the Visual AutoRegressive (VAR) model for Funko Pop! generation. Fine-tuned with a custom “doll” embedding and an adapter mapping SigLIP image embeddings into the model’s text space, enabling both I→I and T→I generation paths.

VAR · Adapter · SigLIP→Text

Summary

Inspired by the Visual Autoregressive Modeling paper, this project explores text-to-image generation with VAR. By injecting a custom “doll” embedding and training a lightweight adapter, the model can synthesize Funko Pop! figures with controllable styles and actions. The system supports both image-to-image and text-to-image generation, extending the VAR paradigm to creative domains.

Deep dive

  • Dataset: ~100k Funko Pop! images generated with SDXL-Turbo prompts. Images were filtered and upscaled to ensure high quality and diversity.
  • Architecture: BLIP-2 was used for captioning to create initial descriptions. The VAR and VAE were kept frozen. An adapter layer was trained to map SigLIP image embeddings into the CLIP text space, conditioning the VAR generator (see the adapter sketch after this list).
  • Training: Fine-tuned the adapter and a lightweight LoRA module while leaving the generator and VAE frozen for efficiency. A custom “doll” embedding was injected to specialize the model for Funko Pop! synthesis.
  • Generation paths:
    • Image → Image: use SigLIP image embeddings through the adapter to influence VAR output.
    • Text → Image: swap the SigLIP image encoder for a text encoder at inference, enabling controlled text-to-image synthesis.
  • Controls: Style and action modifiers (e.g., “Alien”, “Playing guitar”) allow flexible customization. Optional textual inversion can be applied for specific franchises.
  • Next steps: Extend to 3D outputs with multi-view diffusion and NeRF/Depth priors, enabling mesh export and richer customization.
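
To make the adapter idea concrete, a minimal sketch is below: a small MLP that maps a SigLIP image embedding into the text-embedding space conditioning the frozen VAR generator, with a text encoder swapped in at inference for the T→I path. Layer sizes and dimensions are assumptions, not the trained adapter.

```python
# Sketch of the SigLIP -> text-space adapter that conditions the frozen VAR generator.
# siglip_dim / text_dim / hidden are assumed values, not the project's configuration.
import torch
import torch.nn as nn

class SigLIPToTextAdapter(nn.Module):
    def __init__(self, siglip_dim=1152, text_dim=768, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(siglip_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, text_dim),
        )

    def forward(self, emb):
        # emb: SigLIP image embedding (I -> I path) or, at inference,
        # a text encoder's embedding (T -> I path) fed through the same interface
        return self.net(emb)   # conditioning vector for the frozen VAR generator

adapter = SigLIPToTextAdapter()
cond = adapter(torch.randn(4, 1152))   # (4, 768) conditioning vectors
```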

Interactive Demo

A Funko Pop figure of [character], styled as a [style], performing [action].
Funko preview
SAM LoRA figure

Few-Shot Segmentation with SAM + LoRA

Adapted SAM and its lightweight variants with LoRA adapters for few-shot segmentation, showing improvements over PerSAM.

Few‑Shot · LoRA · Pruning · Quantization · Edge

Deep dive

This project extends SAM and its efficient variants (FastSAM, EfficientSAM, MobileSAM) with LoRA adapters to handle class-aware few-shot segmentation. I ran systematic experiments on COCO, Cityscapes, and Soccer datasets, carefully varying the number of shots per class, loss functions, and augmentation strategies to evaluate generalization. Beyond accuracy gains, the focus was on practical deployment: I applied pruning and quantization to reduce model size by up to 80%, then fine-tuned LoRA modules to recover accuracy without sacrificing speed. This compression pipeline enabled real-time inference on edge devices, demonstrating how research-grade models can be adapted for production. Results showed that these LoRA-adapted SAM models consistently surpassed PerSAM in class-specific IoU, while remaining lightweight and efficient enough for real-world use.
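
The LoRA side of this can be pictured with a generic low-rank wrapper around a frozen linear layer, of the kind injected into SAM's attention projections; this is a sketch under assumed ranks and dimensions, not the segment-anything codebase.

```python
# Generic LoRA wrapper: freeze the pretrained projection, add a trainable
# low-rank update. Rank, alpha, and the 768-dim projection are assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pretrained weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                   # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# usage sketch: wrap the qkv projection of one ViT block in SAM's image encoder
qkv = nn.Linear(768, 768 * 3)
lora_qkv = LoRALinear(qkv, rank=4)
out = lora_qkv(torch.randn(1, 196, 768))                # (1, 196, 2304)
```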

COCO few-shot examples · Cityscapes examples · Soccer examples · Segment Anything variants

CelebrityLook — Mobile Face Transform

On-device face stylization with a distilled StyleGAN2 (MobileStyleNet) and CLIP-to-StyleGAN latent alignment for text-driven edits; ~30fps on modern phones.

On‑Device 30fps · StyleGAN2 Distill · Latent Alignment · CoreML

Summary

Built an edge app that performs GAN inversion and text-conditioned editing directly on device. It implements Bridging CLIP and StyleGAN through Latent Alignment to connect language and vision, keeping identity while applying prompt-guided changes — work that won the Samsung Next MobileXGenAI Hackathon and was optimized for ~30fps mobile inference.

Deep dive

The pipeline uses MobileStyleNet (a StyleGAN2 distillation) as the generator. For inversion, inference is G(f(E_I(x))), where a few mapper layers f remain trainable and E_I is an image encoder distilled from OpenCLIP to EfficientFormer-Large (per the latent-alignment setup in the paper/README). To connect text to the model, a mapper is trained to align CLIP representations with the W+ latent: the OpenCLIP text encoder E_T produces an embedding that the mapper converts into a ΔW+. Starting from the mean latent (text→image) or from an inverted latent (image→text manipulation), the scaled ΔW+ is added to drive edits like “blonde woman with sunglasses,” “man with a hat and beard,” or head-pose changes. In practice, the right scale factor C for ΔW+ depends on each (W+, text) pair, so a small projection layer is trained to predict the optimal C with a CLIP-based loss, balancing fidelity to the source face and alignment to the prompt. The app demonstrates strong attribute control (e.g., glasses, hair, pose), while acknowledging the identity-preservation limits of the inversion versus SOTA methods. Export and mobile optimizations deliver smooth, on-device performance (~30fps).
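
A compact sketch of the text-edit path described above, assuming an 18×512 W+ layout and illustrative layer sizes: a mapper turns the CLIP text embedding into a ΔW+ direction, and a small head predicts the per-(W+, text) scale C.

```python
# Sketch only: CLIP text embedding -> delta W+ plus a learned edit-strength C.
# The 18 x 512 W+ shape and layer widths are assumptions, not the app's code.
import torch
import torch.nn as nn

N_LAYERS, W_DIM, CLIP_DIM = 18, 512, 512

class DeltaWMapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.mapper = nn.Sequential(
            nn.Linear(CLIP_DIM, 1024), nn.GELU(),
            nn.Linear(1024, N_LAYERS * W_DIM),
        )
        self.scale_head = nn.Linear(CLIP_DIM + N_LAYERS * W_DIM, 1)

    def forward(self, text_emb, w_plus):
        delta = self.mapper(text_emb).view(-1, N_LAYERS, W_DIM)      # edit direction
        feat = torch.cat([text_emb, w_plus.flatten(1)], dim=-1)
        c = self.scale_head(feat).unsqueeze(-1)                      # predicted scale C
        return w_plus + c * delta                                    # edited latent for the generator

mapper = DeltaWMapper()
edited = mapper(torch.randn(2, CLIP_DIM), torch.randn(2, N_LAYERS, W_DIM))
```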

Mapper training diagram · Hackathon win · Text edit: blond woman with sunglasses · Pose edit: looking left
FastGAN sample

PopYou — FastGAN + CLIP

Multi-stage pipeline: scrape + super-resolve real Funko images, synthesize ~30k with DeciDiffusion, train FastGAN, then add a frozen-CLIP inversion mapper for text- and image-conditioned edits; optional 3D lifting.

FastGAN · CLIP Inversion · Synthetic Data · 3D Lifting

Summary

PopYou! targets Funko-style synthesis with GAN-level latency and memory. The data stage combines scraped Funko Pop images (upscaled with a super-resolution model) with a synthetic corpus of ~30,000 renders produced by DeciDiffusion. A FastGAN generator is trained on this semi-synthetic set and then frozen. On top, a lightweight inversion/mapper conditioned by frozen CLIP enables two paths: text→image generation via CLIP text embeddings (using a prompt template like “funko pop figure of … on a white background”) and image→image stylization via CLIP image embeddings of real faces/characters. Side-by-side examples show prompt alignment comparable to diffusion baselines at a fraction of runtime and memory. For 3D exploration, outputs can be lifted with SyncDreamer / DreamGaussian.

Deep dive

  • Stage 1 — Data: Scrape real Funko images and upsample them with a super-resolution model; generate an additional ~30k synthetic Funko renders using DeciDiffusion.
  • Stage 2 — GAN training: Train FastGAN on the combined (semi-synthetic) corpus to capture the Funko style; freeze the generator for downstream editing.
  • Stage 3 — Inversion & editing: Train a mapper on top of frozen CLIP encoders to produce ΔW+ edits in W+. For text-conditioned synthesis, start from the mean latent and add a ΔW+ derived from the CLIP text embedding; for image-conditioned stylization, derive ΔW+ from the CLIP image embedding of a reference photo. Attributes like glasses, hair color, clothing, and coarse pose are controllable via the prompt/reference.
  • Stage 4 — Evaluation & 3D: Using the templated prompts, aggregate metrics report CLIP similarity ≈ 0.31 for PopYou! vs. 0.33 for DeciDiffusion, and FID ≈ 562 vs. 258 against real Funko images (see the evaluation sketch below). For 3D, generate multi-view/mesh outputs via SyncDreamer/DreamGaussian.

The result is a practical trade-off: GAN-based Funko synthesis with strong promptability and vastly lower latency/memory than diffusion, plus an editing path that generalizes to real images via CLIP-guided inversion.
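
For the Stage 4 numbers, the CLIP-similarity side of the evaluation can be sketched with the Hugging Face transformers CLIP API as below; the checkpoint choice and the filled-in prompt subject are assumptions for illustration.

```python
# Sketch of the CLIP-similarity check: embed a generated image and its templated
# prompt, then take cosine similarity. Checkpoint and prompt subject are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "funko pop figure of Obama on a white background"   # templated prompt, subject filled in
image = Image.new("RGB", (256, 256))                          # stand-in for a generated sample

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
score = torch.cosine_similarity(img_feat, txt_feat).item()    # averaged over the test set in practice
```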

FastGAN Obama sample · 3D lifting: Alan Turing (DreamGaussian) · 3D lifting: Ras (DreamGaussian)
Audio waveform placeholder

MusicGen — Genre LoRA

LoRA-adapts Meta’s MusicGen for genre-specific generation (e.g., MapleStory-style BGM) while keeping prompt controllability.

LoRA · Genre Transfer · 32k EnCodec · Fréchet

Summary

This project fine-tunes MusicGen with LoRA to steer the model toward specific genres using genre-related text prompts, without retraining the full network. MusicGen is a single-stage autoregressive Transformer trained over a 32 kHz EnCodec tokenizer with four codebooks @ 50 Hz; the LoRA approach keeps its built-in controllability while specializing style efficiently.

Deep dive

The training setup adds low-rank adapters to MusicGen and conditions on genre descriptions to bias generation toward target styles. To evaluate genre adaptation for a distinctive target, we compare generations against MapleStory background music. We generate 150 audio clips (each 8 s) and measure distributional shift with a Fréchet distance metric. As a reference, example zero-shot prompts yield distances of 1.0153 for “maplestory background music”, 0.7420 for an upbeat orchestral/jazz prompt, and 0.6013 for an electronic/playful prompt. The same protocol is then applied to the fine-tuned model to quantify genre alignment while preserving MusicGen’s prompt controls.
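
To make the protocol concrete, a sketch of the generation side using Meta's audiocraft package follows; the checkpoint name and batch size are assumptions, and loading the LoRA-adapted weights is omitted.

```python
# Sketch: generate 150 eight-second clips for one prompt, to be scored later with a
# Frechet audio distance against the MapleStory reference set. Checkpoint is assumed.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")   # LoRA-adapted weights would be loaded here
model.set_generation_params(duration=8)                      # 8-second clips

prompt = "maplestory background music"
n_clips, batch = 150, 10
idx = 0
for _ in range(n_clips // batch):
    wavs = model.generate([prompt] * batch)                  # (batch, channels, samples)
    for wav in wavs:
        audio_write(f"clip_{idx:03d}", wav.cpu(), model.sample_rate, strategy="loudness")
        idx += 1
```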

Audio samples

KoalaReadingAI — AI Papers as a Podcast

A pipeline that turns AI research papers into short audio episodes and publishes them to Spotify and YouTube. Automates paper fetching, summarization, and TTS.

TTS · LLMs · Audio · Automation

Summary

The system fetches recent papers (Hugging Face daily papers), summarizes them, and converts the text to speech, producing episodes for distribution. It supports paid TTS via ElevenLabs as well as a free local option via Tortoise-TTS.

Deep dive

  • Ingestion: pull papers by date range from Hugging Face daily papers (see the sketch after this list).
  • Summarization: generate episode scripts from PDFs using an API workflow (ChatPDF, per the repo README).
  • Speech: text-to-speech with ElevenLabs; optional Tortoise‑TTS for a free local pipeline.
  • Output: publishable audio files and episode metadata for podcast platforms.
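
A minimal sketch of the ingestion step is below, assuming Hugging Face's public daily-papers endpoint and its date query parameter; both are assumptions about the current API, not the repo's code.

```python
# Sketch only: fetch one day's Hugging Face daily papers. The endpoint URL and the
# `date` parameter are assumptions; the response is passed through untouched.
import requests

def fetch_daily_papers(date: str):
    """Return the raw daily-papers entries for one day, e.g. date='2024-06-01'."""
    resp = requests.get(
        "https://huggingface.co/api/daily_papers",
        params={"date": date},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()   # list of paper entries handed to the summarization step

papers = fetch_daily_papers("2024-06-01")
```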

Listen

Koala Reading AI cover
Timeline

Experience

Computer Vision Research Engineer @ Reality Defender

Jan 2024 – Present
  • Developing and optimizing deep learning computer vision solutions to detect deepfakes and fraudulent media (video/image/audio).
  • Built synchronization/thresholding pipelines and dataset alignment utilities for robust evaluation.
  • Contributed model compression & latency tuning for faster screening.
Deepfake Detection · Latency/Compression · Screening Pipelines

Computer Vision Research Engineer @ LuckyLab (Freelance)

Dec 2024 – Apr 2025
  • Built and deployed edge‑optimized segmentation and object‑detection solutions for production.
  • Converted research models to production graphs (CoreML/TensorRT) with quality gates.
Edge Deployment · Seg/Det · Few‑Shot · CoreML/TensorRT

Deep Learning Research Engineer @ NLPearl

Jun 2024 – Jan 2025 · Tel Aviv, Israel
  • Developed real‑time systems for conversational pause detection and response generation using fine‑tuned LLMs.
  • Explored architectures with LoRA and multi‑stage training to boost performance.
  • Built a compact language model for multi‑task outputs; worked with SOTA audio tokenizers and LLMs for audio‑focused tasks.
LLMs · Real‑time · LoRA · Audio

Computer Vision & Deep Learning Research Engineer @ Pashoot Robotics

May 2023 – Jun 2024 · Rehovot, Israel
  • Improved object detection and segmentation in zero/few‑shot settings using foundation models (SAM, YOLO‑World, Grounding DINO, CLIP).
  • Worked on multi‑object tracking and 6‑DoF pose estimation with synthetic data.
  • Used Blender and 3D reconstruction (NeRF, Gaussian Splatting, image‑to‑3D) for simulation and domain randomization.
Zero/Few‑Shot · SAM/YOLO‑World · 6‑DoF · Sim/Domain Rand.