<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Jiajun's Cyber Garden]]></title><description><![CDATA[Hi, I'm Jiajun Wang, a researcher focusing on the application of cutting-edge AI in engineering. This blog is my digital garden.]]></description><link>https://jiajun.de</link><image><url>https://cdn.hashnode.com/uploads/logos/698c94f239413f8a70063736/8a6d3d6d-c9c1-4c64-957e-55f7170d1bad.png</url><title>Jiajun&apos;s Cyber Garden</title><link>https://jiajun.de</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 09:09:16 GMT</lastBuildDate><atom:link href="https://jiajun.de/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Developing a Generative-AI-Based 3D PCCT Denoising Algorithm: A 6-Month Work Plan]]></title><description><![CDATA[Month 1: Onboarding, Environment Setup, and Data Preprocessing
Goal: Complete the company and hospital onboarding process, get familiar with the R&D environment, and finish data cleaning plus an initial validation of classic deep-learning baselines.

Weeks 1-2: Onboarding and environment familiarization

Complete onboarding and safety/compliance training at Siemens Healthineers China and Peking Union Medical College Hospital.

Get familiar with the on-site working environment in the PUMCH radiology department, and obtain access to and configure the joint lab's high-performance computing cluster (8x A100).

Align the final project expectations with the hospital physicians and the Siemens/FAU mentors, and confirm]]></description><link>https://jiajun.de/3d-pcct</link><guid isPermaLink="true">https://jiajun.de/3d-pcct</guid><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Tue, 31 Mar 2026 12:37:06 GMT</pubDate><content:encoded><![CDATA[<h4><strong>Month 1: Onboarding, Environment Setup, and Data Preprocessing</strong></h4>
<p><strong>Goal:</strong> Complete the company and hospital onboarding process, get familiar with the R&amp;D environment, and finish data cleaning plus an initial validation of classic deep-learning baselines.</p>
<ul>
<li><p><strong>Weeks 1-2: Onboarding and environment familiarization</strong></p>
<ul>
<li><p>Complete onboarding and safety/compliance training at Siemens Healthineers China and Peking Union Medical College Hospital (PUMCH).</p>
</li>
<li><p>Get familiar with the on-site working environment in the PUMCH radiology department, and obtain access to and configure the joint lab's high-performance computing cluster (8x A100).</p>
</li>
<li><p>Align the final project expectations with the hospital physicians and the Siemens/FAU mentors, and confirm the data-security protocols.</p>
</li>
</ul>
</li>
<li><p><strong>Weeks 3-4: Data processing and basic baselines</strong></p>
<ul>
<li><p>Clean, preprocess, and standardize the 400 3D PCCT volumes (200 paired cases), splitting them into training/validation/test sets (the later novel algorithm treats the data under an unpaired setting).</p>
</li>
<li><p>Reproduce classic 3D CT denoising networks from recent years (e.g., 3D U-Net, RED-CNN) to obtain baseline quantitative metrics, and get familiar with 3D data I/O and GPU memory management.</p>
</li>
</ul>
</li>
</ul>
<h4><strong>Month 2: 3D CycleGAN Baseline and Rigorous Evaluation</strong></h4>
<p><strong>Goal:</strong> Establish a fair yet highly competitive unpaired generative baseline and fix the evaluation standards.</p>
<ul>
<li><p><strong>Weeks 1-3: CycleGAN training under fairness controls</strong></p>
<ul>
<li><p>Build the 3D CycleGAN architecture. Strictly control variables: the CycleGAN generator and the later novel method's encoder/decoder use 3D U-Net backbones of the same depth.</p>
</li>
<li><p>Align hyperparameters: use the same 3D patch size and batch size, and train under the unpaired data split.</p>
</li>
</ul>
</li>
<li><p><strong>Week 4: Building a multi-dimensional evaluation system</strong></p>
<ul>
<li><p><strong>Pixel level:</strong> PSNR/SSIM (noting their known bias toward over-smoothed results).</p>
</li>
<li><p><strong>Perceptual level:</strong> LPIPS (checks whether structural detail is lost).</p>
</li>
<li><p><strong>Clinical level:</strong> HU bias statistics (measure whether the CT-value distributions of specific tissues such as liver and muscle match real normal-dose scans, to detect systematic shifts).</p>
</li>
<li><p><strong>Structural level:</strong> axial consistency score (measures structural continuity between adjacent slices).</p>
</li>
</ul>
</li>
</ul>
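<p>The axial consistency score above is not given a closed form in this plan; one plausible instantiation (an assumption for illustration, not the project's final definition) is the mean correlation between adjacent axial slices:</p>

```python
import numpy as np

def axial_consistency(volume):
    """Mean Pearson correlation between adjacent axial slices of a (D, H, W) volume."""
    scores = []
    for a, b in zip(volume[:-1], volume[1:]):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # Perfectly flat slice pairs count as fully consistent.
        scores.append(float(a @ b / denom) if denom > 0 else 1.0)
    return float(np.mean(scores))

rng = np.random.default_rng(0)
smooth = np.cumsum(rng.normal(size=(32, 16, 16)), axis=0)  # slice k+1 ~ slice k + noise
shuffled = smooth[rng.permutation(32)]                     # destroys axial continuity
print(axial_consistency(smooth), axial_consistency(shuffled))
```

<p>A denoiser that treats each slice independently will tend to lower this score relative to the ground-truth volume, which is exactly the failure mode the metric is meant to expose.</p>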
<h4><strong>Month 3: 3D Latent Space Construction and Basic Diffusion Validation</strong></h4>
<p><strong>Goal:</strong> Overcome the GPU-memory bottleneck of 3D data and build a high-quality, anatomy-aware latent space.</p>
<ul>
<li><p><strong>Weeks 1-2: 3D latent autoencoder training</strong></p>
<ul>
<li><p>Build and train the 3D encoder \(E(\cdot)\) and decoder \(D(\cdot)\).</p>
</li>
<li><p>Train with the autoencoder reconstruction loss $$\mathcal{L}_{\mathrm{AE}} = \|D(E(u))-u\|_1 + \beta\,\mathcal{L}_{\mathrm{perc}},$$ ensuring the latent space is sufficiently compact while preserving anatomical information.</p>
</li>
</ul>
</li>
<li><p><strong>Weeks 3-4: Latent distribution definition and conditional diffusion exploration</strong></p>
<ul>
<li><p>Map the low-dose (LD) and normal-dose (ND) data into the latent space to obtain the latent distributions \(\nu_{\mathrm{LD}} = E_{\#}\mu_{\mathrm{LD}}\) and \(\nu_{\mathrm{ND}} = E_{\#}\mu_{\mathrm{ND}}\) (the pushforwards of the image distributions under \(E\)).</p>
</li>
<li><p>Run a basic latent conditional diffusion model as a control, and analyze its limitation of performing "conditional generation" rather than "faithful restoration," providing the counterargument that motivates the subsequent Schrödinger bridge (SB) approach.</p>
</li>
</ul>
</li>
</ul>
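<p>The autoencoder objective above can be sketched as follows. The encoder, decoder, and perceptual term are caller-supplied placeholders, since the actual 3D networks are designed later in the project:</p>

```python
import numpy as np

def ae_loss(u, E, D, beta=0.1, perc=None):
    """L_AE = ||D(E(u)) - u||_1 (mean-reduced) + beta * L_perc.

    E, D, and the perceptual term `perc` are callables supplied by the
    caller; this sketch only fixes the shape of the objective.
    """
    recon = D(E(u))
    l1 = np.abs(recon - u).mean()
    l_perc = perc(recon, u) if perc is not None else 0.0
    return l1 + beta * l_perc

# Sanity check with identity encoder/decoder: the loss vanishes.
u = np.random.default_rng(0).normal(size=(4, 8, 8, 8))
print(ae_loss(u, E=lambda x: x, D=lambda z: z))  # 0.0
```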
<h4><strong>Month 4: Core Innovation Implementation (I) - Observation-Centered Schrödinger Bridge</strong></h4>
<p><strong>Goal:</strong> Abandon the global unconditional bridge and implement a 3D latent-space Schrödinger bridge centered on the low-dose input.</p>
<ul>
<li><p><strong>Weeks 1-2: Reference process and bridge definition</strong></p>
<ul>
<li><p>For each low-dose input volume \(y\), encode it to obtain \(z_0 = E(y)\).</p>
</li>
<li><p>Define an input-centered Ornstein–Uhlenbeck reference process centered at \(z_0\): $$dZ_t = -\lambda(t)(Z_t-z_0)\,dt + \sigma(t,y)\,dW_t$$</p>
</li>
</ul>
</li>
<li><p><strong>Weeks 3-4: Optimal bridging and main loss implementation</strong></p>
<ul>
<li><p>Among path distributions that start at \(z_0\) and have terminal marginal \(\nu_{\mathrm{ND}}\), solve for the optimal bridge \(Q^{z_0,*} = \arg\min_{Q^{z_0}} \mathrm{KL}(Q^{z_0}\,\|\,P^{z_0})\).</p>
</li>
<li><p>Implement and debug the main Schrödinger-bridge loss \(\mathcal{L}_{\mathrm{SB}}\), ensuring the bridge correctly connects the source and target distributions while local trajectories stay close to the reference process, which acts as a natural structure-preserving constraint on the dynamics.</p>
</li>
</ul>
</li>
</ul>
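<p>The input-centered reference process can be simulated with a simple Euler–Maruyama discretization. The constant \(\lambda\) and \(\sigma\) below stand in for the schedules \(\lambda(t)\) and \(\sigma(t,y)\), which the plan leaves unspecified:</p>

```python
import numpy as np

def simulate_reference_process(z0, lam=4.0, sigma=0.3, n_steps=200, T=1.0, seed=0):
    """Euler-Maruyama discretization of dZ_t = -lam (Z_t - z0) dt + sigma dW_t."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = z0.astype(float).copy()
    for _ in range(n_steps):
        z += -lam * (z - z0) * dt + sigma * np.sqrt(dt) * rng.normal(size=z.shape)
    return z

z0 = np.zeros(64)
zT = simulate_reference_process(z0)
# Mean reversion keeps trajectories near z0: the stationary std is sigma / sqrt(2 * lam).
print(np.abs(zT - z0).mean())
```

<p>This mean reversion toward \(z_0\) is what makes the bridge "observation-centered": even before any learning, sampled paths stay in a tube around the encoded low-dose input.</p>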
<h4><strong>Month 5: Core Innovation Implementation (II) - Barycentric Anchor and Directional Constraints</strong></h4>
<p><strong>Goal:</strong> Incorporate the core idea of Noise2Flow, use soft couplings to extract the average restoration direction, and complete model tuning and distillation.</p>
<ul>
<li><p><strong>Weeks 1-2: Extracting the average restoration direction from distribution geometry</strong></p>
<ul>
<li><p>Within a batch, compute the cost matrix \(c_{ij} = \|z_i^{\mathrm{LD}} - z_j^{\mathrm{ND}}\|^2\) between the LD latents \(z_i^{\mathrm{LD}}\) and the ND latents \(z_j^{\mathrm{ND}}\).</p>
</li>
<li><p>Use the Sinkhorn/OT algorithm to compute the soft coupling matrix \(\pi_{ij}\) and extract the barycentric target: $$\bar{z}_i^{\mathrm{ND}} = \frac{\sum_j \pi_{ij}\, z_j^{\mathrm{ND}}}{\sum_j \pi_{ij}}$$</p>
</li>
<li><p>Compute the distribution-induced average restoration direction \(d_i = \bar{z}_i^{\mathrm{ND}} - z_i^{\mathrm{LD}}\).</p>
</li>
</ul>
</li>
<li><p><strong>Week 3: Direction-loss fusion and joint training</strong></p>
<ul>
<li><p>Introduce the direction loss $$\mathcal{L}_{\mathrm{dir}} = 1 - \frac{\langle \bar{v}_\theta(z_i^{\mathrm{LD}}),\, d_i\rangle}{\|\bar{v}_\theta(z_i^{\mathrm{LD}})\|\,\|d_i\|}$$ to constrain the early direction of the bridge drift.</p>
</li>
<li><p>Combine the total loss $$\mathcal{L} = \mathcal{L}_{\mathrm{AE}} + \lambda_{\mathrm{SB}}\mathcal{L}_{\mathrm{SB}} + \lambda_{\mathrm{dir}}\mathcal{L}_{\mathrm{dir}} + \lambda_{\mathrm{tube}}\mathcal{L}_{\mathrm{tube}}$$ for end-to-end joint training.</p>
</li>
</ul>
</li>
<li><p><strong>Week 4: Model distillation (optional / advanced)</strong></p>
<ul>
<li>Explore compressing the complex bridge evolution learned during training into a few-step or even one-step mapping (\(\hat{z} = F_\phi(z_0)\)) to improve inference efficiency at test time and in future clinical deployment.</li>
</ul>
</li>
</ul>
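<p>The barycentric-anchor computation of Weeks 1-2 can be sketched with a plain Sinkhorn iteration on a batch of latent vectors. This is a minimal illustration; a production implementation would use a log-domain solver such as the one in the POT library:</p>

```python
import numpy as np

def sinkhorn_coupling(z_ld, z_nd, eps=5.0, n_iters=300):
    """Entropic OT coupling pi_ij between two equally weighted latent batches."""
    C = ((z_ld[:, None, :] - z_nd[None, :, :]) ** 2).sum(-1)  # cost c_ij
    K = np.exp(-C / eps)
    a = np.full(len(z_ld), 1.0 / len(z_ld))   # uniform source weights
    b = np.full(len(z_nd), 1.0 / len(z_nd))   # uniform target weights
    f, g = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        f = a / (K @ g)
        g = b / (K.T @ f)
    return f[:, None] * K * g[None, :]

def barycentric_directions(z_ld, z_nd, eps=5.0):
    pi = sinkhorn_coupling(z_ld, z_nd, eps)
    z_bar = (pi @ z_nd) / pi.sum(axis=1, keepdims=True)  # barycentric targets
    return z_bar - z_ld                                  # directions d_i

rng = np.random.default_rng(0)
z_ld = rng.normal(size=(8, 4))
z_nd = z_ld + 2.0   # toy "normal-dose" latents: a constant shift
d = barycentric_directions(z_ld, z_nd)
print(d.mean(axis=0))  # ~[2 2 2 2]: the average direction recovers the shift
```

<p>Because the coupling is soft, each \(d_i\) averages over many plausible ND matches instead of committing to a single nearest neighbor, which is what makes the direction robust under an unpaired setting.</p>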
<h4><strong>Month 6: Final Evaluation, Paper Writing, and Handover</strong></h4>
<p><strong>Goal:</strong> Wrap up the project, consolidate the research results, and complete the closing and handover procedures at both the company and the university.</p>
<ul>
<li><p><strong>Weeks 1-2: Final validation and multi-dimensional comparison</strong></p>
<ul>
<li><p>Compare the novel method (observation-centered SB + barycentric direction) comprehensively against the earlier classic DL baselines and the 3D CycleGAN.</p>
</li>
<li><p>Summarize the quantitative results (PSNR, LPIPS, HU bias, axial consistency) and gather qualitative feedback (visual quality, lesion preservation) from the PUMCH radiologists.</p>
</li>
</ul>
</li>
<li><p><strong>Week 3: Paper writing and defense preparation</strong></p>
<ul>
<li>Organize the experimental data, produce figures, write the internship/project summary report, and prepare the project paper for the FAU Pattern Recognition Lab.</li>
</ul>
</li>
<li><p><strong>Week 4: Handover and offboarding</strong></p>
<ul>
<li><p>Clean up the code repository and polish the comments and run documentation (README) so that successors at Siemens Healthineers and PUMCH can reproduce the results smoothly.</p>
</li>
<li><p>Return company and hospital assets, deactivate system access, and complete the formal Siemens Healthineers offboarding process and internship evaluation.</p>
</li>
</ul>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[DSS-SQA: Decoupling Structure and Semantics for Semantic Quality Assessment]]></title><description><![CDATA[Links: GitHub Repository | LoViF 2026 Challenge


As Generative AI (AIGC) fundamentally transforms low-level vision tasks, the way we evaluate image quality is undergoing a massive paradigm shift. In ]]></description><link>https://jiajun.de/dss-sqa</link><guid isPermaLink="true">https://jiajun.de/dss-sqa</guid><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Sat, 28 Mar 2026 18:42:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/004151f1-3602-461a-ad3a-fb135fbbecd5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Links:</strong> <a href="https://github.com/atJesse/SIQAv3">GitHub Repository</a> | <a href="https://lovif-cvpr2026-workshop.github.io/">LoViF 2026 Challenge</a></p>
<img src="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/a4438c9b-0188-4ef6-8247-8d1e0de8ad23.png" alt="" style="display:block;margin:0 auto" />

<p>As Generative AI (AIGC) fundamentally transforms low-level vision tasks, the way we evaluate image quality is undergoing a massive paradigm shift. In modern generative models, the most pressing issue is no longer just blur or noise, but <strong>AIGC semantic hallucinations</strong>.</p>
<p>To address this, our team (<strong>DSS-SQA</strong>: Jiajun Wang, Yipeng Sun, Kaiwei Lian) participated in the <strong>LoViF 2026 Challenge on Semantic Quality Assessment</strong>. We proposed a novel Full-Reference Image Quality Assessment (FR-IQA) framework that explicitly decouples structural fidelity from semantic alignment.</p>
<p>Our method achieved highly competitive results: a <strong>Final Score of 0.8469 on the official test phase</strong> and <strong>0.9121 on the validation phase</strong>, significantly outperforming traditional metrics, deep-feature metrics (LPIPS, DISTS), and even zero-shot large Vision-Language Models (GPT-5.4, Gemini 3.1 Pro).</p>
<hr />
<h2>🛑 The Challenge: "Shortcut Learning" in Semantic IQA</h2>
<p>The LoViF 2026 challenge provides a highly timely and critical benchmark. However, modeling this high-level semantic alignment presents significant obstacles.</p>
<p>The extreme scarcity of training data (only 510 pairs) makes deep neural networks highly prone to <strong>"shortcut learning"</strong>. Instead of learning to evaluate semantic alignment, models easily degenerate into evaluating absolute image sharpness. For instance, a model might erroneously predict a high score for a visually pristine but semantically completely unrelated generated image.</p>
<p>Explicit gating constraints and self-supervised structural priors were vital to overcoming this bottleneck.</p>
<hr />
<h2>🧠 Our Solution: The DSS-SQA Architecture</h2>
<p>Our proposed solution, DSS-SQA, explicitly decouples structural degradation from semantic hallucinations using a dual-vision backbone (DINOv3 and CLIP).</p>
<img src="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/b1b72e2f-aabd-46e5-9e14-56b0c84e2696.png" alt="" style="display:block;margin:0 auto" />

<h3>1. Dual-Vision Siamese Encoders</h3>
<p>Recognizing that structure and semantics are orthogonal dimensions, we employ two distinct, frozen pre-trained foundation models:</p>
<ul>
<li><p><strong>Structural Awareness (DINOv3):</strong> We utilize the DINOv3-Base vision transformer to extract structure-aware global representations (implemented with the DINOv3 CLS token by default, with patch-token fallback only when needed).</p>
</li>
<li><p><strong>Semantic Awareness (CLIP):</strong> We employ the CLIP-ViT-L/14 vision encoder to extract global, human-aligned semantic embeddings.</p>
</li>
</ul>
<h3>2. Element-wise Multiplication Fusion</h3>
<p>To fuse these features, we compute the absolute differences for both branches (\(\mathrm{Diff}_{s}\) and \(\mathrm{Diff}_{c}\)). Crucially, we introduce an <strong>Element-wise Multiplication Fusion</strong> for the CLIP features.</p>
<ul>
<li><p>By default, we apply \(L_{2}\) normalization before the Hadamard product: <code>mult_clip</code> \(= L_{2}(F_{c,\mathrm{ref}}) \odot L_{2}(F_{c,\mathrm{dist}})\).</p>
</li>
<li><p>We then replace the raw distorted semantic feature with this multiplication result to force the network to rely on fine-grained semantic interaction.</p>
</li>
<li><p>The scalar cosine similarity \(S_{cos}\) is also explicitly injected into the concatenated feature vector.</p>
</li>
</ul>
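<p>A minimal numpy sketch of the fusion described above (feature dimensions and function names are illustrative, not taken from our codebase):</p>

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_clip_features(f_ref, f_dist):
    """Hadamard fusion of L2-normalized CLIP embeddings, with the scalar
    cosine similarity concatenated onto the fused vector."""
    nr, nd = l2_normalize(f_ref), l2_normalize(f_dist)
    mult_clip = nr * nd                        # element-wise product
    s_cos = mult_clip.sum(-1, keepdims=True)   # cosine similarity
    return np.concatenate([mult_clip, s_cos], axis=-1)

f = np.random.default_rng(0).normal(size=(2, 768))
fused = fuse_clip_features(f, f)  # identical ref/dist features
print(fused.shape)  # (2, 769); the last column is the cosine similarity
```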
<h3>3. Explicit Semantic Gating Mechanism</h3>
<p>To directly combat shortcut learning, we designed an explicit semantic gate. During inference (<code>model.eval()</code>) with semantic gating enabled, if the cosine similarity \(S_{cos}\) falls below a threshold (0.4), the gate applies a hard veto by forcefully pushing the predicted logits toward the lowest quality class (Class 0). This ensures the final expected score approaches zero, effectively penalizing visually pleasing but semantically completely hallucinated pairs.</p>
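<p>The gate can be sketched as follows, assuming a 5-class quality head whose softmax expectation is the predicted score (the veto magnitude is an illustrative choice):</p>

```python
import numpy as np

def apply_semantic_gate(logits, s_cos, threshold=0.4, veto=-1e4):
    """Inference-time hard veto: if the cosine similarity is below the
    threshold, push every logit except the lowest-quality class (index 0)
    toward -inf so the expected score collapses to ~0."""
    gated = logits.copy()
    if s_cos < threshold:
        gated[1:] = veto
    return gated

def expected_score(logits, class_values=np.arange(5.0)):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(p @ class_values)

logits = np.array([0.1, 0.2, 0.5, 2.0, 3.0])  # visually pristine-looking prediction
print(expected_score(logits))                                   # high score before gating
print(expected_score(apply_semantic_gate(logits, s_cos=0.2)))   # collapses to 0
```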
<h3>4. Robust Ensemble Strategy</h3>
<p>To maximize robustness and prevent overfitting on the extremely small dataset, we utilized a <strong>5-fold stratified cross-validation strategy</strong> during training. During the final testing/inference phase, the expected quality scores from all 5 trained models are averaged (Mean Ensemble) to produce the highly stable final output.</p>
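<p>The mean ensemble itself is a one-liner; stand-in callables replace the five trained fold models here:</p>

```python
import numpy as np

def ensemble_predict(fold_models, x):
    """Average the expected quality scores of the trained fold models."""
    return float(np.mean([m(x) for m in fold_models]))

# Stand-in callables in place of the 5 cross-validation networks.
folds = [lambda x, b=b: x + b for b in (-0.02, -0.01, 0.0, 0.01, 0.02)]
print(ensemble_predict(folds, 0.8))  # ~0.8: per-fold biases cancel in the mean
```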
<hr />
<h2>📊 Experimental Results</h2>
<h3>LoViF 2026 Validation Set</h3>
<p>Our method significantly outperforms all baseline metrics, proving that decoupling structure and semantics is highly effective in AIGC evaluation.</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>SROCC \(\uparrow\)</th>
<th>PLCC \(\uparrow\)</th>
<th>Final Score \(\uparrow\)</th>
</tr>
</thead>
<tbody><tr>
<td>LPIPS (VGG)</td>
<td>0.7602</td>
<td>0.7389</td>
<td>0.7516</td>
</tr>
<tr>
<td>DISTS</td>
<td>0.8172</td>
<td>0.8055</td>
<td>0.8125</td>
</tr>
<tr>
<td>GPT-5.4</td>
<td>0.7861</td>
<td>0.7957</td>
<td>0.7900</td>
</tr>
<tr>
<td>Gemini 3.1 Pro</td>
<td>0.8068</td>
<td>0.8174</td>
<td>0.8110</td>
</tr>
<tr>
<td><strong>DSS-SQA (Ours)</strong></td>
<td><strong>0.9062</strong></td>
<td><strong>0.9209</strong></td>
<td><strong>0.9121</strong></td>
</tr>
</tbody></table>
<p><em>The Final Score is calculated as \(0.6 \times SROCC + 0.4 \times PLCC\). On the official blind test phase, our model achieved a Final Score of 0.8469.</em></p>
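<p>Plugging the validation-phase numbers from the table into the score formula reproduces the reported result:</p>

```python
srocc, plcc = 0.9062, 0.9209  # DSS-SQA validation-phase results from the table
final_score = 0.6 * srocc + 0.4 * plcc
print(round(final_score, 4))  # 0.9121, matching the reported Final Score
```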
<h3>Zero-Shot Generalization on BAPPS</h3>
<p>To demonstrate that our model learns universally applicable quality representations rather than overfitting the small LoViF dataset, we conducted zero-shot evaluations on the BAPPS perceptual dataset.</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Trad</th>
<th>CNN</th>
<th>Superres</th>
<th>Overall</th>
</tr>
</thead>
<tbody><tr>
<td>Human (Ceiling)</td>
<td>80.8%</td>
<td>84.4%</td>
<td>73.4%</td>
<td>73.9%</td>
</tr>
<tr>
<td>LPIPS (VGG)</td>
<td>71.4%</td>
<td><strong>81.4%</strong></td>
<td><strong>69.0%</strong></td>
<td><strong>66.8%</strong></td>
</tr>
<tr>
<td><strong>DSS-SQA (Ours)</strong></td>
<td><strong>73.9%</strong></td>
<td>79.6%</td>
<td>65.5%</td>
<td>65.0%</td>
</tr>
</tbody></table>
<p>Despite being optimized for high-level semantic quality, DSS-SQA remains highly competitive with metrics explicitly trained on BAPPS (like LPIPS), proving that our dense DINOv3 representations robustly capture low-level generative artifacts.</p>
<hr />
<h2>💻 Code &amp; Resources</h2>
<p>We have open-sourced the complete training and inference pipeline.</p>
<ul>
<li><strong>GitHub Repository:</strong> <a href="https://github.com/atJesse/SIQAv3">atJesse/SIQAv3</a></li>
</ul>
<p>If you find our work helpful for your research, feel free to drop a star on GitHub! 🌟</p>
]]></content:encoded></item><item><title><![CDATA[BVM2026 in Lübeck]]></title><description><![CDATA[Just wrapped up an incredibly inspiring two days at the BVM 2026 (German Conference on Medical Image Computing)!
I had the great honor of presenting our recent work, "Explainable Radiologist-Aligned V]]></description><link>https://jiajun.de/bvm2026</link><guid isPermaLink="true">https://jiajun.de/bvm2026</guid><category><![CDATA[conference]]></category><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Wed, 18 Mar 2026 13:37:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/f3e7bb41-c821-4d7f-92c6-90cf4c2d5171.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Just wrapped up an incredibly inspiring two days at the BVM 2026 (German Conference on Medical Image Computing)!</p>
<p>I had the great honor of presenting our recent work, "Explainable Radiologist-Aligned VLM for CT Image Quality Assessment", during the poster session. It was a fantastic experience to share how we used QLoRA to fine-tune MedGemma-4B-IT, enabling it to provide radiologist-level interpretable feedback for CT images.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/b307b084-c1a8-4092-b284-e9ced35239a8.jpg" alt="" style="display:block;margin:0 auto" />

<p>It was a privilege to connect with fellow researchers and domain experts in the medical AI field. We had some truly thought-provoking discussions about the limitations of current black-box models and exciting future directions for interpretable AI and Neuro-symbolic methods.</p>
<p>I was absolutely amazed by the exceptional quality of the other posters and presentations. Having the chance to interact face-to-face with the authors behind these outstanding works was invaluable and sparked many new ideas for my own upcoming Master's thesis!</p>
<p>A huge thank you to my amazing co-authors and supervisor for their continuous support: <a href="https://yipengsun.com/">Yipeng Sun</a>, <a href="https://lme.tf.fau.de/person/sbayer/">Siming Bayer</a> and <a href="https://lme.tf.fau.de/person/maier/">Andreas Maier</a>. Looking forward to applying these fresh insights to my next research steps!</p>
<img src="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/9222423d-91c9-429d-8bb9-9594e03b8c95.jpg" alt="" style="display:block;margin:0 auto" />

<p>#BVM2026 #MedicalImaging #ArtificialIntelligence #VLM #GenerativeAI #FAU #DeepLearning #Research</p>
]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Qwen2.5-VL on Your Own Images using LLaMA-Factory]]></title><description><![CDATA[The world of Large Language Models (LLMs) is evolving rapidly into Vision-Language Models (VLMs). Models that can see and understand images—like Qwen2.5-VL—are game changers for tasks like OCR, medica]]></description><link>https://jiajun.de/sftllm</link><guid isPermaLink="true">https://jiajun.de/sftllm</guid><category><![CDATA[llm]]></category><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Fri, 13 Feb 2026 12:04:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/gEpncIlZq7c/upload/58777f1aca8c76f8d83249d3f04e5dff.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The world of Large Language Models (LLMs) is evolving rapidly into <strong>Vision-Language Models (VLMs)</strong>. Models that can <em>see</em> and <em>understand</em> images—like <strong>Qwen2.5-VL</strong>—are game changers for tasks like OCR, medical imaging analysis, and visual agents.</p>
<p>However, fine-tuning these multimodal models has historically been a complex engineering nightmare.</p>
<p>Enter <strong>LLaMA-Factory</strong>.</p>
<p>This unified framework makes fine-tuning state-of-the-art models accessible to everyone. In this tutorial, I will guide you step-by-step through fine-tuning <strong>Qwen2.5-VL-7B-Instruct</strong> on a custom image dataset. Whether you are a researcher or a hobbyist, this guide will take you from an empty folder to a working custom VLM.</p>
<h2>Prerequisites</h2>
<p>Before we begin, ensure you have:</p>
<ul>
<li><strong>Hardware:</strong> An NVIDIA GPU (24GB VRAM recommended for 7B models using LoRA; A100/H100 is ideal for faster training).</li>
<li><strong>OS:</strong> Linux (Ubuntu/CentOS) or Windows via WSL2.</li>
<li><strong>Python:</strong> Version 3.10 or higher.</li>
</ul>
<hr />
<h2>Step 1: Environment Setup</h2>
<p>We need a clean environment with the specific dependencies for Qwen's visual processing capabilities.</p>
<ol>
<li><strong>Create a Conda environment:</strong></li>
</ol>
<pre><code class="language-bash">conda create -n qwen_vl_ft python=3.10
conda activate qwen_vl_ft
</code></pre>
<ol start="2">
<li><strong>Clone LLaMA-Factory:</strong></li>
</ol>
<pre><code class="language-bash">git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
</code></pre>
<ol start="3">
<li><strong>Install dependencies:</strong>
This step is crucial. Qwen2.5-VL requires <code>qwen-vl-utils</code> to handle image inputs.</li>
</ol>
<pre><code class="language-bash">pip install -e .[metrics]
pip install qwen-vl-utils
</code></pre>
<p><em>(Optional but Recommended: Install Flash Attention 2 for faster training if you have an Ampere/Ada GPU like A100/RTX3090/4090)</em>:</p>
<pre><code class="language-bash">pip install flash-attn --no-build-isolation
</code></pre>
<hr />
<h2>Step 2: Prepare Your Multimodal Dataset</h2>
<p>Data preparation for VLMs is slightly different from text-only models. You need to link your text instructions to specific image files.</p>
<h3>1. Organize your images</h3>
<p>Create a folder named <code>data/my_images</code> inside the LLaMA-Factory directory and put all your training images there (e.g., <code>.jpg</code> or <code>.png</code> files).</p>
<h3>2. Create the JSON file</h3>
<p>Create a file named <code>data/my_vl_data.json</code>. The format should include an <code>images</code> list containing the path to the image.</p>
<p><strong>Example Format:</strong></p>
<pre><code class="language-json">[
  {
    "instruction": "Analyze this image and describe the defects found.",
    "input": "",
    "output": "The image shows a crack in the metal surface located at the top left corner.",
    "images": [
      "data/my_images/defect_001.png"
    ]
  },
  {
    "instruction": "What is the text written on the sign?",
    "input": "",
    "output": "The sign says 'Do Not Enter'.",
    "images": [
      "data/my_images/sign_045.jpg"
    ]
  }
]
</code></pre>
<p><em>Note: Ensure the image paths are relative to the LLaMA-Factory root directory or absolute paths.</em></p>
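<p>Since broken image paths are the most common failure at this step, a quick sanity check before training can save a wasted run. The helper below is my own illustration, not part of LLaMA-Factory:</p>

```python
import json
import os

def find_missing_images(records, root="."):
    """Return (record_index, image_path) pairs whose files cannot be found.

    Illustrative helper: LLaMA-Factory itself will only fail later, during
    preprocessing, if a path cannot be resolved.
    """
    missing = []
    for i, rec in enumerate(records):
        for img in rec.get("images", []):
            path = img if os.path.isabs(img) else os.path.join(root, img)
            if not os.path.isfile(path):
                missing.append((i, img))
    return missing

records = json.loads("""
[
  {"instruction": "Analyze this image and describe the defects found.",
   "input": "",
   "output": "The image shows a crack in the metal surface.",
   "images": ["data/my_images/defect_001.png"]}
]
""")
print(find_missing_images(records))  # lists the pair if the file is absent
```

<p>Run it from the LLaMA-Factory root (the same directory your relative paths are resolved against) and fix anything it reports before launching training.</p>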
<h3>3. Register the dataset</h3>
<p>Open <code>data/dataset_info.json</code> and add your new dataset definition:</p>
<pre><code class="language-json">"my_vl_dataset": {
  "file_name": "my_vl_data.json",
  "formatting": "alpaca",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "images": "images"
  }
}
</code></pre>
<hr />
<h2>Step 3: Download the Base Model</h2>
<p>For stability, download the model weights manually before training.</p>
<pre><code class="language-bash">pip install huggingface_hub
# Download Qwen2.5-VL-7B-Instruct
huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct --local-dir models/Qwen2.5-VL-7B
</code></pre>
<hr />
<h2>Step 4: Configure and Run Training (LoRA)</h2>
<p>We will use <strong>LoRA (Low-Rank Adaptation)</strong>. This is efficient and perfect for VLMs. We need to create a YAML configuration file.</p>
<p>Create <code>train_qwen25_vl.yaml</code> in the root folder:</p>
<pre><code class="language-yaml">### Model Configuration
model_name_or_path: models/Qwen2.5-VL-7B
template: qwen2_vl                     # CRITICAL: Must use 'qwen2_vl' for correct tokenization
trust_remote_code: true

### Method Configuration
stage: sft                             # Supervised Fine-Tuning
do_train: true
finetuning_type: lora
lora_target: all                       # Qwen-VL benefits from training all linear layers
lora_rank: 16
lora_alpha: 16

### Dataset Configuration
dataset: my_vl_dataset                 # Your custom dataset name
cutoff_len: 2048                       # VLMs need longer context for image tokens
overwrite_cache: true
preprocessing_num_workers: 16

### Training Configuration
output_dir: saves/qwen2.5-vl/lora/sft  # Save path
logging_steps: 10
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### Hyperparameters
per_device_train_batch_size: 4         # Adjust based on VRAM (Try 2 if OOM)
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true                             # Use pure bf16 for A100/3090
flash_attn: fa2                        # Use Flash Attention 2
</code></pre>
<p><strong>Start the Training:</strong>
Run the following command:</p>
<pre><code class="language-bash">llamafactory-cli train train_qwen25_vl.yaml
</code></pre>
<p>LLaMA-Factory will now handle the complex task of encoding your images into visual tokens and training the LoRA adapter to understand them.</p>
<hr />
<h2>Step 5: Inference (Testing Your Model)</h2>
<p>Once training finishes, let's see if the model learned your task. You can use the CLI or the WebUI.</p>
<p><strong>Using the WebUI (Easiest Method):</strong></p>
<pre><code class="language-bash">llamafactory-cli webui
</code></pre>
<ol>
<li>Go to the <strong>Chat</strong> tab.</li>
<li>Select the <strong>Checkpoint</strong>: <code>saves/qwen2.5-vl/lora/sft</code>.</li>
<li>Upload an image in the chat box.</li>
<li>Type your instruction and see the magic!</li>
</ol>
<p><strong>Using CLI:</strong></p>
<pre><code class="language-bash">llamafactory-cli chat \
  --model_name_or_path models/Qwen2.5-VL-7B \
  --adapter_name_or_path saves/qwen2.5-vl/lora/sft \
  --template qwen2_vl \
  --finetuning_type lora
</code></pre>
<hr />
<h2>Step 6: Merge and Export (Optional)</h2>
<p>If you want to deploy your model (e.g., using vLLM or Ollama), you need to merge the LoRA weights into the base model.</p>
<p>Create <code>merge_vl.yaml</code>:</p>
<pre><code class="language-yaml">model_name_or_path: models/Qwen2.5-VL-7B
adapter_name_or_path: saves/qwen2.5-vl/lora/sft
template: qwen2_vl
finetuning_type: lora
export_dir: models/Qwen2.5-VL-FinelyTuned
export_size: 5
export_device: cpu   # Use CPU for merging to save VRAM
</code></pre>
<p>Run the export:</p>
<pre><code class="language-bash">llamafactory-cli export merge_vl.yaml
</code></pre>
<hr />
<h2>Conclusion</h2>
<p>Fine-tuning multimodal models used to require specialized knowledge of visual encoders and projector layers. <strong>LLaMA-Factory</strong> abstracts this away, allowing you to treat images just like another data input.</p>
<p>By following this guide, you have successfully fine-tuned <strong>Qwen2.5-VL</strong>, one of the most powerful open-source VLMs available, on your own custom data.</p>
<p><strong>Key Takeaways:</strong></p>
<ol>
<li><strong>Dependencies matter:</strong> Don't forget <code>qwen-vl-utils</code>.</li>
<li><strong>Data format:</strong> Ensure your JSON correctly points to your image paths.</li>
<li><strong>Template:</strong> Always use <code>template: qwen2_vl</code> for this specific model family.</li>
</ol>
<p>Happy Fine-Tuning!</p>
<hr />
<p><em>If you found this tutorial helpful, please share it with your community!</em></p>
]]></content:encoded></item><item><title><![CDATA[The Ultimate Beginner’s Guide to FAU HPC: From Zero to A100]]></title><description><![CDATA[So, you’ve received an invitation to use the High-Performance Computing (HPC) cluster at FAU (likely a Tier3 project). You want to run Deep Learning, VLM, or RL experiments, but you are staring at a black terminal screen and don't know where to start...]]></description><link>https://jiajun.de/fauhpc</link><guid isPermaLink="true">https://jiajun.de/fauhpc</guid><category><![CDATA[hpc]]></category><category><![CDATA[Experience ]]></category><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Thu, 12 Feb 2026 19:46:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/s0XabTAKvak/upload/d79a0234feba7f9842c211661c45ebb1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So, you’ve received an invitation to use the High-Performance Computing (HPC) cluster at FAU (likely a Tier3 project). You want to run Deep Learning, VLM, or RL experiments, but you are staring at a black terminal screen and don't know where to start.</p>
<p>Don't worry. I went through the exact same struggle—configuring SSH keys, getting "Permission Denied" errors, and wondering why PyTorch couldn't see the GPU.</p>
<p>This guide will take you from <strong>receiving the email</strong> to <strong>training on an NVIDIA A100</strong>, step-by-step.</p>
<hr />
<h2 id="heading-step-1-accept-the-invitation">Step 1: Accept the Invitation</h2>
<ol>
<li><p>Log in to the <a target="_blank" href="https://portal.hpc.fau.de/"><strong>FAU HPC Portal</strong></a> using your standard IdM credentials (e.g., <code>abc1234d</code>).</p>
</li>
<li><p>Go to <strong>User</strong> -&gt; <strong>Your Invitations</strong> and accept the project invitation.</p>
</li>
<li><p><strong>Wait overnight.</strong> The system runs a synchronization script every night. You usually cannot log in immediately after accepting; your home directory needs time to be created.</p>
</li>
</ol>
<hr />
<h2 id="heading-step-2-generate-and-upload-your-ssh-key">Step 2: Generate and Upload Your SSH Key</h2>
<p>HPC systems don't use passwords; they use "keys." You need to generate a lock (Public Key) and a key (Private Key).</p>
<h3 id="heading-for-windows-powershell-mac-linux">For Windows (PowerShell) / Mac / Linux:</h3>
<p>Open your terminal and run:</p>
<pre><code class="lang-bash">ssh-keygen -t rsa -b 4096
</code></pre>
<ol>
<li><p>Press <strong>Enter</strong> to save it in the default location.</p>
</li>
<li><p>Press <strong>Enter</strong> twice to skip setting a password (useful for automation).</p>
</li>
<li><p>Display your public key:</p>
<ul>
<li><p><strong>Windows:</strong> <code>type %userprofile%\.ssh\id_rsa.pub</code></p>
</li>
<li><p><strong>Mac/Linux:</strong> <code>cat ~/.ssh/id_rsa.pub</code></p>
</li>
</ul>
</li>
<li><p><strong>Copy everything</strong> (starting with <code>ssh-rsa</code> and ending with your username).</p>
</li>
</ol>
<h3 id="heading-upload-to-portal">Upload to Portal:</h3>
<ol>
<li><p>Go back to the <a target="_blank" href="https://portal.hpc.fau.de/">HPC Portal</a>.</p>
</li>
<li><p>Go to <strong>User</strong> -&gt; <strong>Your Accounts</strong>.</p>
</li>
<li><p>Find your HPC account (e.g., <code>abc1234d</code>), click on it, and paste your key into the <strong>"Add new SSH Key"</strong> section.</p>
</li>
<li><p><strong>Wait 15-20 minutes</strong> for the key to sync to the servers.</p>
</li>
</ol>
<hr />
<h2 id="heading-step-3-configure-vs-code-the-pro-setup">Step 3: Configure VS Code (The "Pro" Setup)</h2>
<p><strong>Do not</strong> try to use the raw terminal for everything. Use <strong>VS Code</strong> with the <strong>Remote - SSH</strong> extension. It allows you to edit code on the server as if it were on your laptop.</p>
<h3 id="heading-the-proxyjump-trick">The "ProxyJump" Trick</h3>
<p>Direct access to GPU nodes (like <code>tinyx</code>) is often blocked from outside the university network. We need to jump through a "gatekeeper" server called <code>csnhr</code>.</p>
<ol>
<li><p>In VS Code, install the <strong>Remote - SSH</strong> extension.</p>
</li>
<li><p>Click the blue <code>&gt;&lt;</code> icon (bottom left) -&gt; <strong>Open Configuration File</strong> -&gt; Select your <code>.ssh/config</code>.</p>
</li>
<li><p>Paste the following configuration (Replace <code>abc1234d</code> with <strong>YOUR</strong> HPC username):</p>
</li>
</ol>
<pre><code class="lang-text"># 1. The Gatekeeper (Jump Host)
Host csnhr
    HostName csnhr.nhr.fau.de
    User abc1234d
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes
    PasswordAuthentication no

# 2. Woody (CPU Frontend - Good for data transfer)
Host woody
    HostName woody.nhr.fau.de
    User abc1234d
    ProxyJump csnhr
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes

# 3. TinyX (Tier3 GPU Frontend - RUN YOUR EXPERIMENTS HERE)
Host tinyx
    HostName tinyx.nhr.fau.de
    User abc1234d
    ProxyJump csnhr
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes
</code></pre>
<ol start="4">
<li><p>Save the file.</p>
</li>
<li><p>Click the blue <code>&gt;&lt;</code> icon -&gt; <strong>Connect to Host</strong> -&gt; Select <strong>tinyx</strong>.</p>
</li>
</ol>
<hr />
<h2 id="heading-step-4-know-your-territory-home-vs-work">Step 4: Know Your Territory ($HOME vs $WORK)</h2>
<p>Once logged in, you need to know where to put your files. This is the most common mistake beginners make.</p>
<ul>
<li><p><strong>$HOME (</strong><code>/home/hpc/...</code>):</p>
<ul>
<li><p><strong>Size:</strong> Very small (100GB).</p>
</li>
<li><p><strong>Use for:</strong> Config files, scripts, source code.</p>
</li>
<li><p><strong>NEVER put:</strong> Datasets, Conda environments, or Model checkpoints here. You will run out of space immediately.</p>
</li>
</ul>
</li>
<li><p><strong>$WORK (</strong><code>/home/woody/...</code>):</p>
<ul>
<li><p><strong>Size:</strong> Huge (1TB+?).</p>
</li>
<li><p><strong>Use for:</strong> <strong>EVERYTHING BIG.</strong> Install Miniforge here. Download datasets here.</p>
</li>
<li><p><strong>How to find it:</strong> Run <code>echo $WORK</code> in the terminal.</p>
</li>
</ul>
</li>
</ul>
<p><strong>Always switch to WORK before doing anything:</strong></p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> <span class="hljs-variable">$WORK</span>
</code></pre>
<hr />
<h2 id="heading-step-5-setting-up-the-environment-the-right-way">Step 5: Setting Up the Environment (The Right Way)</h2>
<p>Do not use the default Python. Do not use Anaconda (it's too bloated). Use <strong>Miniforge</strong>.</p>
<ol>
<li><p><strong>Download and Install (in the terminal on</strong> <code>tinyx</code>):</p>
<pre><code class="lang-bash"> <span class="hljs-built_in">cd</span> <span class="hljs-variable">$WORK</span>
 wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
 bash Miniforge3-Linux-x86_64.sh
</code></pre>
<ul>
<li><p><strong>Crucial:</strong> When asked for the installation path, ensure it says <code>/home/woody/...</code>. If it says <code>/home/hpc/...</code>, edit it manually!</p>
</li>
<li><p>Type <code>yes</code> to initialize.</p>
</li>
</ul>
</li>
<li><p><strong>Restart your terminal</strong> (close and reopen the terminal pane in VS Code).</p>
</li>
<li><p><strong>Create your Environment:</strong> (e.g., vlm_env)</p>
<pre><code class="lang-bash"> mamba create -n vlm_env python=3.10
 mamba activate vlm_env
</code></pre>
</li>
</ol>
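<p>Before installing anything into the new environment, double-check that it really landed on <code>$WORK</code> (a quick sanity check; <code>/home/woody</code> is the <code>$WORK</code> prefix used above):</p>
<pre><code class="lang-bash">echo "$CONDA_PREFIX"
case "$CONDA_PREFIX" in
    /home/woody/*) echo "OK: environment lives on WORK" ;;
    *)             echo "WARNING: environment is NOT under /home/woody -- reinstall!" ;;
esac
</code></pre>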
<hr />
<h2 id="heading-step-6-installing-pytorch-the-trap">Step 6: Installing PyTorch (The Trap!)</h2>
<p>Here is where many fail.</p>
<ol>
<li><p><strong>Compute nodes (GPUs) have NO INTERNET.</strong> You must install packages on the <strong>Login Node (</strong><code>tinyx</code>).</p>
</li>
<li><p><strong>Mamba sometimes defaults to CPU versions.</strong> Use <code>pip</code> to force the CUDA version.</p>
</li>
</ol>
<p><strong>The Golden Command (run this on</strong> <code>tinyx</code>):</p>
<pre><code class="lang-bash">mamba activate vlm_env
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate bitsandbytes
</code></pre>
<p><em>Note: Packages are installed and verified on the Login Node (which has internet access); the actual training then runs on the Compute Nodes, which do not.</em></p>

<hr />
<h2 id="heading-step-7-accessing-gpus">Step 7: Accessing GPUs</h2>
<p>You are currently on <code>tinyx</code> (a shared login node). <strong>DO NOT</strong> run training here. You must request a <strong>Compute Node</strong>.</p>
<h3 id="heading-option-a-interactive-mode-debuggingtesting">Option A: Interactive Mode (Debugging/Testing)</h3>
<p>Use <code>salloc</code> to get a GPU for a short time (e.g., 30 mins).</p>
<p><strong>To check what GPUs are available:</strong></p>
<pre><code class="lang-bash">sinfo -o <span class="hljs-string">"%20P %G"</span>
</code></pre>
<p><em>(You might see partitions like</em> <code>a100</code>, <code>v100</code>, <code>rtx3080</code>).</p>
<p><strong>To request an A100 (The Beast):</strong></p>
<pre><code class="lang-bash">salloc --partition=a100 --gres=gpu:a100:1 --time=00:30:00
</code></pre>
<p><strong>To request a V100 (Reliable):</strong></p>
<pre><code class="lang-bash">salloc --partition=v100 --gres=gpu:v100:1 --time=00:30:00
</code></pre>
<p>Once inside (prompt changes to <code>tgXXX</code>), <strong>reactivate your environment</strong> and test:</p>
<pre><code class="lang-bash">mamba activate vlm_env
python -c <span class="hljs-string">"import torch; print(f'CUDA: {torch.cuda.is_available()}'); print(f'Device: {torch.cuda.get_device_name(0)}')"</span>
</code></pre>
<p>If it says <code>True</code> and <code>NVIDIA A100</code>, you win!</p>
<h3 id="heading-option-b-batch-jobs-real-training">Option B: Batch Jobs (Real Training)</h3>
<p>For long training runs (e.g., 24 hours), create a script called <code>run.sh</code>:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment">#SBATCH --job-name=vlm_train</span>
<span class="hljs-comment">#SBATCH --output=logs/%j.out</span>
<span class="hljs-comment">#SBATCH --partition=a100       # or v100</span>
<span class="hljs-comment">#SBATCH --gres=gpu:a100:1      # or gpu:v100:1</span>
<span class="hljs-comment">#SBATCH --time=24:00:00</span>

<span class="hljs-built_in">source</span> <span class="hljs-variable">$WORK</span>/miniforge3/bin/activate vlm_env
python train.py
</code></pre>
<p>Submit it with: <code>sbatch run.sh</code></p>
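<p>After submitting, Slurm only prints a job ID, so you will want to keep an eye on the job. These are standard Slurm commands (replace <code>&lt;jobid&gt;</code> with the ID that <code>sbatch</code> printed):</p>
<pre><code class="lang-bash">squeue --me                 # your jobs: PD = pending, R = running
tail -f logs/&lt;jobid&gt;.out    # follow the training log live
scancel &lt;jobid&gt;             # abort the job if something went wrong
</code></pre>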
<hr />
<h2 id="heading-summary-cheat-sheet">Summary Cheat Sheet</h2>
<ol>
<li><p><strong>Connect:</strong> VS Code -&gt; <code>tinyx</code>.</p>
</li>
<li><p><strong>Workspace:</strong> <code>cd $WORK</code>.</p>
</li>
<li><p><strong>Install:</strong> Run installs on <code>tinyx</code> (Login Node).</p>
</li>
<li><p><strong>Debug:</strong> <code>salloc ...</code> to get an interactive GPU.</p>
</li>
<li><p><strong>Train:</strong> <code>sbatch run.sh</code> for long jobs.</p>
</li>
</ol>
<p>Good luck with your experiments! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Explainable Radiologist-Aligned VLM for CT Image Quality Assessment]]></title><description><![CDATA[Authors: Jiajun Wang, Yipeng Sun, Siming Bayer, Andreas MaierAffiliation: Pattern Recognition Lab, Friedrich-Alexander Universität Erlangen-Nürnberg, GermanyLinks: GitHub Repository

📖 Abstract
The a]]></description><link>https://jiajun.de/ctiqa</link><guid isPermaLink="true">https://jiajun.de/ctiqa</guid><category><![CDATA[paper]]></category><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Fri, 19 Dec 2025 16:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770902202479/c3efe579-ee23-441a-9b8a-638e583657a7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Authors:</strong> <strong>Jiajun Wang</strong>, Yipeng Sun, Siming Bayer, Andreas Maier<br /><strong>Affiliation:</strong> Pattern Recognition Lab, Friedrich-Alexander Universität Erlangen-Nürnberg, Germany<br /><strong>Links:</strong> <a href="https://github.com/atJesse/VLM-CT-IQA">GitHub Repository</a></p>
<hr />
<h2>📖 Abstract</h2>
<p>The assessment of computed tomography (CT) image quality has traditionally relied on manual evaluation by radiologists—a method that is both subjective and time-consuming. While Deep Learning methods exist, they often only give quantitative scores and lack explainability. To address this, we propose a <strong>parameter-efficient supervised fine-tuning (SFT) framework</strong> for the medical VLM, <strong>MedGemma-4B-IT</strong>. By employing <strong>Quantized Low-Rank Adaptation (QLoRA)</strong>, we aligned the model's visual perception with expert quantitative judgment.</p>
<p>Our results demonstrate a substantial improvement in correlation with expert scores (<strong>SRCC=0.7950</strong>, <strong>PLCC=0.7907</strong>), significantly outperforming zero-shot baselines like Gemini 2.5 Pro and Gemini 2.5 Flash. Most importantly, our model generates <strong>professional textual explanations</strong> that emulate the reasoning and explanation style of radiologists.</p>
<hr />
<h2>💡 Motivation: Why Explainable AI for CT?</h2>
<p>Diagnostic utility in CT scans depends critically on image quality. Degradation due to noise, artifacts, or insufficient contrast can lead to misdiagnoses and repeated examinations.</p>
<ul>
<li><p><strong>The Problem with Manual Scoring:</strong> It is labor-intensive, time-consuming, and prone to inter-observer variability.</p>
</li>
<li><p><strong>The Problem with Previous AI:</strong> Conventional Deep Learning methods (NR-IQA) provide a score but remain opaque, making it difficult to interpret why specific scores are assigned.</p>
</li>
<li><p><strong>The Problem with Cloud VLMs:</strong> General-purpose, closed-source VLMs are often constrained by patient privacy regulations.</p>
</li>
</ul>
<p>Our goal was to create a <strong>locally deployable, privacy-preserving, and explainable</strong> solution.</p>
<hr />
<h2>🛠️ Methodology: The Framework</h2>
<p>We formulated CT-IQA (Image Quality Assessment) as a multimodal reasoning task.</p>
<h3>1. The Model Architecture</h3>
<p>We selected <strong>MedGemma-4B-IT</strong> as our base model due to its strong medical priors. The architecture consists of:</p>
<ul>
<li><p><strong>Vision Encoder:</strong> SigLIP, to capture local anatomical and noise patterns.</p>
</li>
<li><p><strong>Multimodal Projector:</strong> Aligns visual representations with the language space.</p>
</li>
<li><p><strong>Language Model:</strong> Gemma, generating both the textual reasoning and the final quality scores.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698c94f239413f8a70063736/1bb83288-e999-4180-b389-27af9790c989.png" alt="" style="display:block;margin:0 auto" /></li>
</ul>
<h3>2. Parameter-Efficient Fine-Tuning (QLoRA)</h3>
<p>To make training efficient, we used <strong>QLoRA</strong>.</p>
<ul>
<li><p><strong>4-bit Quantization:</strong> The backbone weights are quantized to 4-bit precision and frozen to reduce memory consumption.</p>
</li>
<li><p><strong>Trainable Adapters:</strong> Only the low-rank adapters (~1% of parameters) are trainable.</p>
</li>
</ul>
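<p>For orientation, a typical QLoRA configuration with Hugging Face <code>transformers</code> and <code>peft</code> looks roughly like the sketch below. The rank, alpha, and target modules are illustrative assumptions, not the values used in the paper:</p>
<pre><code class="language-python">import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen backbone
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Small trainable low-rank adapters (roughly 1% of the parameters)
lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative)
    lora_alpha=32,             # scaling factor (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
</code></pre>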
<h3>3. Data Construction with "Teacher" Distillation</h3>
<p>Since large labeled datasets with explanations are scarce, we created a novel pipeline using the <strong>LDCTIQA dataset</strong> (1,000 CT slices):</p>
<ul>
<li><p><strong>Teacher Model:</strong> We employed <strong>Gemini 2.5 Pro</strong> (the best zero-shot performer) to generate expert-level textual explanations for the training data.</p>
</li>
<li><p><strong>Fine-tuning:</strong> We trained our model to mimic these high-quality explanations, pairing them with radiologist scores.</p>
</li>
</ul>
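<p>Concretely, each SFT sample pairs a CT slice with a teacher explanation anchored to the radiologist score. A minimal sketch of one training record (the field names and file path are hypothetical, not the repository's actual schema):</p>
<pre><code class="language-python">import json

record = {
    "image": "ldctiqa/slice_0042.png",   # hypothetical file path
    "prompt": "Assess the quality of this CT slice and give a score.",
    "response": (
        "Score: 1.0. Severe streak artifacts and high noise obscure "
        "soft-tissue detail."
    ),  # teacher (Gemini 2.5 Pro) explanation paired with the expert score
}
print(json.dumps(record, ensure_ascii=False))
</code></pre>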
<hr />
<h2>📊 Results</h2>
<h3>Quantitative Performance</h3>
<p>Our fine-tuned model achieved state-of-the-art results compared to zero-shot baselines on the test set.</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>SRCC (Correlation)</th>
<th>PLCC (Linearity)</th>
<th>MAE (Error)</th>
</tr>
</thead>
<tbody><tr>
<td><strong>MedGemma-4B-IT (Fine-tuned)</strong></td>
<td><strong>0.7950</strong></td>
<td><strong>0.7907</strong></td>
<td><strong>0.5780</strong></td>
</tr>
<tr>
<td>Gemini 2.5 Pro (Zero-shot)</td>
<td>0.7328</td>
<td>0.7204</td>
<td>0.6540</td>
</tr>
<tr>
<td>Gemini 2.5 Flash (Zero-shot)</td>
<td>0.7170</td>
<td>0.6946</td>
<td>0.7360</td>
</tr>
<tr>
<td>MedGemma-4B-IT (Zero-shot)</td>
<td>-0.2438</td>
<td>-0.2029</td>
<td>1.4790</td>
</tr>
</tbody></table>
<p><em>Data Source: Table 1 in Paper</em></p>
<p>Fine-tuning improved the SRCC by more than <strong>1.0</strong> over the model's own zero-shot weights (from −0.2438 to 0.7950).</p>
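<p>For reference, the three metrics in the table are straightforward to compute. A minimal pure-Python sketch (in practice one would use <code>scipy.stats.spearmanr</code> and <code>pearsonr</code>, which also handle tied ranks):</p>
<pre><code class="language-python">def pearson(a, b):
    """PLCC: Pearson linear correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def ranks(xs):
    """Rank of each value, 1-based (ties ignored for brevity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        out[i] = float(r)
    return out

def srcc(a, b):
    """SRCC: Spearman rank correlation = Pearson on the ranks."""
    return pearson(ranks(a), ranks(b))

def mae(pred, gt):
    """Mean absolute error between predicted and expert scores."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
</code></pre>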
<h3>Qualitative Analysis (Case Study)</h3>
<p>In testing, when presented with a low-quality image (Ground Truth Score: 1.2), the zero-shot baseline incorrectly rated it as high quality (3.2).</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770903965417/0e79b429-3500-4b2f-bb02-970a3d01745f.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Our Fine-Tuned Model:</strong></p>
<ul>
<li><p><strong>Predicted Score:</strong> 1.0 (Very close to GT 1.2).</p>
</li>
<li><p><strong>Generated Reasoning:</strong> Correctly identified "Severe Artifacts," "Streak Artifacts," and "High Noise". It explicitly noted that streaks were radiating from the pelvic girdle, obscuring tissue texture.</p>
</li>
</ul>
<hr />
<h2>🚀 Conclusion</h2>
<p>We demonstrated that a specialized medical VLM can be fine-tuned to emulate the reasoning style of radiologists. This provides a <strong>locally deployable, privacy-preserving, and explainable</strong> tool for automated CT image quality assessment.</p>
<h3>🔗 Resources</h3>
<ul>
<li><strong>Code:</strong> <a href="https://github.com/atJesse/VLM-CT-IQA">GitHub - VLM-CT-IQA</a></li>
</ul>
<hr />
<h3>📝 Citation</h3>
<p>If you find this work helpful, please consider citing our paper:</p>
<pre><code class="language-plaintext">@inproceedings{wang2025explainable,
  title={Explainable Radiologist-Aligned VLM for CT Image Quality Assessment},
  author={Wang, Jiajun and Sun, Yipeng and Bayer, Siming and Maier, Andreas},
  booktitle={German Conference on Medical Image Computing (BVM)},
  year={2026}
}
</code></pre>
]]></content:encoded></item><item><title><![CDATA[My personal experiences and introduction]]></title><description><![CDATA[Hi, I am Jiajun Wang(Jesse as my nickname). I am currently a Master's student at FAU Erlangen-Nürnberg, combining a solid Engineering foundation with cutting-edge AI research.
Engineering Background: ]]></description><link>https://jiajun.de/exp</link><guid isPermaLink="true">https://jiajun.de/exp</guid><category><![CDATA[Personal growth  ]]></category><dc:creator><![CDATA[Jiajun Wang(Jesse)]]></dc:creator><pubDate>Mon, 08 Dec 2025 16:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ewGMqs2tmJI/upload/63da5ca6bb0bed90611728b1551ce103.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi, I am Jiajun Wang(Jesse as my nickname). I am currently a Master's student at FAU Erlangen-Nürnberg, combining a solid Engineering foundation with cutting-edge AI research.</p>
<p>Engineering Background: Experience in telecommunications and industrial automation, with granted patents for innovative system designs.</p>
<p>AI Research: Published author (BVM Conference) on fine-tuning Vision-Language Models (VLM). Currently researching Diffusion Models for medical image denoising.</p>
<p>Goal: I am actively seeking an internship in AI, Computer Vision, or Deep Learning. I bring a cheerful personality and a strong ability to bridge engineering challenges with advanced AI solutions.</p>
<h1>📅 Work and Education History</h1>
<h3>🔬 Project Research Student</h3>
<p><a href="https://lme.tf.fau.de/"><strong>Pattern Recognition Lab, FAU Erlangen-Nürnberg</strong></a> | <em>Oct 2025 – Present</em></p>
<ul>
<li><p><strong>Focus:</strong> Generative AI, VLM, Medical Imaging.</p>
</li>
<li><p><strong>Key Achievement:</strong> Fine-tuned <strong>MedGemma-4B</strong> using <strong>QLoRA</strong> for CT image quality assessment.</p>
</li>
<li><p><strong>Outcome:</strong> Paper accepted at the <strong>German Conference on Medical Image Computing (BVM)</strong>. Currently researching <strong>Diffusion Models</strong> for medical image denoising.</p>
</li>
</ul>
<h3>🎓 M.Sc. in Electromobility</h3>
<p><a href="https://www.fau.eu/"><strong>Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)</strong></a> | <em>Oct 2022 – Present</em></p>
<ul>
<li><p><strong>Specialization:</strong> Artificial Intelligence, Deep Learning, and Computer Vision.</p>
</li>
<li><p>Transitioned from engineering to advanced AI research.</p>
</li>
</ul>
<h3>📡 TETRA System Test Engineer</h3>
<p><a href="https://www.hytera.com/en/about-hytera/hytera-profile.html#hy-overview-1"><strong>Hytera Communications</strong></a> | <em>Jul 2022 – Oct 2022</em></p>
<ul>
<li><p><strong>Deployment:</strong> Configured TETRA digital trunking systems (BSCU, CHU) for public safety networks.</p>
</li>
<li><p><strong>Skills:</strong> Root cause analysis using <strong>Linux, Wireshark, and Xshell</strong>.</p>
</li>
</ul>
<h3>🎓 B.E. in Automation</h3>
<p><a href="https://en.xpu.edu.cn/"><strong>Xi'an Polytechnic University</strong></a> | <em>Sep 2018 – Jun 2022</em></p>
<ul>
<li><p><strong>Major:</strong> Automation Engineering Technology.</p>
</li>
<li><p><strong>Foundation:</strong> Built a strong background in Control Systems and Hardware-Software integration.</p>
</li>
</ul>
<h3>⚙️ Engineering Intern</h3>
<p><a href="https://www.esquel.com/"><strong>Esquel Group</strong></a> | <em>Jul 2021 – Aug 2021</em></p>
<ul>
<li><p><strong>Innovation:</strong> Designed a <strong>PLC-based monitoring system</strong> using position sensors to track roller displacement.</p>
</li>
<li><p><strong>Impact:</strong> Reduced manual errors and secured <strong>1 Invention Patent</strong> &amp; <strong>1 Utility Model Patent</strong>.</p>
</li>
</ul>
<h1>🛠 Skills</h1>
<ul>
<li><p><strong>AI &amp; Deep Learning:</strong> PyTorch, LLMs/VLMs (Fine-tuning, RAG), Diffusion Models, PEFT (LoRA/QLoRA), SFT, RLHF, Computer Vision.</p>
</li>
<li><p><strong>Programming:</strong> Python, C, MATLAB.</p>
</li>
<li><p><strong>Tools &amp; Platforms:</strong> Linux, HPC, Docker, Git, Wireshark, Jira.</p>
</li>
<li><p><strong>Languages:</strong> English (Professional), German (Basic), Chinese (Native).</p>
</li>
</ul>
<h1>🏔️ Beyond Work</h1>
<h3>🏃‍♂️ Active Lifestyle</h3>
<p>I believe a healthy body fuels a creative mind.</p>
<ul>
<li><p>🏂 <strong>Snowboarding:</strong> Passionate about carving through snow in winter.</p>
</li>
<li><p>🏊‍♂️ <strong>Swimming:</strong> Building endurance and focus in the water.</p>
</li>
<li><p>🥾 <strong>Hiking &amp; Travel:</strong> Exploring nature and experiencing diverse cultures.</p>
</li>
<li><p>📷 <strong>Photography:</strong> Capturing life's moments and finding unique perspectives through both modern digital lenses and the timeless charm of classic vintage film cameras.</p>
</li>
</ul>
<h3>🥗 Healthy Habits</h3>
<p>I maintain a disciplined lifestyle to stay at peak performance.</p>
<ul>
<li><p>🚭 <strong>Substance-Free:</strong> Non-smoker and non-drinker.</p>
</li>
<li><p>🍬 <strong>Conscious Diet:</strong> Committed to a <a href="https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/low-sugar-diet"><strong>low-sugar lifestyle</strong></a>, strictly minimizing refined sugar intake for better health and mental clarity.</p>
</li>
</ul>
<h3>❤️ Social Impact</h3>
<p>Giving back to the community is an essential part of my life.</p>
<ul>
<li><p>🧒 <strong>Child Development:</strong> I care deeply about the growth, education, and well-being of the next generation, and actively support children's welfare through periodic charitable donations.</p></li>
</ul>
]]></content:encoded></item></channel></rss>