CryoLens

GlacierLens Script Suite — Overview

The GlacierLens suite is a collection of tools designed to help an LLM understand the structure, purpose, and internal logic of your GlacierEdge archive. Each script contributes a different layer of insight, allowing the system to reason about how your datasets are organized today and how they should be organized in the future.

Dataset Semantic Summarizer

The Dataset Semantic Summarizer creates a short narrative profile for each dataset. It describes how many files the dataset contains, how large it is, what kinds of categories dominate it, and how those categories shape the dataset’s identity.

By examining timestamps, hash coverage, and representative file paths, the summarizer helps the LLM infer whether a dataset is primarily a photo collection, a document archive, a code repository, or something else entirely. This script gives the LLM a clear sense of what each dataset is fundamentally about.
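
A narrative profile of this kind could be sketched as follows. This is an illustration only: the `FileRecord` fields, the `summarize_dataset` function, and the wording of the output are assumptions, not the suite's actual schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class FileRecord:
    # Hypothetical per-file metadata; the real archive index may differ.
    path: str
    size_bytes: int
    category: str
    mtime: float              # POSIX timestamp of last modification
    sha256: Optional[str]     # None when the file has not been hashed


def summarize_dataset(name: str, files: List[FileRecord]) -> str:
    """Return a short narrative profile of one dataset."""
    if not files:
        return f"Dataset '{name}' is empty."

    total_bytes = sum(f.size_bytes for f in files)
    top_category, top_count = Counter(f.category for f in files).most_common(1)[0]
    hashed = sum(1 for f in files if f.sha256)
    oldest = datetime.fromtimestamp(min(f.mtime for f in files), tz=timezone.utc)
    newest = datetime.fromtimestamp(max(f.mtime for f in files), tz=timezone.utc)

    return (
        f"Dataset '{name}' holds {len(files)} files "
        f"({total_bytes / 1024**2:.1f} MiB). "
        f"The dominant category is '{top_category}' "
        f"({top_count / len(files):.0%} of files). "
        f"Hash coverage is {hashed / len(files):.0%}. "
        f"Timestamps span {oldest:%Y-%m-%d} to {newest:%Y-%m-%d}."
    )
```

Keeping the output as plain sentences rather than a table matches the stated goal of giving the LLM a profile it can read directly.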

Extension Entropy Analyzer

The Extension Entropy Analyzer evaluates how coherent or chaotic each dataset is by measuring how many different file extensions it contains. A dataset with only a handful of extensions tends to have a clear purpose, while one with dozens or even hundreds of extensions is usually a mixed‑purpose dump. The analyzer assigns an entropy score and explains what that score means, helping the LLM identify datasets that need cleanup or splitting.
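
One way to turn this idea into a number is Shannon entropy over the extension distribution, reported alongside the raw count of distinct extensions. The text above does not say which formula the analyzer uses, so the choice of Shannon entropy, the function names, and the `mixed_threshold` cut-off below are all assumptions.

```python
import math
from collections import Counter
from pathlib import PurePosixPath


def extension_entropy(paths):
    """Return (distinct_extension_count, shannon_entropy_in_bits)."""
    exts = [PurePosixPath(p).suffix.lower() or "<none>" for p in paths]
    total = len(exts)
    if total == 0:
        return 0, 0.0
    counts = Counter(exts)
    # H = sum(p * log2(1/p)); 0 bits means every file shares one extension.
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return len(counts), entropy


def explain(distinct, entropy, mixed_threshold=3.0):
    """Turn the score into a sentence; the threshold is a hypothetical default."""
    if entropy >= mixed_threshold:
        verdict = "looks like a mixed-purpose dump and may need splitting"
    else:
        verdict = "appears to have a focused purpose"
    return f"{distinct} distinct extensions, entropy {entropy:.2f} bits: {verdict}."
```

Unlike a plain count of extensions, entropy also reflects how evenly files are spread across them, so a photo dataset with a few stray `.txt` files still scores low.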

Top‑50 Extension Profiler

The Top‑50 Extension Profiler examines the most common extensions in each dataset and describes how they are used. For each extension, it reports how many files use it, how large those files are, and which categories dominate that extension. This allows the LLM to see that a .jpg in one dataset may represent faces, while a .jpg in another dataset may represent scanned documents or sensitive content. The profiler gives a detailed, sentence‑level explanation of how extensions behave differently across datasets.
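
The per-extension report could be assembled roughly as shown here. The input shape (tuples of path, size, and category) and the `profile_extensions` name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def profile_extensions(files, top_n=50):
    """files: iterable of (path, size_bytes, category) tuples (assumed shape).

    Returns one dict per extension, most common extension first.
    """
    counts = Counter()
    sizes = defaultdict(int)
    categories = defaultdict(Counter)

    for path, size_bytes, category in files:
        ext = PurePosixPath(path).suffix.lower() or "<none>"
        counts[ext] += 1
        sizes[ext] += size_bytes
        categories[ext][category] += 1

    profile = []
    for ext, n in counts.most_common(top_n):
        top_category, _ = categories[ext].most_common(1)[0]
        profile.append({
            "extension": ext,
            "files": n,
            "total_bytes": sizes[ext],
            "top_category": top_category,
        })
    return profile
```

Running this once per dataset and comparing the `top_category` of the same extension across datasets exposes the case described above, where `.jpg` means faces in one place and scanned documents in another.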

Category–Extension Cross‑Mapper

The Category–Extension Cross‑Mapper reveals how categories and extensions interact within each dataset. It shows which extensions are associated with photos, documents, archives, source code, or sensitive material, and it highlights mismatches such as source code appearing in a media dataset or tax documents appearing in a photo collection. This mapping helps the LLM understand the semantic fingerprint of each extension and detect files that are out of place.
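
A minimal sketch of such a mapping is a nested counter keyed by category and then by extension. The category labels and the notion of an "expected" category set per dataset are assumptions introduced for this example.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def cross_map(files):
    """files: iterable of (path, category) tuples (assumed shape).

    Returns {category: Counter({extension: file_count})}.
    """
    matrix = defaultdict(Counter)
    for path, category in files:
        ext = PurePosixPath(path).suffix.lower() or "<none>"
        matrix[category][ext] += 1
    return matrix


def find_mismatches(matrix, expected_categories):
    """Return the categories, with their extensions, that fall outside the expected set."""
    return {
        category: dict(exts)
        for category, exts in matrix.items()
        if category not in expected_categories
    }
```

For a photo collection, calling `find_mismatches(matrix, {"photo"})` would surface any source code or tax documents along with the extensions they use.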

Anomaly Detector

The Anomaly Detector scans for files that do not belong in their dataset based on category, extension, or path patterns. It identifies misplaced tax files, source code in media folders, adult content outside secure datasets, and oversized archives in working directories. These anomalies become strong signals for reorganization and help the LLM focus on structural problems that need attention.
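
The four anomaly types named above lend themselves to a small rule table. The sketch below assumes each file is a dict with `path`, `category`, `size_bytes`, and `dataset_purpose` keys; those field names, the purpose labels, and the 1 GiB size limit are hypothetical.

```python
from typing import Callable, NamedTuple

# Hypothetical limit for archives kept in working directories.
MAX_WORKING_ARCHIVE_BYTES = 1024**3


class Rule(NamedTuple):
    name: str
    matches: Callable[[dict], bool]


RULES = [
    Rule("tax file outside documents dataset",
         lambda f: f["category"] == "tax_document"
         and f["dataset_purpose"] != "documents"),
    Rule("source code in media dataset",
         lambda f: f["category"] == "source_code"
         and f["dataset_purpose"] in {"photos", "media"}),
    Rule("adult content outside secure dataset",
         lambda f: f["category"] == "adult"
         and f["dataset_purpose"] != "secure"),
    Rule("oversized archive in working directory",
         lambda f: f["category"] == "archive"
         and f["dataset_purpose"] == "working"
         and f["size_bytes"] > MAX_WORKING_ARCHIVE_BYTES),
]


def detect_anomalies(files):
    """Return (path, rule_name) for every rule a file violates."""
    return [(f["path"], rule.name)
            for f in files
            for rule in RULES
            if rule.matches(f)]
```

Keeping the rules as data makes it straightforward to add or tune checks without touching the scan loop.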

Purpose Inference Engine

The Purpose Inference Engine synthesizes all available metadata to infer the intended role of each dataset. It determines whether a dataset is meant for photos, documents, backups, sensitive material, source code, or long‑term archives. By expressing these conclusions in natural language, the engine gives the LLM a clear understanding of what each dataset is supposed to be, which is essential for proposing a cleaner, more logical folder structure.
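
As a rough illustration of the inference step, a dataset's purpose can be guessed by letting each file vote for the purpose its category implies and accepting the winner only if it holds a clear majority. This is a deliberately simple stand-in that uses category counts alone, not the full metadata the engine is described as using. The `CATEGORY_TO_PURPOSE` table, the `min_share` threshold, and the output wording are assumptions.

```python
from collections import Counter

# Hypothetical mapping from file categories to dataset purposes.
CATEGORY_TO_PURPOSE = {
    "photo": "photos",
    "faces": "photos",
    "document": "documents",
    "tax_document": "documents",
    "source_code": "source code",
    "archive": "backups",
}


def infer_purpose(category_counts, min_share=0.6):
    """category_counts: {category: file_count}. Returns a one-sentence verdict."""
    votes = Counter()
    for category, n in category_counts.items():
        votes[CATEGORY_TO_PURPOSE.get(category, "unknown")] += n

    total = sum(votes.values())
    if total == 0:
        return "The dataset is empty, so no purpose can be inferred."

    purpose, n = votes.most_common(1)[0]
    share = n / total
    if purpose == "unknown" or share < min_share:
        return (f"No single purpose dominates (largest share: {share:.0%}); "
                "the dataset looks mixed.")
    return f"The dataset appears to be intended for {purpose} ({share:.0%} of files)."
```

For example, `infer_purpose({"photo": 80, "faces": 10, "document": 10})` returns "The dataset appears to be intended for photos (90% of files)."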
