Update README.md

2025-12-26 06:36:40 +00:00
parent f2a42f7058
commit e8a38a7522
1 changed file with 41 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -1,9 +1,43 @@
 # CryoLens

-CryoLens Script Suite
-
-Measures how “chaotic” each dataset is based on how many different extensions it contains.
-
-Scripts detect  datasets with too many file types, weak extension dominance, or mixed‑purpose content.
-
-Generated summaries give an LLM a compact understanding of each dataset’s identity so it can propose reorganizations.
+## GlacierLens Script Suite — Overview
+
+The GlacierLens suite is a collection of tools that help an LLM understand the structure, purpose, and internal organization of the datasets in your GlacierEdge archive. Each script contributes a different layer of insight, so the model can reason about how your datasets are organized today and how they should be organized in the future.
+
+## Dataset Semantic Summarizer
+
+The Dataset Semantic Summarizer writes a short narrative profile for each dataset.
+The profile states how many files the dataset contains, how large it is, which categories dominate it, and how those categories shape the dataset’s identity.
+
+By examining timestamps, hash coverage, and representative file paths, the summarizer helps the LLM infer whether a dataset is primarily a photo collection, a document archive, a code repository, or something else entirely.
+This script gives the LLM a clear sense of what each dataset is fundamentally about.
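The profile described above can be sketched in a few lines of Python. This is a minimal illustration, not GlacierLens's actual code. Several things are assumptions made only for the example: the `FileRecord` fields, the `summarize_dataset` function, reading "hash coverage" as the share of files with a recorded hash, and choosing the largest files as the representative paths.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileRecord:
    # Hypothetical per-file metadata row; the field names are illustrative.
    path: str
    size: int                     # bytes
    category: str                 # e.g. "photo", "document", "source_code"
    mtime: float                  # POSIX timestamp of last modification
    sha256: Optional[str] = None  # None when the file has not been hashed yet


def summarize_dataset(name: str, files: list[FileRecord],
                      top_n: int = 3, sample_paths: int = 2) -> str:
    """Build a short narrative profile of one dataset from its file records."""
    if not files:
        return f"Dataset '{name}' is empty."
    total = len(files)
    total_gb = sum(f.size for f in files) / 1e9
    categories = Counter(f.category for f in files)
    top = ", ".join(f"{cat} ({n / total:.0%})"
                    for cat, n in categories.most_common(top_n))
    # "Hash coverage" is taken here to mean the share of files with a recorded hash.
    hashed = sum(1 for f in files if f.sha256) / total
    first = datetime.fromtimestamp(min(f.mtime for f in files), tz=timezone.utc).date()
    last = datetime.fromtimestamp(max(f.mtime for f in files), tz=timezone.utc).date()
    # Representative paths: the largest files, used as a simple stand-in heuristic.
    largest = sorted(files, key=lambda f: f.size, reverse=True)[:sample_paths]
    examples = "; ".join(f.path for f in largest)
    return (f"Dataset '{name}' holds {total} files ({total_gb:.2f} GB). "
            f"Dominant categories: {top}. "
            f"Files date from {first} to {last}; {hashed:.0%} have a recorded hash. "
            f"Example paths: {examples}.")


files = [
    FileRecord("/photos/a.jpg", 2_000_000_000, "photo", 1_600_000_000, "ab12"),
    FileRecord("/photos/b.jpg", 3_000_000_000, "photo", 1_700_000_000),
    FileRecord("/photos/notes.txt", 1_000, "document", 1_650_000_000, "cd34"),
]
print(summarize_dataset("photos", files))
```

Keeping the output to one plain paragraph per dataset makes the profiles compact enough to feed many of them to an LLM at once.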
+
+## Extension Entropy Analyzer
+
+The Extension Entropy Analyzer evaluates how coherent or chaotic each dataset is by measuring how many different file extensions it contains and how evenly its files are spread across them.
+A dataset with only a handful of extensions tends to have a clear purpose, while one with dozens or even hundreds of extensions is usually a mixed‑purpose dump.
+The analyzer assigns an entropy score and explains what that score means, helping the LLM identify datasets that need cleanup or splitting.
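A common way to collapse "how many extensions, and how evenly used" into a single number is Shannon entropy over the extension frequencies: the sum, over all extensions, of p × log2(1/p), where p is that extension's share of the files. The sketch below assumes that definition, because the README does not say which formula GlacierLens uses. The `extension_entropy` and `describe` names and the 1.5-bit and 3.5-bit cut-offs are hypothetical.

```python
import math
from collections import Counter
from pathlib import PurePosixPath


def extension_of(path: str) -> str:
    """Lower-cased final suffix; files without one are grouped under "(none)"."""
    return PurePosixPath(path).suffix.lower() or "(none)"


def extension_entropy(paths: list[str]) -> tuple[float, float, int]:
    """Return (Shannon entropy in bits, normalized entropy in [0, 1], extension count)."""
    counts = Counter(extension_of(p) for p in paths)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0, 0
    # H = sum over extensions of p * log2(1 / p), where p = n / total.
    bits = sum((n / total) * math.log2(total / n) for n in counts.values())
    k = len(counts)
    # log2(k) is the maximum entropy for k extensions, so this ratio measures evenness.
    normalized = bits / math.log2(k) if k > 1 else 0.0
    return bits, normalized, k


def describe(bits: float, normalized: float, k: int) -> str:
    # The cut-offs are illustrative assumptions, not values taken from GlacierLens.
    if bits < 1.5:
        verdict = "coherent, dominated by a few extensions"
    elif bits < 3.5:
        verdict = "mixed, several extensions share the dataset"
    else:
        verdict = "chaotic, files are spread over many extensions"
    return (f"{k} extensions, entropy {bits:.2f} bits "
            f"(normalized {normalized:.2f}): {verdict}")


print(describe(*extension_entropy(["a.jpg", "b.jpg", "c.JPG", "notes.txt"])))
```

The raw score grows with the number of extensions. The normalized score isolates how evenly files are distributed, so reporting both lets the LLM tell a dataset with a few dominant types apart from one with many equally common types.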
+
+## Top‑50 Extension Profiler
+
+The Top‑50 Extension Profiler examines the 50 most common extensions in each dataset and describes how each one is used.
+For each extension, it reports how many files use it, how large those files are, and which categories dominate that extension.
+This lets the LLM see that a `.jpg` in one dataset may represent faces, while a `.jpg` in another dataset may represent scanned documents or sensitive content.
+The profiler explains in plain sentences how the same extension behaves differently across datasets.
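A per-extension profile like this can be built with `collections.Counter`. This is a hypothetical sketch. `FileRecord` and `profile_extensions` are illustrative names, sizes are reported as totals in megabytes, and "dominates" is read as the most frequent category for that extension.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    # Hypothetical per-file metadata row; the field names are illustrative.
    path: str
    size: int      # bytes
    category: str  # e.g. "face", "scan", "document"


def profile_extensions(files: list[FileRecord], top_n: int = 50) -> list[str]:
    """Describe the top_n most common extensions in one dataset, one sentence each."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    categories: defaultdict = defaultdict(Counter)
    for f in files:
        ext = PurePosixPath(f.path).suffix.lower() or "(none)"
        counts[ext] += 1
        sizes[ext] += f.size
        categories[ext][f.category] += 1
    sentences = []
    for ext, n in counts.most_common(top_n):
        top_cat, cat_n = categories[ext].most_common(1)[0]
        noun = "file" if n == 1 else "files"
        sentences.append(
            f"{ext}: {n} {noun}, {sizes[ext] / 1e6:.1f} MB in total, "
            f"mostly {top_cat} ({cat_n / n:.0%})."
        )
    return sentences


dataset = [
    FileRecord("people/a.jpg", 2_000_000, "face"),
    FileRecord("people/b.jpg", 1_000_000, "face"),
    FileRecord("scans/c.jpg", 500_000, "scan"),
    FileRecord("scans/d.pdf", 100_000, "document"),
]
for line in profile_extensions(dataset):
    print(line)
```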
+
+## Category–Extension Cross‑Mapper
+
+The Category–Extension Cross‑Mapper reveals how categories and extensions interact within each dataset.
+It shows which extensions are associated with photos, documents, archives, source code, or sensitive material, and it highlights mismatches such as source code appearing in a media dataset or tax documents appearing in a photo collection.
+This mapping helps the LLM understand the semantic fingerprint of each extension and detect files that are out of place.
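At its core, the cross-mapping is a contingency table of (extension, category) pairs, checked against what each kind of dataset is expected to hold. The `EXPECTED_CATEGORIES` allow-list, the category names, and the function names below are assumptions for illustration only.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical allow-list of the categories each kind of dataset is expected to hold.
EXPECTED_CATEGORIES = {
    "media": {"photo", "video", "audio"},
    "documents": {"document", "tax", "spreadsheet"},
    "code": {"source_code", "config", "document"},
}


def cross_map(files: list[tuple[str, str]]) -> Counter:
    """Count (extension, category) pairs from (path, category) tuples."""
    return Counter(
        (PurePosixPath(path).suffix.lower() or "(none)", category)
        for path, category in files
    )


def find_mismatches(table: Counter, dataset_kind: str) -> list[str]:
    """List extension/category cells that fall outside the dataset kind's allow-list."""
    allowed = EXPECTED_CATEGORIES.get(dataset_kind)
    if allowed is None:
        return []  # unknown kind: nothing to compare against
    return [
        f"{n} {ext} file(s) categorized as {cat} in a {dataset_kind} dataset"
        for (ext, cat), n in sorted(table.items())
        if cat not in allowed
    ]


photos = [
    ("img/a.jpg", "photo"),
    ("img/b.jpg", "photo"),
    ("img/resize.py", "source_code"),
    ("img/2023_return.pdf", "tax"),
]
table = cross_map(photos)
print(table)
print(find_mismatches(table, "media"))
```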
+
+## Anomaly Detector
+
+The Anomaly Detector scans for files that do not belong in their dataset based on category, extension, or path patterns.
+It identifies misplaced tax files, source code in media folders, adult content outside secure datasets, and oversized archives in working directories.
+These anomalies become strong signals for reorganization and help the LLM focus on structural problems that need attention.
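Each of the four anomaly types listed above can be written as a simple per-file rule. This sketch assumes that every file already carries a category and the role of its dataset (`dataset_role`, a hypothetical field). It also defines "misplaced" tax files as tax files outside a documents dataset. The archive extensions and the 10 GiB size limit are placeholder values, not GlacierLens settings.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Illustrative values; GlacierLens's real rules and thresholds are not documented here.
ARCHIVE_EXTENSIONS = {".zip", ".tar", ".gz", ".7z", ".rar"}
OVERSIZED_ARCHIVE_BYTES = 10 * 1024**3  # 10 GiB


@dataclass
class FileRecord:
    path: str
    size: int           # bytes
    category: str       # e.g. "tax", "source_code", "adult", "photo"
    dataset_role: str   # hypothetical: "media", "documents", "secure", "working", ...


def find_anomalies(files: list[FileRecord]) -> list[tuple[str, str]]:
    """Return (path, reason) pairs for files that do not fit their dataset."""
    anomalies = []
    for f in files:
        ext = PurePosixPath(f.path).suffix.lower()
        if f.category == "tax" and f.dataset_role != "documents":
            anomalies.append((f.path, "tax file outside a document dataset"))
        if f.category == "source_code" and f.dataset_role == "media":
            anomalies.append((f.path, "source code in a media dataset"))
        if f.category == "adult" and f.dataset_role != "secure":
            anomalies.append((f.path, "adult content outside a secure dataset"))
        if (ext in ARCHIVE_EXTENSIONS and f.dataset_role == "working"
                and f.size > OVERSIZED_ARCHIVE_BYTES):
            anomalies.append((f.path, "oversized archive in a working directory"))
    return anomalies


records = [
    FileRecord("/media/2023_tax.pdf", 120_000, "tax", "media"),
    FileRecord("/media/convert.py", 4_000, "source_code", "media"),
    FileRecord("/work/old_backup.zip", 25 * 1024**3, "archive", "working"),
    FileRecord("/docs/2023_tax.pdf", 120_000, "tax", "documents"),
    FileRecord("/vault/private.jpg", 3_000_000, "adult", "secure"),
]
for path, reason in find_anomalies(records):
    print(f"{path}: {reason}")
```

Returning a human-readable reason with each path lets the anomalies go straight into an LLM prompt as reorganization signals.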
+
+## Purpose Inference Engine
+
+The Purpose Inference Engine synthesizes all available metadata to infer the intended role of each dataset.
+It determines whether a dataset is meant for photos, documents, backups, sensitive material, source code, or long‑term archives.
+By expressing these conclusions in natural language, the engine gives the LLM a clear understanding of what each dataset is supposed to be, which is essential for proposing a cleaner, more logical folder structure.
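A much simplified form of purpose inference scores each candidate purpose by the share of files whose category signals it, then states the best match in plain language. The real engine draws on more metadata than category counts alone. `PURPOSE_SIGNALS`, the category names, and the 60% threshold in `infer_purpose` are hypothetical, and the purposes shown are only a subset of those listed above.

```python
from collections import Counter

# Hypothetical mapping from dataset purpose to the file categories that signal it.
PURPOSE_SIGNALS = {
    "photo collection": {"photo", "video"},
    "document archive": {"document", "tax", "spreadsheet"},
    "code repository": {"source_code", "config"},
    "backup set": {"archive", "disk_image"},
    "sensitive-material store": {"adult", "identity"},
}


def infer_purpose(name: str, categories: list[str], min_share: float = 0.6) -> str:
    """Pick the purpose whose signal categories cover the largest share of files."""
    if not categories:
        return f"'{name}' is empty, so no purpose can be inferred."
    counts = Counter(categories)
    total = len(categories)
    scores = {
        purpose: sum(counts[c] for c in signals) / total
        for purpose, signals in PURPOSE_SIGNALS.items()
    }
    best, share = max(scores.items(), key=lambda item: item[1])
    if share < min_share:
        return (f"'{name}' has no dominant purpose (closest match: {best}, "
                f"{share:.0%} of files), so it may be a mixed-purpose dataset.")
    return f"'{name}' appears to be a {best}: {share:.0%} of its files fit that role."


print(infer_purpose("camera_roll", ["photo"] * 8 + ["document"] * 2))
print(infer_purpose("downloads", ["photo", "document", "source_code", "archive"]))
```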