Update README.md

2025-12-26 06:36:40 +00:00
parent f2a42f7058
commit e8a38a7522

@@ -1,9 +1,43 @@
# CryoLens
CryoLens Script Suite
Measures how “chaotic” each dataset is based on how many different extensions it contains.
Scripts detect datasets with too many file types, weak extension dominance, or mixed-purpose content.
Generated summaries give an LLM a compact understanding of each dataset's identity so it can propose reorganizations.
## GlacierLens Script Suite — Overview
The GlacierLens suite is a collection of tools designed to help an LLM understand the structure, purpose, and internal logic of your GlacierEdge archive. Each script contributes a different layer of insight, allowing the system to reason about how your datasets are organized today and how they should be organized in the future.
## Dataset Semantic Summarizer
The Dataset Semantic Summarizer creates a short narrative profile for each dataset.
It describes how many files the dataset contains, how large it is, which categories dominate it, and how those categories shape the dataset's identity.
By examining timestamps, hash coverage, and representative file paths, the summarizer helps the LLM infer whether a dataset is primarily a photo collection, a document archive, a code repository, or something else entirely.
This script gives the LLM a clear sense of what each dataset is fundamentally about.
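
As a rough illustration of the kind of narrative profile described above, the sketch below builds a one-sentence summary from file count, total size, and dominant category. The `FileRecord` shape and the `summarize_dataset` name are assumptions made for this example, not the suite's actual interface, and timestamps and hash coverage are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical record shape; the real scripts may store different fields.
    path: str
    size_bytes: int
    category: str


def summarize_dataset(name: str, files: list[FileRecord]) -> str:
    """Build a one-sentence narrative profile for a dataset."""
    if not files:
        return f"Dataset '{name}' is empty."
    total_bytes = sum(f.size_bytes for f in files)
    # The most frequent category is treated as the dataset's dominant one.
    category, count = Counter(f.category for f in files).most_common(1)[0]
    share = count / len(files)
    return (
        f"Dataset '{name}' holds {len(files)} files totalling "
        f"{total_bytes / 1_000_000:.1f} MB; its dominant category is "
        f"'{category}' ({share:.0%} of files)."
    )
```

A sentence in this form is short enough to include for every dataset in a single LLM prompt.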
## Extension Entropy Analyzer
The Extension Entropy Analyzer evaluates how coherent or chaotic each dataset is by measuring how many different file extensions it contains.
A dataset with only a handful of extensions tends to have a clear purpose, while one with dozens or even hundreds of extensions is usually a mixed-purpose dump.
The analyzer assigns an entropy score and explains what that score means, helping the LLM identify datasets that need cleanup or splitting.
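
The README does not specify how the entropy score is computed. One plausible reading is Shannon entropy over the extension distribution, which rises both with the number of distinct extensions and with weaker dominance of any single one. The sketch below assumes that definition; the function names and the score thresholds are illustrative only.

```python
import math
from collections import Counter
from pathlib import PurePosixPath


def extension_entropy(paths: list[str]) -> float:
    """Shannon entropy (in bits) of the file-extension distribution."""
    # Extensions are lowercased; files without one share a "<none>" bucket.
    exts = Counter(PurePosixPath(p).suffix.lower() or "<none>" for p in paths)
    total = sum(exts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in exts.values())


def describe(score: float) -> str:
    # Thresholds are illustrative assumptions, not values from the suite.
    if score < 1.0:
        return "coherent"
    if score < 3.0:
        return "mixed"
    return "chaotic"
```

Under this definition a dataset with a single extension scores 0 bits, and a dataset split evenly across four extensions scores 2 bits.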
## Top-50 Extension Profiler
The Top-50 Extension Profiler examines the most common extensions in each dataset and describes how they are used.
For each extension, it reports how many files use it, how large those files are, and which categories dominate that extension.
This allows the LLM to see that a .jpg in one dataset may represent faces, while a .jpg in another dataset may represent scanned documents or sensitive content.
The profiler gives a detailed, sentence-level explanation of how extensions behave differently across datasets.
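
The per-extension report described above could be assembled along the following lines. This is a minimal sketch: the `(extension, size_bytes, category)` tuple shape and the `profile_extensions` name are assumptions for illustration, not the suite's real data model.

```python
from collections import Counter, defaultdict


def profile_extensions(files, top_n=50):
    """Describe the top_n most common extensions.

    files: iterable of (extension, size_bytes, category) tuples (assumed shape).
    """
    counts = Counter()
    sizes = defaultdict(int)
    categories = defaultdict(Counter)
    for ext, size, category in files:
        counts[ext] += 1
        sizes[ext] += size
        categories[ext][category] += 1

    profile = []
    for ext, n in counts.most_common(top_n):
        # The most frequent category for this extension in this dataset.
        top_category = categories[ext].most_common(1)[0][0]
        profile.append(
            f"{ext}: {n} file(s), {sizes[ext]} bytes, mostly '{top_category}'"
        )
    return profile
```

Running it once per dataset is what lets the same extension, such as `.jpg`, be described differently depending on where it appears.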
## Category-Extension Cross-Mapper
The Category-Extension Cross-Mapper reveals how categories and extensions interact within each dataset.
It shows which extensions are associated with photos, documents, archives, source code, or sensitive material, and it highlights mismatches such as source code appearing in a media dataset or tax documents appearing in a photo collection.
This mapping helps the LLM understand the semantic fingerprint of each extension and detect files that are out of place.
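
A cross-map of this kind is essentially a two-level count table. The sketch below builds one and flags categories that make up only a small share of a dataset as candidates for mismatch review. The tuple shape, the function names, and the 5% threshold are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def cross_map(files):
    """Count files per (category, extension) pair.

    files: iterable of (category, extension) tuples (assumed shape).
    """
    table = defaultdict(Counter)
    for category, ext in files:
        table[category][ext] += 1
    return table


def minority_categories(table, threshold=0.05):
    """Return categories whose share of all files is below the threshold."""
    total = sum(sum(c.values()) for c in table.values())
    return sorted(
        cat for cat, c in table.items() if sum(c.values()) / total < threshold
    )
```

For a dataset of 39 photos and one Python file, `minority_categories` would return `["source_code"]`, matching the "source code in a media dataset" case mentioned above.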
## Anomaly Detector
The Anomaly Detector scans for files that do not belong in their dataset based on category, extension, or path patterns.
It identifies misplaced tax files, source code in media folders, adult content outside secure datasets, and oversized archives in working directories.
These anomalies become strong signals for reorganization and help the LLM focus on structural problems that need attention.
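
The README does not list the detector's rules, so the following is only a sketch of a rule-based check under stated assumptions: a hypothetical `ALLOWED` policy maps each dataset purpose to its expected categories, and archives above an arbitrary size limit are flagged regardless of purpose.

```python
# Hypothetical policy: which categories each dataset purpose may contain.
ALLOWED = {
    "media": {"photo", "video"},
    "documents": {"document", "tax"},
}


def find_anomalies(purpose, files, allowed=ALLOWED, max_archive_bytes=1_000_000_000):
    """Return (path, reason) pairs for files that look out of place.

    files: iterable of dicts with 'path', 'category', 'size_bytes' (assumed shape).
    """
    anomalies = []
    ok = allowed.get(purpose, set())
    for f in files:
        if f["category"] == "archive" and f["size_bytes"] > max_archive_bytes:
            anomalies.append((f["path"], "oversized archive"))
        elif f["category"] not in ok:
            anomalies.append((f["path"], f"unexpected category '{f['category']}'"))
    return anomalies
```

Keeping the policy in a plain dictionary makes it easy to adjust per archive without touching the detection logic.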
## Purpose Inference Engine
The Purpose Inference Engine synthesizes all available metadata to infer the intended role of each dataset.
It determines whether a dataset is meant for photos, documents, backups, sensitive material, source code, or long-term archives.
By expressing these conclusions in natural language, the engine gives the LLM a clear understanding of what each dataset is supposed to be, which is essential for proposing a cleaner, more logical folder structure.
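
One simple way to turn metadata into a natural-language conclusion is a majority rule over file categories. The sketch below assumes that approach; the category-to-purpose mapping, the 60% threshold, and the `infer_purpose` name are all hypothetical, and the actual engine may weigh more signals than category counts.

```python
from collections import Counter

# Hypothetical mapping from dominant category to intended purpose.
PURPOSE_BY_CATEGORY = {
    "photo": "photos",
    "document": "documents",
    "source_code": "source code",
    "archive": "long-term archives",
}


def infer_purpose(name, categories, min_share=0.6):
    """Express the likely purpose of a dataset as a sentence.

    categories: one category label per file (assumed input shape).
    """
    counts = Counter(categories)
    if not counts:
        return f"Dataset '{name}' is empty, so no purpose can be inferred."
    top, n = counts.most_common(1)[0]
    share = n / sum(counts.values())
    if share < min_share or top not in PURPOSE_BY_CATEGORY:
        return (
            f"Dataset '{name}' has no clear purpose "
            f"(top category '{top}' covers {share:.0%})."
        )
    return (
        f"Dataset '{name}' appears to be intended for "
        f"{PURPOSE_BY_CATEGORY[top]} ({share:.0%} '{top}')."
    )
```

Datasets that fall below the threshold are reported as having no clear purpose, which is itself a useful signal when proposing a reorganization.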