Update README.md
# CryoLens
CryoLens Script Suite
Measures how “chaotic” each dataset is based on how many different extensions it contains.
Scripts detect datasets with too many file types, weak extension dominance, or mixed‑purpose content.
Generated summaries give an LLM a compact understanding of each dataset’s identity so it can propose reorganizations.
## GlacierLens Script Suite — Overview
The GlacierLens suite is a collection of tools designed to help an LLM understand the structure, purpose, and internal logic of your GlacierEdge archive. Each script contributes a different layer of insight, allowing the system to reason about how your datasets are organized today and how they should be organized in the future.
## Dataset Semantic Summarizer
The Dataset Semantic Summarizer creates a short narrative profile for each dataset.
It describes how many files the dataset contains, how large it is, what kinds of categories dominate it, and how those categories shape the dataset’s identity.
By examining timestamps, hash coverage, and representative file paths, the summarizer helps the LLM infer whether a dataset is primarily a photo collection, a document archive, a code repository, or something else entirely.
This script gives the LLM a clear sense of what each dataset is fundamentally about.
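
A narrative profile of this kind could be produced roughly as sketched below. This is an illustration, not the suite's actual script: the function name `summarize`, the record keys (`path`, `size`, `category`, `mtime`, `hash`), and the sentence wording are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize(name, records):
    """Build a short narrative profile for one dataset.

    `records` is a list of dicts with the hypothetical keys
    path, size (bytes), category, mtime (epoch seconds), hash (str or None).
    """
    if not records:
        return f"Dataset {name} is empty."

    count = len(records)
    total_bytes = sum(r["size"] for r in records)

    # The most frequent category is treated as the dataset's dominant identity.
    top_category, top_count = Counter(r["category"] for r in records).most_common(1)[0]

    # Hash coverage: share of files that carry a recorded hash.
    hashed = sum(1 for r in records if r["hash"])

    # Timestamp range, reported as UTC dates.
    first = datetime.fromtimestamp(min(r["mtime"] for r in records), tz=timezone.utc).date()
    last = datetime.fromtimestamp(max(r["mtime"] for r in records), tz=timezone.utc).date()

    return (
        f"Dataset {name} holds {count} files totalling {total_bytes} bytes. "
        f"Its dominant category is {top_category} ({top_count / count:.0%} of files). "
        f"Timestamps span {first} to {last}, and {hashed / count:.0%} of files have a hash."
    )
```

Keeping the output as plain sentences rather than JSON follows the stated goal of giving the LLM a compact, readable description of each dataset.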
## Extension Entropy Analyzer
The Extension Entropy Analyzer evaluates how coherent or chaotic each dataset is by measuring how many different file extensions it contains.
A dataset with only a handful of extensions tends to have a clear purpose, while one with dozens or even hundreds of extensions is usually a mixed‑purpose dump.
The analyzer assigns an entropy score and explains what that score means, helping the LLM identify datasets that need cleanup or splitting.
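
One plausible way to turn an extension list into a score is Shannon entropy over the extension counts, reported alongside the number of distinct extensions and the share held by the most common one. The README does not state which formula the analyzer uses, so the choice of Shannon entropy, the function name `extension_entropy`, and the `<none>` label for extensionless files are assumptions in this sketch.

```python
import math
from collections import Counter
from pathlib import PurePosixPath


def extension_entropy(paths):
    """Return (distinct_extensions, entropy_bits, dominance) for a list of paths.

    entropy_bits is the Shannon entropy of the extension distribution;
    dominance is the share of files using the most common extension.
    """
    # Files without a suffix are grouped under a placeholder label.
    counts = Counter(PurePosixPath(p).suffix.lower() or "<none>" for p in paths)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0, 0.0

    shares = [n / total for n in counts.values()]
    entropy_bits = sum(p * math.log2(1 / p) for p in shares)
    dominance = max(shares)
    return len(counts), entropy_bits, dominance
```

Under this definition a dataset with a single extension scores 0 bits, and the score grows as files spread more evenly across more extensions, which matches the "coherent versus chaotic" reading described above.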
## Top‑50 Extension Profiler
The Top‑50 Extension Profiler examines the most common extensions in each dataset and describes how they are used.
For each extension, it reports how many files use it, how large those files are, and which categories dominate that extension.
This allows the LLM to see that a .jpg in one dataset may represent faces, while a .jpg in another dataset may represent scanned documents or sensitive content.
The profiler gives a detailed, sentence‑level explanation of how extensions behave differently across datasets.
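
The per-extension report described above might be assembled along these lines. The record shape `(path, size, category)`, the function name `profile_extensions`, and the sentence template are hypothetical; only the reported quantities (file count, total size, dominant category) come from the description.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def profile_extensions(records, top_n=50):
    """Describe the `top_n` most common extensions in one dataset.

    `records` is an iterable of (path, size_in_bytes, category) tuples.
    Returns one sentence per extension, most common first.
    """
    stats = defaultdict(lambda: {"files": 0, "bytes": 0, "categories": Counter()})
    for path, size, category in records:
        ext = PurePosixPath(path).suffix.lower() or "<none>"
        entry = stats[ext]
        entry["files"] += 1
        entry["bytes"] += size
        entry["categories"][category] += 1

    # Rank by file count and keep only the top_n extensions.
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["files"], reverse=True)[:top_n]

    sentences = []
    for ext, entry in ranked:
        category, hits = entry["categories"].most_common(1)[0]
        sentences.append(
            f"{ext}: {entry['files']} files, {entry['bytes']} bytes, "
            f"mostly {category} ({hits} of {entry['files']})"
        )
    return sentences
```

Running this once per dataset is what lets the same extension, such as `.jpg`, be described differently depending on which categories dominate it there.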
## Category–Extension Cross‑Mapper
The Category–Extension Cross‑Mapper reveals how categories and extensions interact within each dataset.
It shows which extensions are associated with photos, documents, archives, source code, or sensitive material, and it highlights mismatches such as source code appearing in a media dataset or tax documents appearing in a photo collection.
This mapping helps the LLM understand the semantic fingerprint of each extension and detect files that are out of place.
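
As a rough illustration of the mapping, the sketch below counts files per extension and category, then lists the pairs whose category is not expected for the dataset. The names `cross_map` and `out_of_place`, the `(path, category)` record shape, and the idea of passing an explicit set of expected categories are assumptions for the example; the actual script may derive expectations differently.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def cross_map(records):
    """Count files per extension and category.

    `records` is an iterable of (path, category) tuples.
    Returns {extension: Counter({category: file_count})}.
    """
    table = defaultdict(Counter)
    for path, category in records:
        ext = PurePosixPath(path).suffix.lower() or "<none>"
        table[ext][category] += 1
    return table


def out_of_place(table, expected_categories):
    """List (extension, category, file_count) pairs outside the expected set."""
    return sorted(
        (ext, category, n)
        for ext, categories in table.items()
        for category, n in categories.items()
        if category not in expected_categories
    )
```

For a photo collection one might call `out_of_place(table, {"photo"})` to surface tax documents or source code that ended up in it.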
## Anomaly Detector
The Anomaly Detector scans for files that do not belong in their dataset based on category, extension, or path patterns.
It identifies misplaced tax files, source code in media folders, adult content outside secure datasets, and oversized archives in working directories.
These anomalies become strong signals for reorganization and help the LLM focus on structural problems that need attention.
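
The four anomaly types listed above can be expressed as simple rules over file records. Everything concrete in this sketch is hypothetical: the `FileRecord` fields, the dataset kinds (`"documents"`, `"media"`, `"working"`), the category labels, the archive extension list, and the 1 GB size limit are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumed set of archive extensions; adjust to the archive's real contents.
ARCHIVE_EXTS = {".zip", ".tar", ".gz", ".7z"}


@dataclass
class FileRecord:
    path: str
    category: str
    size: int  # bytes


def find_anomalies(dataset_kind, secure, records, max_archive_bytes=1_000_000_000):
    """Return (path, reason) pairs for files that look out of place.

    `dataset_kind` is the dataset's assumed role and `secure` says
    whether it is a secure dataset.
    """
    anomalies = []
    for r in records:
        ext = PurePosixPath(r.path).suffix.lower()
        if r.category == "tax" and dataset_kind != "documents":
            anomalies.append((r.path, "tax file outside a documents dataset"))
        elif r.category == "source" and dataset_kind == "media":
            anomalies.append((r.path, "source code in a media dataset"))
        elif r.category == "adult" and not secure:
            anomalies.append((r.path, "adult content outside a secure dataset"))
        elif (
            ext in ARCHIVE_EXTS
            and dataset_kind == "working"
            and r.size > max_archive_bytes
        ):
            anomalies.append((r.path, "oversized archive in a working directory"))
    return anomalies
```

Returning a human-readable reason with each path keeps the output usable as direct input to the LLM.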
## Purpose Inference Engine
The Purpose Inference Engine synthesizes all available metadata to infer the intended role of each dataset.
It determines whether a dataset is meant for photos, documents, backups, sensitive material, source code, or long‑term archives.
By expressing these conclusions in natural language, the engine gives the LLM a clear understanding of what each dataset is supposed to be, which is essential for proposing a cleaner, more logical folder structure.
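
A very reduced version of this inference could look at category shares alone, as in the sketch below. The real engine is described as combining all available metadata, so treat this as a simplified stand-in: the function name `infer_purpose`, the category labels in `PURPOSE_BY_CATEGORY`, and the 60% threshold are assumptions.

```python
# Hypothetical mapping from a dominant category to a dataset purpose.
PURPOSE_BY_CATEGORY = {
    "photo": "photos",
    "document": "documents",
    "backup": "backups",
    "sensitive": "sensitive material",
    "source": "source code",
    "archive": "long-term archives",
}


def infer_purpose(name, category_counts, min_share=0.6):
    """Describe a dataset's likely purpose from its category file counts.

    `category_counts` maps category name to number of files.
    """
    total = sum(category_counts.values())
    if total == 0:
        return f"Dataset {name} is empty, so no purpose can be inferred."

    top = max(category_counts, key=category_counts.get)
    share = category_counts[top] / total
    if share < min_share:
        return (
            f"Dataset {name} has no clear purpose: its largest category, "
            f"{top}, covers only {share:.0%} of files."
        )

    # Unknown categories fall back to their own name as the purpose label.
    purpose = PURPOSE_BY_CATEGORY.get(top, top)
    return (
        f"Dataset {name} appears to be intended for {purpose} "
        f"({share:.0%} of files are {top})."
    )
```

Reporting "no clear purpose" instead of forcing a label gives the LLM an explicit cue that a dataset may need splitting.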