🐟 🐟 🐟 🐟 🐟

School Your Documents with Markov Chains

Fast, interpretable document relevance prediction in Python.
50K+ inferences per second. No GPU required.

pip install markrel

Why markrel?

🎯

Learns From Your Data

Instead of applying a fixed similarity threshold, markrel learns P(relevance) from your own domain data. Each similarity bin gets its own learned probability.
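The idea of one learned probability per similarity bin can be illustrated with a minimal sketch. This is not markrel's implementation: the helper names `fit_bin_probabilities` and `predict_bin_probabilities`, the equal-width binning, and the Laplace smoothing are all assumptions made for illustration, and only NumPy is required.

```python
import numpy as np


def fit_bin_probabilities(similarities, labels, n_bins=10, smoothing=1.0):
    """Estimate P(relevant | bin) from labelled similarity scores.

    Hypothetical helper for illustration; not part of markrel's API.
    """
    similarities = np.asarray(similarities, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Equal-width bin edges spanning the observed score range (an assumption).
    edges = np.linspace(similarities.min(), similarities.max(), n_bins + 1)

    # Map each score to a bin index in [0, n_bins - 1] using the interior edges.
    bin_ids = np.clip(np.digitize(similarities, edges[1:-1]), 0, n_bins - 1)

    # Count relevant examples and all examples per bin.
    relevant = np.bincount(bin_ids, weights=labels, minlength=n_bins)
    total = np.bincount(bin_ids, minlength=n_bins)

    # Laplace smoothing: an empty bin falls back to 0.5 instead of dividing by zero.
    probs = (relevant + smoothing) / (total + 2.0 * smoothing)
    return edges, probs


def predict_bin_probabilities(similarities, edges, probs):
    """Look up the learned probability for each new similarity score."""
    similarities = np.asarray(similarities, dtype=float)
    bin_ids = np.clip(np.digitize(similarities, edges[1:-1]), 0, len(probs) - 1)
    return probs[bin_ids]


edges, probs = fit_bin_probabilities([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], n_bins=2)
print(probs)  # [0.25 0.75]
```

Because each bin is estimated independently, the same lookup works whether larger scores mean "closer" (cosine similarity) or "farther" (euclidean distance). The per-bin table is also what makes the prediction easy to inspect.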

Lightning Fast

50,000+ predictions per second on CPU. Perfect for real-time applications and high-throughput pipelines.

🔍

Interpretable

See exactly why each document was selected. Inspect P(relevance) for every similarity bin.

🧩

Embedding Agnostic

Works with BERT, OpenAI, sentence-transformers, or TF-IDF. Use the embeddings that work best for you.

Benchmarks

Tested on the WikiQA dataset (6,165 question-answer pairs).

Embedding Model   Dimensions   F1 Score   AUC     Speed
BGE-M3            1024         0.343      0.815   51K/s
RoBERTa-large     1024         0.323      0.828   54K/s
MiniLM-L6         384          0.322      0.799   61K/s

Optimized Configurations

F1: 0.370 (35 bins, euclidean)
Recall: 1.000 (7 bins, euclidean)
Precision: 1.000 (24 bins, cos+euc)
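For readers comparing these configurations, the sketch below shows the standard definitions of precision, recall, and F1 for probabilities cut at a decision threshold. The page does not state which threshold was used or what the other two metrics were for each configuration, so the function name `precision_recall_f1` and the default threshold of 0.5 are assumptions for illustration only.

```python
def precision_recall_f1(labels, probs, threshold=0.5):
    """Compute precision, recall and F1 for probabilities cut at `threshold`.

    Hypothetical helper for illustration; labels are 1 (relevant) or 0.
    """
    predicted = [p >= threshold for p in probs]

    # Confusion-matrix counts for the positive (relevant) class.
    tp = sum(1 for y, hit in zip(labels, predicted) if y == 1 and hit)
    fp = sum(1 for y, hit in zip(labels, predicted) if y == 0 and hit)
    fn = sum(1 for y, hit in zip(labels, predicted) if y == 1 and not hit)

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


labels = [1, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.1]
print(precision_recall_f1(labels, probs, threshold=0.0))  # recall is 1.0
print(precision_recall_f1(labels, probs, threshold=0.8))  # precision is 1.0
```

In this toy data a very low threshold gives perfect recall and a very high one gives perfect precision, each at the cost of the other. That is why F1 is usually the more informative single number when judging a configuration.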

Quick Start

Python
# Install (shell command, run outside Python):
#   pip install markrel

# Import and train
from markrel import MarkovRelevanceModel

model = MarkovRelevanceModel(
    metrics=["euclidean"],  # Best single metric
    n_bins=35              # Optimized for F1
)

model.fit(queries, documents, labels)

# Predict relevance
probs = model.predict_proba(new_queries, new_documents)
# [0.82, 0.15, 0.91, ...]

With Modern Embeddings

from sentence_transformers import SentenceTransformer
from markrel import MarkovRelevanceModel

# Load BGE-M3 (highest F1 in the benchmarks above)
encoder = SentenceTransformer('BAAI/bge-m3')

# Encode your texts
query_emb = encoder.encode(["what is ML?"])
doc_emb = encoder.encode(["machine learning is..."])

# Train with embeddings
model = MarkovRelevanceModel(
    metrics=["euclidean"],
    use_text_vectorizer=False
)
# A single labelled pair, for illustration only; real training needs many pairs of both classes
model.fit(query_emb, doc_emb, [1])

How It Works

📥 Input: Query + Document
🔤 Embed: BGE-M3 or TF-IDF
📏 Similarity: 8 metrics
🐟 Markov Chain: Learn P(relevance)
Output: Relevance Score
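The similarity stage reduces each query-document pair of embeddings to a few scalar scores, which the later stage turns into a probability. The sketch below covers only the two metrics this page names (euclidean and cosine). The other six are not listed here, and the function name `pairwise_features` is hypothetical rather than part of markrel.

```python
import numpy as np


def pairwise_features(query_emb, doc_emb):
    """Return one (euclidean distance, cosine similarity) row per pair.

    Row i of `query_emb` is compared with row i of `doc_emb`.
    Hypothetical helper for illustration; not part of markrel's API.
    """
    q = np.asarray(query_emb, dtype=float)
    d = np.asarray(doc_emb, dtype=float)

    # Straight-line distance between the paired vectors.
    euclidean = np.linalg.norm(q - d, axis=1)

    # Cosine similarity, guarding against zero-length vectors.
    norms = np.linalg.norm(q, axis=1) * np.linalg.norm(d, axis=1)
    cosine = np.sum(q * d, axis=1) / np.where(norms == 0, 1.0, norms)

    return np.column_stack([euclidean, cosine])


features = pairwise_features([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])
print(features.shape)  # (2, 2)
```

Any embedding model that produces fixed-length vectors can feed this step, since only the pairwise scores are passed on.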

Perfect For

🔍 Semantic Search Re-ranking
📧 Email Classification
📄 Document Similarity
🤖 Chatbot Response Selection
Real-time Filtering
📊 Recommendation Systems