group :: socialrom

bookmarks (60)

MACE - Multi-Annotator Competence Estimation
MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that lets you evaluate redundant annotations of categorical data. It provides competence estimates of the individual annotators and the most likely answer to each item. If 10 annotators answer a question, and five answer 'yes' and five answer 'no' (a surprisingly frequent event), we would normally have to flip a coin to decide what the right answer is. If we knew, however, that one of the people who answered 'yes' is an expert on the question, while one of the others just always selects 'no', we would take this information into account and weight their answers accordingly. MACE does exactly that: it tries to find out which annotators are more trustworthy and upweights their answers. All you need to provide is a CSV file with one item per line. In tests, MACE's trust estimates correlated highly with the annotators' true competence, and it achieved accuracies of over 0.9 on several test sets. MACE can also take annotated items into account, if they are available; this helps to guide the training and improves accuracy.
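To make the weighting idea concrete, here is a minimal sketch of expectation-maximization for a simplified MACE-style model. It assumes each annotator either knows the true label (with probability `theta`, their competence) or guesses from a personal label distribution `xi`. This is not MACE's own code or command-line interface, and the actual tool adds refinements not shown here; the function name `estimate_competence` and its list-of-rows input format are hypothetical.

```python
import math


def estimate_competence(annotations, n_labels, n_iter=50, init_competence=0.8):
    """Simplified EM for a MACE-style competence model (not the MACE tool itself).

    annotations: one row per item, one column per annotator; each cell is a
    label in range(n_labels), or None if that annotator skipped the item.
    Returns (most likely label per item, competence estimate per annotator).
    """
    n_ann = len(annotations[0])
    theta = [init_competence] * n_ann                          # P(annotator knows the answer)
    xi = [[1.0 / n_labels] * n_labels for _ in range(n_ann)]   # label distribution when guessing
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over the true label of every item.
        posteriors = []
        for row in annotations:
            log_post = [0.0] * n_labels
            for j, a in enumerate(row):
                if a is None:
                    continue
                for t in range(n_labels):
                    # P(label a | truth t) = P(guessed a) + P(knew it) if a == t
                    lik = (1 - theta[j]) * xi[j][a] + (theta[j] if a == t else 0.0)
                    log_post[t] += math.log(lik)
            top = max(log_post)
            weights = [math.exp(v - top) for v in log_post]
            z = sum(weights)
            posteriors.append([w / z for w in weights])
        # M-step: re-estimate competence and guessing distribution per annotator.
        for j in range(n_ann):
            expected_knew, n_obs = 0.0, 0
            guess_counts = [1e-6] * n_labels  # tiny pseudo-count keeps every label possible
            for row, post in zip(annotations, posteriors):
                a = row[j]
                if a is None:
                    continue
                n_obs += 1
                # P(truth == a) * P(knew the answer | truth == a, gave label a)
                knew = post[a] * theta[j] / (theta[j] + (1 - theta[j]) * xi[j][a])
                expected_knew += knew
                guess_counts[a] += 1 - knew
            if n_obs:
                theta[j] = min(max(expected_knew / n_obs, 1e-3), 1 - 1e-3)
                total = sum(guess_counts)
                xi[j] = [c / total for c in guess_counts]
    labels = [max(range(n_labels), key=post.__getitem__) for post in posteriors]
    return labels, theta
```

With, for example, three mostly reliable annotators and two who always answer 0 on binary items, the two constant annotators' `theta` should fall toward the 1e-3 floor, so their votes stop influencing the predicted labels.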
4 years ago by @ghagerer
annotation-bias
inter-rater-agreement
crowdsourcing
Building a Private ChatGPT Interface With Azure OpenAI – Baldacchino Automation
https://automation.baldacchino.net/building-a-private-chatgpt-interface-with-azure-openai/
10 months ago by @ghagerer
cloud
llms
gpt3
chatgpt
azure
NLP Profiler - Profiling of Textual Dataset | Kaggle
Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources
6 months ago by @ghagerer
nlp-profiling
python
data-profiler
data-quality
LlamaFS: An Open-Source Self-Organizing File system with Llama-3
The recent release of this open-source project, LlamaFS, addresses the challenges associated with traditional file management systems, particularly in the context of overstuffed download folders, inefficient file organization, and the limitations of knowledge-based organization. These issues arise due to the manual nature of file sorting, which often leads to inconsistent structures and difficulty finding specific files. The disorganization in the file system hampers productivity and makes it challenging to locate important files quickly.
2 days ago by @ghagerer
metadata
file-system
llamafs
llms
Statistical Significance Tests for Comparing Machine Learning Algorithms
Comparing machine learning methods and selecting a final model is a common operation in applied machine learning. Models are commonly evaluated using resampling methods like k-fold cross-validation from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading as it is hard to know whether the difference between mean skill scores is real or the result of a statistical fluke.
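One simple way to ask whether a gap between mean fold scores is real is an exact paired sign-flip (permutation) test on the per-fold differences, sketched below under the assumption that both models were scored on the same folds. This is one option among several and not necessarily the test the linked post recommends; note also that cross-validation folds share training data, so the fold scores are not independent, which this sketch ignores.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on per-fold score differences.

    Under the null hypothesis the two models are interchangeable, so each fold's
    difference is equally likely to be positive or negative. Enumerating all
    2**k sign patterns gives an exact p-value (practical up to roughly 20 folds).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    as_extreme = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # small tolerance so the observed pattern itself always counts
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total
```

With ten folds where model A beats model B on every fold, only the all-positive and all-negative sign patterns are as extreme as the observed mean difference, so the p-value is 2/1024, about 0.002.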
5 years ago by @ghagerer
machine-learning
statistical-significance-tests
significance-tests
Latent Semantic Analysis & Sentiment Classification with Python
Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. LSA is an information retrieval technique that analyzes and identifies patterns in an unstructured collection of text and the relationships between them. LSA itself is an unsupervised way of uncovering synonyms in a collection of documents. To start, we take a look at how Latent Semantic Analysis is used in Natural Language Processing to analyze relationships between a set of documents and the terms that they contain. Then we go a step further to analyze and classify sentiment. We will review chi-squared feature selection along the way.
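The chi-squared feature selection step can be sketched in plain Python: for each term, build a contingency table of document presence versus class label and sum (observed - expected)^2 / expected over its cells. The function name and the presence-based counting are assumptions for illustration; scikit-learn's `chi2` scorer, for example, works from feature values such as term counts instead.

```python
from collections import Counter


def chi2_term_scores(docs, labels):
    """Chi-squared statistic between term presence and class label.

    docs: list of token lists; labels: one class label per document.
    Builds a 2 x C contingency table per term (present/absent vs. class).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    class_totals = Counter(labels)
    vocab = sorted(set().union(*doc_sets))
    scores = {}
    for term in vocab:
        present = Counter(lab for d, lab in zip(doc_sets, labels) if term in d)
        n_present = sum(present.values())
        chi2 = 0.0
        for cls, cls_total in class_totals.items():
            # (observed count, row total) for the "present" and "absent" rows
            cells = ((present[cls], n_present), (cls_total - present[cls], n - n_present))
            for observed, row_total in cells:
                expected = row_total * cls_total / n
                if expected > 0:
                    chi2 += (observed - expected) ** 2 / expected
        scores[term] = chi2
    return scores
```

Terms that occur only in one class (such as "good" in a toy positive/negative corpus) get the highest scores, so keeping the top-k terms by score is the selection step.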
5 years ago by @ghagerer
downprojection
chi-square
clustering
lsa
Jensen–Shannon divergence - Wikipedia
In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it always has a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as Jensen–Shannon distance.[3][4][5]
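The definition can be sketched directly: mix the two distributions into m = (P + Q) / 2 and average the KL divergences of P and Q to m. Using base-2 logarithms is an assumption here (natural logs are also common); it keeps the divergence in [0, 1]. Because m is never zero where P or Q is positive, the result stays finite even when KL(P || Q) itself is infinite.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if q is 0 where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the JS divergence, which is a metric."""
    return math.sqrt(js_divergence(p, q))
```

For the disjoint distributions [1, 0] and [0, 1], the KL divergence is infinite while the JS divergence reaches its maximum of 1 bit.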
4 years ago by @ghagerer
probability-distribution-similarity
Exploratory Data Analysis Using D-Tale
D-Tale is an interactive web-based library that consists of a Flask backend and a React front-end serving as an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently, this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex.
4 years ago by @ghagerer
pandas
development
python
visualization
data-science
Movie Review Data -- SentiWordNet
This page is a distribution site for movie-review data for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., "two and a half stars") and sentences labeled with respect to their subjectivity status (subjective or objective) or polarity. These data sets were introduced in the following papers:
4 years ago by @ghagerer
sentiwordnet
sentiment-analysis
dictionary-based
What meaning does the length of a Word2vec vector have? - Stack Overflow
When a word appears in different contexts, its vector gets moved in different directions during updates. The final vector then represents some sort of weighted average over the various contexts. Averaging over vectors that point in different directions typically results in a vector that gets shorter with increasing number of different contexts in which the word appears. For words to be used in many different contexts, they must carry little meaning. Prime examples of such insignificant words are high-frequency stop words, which are indeed represented by short vectors despite their high term frequencies ...
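A toy illustration of the averaging argument (not of word2vec training itself): if a word's updates point in n directions spread evenly over a half-circle, the mean of those unit vectors gets shorter as n grows. The 2-D setup and evenly spaced angles are assumptions chosen to keep the arithmetic visible.

```python
import math


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def averaged_length(n_contexts):
    """Length of the mean of unit vectors spread evenly over a half-circle."""
    angles = [math.pi * k / n_contexts for k in range(n_contexts)]
    return norm(mean_vector([[math.cos(a), math.sin(a)] for a in angles]))


# one "context" direction vs. progressively more varied contexts
lengths = [averaged_length(n) for n in (1, 2, 4, 8)]
```

The lengths come out at roughly 1.0, 0.71, 0.65 and 0.64 for 1, 2, 4 and 8 directions, mirroring the claim that words used in many different contexts end up with shorter vectors.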
4 years ago by @ghagerer
word-vector-length
word-vectors
word2vec
Measuring chatbot effectiveness - Visiativ Chatbot Solutions
These measurements are indispensable for tracking the results of your chatbot, identifying any stumbling blocks and continuously improving its performance. But which metrics should you choose?
a year ago by @ghagerer
chatbots
ChatGPT
evaluation
kpis
Google "We Have No Moat, And Neither Does OpenAI"
We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be? But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch. I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today.
a year ago by @ghagerer
openai
LLMs
open-source
google
Perplexity in Language Models. Evaluating language models using the… | by Chiara Campagnola | Towards Data Science
Perplexity is a useful metric to evaluate models in Natural Language Processing (NLP). This article will cover the two ways in which it is normally defined and the intuitions behind them. A language…
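A minimal sketch of two common, equivalent forms of perplexity, assuming you already have the probability the model assigned to each observed token: the exponentiated average negative log-likelihood, and 2 raised to the cross-entropy in bits. The function names are illustrative.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-likelihood of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def perplexity_from_entropy(token_probs):
    """2 ** (cross-entropy in bits); equal to perplexity() up to rounding."""
    bits = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** bits
```

A model that spreads its probability uniformly over four tokens has a perplexity of 4, i.e. it is as uncertain as a fair four-sided die.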
12 months ago by @ghagerer
perplexity
entropy
llms
metrics
Dynamic Few-Shot Prompting: Overcoming Context Limit for ChatGPT Text Classification | by Iryna Kondrashchenko | Jun, 2023 | Medium
The recent explosion in the popularity of large language models like ChatGPT has led to their increased usage in classical NLP tasks like language classification. This involves providing a context…
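The core of dynamic few-shot prompting is choosing, per query, the most similar labelled examples for each class and putting only those into the prompt, so the prompt stays within the context limit. The sketch below assumes a bag-of-words cosine similarity as a stand-in for the embedding model one would normally use; `select_examples`, `build_prompt` and the prompt wording are hypothetical and are not the API of any particular library.

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, pool, per_class=1):
    """Pick the `per_class` most similar (text, label) pairs for every label."""
    q = bag_of_words(query)
    by_label = {}
    for text, label in pool:
        by_label.setdefault(label, []).append((cosine(q, bag_of_words(text)), text))
    chosen = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda pair: pair[0], reverse=True)
        chosen.extend((text, label) for _, text in ranked[:per_class])
    return chosen


def build_prompt(query, examples):
    """Assemble a few-shot prompt from the selected examples only."""
    parts = ["Classify the text into one of the labels shown in the examples."]
    parts += [f"Text: {text}\nLabel: {label}" for text, label in examples]
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)
```

Selecting per class (rather than the k nearest overall) keeps every label represented in the prompt, which is one reasonable design choice among several.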
12 months ago by @ghagerer
scikit-learn
few-shot
llms
gpt3
zero-shot
classification
LMQL is a programming language for LLM interaction. | LMQL
Language Model Query Language
8 months ago by @ghagerer
python
llms
programming
Topic Modeling with Llama 2
In this article, we will explore how we can use Llama 2 for topic modeling without the need to pass every single document to the model. Instead, we will leverage BERTopic, a modular topic modeling technique that can use any LLM for fine-tuning topic representations.
8 months ago by @ghagerer
llama
llms
bert
topic-modeling
Advanced RAG with Knowledge Graphs (Neo4J demo)
I recently created a demo for some prospective clients of mine, demonstrating how to use Large Language Models (LLMs) together with graph databases like Neo4J. The two have a lot of interesting interactions, namely that you can now create knowledge graphs more easily than ever before, by having AI find the graph entities and relationships from your unstructured data, rather than having to do all that manually. On top of that, graph databases also have some advantages for Retrieval Augmented Generation (RAG) applications compared to vector search, which is currently the prevailing approach to RAG.
7 months ago by @ghagerer
youtube
rag
llms
knowledge-graphs
BERT Vector Space shows issues with unknown words · Issue #164 · google-research/bert · GitHub
I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this will generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, it doesn't mean that they will be meaningful in terms of cosine distance (since cosine distance is a linear space where all dimensions are weighted equally).
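The point about cosine distance can be illustrated with a toy example using hypothetical 3-dimensional token vectors, not real BERT outputs: after mean pooling, a large offset shared by all tokens dominates the cosine similarity, so two sentences whose remaining dimensions point in nearly orthogonal directions still score close to 1.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors dimension by dimension (no CLS token, no weighting)."""
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: every dimension contributes on an equal footing."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical token vectors: dimension 0 is a large offset shared by every token,
# dimensions 1-2 carry the (toy) meaning of each sentence.
sentence_a = mean_pool([[10.0, 1.0, 0.0], [10.0, 0.8, 0.1]])
sentence_b = mean_pool([[10.0, 0.0, 1.0], [10.0, 0.1, 0.9]])

full_similarity = cosine(sentence_a, sentence_b)        # dominated by the shared offset
meaning_only = cosine(sentence_a[1:], sentence_b[1:])   # offset dimension dropped
```

Here `full_similarity` is about 0.99 while `meaning_only` is about 0.11, so the pooled vectors look nearly identical under cosine even though their "meaning" dimensions disagree.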
5 years ago by @ghagerer
cls
sentence-embeddings
bert
The Conversational Intelligence Challenge 2 (ConvAI2) - NIPS (NeurIPS) 2018 Competition
There are currently few datasets appropriate for training and evaluating models for non-goal-oriented dialogue systems (chatbots); and equally problematic, there is currently no standard procedure for evaluating such models beyond the classic Turing test. The aim of our competition is therefore to establish a concrete scenario for testing chatbots that aim to engage humans, and become a standard evaluation tool in order to make such systems directly comparable.
5 years ago by @ghagerer
chatbots
challenge
Data Augmentation in NLP - Towards Data Science
In the natural language processing (NLP) field, it is hard to augment text due to the high complexity of language. Not every word can be replaced by others (e.g., a, an, the), and not every word has a synonym. Even changing a single word can change the context entirely. Generating augmented images in computer vision, on the other hand, is relatively easy: even after introducing noise or cropping out a portion of an image, a model can still classify it.
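Two simple text-level operations that are still commonly used despite these difficulties, synonym replacement and random word swap (in the spirit of the EDA technique), can be sketched as below. The tiny `SYNONYMS` table is a hypothetical stand-in for a real thesaurus such as WordNet, and, as the excerpt warns, a replacement can still shift the meaning of the sentence.

```python
import random

# Hypothetical toy thesaurus; a real setup might use WordNet or embeddings.
SYNONYMS = {"good": ["fine", "great"], "movie": ["film"], "quick": ["fast", "rapid"]}


def synonym_replace(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_swap(tokens, n, rng):
    """Swap the positions of two random tokens, n times."""
    out = list(tokens)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out
```

Words without an entry (such as "a" or "the") are never touched by `synonym_replace`, which reflects the excerpt's point that not every word has a usable substitute.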
4 years ago by @ghagerer
data-augmentation

BibSonomy
