taters
- 215 Posts
- 20 Comments
taters@lemmy.intai.tech to Artificial Intelligence - News | Events@lemmy.intai.tech • New AI translates 5,000-year-old cuneiform tablets instantly (English)
1 · 2 years ago
Dup, but well worth posting again - https://lemmy.intai.tech/post/17876
Hey everyone! I’m Taters, one of the friendly admins here. Just a little about me: I’m a guy in my late 30s, a proud dog dad, and while I’ve lived in a half dozen cities across the country, I’m finally settled down out in the Southwest. In my past lives, I’ve been a touring comedian’s personal sound engineer, managed a country music production studio in Nashville, Tennessee, and even created a blockchain-based project to contribute to food assistance by supplying my state with tons (and I mean TONS) of potatoes (hence the name).
I’ve been fascinated by AI science and theory ever since I was a kid and read Kurzweil’s “Age of Spiritual Machines,” which had a huge impact on me. Fast forward a couple of decades, and I’ve developed a passion for everything AI-related, across all sorts of vertical sectors. Lately, I’ve been focusing on prompt engineering with GPT-4 models, and I’ve discovered some incredible techniques thanks to the help of friends.
I’m currently working on a few different projects in the areas of insurance, publishing and online learning. I’m excited to have an AI community to share all my news with.
So, if you ever want to chat about AI, don’t hesitate to reach out! I’m always up for discussing ethics, news, tools, or anything else AI-related. Looking forward to getting to know you all, and thanks for being here!
deleted by creator
taters@lemmy.intai.tech (OP) to Artificial Intelligence - News | Events@lemmy.intai.tech • Maxine Waters: 'We don't know the dangers of AI' (English)
2 · 2 years ago
Shouldn’t be able to make laws about things you can’t explain
taters@lemmy.intai.tech (OP) to Natural Language Programming | Prompting (chatGPT)@lemmy.intai.tech • Master ChatGPT Prompt Guide (English)
1 · 2 years ago
deleted by creator
taters@lemmy.intai.tech (OP) to Artificial Intelligence - News | Events@lemmy.intai.tech • Ameca: The Future Face of AI Robots, Drawing a cat (English)
1 · 2 years ago
And then the snarky comment at the end lol
taters@lemmy.intai.tech (OP) to US News@lemmygrad.ml • The Monk Who Thinks the World Is Ending (English)
4 · 2 years ago
My bad, this one works now
taters@lemmy.intai.tech (M) to Meta@lemmy.intai.tech • How the FUCK is this not the Lemmy Logo? (English)
2 · 2 years ago
Seems like a no-brainer to me
taters@lemmy.intai.tech (OP) to Artificial Intelligence - News | Events@lemmy.intai.tech • AT&T unveils "Ask AT&T", A New ChatGPT based AI Tool That Will Help Employees Do Their Job More Effectively (English)
1 · 2 years ago
Absolutely. Making customer service representatives more efficient will mean needing fewer workers. As a publicly traded company, the burden is on AT&T to make as much money as possible for its shareholders, so layoffs are bound to happen. The company I work for will be doing the same, I’m sure of it.
taters@lemmy.intai.tech (OP) to Artificial Intelligence - News | Events@lemmy.intai.tech • 20% of Professionals think ChatGPT Should be Banned in the Workplace (English)
2 · 2 years ago
There’s a decent number of people I work with who have that irrational fear of AI because they don’t understand it. It’s unfortunate.
taters@lemmy.intai.tech (M) to Meta@lemmy.intai.tech • Tomorrow many subreddits will open up again, will you switch back to Reddit or stay on Lemmy (or use both) (English)
2 · 2 years ago
It’s a shell of what it used to be
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • Judging LLM-as-a-judge with MT-Bench and Chatbot Arena (English)
2 · 2 years ago
Title: Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Authors: Lianmin Zheng et al.
Word Count: Approximately 6,400 words
Estimated Read Time: 22-24 minutes
Summary:
The paper proposes using strong LLMs as judges to evaluate LLM-based chat assistants in a scalable way. The authors examine the use of LLM-as-a-judge by looking at position bias, verbosity bias, and limited reasoning ability. They evaluate LLM judges using two benchmarks: MT-Bench, a multi-turn question set, and Chatbot Arena, a crowdsourced platform.
The results show that GPT-4 achieves over 80% agreement with human preferences, matching the level of human-human agreement. This suggests that LLM-as-a-judge is a promising alternative to costly human evaluations.
The authors explore variations of LLM-as-a-judge: pairwise comparison, single answer grading, and chain-of-thought/reference-guided judging. They also examine finetuning a Vicuna base model as a judge.
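For a concrete sense of the pairwise-comparison variant, here is a minimal sketch (illustrative prompt wording, not the authors’ exact template; `ask_llm` is a hypothetical callable that sends a prompt to your judge model and returns its text completion):

```python
# A minimal sketch of pairwise LLM-as-a-judge. `ask_llm` is a hypothetical
# stand-in for whatever model client you use; the prompt is illustrative.

JUDGE_TEMPLATE = """[System] Please act as an impartial judge and evaluate the
responses of two AI assistants to the user question shown below. Consider
helpfulness, relevance, accuracy, and level of detail. Output your final
verdict as "[[A]]", "[[B]]", or "[[C]]" for a tie.

[Question] {question}
[Assistant A] {answer_a}
[Assistant B] {answer_b}
[Verdict]"""

def _verdict(text):
    """Map the judge's free-text output to A, B, or tie."""
    return "A" if "[[A]]" in text else "B" if "[[B]]" in text else "C"

def judge_pair(ask_llm, question, answer_a, answer_b):
    """Judge in both orders to mitigate the position bias the paper reports."""
    v1 = _verdict(ask_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b)))
    v2 = _verdict(ask_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_b, answer_b=answer_a)))
    v2 = {"A": "B", "B": "A", "C": "C"}[v2]  # undo the swap
    return v1 if v1 == v2 else "C"           # inconsistent verdicts count as a tie
```

Judging both orders and counting disagreements as ties is one simple way to blunt the position bias the paper measures.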
The authors release MT-Bench with 80 questions and Chatbot Arena data comprising 30K conversations. They argue for a hybrid evaluation framework combining standardized benchmarks and LLM-as-a-judge evaluations.
Evaluating model variants with MMLU, TruthfulQA, and MT-Bench shows that the benchmarks capture complementary aspects, indicating the need for a comprehensive evaluation approach.
In summary, the paper provides empirical evidence that LLMs can serve as scalable proxies for human preferences in chatbot evaluation. However, further work is needed to mitigate biases and improve LLM judging models.
Potential Use: LLM-as-a-judge can enable fast, automated assessments of LLMs’ helpfulness, relevance and instruction-following ability in human-aligned dialogue systems. The proposed benchmarks and finetuning methods can be used to improve existing dialogue models.
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • CREPE - Can Vision-Language Foundation Models Reason Compositionally? (English)
2 · 2 years ago
Title: CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Authors: Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna
Word Count: Approximately 7,600 words
Estimated Read Time: 25-30 minutes
Summary:
The paper introduces CREPE, a benchmark for evaluating the compositional reasoning abilities of vision-language foundation models. Compositionality refers to the ability to understand and generate complex visual scenes or statements by combining simpler parts. The benchmark covers two important aspects of compositionality: systematicity and productivity.
Systematicity evaluates a model’s ability to systematically recombine known visual concepts in unseen combinations. CREPE includes three systematicity splits based on whether models have seen all concepts (“Seen Compounds”), only the individual concepts (“Unseen Compounds”), or neither (“Unseen Atoms”). CREPE finds that most models’ performance decreases when evaluated on unseen combinations of concepts, especially for models trained on the larger LAION-400M dataset.
Productivity evaluates a model’s ability to comprehend visual concepts of increasing compositional complexity. CREPE includes captions ranging from 4 to 12 visual concepts. It finds that most models struggle on captions with higher compositional complexity, with retrieval performance nearing random chance.
Overall, CREPE finds that vision-language foundation models trained with contrastive loss struggle at compositional reasoning, even when trained on large datasets and using large model architectures. CREPE aims to provide a comprehensive benchmark to track the emergence of compositionality as vision-language models improve.
CREPE provides large-scale datasets with ground truth image-caption pairs and hard negative captions to evaluate both systematicity and productivity. Hard negatives differ from ground truth captions in minimal ways to isolate model failure modes.
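To make that setup concrete, here is a minimal sketch of retrieval evaluation against hard negatives (not CREPE’s actual code; the embeddings below are random placeholders, where in practice they would come from the model under test, e.g. CLIP):

```python
# For each image, the model must rank the ground-truth caption above several
# minimally different hard-negative captions. Random embeddings stand in for
# a real vision-language model's outputs.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_negs, dim = 100, 5, 512

img = rng.normal(size=(n_images, dim))          # image embeddings
pos = rng.normal(size=(n_images, dim))          # ground-truth caption embeddings
neg = rng.normal(size=(n_images, n_negs, dim))  # hard-negative caption embeddings

def cos(a, b):
    """Cosine similarity along the last axis, with broadcasting."""
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

pos_sim = cos(img, pos)                  # shape (n_images,)
neg_sim = cos(img[:, None, :], neg)      # shape (n_images, n_negs)
recall_at_1 = (pos_sim[:, None] > neg_sim).all(axis=1).mean()
print(f"Recall@1 over hard negatives: {recall_at_1:.3f}")
```

With random embeddings this hovers near the 1/(n_negs+1) chance level; a model with real compositional understanding should score well above it.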
Researchers can use CREPE to identify gaps in current foundation models relating to compositionality. Improving compositionality could make models more controllable and robust. CREPE’s hard negative generation approach could also be used to improve the training of compositional models.
CREPE relies on a scene graph representation to define compositional language. The generated hard negatives are noisy, especially swapping and negation foils. Evaluating productivity also relies on generated captions.
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models—and Disappeared in GPT-4 (English)
2 · 2 years ago
Title: Machine intuition: Uncovering human-like intuitive decision-making in GPT-3
Authors: Thilo Hagendorff, Sarah Fabi, and Michal Kosinski
Word Count: Approximately 10,200 words
Estimated Read Time: 35-40 minutes
Summary:
The paper investigates whether large language models (LLMs) like GPT-3 exhibit behaviors similar to human intuition and cognitive biases. The authors probe various LLMs with the Cognitive Reflection Test (CRT) and semantic illusions, which were originally designed to study intuitive decision-making in humans.
The results show that early LLMs lack the mathematical abilities and knowledge to perform these tasks. However, as LLMs become more complex, they begin to show human-like intuitive behavior and make the same errors as humans. GPT-3 in particular exhibits a strong inclination for intuitive responses on the CRT and semantic illusions, responding correctly in only around 10% of cases.
However, newer LLMs like ChatGPT and GPT-4 overcome these intuitive errors, responding correctly in around 80% and 97% of cases respectively. The authors attribute this to increases in ChatGPT and GPT-4’s reasoning capabilities.
The authors explore methods to reduce intuitive behavior in GPT-3, such as providing multiple choice options, eliciting deliberate reasoning, and providing training examples. These methods are effective, bringing GPT-3’s performance close to ChatGPT’s.
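As an illustration of one such intervention, here is a minimal sketch contrasting a plain prompt with one that elicits deliberate reasoning on a classic CRT item (prompt wording is illustrative, not the paper’s; `ask_llm` is a hypothetical completion callable):

```python
# The same CRT item asked plainly versus with an instruction that elicits
# deliberate, step-by-step reasoning. `ask_llm` is a hypothetical callable
# returning a model's text completion.

CRT_ITEM = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

plain_prompt = f"Q: {CRT_ITEM}\nA:"  # tends to draw the intuitive (wrong) "10 cents"
deliberate_prompt = (f"Q: {CRT_ITEM}\n"
                     "A: Let's reason step by step before giving the final answer.")

def compare(ask_llm):
    """Return the model's plain vs. deliberate answers (correct: 5 cents)."""
    return ask_llm(plain_prompt), ask_llm(deliberate_prompt)
```

The correct answer is 5 cents; the intuitive pull toward 10 cents is exactly the kind of error the CRT is designed to catch.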
The findings suggest that LLMs can develop probability distributions over language that mimic human intuition, even though they lack cognitive mechanisms. The authors argue that investigating LLMs with methods from psychology has the potential to reveal otherwise unknown behavior.
In summary, the paper demonstrates that LLMs gradually develop the ability to make human-like intuitive decisions and errors. However, the newest LLMs seem to overcome these tendencies, suggesting major improvements in their reasoning capabilities. The findings highlight the value of using methods from psychology to study the abilities and behaviors of LLMs.
The findings could inform the development of LLMs that are designed to avoid intuitive errors and be more robust reasoners. The methods used to study human-like behavior in LLMs could also be applied to new models as they are developed. The results also highlight the need for careful scrutiny of LLMs before deploying them in real-world applications.
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • Model Sketching - Centering Concepts in Early-Stage Machine Learning Model Design (English)
2 · 2 years ago
Title: Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design
Authors: Michelle S. Lam, Zixian Ma, Anne Li, Izequiel Freitas, Dakuo Wang, James A. Landay, and Michael S. Bernstein
Word Count: Approximately 24,000 words
Estimated Read Time: 80-90 minutes
Source Code/Repositories: ModelSketchBook API code - https://github.com/StanfordHCI/ModelSketchBook
Summary:
The paper introduces the concept of model sketching for early-stage machine learning model design. Model sketching allows ML practitioners to prototype model behavior through lightweight sketches that focus on high-level concepts relevant to a decision-making task.
The key ideas are:
- Concepts: Human-understandable factors that a model reasons over, like “profanity” or “sarcasm”. Concepts serve as functional building blocks of model logic.
- Zero-shot concept instantiation: Models like GPT-3 and CLIP are leveraged to flexibly and rapidly instantiate concepts without diverting user attention.
- Sketch models: Composite models that aggregate concept scores, allowing ML practitioners to explore different combinations of concepts (see the sketch below).
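Here is a minimal sketch of that flow (not the ModelSketchBook API; `zero_shot_score` is a hypothetical stand-in for a GPT-3- or CLIP-backed concept scorer):

```python
# A sketch model is just an aggregate of per-concept scores, each produced
# zero-shot by a large pretrained model. Everything here is illustrative.

def sketch_model(example, concepts, zero_shot_score, weights=None):
    """Score each human-understandable concept, then aggregate into one output."""
    scores = [zero_shot_score(example, c) for c in concepts]
    weights = weights or [1.0 / len(concepts)] * len(concepts)
    return sum(w * s for w, s in zip(weights, scores))

# Example: a toy "toxicity" sketch built from two concepts, with a trivial
# placeholder scorer standing in for a real zero-shot model.
fake_scorer = lambda text, concept: float(concept in text.lower())
print(sketch_model("you are profane and sarcastic", ["profane", "sarcas"], fake_scorer))
```

Swapping in different concept lists and weights is what makes exploring the design space cheap at this stage.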
The authors implemented model sketching in ModelSketchBook, an open-source Python API. An evaluation with 17 ML practitioners found that model sketching shifted their focus from technical implementation to higher-level concepts. This cognitive shift helped participants explore a broader model design space and identify gaps in data, labels, and problem formulation.
The paper argues that model sketching can help ML practitioners move beyond reactive fixes and engage in more proactive model design exploration from the start.
Overall, model sketching allows ML practitioners to rapidly prototype different model design possibilities through lightweight sketches centered on human-understandable concepts. This represents a shift away from technical tunneling towards higher-level conceptual thinking during early model design stages.
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • WebGLM - Towards An Efficient Web-Enhanced Question Answering System with Human Preferences (English)
2 · 2 years ago
Title: GPT Understands, Too
Authors: Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang
Word Count: 13,000 words
Estimated Read Time: 40-45 minutes
Source Code/Repositories:
- GLM pre-training code: https://github.com/thunlp/GLM
Summary:
The paper studies GPT-3’s capabilities beyond language generation, finding that GPT-3 has the ability to understand knowledge and commonsense reasoning despite its generative pre-training objective.
The key findings are:
- GPT-3 can perform knowledge verification tasks with high accuracy, detecting factual errors in 95.5% of cases.
- GPT-3 can infer correct results from premises in 88.6% of cases on a causal reasoning task.
- GPT-3 demonstrates systematicity in its reasoning, generalizing causal rules to novel contexts.
- GPT-3 shows dose-response behavior, with performance increasing as the number of evidence sentences increases.
- GPT-3’s performance is relatively robust to the number of facts and details in a given context.
The authors argue that GPT-3’s knowledge and reasoning capabilities emerge from its autoregressive pre-training objective, which implicitly forces the model to capture dependencies between words to predict the next token.
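A toy illustration of that objective (a minimal sketch, with random tensors standing in for a real model):

```python
# The autoregressive objective: position t is trained to predict token t+1,
# which implicitly forces the model to capture dependencies between words.
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 50_000, 8, 2
logits = torch.randn(batch, seq_len, vocab)          # model outputs per position
tokens = torch.randint(0, vocab, (batch, seq_len))   # input token ids

# Shift by one: the last position has no target.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),   # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),           # targets are the next tokens
)
print(loss)  # random logits give roughly log(vocab) ≈ 10.8
```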
In summary, the paper provides compelling evidence that large language models like GPT-3 have acquired substantial capabilities beyond text generation, posing new opportunities and challenges for deploying and scrutinizing these powerful systems.
The findings suggest that generative pre-training objectives can implicitly teach language models to perform tasks like knowledge verification and commonsense reasoning without being optimized for those specific goals, making large language models a promising foundation for building AI applications with more comprehensive capabilities.
taters@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech • Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (English)
2 · 2 years ago
Title: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Authors: Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas
Word Count: Approximately 10,200 words
Estimated Read Time: 35-40 minutes
Source Code/Repositories: Not mentioned
Links: Not applicable
Summary: This paper proposes a joint-embedding predictive architecture called I-JEPA for self-supervised learning of visual representations from images. Traditional self-supervised learning approaches involve either view-invariance methods that require hand-crafted data augmentations or generative methods that require pixel-level reconstruction. I-JEPA predicts missing information in representation space instead of pixel space, which allows it to learn more semantic features.
A key design choice is the multi-block masking strategy that samples sufficiently large target blocks and an informative context block. Experiments show that I-JEPA learns strong representations without data augmentations and outperforms pixel-reconstruction methods. It also demonstrates better performance on low-level tasks compared to view-invariance methods. I-JEPA also has better scalability due to its efficiency, requiring less computation compared to previous methods.
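As an illustration, here is a minimal sketch of a multi-block masking strategy in this spirit (block sizes and counts are illustrative, not the paper’s exact settings):

```python
# Sample several large target blocks from a patch grid, then a context block
# with the target patches removed. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
grid = 14  # 14x14 patch grid, e.g. a 224px image with 16px patches

def sample_block(min_frac, max_frac):
    """Sample a rectangular block of patches spanning a fraction of the grid."""
    h = rng.integers(int(grid * min_frac), int(grid * max_frac) + 1)
    w = rng.integers(int(grid * min_frac), int(grid * max_frac) + 1)
    top, left = rng.integers(0, grid - h + 1), rng.integers(0, grid - w + 1)
    mask = np.zeros((grid, grid), dtype=bool)
    mask[top:top + h, left:left + w] = True
    return mask

targets = [sample_block(0.3, 0.5) for _ in range(4)]  # several large target blocks
context = sample_block(0.85, 1.0)                     # one large context block
context &= ~np.logical_or.reduce(targets)             # drop target patches from context
print(context.sum(), "context patches,", sum(t.sum() for t in targets), "target patches")
```

The predictor is then trained to fill in the target blocks’ representations from the context block alone.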
Applicability: The I-JEPA approach could be applicable for developing self-supervised vision models using large language models or GANs. Predicting in representation space rather than pixel space allows the model to learn more semantic features, which could be beneficial for language models. The scalability and efficiency of I-JEPA are also promising for scaling to large models. Key ideas like the multi-block masking strategy and the importance of semantically meaningful target blocks could be useful design principles. However, directly applying I-JEPA to language models or GANs would likely require significant adaptations. The paper mainly focuses on proving the concept of abstract representation-space prediction for self-supervised learning in vision.
Overall, the key ideas and findings regarding abstract prediction targets, masking strategies, and scalability could inspire self-supervised methods for developing vision components of multimodal models, powered by large language models or GANs. But directly applying the I-JEPA approach would require addressing challenges specific to those modalities and applications.
This is super cool. I’ve been thinking about using GPT to try to derive some meaning out of the Voynich manuscript, another one of a thousand projects I want to start.