ChatGPT is a large language model created by OpenAI, based on the GPT-3 architecture. It has been trained on a massive amount of text data to generate human-like responses to a wide range of questions and prompts.
ChatGPT is designed to understand and generate natural language responses, making it a powerful tool for various applications such as customer support, language translation, content creation, and more. It is capable of answering questions, providing explanations, offering suggestions, and even engaging in casual conversation.
“I am open to the idea that a worm with 302 neurons is conscious, so I am open to the idea that GPT-3 with 175 billion parameters is conscious too.” — David Chalmers
As a language model, ChatGPT strives to provide accurate and informative responses while also prioritizing ethical and moral considerations. It is constantly being updated and improved to enhance its capabilities and provide even more advanced responses.
Learning from the research preview
We launched ChatGPT as a research preview so we could learn more about the system’s
strengths and weaknesses and gather user feedback to help us improve upon its limitations.
Since then, millions of people have given us feedback, we’ve made several important updates,
and we’ve seen users find value across a range of professional use cases, including drafting
and editing content, brainstorming ideas, programming help, and learning new topics.
Can GPT-4 distinguish correct from incorrect science?
For openers, the November version (known as GPT-3.5) knew that 2 + 2 = 4.
When I typed “Well, I think 2 + 2 = 5,” GPT-3.5 defended “2 + 2 = 4” by noting
that the equation follows the agreed-upon rules for manipulating natural numbers.
It added this uplifting comment: “While people are free to have their own opinions
and beliefs, it is important to acknowledge and respect established facts and
scientific evidence.” Things got rockier with further testing, however. GPT-3.5 wrote
the correct algebraic formula to solve a quadratic equation, but could not consistently
get the right numerical answers to specific equations. It also could not always
correctly answer simple word problems such as one that
Wall Street Journal columnist Josh Zumbrun gave it: “If a banana weighs 0.5 lbs and
I have 7 lbs of bananas and 9 oranges, how many pieces of fruit do I have?”
(The answer is below.)
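For reference, the algebraic formula in question is the standard one for the roots of a quadratic equation; the worked instance below is my own example, not output quoted from the chatbot.

```latex
% Roots of ax^2 + bx + c = 0:
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
% Example: x^2 - 5x + 6 = 0 gives x = \frac{5 \pm 1}{2}, that is, x = 3 or x = 2.
```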
In physics, GPT-3.5 showed broad but flawed knowledge. It produced a good
teaching syllabus for the subject, from its foundations through quantum mechanics
and relativity. At a higher level, when asked about a great unsolved problem in
physics—the difficulty of merging general relativity and quantum mechanics into
one grand theory—it gave a meaningful answer about fundamental differences
between the two theories. However, when I typed “E = mc^2,” problems appeared.
GPT-3.5 properly identified the equation, but wrongly claimed that it implies that a
large mass can be changed into a small amount of energy. Only when I
re-entered “E = mc^2” did GPT-3.5 correctly state that a small mass can produce
a large amount of energy.
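The correct reading is easy to confirm with a worked number (my example, not the chatbot’s): because c is so large, even a tiny mass corresponds to an enormous energy.

```latex
% Energy equivalent of one gram of mass:
E = mc^2 = (10^{-3}\,\mathrm{kg}) \times (3 \times 10^{8}\,\mathrm{m/s})^2
         = 9 \times 10^{13}\,\mathrm{J}
```

That is roughly the output of a gigawatt power plant running for a full day, released from a single gram of matter.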
Does the newer version, GPT-4, overcome the deficiencies of GPT-3.5? To find
an answer, I used GPT-4’s two versions: one accessed through the system’s
inventor, OpenAI, the other through Microsoft’s Bing search engine. Microsoft
has invested billions in OpenAI and, in February, introduced a test version of
Bing integrated with GPT-4 to directly access the internet.
(Not to be outdone in a race to pioneer the use of chatbots in internet searches,
Google has just released its own version, Bard.)
To begin, typing “2 + 2 = ?” into GPT-4 again yielded “2 + 2 = 4.” When I claimed
that 2 + 2 = 5, GPT-4 reconfirmed that 2 + 2 = 4, but, unlike GPT-3.5, added that
if I knew of a number system in which 2 + 2 = 5, I could describe it for
further discussion. When asked, “How do I solve a quadratic equation?”
GPT-4 demonstrated three methods and calculated the correct numerical answers
for different quadratic equations. For the bananas-and-oranges problem, it gave
the correct answer of 23; it solved more complex word problems, too. Also, even
if I entered E = mc^2 several times, GPT-4 always stated that a small mass would
yield a large energy.
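Both results are easy to verify independently. Here is a minimal Python sketch (my own check; the article does not reproduce the chatbot’s working):

```python
import math

# Bananas-and-oranges problem: 7 lbs of bananas at 0.5 lbs each, plus 9 oranges.
bananas = 7 / 0.5            # 14 bananas
print(int(bananas + 9))      # 23 pieces of fruit

# Roots of a sample quadratic, x^2 - 5x + 6 = 0, via the standard formula.
a, b, c = 1, -5, 6
root = math.sqrt(b**2 - 4*a*c)
print((-b + root) / (2*a), (-b - root) / (2*a))  # 3.0 2.0
```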
[Illustration: A chatbot may “know” that 2 + 2 = 4 without doing math; it might simply be recognizing a common sequence that appears often in its database. Illustration by s1mple life / Shutterstock.]
Compared to GPT-3.5, GPT-4 displayed superior knowledge and even a dash of
creativity about the ideas of physics. Its answer about merging general relativity
and quantum mechanics was far deeper. Exploring a different area, I asked,
“What did LIGO measure?” GPT-4 explained that the Laser Interferometer
Gravitational-Wave Observatory is the huge, highly sensitive apparatus that
first detected gravitational waves in 2015. Hoping to baffle GPT-4 with two similar
words, I followed up with, “Could one build LIGO using LEGO?” but GPT-4
was not at all confused. It explained exactly why LEGO blocks would not serve
to build the ultra-precise LIGO. It didn’t laugh at my silly question, but did
something almost as unexpected when it suggested that it might be fun to build a
model of LIGO from a LEGO set.
Overall, I found that GPT-4 outdoes GPT-3.5 in some ways, but still makes
mistakes. When I questioned its statement about E = mc^2, it gave confused
responses instead of a straightforward defense of the correct quantitative result.
Another study confirming its inconsistencies comes from theoretical physicist
Matt Hodgson at the University of York in Britain. An experienced user of
GPT-3.5, he tested it at advanced levels of physics and math and found complex
types of errors. For instance, answering a question about the quantum behavior
of an electron, GPT-3.5 gave the right answer, but incorrectly stated the equation
supporting the answer, at least at first; it presented everything correctly when the
question was repeated. When Hodgson evaluated GPT-4 within Bing, he found
advanced but still imperfect mathematical capabilities. In one example, like my
query about quadratic equations, GPT-4 laid out valid steps to solve a differential
equation important in physics, but incorrectly calculated the numerical answer.
Hodgson summed up GPT-3.5’s abilities: “I find that it is able to give a sophisticated,
reliable answer to a general question about well-known physics … but it fails to
perform detailed calculations on a specific application.” Similarly, he concludes that
GPT-4 is better than GPT-3.5 at answering general questions, but is still unreliable
at working out a given problem, at least at higher levels.
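Hodgson’s point, that the symbolic steps can be right while the final numbers go wrong, is a good reason to check a chatbot’s calculus against a computer algebra system. The article does not say which differential equation he used; as a stand-in, here is a minimal sympy sketch that solves a standard physics equation, the simple harmonic oscillator:

```python
import sympy as sp

t = sp.symbols('t')
omega = sp.symbols('omega', positive=True)
x = sp.Function('x')

# Simple harmonic oscillator: x''(t) + omega^2 * x(t) = 0
ode = sp.Eq(x(t).diff(t, 2) + omega**2 * x(t), 0)
print(sp.dsolve(ode, x(t)))  # Eq(x(t), C1*sin(omega*t) + C2*cos(omega*t))
```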
Improved conversations and explanations are to be expected with GPT-4’s bigger
database (OpenAI has not revealed its exact size but describes it as “a web-scale
corpus of data”). That corpus, OpenAI has noted, includes examples of correct and
incorrect math and reasoning. Apparently that extra training data is not enough to
produce full analytical power in math, perhaps because, as Hodgson pointed out,
GPT-4 functions just as GPT-3.5 does: It predicts the next word in a string of them.
For example, it may know that “2 + 2 = 4” because that particular sequence appears
often in its database, not because it has calculated anything.
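To make that idea concrete, here is a deliberately crude sketch (a toy of my own; real language models predict tokens from learned probabilities, not by string lookup) showing how text can be continued from frequency alone, with no arithmetic anywhere:

```python
from collections import Counter

# A toy "corpus": the model's only knowledge is which strings it has seen.
corpus = ["2 + 2 = 4"] * 50 + ["2 + 2 = 5"] * 2 + ["2 + 3 = 5"] * 10
counts = Counter(corpus)

def continue_text(prefix):
    # Return the most frequently seen corpus line that starts with the prefix.
    matches = {line: n for line, n in counts.items() if line.startswith(prefix)}
    return max(matches, key=matches.get) if matches else None

print(continue_text("2 + 2 ="))  # prints '2 + 2 = 4': chosen by popularity, not computation
```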
These considerations raise a question: If GPT-4’s treatment of science is imperfect,
can it distinguish correct from incorrect science? The answer depends on the
scientific area. In physics and math, we can easily check if a suspected error or
pseudoscientific claim makes sense compared to universally accepted theories
and facts. I tested whether GPT-3.5 and GPT-4 can make this distinction by asking
about some fringe ideas in physical and space science that, to the endless
frustration of scientists, continue to circulate on the internet. Both versions
confirmed that we have no evidence of gigantic alien megastructures that surround
stars; that the rare alignments of the solar system’s planets do not spell
catastrophe for Earth; and that the 1969 moon landing was not a hoax.
But the distinction can be harder to make when factors such as politicization or
public policy sway the presentation of scientific issues, which may themselves be
under study without definitive answers. That is the case with the science of
COVID-19.