{"id":37462,"date":"2022-01-26T09:00:00","date_gmt":"2022-01-26T07:00:00","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=37462"},"modified":"2025-08-29T17:21:30","modified_gmt":"2025-08-29T14:21:30","slug":"what-is-natural-language-processing","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/what-is-natural-language-processing\/","title":{"rendered":"What is natural-language processing?"},"content":{"rendered":"<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>What is natural-language processing?<\/strong><\/h2>\n<p>Natural-language processing (NLP) is a set of methods that help a computer system understand human speech.<\/p>\n<p>NLP is a branch of <a href=\"https:\/\/forklog.com\/en\/news\/artificial-intelligence-what-it-is-and-how-it-works\">artificial intelligence<\/a>. It remains one of AI\u2019s hardest problems and is still far from fully solved.<\/p>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>When did NLP emerge<\/strong>?<\/h2>\n<p>The roots of NLP go back to the 1950s, when the British scientist Alan Turing published the essay <a href=\"https:\/\/ru.wikipedia.org\/wiki\/%D0%92%D1%8B%D1%87%D0%B8%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5_%D0%BC%D0%B0%D1%88%D0%B8%D0%BD%D1%8B_%D0%B8_%D1%80%D0%B0%D0%B7%D1%83%D0%BC\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">\u201cComputing Machinery and Intelligence\u201d<\/a>, proposing the <a href=\"https:\/\/ru.wikipedia.org\/wiki\/%D0%A2%D0%B5%D1%81%D1%82_%D0%A2%D1%8C%D1%8E%D1%80%D0%B8%D0%BD%D0%B3%D0%B0\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">\u201cTuring test\u201d<\/a>. One of its criteria is a machine\u2019s ability to interpret and generate human speech.<\/p>\n<p>On January 7th 1954, researchers at Georgetown University demonstrated machine translation. Engineers translated more than 60 Russian sentences into English fully automatically. 
The event spurred the field and went down in history as the <a href=\"https:\/\/ru.wikipedia.org\/wiki\/%D0%94%D0%B6%D0%BE%D1%80%D0%B4%D0%B6%D1%82%D0%B0%D1%83%D0%BD%D1%81%D0%BA%D0%B8%D0%B9_%D1%8D%D0%BA%D1%81%D0%BF%D0%B5%D1%80%D0%B8%D0%BC%D0%B5%D0%BD%D1%82\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Georgetown experiment<\/a>.<\/p>\n<p>In 1966, the German-born American computer scientist Joseph Weizenbaum built the first chatbot, ELIZA, at MIT. The program mimicked a conversation with a psychotherapist using active-listening techniques.<\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"485\" height=\"314\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/chatbot-eliza.png\" alt=\"What is natural-language processing?\" class=\"wp-image-162928\" srcset=\"https:\/\/forklog.com\/wp-content\/uploads\/chatbot-eliza.png 485w, https:\/\/forklog.com\/wp-content\/uploads\/chatbot-eliza-300x194.png 300w\" sizes=\"auto, (max-width: 485px) 100vw, 485px\" \/><figcaption>A dialogue with \u201cELIZA\u201d. Source: Wikipedia.<\/figcaption><\/figure>\n<p>In essence, the system paraphrased the user\u2019s messages to create the impression of understanding. In reality it did not grasp the substance of the dialogue. When it could not find an answer, it typically replied \u201cI see\u201d and steered the conversation elsewhere.<\/p>\n<p>That same year the Automatic Language Processing Advisory Committee (ALPAC) released a <a href=\"https:\/\/www.nap.edu\/read\/9547\/chapter\/1\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">report<\/a> concluding that a decade of research had fallen short. Funding for machine translation was sharply cut.<\/p>\n<p>In the decades that followed, breakthroughs were scarce until the rise of the first machine-learning algorithms in the 1980s. 
Around the same time, statistical machine-translation systems appeared and research resumed.<\/p>\n<p>NLP boomed in the 2010s as deep-learning algorithms took off. Many of today\u2019s mainstays appeared then\u2014chatbots, autocorrect, voice assistants and the like. Recurrent neural networks became the default tool for many tasks.<\/p>\n<p>Another revolution came in 2019, when OpenAI <a href=\"https:\/\/openai.com\/blog\/better-language-models\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">unveiled the language model<\/a> Generative Pre-Trained Transformer 2, or GPT-2. Unlike earlier generators, it could produce long, coherent passages, answer questions, compose verse and even suggest new recipes.<\/p>\n<p>A year later OpenAI introduced <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GPT-3<\/a>, and big technology firms began showcasing their own large language models.<\/p>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>How do NLP systems work?<\/strong><\/h2>\n<p>To answer that, consider how we humans use natural language.<\/p>\n<p>When we hear or read a phrase, several processes occur in parallel:<\/p>\n<ul class=\"wp-block-list\">\n<li>perception;<\/li>\n<li>understanding of meaning;<\/li>\n<li>response.<\/li>\n<\/ul>\n<p>Perception is the conversion of a sensory signal into symbols. We might hear a particular word or see it rendered in different fonts. Either way, the input must be turned into a single representation: words written with letters.<\/p>\n<p>Understanding meaning is the hardest task\u2014one that even people with their natural intelligence often mishandle. 
Lacking context or misreading a phrase can cause confusion, even serious conflict.<\/p>\n<p>For example, in 1956, at the height of the Cold War between the USSR and the United States, Soviet leader Nikita Khrushchev delivered a speech that <a href=\"http:\/\/content.time.com\/time\/subscriber\/article\/0,33009,867329,00.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">contained the phrase<\/a> \u201cWe will bury you.\u201d Americans took it too literally, construing it as a threat of nuclear attack. In fact, Khrushchev meant only that socialism would outlive capitalism; the phrase was an interpretation of a thesis by Karl Marx.<\/p>\n<p>The incident quickly escalated into an international scandal, prompting apologies from Soviet diplomats and the Communist Party\u2019s general secretary.<\/p>\n<p>Hence the importance of grasping meaning and context\u2014spoken or written\u2014to avoid situations that can affect people\u2019s lives.<\/p>\n<p>A response is the outcome of decision-making. It is comparatively simple: form a set of possible replies based on the perceived meaning, the context and, perhaps, internal states.<\/p>\n<p>NLP algorithms follow much the same pattern.<\/p>\n<p>Perception is the conversion of incoming information into a machine-readable set of symbols. For a chatbot, text is already in that form. For audio or handwriting, it must first be converted\u2014something modern neural networks do well.<\/p>\n<p>Responding to text has also been solved by weighing alternatives and comparing results. 
A chatbot may return a text answer from its knowledge base; a voice assistant may act on a smart-home device\u2014say, switching on a light.<\/p>\n<p>Understanding is trickier, and worth considering separately.<\/p>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>How do AI systems understand speech?<\/strong><\/h2>\n<p>Today, three approaches to language understanding are common:<\/p>\n<ul class=\"wp-block-list\">\n<li>statistical;<\/li>\n<li>formal-grammatical;<\/li>\n<li>neural.<\/li>\n<\/ul>\n<p><strong>Statistical<\/strong> methods are widely used in machine translation, automated reviewers and some chatbots. The idea is to feed a model huge corpora of text to uncover statistical regularities. Such models are then used to translate or generate text, sometimes with an awareness of context.<\/p>\n<p>The <strong>formal-grammatical<\/strong> approach is a mathematical apparatus that aims to determine the meaning of a natural-language phrase as precisely and unambiguously as a machine can. This is not always possible: some phrases are unclear even to people.<\/p>\n<p>For rich languages such as Russian or English, a precise, detailed mathematical description is extremely difficult. Formal methods are therefore more often used for syntactic analysis of artificial languages, which are designed to remove ambiguity.<\/p>\n<p>In the <strong>neural<\/strong> approach, <a href=\"https:\/\/forklog.com\/en\/news\/what-is-a-neural-network\">neural networks<\/a> in deep learning are used to recognise the meaning of an input phrase and generate a response. 
They are trained on stimulus\u2013response pairs, where the stimulus is a natural-language phrase and the response is the system\u2019s reply in the same language or some action by the system.<\/p>\n<p>It is a highly promising approach, but it also inherits all the drawbacks of neural networks.<\/p>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>What are NLP systems used for?<\/strong><\/h2>\n<p>NLP systems address many tasks, from building chatbots to analysing vast document collections.<\/p>\n<p>Core NLP tasks include:<\/p>\n<ul class=\"wp-block-list\">\n<li>text analysis;<\/li>\n<li>speech recognition;<\/li>\n<li>text generation;<\/li>\n<li>text-to-speech.<\/li>\n<\/ul>\n<p>Text analysis is the intelligent processing of large volumes of information to find patterns and similarities. It includes information extraction, search, sentiment analysis, question\u2013answering and opinion mining.<\/p>\n<p>Speech recognition converts audio or spoken language into digital information. 
A simple example: when you address Siri, the algorithm recognises speech in real time and converts it into text.<\/p>\n<p>Text generation is the creation of text using computer algorithms.<\/p>\n<p>Text-to-speech is the reverse of speech recognition\u2014for instance, voice assistants reading information from the internet.<\/p>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>Where are NLP technologies applied?<\/strong><\/h2>\n<p>There are many everyday uses of NLP technologies:<\/p>\n<ul class=\"wp-block-list\">\n<li>email services use Bayesian spam filtering, a statistical NLP method that compares incoming messages with a database to identify junk mail;<\/li>\n<li>text editors such as Microsoft Word or Google Docs use language processing to correct not only grammatical errors but also context-specific ones;<\/li>\n<li>virtual keyboards on modern smartphones can predict subsequent words given sentence context;<\/li>\n<li>voice assistants such as Siri or Google Assistant can recognise users, execute commands, transcribe speech, search the internet, control smart-home devices and more;<\/li>\n<li>accessibility apps on PCs and smartphones can read aloud text and interface elements for the visually impaired thanks to speech-synthesis algorithms;<\/li>\n<li>large language models with vast numbers of parameters, such as GPT-3 or BERT, can generate texts of varying length and genre, assist with search and predict a sentence from its first few words;<\/li>\n<li>machine-translation systems use statistical and language models to translate texts between languages.<\/li>\n<\/ul>\n<\/div>\n<div class=\"wp-block-text-wrappers-cards single_card\">\n<h2 class=\"card_label\"><strong>What difficulties arise when using NLP technologies?<\/strong><\/h2>\n<p>Tasks in NLP have often relied on recurrent neural networks, which suffer from several drawbacks, including:<\/p>\n<ul class=\"wp-block-list\">\n<li>sequential word 
processing;<\/li>\n<li>an inability to retain large amounts of information;<\/li>\n<li>susceptibility to the <a href=\"https:\/\/towardsdatascience.com\/the-exploding-and-vanishing-gradients-problem-in-time-series-6b87d558d22\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">vanishing\/exploding-gradient problem<\/a>;<\/li>\n<li>the inability to process information in parallel.<\/li>\n<\/ul>\n<p>Beyond that, popular methods often misread context, requiring careful additional tuning.<\/p>\n<p>Large language models address many of these issues, but they bring challenges of their own\u2014chiefly accessibility. Training a large model such as GPT-3 or BERT is difficult, though big companies have increasingly released them openly.<\/p>\n<p>Many models also work only with popular languages, ignoring rarer dialects. That affects a system\u2019s ability to recognise diverse accents.<\/p>\n<p>In text processing via optical character recognition, many algorithms still struggle with handwriting.<\/p>\n<p>Beyond technical shortcomings, NLP can be misused. In 2016, Microsoft launched the chatbot Tay on Twitter, which learned from its human interlocutors. After just 16 hours the <a href=\"https:\/\/www.theverge.com\/2016\/3\/24\/11297050\/tay-microsoft-chatbot-racist\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">company shut the bot down<\/a> after it began posting racist and offensive tweets.<\/p>\n<p>In 2021, fraudsters in the UAE faked the voice of a company executive and convinced a bank employee to transfer $35m to their accounts.<\/p>\n<p>A similar case <a href=\"https:\/\/gizmodo.com\/scammer-successfully-deepfaked-ceos-voice-to-fool-under-1837835066\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">occurred<\/a> in 2019 with a British energy firm, where scammers stole about $243,000 by impersonating the CEO with a synthetic voice.<\/p>\n<p>Large language models can be used for mass spam, harassment or disinformation. 
The creators of GPT-3 <a href=\"https:\/\/openai.com\/blog\/openai-api\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">warned<\/a> about this. They also said their model exhibits biases toward certain groups. However, OpenAI said it had reduced GPT-3\u2019s toxicity and, at the end of 2021, granted broader developer access and allowed customisation.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Natural-language processing is a set of methods that help a computer system understand human speech.<\/p>\n","protected":false},"author":1,"featured_media":37463,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[2113],"tags":[2130,438],"class_list":["post-37462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cryptorium","tag-101-artificial-intelligence","tag-artificial-intelligence"],"aioseo_notices":[],"amp_enabled":true,"views":"81","promo_type":"1","layout_type":"1","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/37462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=37462"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/37462\/revisions"}],"predecessor-version":[{"
id":37464,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/37462\/revisions\/37464"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/37463"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=37462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=37462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=37462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}