Help


Frequently Asked Questions (FAQs)

Why does the View Text link on the full Page screen show misspellings and badly-formed words?

The View Text option in Pennsylvania Newspaper Archive displays machine-generated text that is produced by Optical Character Recognition (OCR) software. OCR is a fully automated process that converts the visual image of numbers and letters into computer-readable numbers and letters. Computer software can then search the OCR-generated text for words, phrases, numbers, or other characters. However, OCR is not 100 percent accurate. Particularly if the original item has extraneous markings on the page, unusual text styles, or very small fonts, the searchable text that OCR generates will contain errors that cannot be corrected by automated means. Digitized microfilmed newspapers also vary widely in image quality, which depends on the condition of the original newspaper, its condition and any deterioration when it was microfilmed, and the film itself.

Although errors in the process are unavoidable, OCR is still a powerful tool for making text-based items accessible to searching. For example, important concept words often appear more than once within an article. Therefore, if OCR misreads one instance of a key word in a passage, but correctly reads the second instance, the passage will still be found in a full-text search.
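The benefit of repeated key words can be put in rough numbers. The sketch below is illustrative only: it assumes each occurrence of a word is read correctly with the same probability and independently of the others, and the 0.9 figure is a made-up value, not a measured accuracy for this archive.

```python
def chance_found(p_correct: float, occurrences: int) -> float:
    """Chance that at least one occurrence of a word is read correctly.

    Assumes every occurrence has the same, independent probability
    p_correct of being recognized (a simplification of real OCR).
    """
    return 1 - (1 - p_correct) ** occurrences


# Hypothetical per-word accuracy of 0.9, for illustration only.
for n in (1, 2, 3):
    print(n, round(chance_found(0.9, n), 4))
```

Under these assumptions, each additional occurrence of a key word makes it less likely that a full-text search misses the passage.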

Why do diacritics and non-English language characters sometimes appear "Romanized" or not in their original alphabets?

The Newspaper Directory provides access to newspaper title records cataloged according to standard bibliographic rules. Until recently, most non-English language characters were difficult to represent in library records, so Romanization (standard rules for transliterating other alphabets to the Roman alphabet) was used to convey phonetic pronunciations of non-English words.

How do I cite or reference newspaper Directory records and pages for re-use (e.g., in a Web site or other electronic display) or reference (e.g., in a bibliography or journal article)?

Pennsylvania Newspaper Archive supports persistent links to newspaper directory records and pages by providing a predictable URL, displayed in the descriptive information for that object. Using the proposed URI Template syntax the links will use the pattern:

  • https://panewsarchive.psu.edu/lccn/{lccn}/{date}/ed-{edition}/seq-{image sequence}

Where:

  • lccn is the Library of Congress Control Number for the newspaper
  • date is the date of the issue, specified as yyyy-mm-dd (e.g. 1902-01-30)
  • edition is the edition number for that date (e.g. 1)
  • seq is the image sequence number (e.g. 23) for that issue.

For example:

  • https://panewsarchive.psu.edu/lccn/sn84026749/1903-01-08/ed-1/seq-2
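If you need to generate many such links, the pattern can be filled in programmatically. The following is a minimal sketch; the function name page_url and its parameter names are hypothetical and are not part of the site.

```python
from datetime import date

BASE_URL = "https://panewsarchive.psu.edu"


def page_url(lccn: str, issue_date: date, edition: int, seq: int) -> str:
    """Fill in the persistent-link pattern for a single newspaper page."""
    # date.isoformat() yields yyyy-mm-dd, the format the pattern expects.
    return (
        f"{BASE_URL}/lccn/{lccn}/{issue_date.isoformat()}"
        f"/ed-{edition}/seq-{seq}"
    )


# Rebuilds the example link for page 2 of the 8 January 1903 issue.
print(page_url("sn84026749", date(1903, 1, 8), 1, 2))
```

Passing a date object rather than a string keeps the month and day zero-padded.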

When describing Pennsylvania Newspaper Archive as the source of content, please use the URL and a website citation, such as "from Pennsylvania Newspaper Archive".


How to View

Images of historic newspaper pages, as well as uncorrected page text, are displayed through your web browser. However, Pennsylvania Newspaper Archive also contains high-resolution images (JPEG2000) and enhanced text (PDF) that may require special viewers. Most viewers can be downloaded free from vendor sites. The links below explain the various formats used and how to access them.

Download and View Pages Offline

PDF (Portable Document Format, .pdf)

  • Used for page images
  • Viewer: Adobe Acrobat Reader
  • Adobe text-only download page
  • Sample PDF
  • About this sample

JPEG2000 (.jp2)

  • Wavelet compression technology
  • Tiling supports decompression of only that portion of the image requested by the user
  • Compression ratio is approximately 20:1, depending on image content and color depth
  • Viewers for Windows: ERDAS ER Viewer, Kdu_show, IrfanView with JPEG2000 plug-in
  • Viewers for OS X: Preview supports baseline JP2 only; commercial software may be needed to view tiled JP2 files, such as those in Pennsylvania Newspaper Archive.
  • Sample JPEG2000 page
  • About this sample

Some Web browsers incorrectly assume that QuickTime (automatically included with the browser software) can display a JPEG2000 image, but JPEG2000 (.jp2) is not a "native" Web format. To work around this, download the JPEG2000 (JP2) image by right-clicking the image link (e.g., "JP2 (4.0 Mb)"). In the dialog box that appears, you will see "Save Link As..." or "Save Target As..." (depending on the Web browser used). Selecting this option will download the image to your desktop for further review.

To view the JPEG2000 (.jp2) file you will need JPEG2000-compatible software, such as the viewers listed above.


Basic Searching in Pennsylvania Newspaper Archive

Pennsylvania Newspaper Archive provides access to historic newspaper pages digitized under the NEH/LC National Digital Newspaper Program (NDNP). For more information on the scope and content of the program, see http://www.neh.gov/projects/ndnp.html.

Search Pennsylvania Newspaper Archive to find

  • information on persons, places, or events;
  • specific topics or news of the day;
  • concepts or ideas;
  • unique passages of text, such as the source of a frequently-quoted phrase.

Users of Pennsylvania Newspaper Archive have the option of performing basic or advanced searches. The basic search box is designated as the blue Search Pages tab and is found on many of the pages of the site. Basic search options are limited to state, time period, and key words located near each other. The basic search returns all supported languages.

For basic searches, results listed first are most likely to be relevant to your search. Results will appear higher in the list when they contain

  • exact matches of your search terms;
  • more of your search terms;
  • repeated search terms;
  • search terms that occur near each other.

Your searches will yield better results if you keep the following points in mind:

  • Common words such as and, not, and the are ignored by the search engine.
  • Case of letters is ignored. For example, Civil and civil are treated the same.
  • Diacritic characters (accent marks, in non-English text) and other special characters produce inaccurate results, so plain (unaccented) letters should be substituted for letters with diacritics.
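The points above can be applied to a query before you type it into the search box. The sketch below is only an illustration of that clean-up, not the archive's own processing: the stop-word set contains just the three words named above, and normalize_query is a hypothetical helper name.

```python
import unicodedata

# Illustrative stop words only; the archive's full list is not given here.
STOP_WORDS = {"and", "not", "the"}


def normalize_query(query: str) -> list[str]:
    """Lower-case a query, replace accented letters, and drop stop words."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", query)
    # Keep the base letters and discard the combining marks.
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [word for word in plain.lower().split() if word not in STOP_WORDS]


print(normalize_query("The Civil War and Señor Muñoz"))
```

This handles letters that decompose into a plain letter plus an accent mark; characters without such a decomposition are left unchanged.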

Pennsylvania Newspaper Archive's search engine uses language-specific dictionaries to include word variants for your search terms. This is often called stemming. For example, the search term house, when stemmed in English, would also return words like houses and housing.

For more search options, see Advanced Searching in Pennsylvania Newspaper Archive below. For information about language support in Pennsylvania Newspaper Archive, see Searching by Language in Pennsylvania Newspaper Archive.

Advanced Searching in Pennsylvania Newspaper Archive

To make the most of searching this text, take advantage of the search options provided on the Search page.

  • To limit your search to a particular geographic area, select one or more States.
  • Or, you can limit your search to a particular newspaper, or select several newspapers, picked from the list of titles currently available in Pennsylvania Newspaper Archive.
  • In addition or alternatively, you can search the entire date range available (default), or select a specific date and limit your search to a specific year, month, or even day, using the begin date and end date lists provided. (Note: selecting the same begin month/day/year and end month/day/year will provide links to every page available for that specific date.)
  • In addition or alternatively, enter a specific search term or terms in the Keyword boxes provided. The operators provided will influence the results of your search significantly and can be used in separate searches or in conjunction within a single search.

Search for a Phrase

  • Select the Advanced Search tab and enter your phrase in the appropriate "...with the phrase" search box.
  • When searching for a phrase, enter the words in the order they are most likely to occur.
  • The order of search words does not affect the scope of the search results, but it will affect the order of their display.

Search for Words Near Each Other

  • Select the Advanced Search tab and enter your keywords into the "...with the words" search box.
  • Select a numeric value for how close the words should be to each other (proximity).
  • This type of search can be helpful in narrowing results on a given person, place or event to a specific aspect of that person, place or event. For example: "Roosevelt conservation" within 10 words will result mostly in articles about President Theodore Roosevelt's Conservation policies during his administration.
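To make the idea of proximity concrete, the sketch below checks whether two words occur within a given number of words of each other in a passage. It is an illustration only: the function name is hypothetical, distance is assumed to mean the difference in word positions, and the archive's search engine may count words differently.

```python
def within_proximity(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if the two words occur within max_distance words of each other."""
    # Split on whitespace, trim surrounding punctuation, and ignore case.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


passage = "President Roosevelt spoke at length about forest conservation today."
print(within_proximity(passage, "Roosevelt", "conservation", 10))
print(within_proximity(passage, "Roosevelt", "conservation", 5))
```

In this passage the two words are six positions apart, so the check succeeds for a proximity of 10 and fails for a proximity of 5. The sketch matches whole words only, so a possessive such as "Roosevelt's" would not count as a match.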

Too Many Results - If a search generates too many results, try using more specific terms and/or limiting to a specific State of publication or a particular newspaper title. Use the search box options in combination to narrow your results. For example, use "President Roosevelt" as a phrase and "Roosevelt conservation" within 10 words to narrow results to text about only President Roosevelt's conservation policies.


Too Few Results - If a search generates too few results, try alternate terms or broader subjects and relax any limiting criteria (date ranges, state limitations, etc.).


Because language changes, be sure to use search terms used at the time the materials were created, even if those terms are now obsolete. For example, the following historic terms will produce more results than their modern-day counterparts:

Modern Usage vs. Historic Usage

Modern Usage            Historic Usage
gas, service station    filling station
African American        Afro American, Negro
voting rights           suffrage

Use the names of towns, landmarks, bridges, buildings, and other geographic features that were current when the materials you are searching were created. For instance, the state of Oklahoma was referred to as both "Indian Territory" and "Oklahoma Territory" prior to its admission as a state, so searching for "Indian Territory" may produce more search results if searching on topics related to Oklahoma.

Matching a phrase can be useful for searching place names or when common words have a particular sense used in combination.

For example, the term "normal school" was used in the early twentieth century to describe schools for training teachers. Searching for the phrase may eliminate results containing the words "normal" and "school" in unrelated ways.

Note: Some very common words, such as and, of, the, a, and to, are ignored even when matching exact phrases.

Search and Browsing Tips

  • Many browsers support tabbed browsing, which opens a new pane in the current window, either in the background or the foreground. Users of Pennsylvania Newspaper Archive have reported this as a useful way to navigate search results, bringing up each result in a new tab. To do so, click with the right-hand mouse button (for Mac, hold down the Command key) and select "Open Link in New Tab."
  • Search results are displayed on a page that can easily be bookmarked or navigated to via the "Back" button on the browser. Every page in the Pennsylvania Newspaper Archive application can be bookmarked, but only the addresses containing newspaper pages should be treated as canonical for purposes of citations and long-term referrals. These addresses are displayed in the address bar of the browser, and no special treatment is required for adding them to a citation database. (Select the "Persistent Link" URL displayed on each newspaper page view to store the link without search text highlighted.)
  • All pages are digitally scanned (primarily from microfilm), described, and automatically processed for full-text searching through a process called Optical Character Recognition (OCR). The resulting text is organized in normal reading order (by column) and left uncorrected. Search strategies should take this into account, for example by searching for shorter words and phrases when possible in order to maximize the number of search results returned.
  • One helpful way to use the full-text search feature is to enter a term or phrase containing many words that characterize the topic you wish to investigate. A full-text search will then retrieve pages with similar passages, displaying thumbnail page images with red highlights visible representing the occurrence of searched terms. This visual interface allows for quick review of full pages and search terms to determine the most useful results to view at full-size. An alternate results list is available through the List View, which will display descriptive textual links to individual pages, where search terms will be highlighted in red wherever they occur on the page.
  • Selecting a search result will bring up the newspaper page, initially displaying the full page. To read or view the page more closely, select the + or - to magnify the image, use the mouse scroll wheel, or simply click on the page image. Additionally, you can use the cursor hand to "grab" and move the image any direction, within the page frame. To return to the original full-page display, select the "go home" icon on the floating navigation bar.
  • In addition to the action icons used for this page image, other icons on this bar provide access to alternate digital formats for this newspaper page which can be downloaded. Click on the text link to download these formats.
  • In some newspapers in Pennsylvania Newspaper Archive, issues or pages in logical sequence are not available digitally (usually because images were absent from the microfilm used for digitization). Whenever possible, any known information about these issues is provided, as follows:
    • Not digitized, published
    • Not digitized, not published
    • Not digitized, publishing unknown

    A good and historically significant example of missing issues is in the San Francisco Call, where the April 19th and April 20th issues from 1906 are missing due to the devastating San Francisco earthquake that prevented the newspaper from publishing on those days.

Searching by Language

Pennsylvania Newspaper Archive supports language-specific searching in English, French, German, Italian, and Spanish (although not all languages may be represented at this time). By default, in both Basic and Advanced Search, all content is searched together regardless of language. To limit searches to a specific language, conduct an Advanced Search and choose the appropriate language from the Language drop-down menu. For additional technical information on how languages are encoded and identified for search, see current NDNP Technical Guidelines at http://www.loc.gov/ndnp/guidelines/.

Why use language-specific search?

Pennsylvania Newspaper Archive's search engine utilizes language-specific dictionaries to include word variants for your search terms. This is often called "stemming". For example, the search term house, when stemmed, would also return words like houses and housing. In Spanish, words like hermano would include stems such as hermanos. By default, the exact match (unstemmed) results will be ranked higher than the stemmed results.
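The effect of stemming and of ranking exact matches first can be sketched as follows. This is a deliberately crude illustration: the suffix list and the function names are hypothetical, and the archive relies on language-specific dictionaries rather than this kind of suffix stripping.

```python
# Toy suffix list, checked in the order given (longest first).
SUFFIXES = ("ing", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank_matches(term: str, candidates: list[str]) -> list[str]:
    """Return candidates sharing the term's stem, with exact matches first."""
    stem = toy_stem(term)
    matches = [w for w in candidates if toy_stem(w) == stem]
    # sorted() is stable and False sorts before True, so exact matches
    # move to the front while the remaining order is preserved.
    return sorted(matches, key=lambda w: w.lower() != term.lower())


print(rank_matches("house", ["housing", "horse", "house", "houses"]))
print(rank_matches("hermano", ["hermanos", "hermano", "hermana"]))
```

A toy stemmer like this one will wrongly merge or split many words, which is one reason a real search engine uses per-language rules.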

Other reasons for language-specific search may be more content related. For example, reporting in Spanish about the building of the Panama Canal may convey a different perspective than reporting in the mainstream English-speaking press.
