Package: LexisNexisTools
Title: Working with Files from 'LexisNexis'
Version: 0.1.2
Authors@R: person("Johannes", "Gruber", email = "j.gruber.1@research.gla.ac.uk", 
  role = c("aut", "cre"))
Description: My PhD supervisor once told me that everyone doing newspaper
 analysis starts by writing code to read in files from the 'LexisNexis'
 newspaper archive (retrieved e.g., from <http://www.nexis.com/> or any of the 
 partner sites). However, while this is a nice exercise I do recommend, not
 everyone has the time. This package takes TXT files downloaded from the
 newspaper archive of 'LexisNexis' in Since this package takes in TXT files
 which are unstructured in the sense that beginning and end of an article are
 not clearly indicated, the main function lnt_read() relies on certain keywords
 that signal to R where an article begins, ends and where meta-data is stored.
 lnt_checkFiles() thus tests if all keywords are in place. Every article in every
 TXT file should start with "X of X DOCUMENTS" and end with "LANGUAGE:". The
 end of the metadata is usually indicated by "LENGTH:". Some measures were
 taken to eliminate problems but where these keywords appear inside an article
 or headline, test1 or test2 from the lnt_checkFiles() will return FALSE and
 lnt_read() will not be able to do its job. In these cases, it is recommended to
 slightly alter the TXT files, e.g. by changing a headline to "language: never
 stop learning new ones" instead of "LANGUAGE: never stop learning new
 ones"---as "language:" without capital letters is not picked up by the
 functions.
Depends: R (>= 3.3.0)
License: GPL-3
Imports: stringi (>= 1.1.7), data.table (>= 1.10.4.3), methods (>=
        3.3.0), quanteda (>= 1.1.0), scales (>= 0.5.0), stats (>=
        3.3.0), utils (>= 3.3.0), stringdist (>= 0.9.4.0)
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2018-03-31 16:47:41 UTC; johannes
Author: Johannes Gruber [aut, cre]
Maintainer: Johannes Gruber <j.gruber.1@research.gla.ac.uk>
Repository: CRAN
Date/Publication: 2018-04-09 11:53:36 UTC
