Questions tagged [hdfs]

For questions regarding the Hadoop Distributed File System (HDFS), which is part of the Apache Hadoop project.

73 questions
13
votes
4 answers

In Hadoop, how to show current process of -copyFromLocal

I am still a newbie learner of Hadoop, and this time I was trying to process a 106GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait for a long time without a clue about the current…
Bang Dao
  • 233
  • 2
  • 6
7
votes
2 answers

HBASE Space Used Started Climbing Rapidly

Update 4,215: After looking at space usage inside of hdfs, I see that .oldlogs is using a lot of space: 1485820612766 /hbase/.oldlogs So new questions: What is it? How do I clean it up? How do I keep it from growing again? What caused it to…
Kyle Brandt
  • 81,077
  • 70
  • 299
  • 442
7
votes
2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster. As such we are starting out small with a four node setup. (1 master node, and 3 worker nodes) Each node will have 6TB of storage. (6 x 1TB disks) We went with a SuperMicro 4-node chassis so…
Matt Keller
  • 221
  • 4
  • 7
6
votes
2 answers

Hadoop HDFS: set file block size from commandline?

I need to set the block size of a file when I load it into HDFS, to some value lower than the cluster block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks. I've done this before within a…
BigChief
  • 398
  • 1
  • 2
  • 12
5
votes
1 answer

Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)

Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync") Basically I have a large number of files I want to put into HDFS. It's so large that I'll often, say, lose connectivity before…
Nate Murray
  • 963
  • 1
  • 7
  • 7
5
votes
2 answers

How to fix Hadoop HDFS cluster with missing blocks after one node was reinstalled?

I have a 5 slave Hadoop cluster (using CDH4)---slaves are where DataNode and TaskNode run. Each slave has 4 partitions dedicated to HDFS storage. One of the slaves needed a reinstall and this caused one of the HDFS partitions to be lost. At this…
Dolan Antenucci
  • 329
  • 1
  • 4
  • 16
5
votes
1 answer

Ceph: Why is a greater number of "placement groups" a "bad thing"?

I have been researching distributed databases and file systems, and while I was originally mostly interested in Hadoop/HBase because I'm a Java programmer, I found this very interesting document about Ceph, which as a major plus point, is now…
monster
  • 608
  • 1
  • 10
  • 17
4
votes
1 answer

mount.nfs: mount system call failed

I am trying to mount hdfs on my local machine running Ubuntu using the following command :--- sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this error:- mount.nfs: mount system call failed Output…
Bhavya Jain
  • 141
  • 1
  • 1
  • 3
4
votes
1 answer

Upload large files with curl without RAM cache.

I'm using curl to upload large files (from 5 to 20 GB) to HOOP based on HDFS (Hadoop Cluster) as follows: curl -f --data-binary "@$file" "$HOOP_HOST$UPLOAD_PATH?user.name=$HOOP_USER&op=create" But when curl uploads a large file, it tries to fully…
Gening D.
  • 71
  • 1
  • 5
4
votes
3 answers

Is there a way to grep gzipped content in hdfs without extracting it?

I'm looking for a way to zgrep hdfs files, something like: hadoop fs -zcat hdfs://myfile.gz | grep "hi" or hadoop fs -cat hdfs://myfile.gz | zgrep "hi" Neither really works for me. Is there any way to achieve that from the command line?
Jas
  • 681
  • 3
  • 12
  • 23
4
votes
0 answers

java.lang.NullPointerException When Doing A Read in HDFS

I have had a 10-node HBase cluster up and running for the past 4 months. The cluster was set up on VMs in a corporate environment which I do not control, but everything has been working great...until today. Today, every part of the system was down. I…
JasCav
  • 233
  • 1
  • 12
4
votes
1 answer

Can't connect to HDFS in pseudo-distributed mode

I followed the instructions here for installing hadoop in pseudo-distributed mode. However, I'm having trouble connecting to HDFS. When I execute this command : ./hadoop fs -ls / I get a directory listing just like I should. However, when I execute…
sangfroid
  • 193
  • 1
  • 3
  • 10
3
votes
0 answers

How can I launch hdfs on Mesos without DC/OS?

From my understanding, DC/OS is a freemium managed service. Because I'd rather just have a raw Mesos implementation, I'd rather not be dependent on DC/OS, and so I just want to know how to implement HDFS on Mesos without it. Unfortunately Google is…
Dr.Knowitall
  • 209
  • 1
  • 10
3
votes
1 answer

Linux Network tuning to prevent tcp rcvpruned and backlogdrop?

The datanodes in my HBase cluster are triggering some tcp rcvpruned and backlog drops from time to time: It seems there are at least two angles from which to approach this: Tune HBase/HDFS etc... so that these are not triggered Tune the Linux network…
Kyle Brandt
  • 81,077
  • 70
  • 299
  • 442
3
votes
2 answers

Disable The Under Replicated Blocks Alert in Cloudera Manager

I have a single-server HBase cluster that I am only using as the sink end of HBase replication. Therefore I don't want to replicate any blocks within this cluster (since the source has replicated blocks I don't feel I need it). I would like to…
Kyle Brandt
  • 81,077
  • 70
  • 299
  • 442